Posts Tagged ‘search’

Notes on Barter, Privacy, Data, & the Meaning of “Free”

It’s been an interesting few weeks:

  • Facebook’s upcoming $100-billion IPO has users wondering why owners get all the money while users provide all the assets.
  • Google’s revision of privacy policies has users thinking that something important has changed even though they don’t know what.
  • Google has used a loophole in Apple’s browser to gather data about iPhone users.
  • Apple has allowed app developers to download users’ address books.
  • And over in one of EDUCAUSE’s online discussion groups, the offer of a free book has somehow led security officers to do linguistic analysis of the word “free” as part of a privacy argument.

Lurking under all of this, I think, are the unheralded and misunderstood resurgence of a sometimes triangular barter economy, confusion about different revenue models, and, yes, disagreement about what the word “free” means.

Let’s approach the issue obliquely, starting, in the best academic tradition, with a small-scale research problem. Here’s the hypothetical question, which I might well have asked back when I was a scholar of student choice: Is there a relationship between selectivity and degree completion at 4-year colleges and universities?

As a faculty member in the late 1970s, I’d have gone to the library and used reference tools to locate articles or reports on the subject. If I were unaffiliated and living in Chicago (which I wasn’t back then), I might have gone to the Chicago Public Library, found in its catalog a 2004 report by Laura Horn, and have had that publication pulled from closed-stack storage so I could read it.

By starting with that baseline, of course, I’m merely reminiscing. These days I can obtain the data myself, and do some quick analysis. I know the relevant data are in the Integrated Postsecondary Education Data System (IPEDS). And those IPEDS data are available online, so I can

(a) download data on 2010 selectivity, undergraduate enrollment, and bachelor’s degrees awarded for the 2,971 US institutions that grant four-year degrees, and import those data into Excel,

(b) eliminate the 101 system offices and similar entries missing relevant data, the 1,194 institutions that granted fewer than 100 degrees, the 15 reporting suspiciously high degree/enrollment rates, the one that reported no degrees awarded (Miami-Dade College, in case you’re interested), and the 220 that reported no admit rate, and then

(c) for the remaining 1,440 colleges and universities, create a graph of degree completion (somewhat normalized) as a function of selectivity (ditto).

The graph doesn’t tell me much – scatter plots rarely do for large datasets – but a quick regression analysis tells me there’s a modestly positive relationship: 1% higher selectivity (according to my constructed index) translates on average into 1.4% greater completion (ditto). The download, data cleaning, graphing, and analysis take me about 45 minutes all told.
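For the curious, the clean-and-regress step can be sketched in a few lines of Python. The record fields, thresholds, and numbers below are hypothetical stand-ins for illustration, not the actual IPEDS variables or results:

```python
# Toy sketch of the IPEDS-style analysis: drop unusable rows, then regress
# a (crudely normalized) completion index on a selectivity index.
# All fields and figures here are invented, not real IPEDS data.

def clean(rows):
    """Mirror the exclusions in the text: missing data, fewer than 100
    degrees, implausibly high degree/enrollment rates, no admit rate."""
    kept = []
    for r in rows:
        if r["degrees"] is None or r["enrollment"] is None:
            continue                              # system offices etc.
        if r["degrees"] < 100:                    # too few degrees awarded
            continue
        if r["degrees"] / r["enrollment"] > 0.5:  # suspiciously high rate
            continue
        if r["admit_rate"] is None:               # no reported selectivity
            continue
        kept.append(r)
    return kept

def ols_slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

rows = [
    {"degrees": 1200, "enrollment": 6000, "admit_rate": 0.30},
    {"degrees": 900,  "enrollment": 5000, "admit_rate": 0.50},
    {"degrees": 700,  "enrollment": 4500, "admit_rate": 0.70},
    {"degrees": 50,   "enrollment": 400,  "admit_rate": 0.60},  # dropped: <100 degrees
    {"degrees": None, "enrollment": None, "admit_rate": None},  # dropped: missing
]

usable = clean(rows)
selectivity = [1 - r["admit_rate"] for r in usable]            # higher = more selective
completion = [r["degrees"] / r["enrollment"] for r in usable]  # normalized completion
print(len(usable), round(ols_slope(selectivity, completion), 3))  # → 3 0.111
```

The positive slope on this toy data plays the role of my “1% higher selectivity translates into 1.4% greater completion” finding, though the real analysis was done in Excel on the full 1,440-institution file.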

Or I might just use a search engine. When I do that, using “degree completion by selectivity” as the search term, a highly ranked Google result takes me to an excerpt from a College Board report.

Curiously, that report tells me that “…selectivity is highly correlated with graduation rates,” which is a rather different conclusion than IPEDS gave me. The footnotes help explain this: the College Board includes two-year institutions in its analysis, considers only full-time, first-time students, excludes returning students and transfers, and otherwise chooses its data in ways I didn’t.

The difference between my graph and the College Board’s conclusion is excellent fodder for a discussion of how to evaluate what one finds online — in the quote often (but perhaps mistakenly) attributed to Daniel Patrick Moynihan, “Everyone is entitled to his own opinion, but not his own facts.” Which gets me thinking about one of the high points of my graduate studies: a Harvard methodology seminar in which Mike Smith, who would eventually become US Undersecretary of Education, taught Moynihan what regression analysis is. That in turn reminds me of the closet at the Joint Center for Urban Studies kept stocked with Scotch, because Moynihan required that no meeting at the Joint go past 4pm without a bottle on the table. But I digress.

Since I was logged in with my Google account when I did the search, some of the results might even have been tailored to what Google had learned about me from previous searches. At the very least, the information was tailored to previous searches from the computer I used here in my DC office.

Which brings me to the linguistic dispute among security officers.

A recent EDUCAUSE webinar presenter, during Data Privacy Month, was Matt Ivester, creator of JuicyCampus and author of lol…OMG!: What Every Student Needs to Know About Online Reputation Management, Digital Citizenship and Cyberbullying.

“In honor of Data Privacy Day,” the book’s website announced around the same time, “the full ebook of lol…OMG! (regularly $9.99) is being made available for FREE!” Since Ivester was going to be a guest presenter for EDUCAUSE, we encouraged webinar participants to avail themselves of this offer and to download the book.

One place we did that was in a discussion group we host for IT security professionals. A participant in that discussion group immediately took Ivester to task:

…you can’t download the free book without logging in to Amazon. And, near as I can tell, it’s Kindle- or Kindle-apps-only. In honor of Data Privacy Day. The irony, it drips.

“Pardon the rant,” another participant responded, “but what is the irony here?” Another elaborated:

I intend to download the book but, despite the fact that I can understand why free distribution is being done this way, I still find it ironic that I must disclose information in order to get something that’s being made available at no charge in honor of DPD.

The discussion grew lively, and eventually devolved into an argument over the word “free”. If one must disclose personal information in order to download a book at no monetary cost, is the book “free”?

If words like “free”, “cost”, and “price” refer only to money, the answer is Yes. But money came into existence only to simplify barter economies. In a sense, today’s Internet economy involves a new form of barter that replaces money: If we disclose information about ourselves, then we receive something in return; conversely, vendors offer “free” products in order to obtain information about us.

In a recent post, Ed Bott presented graphs illustrating the different business models behind Microsoft, Apple, and Google. According to Bott, Microsoft is selling software, Apple is selling hardware, and Google is selling advertising.

More to the point here, Microsoft and Apple still focus on traditional binary transactions, confined to themselves and buyers of their products.

Google is different. Google’s triangle trade (which Facebook also follows) offers “free” services to individuals, collects information about those individuals in return, and then uses that information to tailor advertising that it then sells to vendors in return for money. In the triangle, the user of search results pays no money to Google, so in that limited sense it’s “free”. Thus the objection in the Security discussion group: if one directly exchanges something of value for the “free” information, then it’s not free.

Except for my own time, all three answers to my “How does selectivity relate to degree completion?” question were “free”, in the sense that I paid no money explicitly for them. All of them cost someone something. But not all no-cost-to-the-user online data are funded through Google-like triangles.

In the case of the Chicago Public Library, my Chicago property taxes, plus probably some federal and Illinois grants, enabled the library to acquire, catalog, store, and retrieve the Horn report. Those taxes also built the spectacular Harold Washington Library where I’d go to read it.

In the case of IPEDS, my federal tax dollars paid the bill.

In both cases, however, what I paid was unrelated to how much I used the resources, and involved almost no disclosure of my identity or other attributes.

In contrast, the “free” search Google provided involved my giving something of value to Google, namely something about my searches. The same was true for the Ivester fans who downloaded his “free” book from Amazon.

Not that there’s anything wrong with that, as Jerry Seinfeld might say: by allowing Google and Amazon to tailor what they show me based on what they know about me, I get search results or purchase suggestions that are more likely to interest me. That is, not only does Google get value from my disclosure; I also get value from what Google does with that information.

The problem – this is what takes us back to security – is twofold.

  • First, an awful lot of users don’t understand how the disclosure-for-focus exchange works, in large part because the other party to the exchange isn’t terribly forthright about it. Sure, I can learn why Google is displaying those particular ads (that’s the “Why these ads?” link in tiny print atop the right column in search results), and if I do that I discover that I can tailor what information Google uses. But unless I make that effort the exchange happens automatically, and each search gets added to what Google will use to customize my future ads.
  • Second, and much more problematic, the entities that collect information about us increasingly share what they know. This varies depending on whether they’ve learned about us directly, through things like credit applications, or indirectly, through what we search for on the Web, what we purchase from vendors like Amazon, or what we share using social media like Facebook or Twitter. Some companies take pains to assure us they don’t share what they know, but in many cases initial assurances get softened over time (or, as appears to have happened with Apple, are violated through technical or process failures). This is routinely true for Facebook, and many seem to believe it’s what’s behind the recent changes in Google’s privacy policy.

Indeed, companies like Acxiom are in the business of aggregating data about individuals and making them available. Data so collected can help banks combat identity theft by enabling them to test whether credit applicants are who they claim to be. If they fall into the wrong hands, however, the same data can enable subtle forms of redlining or even promote identity theft.

When vendors collect data about us, it becomes a privacy issue whose substance depends on whether

  • we know what’s going on,
  • data are kept and/or shared, and
  • we can opt out.

Once we agree to disclose in return for “free” goods, however, the exchange becomes a security issue, because the same data can enable impersonation. It becomes a policy issue because the same data can enable inappropriate or illegal activity.

The solution to all this isn’t turning back the clock — the new barter economy is here to stay. What we need are transparency, options, and broad-based educational campaigns to help people understand the deal and choose according to their preferences.

As either Stan Delaplane or Calvin Trillin once observed about “market price” listings on restaurant menus (or didn’t — I’m damned if I can find anything authoritative, or for that matter any mention whatsoever of this, but I know I read it), “When you learn for the first time that the lobster you just ate cost $50, the only reasonable response is to offer half”.

Unfortunately, in today’s barter economy we pay the price before we get the lobster…

GoTo, Gas Pedals, & Google: What Students Should Know, and Why That’s Not What We Teach Them

In the 1980s I began teaching a course in BASIC programming in the Harvard University Extension, part of an evening Certificate of Advanced Study program for working students trying to get ahead. Much to my surprise, students immediately filled the small assigned lecture hall to overflowing, and nearly overwhelmed my lone teaching assistant.

Within two years, the course had grown to 250+ students. They spread throughout the second-largest room in the Harvard Science Center (Lecture Hall C – the one with the orange seats, for those of you who have been there). I now had a dozen TAs, so I was in effect not only teaching the BASIC course, but also leading a seminar on the pedagogical challenge of teaching non-technical students how to write structured programs in a language that heretically allowed “GoTo” statements.

Computer Literacy?

There’s nothing very interesting or exciting about learning to program in BASIC. Although I flatter myself a good teacher, even my best efforts to render the material engaging – for example, assignments that had students act out roles in Stuart Madnick’s deceptively simple Little Man Computer system, automate Shirley Ellis’s song The Name Game, and model a defined-benefit pension system – in no way explained the course’s popularity.

So what was going on? I asked students why they were taking my course. Most often, they said something about “computer literacy”. That’s a useful (if linguistically confused) term, but in this case a misleading one.

If the computer becomes important, the analogy seems to run, then the ability to use a computer becomes important, much as the spread of printed material made reading and writing important. So far so good. For the typical 1980s employee, however, using computers in business and education centered on applications like word processors, spreadsheets, charts, and maybe statistical packages. Except for those within the computer industry, it rarely involved writing code in high-level languages.

BASIC programming thus had little direct relevance to the “computer literacy” students actually needed. The print era made reading and writing important for the average worker and citizen, but only printers needed adeptness with the technologies of paper, ink, composition (in the Linotype sense), and presses. That’s why the analogy fails: by the 1980s, programming was about making computer applications, not using them – the opposite of what students actually needed.

Yet clearly students viewed the ability to program in BASIC – even “Shirley Shirley bo-birley…” – as somehow relevant to the evolving challenges of their jobs. If BASIC programming wasn’t directly relevant to actual computer literacy, why did they believe this? Two explanations of its indirect importance suggest themselves:

  • Perhaps the ability to program was an accessible indicator of more relevant yet harder-to-measure competence. Employers might have been using programming ability, however directly irrelevant, as a shortcut measure to evaluate and sort job applicants or promotion candidates. (This is essentially a recasting of Lester Thurow’s “job queues” theory about the relationship between educational attainment and hiring, namely that educational attainment signals the ability to learn quickly rather than providing direct training.) Applicants or employees who believed this was happening would thus perceive programming ability as a way to make themselves appear attractive, even though the skill itself was irrelevant.
  • Perhaps students learned to program simply to gain confidence that they could cope with the computer age.

I propose a third explanation:

  • As technology evolves, generations that experience the evolution tend to believe it important for the next generation to understand what came before, and advise students accordingly.

That is, we who experience technological change believe that competence with current technology benefits from understanding prior technology – a technological variant of George Santayana’s aphorism “Those who cannot remember the past are condemned to repeat it” – and send myriad direct and indirect messages to our successors and students that without historical understanding one cannot be fully competent.

Shifting Gears

My father taught me to drive on the family’s 1955 Chevy station wagon, a six-cylinder car with a three-speed, non-synchromesh, stalk-mounted-shifter manual transmission and power nothing. After a few rough sessions learning to get the car moving without bucking and stalling, to turn and shift at the same time, and to double-clutch and downshift while going downhill, I became a pretty good driver.

But my father, who had learned to drive on a Model T Ford with a planetary transmission and separate throttle and spark-advance controls, remained skeptical of my ability. He was always convinced that since I didn’t understand that latter distinction, I really wasn’t operating the car as well as I might. (Today’s “accelerator”, if I understand it correctly, combines the two functions: it tells the engine to spin faster, which is what the spark-advance lever did, and then feeds it the necessary fuel mixture, which was the throttle’s function.)

Years later it came time for our son’s first driving lesson. We were in our automatic-transmission Toyota Camry, equipped with power steering and brakes, on a not-yet-opened Cape Cod subdivision’s newly paved streets. Apparently forgetting how irrelevant the throttle/spark distinction had been to my learning to drive, I delivered a lecture on what was going on in the automatic transmission – why it didn’t need a clutch, how it was deciding when to shift gears, and so forth. Our son listened patiently, and then rapidly learned to drive the Camry very well without any regard to what I’d explained. My lecture had absolutely no effect on his competence (at least not until several years later, I like to believe, when he taught himself to drive a friend’s four-on-the-floor VW).

Technological Instruction

Which brings me to the present, and the challenge of preparing today’s students for tomorrow’s technological workplaces. What should our advice to them be, either explicit – in the form of direct instruction or requirements – or implicit, in the form of the kind of contextual guidance that induced so many students to take my BASIC course? In particular, how can we break away from the generational tendency to emphasize how we got here rather than where we’re going?

I don’t propose to answer that question fully here, but rather to sketch, through two examples, how a future-oriented perspective might differ from a generational one. The first example is cloud services, and the second is online information.

Cloud Services

I started writing this essay on my DC office computer. I’m typing these words on an old laptop I keep in my DC apartment, and I’ll probably finish on my traveling computer or perhaps on my Chicago home computer. A big problem ensues: how do I keep these various copies synchronized? What I need is to have the same document, up to date, wherever I’m working. My answer is a service called Dropbox, which copies documents I save to its central servers and then disseminates them automatically to all my other computers and even my phone – that is, it synchronizes multiple copies of the same documents across multiple computers and other devices.

Alternatively, I might have gotten what I need – the same document, up to date, wherever I’m working – by drafting this post as a Google or Live document. Rather than synchronizing local copies among my computers, I’d have been editing a single remote document from each of them.
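The distinction can be caricatured in a few lines of code. This is a deliberately crude sketch – the device names and “documents” are invented, and real services like Dropbox do far more (conflict handling, deltas, versioning) than this last-writer-wins toy:

```python
# Two models for working on one document from many machines.
# Model 1 (Dropbox-like): each device has its OWN copy; a sync step
# propagates the most recently modified copy to all the others.
# Model 2 (Google-Docs-like): every device edits the SAME remote object.
# Everything here is illustrative, not any vendor's actual protocol.

def sync_newest(copies):
    """Last-writer-wins sync: replace every copy with the newest one."""
    newest = max(copies.values(), key=lambda c: c["modified"])
    return {device: dict(newest) for device in copies}

# Model 1: three devices holding divergent local copies.
copies = {
    "office":  {"text": "draft v2", "modified": 2},
    "laptop":  {"text": "draft v3", "modified": 3},
    "chicago": {"text": "draft v1", "modified": 1},
}
synced = sync_newest(copies)  # every device now holds "draft v3"

# Model 2: every device holds a reference to one shared document,
# so an edit made from any device is immediately what all devices see.
remote_doc = {"text": "draft v3"}
views = {"office": remote_doc, "laptop": remote_doc, "chicago": remote_doc}
views["laptop"]["text"] = "draft v4"  # visible from "office" and "chicago" too
```

In the first model there are three documents that the service works to keep identical; in the second there is only ever one.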

My instincts are that this difference between synchronized and remote documents is important, something that I, as an educator, should be sure the next generation understands. When my son asks about how to work across different machines, my inclination is to explain the difference between the options, how one is giving way to the other, and so forth. Is that valid, or is this the same generational fallacy that led my father to explain throttles and spark advance or me to explain clutches and shifting?

Online Information

When I came to the history quote above, I couldn’t remember its precise wording or who wrote it. That’s what the Internet is for, right? Finding information?

I typed “those who ignore the past are doomed”, which was the partial phrase I remembered, into Google’s search box. Among the first page of hits, the first time I tried this, were links to answers.com, wikiquote.org, answers.google.com, wikipedia.org, and www.nku.edu. The first four of those pointed me to the correct quote, usually giving the specific source including edition and page. The last, from a departmental web page at Northern Kentucky University, blithely repeated the incorrect quote (but at least ascribed it to Santayana). One of the sources (answers.com) pointed to an earlier, similar quote from Edmund Burke. The Wikipedia entry reminded me that the quote is often incorrectly ascribed to Plato.

I then typed the same search into Bing’s search box. Many links on its first page of results were the same as Google’s — answers.com and wikiquotes — but there were more links to political comments (most of them embodying incorrect variations on the quote), and one link to a conspiracy-theorist page linking the Santayana quote to George Orwell’s “He who controls the present, controls the past. He who controls the past, controls the future”.

It wasn’t hard for me to figure out which search results to heed and which to ignore. The ability to screen search results and then either to choose which to trust or to refine the search is central to success in today’s networked world. What’s the best way to inculcate that skill in those who will need it?

I’ve been working in IT since before the Digital Equipment Corporation’s AltaVista, in its original incarnation, became one of the first Web search engines. The methods different search services use to locate and rank information have always been especially interesting. The early AltaVista ranked pages based on how many times search words appeared in them – a method so obviously manipulable (especially by sneaking keywords into non-displayed parts of Web pages) that it rapidly gave way to more robust approaches. The links one gets from Google or Bing today come partly from some very sophisticated ranking said to be based partly on user behavior (such as whether a search seems to have succeeded) and partly on links among sites (this was Google’s original innovation, called PageRank) – but also, quite openly and separately, from advertisers paying to have their sites displayed when users search for particular terms.
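PageRank’s core idea – a page matters if pages that matter link to it – reduces to a simple iteration. Here’s a toy sketch; the four-page link graph is invented, and this is the textbook version of the idea, not Google’s actual production algorithm:

```python
# Minimal PageRank sketch: repeatedly let each page share its score with
# the pages it links to, with a damping factor modeling random jumps.
# The link graph below is made up for illustration.

def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}          # start uniform
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:                             # share rank across outlinks
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:                                # dangling page: spread evenly
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

links = {
    "home":  ["about", "blog"],
    "about": ["home"],
    "blog":  ["home", "about"],
    "spam":  ["spam"],   # links only to itself; no one links to it
}
ranks = pagerank(links)  # "home", with the most inbound links, ranks highest
```

Notice that the scores depend on who links to a page, not on how often a keyword appears in it – which is exactly what made this approach so much harder to game than AltaVista’s word counting.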

Here again the generational issue arises. Obviously we want to teach future generations how to search effectively, and how to evaluate the quality and reliability of the information their searches yield. But do we do this by explaining the evolution of search and ranking algorithms – the generational approach based on the preceding paragraph – or by teaching more generally, as bibliographic instructors in libraries have long done, how to cross-reference, assess, and evaluate information whatever its form?

Understanding throttles and spark advance did not help me become a better driver, understanding BASIC probably didn’t help prepare my Harvard students for their future workplaces, and explaining diverse cloud mechanisms and search algorithms isn’t the best way for us to maximize our students’ technological competence. Much as I love explaining things, I think the essence of successful technological teaching is to focus on the future, on the application and consequences of technology rather than its origins.

That doesn’t mean we should dismiss the importance of history, but rather that history does not suffice as a basis for technological instruction. It’s easier to explain the past than to anticipate the future, but the latter, however risky and uncertain and detached from our personal histories, is our job.