Archive for February, 2012

Notes on Barter, Privacy, Data, & the Meaning of “Free”

It’s been an interesting few weeks:

  • Facebook’s upcoming $100-billion IPO has users wondering why owners get all the money while users provide all the assets.
  • Google’s revision of privacy policies has users thinking that something important has changed even though they don’t know what.
  • Google has used a loophole in Apple’s browser to gather data about iPhone users.
  • Apple has allowed app developers to download users’ address books.
  • And over in one of EDUCAUSE’s online discussion groups, the offer of a free book has somehow led security officers to do linguistic analysis of the word “free” as part of a privacy argument.

Lurking under all of these, I think, are the unheralded and misunderstood resurgence of a sometimes triangular barter economy, confusion about different revenue models, and, yes, disagreement about what the word “free” means.

Let’s approach the issue obliquely, starting, in the best academic tradition, with a small-scale research problem. Here’s the hypothetical question, which I might well have asked back when I was a scholar of student choice: Is there a relationship between selectivity and degree completion at 4-year colleges and universities?

As a faculty member in the late 1970s, I’d have gone to the library and used reference tools to locate articles or reports on the subject. If I were unaffiliated and living in Chicago (which I wasn’t back then), I might have gone to the Chicago Public Library, found in its catalog a 2004 report by Laura Horn, and had that publication pulled from closed-stack storage so I could read it.

By starting with that baseline, of course, I’m merely reminiscing. These days I can obtain the data myself, and do some quick analysis. I know the relevant data are in the Integrated Postsecondary Education Data System (IPEDS). And those IPEDS data are available online, so I can

(a) download data on 2010 selectivity, undergraduate enrollment, and bachelor’s degrees awarded for the 2,971 US institutions that grant four-year degrees and import those data into Excel,

(b) eliminate the 101 system offices and such missing relevant data, the 1,194 that granted fewer than 100 degrees, the 15 institutions reporting suspiciously high degree/enrollment rates, the one that reported no degrees awarded (Miami-Dade College, in case you’re interested), and the 220 that reported no admit rate, and then

(c) for the remaining 1,440 colleges and universities, create a graph of degree completion (somewhat normalized) as a function of selectivity (ditto).

The graph doesn’t tell me much–scatter plots rarely do for large datasets–but a quick regression analysis tells me there’s a modestly positive relationship: 1% higher selectivity (according to my constructed index) translates on average into 1.4% greater completion (ditto). The download, data cleaning, graphing, and analysis take me about 45 minutes all told.
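For readers who would rather script this than click through Excel, here is a minimal sketch of the same two steps in Python: the bookkeeping behind the 1,440 figure and an ordinary least-squares fit. The exclusion counts are the ones listed above. Everything else is an assumption for illustration: the function name `ols_slope_intercept` is made up, and the four toy points are invented (and deliberately placed on a line with slope 1.4), not drawn from IPEDS or from my constructed indexes.

```python
# Step (b): exclusion counts exactly as listed in the post.
TOTAL_INSTITUTIONS = 2971
EXCLUSIONS = {
    "system offices and such missing relevant data": 101,
    "granted fewer than 100 degrees": 1194,
    "suspiciously high degree/enrollment rate": 15,
    "reported no degrees awarded": 1,
    "reported no admit rate": 220,
}
remaining = TOTAL_INSTITUTIONS - sum(EXCLUSIONS.values())


def ols_slope_intercept(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Hypothetical helper for illustration; a spreadsheet's built-in
    regression does the same arithmetic.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented toy points, NOT IPEDS data: chosen to lie on y = 1.4x + 0.1.
toy_selectivity = [1.0, 2.0, 3.0, 4.0]
toy_completion = [1.5, 2.9, 4.3, 5.7]
slope, intercept = ols_slope_intercept(toy_selectivity, toy_completion)

print(f"institutions remaining after cleaning: {remaining}")
print(f"fitted slope on toy data: {slope:.2f}")
```

With real data the two lists would hold one selectivity value and one completion value per remaining institution, normalized however you see fit; the slope is then the "1% translates into so many percent" number.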

Or I might just use a search engine. When I do that, using “degree completion by selectivity” as the search term, a highly-ranked Google result takes me to an excerpt from a College Board report.

Curiously, that report tells me that “…selectivity is highly correlated with graduation rates,” which is a rather different conclusion from the one my IPEDS analysis gave me. The footnotes help explain this: the College Board includes two-year institutions in its analysis, considers only full-time, first-time students, excludes returning students and transfers, and otherwise chooses its data in ways I didn’t.

The difference between my graph and the College Board’s conclusion is excellent fodder for a discussion of how to evaluate what one finds online — in the quote often (but perhaps mistakenly) attributed to Daniel Patrick Moynihan, “Everyone is entitled to his own opinion, but not his own facts.” Which gets me thinking about one of the high points in my graduate studies, a Harvard methodology seminar wherein Mike Smith, who was eventually to become US Undersecretary of Education, taught Moynihan what regression analysis is, which in turn reminds me of the closet full of Scotch at the Joint Center for Urban Studies kept full because Moynihan required that no meeting at the Joint go past 4pm without a bottle of Scotch on the table. But I digress.

Since I was logged in with my Google account when I did the search, some of the results might even have been tailored to what Google had learned about me from previous searches. At the very least, the information was tailored to previous searches from the computer I used here in my DC office.

Which brings me to the linguistic dispute among security officers.

A recent EDUCAUSE webinar presenter, during Data Privacy Month, was Matt Ivester, creator of JuicyCampus and author of lol…OMG!: What Every Student Needs to Know About Online Reputation Management, Digital Citizenship and Cyberbullying.

“In honor of Data Privacy Day,” the book’s website announced around the same time, “the full ebook of lol…OMG! (regularly $9.99) is being made available for FREE!” Since Ivester was going to be a guest presenter for EDUCAUSE, we encouraged webinar participants to avail themselves of this offer and to download the book.

One place we did that was in a discussion group we host for IT security professionals. A participant in that discussion group immediately took Ivester to task:

…you can’t download the free book without logging in to Amazon. And, near as I can tell, it’s Kindle- or Kindle-apps-only. In honor of Data Privacy Day. The irony, it drips.

“Pardon the rant,” another participant responded, “but what is the irony here?” Another elaborated:

I intend to download the book but, despite the fact that I can understand why free distribution is being done this way, I still find it ironic that I must disclose information in order to get something that’s being made available at no charge in honor of DPD.

The discussion grew lively, and eventually devolved into a discussion of the word “free”. If one must disclose personal information in order to download a book at no monetary cost, is the book “free”?

If words like “free”, “cost”, and “price” refer only to money, the answer is Yes. But money came into existence only to simplify barter economies. In a sense, today’s Internet economy involves a new form of barter that replaces money: If we disclose information about ourselves, then we receive something in return; conversely, vendors offer “free” products in order to obtain information about us.

In a recent post, Ed Bott presented graphs illustrating the different business models behind Microsoft, Apple, and Google. According to Bott, Microsoft is selling software, Apple is selling hardware, and Google is selling advertising.

More to the point here, Microsoft and Apple still focus on traditional binary transactions, confined to themselves and buyers of their products.

Google is different. Google’s triangle trade (which Facebook also follows) offers “free” services to individuals, collects information about those individuals in return, and then uses that information to tailor advertising that it then sells to vendors in return for money. In the triangle, the user of search results pays no money to Google, so in that limited sense it’s “free”. Thus the objection in the Security discussion group: if one directly exchanges something of value for the “free” information, then it’s not free.

Except for my own time, all three answers to my “How does selectivity relate to degree completion?” question were “free”, in the sense that I paid no money explicitly for them. All of them cost someone something. But not all no-cost-to-the-user online data is funded through Google-like triangles.

In the case of the Chicago Public Library, my Chicago property taxes plus probably some federal and Illinois grants enabled the library to acquire, catalog, store, and retrieve the Horn report. They also built the spectacular Harold Washington Library where I’d go read it.

In the case of IPEDS, my federal tax dollars paid the bill.

In both cases, however, what I paid was unrelated to how much I used the resources, and involved almost no disclosure of my identity or other attributes.

In contrast, the “free” search Google provided involved my giving something of value to Google, namely something about my searches. The same was true for the Ivester fans who downloaded his “free” book from Amazon.

Not that there’s anything wrong with that, as Jerry Seinfeld might say: by allowing Google and Amazon to tailor what they show me based on what they know about me, I get search results or purchase suggestions that are more likely to interest me. That is, not only does Google get value from my disclosure; I also get value from what Google does with that information.

The problem–this is what takes us back to security–is twofold.

  • First, an awful lot of users don’t understand how the disclosure-for-focus exchange works, in large part because the other party to the exchange isn’t terribly forthright about it. Sure, I can learn why Google is displaying those particular ads (that’s the “Why these ads?” link in tiny print atop the right column in search results), and if I do that I discover that I can tailor what information Google uses. But unless I make that effort the exchange happens automatically, and each search gets added to what Google will use to customize my future ads.
  • Second, and much more problematic, the entities that collect information about us increasingly share what they know. This varies depending on whether they’ve learned about us directly through things like credit applications or indirectly through what we search for on the Web, what we purchase from vendors like Amazon, or what we share using social media like Facebook or Twitter. Some companies take pains to assure us they don’t share what they know, but in many cases initial assurances get softened over time (or, as appears to have happened with Apple, are violated through technical or process failures). This is routinely true for Facebook, and many seem to believe it’s what’s behind the recent changes in Google’s privacy policy.

Indeed, companies like Acxiom are in the business of aggregating data about individuals and making them available. Data so collected can help banks combat identity theft by enabling them to test whether credit applicants are who they claim to be. If they fall into the wrong hands, however, the same data can enable subtle forms of redlining or even promote identity theft.

Vendors collecting data about us becomes a privacy issue whose substance depends on whether

  • we know what’s going on,
  • data are kept and/or shared, and
  • we can opt out.

Once we agree to disclose in return for “free” goods, however, the exchange becomes a security issue, because the same data can enable impersonation. It becomes a policy issue because the same data can enable inappropriate or illegal activity.

The solution to all this isn’t turning back the clock — the new barter economy is here to stay. What we need are transparency, options, and broad-based educational campaigns to help people understand the deal and choose according to their preferences.

As either Stan Delaplane or Calvin Trillin once observed about “market price” listings on restaurant menus (or didn’t; I’m damned if I can find anything authoritative, or for that matter any mention whatsoever of this, but I know I read it), “When you learn for the first time that the lobster you just ate cost $50, the only reasonable response is to offer half”.

Unfortunately, in today’s barter economy we pay the price before we get the lobster…