Archive for February, 2012

Notes on Barter, Privacy, Data, & the Meaning of “Free”

It’s been an interesting few weeks:

  • Facebook’s upcoming $100-billion IPO has users wondering why owners get all the money while users provide all the assets.
  • Google’s revision of privacy policies has users thinking that something important has changed even though they don’t know what.
  • Google has used a loophole in Apple’s browser to gather data about iPhone users.
  • Apple has allowed app developers to download users’ address books.
  • And over in one of EDUCAUSE’s online discussion groups, the offer of a free book has somehow led security officers to do linguistic analysis of the word “free” as part of a privacy argument.

Lurking under all of this, I think, are the unheralded and misunderstood resurgence of a sometimes triangular barter economy, confusion about different revenue models, and, yes, disagreement about what the word “free” means.

Let’s approach the issue obliquely, starting, in the best academic tradition, with a small-scale research problem. Here’s the hypothetical question, which I might well have asked back when I was a scholar of student choice: Is there a relationship between selectivity and degree completion at 4-year colleges and universities?

As a faculty member in the late 1970s, I’d have gone to the library and used reference tools to locate articles or reports on the subject. If I were unaffiliated and living in Chicago (which I wasn’t back then), I might have gone to the Chicago Public Library, found in its catalog a 2004 report by Laura Horn, and had that publication pulled from closed-stack storage so I could read it.

By starting with that baseline, of course, I’m merely reminiscing. These days I can obtain the data myself, and do some quick analysis. I know the relevant data are in the Integrated Postsecondary Education Data System (IPEDS). And those IPEDS data are available online, so I can

(a) download data on 2010 selectivity, undergraduate enrollment, and bachelor’s degrees awarded for the 2,971 US institutions that grant four-year degrees and import those data into Excel,

(b) eliminate the 101 system offices and such missing relevant data, the 1,194 that granted fewer than 100 degrees, the 15 institutions reporting suspiciously high degree/enrollment rates, the one that reported no degrees awarded (Miami-Dade College, in case you’re interested), and the 220 that reported no admit rate, and then

(c) for the remaining 1,440 colleges and universities, create a graph of degree completion (somewhat normalized) as a function of selectivity (ditto).

The graph doesn’t tell me much–scatter plots rarely do for large datasets–but a quick regression analysis tells me there’s a modestly positive relationship: 1% higher selectivity (according to my constructed index) translates on average into 1.4% greater completion (ditto). The download, data cleaning, graphing, and analysis take me about 45 minutes all told.
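The bookkeeping in steps (a) through (c) can be checked in a few lines of Python. This is only a sketch: the exclusion counts are the ones given above, but the constructed selectivity and completion indices are not spelled out here, so `ols_slope` is a generic least-squares slope to apply to whatever two normalized series one builds. The function and variable names are illustrative assumptions, not anything defined by IPEDS.

```python
from statistics import mean

# Counts as stated in the text above.
TOTAL_INSTITUTIONS = 2971
EXCLUSIONS = {
    "system offices and others missing relevant data": 101,
    "granted fewer than 100 degrees": 1194,
    "suspiciously high degree/enrollment rate": 15,
    "reported no degrees awarded": 1,
    "reported no admit rate": 220,
}


def remaining(total, exclusions):
    """Number of institutions left after all exclusions are applied."""
    return total - sum(exclusions.values())


def ols_slope(xs, ys):
    """Slope b of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    print(remaining(TOTAL_INSTITUTIONS, EXCLUSIONS))
```

With selectivity as `xs` and completion as `ys`, a slope near 1.4 would correspond to the relationship described above, assuming both series are expressed in the same percentage terms.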

Or I might just use a search engine. When I do that, using “degree completion by selectivity” as the search term, a highly-ranked Google result takes me to an excerpt from a College Board report.

Curiously, that report tells me that “…selectivity is highly correlated with graduation rates,” which is a rather different conclusion than IPEDS gave me. The footnotes help explain this: the College Board includes two-year institutions in its analysis, considers only full-time, first-time students, excludes returning students and transfers, and otherwise chooses its data in ways I didn’t.

The difference between my graph and the College Board’s conclusion is excellent fodder for a discussion of how to evaluate what one finds online — in the quote often (but perhaps mistakenly) attributed to Daniel Patrick Moynihan, “Everyone is entitled to his own opinion, but not his own facts.” Which gets me thinking about one of the high points in my graduate studies, a Harvard methodology seminar wherein Mike Smith, who was eventually to become US Undersecretary of Education, taught Moynihan what regression analysis is, which in turn reminds me of the closet full of Scotch at the Joint Center for Urban Studies kept full because Moynihan required that no meeting at the Joint go past 4pm without a bottle of Scotch on the table. But I digress.

Since I was logged in with my Google account when I did the search, some of the results might even have been tailored to what Google had learned about me from previous searches. At the very least, the information was tailored to previous searches from the computer I used here in my DC office.

Which brings me to the linguistic dispute among security officers.

A recent EDUCAUSE webinar presenter, during Data Privacy Month, was Matt Ivester, creator of JuicyCampus and author of lol…OMG!: What Every Student Needs to Know About Online Reputation Management, Digital Citizenship and Cyberbullying.

“In honor of Data Privacy Day,” the book’s website announced around the same time, “the full ebook of lol…OMG! (regularly $9.99) is being made available for FREE!” Since Ivester was going to be a guest presenter for EDUCAUSE, we encouraged webinar participants to avail themselves of this offer and to download the book.

One place we did that was in a discussion group we host for IT security professionals. A participant in that discussion group immediately took Ivester to task:

…you can’t download the free book without logging in to Amazon. And, near as I can tell, it’s Kindle- or Kindle-apps-only. In honor of Data Privacy Day. The irony, it drips.

“Pardon the rant,” another participant responded, “but what is the irony here?” Another elaborated:

I intend to download the book but, despite the fact that I can understand why free distribution is being done this way, I still find it ironic that I must disclose information in order to get something that’s being made available at no charge in honor of DPD.

The discussion grew lively, and eventually devolved into a discussion of the word “free”. If one must disclose personal information in order to download a book at no monetary cost, is the book “free”?

If words like “free”, “cost”, and “price” refer only to money, the answer is Yes. But money came into existence only to simplify barter economies. In a sense, today’s Internet economy involves a new form of barter that replaces money: If we disclose information about ourselves, then we receive something in return; conversely, vendors offer “free” products in order to obtain information about us.

In a recent post, Ed Bott presented graphs illustrating the different business models behind Microsoft, Apple, and Google. According to Bott, Microsoft is selling software, Apple is selling hardware, and Google is selling advertising.

More to the point here, Microsoft and Apple still focus on traditional binary transactions, confined to themselves and buyers of their products.

Google is different. Google’s triangle trade (which Facebook also follows) offers “free” services to individuals, collects information about those individuals in return, and then uses that information to tailor advertising that it then sells to vendors in return for money. In the triangle, the user of search results pays no money to Google, so in that limited sense it’s “free”. Thus the objection in the Security discussion group: if one directly exchanges something of value for the “free” information, then it’s not free.

Except for my own time, all three answers to my “How does selectivity relate to degree completion?” question were “free”, in the sense I paid no money explicitly for them. All of them cost someone something. But not all no-cost-to-the-user online data is funded through Google-like triangles.

In the case of the Chicago Public Library, my Chicago property taxes plus probably some federal and Illinois grants enabled the library to acquire, catalog, store, and retrieve the Horn report. They also built the spectacular Harold Washington Library where I’d go read it.

In the case of IPEDS, my federal tax dollars paid the bill.

In both cases, however, what I paid was unrelated to how much I used the resources, and involved almost no disclosure of my identity or other attributes.

In contrast, the “free” search Google provided involved my giving something of value to Google, namely something about my searches. The same was true for the Ivester fans who downloaded his “free” book from Amazon.

Not that there’s anything wrong with that, as Jerry Seinfeld might say: by allowing Google and Amazon to tailor what they show me based on what they know about me, I get search results or purchase suggestions that are more likely to interest me. That is, not only does Google get value from my disclosure; I also get value from what Google does with that information.

The problem–this is what takes us back to security–is twofold.

  • First, an awful lot of users don’t understand how the disclosure-for-focus exchange works, in large part because the other party to the exchange isn’t terribly forthright about it. Sure, I can learn why Google is displaying those particular ads (that’s the “Why these ads?” link in tiny print atop the right column in search results), and if I do that I discover that I can tailor what information Google uses. But unless I make that effort the exchange happens automatically, and each search gets added to what Google will use to customize my future ads.
  • Second, and much more problematic, the entities that collect information about us increasingly share what they know. This varies depending on whether they’ve learned about us directly through things like credit applications or indirectly through what we search for on the Web, what we purchase from vendors like Amazon, or what we share using social media like Facebook or Twitter. Some companies take pains to assure us they don’t share what they know, but in many cases initial assurances get softened over time (or, as appears to have happened with Apple, are violated through technical or process failures). This is routinely true for Facebook, and many seem to believe it’s what’s behind the recent changes in Google’s privacy policy.

Indeed, companies like Acxiom are in the business of aggregating data about individuals and making them available. Data so collected can help banks combat identity theft by enabling them to test whether credit applicants are who they claim to be. If they fall into the wrong hands, however, the same data can enable subtle forms of redlining or even promote identity theft.

Vendors collecting data about us becomes a privacy issue whose substance depends on whether

  • we know what’s going on,
  • data are kept and/or shared, and
  • we can opt out.

Once we agree to disclose in return for “free” goods, however, the exchange becomes a security issue, because the same data can enable impersonation. It becomes a policy issue because the same data can enable inappropriate or illegal activity.

The solution to all this isn’t turning back the clock — the new barter economy is here to stay. What we need are transparency, options, and broad-based educational campaigns to help people understand the deal and choose according to their preferences.

As either Stan Delaplane or Calvin Trillin once observed about “market price” listings on restaurant menus (or didn’t; I’m damned if I can find anything authoritative, or for that matter any mention whatsoever of this, but I know I read it), “When you learn for the first time that the lobster you just ate cost $50, the only reasonable response is to offer half.”

Unfortunately, in today’s barter economy we pay the price before we get the lobster…

Impact of “Adult” and Generic Top-Level Internet Domains on Colleges and Universities

(This is a copy of one of my EDUCAUSE blog posts)

Internet domains in the new “adult” .xxx domain recently became available. So did arbitrary generic top-level domains (gTLDs) beyond the existing .com, .net, .org, .edu, .gov, and so forth. Both initiatives affect higher education. The effects of these initiatives thus far have been modest, but they have been entirely negative. So far as we know, no college or university has benefited from either initiative. Rather, institutions have been exposed to risk and incurred costs without receiving any value in return. On behalf of its members, EDUCAUSE proposes that procedures for issuing and managing generic top-level domains be tightened to reduce their unintended negative effects on colleges and universities.

I discussed the initiatives themselves more fully in an August 2011 post. Now that the initiatives are fully launched, this post provides some additional information and recommendations. I comment first on the risks arising from the .xxx domain, then on the costs institutions have incurred to mitigate those risks, and finally on some issues arising around generic top-level domains. I conclude with a few recommendations for ICANN and gTLD registrars, and one for colleges and universities.

Risks from the .xxx domain

Colleges and universities typically have .edu domains, and use these for their official business. In addition, many institutions have claimed relevant .com, .org, .biz, .info, or .net domains. Stanford University, for example, uses “stanford.edu” for its Web presence, but it also has licensed “stanford.com” and “stanford.org”. Similarly, many institutions have claimed relevant domains in selected country-code top-level domains (ccTLDs) such as .us, .mx, .uk, or .cn, typically those where the institution has branch campuses. The goal in these cases has typically been simply to avoid confusion.

The .xxx domain does more than simply increase the number of top-level domains that might lead to confusion. Institutions worry that purveyors of adult material might explicitly seek to market their wares by associating those wares with college or university names, much as Playboy magazine once did with its “Women of the Ivy League” and similar features, and that this might reflect negatively on a college or university’s reputation. That is, the risks introduced by the .xxx domain go well beyond those already arising from other top-level domains.

The risk is not hypothetical. As Hawaii News Now reported in early February,

The University of Hawaii is demanding the operator of a pornographic web site stop using the school’s name or face legal action. The web site, called universityofhawaii.xxx, claims to feature what it describes as “hot nude Hawaiian college girls.”  It is full of graphic pictures of men and women having sex on beaches and at other tropical locations.

This is precisely the kind of embarrassment many institutions worried about.

Before the new .xxx domain went live, institutions had the opportunity to block use of their identity in .xxx domains through a so-called “Sunrise B” registration, but only if the institution’s identity (name, team name, nickname, etc.) had been trademarked and the identity to be blocked precisely matched the trademark. Once the new domain went live, Sunrise B registrations were no longer available, and the only recourse for an institution was to register the potentially offending domain itself once the “Landrush” and “General Availability” periods began or, if the domain had already been registered, to persuade the registrant to reassign or relinquish it.

Sunrise B, regular registration, and persuasion all entail costs, which brings me to the next section.

Costs to Mitigate .xxx Risks

As I wrote above, institutions could have filed Sunrise B registrations for .xxx domains, and a few institutions did so successfully. Typically a successful Sunrise B registration cost $199 for ten years — but the fee was the same even if the registration was unsuccessful. Some institutions tried to obtain Sunrise B registrations for non-trademarked names, but this did not succeed. One college I’ll call “Alpha” paid $1,000 in an attempt to register five names, only three of which corresponded to registered trademarks. The registrar approved only the trademarked three, but did not refund the $400 Alpha had paid for the other two. Another college, “Beta”, paid $199 for one successful Sunrise B registration, and then obtained General Availability registrations for four others at $99/year per domain.
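To put those fees side by side, here is a small arithmetic sketch in Python. It uses only the figures quoted above ($199 for a ten-year Sunrise B block, $99 per year for a General Availability registration) and assumes, purely for illustration, that the annual fee stays unchanged over ten years. The function name is hypothetical.

```python
SUNRISE_B_FEE = 199        # one-time fee covering ten years, per the text
GENERAL_ANNUAL_FEE = 99    # per domain per year, as paid by "Beta"


def ten_year_cost(sunrise_b_domains, general_domains,
                  annual_fee=GENERAL_ANNUAL_FEE, years=10):
    """Total outlay over `years`, assuming the annual fee does not change."""
    blocked = sunrise_b_domains * SUNRISE_B_FEE
    registered = general_domains * annual_fee * years
    return blocked + registered


if __name__ == "__main__":
    # "Beta": one Sunrise B block plus four General Availability domains.
    print(ten_year_cost(1, 4))
```

On these assumptions a single regular registration costs $990 over a decade against $199 for a block, which is one reason the trademark requirement for Sunrise B matters financially.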

It’s interesting to note that the cost of a .xxx registration varies from registrar to registrar, from a reported low of $79/year to a reported high of $103; also, some registrars offered only 10-year Sunrise B registrations, while others offered a perpetual option.

An informal survey of EDUCAUSE members found that successful Sunrise B (trademark blocking) registrations varied from none to a high of 22, whereas Landrush and General Availability registrations varied from none to 11. The typical response was 1-3 Sunrise B registrations and about 4 regular registrations. A quick Web search finds myriad other instances of colleges and universities registering .xxx domains.

The names being registered or blocked typically are variations on the institution’s name plus variations on team names. Most institutions report that the process for registering .xxx domains is straightforward and efficient. Although most institutions complained that they should not have to pay to defend their names, few complained about the actual amount of the fees.

Domain squatters (individuals or entities who register domains with the intention of reselling rather than using them) have long been a problem. In the Hawaii case, it’s reported that the entity that registered uhawaii.xxx demanded $100,000 to relinquish it. We have reports of other institutions being approached by .xxx squatters, but in each case the institution simply refused to deal with the squatter.

Generic Top-Level Domains

Although the idea of a generic top-level domain in a college’s or university’s name is appealing, the logistics of applying for and managing one have kept most institutions from pursuing this option. As one colleague put it,

There are two problems.  First, I have been unable to find a third party to do the registrar function for us (and we are unable to do it ourselves). It seems no one has yet figured out that there is a business opportunity in doing this. Also, the application itself needs to be a multi-hundred page submission to meet the requirements of the guidebook.  I’m actually hoping that will change over time for trademark holders.  If I hold the trademark for [institution name], I don’t see why I need to answer most of the questions in the guidebook.

Unless these two issues are addressed, it is unlikely most colleges or universities will pursue their own gTLD.

Recommendations

Other than the Hawaii case, the new .xxx and gTLD initiatives have mostly caused colleges and universities to divert administrative effort and funds to blocking or registering domains. Even so, we believe that ICANN could impose some simple requirements on new domains such as .xxx that would greatly reduce problems for higher education without materially complicating matters for registrars in those gTLDs.

  1. Automatically impose a Sunrise B block on any domain within a gTLD that corresponds to a registered trademark. That is, if “alphagroup” is a registered trademark, then, for example, the registrar for the .xxx domain should automatically refuse to issue alphagroup.xxx to any entity other than the trademark holder. The simplest way to achieve this would be to require that applicants for a domain affirm, under penalty of perjury, that they have searched the relevant trademark databases and that the domain name they seek does not conflict with any registered trademark. The registrar should then be required to randomly spot-audit some fraction of applications to ensure that affirmations are valid.
  2. Automatically impose a Sunrise B block on any domain name within a gTLD that corresponds to a domain within .edu, .gov, .mil, or any other similarly regulated gTLD. That is, if there is already a domain bigstate.edu, then the registrar for the .xxx domain, for example, should reject an application for bigstate.xxx, and similarly for other gTLDs.
  3. For gTLDs designated for potentially offensive material, such as .xxx, impose a waiting period between application and registration during which the application is public and other entities may object to the registration of a particular domain. If someone objects formally to the registration, invoke an arbitration or mediation process to resolve the dispute in a timely way.
  4. For gTLD applications, reject any gTLD suffix that conflicts with a registered trademark unless it is being sought by the trademark holder.
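The first two recommendations amount to a simple check a registrar could run before issuing a name. The Python sketch below is hypothetical throughout: `should_block`, the trademark table, and the set of protected labels are illustrative stand-ins, not any registrar’s or ICANN’s actual interface, and real trademark matching would be far less tidy than an exact string comparison.

```python
def should_block(domain, applicant, trademarks, protected_labels):
    """Return True if the application should be refused.

    trademarks: dict mapping a trademarked label to its holder (assumed data).
    protected_labels: labels already registered in .edu, .gov, .mil, etc.
    """
    # Split "alphagroup.xxx" into the label "alphagroup" and the TLD "xxx".
    label, _, _tld = domain.lower().rpartition(".")

    # Recommendation 1: only the trademark holder may register the mark.
    holder = trademarks.get(label)
    if holder is not None and holder != applicant:
        return True

    # Recommendation 2: names matching a regulated domain are blocked outright.
    if label in protected_labels:
        return True

    return False


if __name__ == "__main__":
    marks = {"alphagroup": "Alpha Group"}
    regulated = {"bigstate"}
    print(should_block("alphagroup.xxx", "Someone Else", marks, regulated))
    print(should_block("bigstate.xxx", "Someone Else", marks, regulated))
```

The sketch reads recommendation 2 literally and rejects every application for a protected label; a registrar might instead let the institution itself register the name defensively.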

If these requirements had been imposed on the .xxx domain, most of its negative effects on colleges and universities would have been mitigated. Some institutions would still have wanted to claim some .xxx domains as a defensive strategy, but at least they would not have been required to devote extra effort and money to defending names already trademarked.

This leads to one important recommendation for colleges and universities:

  1. Colleges and universities should, wherever possible, trademark the official name of their institution, along with variations on that name and nicknames in common use, and do the same for team names; named schools, departments, institutes, and so forth; and distinctive mottos or slogans.

EDUCAUSE will continue to monitor this situation, file comments, and make recommendations that might produce progress.