A while back I wrote here about hyphens, and some related usage issues. Since then I’ve taken that line of commentary over into my LinkedIn posts, and I’ll update this post periodically with the relevant links. Here’s what they are so far:
The so-called “star wars” campuses of the mid-1980s (Brown, Carnegie Mellon, Dartmouth, and MIT) invented (or at least believe they invented–IT folklore runs rampant) much of what we take for granted and appreciate today in daily electronic life: single sign-on, secure authentication, instant messaging, cloud storage, interactive online help, automatic updates, group policy, and on and on.
They also invented things we appreciate less. One of those is online harassment, which takes many forms.
Early in my time as MIT’s academic-computing head, harassment seemed to be getting worse. Partly this was because the then-new Athena computing environment interconnected students in unprecedentedly extensive ways, and partly because the Institute approached harassment purely as a disciplinary matter–that is, trying to identify and punish offenders.
Those cases rarely satisfied disciplinary requirements, so few complaints resulted in disciplinary proceedings. Fewer still led to disciplinary action, and of course all of that was confidential.
Working with Mary Rowe, who was then the MIT “Ombuds,” we developed a different approach. Rather than focus on evidence and punishment, we focused on two more general goals: making it as simple as possible for victims of harassment to make themselves known, and persuading offenders to change their behavior.
The former required a reporting and handling mechanism that would work discreetly and quickly. The latter required something other than threats.
Satisfying the first requirement was relatively simple. We created an email alias (firstname.lastname@example.org) to receive and handle harassment (and, in due course, other) complaints. Email sent to that address went to a small number of senior IT and Ombuds staff, collectively known as the Stopits. The duty Stopit–often me–responded promptly to each complaint, saying that we would do what we could to end the harassment.
We publicized Stopit widely online, in person, and with posters. In the poster and other materials, we gave three criteria for harassment:
- Did the incident cause stress that affected your ability, or the ability of others, to work or study?
- Was it unwelcome behavior?
- Would a reasonable person of your gender/race/religion subjected to this find it unacceptable?
Anyone who felt in danger, we noted, should immediately communicate with campus police or the dean on call, and we also gave contact information for other hotlines and resources. Otherwise, we asked that complainants share whatever specifics they could with us, and promised discretion under most circumstances.
To satisfy the second requirement, we had to persuade offenders to stop–a very different goal, and this is the key point, from bringing them to justice. MIT is a laissez-faire, almost libertarian place, where much that would be problematic elsewhere is tolerated, and where there is a high bar to formal action.
As I wrote in an MIT Faculty Newsletter article at the time, we knew that directly accusing offenders would trigger demands for proof and long, futile arguments about the subtle difference between criticism and negative comments–which are common and expected at the Institute–and harassment. Prosecution wouldn’t address the problem.
And so we came up with the so-called “UYA” note.
“Someone using your account…”, the note began, and then described the alleged behavior. “If you did not do this,” it continued, “…then quite possibly someone has managed to access your account without permission, and you should take immediate steps to change your password and not share it with anyone.” The note concluded: “If the incident described was indeed your doing, we ask that you avoid such incidents in the future, since they can have serious disciplinary or legal consequences.”
Almost all recipients of UYA notes wrote back to say that their accounts had indeed been compromised, and that they had changed their passwords to make sure their accounts would not be used this way again. In virtually all such cases, the harassment then ceased.
Did we believe that most harassment involved compromised accounts, and that the alleged offenders were innocent? Of course not. In many cases we could see, in logs, that the offender was logged in and doing academic work at the very workstation and time whence the offending messages originated. But the UYA note gave offenders a way to back off without confession or concession. Most offenders took advantage of that. Our goal was to stop the harassment, and mostly the UYA note achieved that.
There was occasional pushback, usually the offender arguing that the incident was described accurately but did not constitute harassment. Here again, though, the offending behavior almost always ceased. And in a few cases there was pushback of the “yeah, it’s me, and you can’t make me stop” variety. In those, the Stopits referred the incident into MIT’s disciplinary process. And usually, regardless of whether the offender was punished, the harassment stopped.
So Stopit and UYA notes worked.
Looking back, though, they neglected some important issues, and those remain problematic. In fact, the two teaching cases I mentioned in the Faculty Newsletter article and have used in myriad class discussions since–Judy and Michael–reflect two such issues: the difference between harassment and a hostile work environment, and jurisdictional ambiguity.
Judy Hamilton complains that images displayed on monitors in a public computing facility make it impossible for her to work comfortably. This really isn’t harassment, since the offending behavior isn’t directed at her. Rather, the offender’s behavior made it uncomfortable for Judy to work even though the offender was unaware of Judy or her reaction.
The UYA note worked: the offender claimed that he’d done nothing wrong, and that he had every right to display whatever images he chose so long as they weren’t illegal, but nevertheless he chose to stop.
But it was not correct to suggest that he was harassing Judy, as we did at the time. Most groups that have discussed this case over the years come to that conclusion, and instead say this should have been handled as a hostile-work-environment case. It’s an important distinction to keep in mind.
Michael Zareny, on the other hand, is interacting directly with Jack Oiler, and there’s really no work environment involved. Jack feels harassed, but it’s not clear Michael’s behavior satisfies the harassment criteria. Jack appears to be annoyed, rather than impaired, by Michael’s comments. In any case the interaction between the two would be deemed unfortunate, rather than unacceptable, by many of Jack’s peers.
Or, and this is a key point, the interaction would be seen that way by Jack’s peers at MIT. There’s an old Cambridge joke: At Harvard people are nice to you and don’t mean it, and MIT people aren’t nice to you and don’t mean it. The cultural norms are different. What is unacceptable to someone at Harvard might not be to someone at MIT. So arises the first jurisdictional ambiguity.
In the event, the Michael situation turned out to be even more complicated. When Kim tried to send a UYA note to Michael, it turned out that there was no Michael Zareny at MIT. Rather, Michael Zareny was a student elsewhere, and his sole MIT connection was interacting with Jack Oiler in a newsgroup.
There thus wasn’t much Kim could do, especially since Michael’s own college declined to take any action because the problematic behavior hadn’t involved its campus or IT.
The point to all this is straightforward, and it’s relevant beyond the issue of harassment. In today’s interconnected world, it’s rare for problematic online behavior to occur within the confines of a single institution. As a result, taking effective action generally requires various entities to act consistently and collaboratively to gather data from complainants and dissuade offenders.
Yet the relevant policies are rarely consistent from campus to campus, let alone between campuses and ISPs, corporations, or other outside entities. And although campuses are generally willing to collaborate, this often proves difficult for FERPA, privacy, and other reasons.
It’s clear, especially with all the recent attention to online bullying and intimidation, that harassment and similarly antisocial behavior remain a problem for online communities. It’s hard to see how this will improve unless campuses and other institutions work together. If they don’t do that, then external rules–which most of us would prefer to avoid–may well make it a legal requirement.
“It’s one of the real black marks on the history of higher education,” Leon Botstein, the long-time President of Bard College, recently told The New Yorker’s Alice Gregory, “that an entire industry that’s supposedly populated by the best minds in the country … is bamboozled by a third-rate news magazine.” He was objecting, of course, to the often criticized but widely influential rankings of colleges and universities by US News & World Report.
Two stories, and a cautionary note.
Seeing Wired magazine’s annual “wired campus” rankings in the same way Botstein viewed those from US News, some years ago several of us college and university CIOs conspired to disrupt Wired’s efforts. As I later wrote, the issue wasn’t that some campuses had different (and perhaps better or worse) IT than others. Rather, for the most part these differences bore little relevance to the quality of those campuses’ education or the value they provided to students.
We persuaded almost 100 key campuses to withhold IT data from Wired. After meeting with us to see whether compromise was possible (it wasn’t) and an abortive attempt to bypass campus officials and gather data directly from students, the magazine discontinued its ratings. Success.
But, as any good pessimist knows, every silver lining has a cloud. Wired had published not only summary ratings, but also, to its credit, the data (if not the calculations) upon which the ratings were based. Although the ratings were questionable, and some of the data seemed suspect, the latter nevertheless had some value. Rather than look at ratings, someone at Campus A could look and see how A’s reported specific activity compared to its peer Campus B’s.
Partly to replace the data Wired had gathered and made available, and so extend A’s ability to see what B was doing, EDUCAUSE started the Core Data Survey (now the Core Data Service, CDS). This gathered much of the same information Wired had, and more. (Disclosure: I served on the committee that helped EDUCAUSE design the initial CDS, and revised it a couple of years later, and have long been a supporter of the effort.)
Unlike Wired, EDUCAUSE does not make individual campus data publicly available. Rather, participating campuses can compare their own data to those of all or subsets of other campuses, using whatever data and comparison algorithm they think appropriate. I can report from personal experience that this is immensely useful, if only because it stimulates and focuses discussions among campuses that appear to have made different choices.
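As a concrete sketch of the kind of comparison this enables (the campus labels, Carnegie classifications, and staffing numbers below are all invented for illustration), suppose Campus A wants to see how its help-desk staffing ratio compares to a self-chosen peer subset:

```python
import statistics

# Invented sample: each campus reports a Carnegie classification and a
# users-per-help-desk-FTE figure, the sort of thing a CDS-style survey gathers.
campuses = {
    "A": {"carnegie": "R1", "users_per_helpdesk_fte": 900},
    "B": {"carnegie": "R1", "users_per_helpdesk_fte": 700},
    "C": {"carnegie": "R1", "users_per_helpdesk_fte": 1100},
    "D": {"carnegie": "BA", "users_per_helpdesk_fte": 400},
}

# The key design point: each campus picks its own peer subset and its own
# comparison statistic, rather than accepting one universal ranking.
peers = [v["users_per_helpdesk_fte"]
         for k, v in campuses.items()
         if v["carnegie"] == "R1" and k != "A"]

print(campuses["A"]["users_per_helpdesk_fte"],
      "vs peer median", statistics.median(peers))
```

The choice of subset and statistic is itself the useful exercise: it forces a campus to say who its peers are and why.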
But back to Botstein. EDUCAUSE doesn’t just make CDS data available to participating campuses. It also uses CDS data to develop and publish “Free IT Performance Metrics,” which it describes as “Staffing, financials, and services data [campuses] can use for modifications, enhancements, and strategic planning.” The heart of Botstein’s complaint about US News & World Report isn’t that the magazine is third rate–that’s simply Botstein being Botstein–but rather that US News believes the same rating algorithm can be validly used to compare campuses.
Which raises the obvious question: Might EDUCAUSE-developed “performance metrics” fall into that same trap? Are there valid performance metrics for IT that are uniformly applicable across higher education?
Many campuses have been bedeviled and burned by McKinseys, BCGs, Accentures, Bains, PWCs, and other management consultants. These firms often give CFOs, Provosts, and Presidents detailed “norms” and “standards” for things like number of users per help-desk staffer, the fraction of operating budgets devoted to IT, or laptop-computer life expectancy. These can then become targets for IT organizations, CIOs, or staff in budget negotiations or performance appraisal.
Some of those “norms” are valid. But many involve inappropriate extrapolation from corporate or other quite different environments, or implicitly equate all campus types. Language is important: “norms,” “metrics,” “benchmarks,” “averages,” “common,” “typical,” and “standards” don’t mean the same thing. So far EDUCAUSE has skirted the problem, but it needs to be careful to avoid asserting uniform validity when there’s no evidence for it.
A second story illustrates a different, more serious risk. A few years ago a major research university–I’ll call it Lake Desert University or LDU–was distressed about its US News ranking. To LDU’s leaders, faculty, and students the ranking seemed much too low: Lake Desert generally ranked higher elsewhere.
A member of the provost’s staff–Pat, let’s say–was directed to figure out what was wrong. Pat spent considerable time looking at US News data and talking to its analysts. An important component of the US News ranking algorithm, Pat learned, was class size. The key metric was the fraction of campus-based classes with enrollments smaller than 20.
Pat, a graduate of LDU, knew that there were lots of small classes at Lake Desert–the university’s undergraduate experience was organized around tutorials with 4-5 students–and so it seemed puzzling that LDU wasn’t being credited for that. Delving more deeply, Pat found the problem. Whoever had completed LDU’s US News questionnaire had read the instructions very literally, decided that “tutorials” weren’t “classes”, and so excluded them from the reporting counts. Result: few small classes, and a poor US News ranking.
US News analysts told Pat that tutorials should have been counted as classes. The following year, Lake Desert included them. Its fraction-of-small-classes metric went up substantially. Its ranking jumped way up. The Provost sent Pat a case of excellent French wine.
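The metric at the heart of the story is simple arithmetic, and a small sketch (with invented enrollment numbers) shows how reclassifying tutorials as classes moves it:

```python
def small_class_fraction(enrollments, threshold=20):
    """Fraction of reported class sections with enrollment below the threshold."""
    small = sum(1 for n in enrollments if n < threshold)
    return small / len(enrollments)

# Invented enrollments, mirroring the LDU classification question:
lectures = [150, 80, 45, 30, 25, 22]      # conventional classes, mostly large
tutorials = [4, 5, 5, 4, 5, 4, 5, 5]      # small by design

# Tutorials excluded: almost no "small classes" are reported.
print(small_class_fraction(lectures))
# Tutorials counted as classes: the metric jumps dramatically.
print(small_class_fraction(lectures + tutorials))
```

Nothing about the underlying teaching changes between the two calls; only the classification of what counts as a “class” does, which is exactly why such metrics reward careful questionnaire-reading as much as substantive improvement.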
In LDU’s case, understanding the algorithm and looking at the survey responses unearthed a misunderstanding. Correcting this involved no dishonesty (although some of LDU’s public claims about the “improvement” in its ranking neglected to say that the improvement had resulted from data reclassification rather than substantive progress).
But not all cases are as benign as LDU’s. As I wrote above, there were questions not only about Wired’s ranking algorithm, but about some of the data campuses provided. Lake Desert correcting its survey responses in consultation with analysts is one thing; a campus misrepresenting its IT services to get a higher ranking is another. But it can be hard to distinguish the two.
Auditing is one way to address this problem, but audits are expensive and difficult. Publishing individual responses is another–both Wired and US News have done this, and EDUCAUSE shares them with survey respondents–but that only corrects the problem if respondents spend time looking at other responses, and are willing to become whistleblowers when they find misrepresentation. Most campuses don’t have the time to look at other campuses’ responses, or the willingness to call out their peers.
If survey responses are used to create ratings, and those ratings become measures of performance, then those whose performance is being measured have incentive to tailor their survey responses accordingly. If the tailoring involves just care within the rules, that’s fine. But if it involves stretching or misrepresenting the truth, it’s not.
More generally, it’s important to closely connect the collection of data to their evaluative use. Who reports, should decide.
(…with apologies to Susan Sontag, of course.)
Visiting the trade show at the EDUCAUSE conference requires strategy. At one time it was simple: collect every pen being given away (having some conversations with vendors in the process), so that back home the kid could give them to his friends at school. Kid grew up, though, and there came “No more pens, Dad, please.”
After that I usually walked around with Ira Fuchs, who had an excellent eye for the interestingly novel product. But Ira hasn’t been attending, so I’ve taken to observing two things: how vendors staff their booths, and what they give away–the swag.
The interesting thing about staffing is what it tells us vendors assume about higher-education IT, and especially what they assume about our procurement decisions. I track two variables: whether booths are staffed by people who know something about the product and about higher education, and whether staff appear to have been chosen for reasons other than expertise.
This year the booth staff seemed reasonably attuned to product and customer, and, with the exception of some game barkers, two people dressed up as giant blue bears, and two women dressed like 1950s flight attendants, most of them pretty much looked like the attendees, except with logos on their shirts.
To be more precise, the place wasn’t full of what are sometimes called Demo Dollies, attractive young women with no product knowledge deployed on the assumption that they will attract men to their booths (and therefore on the assumption that men are making the key decisions). That there aren’t many of them is good, since a few years back things were quite different, reaching a nadir with the infamous catwomen. We don’t want industry thinking of higher education as a market easily influenced by Demo Dollies.
The interesting thing about swag–the stuff that vendors give away–is that it tells us something about the resources vendors are committing to higher education, the resources they think are available from higher education, or both. There are two dimensions to swag: how swanky it is, and how creative it is.
I spent some time on this year’s tradeshow floor looking for swag that rose above the commonplace, and here’s what struck me: there wasn’t much. There were lots of pens (which I’m still not allowed to bring home), lots of candy, and lots of small USB thumb drives, all of course bearing vendor logos. I count those as neither swanky nor creative.
The growing swag sector is stuff made out of foam or soft plastic. This includes baseballs, footballs, various kinds of phone-propper-uppers, can holders, and a few creatures and cartoon characters. Some of it related in some way to the vendor’s product, slogan, or brand, but most of it didn’t. The foam stuff was mildly creative, though less so each year; there was lots more of it this year than last.
There were keychain carabiners (which I always look for, since I keep leaving them in rental cars–and this year only two vendors had them), earphones, t-shirts (remarkably few of those compared to previous years, when they were ubiquitous), USB chargers, corkscrews, can openers, pens that light up, baseball caps, and kitchen utensils (my personal favorite, I think). Several vendors told me the one to get was a jump rope with blinking handles, but I couldn’t find it. Next year.
(I’ve uploaded photos of the distinctive swag to an album on my Facebook account.)
…here’s the thing. That most of the available swag was low-end and uncreative may disappoint those who take lots home for friends or family or colleagues or whomever. It also may mean that vendors selling to higher education aren’t as flush as they once were, or think we aren’t; both of those are probably somewhat true, and neither is especially good news.
Combined with the dearth of Demo Dollies, though, I see the situation somewhat more positively. It seems to me that even though they may be less flush, this year’s vendors are taking the higher-education market seriously, using knowledgeable staff rather than artifice to engage customers, who may also be less flush, and sell product wisely.
That, as Martha Stewart would say, is a good thing!
Kim reread the message from Jack Oiler:
My concern is about one Michael Zareny, who is using his University identity to post comments in Reddit and elsewhere and to send messages with extremely derogatory claims about gay men. Normally I would be most solidly against censorship, but if similar remarks about the immorality of Jews or Blacks were made, they would probably be illegal. I have tried at great length to reason with MZ, but his prejudices seem to be beyond reason. He was previously using an account elsewhere before he moved to the University. I am disappointed to see Zareny’s trash emanating from the University. I also think that if the hate laws covered gender orientation, he would be in violation of the law.
Could you please respond to my plea? As I said, I am very uncomfortable with censorship of any form, but MZ has been going on for more than three years now, and his views are quite beyond rational comment. I have suggested that we take the debate to philosophical journals instead of the Internet (he suggested the same thing, but shows no signs of doing so, despite my having published papers on issues underlying the topic), since there are some established standards there. He has made unsubstantiated remarks about my character and relations with my students, that I might consider taking legal action over.
This, Kim thought, was going to be a tough one. Kim went to the keyboard, fired up Reddit, and went looking for Oiler and Zareny.
© 2013 Gregory A Jackson
This case is to promote discussion, not to document good or poor handling of a situation. All names have been changed.
“All I really wanted to do last night,” Kim complained to a friend, “was find out how badly the Red Sox had lost in the afternoon game.”
“Then, the way these things unfold on the Internet, one thing led to another. I started looking at blog posts about the game, then at game pictures random people had posted on Facebook, and all of a sudden I was looking at a tagged, time-stamped picture of Bobby there eating a hot dog at the game—a picture taken at the very same time I’m pretty sure Bobby was supposedly clocked in here at work on campus.
“I don’t know whether I should mention this to Bobby’s supervisor—in fact, I don’t know whether I’m allowed to mention this at all, or whether the supervisor could act on it if I did. So I decided to come ask you what to do, and whether we have any college policies that might help.”
© 2013 Gregory A Jackson
This case is to promote discussion, not to document good or poor handling of a situation. All names have been changed.
“You can’t possibly be serious!” Jamie shouted, near the end of a long meeting, one that no one thought would turn out so controversial. “Are you saying that privacy is more important than security?” The issue was whether to put surveillance cameras in the campus’s parking-lot stairwells, in the hope that the cameras would discourage loitering and assaults—or at least make incidents easier to investigate and prosecute.
When the campus security office proposed installing the cameras, with the EVP’s support, it listed hundreds of campuses that had done so without incident. But a small group of faculty was raising concerns, and the provost was listening. Alex, the dissidents’ leader, had made several points, calmly at first, but with increasing vehemence. “What if,” Alex said, “the cameras aren’t just used for security?”
“What if they record union employees arriving or leaving at times that don’t correspond to when they clocked in and out? What if the cameras catch a faculty member routinely leaving with someone other than a spouse, and the aggrieved spouse’s divorce lawyer demands access to all of the camera tapes just in case they contain evidence of infidelity? What if the feds demand access for some reason I can’t even imagine?”
Jamie had tried to point out that none of this was likely, and that people sneaking around doing things they shouldn’t be doing perhaps don’t deserve privacy protections. Jamie even suggested, thinking it was a compromise, that perhaps the city police might install and monitor the cameras instead of the campus security office.
To Jamie’s surprise, that set off an even stronger reaction from Alex: “It sounds an awful lot like Big Brother to me. I hadn’t even thought about cops having access, and that makes me all the more opposed.”
© 2013 Gregory A Jackson
This case is to promote discussion, not to document good or poor handling of a situation. All names have been changed.
Is there a computer cluster somewhere where someone can be safe from pornography and harassment? I’m sick of this.
Kim, the University’s Director of Academic Computing, knew from a conversation with the University Ombudswoman what Judy Hamilton was complaining about: she had gone into a public computing cluster and sat down next to a male student whose screen was displaying a graphic image of a sexual act. Judy had asked the student to remove the image, since it was interfering with her ability to work comfortably, and he’d refused—loudly and contentiously. After a shouting match, Judy left to find someplace else to work. She complained to friends, and to the Ombudswoman, who sent her to Kim.
Kim knew that hard-core displays such as the one that had offended Judy were relatively rare, but that other offending images—nudes, for example, and even animal-experiment photos from a server maintained by an outspoken faculty member—were common. Many students would quietly remove offending images when someone else complained, but others would refuse, citing free speech. “I like this stuff and it helps me keep working,” a male student had written Kim in another instance. “Why,” the student had concluded, “is my work less important than hers?”
The University’s policies forbade harassment, but not pornography. The harassment policy probably applied to Judy’s case, Kim thought, but its remedies fell short of what Judy wanted: for Kim and the University to forbid the display of pornographic images, and perhaps to enforce the ban technologically. That would require Kim to define “pornographic,” which was not necessary under the University’s current policy. Then again, Kim needed a definition of “harassment,” which didn’t appear any easier.
Kim perceived two tasks: to respond to Judy’s message, and to decide whether the University needed better or different policies to deal with her situation.
© 2013 Gregory A Jackson
I’m at loose ends after graduating. The Dean for Student Affairs, whom I’ve gotten to know through a year of complicated political and educational advocacy, wants to know more about MIT’s nascent pass/fail experiment, under which first-year students receive written rather than graded evaluations of their work.
MIT being MIT, “know more” means data: the Dean wants quantitative analysis of patterns in the evaluations. I’m hired to read a semester’s worth, assign each a “Usefulness” score and a “Positiveness” score, and then summarize the results statistically.
Two surprises. First, Usefulness turns out to be much higher than anyone had expected–mostly because evaluations contain lots of “here’s what you can do to improve” advice, rather than lots of terse “you would have gotten a B+” comments, as had been predicted. Second, Positiveness distributes remarkably as grades had for the pre-pass/fail cohort, rather than skewing higher, as had been predicted. Even so, many faculty continue to believe both predictions (that is, they think written evaluations are both generally useless and inappropriately positive).
A byproduct of the assignment is my first exposure to MIT’s glass-house computer facility, an IBM 360 located in the then-new Building 39. In due course I learn that Jay Forrester, an MIT faculty member, had patented the use of 3-D arrays of magnetic cores for computer memory (the read-before-write use of cores, which enabled Forrester’s breakthrough, had been patented by An Wang, another faculty member, of the eponymous calculators and word processors). IBM bought Wang’s patent, but not Forrester’s, and after protracted legal action eventually settled with Forrester in 1964 for $13 million.
According to MIT mythology, under the Institute’s intellectual-property policy half of the settlement came to the Institute, and that money built Building 39. Only later do I wonder whether the Forrester/IBM/39 mythology is true. But not for long: never let truth stand in the way of a good story.
Belief in mythology is durable, and not just because mythology often involves memorable, simple stories. This matters because belief so heavily drives behavior. That such belief resists even data-driven contradiction–data analysis rarely yields memorable, simple stories–is one reason analytics so often prove curiously ineffective in modifying institutional behavior.
Two examples, both involving the messy question of copyright infringement by students and what, if anything, campuses should do about it.
I’m having lunch with a very smart, experienced, and impressive senior officer from an entertainment-industry association, whom I’ll call Stan. The only reason universities invest heavily in campus networks, Stan tells me, is to enable students to download and share ever more copyright-infringing movies, TV shows, and music. That’s why campuses remain major distributors of “pirated” entertainment, he says, and therefore why it’s appropriate to subject higher education generally to regulations and sanctions such as the “peer to peer” regulations from the 2008 Higher Education Opportunity Act.
That Stan believes this results partly from a rhetorical problem with high-performance networks, such as the research networks within and interconnecting colleges and universities. High-performance networks–even those used by broadcasters–usually are engineered to cope with peak loads. Since peaks are occasional, most of the time most network capacity goes unused. If one doesn’t understand this–as Stan doesn’t–then one assumes that the “unused” capacity is in fact being used, but for purposes not being disclosed.
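The peak-versus-average point is easy to illustrate; the hourly traffic figures below are invented, but the shape is typical of peak-engineered links:

```python
# One day of hourly traffic samples (Gbps) on a hypothetical research link.
# A single busy hour dominates; the link is sized for that hour plus headroom.
traffic = [2, 1, 1, 3, 5, 8, 20, 95, 40, 10, 6, 4]
capacity = 100  # Gbps, engineered for the peak

peak = max(traffic)
average = sum(traffic) / len(traffic)

# The same link looks nearly full at the peak and nearly empty on average.
print(f"peak utilization:    {peak / capacity:.0%}")
print(f"average utilization: {average / capacity:.0%}")
```

Someone who sees only the average figure, and doesn’t know the link was engineered for the peak, is primed to wonder what all that “idle” capacity is really for.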
And, as it happens, there’s mythology to fill in the gap: According to a 2005 MPAA study, Stan tells me, higher education accounts for almost half of all copyright infringement. So MPAA, and therefore Stan, knows what campuses aren’t telling us: they’re upgrading campus networks to enable infringement.
But Stan is wrong. There are two big problems with his belief.
First, shortly after MPAA asserted, both publicly and in letters to campus presidents, that 44% of all copyright infringement emanates from college campuses, which is where Stan’s “almost half” comes from, MPAA learned that its data contractor had made a huge arithmetic error. The correct estimate should have been more like 10-15%. But the corrected estimate was never publicized as extensively as the erroneous one: the errors that statisticians make live after them; the corrections are oft interred with their bones.
Second, if Stan’s belief is correct, then there should be little difference among campuses in the incidence of copyright infringement, at least among campuses with research-capable networking. Yet this isn’t the case. As I’ve found researching three years of data on the question, the distribution of detected infringement is highly skewed. Most campuses are responsible for little or no distribution of infringing material, presumably because they’re using Packetlogic, Palo Alto firewalls, or similar technologies to manage traffic. Conversely, a few campuses account for the lion’s share of detected infringement.
So there are ample data and analytics contradicting Stan’s belief, and none supporting it. But his belief persists, and colors how he engages the issues.
I’m having dinner with the CIO of an eminent research university; I’ll call her Samantha, and her campus Helium (the same name it has in the infringement-data post I cited above). We meet just as I’m completing my 2013 study, in which Helium has surpassed Hydrogen as the largest campus distributor of copyright-infringing movies, TV shows, and music.
In fact, Helium accounts for 7% of all detected infringement from the 5,000 degree-granting colleges and universities in the United States. I’m thinking that Samantha will want to know this, that she will try to figure out what Helium is doing–or not doing–to stick out like a sore thumb among peer campuses, and perhaps make some policy or practice changes to bring Helium into closer alignment with them.
But no: Samantha explains to me that the data are entirely inaccurate. Most of the infringement notices Helium receives are duplicates, she tells me, and in any case the only reason Helium receives so many is that the entertainment industry intentionally targets Helium in its detection and notification processes. Since the data are wrong, she says, there’s no need to change anything at Helium.
I offer to share detailed data with Helium’s network-security staff so that they can look more closely at the issue, but Samantha declines the offer. Nothing changes, and in 2014 Helium is again one of the top recipients of infringement notices (although Hydrogen regains the lead it had held in 2012).
The data Samantha declines to see tell an interesting story, though. The vast majority of Helium’s notices, it turns out, are associated with eight IP addresses. That is, each of those eight IP addresses is cited in hundreds of notices, which may account for Samantha’s comment about “duplicates”. Here’s what’s interesting: the eight addresses are consecutive, and they each account for about the same number of notices. That suggests technology at work, not individuals.
As in Stan’s case, it helps to know something about how campus networks work. Lots of traffic distributed evenly across a small number of IP addresses sounds an awful lot like load balancing, so perhaps those addresses are the front end to some large group of users. “Front end to some large group of users” sounds like an internal network using Network Address Translation (NAT) for its external connections.
NAT issues numerous internal IP addresses to users, and then translates those internal addresses into a much smaller set of external addresses, keeping track of which internal address holds each translated connection. Most campuses use NAT to conserve their limited allocation of external IP addresses, especially on their campus wireless networks. NAT logs, if kept properly, enable campuses to trace connections from inside to outside and vice versa, and so to resolve those apparent “duplicates”.
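As a rough sketch of how that tracing works (the log format, addresses, and function name here are hypothetical, not anything a particular campus actually uses), resolving a notice against NAT logs amounts to a lookup on external address, port, and time:

```python
from datetime import datetime

# Hypothetical NAT log: each entry maps an internal (private) address to
# the external address/port it was translated to, over a time window.
nat_log = [
    # (internal_ip, external_ip, external_port, start, end)
    ("10.8.3.101", "192.0.2.17", 40211,
     datetime(2014, 3, 1, 20, 15), datetime(2014, 3, 1, 21, 5)),
    ("10.8.7.244", "192.0.2.17", 40212,
     datetime(2014, 3, 1, 20, 40), datetime(2014, 3, 1, 22, 0)),
]

def resolve_notice(external_ip, external_port, timestamp):
    """Trace an infringement notice back to the internal address that
    held the translated connection at the time of detection."""
    for internal_ip, ext_ip, ext_port, start, end in nat_log:
        if (ext_ip == external_ip and ext_port == external_port
                and start <= timestamp <= end):
            return internal_ip
    return None  # no match: logs not retained long enough, or gone

# Two notices citing the same external IP are not duplicates if the
# ports or timestamps map to different internal users.
print(resolve_notice("192.0.2.17", 40211, datetime(2014, 3, 1, 20, 30)))
```

The point of the sketch is that notices citing the same external address can still be distinguished, provided the logs are retained long enough to cover the notice lag.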
So although it’s true that there are lots of duplicate IP addresses among the notices Helium receives, this probably stems from Helium’s use of NAT on its campus wireless. Helium’s data are not incorrect. If Helium were to manage NAT properly, it could figure out where the infringement is coming from, and address it.
Samantha’s belief that copyright holders target specific campuses, like Stan’s that campuses expand networks to encourage infringement, has a source–in this case, a presentation some years back from an industry association to a group of IT staff from a score of research universities. (I attended this session.) Back then, we learned, the association did target campuses, not out of animus, but simply as a data-collection mechanism. The association would choose a campus, look for infringing material being published from the campus’s network, send notices, and then move on to another campus.
Since then, however, the industry had changed its methodology, in large part because the BitTorrent protocol replaced earlier ones as the principal medium for download-based infringement. Because of how BitTorrent works, the industry’s methodology shifted from searching particular networks to searching BitTorrent indexes for particularly popular titles and then seeing which networks were making those titles available.
I spent lots of time recently with the industry’s contractors looking closely at that methodology. It appears to treat campus networks equivalently to each other and to commercial networks, and so it’s unlikely that Helium was being targeted as Samantha asserted.
If Samantha had taken the infringement data to her security staff, they probably would have discovered the same thing I did, and either used NAT data to identify offenders or used the findings to justify policy changes for the wireless network. The same goes for exploring the detection methodology. Instead, Samantha relied on her belief that the data were incorrect, targeted, or both.
Promoting Analytic Effectiveness
Because of Stan’s and Samantha’s belief in mythology, their organizations’ behavior remains largely uninformed by analytics and data.
A key tenet of decision analysis holds that information has no value (other than the intrinsic value of knowledge) unless some decision the individual or institution faces will turn out differently depending on that information. That is, unless decisions depend on the results of data analysis, it’s not worth collecting or analyzing the data.
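A toy illustration of that tenet, with invented payoffs: an analysis is worth something only if it could flip a decision.

```python
def best_decision(payoffs):
    """Pick the action with the highest payoff."""
    return max(payoffs, key=payoffs.get)

# Case 1: whatever the analysis reveals, the same action wins, so the
# analysis has no decision value (whatever its intrinsic interest).
if_report_says_high = {"act": 10, "wait": 3}
if_report_says_low  = {"act": 8,  "wait": 3}
assert best_decision(if_report_says_high) == best_decision(if_report_says_low)

# Case 2: the analysis can flip the decision, so it has value.
if_report_says_high = {"act": 10, "wait": 3}
if_report_says_low  = {"act": 1,  "wait": 3}
assert best_decision(if_report_says_high) != best_decision(if_report_says_low)
```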
Colleges, universities, and other academic institutions have difficulty accepting this, since the intrinsic value of information is central to their existence. But what’s valuable intrinsically isn’t necessarily valuable operationally.
Generic praise for “data-based decision making” or “analytics” won’t change this. Neither will post-hoc documentation that decisions are consistent with data. Rather, what we need are good, simple stories that will help mythology evolve: case studies of how colleges and universities have successfully and prospectively used data analysis to change their behavior for the better. Simply using data analysis doesn’t suffice, and neither does better behavior: we need stories that vividly connect the two.
Ironically, the best way to combat mythology is with–wait for it–mythology…
Under certain provisions of the Digital Millennium Copyright Act, copyright holders send a “notification of claimed infringement” (sometimes called a “DMCA” or “takedown” notice) to Internet service providers, such as college or university networks, when they find infringing material available from the provider’s network. I analyzed counts of infringement notices from the four principal senders to colleges and universities over three time periods (Nov 2011-Oct 2012, Feb/Mar 2013, and Feb/Mar 2014).
In all three periods, most campuses received no notices, even campuses with dormitories. Among campuses receiving notices, the distribution is highly skewed: a few campuses account for a disproportionately large fraction of the notices. Five campuses consistently top the distribution in each year, but beyond these there is substantial fluctuation from year to year.
The volume of notices sent to campuses correlates somewhat positively with their size, although some important and interesting exceptions keep the correlation small. The incidence of detected infringement varies strongly with how residential a campus is. It varies less predictably with proxy measures of student-body affluence.
I elaborate on these points below.
The estimated total number of notices for the twelve months ending October 2012 was 243,436. The actual number of notices in February/March 2013 was 39,753, and the corresponding number a year later was 20,278.
The general pattern was the same in each time period.
- According to the federal Integrated Postsecondary Education Data System (IPEDS), from which I obtained campus attributes, there are 4,904 degree-granting campuses in the United States. Of these, over 80% received no infringement notices in any of the three time periods.
- 90% of infringement notices went to campuses with dormitories.
- Of the 801 institutions that received at least one notice in one period, 607 received at least one notice in two periods, and 437 did so in all three. The distribution was highly skewed among the campuses that received at least one infringement notice. The top two recipients in each period were the same: they alone accounted for 12% of all notices in 2012, and 10% in 2013 and 2014.
- In 2012, 10 institutions accounted for a third of all notices, and 41 accounted for two thirds. In 2013, the distribution was only a little less skewed: 22 institutions accounted for a third of all notices, and 94 accounted for two thirds. In 2014, 22 institutions also accounted for a third of all notices, and 99 accounted for two thirds.
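Concentration figures like those above reflect a simple cumulative-share calculation; here is a minimal sketch using made-up counts rather than the actual data:

```python
def campuses_accounting_for(counts, share):
    """Return how many of the top campuses account for at least the
    given share of all notices (counts sorted largest-first)."""
    total = sum(counts)
    running = 0
    for i, c in enumerate(sorted(counts, reverse=True), start=1):
        running += c
        if running >= share * total:
            return i
    return len(counts)

# Illustrative, highly skewed distribution (not the real data):
counts = [300, 250, 80, 60, 40, 30, 20, 10, 5, 5]
print(campuses_accounting_for(counts, 1/3))  # a handful reach a third
print(campuses_accounting_for(counts, 2/3))  # few more reach two thirds
```

The more skewed the distribution, the fewer campuses it takes to reach any given share of the total.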
In 2014, just 590 of the 4,904 campuses received infringement notices. Here is a breakdown by institutional control and type:
Here are the same data, this time broken down by campus size and residential character (using dormitory beds per enrolled student to measure the latter; the categories are quintiles):
About a third of all notices went to very large campuses in the middle residential quintile. In keeping with the classic Pareto ratio, the largest 20% of campuses account for 80% of all notices (and enroll ¾ of all students). Although about half of the largest group is nonresidential (mostly community colleges, plus some state colleges), only a few of them received notices.
The top two of the 100 campuses receiving the most notices in Feb/Mar 2014 each received over 1,000 notices in those two months. The next highest campus received 615. As the graph below shows, the top 100 campuses accounted for two thirds of the notices, and the remaining campuses for the other third:
Below is a more detailed distribution for the top 30 recipient campuses, with comparisons to 2012 and 2013 data. To enable valid comparison, this chart shows the fraction of notices received by each campus in each year, rather than the total. The solid red bars are the campus’s 2014 share, and the lighter blue and green bars are the 2012 and 2013 shares. The hollow bar for each campus is the incidence of detected infringement, defined as the number of 2014 notices per thousand headcount students.
As in earlier analyses, there is an important distinction between campuses whose high volume of notices stems largely from their size, and those where it stems from a combination of size and incidence—that is, the ratio of notices received to enrollment.
In the graph, Carbon and Nitrogen are examples of the former: they are both very large public urban universities enrolling over 50,000 students, but with relatively low incidence of around 7 notices per thousand students. They stand in marked contrast to incidences of 20-60 notices per thousand students at Lithium, Boron, Neon, Magnesium, Aluminum, and Silicon, each of which enrolls 10,000-25,000 students—all private except Aluminum.
Changes over Time
The overall volume of infringement notices varies from time to time depending on how much effort copyright holders devote to searching for infringement (effort costs money), and to a lesser extent based on which titles they use to seed searches. The volume of notices sent to campuses varies accordingly. However, the distribution of notices across campuses should not be affected by the total volume. To analyze trends, therefore, it is important to use a metric independent of total volume.
As in the preceding section, I used the fraction of all campus notices each campus received for each period. The top two campuses were the same in all three years: Hydrogen was highest in 2012 and 2014, and Helium was highest in 2013.
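A minimal sketch of that normalization, with invented counts (not the real data): converting raw counts to shares of each year's total makes years with different volumes comparable.

```python
def shares(counts_by_campus):
    """Convert raw notice counts to each campus's fraction of the
    year's total, making years with different volumes comparable."""
    total = sum(counts_by_campus.values())
    return {campus: n / total for campus, n in counts_by_campus.items()}

# Illustrative counts only: total volume halves between years, but a
# campus's *share* of notices is what matters for trend comparison.
y2013 = {"Hydrogen": 4000, "Helium": 4400, "Others": 31600}
y2014 = {"Hydrogen": 2400, "Helium": 1400, "Others": 16200}
print(shares(y2013)["Hydrogen"], shares(y2014)["Hydrogen"])
```

In this made-up example Hydrogen's raw count falls between years, yet its share of all notices rises: exactly the distinction the volume-independent metric is meant to capture.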
Only five campuses received at least 1.5% of all notices in more than one year:
These campuses consistently stand at the top of the list, account for a substantial fraction of all infringement notices, and except for Beryllium have incidence over 20. As I argue below, it makes sense for copyright holders to engage them directly, to help them understand how different they are from their peers, and perhaps to persuade them to “effectively combat” infringement from their networks by adopting policies and practices from their low-incidence peers.
Aside from these five campuses, there is great year-to-year variation in how many notices campuses receive. Below, for example, is a similar graph for the approximately 50 campuses receiving 0.5%-1.5% of all notices in at least one of the three years. Such year-to-year variation makes engagement much more difficult to target efficiently and much less likely to have discernible effects.
All else being equal, if infringement behavior is the same across campuses and campuses take equally effective measures to prevent it from reaching the Internet, then the volume of detected infringement should vary with campus size. That this is only moderately the case implies that student behavior varies from campus to campus and/or that campuses’ “effectively combat” measures differ and have different effects.
Here are data for the 100 campuses receiving the most infringement notices in 2014:
It appears visually that the overall correlation between campus size and notice volume is modest (and indeed r=0.29) because such a large volume of notices went to Hydrogen and Helium, which are not the largest campuses.
However, the correlation is slightly lower if those two campuses are omitted. This is because Lithium has the next highest volume yet is of average size, and Manganese, the largest campus in the group with over 70,000 students, had a very low incidence of 2 notices per thousand students. (I’ve spoken at length with the CIO and network-security head at Manganese, and learned that its anti-infringement measures comprise a full array of policies and practices: blocking of peer-to-peer protocols at the campus border, with well-established exception procedures; active follow-through on infringement notices received; and direct outreach to students on the issue.)
If students live on campus, then typically their network connection is through the campus network, their detectable infringement is attributed to the campus, and that’s where the infringement notice goes. If students live off campus, then they do not use the campus network, and infringement notices go to their ISP. This is why most infringement notices go to campuses with dorms, even though the behavior of their students probably resembles that of their nonresidential peers.
For the same reason, we might expect that residentially intensive campuses (measured by the ratio of dormitory beds to total enrollment) would have a higher incidence of detectable infringement, all else equal, than less residential campuses. Here are data for the 100 campuses receiving the most infringement notices:
The relationship is positive, as expected, and relatively strong (r=0.58). It’s important, though, to remember that this relationship between campus attributes (residential intensity and the incidence of detected infringement) does not necessarily imply a relationship between student attributes such as living in dorms and distributing infringing material. Drawing inferences about individuals from data about groups is the “ecological fallacy.”
One hears arguments that infringement varies with affluence–that is, that students with less money are more likely to infringe. There’s no way to assess this directly with these data, since they do not identify individuals. However, IPEDS campus data include the fraction of students receiving Federal grant aid, which varies inversely with income: the higher this fraction, the less affluent, on average, the student body should be. So it’s interesting to see how infringement (measured by incidence rather than volume) varies with this metric:
The relationship is slightly negative (r=-0.12), in large part because of Polonium, a small private college with few financial-aid recipients that received 83 notices per thousand students in 2014. (Its incidence was similar in 2012, but much lower in 2013.) Even without Polonium, however, the relationship is small.
For the same reason, we might expect a greater incidence of detected infringement on less expensive campuses. The data:
Once again the relationship is the opposite of what we might expect (r=0.54), largely because most campuses have both low tuition and low incidence.
Following the 2012 and 2013 studies, I communicated directly with IT leaders at several campuses with especially high volumes of infringement notices. All of these interactions save one (Hydrogen) were informative, and several appear to have influenced campus policies and practices for the better.
- Helium. Almost all of Helium’s notices are associated with a small, consecutive group of IP addresses, presumably the external addresses for a NAT-mediated campus wireless network. I learned from discussions with Helium’s CIO that the university does not retain NAT logs long enough to identify wireless users when infringement notices arrive; as a result, few notices reach offenders, and so they have little direct or indirect impact. Helium apparently recognizes the problem, but replacing its wireless logging systems is not a high-priority project.
- Hydrogen. Despite diverse direct, indirect, and political efforts to engage IT leaders at Hydrogen, I was never able to open discussions with them. I do not understand why the university receives so many notices (unlike Helium’s, they are not concentrated), and was therefore unable to provide advice to the campus. It is also unclear whether the notices sent to Hydrogen are associated with its small-city main campus or with its more urban branch campus.
- Krypton. Krypton used to provide guests up to 14 days of totally unrestricted and anonymous use of its wired and wireless networks. I believe that this led to its high rate of detected infringement. More recently, Krypton implemented a separate guest wireless network, which is still anonymous but apparently is either more restricted or is routed to an external ISP. I believe that this change is why Krypton is no longer in the top 20 group in 2014. (Krypton still offers unrestricted 14-day access to its wired network.)
- Lithium. The network-security staff at Lithium told me that there are plans to implement better filtering and blocking on their network, but that implementation has been delayed.
- Nitrogen. Nitrogen enrolls over 50,000 students, more than almost any other campus. As I pointed out above, although Nitrogen’s infringement notice counts are substantial, they are actually relatively low when adjusted for enrollment.
- Gallium. I discussed Gallium’s high infringement volume with its CIO in early 2013. She appeared to be surprised that the counts were so high, and that they were not all associated with Gallium affiliate campuses, as the university had previously believed. Although the CIO was noncommittal about next steps, it appears that something changed for the better.
- Palladium. The Palladium CIO attended a symposium I hosted in March 2013, and while there committed to implementing better controls at the university. The CIO appears to have followed through on this commitment.
- No Alias. Although it doesn’t appear in the graph, No Alias is an interesting story. It ranked very high in the 2012 study. NA, it turns out, provides exit connections for the Tor network, which means that some traffic appearing to originate at NA in fact originates from anonymous users elsewhere. Most of NA’s 2012 notices were associated with those Tor connections, and I suggested to NA’s security officer that No Alias might impose some modest filters on them. It appears that this may have happened, and may be why NA dropped out of the top group.
I also interacted with several other campuses that ranked high in 2013. In many of these conversations I was able to point IT staff to specific problems or opportunities, such as better configuring firewalls. Most of these campuses moved out of the top group.
The 2014 DMCA notice data reinforce earlier implications (from both data and direct interactions) for campus/industry interactions. Copyright holders should interact directly with the few institutions that rank consistently high, and with large residential institutions that rank consistently low. In addition, copyright holders should seek opportunities to better understand how best to influence student behavior, both during and after college.
Conversely, campuses that receive disproportionately many notices, and so give higher education a bad reputation with regard to copyright infringement, should consult peers at the other end of the distribution, and identify reasonable ways to improve their policies and practices.