Posts Tagged ‘copyright infringement’

Mythology, Belief, Analytics, & Behavior

I’m at loose ends after graduating. The Dean for Student Affairs, whom I’ve gotten to know through a year of complicated political and educational advocacy, wants to know more about MIT’s nascent pass/fail experiment, under which first-year students receive written rather than graded evaluations of their work.

MIT being MIT, “know more” means data: the Dean wants quantitative analysis of patterns in the evaluations. I’m hired to read a semester’s worth, assign each a “Usefulness” score and a “Positiveness” score, and then summarize the results statistically.

Two surprises. First, Usefulness turns out to be much higher than anyone had expected–mostly because evaluations contain lots of “here’s what you can do to improve” advice rather than terse “you would have gotten a B+” comments, as had been predicted. Second, Positiveness is distributed much as grades had been for the pre-pass/fail cohort, rather than skewing higher, as had also been predicted. Even so, many faculty continue to believe both predictions (that is, they think written evaluations are both generally useless and inappropriately positive).

A byproduct of the assignment is my first exposure to MIT’s glass-house computer facility, an IBM 360 located in the then-new Building 39. In due course I learn that Jay Forrester, an MIT faculty member, had patented the use of 3-D arrays of magnetic cores for computer memory (the read-before-write use of cores, which enabled Forrester’s breakthrough, had been patented by An Wang, another faculty member, of the eponymous calculators and word processors). IBM bought Wang’s patent, but not Forrester’s, and after protracted legal action eventually settled with Forrester in 1964 for $13 million.

According to MIT mythology, under the Institute’s intellectual-property policy half of the settlement came to the Institute, and that money built Building 39. Only later do I wonder whether the Forrester/IBM/39 mythology is true. But not for long: never let truth stand in the way of a good story.

Belief in mythology is durable, and not just because mythology often involves memorable, simple stories. This matters because belief so heavily drives behavior. That belief resists even data-driven contradiction–data analysis rarely yields memorable, simple stories–is one reason analytics so often prove curiously ineffective in modifying institutional behavior.

Two examples, both involving the messy question of copyright infringement by students and what, if anything, campuses should do about it.

44%

I’m having lunch with a very smart, experienced, and impressive senior officer from an entertainment-industry association, whom I’ll call Stan. The only reason universities invest heavily in campus networks, Stan tells me, is to enable students to download and share ever more copyright-infringing movies, TV shows, and music. That’s why campuses remain major distributors of “pirated” entertainment, he says, and therefore why it’s appropriate to subject higher education generally to regulations and sanctions such as the “peer to peer” regulations from the 2008 Higher Education Opportunity Act.

That Stan believes this results partly from a rhetorical problem with high-performance networks, such as the research networks within and interconnecting colleges and universities. High-performance networks–even those used by broadcasters–usually are engineered to cope with peak loads. Since peaks are occasional, most of the time most network capacity goes unused. If one doesn’t understand this–as Stan doesn’t–then one assumes that the “unused” capacity is in fact being used, but for purposes not being disclosed.

And, as it happens, there’s mythology to fill in the gap: According to a 2005 MPAA study, Stan tells me, higher education accounts for almost half of all copyright infringement. So MPAA, and therefore Stan, knows what campuses aren’t telling us: they’re upgrading campus networks to enable infringement.

But Stan is wrong. There are two big problems with his belief.

First, shortly after MPAA asserted, both publicly and in letters to campus presidents, that 44% of all copyright infringement emanates from college campuses, which is where Stan’s “almost half” comes from, MPAA learned that its data contractor had made a huge arithmetic error. The correct estimate should have been more like 10-15%. But the corrected estimate was never publicized as extensively as the erroneous one: the errors that statisticians make live after them; the corrections are oft interred with their bones.

Second, if Stan’s belief is correct, then there should be little difference among campuses in the incidence of copyright infringement, at least among campuses with research-capable networking. Yet this isn’t the case. As I’ve found researching three years of data on the question, the distribution of detected infringement is highly skewed. Most campuses are responsible for little or no distribution of infringing material, presumably because they’re using Packetlogic, Palo Alto firewalls, or similar technologies to manage traffic. Conversely, a few campuses account for the lion’s share of detected infringement.

So there are ample data and analytics contradicting Stan’s belief, and none supporting it. But his belief persists, and colors how he engages the issues.

Targeting

I’m having dinner with the CIO from an eminent research university; I’ll call her Samantha, and her campus Helium (the same name it has in the infringement-data post I cited above). We’re having dinner just as I’m completing my 2013 study, in which Helium has surpassed Hydrogen as the largest campus distributor of copyright-infringing movies, TV shows, and music.

In fact, Helium accounts for 7% of all detected infringement from the 5,000 degree-granting colleges and universities in the United States. I’m thinking that Samantha will want to know this, that she will try to figure out what Helium is doing–or not doing–to stick out like a sore thumb among peer campuses, and perhaps make some policy or practice changes to bring Helium into closer alignment.

But no: Samantha explains to me that the data are entirely inaccurate. Most of the infringement notices Helium receives are duplicates, she tells me, and in any case the only reason Helium receives so many is that the entertainment industry intentionally targets Helium in its detection and notification processes. Since the data are wrong, she says, there’s no need to change anything at Helium.

I offer to share detailed data with Helium’s network-security staff so that they can look more closely at the issue, but Samantha declines the offer. Nothing changes, and in 2014 Helium is again one of the top recipients of infringement notices (although Hydrogen regains the lead it had held in 2012).

The data Samantha declines to see tell an interesting story, though. The vast majority of Helium’s notices, it turns out, are associated with eight IP addresses. That is, each of those eight IP addresses is cited in hundreds of notices, which may account for Samantha’s comment about “duplicates”. Here’s what’s interesting: the eight addresses are consecutive, and they each account for about the same number of notices. That suggests technology at work, not individuals.

As in Stan’s case, it helps to know something about how campus networks work. Lots of traffic distributed evenly across a small number of IP addresses sounds an awful lot like load balancing, so perhaps those addresses are the front end to some large group of users. “Front end to some large group of users” sounds like an internal network using Network Address Translation (NAT) for its external connections.

NAT issues numerous internal IP addresses to users, and then translates those internal addresses into a much smaller set of external addresses, keeping the mapping traceable in its logs. Most campuses use NAT to conserve their limited allocation of external IP addresses, especially for their campus wireless networks. NAT logs, if retained properly, enable campuses to trace connections from inside to outside and vice versa, and so to resolve those apparent “duplicates”.
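To make the tracing concrete, here is a minimal Python sketch of how a campus might match a notice against NAT logs; the log format, field names, and time tolerance are my own invention for illustration, not anything Helium actually runs.

```python
from datetime import datetime, timedelta

# Hypothetical NAT log entries: (external IP, external port, internal IP, mapping start, mapping end).
nat_log = [
    ("192.0.2.10", 40001, "10.20.3.77", datetime(2014, 2, 3, 14, 0), datetime(2014, 2, 3, 14, 30)),
    ("192.0.2.10", 40002, "10.20.9.15", datetime(2014, 2, 3, 14, 5), datetime(2014, 2, 3, 14, 20)),
]

def find_internal_hosts(notice_ip, notice_port, notice_time, log, slack=timedelta(minutes=2)):
    """Return internal addresses whose NAT mapping covers the notice's external IP, port, and time."""
    matches = []
    for ext_ip, ext_port, int_ip, start, end in log:
        if (ext_ip == notice_ip and ext_port == notice_port
                and start - slack <= notice_time <= end + slack):
            matches.append(int_ip)
    return matches

# A notice citing one of the heavily used external addresses resolves, via the log,
# to a specific wireless client rather than remaining an apparent "duplicate".
print(find_internal_hosts("192.0.2.10", 40001, datetime(2014, 2, 3, 14, 10), nat_log))
```

The same idea works regardless of the particular logging system, so long as the external address, port, and timestamp in each notice can be matched against retained mappings.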

So although it’s true that there are lots of duplicate IP addresses among the notices Helium receives, this probably stems from Helium’s use of NAT on its campus wireless. Helium’s data are not incorrect. If Helium were to manage NAT properly, it could figure out where the infringement is coming from, and address it.

Samantha’s belief that copyright holders target specific campuses, like Stan’s that campuses expand networks to encourage infringement, has a source–in this case, a presentation some years back from an industry association to a group of IT staff from a score of research universities. (I attended this session.) Back then, we learned, the association did target campuses, not out of animus, but simply as a data-collection mechanism. The association would choose a campus, look for infringing material being published from the campus’s network, send notices, and then move on to another campus.

Since then, however, the industry had changed its methodology, in large part because the BitTorrent protocol replaced earlier ones as the principal medium for download-based infringement. Because of how BitTorrent works, the industry’s methodology shifted from searching particular networks to searching BitTorrent indexes for particularly popular titles and then seeing which networks were making those titles available.

I spent lots of time recently with the industry’s contractors looking closely at that methodology. It appears to treat campus networks equivalently to each other and to commercial networks, and so it’s unlikely that Helium was being targeted as Samantha asserted.

If Samantha had taken the infringement data to her security staff, they probably would have discovered the same thing I did, and either used NAT data to identify offenders or used it to justify policy changes for the wireless network. The same goes for exploring the methodology. But instead Samantha relied on her belief that the data were incorrect and/or targeted.

Promoting Analytic Effectiveness

Because of Stan’s and Samantha’s belief in mythology, their organizations’ behavior remains largely uninformed by analytics and data.

A key tenet in decision analysis holds that information has no value (other than the intrinsic value of knowledge) unless the decisions an individual or an institution has before it will turn out differently depending on the information. That is, unless decisions depend on the results of data analysis, it’s not worth collecting or analyzing data.
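A toy calculation makes the tenet concrete; the states, actions, and payoffs below are invented purely for illustration. When the same action is best under every possible finding, the analysis adds nothing to the decision, and its value comes out to zero.

```python
# Two possible states of the world with prior probabilities, and two candidate actions.
# All numbers are illustrative.
priors = {"widespread": 0.5, "concentrated": 0.5}
payoff = {
    ("act",  "widespread"): 10, ("act",  "concentrated"): 4,
    ("wait", "widespread"):  2, ("wait", "concentrated"): 3,
}
actions = ["act", "wait"]

# Best expected payoff when an action must be chosen without the analysis.
best_without = max(sum(priors[s] * payoff[(a, s)] for s in priors) for a in actions)

# Expected payoff if the analysis reveals the true state before the choice is made.
best_with = sum(priors[s] * max(payoff[(a, s)] for a in actions) for s in priors)

print("value of the analysis:", best_with - best_without)  # 0.0 here: "act" wins in both states
```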

Colleges, universities, and other academic institutions have difficulty accepting this, since the intrinsic value of information is central to their existence. But what’s valuable intrinsically isn’t necessarily valuable operationally.

Generic praise for “data-based decision making” or “analytics” won’t change this. Neither will post-hoc documentation that decisions are consistent with data. Rather, what we need are good, simple stories that will help mythology evolve: case studies of how colleges and universities have successfully and prospectively used data analysis to change their behavior for the better. Simply using data analysis doesn’t suffice, and neither does better behavior: we need stories that vividly connect the two.

Ironically, the best way to combat mythology is with–wait for it–mythology…

Revisiting IT Policy #2: Campus DMCA Notices

Under certain provisions from the Digital Millennium Copyright Act, copyright holders send a “notification of claimed infringement” (sometimes called a “DMCA” or “takedown” notice) to Internet service providers, such as college or university networks, when they find infringing material available from the provider’s network. I analyzed counts of infringement notices from the four principal senders to colleges and universities over three time periods (Nov 2011-Oct 2012, Feb/Mar 2013, and Feb/Mar 2014).

In all three periods, most campuses received no notices, even campuses with dormitories. Among campuses receiving notices, the distribution is highly skewed: a few campuses account for a disproportionately large fraction of the notices. Five campuses consistently top the distribution in each year, but beyond these there is substantial fluctuation from year to year.

The volume of notices sent to campuses varies somewhat positively with their size, although some important and interesting exceptions keep the correlation small. The incidence of detected infringement varies strongly with how residential campuses are. It varies less predictably with proxy measures of student-body affluence.

I elaborate on these points below.

Patterns

The estimated total number of notices for the twelve months ending October 2012 was 243,436. The actual number of notices in February/March 2013 was 39,753, and the corresponding number a year later was 20,278.

The general pattern was the same in each time period.

  • According to the federal Integrated Postsecondary Education Data System (IPEDS), from which I obtained campus attributes, there are 4,904 degree-granting campuses in the United States. Of these, over 80% received no infringement notices in any of the three time periods.
  • 90% of infringement notices went to campuses with dormitories.
  • Of the 801 institutions that received at least one notice in at least one period, 607 received notices in at least two periods, and 437 did so in all three. The distribution was highly skewed among the campuses that received at least one infringement notice. The top two recipients in each period were the same: they alone accounted for 12% of all notices in 2012, and 10% in 2013 and 2014.
  • In 2012, 10 institutions accounted for a third of all notices, and 41 accounted for two thirds. In 2013, the distribution was only a little less skewed: 22 institutions accounted for a third of all notices, and 94 accounted for two thirds. In 2014, 22 institutions also accounted for a third of all notices, and 99 accounted for two thirds. (A sketch of this concentration calculation follows the list.)
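Here is a minimal Python sketch of that concentration calculation; the notice counts are made up for illustration rather than taken from the actual dataset.

```python
def campuses_holding(counts, share):
    """How many campuses, taken from the top, account for `share` of all notices?"""
    total = sum(counts)
    running = 0
    for n, c in enumerate(sorted(counts, reverse=True), start=1):
        running += c
        if running >= share * total:
            return n
    return len(counts)

# Illustrative notice counts per campus: a few heavy recipients plus a long tail.
counts = [1200, 1100, 615, 400, 300, 250, 200, 150, 100, 80] + [10] * 500

print(campuses_holding(counts, 1 / 3), "campuses account for a third of the notices")
print(campuses_holding(counts, 2 / 3), "campuses account for two thirds")
```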

Campus Type

In 2014, just 590 of the 4,904 campuses received infringement notices. Here is a breakdown by institutional control and type:

[Table: 2014 infringement notices by institutional control and type]

Here are the same data, this time broken down by campus size and residential character (using dormitory beds per enrolled student to measure the latter; the categories are quintiles):

[Table: 2014 infringement notices by campus size and residential quintile]

About a third of all notices went to very large campuses in the middle residential quintile. In keeping with the classic Pareto ratio, the largest 20% of campuses account for 80% of all notices (and enroll ¾ of all students). Although about half of the largest group is nonresidential (mostly community colleges, plus some state colleges), only a few of them received notices.

Campus Distributions

The top two among the 100 campuses that received the most notices in Feb/Mar 2014 received over 1,000 notices each in the two months. The next highest campus received 615. As the graph below shows, the top 100 campuses accounted for two thirds of the notices; the next 600 campuses accounted for the remaining third:

[Graph: distribution of 2014 notices across recipient campuses]

Below is a more detailed distribution for the top 30 recipient campuses, with comparisons to 2012 and 2013 data. To enable valid comparison, this chart shows the fraction of notices received by each campus in each year, rather than the total. The solid red bars are the campus’s 2014 share, and the lighter blue and green bars are the 2012 and 2013 shares. The hollow bar for each campus is the incidence of detected infringement, defined as the number of 2014 notices per thousand headcount students.

[Graph: notice shares (2012, 2013, 2014) and 2014 incidence for the top 30 recipient campuses]

As in earlier analyses, there is an important distinction between campuses whose high volume of notices stems largely from their size, and those where it stems from a combination of size and incidence—that is, the ratio of notices received to enrollment.

In the graph, Carbon and Nitrogen are examples of the former: they are both very large public urban universities enrolling over 50,000 students, but with relatively low incidence of around 7 notices per thousand students. They stand in marked contrast to incidences of 20-60 notices per thousand students at Lithium, Boron, Neon, Magnesium, Aluminum, and Silicon, each of which enrolls 10,000-25,000 students—all private except Aluminum.
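That distinction reduces to a single ratio. A minimal sketch, with enrollment and notice figures invented to roughly match the ranges described above:

```python
# Illustrative figures only: (campus alias, 2014 notices, headcount enrollment).
campuses = [
    ("Carbon",   380, 52000),
    ("Nitrogen", 400, 55000),
    ("Lithium",  600, 15000),
    ("Boron",    500, 12000),
]

for name, notices, enrollment in campuses:
    incidence = 1000 * notices / enrollment            # notices per thousand students
    driver = "incidence" if incidence >= 20 else "size"
    print(f"{name}: {incidence:.1f} notices per 1,000 students (volume driven mainly by {driver})")
```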

Changes over Time

The overall volume of infringement notices varies from time to time depending on how much effort copyright holders devote to searching for infringement (effort costs money), and to a lesser extent based on which titles they use to seed searches. The volume of notices sent to campuses varies accordingly. However, the distribution of notices across campuses should not be affected by the total volume. To analyze trends, therefore, it is important to use a metric independent of total volume.

As in the preceding section, I used the fraction of all campus notices each campus received for each period. The top two campuses were the same in all three years: Hydrogen was highest in 2012 and 2014, and Helium was highest in 2013.

Only five campuses received at least 1.5% of all notices in more than one year:

[Graph: notice shares for the five campuses receiving at least 1.5% of all notices in more than one year]

These campuses consistently stand at the top of the list, account for a substantial fraction of all infringement notices, and except for Beryllium have incidence over 20. As I argue below, it makes sense for copyright holders to engage them directly, to help them understand how different they are from their peers, and perhaps to persuade them to “effectively combat” infringement from their networks by adopting policies and practices from their low-incidence peers.

Aside from these five campuses, there is great year-to-year variation in how many notices campuses receive. Below, for example, is a similar graph for the approximately 50 campuses receiving 0.5%-1.5% of all notices in at least one of the three years. Such year-to-year variation makes engagement much more difficult to target efficiently and much less likely to have discernible effects.

[Graph: year-to-year notice shares for the roughly 50 campuses receiving 0.5%-1.5% of all notices in at least one year]

Relationships

Size

All else equal, if infringement is the same across campuses and campuses take equally effective measures to prevent it from reaching the Internet, then the volume of detected infringement should generally vary with campus size. That this is only moderately the case implies that student behavior varies from campus to campus and/or that campuses’ “effectively combat” measures are different and have different effects.

Here are data for the 100 campuses receiving the most infringement notices in 2014:

[Graph: 2014 notice volume versus enrollment for the 100 campuses receiving the most notices]

It appears visually that the overall correlation between campus size and notice volume is modest (and indeed r=0.29) because such a large volume of notices went to Hydrogen and Helium, which are not the largest campuses.

However, the correlation is slightly lower if those two campuses are omitted. This is because Lithium has the next highest volume, yet is of average size, and Manganese, the largest campus in the group, with over 70,000 students, had very low incidence of 2 notices per thousand students. (I’ve spoken at length with the CIO and network-security head at Manganese, and learned that its anti-infringement measures comprise a full array of policies and practices: blocking of peer-to-peer protocols at the campus border, with well-established exception procedures; active followthrough on infringement notices received; and direct outreach to students on the issue.)
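The size/volume check itself is a one-line correlation. Here is a sketch using NumPy, with placeholder figures rather than the real data; in the study the correlation came out around 0.29 overall and slightly lower once the top two campuses were dropped.

```python
import numpy as np

# Placeholder (enrollment, 2014 notice count) pairs for a handful of campuses.
enrollment = np.array([30000, 25000, 70000, 15000, 40000, 52000, 12000, 20000])
notices    = np.array([ 1200,  1100,   140,   615,   200,   380,   250,   180])

def r(x, y):
    """Pearson correlation coefficient."""
    return np.corrcoef(x, y)[0, 1]

print("all campuses:", round(r(enrollment, notices), 2))

# Repeat the calculation with the two highest-volume campuses removed.
keep = notices < 1000
print("without the top two:", round(r(enrollment[keep], notices[keep]), 2))
```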

Residence

If students live on campus, then typically their network connection is through the campus network, their detectable infringement is attributed to the campus, and that’s where the infringement notice goes. If students live off campus, then they do not use the campus network, and infringement notices go to their ISP. This is why most infringement notices go to campuses with dorms, even though the behavior of their students probably resembles that of their nonresidential peers.

For the same reason, we might expect that residentially intensive campuses (measured by the ratio of dormitory beds to total enrollment) would have a higher incidence of detectable infringement, all else equal, than less residential campuses. Here are data for the 100 campuses receiving the most infringement notices:

[Graph: 2014 incidence versus residential intensity for the 100 campuses receiving the most notices]

The relationship is positive, as expected, and relatively strong (r=.58). It’s important, though, to remember that this relationship between campus attributes (residential intensity and the incidence of detected infringement) does not necessarily imply a relationship between student attributes such as living in dorms and distributing infringing material. Drawing inferences about individuals from data about groups is the “ecological fallacy.”

Affluence

One hears arguments that infringement varies with affluence, that is, that students with less money are more likely to infringe. There’s no way to assess that directly with these data, since they do not identify individuals. However, IPEDS campus data include the fraction of students receiving Federal grant aid, which varies inversely with income. The higher this fraction, the less affluent, on average, the student body should be. So it’s interesting to see how infringement (measured by incidence rather than volume) varies with this metric:

[Graph: 2014 incidence versus fraction of students receiving Federal grant aid]

The relationship is slightly negative (r=-.12), in large part because of Polonium, a small private college with few financial-aid recipients that received 83 notices per 1000 students in 2014. (Its incidence was similar in 2012, but much lower in 2013.) Even without Polonium, however, the relationship is small.

For the same reason, we might expect a greater incidence of detected infringement on less expensive campuses. The data:

[Graph: 2014 incidence versus tuition]

Once again the relationship is the opposite (r=.54), largely because most campuses have both low tuition and low incidence.

Campus Interactions

Following the 2012 and 2013 studies, I communicated directly with IT leaders at several campuses with especially high volumes of infringement notices. All of these interactions save one (Hydrogen) were informative, and several appear to have influenced campus policies and practices for the better.

  • Helium. Almost all of Helium’s notices are associated with a small, consecutive group of IP addresses, presumably the external addresses for a NAT-mediated campus wireless network. I learned from discussions with Helium’s CIO that the university does not retain NAT logs long enough to identify wireless users when infringement notices arrive; as a result, few infringement notices reach offenders, and so they have little direct or indirect impact. Helium recognizes the problem, but replacing its wireless logging systems is not a high-priority project.
  • Hydrogen. Despite diverse direct, indirect, and political efforts to engage IT leaders at Hydrogen, I was never able to open discussions with them. I do not understand why the university receives so many notices (unlike Helium’s, they are not concentrated), and was therefore unable to provide advice to the campus. It is also unclear whether the notices sent to Hydrogen are associated with its small-city main campus or with its more urban branch campus.
  • Krypton. Krypton used to provide guests up to 14 days of totally unrestricted and anonymous use of its wired and wireless networks. I believe that this led to its high rate of detected infringement. More recently, Krypton implemented a separate guest wireless network, which is still anonymous but apparently is either more restricted or is routed to an external ISP. I believe that this change is why Krypton is no longer in the top 20 group in 2014. (Krypton still offers unrestricted 14-day access to its wired network.)
  • Lithium. The network-security staff at Lithium told me that there are plans to implement better filtering and blocking on their network, but that implementation has been delayed.
  • Nitrogen. Nitrogen enrolls over 50,000 students, more than almost any other campus. As I pointed out above, although Nitrogen’s infringement notice counts are substantial, they are actually relatively low when adjusted for enrollment.
  • Gallium. I discussed Gallium’s high infringement volume with its CIO in early 2013. She appeared to be surprised that the counts were so high, and that they were not all associated with Gallium affiliate campuses, as the university had previously believed. Although the CIO was noncommittal about next steps, it appears that something changed for the better.
  • Palladium. The Palladium CIO attended a Symposium I hosted in March 2013, and while there he committed to implementing better controls at the University. The CIO appears to have followed through on this commitment.
  • No Alias. Although it doesn’t appear in the graph, No Alias is an interesting story. It ranked very high in the 2012 study. NA, it turns out, provides exit connections for the Tor network, which means that some traffic that appears to originate at NA in fact originates from anonymous users elsewhere. Most of NA’s 2012 notices were associated with the Tor connections, and I suggested to NA’s security officer that perhaps No Alias might impose some modest filters on those. It appears that this may have happened, and may be why NA dropped out of the top group.

I also interacted with several other campuses that ranked high in 2013. In many of these conversations I was able to point IT staff to specific problems or opportunities, such as better configuring firewalls. Most of these campuses moved out of the top group.

And So…

The 2014 DMCA notice data reinforce earlier implications (from both data and direct interactions) for campus/industry interactions. Copyright holders should interact directly with the few institutions that rank consistently high, and with large residential institutions that rank consistently low. In addition, copyright holders should seek opportunities to better understand how best to influence student behavior, both during and after college.

Conversely, campuses that receive disproportionately many notices, and so give higher education a bad reputation with regard to copyright infringement, should consult peers at the other end of the distribution, and identify reasonable ways to improve their policies and practices.

9|4|14 gj-c

 

Notes From (or is it To?) the Dark Side

“Why are you at NBC?” people ask. “What are you doing over there?” too, and “Is it different on the dark side?” A year into the gig seems a good time to think about those questions. Especially that “dark side” metaphor. For example, which side is “dark”?

This is a longer-than-usual post. I’ll take up the questions in order: first Why, then What, then Different; use the links to skip ahead if you prefer.

Why are you at NBC?

This is the first time I’ve worked at a for-profit company since, let’s see, the summer of 1967: an MIT alumnus arranged an undergraduate summer job at Honeywell’s Mexico City facility. Part of that summer I learned a great deal about the configuration and construction of custom control panels, especially for big production lines. I think of this every time I see photos of big control panels, such as those at older nuclear plants—I recognize the switch types, those square toggle buttons that light up. (Another part of the summer, after the guy who hired me left and no one could figure out what I should do, I made a 43½-foot paper-clip chain.)

One nice Honeywell perk was an employee discount on a Pentax 35mm SLR with 40mm and 135mm lenses, which I still have in a box somewhere, and which still works when I replace the camera’s light-meter battery. (The Pentax brand belonged to Honeywell back then, not Ricoh.) Excellent camera, served me well for years, through two darkrooms and a lot of Tri-X film. I haven’t used it since I began taking digital photos, though.

I digress. Except, it strikes me, not really. One interesting thing about digital photos, especially if you store them online and make most of them publicly visible (like this one, taken on the rim of spectacular Bryce Canyon, from my Backdrops collection), is that sometimes the people who find your pictures download them and use them for their own purposes. My photos carry a Creative Commons license specifying that although they are my intellectual property, they can be used for nonprofit purposes so long as they are attributed to me (an option not available, apparently, if I post them on Facebook instead).

So long as those who use my photos comply with the CC license requirement, I don’t require that they tell me, although now and then they do. But if people want to use one of my photos commercially, they’re supposed to ask my permission, and I can ask for a use fee. No one has done that for me—I’m keeping the day job—but it’s happened for our son.

I hadn’t thought much about copyright, permissions, and licensing for personal photos (as opposed to archival, commercial, or institutional ones) back when I first began dealing with “takedown notices” sent to the University of Chicago under the Digital Millennium Copyright Act (DMCA). There didn’t seem to be much of a parallel between commercialized intellectual property, like the music tracks that accounted for most early DMCA notices, and my photos, which I was putting online mostly because it was fun to share them.

Neither did I think about either photos or music while serving on a faculty committee rewriting the University’s Statute 18, the provision governing patents in the University’s founding documents.

The issues for the committee were fundamentally two, both driven somewhat by the evolution of “textbooks”.

First, where is the line between faculty inventions, which belong to the University (or did at the time), and creations, which belong to creators—between patentable inventions and copyrightable creations, in other words? This was an issue because textbooks had always been treated as creations, but many textbooks had come to include software (back then, CDs tucked into the back cover), and software had always been treated as an invention.

Second, who owns intellectual property that grows out of the instructional process? Traditionally, the rights and revenues associated with textbooks, even textbooks based on University classes, belonged entirely to faculty members. But some faculty members were extrapolating this tradition to cover other class-based material, such as videos of lectures. They were personally selling those materials and the associated rights to outside entities, some of which were in effect competitors (in some cases, they were other universities!).

As you can see by reading the current Statute 18, the faculty committee really didn’t resolve any of this. Gradually, though, it came to be understood that textbooks, even textbooks including software, were still faculty intellectual property, whereas instructional material other than that explicitly included in traditional textbooks was the University’s to exploit, sell, or license.

With the latter well established, the University joined Fathom, one of the early efforts to commercialize online instructional material, and put together some excellent online materials. Unfortunately, Fathom, like its first-generation peers, failed to generate revenues exceeding its costs. Once it blew through its venture capital, which had mostly come from Columbia University, Fathom folded. (Poetic justice: so did one of the profit-making institutions whose use of University teaching materials prompted the Statute 18 review.)

Gradually this all got me interested in the thicket of issues surrounding campus online distribution and use of copyrighted materials and other intellectual property, and especially the messy question of how campuses should think about copyright infringement occurring within and distributed from their networks. The DMCA had established the dual principles that (a) network operators, including campuses, could be held liable for infringement by their network users, but (b) they could escape this liability (find “safe harbor”) by responding appropriately to complaints from copyright holders. Several of us research-university CIOs worked together to develop efficient mechanisms for handling and responding to DMCA notices, and to help the industry understand those mechanisms and the limits on what it might expect campuses to do.

As one byproduct of that, I found myself testifying before a Congressional committee. As another, I found myself negotiating with the entertainment industry, under US Education Department auspices, to develop regulations implementing the so-called “peer to peer” provisions of the Higher Education Opportunity Act of 2008.

That was one of several threads that led to my joining EDUCAUSE in 2009. One of several initiatives in the Policy group was to build better, more open communications between higher education and the entertainment industry with regard to copyright infringement, DMCA, and the HEOA requirements.

I didn’t think at the time about how this might interact with EDUCAUSE’s then-parallel efforts to illuminate policy issues around online and nontraditional education, but there are important connections. Through massive open online courses (MOOCs) and other mechanisms, colleges and universities are using the Internet to reach distant students, first to build awareness (in which case it’s okay for what they provide to be freely available) but eventually to find new revenues, that is, to monetize their intellectual property (in which case it isn’t).

If online campus content is to be sold rather than given away, then campuses face the same issues as the entertainment industry: They must protect their content from those who would use it without permission, and take appropriate action to deter or address infringement.

Campuses are generally happy to make their research freely available (except perhaps for inventions), as UChicago’s Statute 18 makes clear, provided that researchers are properly credited. (I also served on UChicago’s faculty Intellectual Property Committee, which among other things adjudicated who-gets-credit conflicts among faculty and other researchers.) But instruction is another matter altogether. If campuses don’t take this seriously, I’m afraid, then as goes music, so goes online higher education.

Much as campus tumult and changes in the late Sixties led me to abandon engineering for policy analysis, and quantitative policy analysis led me into large-scale data analysis, and large-scale data analysis led me into IT, and IT led me back into policy analysis, intellectual-property issues led me to NBCUniversal.

I’d liked the people I met during the HEOA negotiations, and the company seemed seriously committed to rethinking its relationships with higher education. I thought it would be interesting, at this stage in my career, to do something very different in a different kind of place. Plus, less travel (see screwup #3 in my 2007 EDUCAUSE award address).

So here I am, with an office amidst lobbyists and others who focus on legislation and regulation, with a Peacock ID card that gets me into the Universal lot, WRC-TV, and 30 Rock (but not SNL), and with a 401k instead of a 403b.

What are you doing over there?

NBCUniversal’s goals for higher education are relatively simple. First, it would like students to use legitimate sources to get online content more, and illegitimate “pirate” sources less. Second, it would like campuses to reduce the volume of infringing material made available from their networks to illegal downloaders worldwide.

My roles are also two. First, there’s eagerness among my colleagues (and their counterparts in other studios) to better understand higher education, and how campuses might think about issues and initiatives. Second, the company clearly wants to change its approach to higher education, but doesn’t know what approaches might make sense. Apparently I can help with both.

To lay a foundation for specific projects—five so far, which I’ll describe briefly below—I looked at data from DMCA takedown notices.

Curiously, it turned out, no one had done much to analyze detected infringement from campus networks (as measured by DMCA notices sent to them), or to delve into the ethical puzzle: Why do students behave one way with regard to misappropriating music, movies, and TV shows, and very different ways with regard to arguably similar options such as shoplifting or plagiarism? I’ve written about some of the underlying policy issues in Story of S, but here I decided to focus first on detected infringement.

It turns out that virtually all takedown notices for music are sent by the Recording Industry Association of America, RIAA (the Zappa Trust and various other entities send some, but they’re a drop in the bucket).

Most takedown notices for movies and some for TV are sent by the Motion Picture Association of America, MPAA, on behalf of major studios (again, with some smaller entities such as Lucasfilm wading in separately). NBCUniversal and Fox send out notices involving their movies and TV shows.

I’ve now analyzed data from the major senders for both a twelve-month period (Nov 2011-Oct 2012) and a more recent two-month period (Feb-Mar 2013). For the more recent period, I obtained very detailed data on each of 40,000 or so notices sent to campuses. Here are some observations from the data:

  • Almost all the notices went to 4-year campuses that have at least 100 dormitory beds (according to IPEDS). To a modest extent, the bigger the campus the more notices, but the correlation isn’t especially large.
  • Over half of all campuses—even of campuses with dorms—didn’t get any notices. To some extent this is because there are lots and lots of very small campuses, and they fly under the infringement-detection radar. But I’ve learned from talking to a fair number of campuses that, much to my surprise, many heavily filter or even block peer-to-peer traffic at their commodity Internet border firewall—usually because the commodity bandwidth p2p uses is expensive, especially for movies, rather than to deal with infringement per se. Outsourced dorm networks also have an effect, but I don’t think they’re sufficiently widespread yet to explain the data.
  • Several campuses have out-of-date or incorrect “DMCA agent” addresses registered at the Library of Congress. Compounding that, it turns out some notice senders use “abuse” or other standard DNS addresses rather than the registered agent addresses.
  • Among campuses that received notices, a few campuses stand out for receiving the lion’s share, even adjusting for their enrollment. For example, the top 100 or so recipient campuses got about three quarters of the total, and a handful of campuses stand out sharply even within that group: the top three campuses (the leftmost blue bars in the graph below) accounted for well over 10% of the notices. (I found the same skewness in the 2012 study.) With a few interesting exceptions (interesting because I know or suspect what changed), the high-notice groups have been the same for the two periods.

The detection process, in general, is that copyright holders choose a list of music, movie, or TV titles they believe likely to be infringed. Their contractors then use BitTorrent tracker sites and other user tools to find illicit sources for those titles.

For the most part the studios and associations simply look for titles that are currently popular in theaters or from legitimate sources. It’s hard to see that process introducing a bias that would affect some campuses so much differently than others. I’ve also spent considerable time looking at how a couple of contractors verify that titles being offered illicitly (that is, listed for download on a BitTorrent tracker site such as The Pirate Bay) are actually the titles being supplied (rather than, say, malware, advertising, or porn), and at how they figure out where to send the resulting takedown notices. That process too seems pretty straightforward and unbiased.

Sender choices clearly can influence how notice counts vary from time to time: for example, adding a newly popular title to the search list can lead to a jump in detections and hence notices. But it’s hard to see how the choice of titles would influence how notice counts vary from institution to institution.

This all leads me to believe that takedown notices tell us something incomplete but useful about campus policies and practices, especially at the extremes. The analysis led directly to two projects focused on specific groups of campuses, and indirectly to three others.

Role Model Campuses

Based on the results of the data analysis, I communicated individually with CIOs at 22 campuses that received some but relatively few notices: specifically, campuses that (a) received at least one notice (and so are on the radar), (b) received fewer than 300 notices and fewer than 20 per thousand headcount students, (c) have at least 7,500 headcount students, and (d) have at least 10,000 dorm beds (per IPEDS) or sufficient dorm beds to house half their headcount. (These are Group 4, the purple bars in the graph below. The solid bars represent total notices sent, and the hollow bars represent incidence, or notices per thousand headcount students.)
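Expressed as a filter over campus records, the screening looks roughly like this; the field names are hypothetical, and the thresholds are the ones listed above.

```python
def is_role_model(campus):
    """Apply the Group 4 screening criteria to one campus record (a dict of IPEDS-style fields)."""
    notices  = campus["notices"]
    students = campus["headcount"]
    beds     = campus["dorm_beds"]
    incidence = 1000 * notices / students              # notices per thousand headcount students
    return (
        notices >= 1                                   # (a) on the radar at all
        and notices < 300 and incidence < 20           # (b) but relatively few notices
        and students >= 7500                           # (c) not a tiny campus
        and (beds >= 10000 or beds >= students / 2)    # (d) substantially residential
    )

# Hypothetical record: 120 notices, about 6.7 per thousand, beds house more than half of headcount.
print(is_role_model({"notices": 120, "headcount": 18000, "dorm_beds": 9500}))  # True
```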

[Graph: total notices (solid bars) and incidence per thousand students (hollow bars) by campus group]

I’ve asked each of those campuses whether they’d be willing to document their practices in an open “role models” database developed jointly by the campuses and hosted by a third party such as a higher-education association (as EDUCAUSE did after the HEOA regulations took effect). The idea is to make a collection of diverse effective practices available to other campuses that might want to enhance their practices.

High Volume Campuses

Separately, I communicated privately with CIOs at 13 campuses that received exceptionally many notices, even adjusting for their enrollment (Group 1, the blue bars in the graph). I’ve looked in some detail at the data for those campuses, some large and some small, and in some cases that’s led to suggestions.

For example, in a few cases I discovered that virtually all of a high-volume campus’s notices were split evenly among a small number of consecutive IP addresses. In those cases, I’ve suggested that those IP addresses might be the front-end to something like a campus wireless network. Filtering or blocking p2p (or just BitTorrent) traffic on those few IP addresses (or the associated network devices) might well shrink the campus’s role as a distributor without affecting legitimate p2p or BitTorrent users (who tend to be managing servers with static addresses).

Symposia

Back when I was at EDUCAUSE, we worked with NBCUniversal to host a DC meeting between senior campus staff from a score of campuses nationwide and some industry staff closely involved with the detection and notification for online infringement. The meeting was energetic and frank, and participants from both sides went away with a better sense of the other’s bona fides and seriousness. This was the first time campus staff had gotten a close look at the takedown-notice process since a Common Solutions Group meeting in Ann Arbor some years earlier; back then the industry’s practices were much less refined.

Based on the NBCUniversal/EDUCAUSE experience, we’re organizing a series of regional “Symposia” along these lines on campuses in various cities across the US. The objectives are to open new lines of communication and to build trust. The invitees are IT and student-affairs staff from local campuses, plus several representatives from industry, especially the groups that actually search for infringement on the Internet. The first was in New York, the second in Minneapolis, the third will be in Philadelphia, and others will follow in the West, the South, and elsewhere in the Midwest.

Research

We’re funding a study within a major state university system to gather two kinds of data. Initially the researchers are asking each campus to describe the measures it takes to “effectively combat” copyright infringement: its communications with students, its policies for dealing with violations, and the technologies it uses. The data from the first phase will help enhance a matrix we’ve drafted outlining the different approaches taken by different campuses, complementing what will emerge from the “role models” project.

Based on the initial data, the researchers and NBCUniversal will choose two campuses to participate in the pilot phase of the Campus Online Entertainment Initiative (which I’ll describe next). In advance of that pilot, the researchers will gather data from a sample of students on each campus, asking about their attitudes toward and use of illicit and legitimate online sources for music, movies, and video. They’ll then repeat that data collection after the pilot term.

Campus Online Entertainment Initiative

Last but least in neither ambition nor complexity, we’re crafting a program that will attempt to address both goals I listed earlier: encouraging campuses to take effective steps to reduce distribution of infringing material from their networks, and helping students to appreciate (and eventually prefer) legitimate sources for online entertainment.

Working with Universal Studios and some of its peers, we’ll encourage students on participating campuses to use legitimate sources by making a wealth of material available coherently and attractively—through a single source that works across diverse devices, and at a substantial discount or with similar incentives.

Participating campuses, in turn, will maintain or implement policies and practices likely to shrink the volume of infringing material available from their networks. In some cases the participating campuses will already be like those in the “role models” group; in others they’ll be “high volume” or other campuses willing to adopt more effective practices.

I’m managing these projects from NBCUniversal’s Washington offices, but with substantial collaboration from company colleagues here, in Los Angeles, and in New York; from Comcast colleagues in Philadelphia; and from people in other companies. Interestingly, and to my surprise, pulling this all together has been much like managing projects at a research university. That’s a good segue to the next question.

Is it different on the dark side?

Newly hired, I go out to WRC, the local NBC affiliate in Washington, to get my NBCUniversal ID and to go through HR orientation. Initially it’s all familiar: the same ID photo technology, the same RFID keycard, the same ugly tile and paint on the hallways, the same tax forms to be completed by hand.

But wait: Employee Relations is next door to the (now defunct) Chris Matthews Show. And the benefits part of orientation is a video hosted by Jimmy Fallon and Brian Williams. And there’s the possibility of something called a “bonus”, whatever that is.

Around my new office, in a spiffy modern building at 300 New Jersey Avenue, everyone seems to have two screens. That’s just as it was in higher-education IT. But wait: here one of them is a TV. People watch TV all day as they work.

Toto, we’re not in higher education any more.

It’s different over here, and not just because there’s a beautiful view of the Capitol from our conference rooms. Certain organizational functions seem to work better, perhaps because they should and in the corporate environment can be implemented by decree: HR processes, a good unified travel arrangement and expense system, catering, office management. Others don’t: there’s something slightly out of date about the office IT, especially the central/individual balance and security, and there’s an awful lot of paper.

Some things are just different, rather than better or not: the culture is heavily oriented to face-to-face and telephone interaction, even though it’s a widely distributed organization where most people are at their desks most of the time. There’s remarkably little email, and surprisingly little use of workstation-based videoconferencing. People dress a bit differently (a maitre d’ told me, “that’s not a Washington tie”).

But differences notwithstanding, mostly things feel much the same as they did at EDUCAUSE, UChicago, and MIT.

Where I work is generally happy, people talk to one another, gossip a bit, have pizza on Thursdays, complain about the quality of coffee, and are in and out a lot. It’s not an operational group, and so there’s not the bustle that comes with that, but it’s definitely busy (especially with everyone around me working on the Comcast/Time Warner merger). The place is teamly, in that people work with one another based on what’s right substantively, and rarely appeal to authority to reach decisions. Who trusts whom seems at least as important as who outranks whom, or whose boss is more powerful. Conversely, it’s often hard to figure out exactly how to get something done, and lots of effort goes into following interpersonal networks. That’s all very familiar.

I’d never realized how much like a research university a modern corporation can be. Where I work is NBCUniversal, which is the overarching corporate umbrella (“Old Main”, “Mass Hall”, “Building 10”, “California Hall”, “Boulder”) for 18 other companies including news, entertainment, Universal Studios, theme parks, the Golf Channel, and Telemundo (all remarkably like schools and departments in their varied autonomy).

Meanwhile NBCUniversal is owned by Comcast—think “System Central Office”. Sure, these are all corporate entities, and they have concrete metrics by which to measure success: revenue, profit, subscribers, viewership, market share. But the relationships among organizations, activities, and outcomes aren’t as coherent and unitary as I’d expected.

Dark or Green?

So, am I on the dark side, or have I left it behind for greener pastures? Curiously, I hear both from my friends and colleagues in higher education: Some of them think my move is interesting and logical, some think it odd and disappointing. Curiouser still, I hear both from my new colleagues in the industry: Some think I was lucky to have worked all those decades in higher education, while others think I’m lucky to have escaped. None of those views seems quite right, and none seems quite wrong.

The point, I suppose, is that simple judgments like “dark” and “greener” underrepresent the complexity of organizational and individual value, effectiveness, and life. Broad-brush characterizations, especially characterizations embodying the ecological fallacy, “…the impulse to apply group or societal level characteristics onto individuals within that group,” do none of us any good.

It’s so easy to fall into the ecological-fallacy trap; so important, if we’re to make collective progress, not to.

Comments or questions? Write me: greg@gjackson.us

(The quote is from Charles Ess & Fay Sudweeks, Culture, technology, communication: towards an intercultural global village, SUNY Press 2001, p 90. Everything in this post, and for that matter all my posts, represents my own views, not those of my current or past employers, or of anyone else.)

3|5|2014 11:44a est

Perceived Truths as Policy Paradoxes

The quote I was going to use to introduce this topic — “You’re entitled to your own opinion, but not to your own facts” — itself illustrates my theme for today: that truths are often less than well founded, and so can turn policy discussions weird.

I’d always heard the quote attributed to Pat Moynihan, an influential sociologist who co-wrote Beyond the Melting Pot with Nathan Glazer, directed the MIT-Harvard Joint Center for Urban Studies shortly before I worked there (and left behind a closet full of Scotch, which stemmed from his perhaps apocryphal rule that no meeting extend beyond 4pm without a bottle on the table), and later served as a widely respected Senator from New York. The collective viziers of Wikipedia have found other attributions for the quote, however. (This has me once again looking for the source of “There go my people, I must go join them, for I am their leader,” supposedly Mahatma Gandhi but apparently some French general — but I digress.) The quote will need to stand on its own.

Here’s the Scott Jaschik item from Inside Higher Ed that triggered today’s Rumination:

A new survey from ACT shows the continued gap between those who teach in high school and those who teach in college when it comes to their perceptions of the college preparation of today’s students. Nearly 90 percent of high school teachers told ACT that their students are either “well” or “very well” prepared for college-level work in their subject area after leaving their courses. But only 26 percent of college instructors reported that their incoming students are either “well” or “very well” prepared for first-year credit-bearing courses in their subject area. The percentages are virtually unchanged from a similar survey in 2009.

This is precisely what Moynihan (or whoever) had in mind: two parties to an important discussion each bearing their own data, and therefore unable to agree on the problem or how to address it. The teachers presumably think the professors have unreasonable expectations, or don’t work very hard to bring their students along; the professors presumably think the teachers aren’t doing their job. Each side therefore believes the problem lies with the other, and has data to prove it. Collaboration is unlikely, progress ditto. This is what Moynihan had observed about the federal social policy process.

The ACT survey reminded me of a similar finding that emerged back when I was doing college-choice research. I can’t locate a citation, but I recall hearing about a study that surveyed students who had been admitted to several different colleges.

The clever wrinkle in the study was that the students received several different survey queries, each purporting to be from one of the colleges to which he or she had been admitted, and each asking the student about the reasons for accepting or declining the admission offer. Here’s what they found: students told the institution they’d accepted that the reason was excellent academic quality, but they told the institutions they’d declined that the reason was better financial aid from the one they’d accepted.

More recently, I was talking to a colleague in another media company who was concerned about the volume of copyright infringement on a local campus. According to the company, the campus was hosting a great deal of copyright infringement, as measured by the volume of requests for infringing material being sent out by BitTorrent. But according to the campus, a scan of the campus network identified very few hosts running the peer-to-peer applications. The colleague thought the campus was blowing smoke, the campus thought the company’s statistics were wrong.

Although these three examples seem similar — parties disagreeing about facts — in fact they’re a bit different.

  • In the teacher/professor example, the different conclusions presumably stem from different (and unshared) definitions of “prepared for college-level work”.
  • In the accept/decline example, the different explanations possibly stem from students’ not wanting to offend the declined institution by questioning its quality, or wanting to think of their actual choice as good rather than cheap.
  • In the infringement/application case, the different explanations stem from divergent metrics.

We’ve seen similar issues arise around institutional attributes in higher education. Do ratings like those from US News & World Report gather their own data, for example, or rely on presumably neutral sources such as the National Center for Education Statistics? This is critical where results have major reputational effects — consider George Washington University’s inflation of class-rank admissions data, and similar earlier issues with Claremont McKenna, Emory, Villanova, and others.

I’d been thinking about this because in my current job it’s quite important to understand patterns of copyright infringement on campuses. It would be good to figure out which campuses seem to have relatively low infringement rates, and to explore and document their policies and practices so that other campuses might benefit. For somewhat different reasons, it would be good to figure out which campuses seem to have relatively high infringement rates, so that they could be encouraged to adopt different policies and practices.

But here we run into the accept/decline problem. If the point to data collection is to identify and celebrate effective practice, there are lots of incentives for campuses to participate. But if the point is to identify and pressure less effective campuses, the incentives are otherwise.

Compounding the problem, there are different ways to measure the problem:

  • One can rely on externally generated complaints, whose volume can vary for reasons having nothing to do with the volume of infringement,
  • one can rely on internal assessments of network traffic, which can be inadvertently selective, and/or
  • one can rely on external measures such as the volume of queries to known sources of infringement.

I’m sure there are others — and that’s without getting into the religious wars about copyright, middlemen, and so forth that I addressed in an earlier post.

There’s no full solution to this problem. But there are two things that help: collaboration and openness.

  • By “collaboration,” I mean that parties to questions of policy or practice should work together to define and ideally collect data; that way, arguments can focus on substance.
  • By “openness,” I mean that wherever possible raw data, perhaps anonymized, should accompany analysis and advocacy based on those data.

As an example of what this means, here are some thoughts for one of my upcoming challenges — figuring out how to identify campuses that might be models for others to follow, and also campuses that should probably follow them. Achieving this is important, but improperly done it can easily come to resemble the “top 25” lists from RIAA and MPAA that became so controversial and counterproductive a few years ago. The “top 25” lists became controversial partly because their methodology was suspect, partly because the underlying data were never available, and partly because they ignored the other end of the continuum, that is, institutions that had somehow managed to elicit very few Digital Millennium Copyright Act (DMCA) notices.

It’s clear there are various sources of data, even without internal access to campus network data:

  • counts of DMCA notices sent by various copyright holders (some of which send notices methodically, following reasonably robust and consistent procedures, and some of which don’t),
  • counts of queries involving major infringing sites, and/or
  • network volume measures for major infringing protocols.

Those last two yield voluminous data, and so usually require sampling or data reduction of some kind. And not every query or every use of those protocols involves infringement. It’s also clear, from earlier studies, that there’s substantial variation in these counts over time and even across similar campuses.

This means it will be important for my database, if I can create one, to include several different measures, especially counts from different sources for different materials, and to do that over a reasonable period of time. Integrating all this into a single dataset will require lots of collaboration among the providers. Moreover, the raw data necessarily will identify individual institutions, and releasing them that way would probably cause more opposition than support. Clumping them all together would bypass that problem, but also cover up important variation. So it makes much more sense to disguise rather than clump — that is, to identify institutions by a code name and enough attributes to describe them but not to identify them.
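Here is a minimal sketch of that disguising step; the element-name aliases and the attribute bins are my own illustration, not the procedure actually used.

```python
ALIASES = ["Hydrogen", "Helium", "Lithium", "Beryllium", "Boron", "Carbon", "Nitrogen"]

def size_band(enrollment):
    """Coarsen exact enrollment into a band that describes without identifying."""
    if enrollment < 5000:
        return "small"
    if enrollment < 20000:
        return "medium"
    return "large"

def disguise(records):
    """Replace names with stable aliases and exact sizes with bands; keep the measures themselves."""
    return [
        {"campus": alias, "size": size_band(rec["enrollment"]),
         "control": rec["control"], "notices": rec["notices"]}
        for alias, rec in zip(ALIASES, records)
    ]

records = [
    {"name": "Example State University", "enrollment": 42000, "control": "public",  "notices": 950},
    {"name": "Example College",          "enrollment": 3200,  "control": "private", "notices": 40},
]
print(disguise(records))
```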

It’ll then be important to be transparent: to lay out the detailed methodology used to “rank” campuses (as, for example, US News now does), and to share the disguised data so others can try different methodologies.

At a more general level, what I draw from the various examples is this: If organizations are to set policy and frame practice based on data — to become “data-driven organizations,” in the current parlance — then they must put serious effort into the source, quality, and accessibility of data. That’s especially true for “big data,” even though many current “big data” advocates wrongly believe that volume somehow compensates for quality.

If we’re going to have productive debates about policy and practice in connection with copyright infringement or anything else, we need to listen to Moynihan: To have our own opinions, but to share our data.