Recently, I came across this piece on Academia.edu that dovetailed nicely with a lot of things that I’ve been thinking about lately.
A friend asked what we are supposed to do about it, and since I couldn’t come up with a pithy, sarcastic Facebook status, here’s my long-winded answer. I should note that I am in no way an expert on any of this, and that there is a ton of better information out there if you’re willing to look for it. This is likely to be the first of several posts on this topic.
‘Should This Be the Last Thing You Read on Academia.edu?’
The piece is worth reading in its entirety, but essentially, it demonstrates that posting research on Academia.edu is not the equivalent of open access, particularly because (surprise) Academia.edu is a company. The business model of Academia.edu is not unique, but it’s also not something that I think a lot of academics have paid much attention to:
“The goal is to provide trending research data to R&D institutions that can improve the quality of their decisions by 10-20%. The kind of algorithm that R&D companies are looking for is a ‘trending papers’ algorithm, analogous to Twitter’s trending topics algorithm. A trending papers algorithm would tell an R&D company which are the most impactful papers in a given research area in the last 24 hours, 7 days, 30 days, or any time period. Historically it’s been very difficult to get this kind of data. Scientists have printed papers out, and read them in their labs in un-trackable ways. As scientific activity is moving online, it’s becoming easier to track which papers are getting more attention from the top scientists. There is also an opportunity to make a large economic impact. Around $1 trillion a year is spent on R&D globally: about $200 billion in the academic sector, and about $800 billion in the private sector (pharmaceutical companies, and other R&D companies).”
– Richard Price, CEO of Academia.edu, quoted in Hall.
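Price doesn’t say how such an algorithm would work. As a minimal sketch, assuming the simplest possible approach, a “trending papers” ranking could just count reader interactions per paper inside a sliding time window (24 hours, 7 days, 30 days). The function name, the log format and the data below are all hypothetical and are not Academia.edu’s method. The point is that the raw input is readers’ behaviour.

```python
from collections import Counter
from datetime import datetime, timedelta

def trending_papers(events, now, window_days, top_n=3):
    """Rank papers by how many reader interactions fall inside the time window.

    `events` is a hypothetical log of (paper_id, timestamp) pairs, one per
    view or download; every entry is a trace of someone's reading behaviour.
    """
    cutoff = now - timedelta(days=window_days)
    counts = Counter(paper for paper, ts in events if cutoff <= ts <= now)
    return counts.most_common(top_n)

# Invented example log.
events = [
    ("paper-a", datetime(2016, 1, 30)),
    ("paper-a", datetime(2016, 1, 29)),
    ("paper-b", datetime(2016, 1, 30)),
    ("paper-b", datetime(2016, 1, 10)),
    ("paper-b", datetime(2016, 1, 5)),
    ("paper-c", datetime(2015, 12, 1)),
]
now = datetime(2016, 1, 31)
print(trending_papers(events, now, 7))   # → [('paper-a', 2), ('paper-b', 1)]
print(trending_papers(events, now, 30))  # → [('paper-b', 3), ('paper-a', 2)]
```

Price’s mention of attention “from the top scientists” suggests a real system would also weight each interaction by who the reader is, which means knowing even more about that reader.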
1) Metadata does not accurately describe anyone
“Certain forms of knowledge and control require a narrowing of vision. The great advantage of such tunnel vision is that it brings into sharp focus certain limited aspects of an otherwise far more complex and unwieldy reality. This very simplification, in turn, makes the phenomenon at the center of the field of vision more legible and hence more susceptible to careful measurement and calculation. Combined with similar observations, an overall, aggregate, synoptic view of a selective reality is achieved, making possible a high degree of schematic knowledge, control, and manipulation.”
(Scott 1998, p.11).
2) Metadata is not actually anonymous
States and institutions conceptualize the world in a certain way and then use that abstraction to draw inferences about individuals. They can say confidently that 80% of 40-year-old Torontonians use the TTC, but aggregated metadata cannot say that Jon Smith uses the TTC. In theory, metadata is anonymous, as it does not (or should not) contain identifiable information. This is one of the big arguments for why we should not be concerned with the collection of metadata.
The problem is that with enough metadata, and with linkability, it is not particularly difficult to connect metadata to individuals. A recent MIT-led study did exactly this. They analyzed “3 months of credit card records for 1.1 million people and show that four spatiotemporal points are enough to uniquely re-identify 90% of individuals.” So while each element of metadata may not uniquely identify you, if there is enough of it, and it has elements in common (such as an IP address), then going from the abstract to the individual is not difficult.
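To make the linkage idea concrete, here is a toy sketch with invented data. It is not the study’s code, though it follows the same basic idea, usually called “unicity”: take a few points from one person’s own trace and check whether anyone else’s trace also contains all of them. The function names, pseudonymous IDs and records are hypothetical.

```python
import random

# Invented, pseudonymous traces: each is a set of (shop, day) points,
# the kind of thing card transactions record. No names anywhere.
traces = {
    "user1": {("bakery", 1), ("pharmacy", 2), ("bookshop", 3), ("cafe", 5)},
    "user2": {("bakery", 1), ("pharmacy", 2), ("cinema", 4), ("cafe", 5)},
    "user3": {("bakery", 1), ("grocer", 2), ("bookshop", 3), ("cafe", 6)},
    "user4": {("grocer", 2), ("cinema", 4), ("cafe", 5), ("gym", 6)},
}

def matches(traces, known_points):
    """Return every ID whose trace contains all of the known points."""
    return sorted(uid for uid, trace in traces.items() if known_points <= trace)

def unicity(traces, k, rng):
    """Fraction of people singled out by k points drawn from their own trace."""
    unique = 0
    for uid, trace in traces.items():
        known = set(rng.sample(sorted(trace), k))
        if matches(traces, known) == [uid]:
            unique += 1
    return unique / len(traces)

print(matches(traces, {("bakery", 1)}))
# → ['user1', 'user2', 'user3']
print(matches(traces, {("bakery", 1), ("bookshop", 3), ("pharmacy", 2)}))
# → ['user1']
print(unicity(traces, 4, random.Random(0)))  # → 1.0
```

Knowing that someone visited the bakery on day 1 narrows things to three people; adding two more points, the kind that might turn up in a single public post, singles out one pseudonymous trace, and with it everything else in that trace.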
The point here is that institutions do two things. First, they create sketches of whole swathes of people based on metadata. Second, they can create a sketch of each individual from that person’s own metadata. An institution thus understands you through two overlapping abstractions. You become a distorted, simplified image, and it is with this distorted, simplified image that states and institutions act.
3) Your labour is being exploited without your knowledge and without remuneration
As citizens, we have helped to pay for the scaffolding that supports a service like Academia.edu, which in turn uses that scaffolding to extract more value from us. It may all be in the abstract, but at the end of the day, our labour is being used to make someone else money. And it is often done without our consent or knowledge.
Metadata & Freedom
This matters because it undermines individual freedom. The most obvious problem is privacy. Massive amounts of metadata and linkability destroy privacy, which, last I checked, is still a fundamental human right. The State’s use of metadata in the name of national security is particularly problematic, and has been written about extensively elsewhere. But at the end of the day, without the privacy to learn, to read, to look at whatever websites we want, and to speak to whomever we want, we are not free. We will self-censor, we will hesitate to discuss certain topics, and we will curtail our own freedom.
Relatedly, when institutions prefer a world of dark and light green leaves on trees, we as individuals are systemically encouraged to see the world through that lens. Institutions frame how we understand the world around us, and they frame it in ways that are beneficial to the institutions, not to the citizenry.
A lot of institutions are collecting data about your online activities. The governments of the US and UK collect and store an enormous share of the traffic transmitted over the internet. That has terrifying implications. If everything is being collected and stored, then institutions have the capability to retroactively construct narratives of individuals based on metadata. The NSA’s XKEYSCORE program does just that.
So what can be done about it? It is exceptionally difficult to remain completely anonymous and private while online (not impossible though), but it’s not particularly hard to make it more difficult for people and institutions to use your data and your labour for their own gain.
Here are a few suggestions; I’ll get into more in a future post:
1) Constantly remind yourself that free services aren’t free. It is important to remember that data is always being collected about you in order to try to understand you. What is being done with this data is usually unknowable, so you should always ask yourself how necessary this service is to you before you volunteer to allow an institution to collect your data.
2) Disengage. Obviously the best thing to do is to stop using these services, especially the ones that don’t bring you much value. But for a lot of people, that’s not possible or desirable. And even if you do disengage, the metadata collection doesn’t end. Facebook, for example, has ‘shadow profiles’ of its users that contain data other people have supplied about them. More frighteningly, it has ‘shadow profiles’ of people who do not use Facebook at all. So whether or not you use Facebook, it is collecting data on you. A better solution may be active engagement with an eye to disruption.
3) Participate selectively. In short, don’t participate in ways that are useful to those who are collecting your data. Much of our lives as academics are already on the internet: our affiliations, research interests, etc. So this information is not particularly valuable to an institution like Academia.edu. They extract value by understanding your behaviour, so don’t behave. While it may be valuable for you to have a paper posted on Academia.edu for exposure, it only takes a second to find someone’s e-mail address, or to find their publication elsewhere. If you limit your interactions through the service, you help limit the data that is being collected about you.
4) Disrupt. Others’ collection of metadata cannot really be avoided, but you can do a lot to disrupt it and make it more difficult for others to know much about you. For fun, try clicking on random things on your Facebook feed – the targeted ads will change as Facebook tries to incorporate this new behaviour with what they already ‘know’ about you. Do this enough, and you introduce enough noise into their data that the real you fades into the background.
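As a rough illustration of why noise helps, here is a toy sketch that treats an “interest profile” as nothing more than the share of clicks per category. That model, the categories and the click counts are all invented; real ad-targeting systems are far more elaborate, so this only shows the dilution idea, not how much random clicking achieves in practice.

```python
import random
from collections import Counter

def interest_profile(clicks):
    """Share of clicks per category: a crude, hypothetical stand-in for an ad profile."""
    counts = Counter(clicks)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}

# Invented behaviour: someone who mostly clicks on cycling posts.
genuine_clicks = ["cycling"] * 8 + ["academia"] * 2
before = interest_profile(genuine_clicks)
print(before)  # → {'cycling': 0.8, 'academia': 0.2}

# Deliberate noise: 40 clicks spread at random over unrelated categories.
categories = ["cycling", "academia", "cooking", "cars",
              "fashion", "gaming", "travel", "finance"]
rng = random.Random(42)  # seeded only so the sketch is reproducible
noise_clicks = [rng.choice(categories) for _ in range(40)]
after = interest_profile(genuine_clicks + noise_clicks)

# The share attributed to the genuine interest drops sharply.
print(round(before["cycling"], 2), "->", round(after["cycling"], 2))
```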
Of course there are a lot of better ways to disrupt and make it more difficult for others to collect data about you. Use a VPN. Block cookies. Use HTTPS. Use an encrypted text messenger. Use a password manager. If you have a serious need for anonymity, use Tor. Oh, and for the love of god, use PGP.
A lot of this seems like it’s overkill, but most of it is extremely simple to integrate, and operates in the background. I’ll write more about these things in future posts.
Disruption is also kind of fun. It allows you to continue to use the services, but you also get the satisfaction of knowing that you’re corrupting their clean data about you. I like to imagine that one day, just maybe, some poor data analyst will come across my file and won’t be able to find anything. Then they’ll be sad, and frustrated, and won’t know what to do. It will be difficult for the State to peer into my life.
And that’s exactly how it should be.