Before we get down-and-dirty with these, let’s talk about size. I recently received a call for papers that borrowed (and slightly amended) a quote from Douglas Adams’ The Hitchhiker’s Guide to the Galaxy:
"Data is big. You just won't believe how vastly, hugely, mind-bogglingly big it is"
To which we might add Google’s Eric Schmidt, who famously claimed: “Every two days now we create as much information as we did from the dawn of civilization up until 2003.”
This is where some good old analogies come into play. On The Bottom Line on Radio 4 on the 25th September, Konrad Feldman, CEO of the digital advertiser Quantcast, outlined how his company processes 30 petabytes of data per day. He translated this into human terms: people read approximately 250 words per minute, which equates to roughly 1 kilobyte of text. At that rate it would take one person around a million years to read a petabyte of data or, alternatively, the entire population of the UK more than a week. To return to Eric Schmidt’s claim quoted above, the amount of information he was describing is 5,000 times as much.
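Feldman’s arithmetic is easy to sanity-check. A back-of-envelope sketch (the assumptions here are my own, not from the programme: roughly 4 bytes per word including spacing, a petabyte taken as 10^15 bytes, a UK population of about 64 million) lands in the same ballpark:

```python
# Sanity check of the reading-speed arithmetic above. Assumptions are my
# own: ~4 bytes per word, so 250 words/minute is about 1 KB of text per
# minute, and a petabyte is taken as 10**15 bytes.
BYTES_PER_MINUTE = 250 * 4        # ~1 KB of text read per minute
PETABYTE = 10**15                 # bytes
MINUTES_PER_YEAR = 60 * 24 * 365

minutes_per_pb = PETABYTE / BYTES_PER_MINUTE
years_one_reader = minutes_per_pb / MINUTES_PER_YEAR
print(f"One reader: ~{years_one_reader:,.0f} years")   # roughly 1.9 million years

# Rough UK population figure (my assumption): ~64 million people.
UK_POPULATION = 64_000_000
days_whole_uk = minutes_per_pb / UK_POPULATION / (60 * 24)
print(f"Entire UK, reading in parallel: ~{days_whole_uk:.1f} days")  # roughly 11 days
```

The figures come out on the same order as those quoted: around a million years for a single reader, and a week or two for the whole country working in parallel.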
So we’ve established that the amount of data we produce is astronomical. Universal, you might say, in more ways than one. The reasons for this are many, but foremost among them are increased access to the Internet worldwide, the prevalence of smart devices and their associated apps and, more interestingly from a sociological standpoint, the willingness of people to volunteer personal information.
The big data revolution, however, is not simply about the production and harvesting of increasingly vast quantities of data. Raw data can be fairly meaningless, particularly in the quantities outlined here. The value in big data lies in meaning, or rather, in generating meaning from the data. In this context, the product of data is information. Have a look at Ackoff’s DIKW pyramid (image below) if you’d like to explore this conceptualisation further (yes, it’s a Wikipedia page – apologies to my university education). This process is also referred to as 'data mining' or, situating it in the field of surveillance studies, 'dataveillance'. With big data, the whole is greater than the sum of its parts. It is driven above all else by the desire to know, followed in close second by the desire for profit.
Much of these data are not revealing; indeed, they may even be anonymous. However, take the masses of data produced by each of us and start to piece them together like a jigsaw, and a much more revealing picture emerges. Not only can advertising and marketing companies predict our consumer behaviour over time with a high degree of accuracy, but other highly personalised – and equally accurate – information can be ascertained: political leanings, religion, pregnancy due dates, health concerns. What is more, as technology continues its rapid progress, the type of data extracted from us will inevitably change. Add biometrics to the data cocktail and you’re no longer making a jigsaw; it’s more of a Puzz-3D (remember those?).
So what’s the problem? Arguably big data has many benefits for us. Although I would prefer no online adverts at all, tailored advertising is arguably preferable to a stream of irrelevance. However, part of me would rather not be second-guessed. Maybe that is the best price for those bike tyres, but often I would rather exercise my own judgement and enjoy the experience of shopping around.
Sociologically, this issue is fascinating. Big data has the potential to improve both our decision-making and our rationality. If we move from consumption to health as an example, a number of important questions are raised. Big data compiled from an array of body-worn sensors capable of detecting physical, physiological and biochemical signals (from the medical to the commercial, e.g. Nike Fuel bands) could be aggregated to provide a comprehensive picture of individual and public health. These datasets can then be sold to other sectors and, as one commentator has noted, ‘that is where the fun begins.’ Bounded rationality would gradually be eroded and we would be able to make decisions about our health based on sophisticated algorithmic predictions, drawn from data that we as individuals could never access ourselves. The question is to what extent we would be able to justify non-conformity with the information we are provided.
There are clear consequences here for our understanding of risk. Risk goes hand in hand with (in)security. Certainly, it appears that big data has the potential to tackle insecurities; there are clear benefits to be had if public and private services can be better tailored to our individual needs. Of course, what this presupposes is our consent and participation in the production and subsequent use of big data. If we are to continue along this path, we need to be fully aware of the value of our data. We need to understand what is being done with it and by whom. These issues present us with a whole array of new risks around which to orient ourselves.
Once we start to think along these lines it is impossible to ignore the questions of ethics and privacy. I have already suggested that the value of our data should be foremost in our thoughts, and by this I do not mean only monetary value (although do not underestimate the role of business enterprise in fuelling the big data revolution). The regulatory framework for the Internet and other communications, in the UK at least, has failed to keep pace with technological development. The processes of data collection, transmission and analysis are opaque to end-users – even if you do read the privacy policies and terms of agreement rather than absent-mindedly clicking ‘I agree’. As I pointed out earlier, the value in big data is the information and meaning generated from the data that we supply or that is extracted from us. The consequence is that information is inferred about us that we may never have intended to share. Privacy agreements, with this in mind, are essentially flawed. It isn’t all doom and gloom, though; progress is being made in the regulatory arena. The proposed General Data Protection Regulation aims to extend controls to foreign companies that process the data of EU citizens.
Nevertheless, I was interested to hear later on in The Bottom Line about ‘differential privacy’ and the problem it addresses: by triangulating data across a number of datasets, each of which may be anonymous on its own, the identity of an individual can be ascertained with a 96% degree of certainty. (Strictly speaking, that triangulation attack is ‘de-anonymisation’; differential privacy is the mathematical framework designed to defend against it.) Again, this is evidence that the privacy framework for much of our everyday communications is flawed, as is our understanding of what is going on behind the scenes. We are constituted by our data, and it is not too much to suggest that our decisions and life-choices, rather than being laid out for us as the ‘best option’, are in actual fact pre-determined as the ‘only option’. We just don’t see it happening. We are slowly removed from our own decision-making process.
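To make the triangulation concrete, here is a toy sketch of a linkage attack. The datasets, names and fields are entirely hypothetical: a health dataset that is “anonymous” (no names) is joined to a public roll on shared quasi-identifiers, and sensitive attributes end up attached to named individuals.

```python
# Toy linkage attack on entirely made-up data. Each dataset is
# "anonymous" or innocuous on its own; joined on quasi-identifiers
# (postcode, birth year, sex), they re-identify individuals.

# An "anonymised" health dataset: no names, just quasi-identifiers.
health_records = [
    {"postcode": "LS2", "birth_year": 1985, "sex": "F", "condition": "asthma"},
    {"postcode": "M1",  "birth_year": 1990, "sex": "M", "condition": "diabetes"},
]

# A public dataset (think electoral roll) that does carry names.
public_roll = [
    {"name": "Alice", "postcode": "LS2", "birth_year": 1985, "sex": "F"},
    {"name": "Bob",   "postcode": "M1",  "birth_year": 1990, "sex": "M"},
]

QUASI_IDS = ("postcode", "birth_year", "sex")

def link(records, roll):
    """Match records across the two datasets on shared quasi-identifiers."""
    index = {tuple(p[k] for k in QUASI_IDS): p["name"] for p in roll}
    return {
        index[key]: r["condition"]
        for r in records
        if (key := tuple(r[k] for k in QUASI_IDS)) in index
    }

print(link(health_records, public_roll))
# {'Alice': 'asthma', 'Bob': 'diabetes'}
```

Differential privacy counters exactly this kind of attack by adding calibrated random noise to released statistics, so that no single individual’s record can be singled out with confidence.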
Big data does not have to be a big problem. But I do believe it should raise some big questions. Big data sits at a very busy intersection of commerce, security, public services and social relationships. As individuals, we stand at the very centre of it, but we are in danger of losing sight of the bigger picture at precisely the same time as those with an interest in our data are piecing it together.