Yahoo CEO Marissa Mayer said that “big data” will have a bigger impact than the Internet. Consider how the Internet completely changed our lives. It’s hard to imagine anything, let alone the vague concept of “big data,” having that type of impact.
Yet, if you have read any article on a legal technology issue in the past year, you have undoubtedly come across big data. There’s still a lot of confusion about big data, its power, its potential, and what it means for lawyers. This article is the first in a series that will explore these issues and illustrate why big data really is (and will continue to be) a big deal for the legal profession.
The first step to understanding big data is to define it. Many people think big data just means a lot of data. That’s only partially true. It is generally accepted that big data “refers to data sets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.” Yet, at its core, big data is really about data analytics: sophisticated algorithms applied to incomprehensibly large volumes of data. We create a staggering amount of data each day. For several years, computer scientists have been developing more and more powerful ways to harness that volume of data for all sorts of purposes, such as marketing, medical research and business intelligence. This is not a recent phenomenon. The big data revolution is a quiet one. It has been going on for years, right under our noses.
Every time we visit a website or send an email, it is likely that some computer somewhere is tracking our movements and adding to a database that contains our online profile. Researchers apply highly complex mathematical algorithms to these databases to find patterns in the data, so they can predict future buying preferences and decisions based on our online activities. This information is then used to sell highly focused and effective advertising. This type of data analytics has been going on for years, but many of us have been completely oblivious to it.
Big data has become today’s phenomenon because the science behind data analytics has continued to grow and is now being used in numerous areas of our lives, not just advertising. At the same time that our ability to analyze data has improved, the amount of data we create has increased dramatically, and our ability to store, process and transfer that data has improved tremendously. We have so much data about so many different aspects of the world, and we now have the capacity to collect and store it. This is a dream for big data researchers. They are figuring out how to combine and analyze these immense data sets. The result is that they are finding patterns in human conduct and nature that would never have been found without the ability to analyze these large data sets.
In the past, in order to discover or research something new, researchers would postulate a theory, gather data to test it, use statistical sampling to extrapolate from that data and then reach a conclusion. But this process has a major limitation: The researcher must pose the questions before the sample data is collected.
Big data is fundamentally changing this process. Rather than creating a theory and gathering sample data to test it, which may in itself skew the results, researchers are gathering massive amounts of data and then looking for patterns and correlations. In doing so, they are letting the data speak for itself. By looking objectively at massive amounts of data (rather than sample data), researchers are now making discoveries that are not limited by human instinct and intuition. Now, this does not mean that big data replaces human instinct or intuition. But sometimes, human instinct and intuition are skewed by the natural desire to figure out why something happened; for example, why a disease starts. Instead of looking for why, however, big data focuses on what, i.e., what is likely to happen next.
In their excellent book, Big Data: A Revolution That Will Transform How We Live and Work, authors Kenneth Cukier and Viktor Mayer-Schoenberger discuss this dichotomy and give a great example of how big data is being used to predict the what in medicine instead of the why. Researchers in Canada used big data to spot infections in premature babies before any overt symptoms appeared. They took 16 vital signs, like heartbeat, body temperature, respiration, and blood-oxygen levels, and turned them into a stream of information with over 1,000 data points per second. Using this data set, they were able to find correlations and connections in the data that helped predict the existence of an infection before it surfaced. Big data doesn’t explain why the infection starts, but it can help predict what is likely to happen next when certain factors are present at the same time.
In this sense, big data is giving researchers a view of the world never seen before. We are moving from a world where data was used to explain or support a discovery to a world where data (its connections and correlations) is the discovery. Big data is thus a collision between math and sociology that promises to change the way we see and analyze our world.
Today, businesses of all kinds are using big data to improve customer service, analyze their competition, manage supply chains, monitor customer markets, follow societal trends, maintain employee relations, target advertising, find emerging markets, and expedite product innovation. Because technology developments are making it easier to derive value from analyzing data, sophisticated businesses are focusing more than ever on data analytics. These analytics are being used to generate revenue. This monetization of data is fueling the big data explosion. And, as we will explore in upcoming articles, this is the main reason why lawyers need to understand big data.