In Texas, there’s a deeply held belief that if it’s bigger, it’s better. Just look at the 159’ by 71’ big-screen TV in the Cowboys’ new football stadium, a prime example of the prevalent “go big, or go home” mentality. But it’s not just Texas that’s enamored with this “bigger is better” type of thinking. Many IT professionals focusing on the new “big data” craze follow the mantra that if a lot of data is good, even more must be better.
Alignment around the exact definition of “big data” is hard to come by, especially since much of the discussion is being driven by enabling vendors. That said, big data was concisely defined in a recent New York Times article as a “shorthand label that typically means applying the tools of artificial intelligence, like machine learning, to vast new troves of data beyond that captured in standard databases.” Most big data definitions go on to reference the three Vs: volume, velocity and variety. Yet two additional Vs are often overlooked: value and veracity, both of which are critical in an information governance and legal context. To harmonize the five Vs of big data, it’s important to examine each definition in sequence.
When the five Vs are considered in concert and cutting-edge analytical software is applied, the promise of “big data” starts to emerge. In healthcare, for example, researchers are using big data analytics to examine factors in multiple sclerosis in search of personalized treatments. Similarly, healthcare professionals are mining large genomic databases to find the best ways to treat cancer. Many of these insights come from novel data sources (new varieties) such as web-browsing data trails, social network communications, sensor data and surveillance content.
And yet, given the relatively narrow range of existing big data use cases (retail trending, advertising insights, healthcare data-mining, etc.), most organizations should still carefully assess the value of information before blindly provisioning another terabyte of storage on the mere premise that big data insights might be possible. While there are clearly nuggets to be mined in this new big data era, these analytical insights don’t come without potential costs and risks.