The 5 V’s of Big Data
In this digital age, everyone leaves a trail when they go online.
Our habits, whether related to travel or entertainment, are recorded by the increasing number of devices we use daily. This is what we call Big Data. Ernst and Young define it as follows: “Big Data refers to the dynamic, large and disparate volumes of data being created by people, tools, and machines. It requires new, innovative, and scalable technology to collect, host, and analytically process the vast amount of data gathered to derive real-time business insights relating to consumers, risk, profit, performance, productivity management, and enhanced shareholder value.” While there is no single definition of Big Data, certain elements are consistent across the different definitions. We call these the 5 V’s of Big Data.
They are Velocity, Volume, Variety, Veracity, and Value.
Velocity is a measure of how quickly data accumulates.
Data is being generated continuously and endlessly! Near-real-time and real-time streaming, local, and cloud-based technologies make it possible to process that information quickly.
Volume is the scale of the data.
Drivers of volume include a growing number of data sources, higher-resolution sensors, and scalable infrastructure.
Variety is the diversity of the data.
Structured data fits neatly into rows and columns, as in a relational database or a spreadsheet, while unstructured data, such as tweets, blog posts, pictures, and videos, is not organised in a pre-defined way. Variety also reflects the different sources of data: machines, people, and processes, both internal and external to organisations.
Drivers include mobile technologies, social media, wearable technologies, geographic information systems, and video, among other things.
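To make the difference between structured and unstructured data concrete, here is a minimal Python sketch. The `StepReading` record, the sample posts, and the helper names are all hypothetical, invented purely for illustration. The point is only that structured rows can be queried directly by field, while free text needs some form of parsing first.

```python
import re
from dataclasses import dataclass


# Structured: every record has the same pre-defined fields,
# much like a row in a relational table or a spreadsheet.
@dataclass
class StepReading:  # hypothetical wearable-device record
    user_id: int
    day: str
    steps: int


structured = [
    StepReading(1, "2024-01-01", 8200),
    StepReading(2, "2024-01-01", 4100),
]

# Unstructured: free text with no schema to rely on.
unstructured = [
    "Walked to work today, hit 8k steps! #fitness",
    "Rainy day, stayed in and watched videos.",
]


def total_steps(rows):
    """Structured data can be aggregated directly by field name."""
    return sum(r.steps for r in rows)


def extract_hashtags(posts):
    """Unstructured text has to be parsed before it yields anything usable."""
    return [tag for post in posts for tag in re.findall(r"#(\w+)", post)]


print(total_steps(structured))
print(extract_hashtags(unstructured))
```

Real unstructured data (images, video, audio) needs far heavier processing than a regular expression, but the contrast is the same: the schema does the work for structured data, and the analyst does it for everything else.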
Veracity is the quality and origin of data, and how closely it conforms to facts.
Its attributes include consistency, completeness, integrity, and ambiguity, while its drivers are cost and the need for traceability. With so much data available, it is no wonder that debate continues about the accuracy of information in the digital age.
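Two of these attributes, completeness and consistency, lend themselves to simple automated checks. The sketch below is one possible way to express them in plain Python. The sample records, field names, and the allowed set of country codes are assumptions made up for this example, not a standard.

```python
# Hypothetical customer records with typical quality problems:
# a missing email, an inconsistently cased country code, a duplicated id.
records = [
    {"id": 1, "email": "ana@example.com", "country": "NZ"},
    {"id": 2, "email": None, "country": "NZ"},
    {"id": 3, "email": "li@example.com", "country": "nz"},
    {"id": 3, "email": "li@example.com", "country": "nz"},
]


def completeness(rows, field):
    """Share of rows where `field` is present and non-empty."""
    if not rows:
        return 0.0
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)


def inconsistent_values(rows, field, allowed):
    """Distinct values of `field` that fall outside the agreed set."""
    return sorted(
        {r[field] for r in rows
         if r.get(field) is not None and r[field] not in allowed}
    )


def duplicate_ids(rows):
    """Ids that appear more than once, a basic integrity check."""
    seen, dupes = set(), set()
    for r in rows:
        if r["id"] in seen:
            dupes.add(r["id"])
        else:
            seen.add(r["id"])
    return sorted(dupes)


print(completeness(records, "email"))
print(inconsistent_values(records, "country", {"NZ"}))
print(duplicate_ids(records))
```

On this sample, the checks report that three quarters of the rows have an email, that `nz` does not match the agreed code, and that id 3 is duplicated. Checks like these do not prove data is accurate, but they flag where it needs a closer look.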
Value is the ability, and the need, to turn data into usable information.
Value isn't just profit. It may also take the form of medical or social benefits, or of customer, employee, or personal satisfaction. People invest time in understanding Big Data because they can gain value from it.
So, what do the 5 V’s look like in action?
Velocity: Every 60 seconds, hours of footage are uploaded to YouTube, generating data. Think about how quickly data accumulates over hours, days, and years.
Volume: There are approximately seven billion people worldwide, most of whom use digital devices. All these devices generate, capture, and store data. That's a lot of bytes: approximately 2.5 quintillion of them every day, which is the equivalent of roughly 100 million single-layer (25 GB) Blu-ray discs.
Variety: Think about the various types of data: text, pictures, film, sound, and health data from wearable devices.
Veracity: 80% of data is considered to be unstructured, so we must devise ways to produce reliable and accurate insights from it. The data must be categorized, analyzed, and visualized. When this happens, we can create value.
Value: Data scientists can derive insights from Big Data and address the challenges presented by their massive datasets. Because of the scale of the data being collected and analysed, regular data analysis tools are not feasible. Fortunately, Apache Spark, Hadoop, and Amazon Web Services (AWS) provide a means to extract, load, analyse, and process big data across distributed computing resources. This helps provide new insights and knowledge for our customers and for us. At Spark, we use all three tools, but as a leading partner of AWS, we draw on our knowledge and expertise in AWS tools such as QuickSight, Athena, and SageMaker to analyse data and provide more accurate solutions for our customers.
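For readers curious about what "processing data across distributed computing resources" means in practice, the following is a rough, single-machine sketch of the split, map, and reduce pattern that frameworks such as Hadoop and Apache Spark apply across many machines. It uses only the Python standard library and is not the Spark or Hadoop API. The function names and sample lines are hypothetical, and each "worker" here is simply a loop iteration.

```python
from collections import Counter
from functools import reduce


def partition(items, n_parts):
    """Split items into n_parts chunks, one per hypothetical worker."""
    return [items[i::n_parts] for i in range(n_parts)]


def map_count(chunk):
    """Map step: each worker counts words in its own chunk only."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_counts(a, b):
    """Reduce step: merge two partial results into one."""
    return a + b


lines = [
    "big data needs big tools",
    "data has value",
    "value comes from data",
]

# In a real cluster the map step would run in parallel on separate machines.
partials = [map_count(chunk) for chunk in partition(lines, 2)]
totals = reduce(reduce_counts, partials, Counter())

print(totals.most_common(3))
```

Because each chunk is processed independently and the partial results are merged at the end, the same logic keeps working when the data is far too large for a single machine, which is the property that makes these tools suited to Big Data.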