In this digital age, everyone leaves a trail when they go online. Our habits, whether they're related to travel or entertainment, are recorded by the increasing number of devices that we use daily. This is what we call Big Data.
Ernst and Young define it as follows: “Big Data refers to the dynamic, large and disparate volumes of data being created by people, tools, and machines. It requires new, innovative, and scalable technology to collect, host, and analytically process the vast amount of data gathered in order to derive real-time business insights that relate to consumers, risk, profit, performance, productivity management, and enhanced shareholder value.”
While there is no single definition of Big Data, certain elements are consistent across the different definitions. We call these the 5 V’s of Big Data:
Velocity, Volume, Variety, Veracity, and Value.
Velocity is a measure of how quickly data accumulates.
Data is being generated continuously and endlessly! It can be processed quickly with the help of real-time or near-real-time streaming, local, and cloud-based technologies.
Volume is the scale of the data.
Drivers of volume include a growing number of data sources, higher-resolution sensors, and scalable infrastructure.
Variety is the diversity of the data.
Structured data fits neatly into rows and columns, as in a relational database or a spreadsheet, while unstructured data is not organized in a pre-defined way, such as Tweets, blog posts, pictures, numbers, and video. Variety also reflects the different sources of data: machines, people, and processes, both internal and external to organizations.
Drivers include mobile technologies, social media, wearable technologies, geographic information systems, and video, among other things.
Veracity is the quality and origin of data, and its adherence to facts and accuracy.
Attributes of veracity include consistency, completeness, integrity, and ambiguity, while its drivers are cost and the need for traceability. With so much data available, it is no wonder the debate rages on about the accuracy of information in the digital age.
Value is the ability and need to turn data into useable information.
Value isn't just profit. It may have medical or social benefits, as well as customer, employee, or personal satisfaction. People invest time to understand Big Data because they can gain value from it.
So, what do the 5 V’s look like in action?
Velocity: Every 60 seconds, hours of footage are uploaded to YouTube, all of which generates data. Think about how quickly data accumulates over hours, days, and years.
Volume: There are approximately seven billion people in the world, and most of them are using digital devices. All these devices generate, capture, and store data. That's a lot of bytes: approximately 2.5 quintillion of them every day, roughly the capacity of 100 million single-layer (25 GB) Blu-ray discs.
Variety: Think about the various types of data: text, pictures, film, sound, and health data from wearable devices.
Veracity: 80% of data is considered to be unstructured, and we must devise ways to produce reliable and accurate insights. The data must be categorized, analyzed, and visualized. When this happens, we can create value.
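To get a feel for the scale of the daily figure quoted above, the conversion can be sketched in a few lines of Python. This is only a back-of-the-envelope check: the function name is made up for illustration, and it assumes decimal units (1 exabyte = 10^18 bytes, 1 terabyte = 10^12 bytes) and a steady rate across the day, neither of which comes from the original figure.

```python
# Back-of-the-envelope scale check for "2.5 quintillion bytes per day".
# Assumptions (not from the source): decimal units and a constant rate.

BYTES_PER_DAY = 2_500_000_000_000_000_000  # 2.5 quintillion = 2.5 * 10**18
SECONDS_PER_DAY = 24 * 60 * 60             # 86,400 seconds

EXABYTE = 10**18   # decimal exabyte
TERABYTE = 10**12  # decimal terabyte


def daily_volume_summary(bytes_per_day: int) -> dict:
    """Express a daily byte count in larger units and per-second terms."""
    return {
        "exabytes_per_day": bytes_per_day / EXABYTE,
        "terabytes_per_second": bytes_per_day / SECONDS_PER_DAY / TERABYTE,
        "exabytes_per_year": bytes_per_day * 365 / EXABYTE,
    }


summary = daily_volume_summary(BYTES_PER_DAY)
for name, value in summary.items():
    print(f"{name}: {value:,.1f}")
```

Under these assumptions, 2.5 quintillion bytes per day is 2.5 exabytes per day, or roughly 29 terabytes every second, which ties the Volume figure back to Velocity.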
Data scientists derive insights from Big Data and address the challenges presented by these massive datasets. At the scale of the data being collected and analyzed, regular data analysis tools are not feasible. Fortunately, Apache Spark, Hadoop, and Amazon Web Services (AWS) provide a means to extract, load, analyze, and process big data across distributed compute resources. This helps provide new insights and knowledge for us and our customers. At Spark, we use all three tools, but as a leading partner of AWS, we draw on our knowledge and expertise in AWS tools such as QuickSight, Athena, and SageMaker to analyze data and provide more accurate solutions for our customers.
If you want to learn more about how we use the 5 V’s of Big Data to help provide accurate solutions to our customers, book a discovery workshop and see how Spark can help!🚀