Big Data – A snapshot
“Information is the oil of the 21st century, and analytics is the combustion engine” – Peter Sondergaard, Gartner Research
What is Big Data?
“Big Data” is the term for collections of data sets so large and complex that they become
difficult to process using traditional database management tools or data processing
applications. In most enterprise scenarios the volume of data is too big, it moves too
fast, or it exceeds current processing capacity.
Working with data at this scale raises many challenges: capture, analysis, curation,
search, sharing, storage, transfer, visualization, and information privacy. Accurate
analysis of big data is a great opportunity that can lead to more confident decision
making, and better decisions mean greater operational efficiency, cost reductions, and
reduced risk.
Big data analytics is the process of examining large data sets (‘Big Data’) to uncover hidden patterns, unknown correlations, market trends, customer preferences, and other useful business information.
Big Data Characteristics:
Data is data, small or big. What makes data “Big Data”?
Volume: Huge data sizes… terabytes to petabytes to zettabytes
We are seeing a clear progression in data sizes. Historically, data was generated by
workers entering records into their computers. Then things evolved with the internet,
where users generate data themselves when they use Facebook, LinkedIn, messenger
platforms, and other websites. Now even machines generate data: buildings and city
monitors measure humidity, traffic, temperature, and electricity usage, and satellites
monitor the earth 24 hours a day, generating data continuously. This is a clear scaling
of data, beyond imagination.
Velocity: High speed of data flow, change and processing
Clickstreams and ad impressions capture user behavior at millions of events per second;
high-frequency stock trading reflects market changes in microseconds; machine-to-machine (M2M) processes exchange data among billions of devices; sensors generate real-time data; and so on.
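The velocity problem boils down to aggregating events faster than they arrive. A minimal single-machine sketch (not a real stream processor such as Storm or Spark Streaming, which handle unbounded streams with fault tolerance across a cluster) counts events per one-second window:

```python
from collections import Counter

def events_per_second(event_timestamps):
    """Count how many events fall in each one-second window.

    event_timestamps: iterable of floats (seconds since epoch).
    A toy illustration of velocity; real stream processors do this
    continuously over unbounded, distributed streams.
    """
    counts = Counter(int(ts) for ts in event_timestamps)
    return dict(sorted(counts.items()))

# Clickstream-style events spread over two seconds:
stream = [100.1, 100.4, 100.9, 101.2, 101.3, 101.8, 101.9]
print(events_per_second(stream))  # {100: 3, 101: 4}
```

In a production system the same windowed aggregation runs incrementally, emitting each window's count as soon as the window closes rather than after the whole stream is seen.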
Variety: Various data sources (social, mobile, M2M, structured & unstructured)
Traditionally we managed data in relational databases (Oracle, SQL Server) in a
structured manner. Now the sources are different and the data types are different: so
much variety of data, i.e. spatial data, 3D data, audio and video, and unstructured text
(including log files and social media).
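A common first step with unstructured or semi-structured sources is extracting structured fields from raw text. A small sketch, assuming a simplified Apache-style web-server log line (the field names here are illustrative, not a fixed standard):

```python
import re

# Pattern for a simplified Apache-style access-log line (an assumed
# format for illustration): ip, identity, user, [time], "request",
# status, response size.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+)'
)

def parse_log_line(line):
    """Turn one line of unstructured log text into a structured dict."""
    m = LOG_PATTERN.match(line)
    return m.groupdict() if m else None

line = '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
record = parse_log_line(line)
print(record["status"], record["ip"])  # 200 203.0.113.7
```

Once every line is a record like this, the data can be loaded into a structured store or analytics engine alongside conventional relational data.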
Veracity: Various level of data uncertainty and reliability
Unstructured data is highly uncertain, and its reliability is hard to assess. A user can
behave differently and generate different types of data in different contexts, so we
cannot take one data point or data set as absolute; we need to cross-reference and then
account for the subjectivity of the data as appropriate.
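Cross-referencing can be as simple as taking a majority vote across sources. A toy sketch of handling veracity (real systems also weight sources by their historical reliability):

```python
from collections import Counter

def reconcile(values):
    """Pick the majority value among conflicting reports of one fact.

    Returns (value, confidence), where confidence is the share of
    sources agreeing. Illustrates cross-referencing instead of
    trusting a single data point.
    """
    counts = Counter(values)
    value, votes = counts.most_common(1)[0]
    return value, votes / len(values)

# Three sources report a customer's city; two agree.
city, conf = reconcile(["Austin", "Austin", "Dallas"])
print(city, round(conf, 2))  # Austin 0.67
```

A low confidence score flags records that need further verification before they feed into a decision.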
Why Big Data?
Big Data is a reality, and it is inevitable:
– Data is scaling (90% of the data in the world today has been created in the
last two years alone)
– Storage capacity is scaling (we can now talk about terabytes to petabytes to
zettabytes)
– Processing power is scaling (traditionally data was brought to the processor
for processing; now processors are being brought to the data)
Tools & Technology:
Several technologies are emerging to handle Big Data problems. They come from three
sources:
(1) the open source community;
(2) commercial offerings; and
(3) commercial offerings that extend open source software.
Common technologies used in Big Data solutions are Hadoop, Spark, Storm, NoSQL
databases (Cassandra, MongoDB, etc.), and the Amazon big data suite (S3, EMR, Redshift, DynamoDB, etc.), covering different aspects of batch, real-time, and predictive processing of big data.
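The batch model behind Hadoop can be sketched in plain Python. This is a toy single-machine illustration of the MapReduce idea (word count), not the Hadoop or Spark API; in a real cluster the map and reduce phases run in parallel across many machines, with the data shuffled between them by key:

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.lower().split()]

def reduce_phase(pairs):
    """Reduce: sum the counts emitted for each distinct word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

docs = ["big data is big", "data moves fast"]
pairs = chain.from_iterable(map_phase(d) for d in docs)
print(reduce_phase(pairs))  # {'big': 2, 'data': 2, 'is': 1, 'moves': 1, 'fast': 1}
```

The same map/shuffle/reduce shape underlies much of the batch tooling named above; Spark generalizes it with in-memory datasets, and Storm applies a comparable dataflow idea to unbounded streams.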
Big Data – Real world applications:
- Real-time customer insight
- Real-time operations/business insight
- Smarter healthcare
- Personal insight
- Effective sports performance
- Science and research
- Machine and device performance
- Homeland security
- Traffic control
- Retail solutions
- Proactive risk management