Handling Big Data

Introduction

Technological advances in the 21st century have networked the world into one big web that generates data at a pace unseen in any previous century.


This data is generated at high speed, is distributed, is rich and diverse in content and form, includes multimedia, and is very hard to manage with the technology of the past.

3V's

The 3 V's that characterize Big Data are as follows:
  • Volume. Hard disk drives grew from 10 GB in 2000 to 1 TB in 2018. Add to that the social media data of Facebook, Instagram, Snapchat and WhatsApp, the mass reach of smartphones, and the move to IoT, which connects everything via sensors. We are looking at data that goes beyond terabytes to petabytes and zettabytes.
  • Velocity. Individuals constantly update their social lives, global events are reported in real time, and businesses capture data in diverse ways in real time, sometimes breaching privacy, to enrich the user experience and gain a competitive edge. The combined velocity of this data is hard to even measure.
  • Variety. It's not just simple text. Big Data includes videos, stories with stickers and emojis, location data, unstructured text, and more; much of it is multimedia. Relational database systems (RDBMSs) were built for structured tabular data, with consistency in mind, occasional updates, and finite storage. Beyond being multimedia, today's data is so large that it is spread across distributed data centers, which these older database systems cannot manage.

Big Data systems are of two types: operational real-time systems with dynamic workloads, and analytical systems that perform complex analysis on this real-time data.
Operational systems serve real-time requests, aiming for low latency when answering individual queries from a wide user base. They are generally built on key-value stores, column-family stores, graph databases and cloud computing architectures.
Analytical systems handle complex queries that need a thorough search through the data to produce insightful results at any given time. Technologies like MapReduce help do this and go beyond the limits of traditional relational databases, which scale poorly.
The two kinds of system usually go hand in hand while sitting at opposite ends of the spectrum: real-time constraints compete with the processing needed to extract meaningful, impactful results from this huge data, which can mean managing 100 TB of data and over a trillion records on distributed, heterogeneous servers.
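
To make the contrast concrete, here is a minimal Python sketch. It is only an illustration: the in-memory dictionary stands in for a key-value store, and the record fields and function names (get_user, posts_per_country) are made up for this example, not taken from any real system.

```python
# Hypothetical records; the field names are illustrative only.
users = {
    "u1": {"name": "Asha", "country": "IN", "posts": 12},
    "u2": {"name": "Ben", "country": "US", "posts": 3},
    "u3": {"name": "Chen", "country": "IN", "posts": 7},
}


def get_user(user_id):
    """Operational-style query: one key in, one record out."""
    return users.get(user_id)


def posts_per_country():
    """Analytical-style query: scan every record to build an aggregate."""
    totals = {}
    for record in users.values():
        country = record["country"]
        totals[country] = totals.get(country, 0) + record["posts"]
    return totals


print(get_user("u2"))
print(posts_per_country())
```

The lookup touches a single record, so it can stay fast however many users exist. The aggregate has to read everything, which is why analytical workloads are the ones that get spread over many machines.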

Enabling Technologies for Big Data

Cloud Computing

Cloud computing is now a common term, and it is available to everyone. It is an asset to Big Data analytics because of its low cost, scalable performance, and on-demand computing at real-time speed, which enables productivity at a global level.

Cloud computing provides services such as Software as a Service (SaaS), Infrastructure as a Service (IaaS) and Platform as a Service (PaaS).

Hadoop

Hadoop is a set of services, each responsible for a specific task. The services are:
1. Distributed File System
The Hadoop Distributed File System (HDFS) is a network of nodes in which a main node (the NameNode) controls all the others. The main node spreads data storage over multiple nodes and handles division, replication, coordination, and monitoring. These nodes may be heterogeneous, and together they act as one linked storage device.
2. MapReduce
It performs two basic operations: reading data and grouping it by key (map), and performing mathematical operations that combine the values for each key across the parallel instances running on the data streams of multiple nodes (reduce).
3. YARN
It manages the resources of the cluster, scheduling work and helping to monitor and analyze system status.
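
The map and reduce steps are easier to see in a toy word count. The sketch below is plain Python running on one machine, not Hadoop's actual API; the two text chunks are made-up stand-ins for blocks stored on different nodes, and the function names (map_phase, shuffle, reduce_phase) are chosen just for this example.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: read one chunk of text and emit a (word, 1) pair per word."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Group the values emitted by all mappers under their key."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values collected for each key (here, by summing)."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical chunks, standing in for blocks stored on different nodes.
chunks = ["big data is big", "data is everywhere"]

# On a real cluster each chunk would be mapped in parallel on its own node.
mapped = [map_phase(chunk) for chunk in chunks]
word_counts = reduce_phase(shuffle(mapped))
print(word_counts)  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

Because each chunk is mapped independently, adding more nodes lets more chunks be processed at the same time; only the grouping step needs to bring results for the same key together.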

Challenges:
  • Domain and Application Knowledge
  • Autonomous Sources with Distributed and Decentralized Control
  • Huge Data with Heterogeneous and Diverse Dimensionality
  • Mining from Sparse, Uncertain, and Incomplete Data



