
Big data is getting bigger

13 Dec 2011

There are two very significant ways in which our world has changed in the past decade. Firstly, we are more connected. Secondly, we are awash with data.

On a planet with 7 billion people there are now 2 billion PCs and upwards of 6 billion mobile connections. Besides these human connections, there are now countless connections to the internet from devices, sensors and actuators. In other words, the world is getting more and more instrumented.

There are in excess of 30 billion RFID tags that track goods as they move from warehouse to retail store, while sensors on cars and bridges, and even cardiac implants in the human body, constantly send streams of data to the network, a phenomenon known as “the internet of things.” In addition we have the emergence of the Smart Grid, with millions upon millions of smart meters that sense power loads, redistribute power appropriately and draw less power during peak hours.

All these devices - be they laptops, cell phones, sensors, RFID tags or smart meters - are sending enormous amounts of data to the network. In other words, there is an enormous data overload happening in the networks of today. According to a Cisco report, the projected increase in data traffic between 2014 and 2015 is of the order of 200 exabytes. In addition, the report states that the total number of devices connected to the network will be twice the world population, or around 15 billion.

Fortunately, the explosion in data has been accompanied by falling storage prices and extraordinary increases in processing power. The data generated by these devices - cell phones, PCs and so on - is useless by itself. However, once processed it can provide insights into trends and patterns that can be used to make key decisions.

For example, the data exhaust from a user's browsing trail - also known as the click stream - provides important insights into user behavior, which can be mined to make important decisions. Similarly, inputs from social media such as Twitter or Facebook give businesses key signals for making business decisions. Call Detail Records created for mobile calls can also reveal user behavior, and data from retail stores provides insights into consumer choices. For any of this to happen, the enormous amounts of data have to be analyzed using algorithms that determine statistical trends, patterns and tendencies in the data.

 

It is here that big data enters the picture. Big data enables the management of the 3 V's of data, namely volume, velocity and variety. As mentioned above, the volume of data is growing at an exponential rate and should exceed 200 exabytes by 2015. The rate at which the data is generated, or the velocity, is also growing phenomenally, given the variety and the number of devices that are connected to the network. Besides, there is tremendous variety to the data: it can be structured, semi-structured or unstructured, and logs could be in plain text, CSV, XML, JSON and so on. It is these 3 V's that make Big Data techniques the most suitable for crunching this enormous proliferation of data at the velocity at which it is generated.
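
To make the idea of variety concrete, here is a small illustrative sketch in Python (not from the original article) that parses the same hypothetical smart-meter reading arriving once as structured CSV and once as semi-structured JSON; the field names and values are assumptions made up purely for illustration.

    import csv
    import io
    import json

    # Hypothetical smart-meter reading; the timestamp, device name and field
    # layout are made-up values, used only to illustrate "variety".
    csv_record = "2011-12-13T10:05:00,meter-42,load_kw,3.7"
    json_record = '{"ts": "2011-12-13T10:05:00", "device": "meter-42", "metric": "load_kw", "value": 3.7}'

    # Structured data: a fixed column order, parsed positionally.
    ts, device, metric, value = next(csv.reader(io.StringIO(csv_record)))

    # Semi-structured data: self-describing keys, parsed by name.
    event = json.loads(json_record)

    # Both records describe the same reading but need different handling,
    # which is exactly the heterogeneity a big data pipeline has to absorb.
    assert device == event["device"] and float(value) == event["value"]
    print(device, metric, value)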

 

Big data: Big Data, or analytics, deals with the algorithms that analyze petabytes of data and identify key patterns in them. The patterns so identified can be used to make important predictions about the future. For example, Big Data has been used by energy companies to identify key locations for positioning their wind turbines. Identifying the precise location requires that petabytes of data be crunched rapidly and the appropriate patterns identified. There are several applications of big data, from identifying brand sentiment in social media and customer behavior from the click exhaust, to identifying optimal power usage by consumers.

 

The key difference between Big Data and traditional processing methods lies in the volume of data that has to be processed and the speed with which it has to be processed. As mentioned before, the 3 V's of volume, velocity and variety make traditional methods unsuitable for handling this data. In this context, besides the key algorithms of analytics, another player is extremely important in Big Data: Hadoop. Hadoop is a processing framework that achieves tremendous parallelization of the task.

 

The Hadoop ecosystem – Hadoop has its origins in Google's work on the Google File System (GFS) and the Map Reduce programming paradigm.

 

HDFS and Map-Reduce: Hadoop, in essence, is the Hadoop Distributed File System (HDFS) plus the Map Reduce paradigm. A Hadoop cluster is made up of thousands of distributed commodity servers. Data is stored in HDFS in blocks of 64 MB or 128 MB, and each block is replicated across two or more servers to maintain redundancy. Since Hadoop is made of regular commodity servers which are prone to failure, fault tolerance is included by design. The Map Reduce paradigm breaks a job into multiple tasks which are executed in parallel. First, the “Map” part processes the input data and outputs key-value pairs. The “Reduce” part then scans these pairs and generates a consolidated output. For example, the “map” part could count the number of occurrences of different words in different sets of files and output each word and its count as a pair. The “reduce” part would then sum up the counts of each word from the individual 'map' outputs and give the total occurrences of each word across all the files.
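
To make the word-count example concrete, below is a minimal, self-contained sketch in Python written in the style of a Hadoop Streaming job, with a mapper and a reducer that read from standard input. The function names and the local sorted() call standing in for Hadoop's shuffle-and-sort step are assumptions for illustration; Hadoop's native Map Reduce API is Java-based.

    #!/usr/bin/env python
    # Minimal word-count sketch in the Map Reduce style described above.
    import sys
    from itertools import groupby

    def mapper(lines):
        """Map phase: emit a (word, 1) pair for every word in the input lines."""
        for line in lines:
            for word in line.strip().split():
                yield word, 1

    def reducer(pairs):
        """Reduce phase: pairs arrive sorted by word; sum the counts per word."""
        for word, group in groupby(pairs, key=lambda kv: kv[0]):
            yield word, sum(count for _, count in group)

    if __name__ == "__main__":
        # Run locally on stdin to mimic one map task followed by one reduce task.
        # On a real cluster Hadoop sorts and shuffles the pairs between the two
        # phases across many machines; here sorted() stands in for that step.
        mapped = sorted(mapper(sys.stdin))
        for word, total in reducer(mapped):
            print("%s\t%d" % (word, total))

Run locally as, say, cat *.txt | python wordcount.py (a hypothetical file name) to see the per-word totals that the “map” and “reduce” phases would otherwise produce in parallel across a cluster.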

 

 

The Others: The Hadoop ecosystem also includes several higher-level languages and tools that facilitate and simplify programming in the Map Reduce paradigm. Chief among these are Pig, Hive and JAQL, along with ZooKeeper, which coordinates the distributed services.

 

Conclusion: It is a foregone conclusion that Big Data and Hadoop will take center stage in the not too distant future, given the explosion of data and the dire need to glean useful business insights from it. Big Data and its algorithms provide a way to identify useful pearls of wisdom in otherwise useless data. Big Data is bound to become mission critical in the enterprises of the future.

 

Tinniam V Ganesh is an infrastructure architect at IBM India, Global Technology Services. He blogs at http://gigadom.wordpress.com.
