In the time it takes you to read this sentence, the volume of data in existence worldwide, both public and private, will have grown by more than the volume of data contained in all of the books ever written. On YouTube alone, 35 hours of video are uploaded every minute. Unstructured data is growing at a rate of 80% year-on-year.
Data storage has become a major business issue as more of us live our lives online, as companies large and small capture massive volumes of customer information, and as the regulatory framework around data retention tightens.
The benefits of capturing and securely analyzing data should be obvious to all, yet according to a recent IDC report, less than 1% of data is analyzed and more than 80% of it remains unprotected. Put simply, big data is too big for all but the largest companies. Traditional storage models are not scalable enough, and the processing power required to quickly make sense of an almost limitless cache of potentially useful information is beyond all but those with the deepest pockets.
To illustrate, nearly 19 million results turn up if you Google “big data.” Assuming it takes your browser about five seconds to open each link, that’s over 26,000 man-hours to briefly glimpse each page, never mind take in, analyze and act upon the contents in an intelligent way that benefits your business.
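For those who like to check such figures, a quick back-of-the-envelope calculation using the rounded numbers above (19 million results, five seconds per link) runs as follows:

```python
# Back-of-the-envelope check of the man-hours figure, using rounded assumptions.
results_count = 19_000_000   # approximate Google hits for "big data"
seconds_per_link = 5         # assumed time for a browser to open each result

total_hours = results_count * seconds_per_link / 3600
print(f"{total_hours:,.0f} hours")   # ~26,389 hours, i.e. over 26,000 man-hours
```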
This puts into context the scale of the task of applying big data logic to public and private sources alike. It is exactly this challenge that motivates the creation of a large-scale elastic architecture for data-as-a-service (LEADS) that businesses can use to mine and analyze data published on the entire public web.
The objective of LEADS is to build a decentralized DaaS framework that runs on an elastic collection of micro-clouds. LEADS will provide a means to gather, store, and query publicly available data, as well as process this data in real time.
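To make the idea concrete, the sketch below shows what consuming such a data-as-a-service framework might look like from a client's point of view. The endpoint, parameters, and response fields are hypothetical illustrations, not the actual LEADS interface; the point is that the client simply issues queries, while the elastic micro-cloud back end takes care of gathering, storing, and processing the public-web data.

```python
# A minimal sketch of querying a hypothetical data-as-a-service endpoint.
# The URL, parameters, and response fields are illustrative assumptions,
# not the real LEADS API.
import requests

DAAS_ENDPOINT = "https://daas.example.com/query"   # hypothetical micro-cloud gateway


def query_public_web(keywords, freshness="realtime", limit=100):
    """Ask the service for crawled public-web documents matching the keywords."""
    response = requests.get(
        DAAS_ENDPOINT,
        params={"q": " ".join(keywords), "freshness": freshness, "limit": limit},
        timeout=30,
    )
    response.raise_for_status()
    # Assumed response shape: a list of {"url": ..., "snippet": ..., "crawled_at": ...}
    return response.json()


if __name__ == "__main__":
    for doc in query_public_web(["big data", "storage"], limit=10):
        print(doc["url"])
```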