The term has been in use since the 1990s, with some giving credit to John Mashey for popularizing it. Big data usually includes datasets with sizes beyond the capacity of traditional storage and processing tools. Big data "size" is a constantly moving target, as of 2012 ranging from a few dozen terabytes to many zettabytes of data.
Big data is a problem, not a technology. It raises immediate questions: where do we store it? And even once we find devices large enough to hold it, the problem of velocity arises: how do we perform input/output operations on data that is being generated at such a tremendous rate?
Here are some facts about big data:
- Facebook: Every day, we feed Facebook’s data beast with mounds of information. Every minute, 136,000 photos are uploaded, 510,000 comments are posted, and 293,000 status updates are made. Facebook pulls in 2.7 billion Like actions and 300 million photos per day, and it scans roughly 105 terabytes of data each half hour. Facebook has to store and maintain all of this data.
- Google: Over 2.5 quintillion bytes of data are created every single day, and that figure is only going to grow. By 2020, it is estimated that 1.7 MB of data will be created every second for every person on Earth. This indicates how much data we create, and the rate keeps increasing day by day.
- An interesting fact is that we are only able to process about 10% of the total available data. For the remaining 90%, we don’t yet know how to use it or how to extract information from it.
- A distributed storage system is a way to store this large amount of data. All the tech giants follow this technique. A distributed storage system gives us effectively unlimited storage, since capacity can be grown simply by adding more commodity machines.
- A distributed storage system resolves our problem of volume of data as well as velocity of data. One of the most widely used pieces of software for building a distributed storage system is Hadoop. It works on a master-slave topology: a master node (the NameNode) keeps metadata about which blocks make up each file and where they live, while slave nodes (DataNodes) store the actual data blocks.
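To make the master-slave idea concrete, here is a minimal toy sketch in Python of how such a system could split a file into blocks and replicate them across slaves. This is not Hadoop's actual implementation; all class and parameter names (`Master`, `BLOCK_SIZE`, `REPLICATION`) are hypothetical, and the tiny block size is just for demonstration (real HDFS defaults to 128 MB blocks and 3 replicas).

```python
# Toy sketch of master-slave distributed storage (hypothetical names;
# real Hadoop's NameNode/DataNode architecture has far more machinery).
import itertools

BLOCK_SIZE = 4    # bytes per block (HDFS default: 128 MB)
REPLICATION = 2   # copies of each block (HDFS default: 3)

class Master:
    """Plays the NameNode role: keeps only metadata, never file bytes."""
    def __init__(self, slaves):
        self.slaves = slaves          # list of dicts acting as DataNodes
        self.metadata = {}            # filename -> [(block_id, [slave idx])]
        self._ids = itertools.count()

    def put(self, filename, data):
        """Split data into fixed-size blocks, replicate each across slaves."""
        placements = []
        for start in range(0, len(data), BLOCK_SIZE):
            block = data[start:start + BLOCK_SIZE]
            block_id = next(self._ids)
            # round-robin placement on REPLICATION distinct slaves
            targets = [(block_id + r) % len(self.slaves)
                       for r in range(REPLICATION)]
            for t in targets:
                self.slaves[t][block_id] = block
            placements.append((block_id, targets))
        self.metadata[filename] = placements

    def get(self, filename):
        """Reassemble a file by reading each block from any live replica."""
        out = b""
        for block_id, targets in self.metadata[filename]:
            for t in targets:         # first replica that still has it wins
                if block_id in self.slaves[t]:
                    out += self.slaves[t][block_id]
                    break
        return out

slaves = [{}, {}, {}]                 # three DataNode-like stores
master = Master(slaves)
master.put("log.txt", b"hello big data")
restored = master.get("log.txt")      # -> b"hello big data"
```

Because every block lives on more than one slave, the file can still be read even if one slave's data is lost, which is the same property that lets Hadoop survive commodity-hardware failures.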
@big data @hadoop