What makes big data so hard to understand is that we weren't, and still
aren't, really prepared for it or for managing it. Ninety percent of
the world's data was generated in the last two years, which is why we
were caught off guard. There's so much new data produced every day from
social media sites, industrial sensors, satellites, cell phones, photographs,
documents, and much more. Every day our data grows by more than 2.5 quintillion
bytes (2,500,000,000,000,000,000), or about 2.3 billion gigabytes
(2,328,306,436.5 when a gigabyte is counted as 1,073,741,824 bytes). That data
has to be stored somewhere, even temporarily, and sent
through databases and applications for analysis. There's so much new data
piling up that its storage, management, and analysis are overwhelming. This is
why very few people really understand big data.
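
The gigabyte figure quoted above depends on which definition of a gigabyte you use, so here is a small sketch of the arithmetic. The constant and function names are made up for illustration. The only input is the 2.5 quintillion bytes per day figure from the text.

```python
# Daily data growth quoted in the text: 2.5 quintillion bytes.
BYTES_PER_DAY = 2_500_000_000_000_000_000

DECIMAL_GB = 10**9  # 1 gigabyte as 1,000,000,000 bytes
BINARY_GB = 2**30   # 1 gigabyte as 1,073,741,824 bytes (often written GiB)


def to_gigabytes(num_bytes, bytes_per_gb):
    """Convert a byte count to gigabytes under the given definition."""
    return num_bytes / bytes_per_gb


decimal_gb = to_gigabytes(BYTES_PER_DAY, DECIMAL_GB)
binary_gb = to_gigabytes(BYTES_PER_DAY, BINARY_GB)

print(f"{decimal_gb:,.1f} gigabytes per day (decimal definition)")
print(f"{binary_gb:,.1f} gigabytes per day (binary definition)")
```

With the decimal definition the result is exactly 2.5 billion gigabytes. The 2,328,306,436.5 figure in the text comes out of the binary definition.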
This huge amount of data is why you're hearing so much about big
data and why it is so difficult to understand. Data has always been big relative
to our capacity to store, retrieve, analyze, organize, archive, and purge it, but
now the situation is almost out of our collective control.
As a side note, you might have heard a lot about metadata lately
concerning the private information that the NSA has captured and analyzed.
Metadata is data about data. It's a strange concept but, simply stated,
metadata is a description of your data. You use metadata all the time but
might not realize it. For example, when you snap a digital picture, the
metadata for that picture includes the file size, date, location, pixel
dimensions, and so on.
To check out the metadata for a photo on Windows, right-click
the photo file, select Properties, and then select the Details tab. You
can see that metadata also takes up space but is not the data itself. It is
data about data, so we could discuss big metadata as well as big data.
There's more to data than just the data itself. To clarify, metadata doesn't
make big data big; it makes big data bigger. Now that you have an understanding
of data and metadata, you can explore what big data is.
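
If you would rather look at metadata from a script than from a Properties dialog, here is a minimal sketch using only Python's standard library. It reads the basic file-system metadata (name, size, last-modified time) and nothing else. Details a camera embeds in the photo itself, such as location, would need a separate image library and are not shown. The `describe_file` helper and the throwaway demo file are stand-ins for illustration, not a real photo.

```python
import os
import tempfile
from datetime import datetime, timezone


def describe_file(path):
    """Return a small dict of file-system metadata for the file at `path`."""
    info = os.stat(path)
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        "modified_utc": modified.isoformat(),
    }


# Create a throwaway 2,048-byte file that stands in for a photo.
with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as handle:
    handle.write(b"\x00" * 2048)
    demo_path = handle.name

metadata = describe_file(demo_path)
os.remove(demo_path)  # clean up the stand-in file

print(metadata)
```

Notice that the dictionary describes the file without containing any of the file's contents. That is the distinction between data and metadata.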
Big data is a lot of data. It's more data than we've ever dealt
with before and from more disparate sources. Plus the metadata. It's a lot to
think about. It's a lot to store. It's a lot to analyze. And those are the
major issues of big data. When data becomes so big that its sheer size is the
problem, it is big data. We have data generated from disparate data sources:
cell phones, satellites, electronic sensors, text messages, log files, etc.
Data from so many sources is very complex.
This is big data. You have to collect, store, analyze, organize,
purge, and use the data. It's that process from collection to use to purge that
is the great unknown of big data. Big data is complex and difficult to manage. The
management part of big data is where the lack of understanding comes from.
There are very few people who know how to manage that volume and complexity of
data. Most companies have grown their own pieced-together solutions. Each
department usually tries to manage its own data in various forms. The result
is that these companies not only have huge amounts of disparate data, but that
data is also stored in disparate locations and in disparate technologies. Big data.
Big mess.
An Example of BIG DATA in Action:
Every day in the U.S. some 7 billion shares of stock or other
securities are traded on various financial exchanges, and fully two-thirds of
these trades are executed by computers using algorithms to trade with other
computers, without human participation. Roughly 33,000 discrete trades take place every second on
the New York Stock Exchange, and of course they must (and do) take place in a
particular order, one trade at a time, each trade separated from the next by
only about 30 microseconds on average. Financial transactions generate an enormous quantity of data to
be stored, processed, and re-processed. But all human activity is now
generating such data at an accelerating rate.
Think of the millions of selfies, comments, posts, and reshares on Facebook alone. I am already feeling dizzy.
So, are you big enough for BIG DATA?