Where are You Going to Store All That Big Data?

Big Data has many applications we haven’t even dreamed of yet.

Aug 05, 2013 · 3 min read

Dig In contributor

There's no end to the chatter out there about the power and awesomeness of big data. (Okay, I'm guilty of that as well.) And we're only getting started: big data has many applications we haven't even dreamed of yet.

But nobody is really talking about where insurance companies are going to put all that data.

The problem is, once data is converted into something meaningful (customer records, insights, communications, analyses), there's more of an onus to keep it around. In fact, there may even be legal requirements (or threats of legal actions) that necessitate holding data for seven years to life.

So, when an organization is dealing with 500 terabytes of data from various sources and for various purposes, guess what? Even allowing for compression (those figures imply a ratio of roughly 5:1), that's at least 100 terabytes of disk space. The data needs to be stored on disks or tapes, and still be accessible. Then there's metadata, or data about the data, on top of that.
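To make that arithmetic concrete, here is a minimal back-of-the-envelope sketch. The 5:1 compression ratio is simply what the 500-to-100 terabyte figures imply; the function name `estimate_disk_tb`, the metadata overhead, and the number of copies are illustrative assumptions, not numbers from any vendor or from Kleyman's article.

```python
def estimate_disk_tb(raw_tb, compression_ratio=5.0, metadata_overhead=0.0, copies=1):
    """Rough disk footprint in terabytes for a given volume of raw data.

    compression_ratio: assumed raw-to-stored ratio (5.0 means 5:1).
    metadata_overhead: assumed extra fraction for data about the data.
    copies: assumed number of stored copies (primary plus replicas).
    """
    if compression_ratio <= 0 or copies < 1 or metadata_overhead < 0:
        raise ValueError("ratio must be positive, copies >= 1, overhead >= 0")
    compressed_tb = raw_tb / compression_ratio
    return compressed_tb * (1 + metadata_overhead) * copies


# The article's scenario: 500 TB of raw data at an implied 5:1 ratio.
print(estimate_disk_tb(500))  # 100.0

# Hypothetical variation: 5% metadata overhead and one off-site replica.
print(round(estimate_disk_tb(500, metadata_overhead=0.05, copies=2), 1))
```

Plugging in your own retention period and growth rate is the obvious next step, but even this simple version shows how quickly replicas and metadata push the total past the compressed size.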

So the numbers for disk storage systems, which could also include off-site or cloud-type storage, begin to add up. What is the smart way to handle all this overhead?

Bill Kleyman, writing in Data Center Knowledge, provides an excellent discussion of what needs to be considered when storing all that big data — especially for a distributed, “cloud-ready” environment:

Consider bandwidth. For efficiency, it’s important to calculate bandwidth, Kleyman says, and this requires understanding a number of factors, such as “distance the data has to travel (number of hops), failover requirements, amount of data being transmitted, and the number of users accessing the data concurrently.”
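As a rough illustration of that calculation, the sketch below estimates how long it takes to move a given volume of data over a shared link. It only covers two of the factors Kleyman lists (amount of data and concurrent users); hop count and failover are not modeled, and the `efficiency` factor is a stand-in assumption for protocol and network overhead. The function name and default values are hypothetical.

```python
def transfer_hours(data_tb, link_gbps, concurrent_users=1, efficiency=0.8):
    """Approximate hours for one user to move data_tb over a shared link.

    data_tb: data volume in decimal terabytes (10**12 bytes).
    link_gbps: nominal link speed in gigabits per second.
    concurrent_users: users assumed to share the link equally.
    efficiency: assumed usable fraction of the nominal link speed.
    """
    if link_gbps <= 0 or concurrent_users < 1 or not 0 < efficiency <= 1:
        raise ValueError("invalid link speed, user count, or efficiency")
    data_bits = data_tb * 1e12 * 8
    per_user_bps = link_gbps * 1e9 * efficiency / concurrent_users
    return data_bits / per_user_bps / 3600


# 100 TB over a 10 Gbps link at 80% efficiency, one user.
print(f"{transfer_hours(100, 10):.1f} hours")  # 27.8 hours

# Same transfer when four users share the link equally.
print(f"{transfer_hours(100, 10, concurrent_users=4):.1f} hours")
```

Even under these generous assumptions, replicating the compressed data set takes more than a day, which is why the bandwidth math needs to happen before the storage purchase.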

Develop a replication policy. “In some cases, certain types of databases or applications being replicated between storage systems have their own resource needs. Make sure to identify where the information is going and create a solid replication policy.”

Pick the right storage platform. Factors to consider include whether the system can support planned or future utilization, how easily data can be migrated, and what data control mechanisms it offers.

Control the data flow. “Basically, there needs to be consistent visibility in how storage traffic is flowing and how efficiently it’s reaching the destination.”

Use intelligent storage (thin provisioning/deduplication). An intelligent data deduplication strategy will free up immense amounts of space on disks, Kleyman advises. He also recommends that you “look for controllers which are virtualization-ready. This means that environments deploying technologies like VDI, application virtualization or even simple server virtualization should look for systems which intelligently provision space – without creating unnecessary duplicates.”
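To show the idea behind deduplication, here is a minimal sketch that hashes fixed-size blocks and counts each distinct block only once. This is one common textbook approach, not a description of how any particular storage controller works; the function name `dedup_stats` and the 4 KB block size are assumptions chosen for illustration.

```python
import hashlib


def dedup_stats(blobs, block_size=4096):
    """Return (logical_bytes, unique_bytes) under fixed-size block dedup.

    blobs: iterable of bytes objects standing in for files or volumes.
    block_size: assumed block size in bytes.
    """
    seen = set()   # digests of blocks already stored
    logical = 0    # bytes written before deduplication
    unique = 0     # bytes actually kept after deduplication
    for blob in blobs:
        for start in range(0, len(blob), block_size):
            block = blob[start:start + block_size]
            logical += len(block)
            digest = hashlib.sha256(block).digest()
            if digest not in seen:
                seen.add(digest)
                unique += len(block)
    return logical, unique


# Three hypothetical "files" with heavily repeated content.
files = [b"a" * 8192, b"a" * 4096, b"b" * 4096]
logical, unique = dedup_stats(files)
print(logical, unique)  # 16384 8192
```

In this toy case half the space is reclaimed. Real savings depend entirely on how repetitive the data is, which is why virtualized environments with many near-identical images tend to benefit most.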

Joe McKendrick is an author, consultant and blogger specializing in information technology. This blog originated on Insurance Networking News, a SourceMedia publication.