Big data's hidden problem

Why 'big data' is smaller than 'copy data', and what it means for your business.

By Budd Illic · 7 Jul 2014

Effectively managing “big data” is a growing challenge for many organisations. But are we missing the real issue? Technology historian George Dyson put it bluntly: "Big data is what happened when the cost of keeping information became less than the cost of throwing it away." But it seems when it comes to big data, Australian organisations are paying more and keeping less.

Recent research shows that big data accounts for only three per cent of the total data storage footprint. However, Forrester’s latest Digital Realty survey found nearly 60 per cent of decision makers in Australia and Singapore plan to increase their datacentre budgets by between 5 and 10 per cent over the next 12 months. So if only three per cent of data stored is ‘big’, what makes up the rest of it? And why are we preparing to spend more to store it?

It turns out that the real problem is data proliferation.

It’s something with which we’re all familiar. You take a photo, save it to your computer, edit it, post it on Facebook, tweet it, email it to a friend and back it up when you upgrade your computer. So you’ve made several copies of the same photo, saved in different places. At work, when you email a PowerPoint attachment to ten colleagues, the email system saves a copy, and your colleagues may save it to their computers too. A single email shouldn’t gobble up lots of storage space, but the copying of large datasets quickly adds up to petabytes inside the modern enterprise. IDC estimates that 60 per cent of what is stored in data centres is actually copy data – multiple copies of the same thing or outdated versions. The vast majority of stored data consists of extra copies of production data created by disparate data protection and management tools for backup, disaster recovery, development and testing, and analytics.

According to IDC, global businesses will spend $46 billion to store extra copies of their data this year. This ‘copy data’ glut in data centres costs businesses money, as they store and protect useless copies of an original.

While many IT providers are focused on how to deal with the mountains of data that are produced by this intentional and unintentional copying, far fewer are addressing the root cause of copy data. In the same way that prevention is better than a cure, reducing this weed-like data proliferation should be a priority for businesses.

Enterprise IT leaders tend to have similar key priorities – improving resiliency, increasing agility, and moving toward the Cloud to make their systems more distributed and scalable. Often they are held back by old software and hardware. Copy data virtualisation – freeing organisations’ data from their legacy physical infrastructure just as virtualisation did for servers a decade ago – is likely to be the way forward. If business divisions work on a single physical ‘golden’ copy which can spawn innumerable virtual copies, then those copies won’t take up additional server space.
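
For readers who want to see the ‘golden’ copy idea in concrete terms, here is a minimal Python sketch. It assumes a copy-on-write scheme, which is one common way to build virtual copies; it is not a description of how Actifio or any other product works. The class names (GoldenCopy, VirtualCopy), the block counts and the block contents are all hypothetical, chosen only for illustration.

```python
class GoldenCopy:
    """The single physical copy, modelled as a list of data blocks (hypothetical)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)


class VirtualCopy:
    """A virtual copy that stores only the blocks it has changed (copy-on-write)."""

    def __init__(self, golden):
        self.golden = golden
        self.changed = {}  # block index -> new contents for this copy only

    def read(self, index):
        # Serve the changed block if there is one, otherwise the golden block.
        return self.changed.get(index, self.golden.blocks[index])

    def write(self, index, data):
        # The golden copy is never modified; only this copy's delta grows.
        self.changed[index] = data

    def extra_blocks(self):
        return len(self.changed)


# Assumed numbers: 1,000 blocks of production data and ten teams needing a copy.
golden = GoldenCopy(f"block-{i}".encode() for i in range(1000))
copies = [VirtualCopy(golden) for _ in range(10)]

# One team (say, testing) changes a single block in its own copy.
copies[0].write(3, b"test-data")

# Blocks stored if every team kept a full physical copy alongside the original.
physical_blocks_full = len(golden.blocks) * (1 + len(copies))

# Blocks stored with one golden copy plus each virtual copy's changes.
physical_blocks_virtual = len(golden.blocks) + sum(c.extra_blocks() for c in copies)

print(physical_blocks_full, physical_blocks_virtual)
```

In this toy model the ten virtual copies add just one stored block between them, because only changed blocks consume space, while every other team still reads the original data unchanged.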

So despite all the big noise about big data, it’s not going to pose a threat to you just yet – it’s copy data you want to watch out for. The sooner companies reduce the creation of physical copies, the less they will have to spend on storage.

Budd Illic serves as Regional Manager, ANZ, Actifio. Actifio delivers copy data virtualisation, letting customers capture application data, manage it more economically, and use it as they wish.