Friday, June 1, 2012

Economics of Compression


Compression is a mechanism to reduce the space required to store a given piece of data. In the world of databases, this word assumes a greater meaning. In the information age, data is growing at an exponential rate as we gain more and more avenues to capture it.

With this growth in data, organizations increasingly want to draw meaning and value out of it. That means data engineers now have to maintain huge volumes of data, and not only that, the demand for performance keeps growing as well.

Algorithms for compression have evolved, delivering real reductions in the space required to maintain data. But that's just the beginning.

The trinity of computing resources, namely CPU, memory and I/O, plays a vital role in performance. Of these three, I/O is the weakest link in the chain when it comes to data management performance.

So by reducing the size of data through compression, the amount of I/O needed goes down and performance goes up. But total CPU cost goes up in turn, because compression and subsequent decompression consume CPU cycles.
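
To make the trade-off concrete, here is a minimal sketch using Python's zlib. The repetitive, row-like payload and the compression level are assumptions made up for the example, not tied to any particular database engine; the point is simply that you trade bytes on disk (less I/O) for CPU time spent compressing and decompressing.

import time
import zlib

# Fabricate a repetitive, row-like payload (this kind of data compresses well).
row = b"order_id=12345,status=SHIPPED,region=US-EAST,amount=99.99\n"
payload = row * 100_000

start = time.perf_counter()
compressed = zlib.compress(payload, level=6)
compress_secs = time.perf_counter() - start

start = time.perf_counter()
zlib.decompress(compressed)
decompress_secs = time.perf_counter() - start

print(f"original size:     {len(payload):>10,} bytes")
print(f"compressed size:   {len(compressed):>10,} bytes")
print(f"compression ratio: {len(payload) / len(compressed):.1f}x")
print(f"CPU time: compress {compress_secs:.3f}s, decompress {decompress_secs:.3f}s")

On a payload like this, the bytes that need to travel to and from disk shrink dramatically, while the CPU bill shows up as the compress and decompress timings. Whether that trade is a win depends on how much slower your I/O is than your CPU, which is exactly the economics this post is about.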

A word of caution: high-cardinality data is difficult to compress, and frequent updates at a granular level will be slower due to compression overhead.
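
The cardinality caveat is easy to see with a rough sketch, again using zlib. The two column contents below are invented for illustration: a low-cardinality column with a handful of status codes versus a high-cardinality column of effectively unique identifiers.

import random
import zlib

random.seed(42)
n = 100_000

# Low-cardinality column: only a few distinct status codes, lots of repetition.
low_card = "\n".join(random.choice(["NEW", "OPEN", "CLOSED"]) for _ in range(n)).encode()

# High-cardinality column: effectively unique values, little repetition to exploit.
high_card = "\n".join(f"{random.getrandbits(64):016x}" for _ in range(n)).encode()

for name, col in [("low cardinality", low_card), ("high cardinality", high_card)]:
    ratio = len(col) / len(zlib.compress(col))
    print(f"{name}: {len(col):,} bytes -> compression ratio {ratio:.1f}x")

The low-cardinality column compresses many times over, while the near-random identifiers barely shrink at all, yet both pay the same CPU cost to be compressed and decompressed.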

When in need of performance, one needs to take the compression factor into consideration.