Compression is a mechanism for reducing the space required to store data. In the world of databases, the term carries even more weight. In the information age, data is growing at an exponential rate as we find ever more avenues to capture it.
With this growth, organizations increasingly want to draw meaning and value out of their data. That means data engineers now have to maintain huge volumes of data, and the demand for performance is growing alongside it.
Compression algorithms have evolved to deliver significant reductions in the space required to maintain data, but that is just the beginning.
The trinity of computing resources, namely CPU, memory, and IO, plays a vital role in performance. Of the three, IO is the weakest link in the chain when it comes to data management performance.
By shrinking the data with compression, the amount of IO required decreases and performance improves. The trade-off is a higher total CPU cost, since compression and subsequent decompression both consume CPU cycles.
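A minimal sketch of this trade-off, using Python's standard zlib module; the sample rows and compression level are illustrative assumptions, not tied to any particular database engine:

```python
# Sketch: compression shrinks the bytes that must hit the disk (less IO),
# at the cost of CPU time spent compressing and decompressing.
import time
import zlib

# Repetitive, table-like data compresses well (assumed sample data).
rows = b"\n".join(b"2024-01-01,order,shipped,US" for _ in range(100_000))

start = time.perf_counter()
compressed = zlib.compress(rows, level=6)
compress_time = time.perf_counter() - start

start = time.perf_counter()
restored = zlib.decompress(compressed)
decompress_time = time.perf_counter() - start

assert restored == rows
print(f"original:   {len(rows):>10,} bytes")
print(f"compressed: {len(compressed):>10,} bytes "
      f"({len(compressed) / len(rows):.1%} of original)")
print(f"compress:   {compress_time * 1000:.1f} ms, "
      f"decompress: {decompress_time * 1000:.1f} ms")
```

The fewer bytes written and read translate directly into less IO, while the measured compress and decompress times show where the extra CPU cycles go.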
A word of caution: high-cardinality data is difficult to compress, and frequent updates at a granular level will be slower due to compression overhead.
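A small illustration of the cardinality effect, again with zlib on synthetic data (both datasets are assumptions for demonstration): values that repeat compress well, while random, high-cardinality bytes leave little redundancy for the compressor to exploit.

```python
# Sketch: low-cardinality (repetitive) data vs. high-cardinality (random) data.
import os
import zlib

low_cardinality = b"status=ACTIVE;" * 50_000          # few distinct values
high_cardinality = os.urandom(len(low_cardinality))    # effectively unique values

for name, data in (("low cardinality", low_cardinality),
                   ("high cardinality", high_cardinality)):
    ratio = len(zlib.compress(data)) / len(data)
    print(f"{name:>16}: compressed to {ratio:.1%} of original size")
```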
When performance matters, compression is a factor that needs to be taken into consideration.