FILE COMPRESSION TECHNOLOGY
I came across another interesting and informative topic on the Internet that I would like to share with you. I have compressed this topic for a blog :-).
Many of us have come across WinZip, WinRAR, 7-Zip, and many other compression programs. We may have wondered how they shrink a file of some 500 KB down to 50 KB or even less, and we usually don’t know how they compress or which compression algorithm they use. This topic gives some basics on how file compression really works. I hope you find it interesting.
The first thing we must know is that most types of computer files are fairly redundant. That is, the same information is repeated over and over again. What a file compression program does is get rid of this redundancy.
Consider this example: “ask not what your country can do for you; ask what you can do for your country.”
It consists of 79 characters, including the spaces, the semicolon, and the full stop. Assume that one character takes one unit of memory, so this statement takes 79 memory units. As I mentioned earlier, there is some redundancy here: the same words repeat. If you call the part before the semicolon the ‘first half’ and the part after it the ‘second half’, every word in the second half already appears in the first half, so the second half can simply refer back to the first. This is the basic idea behind file compression.
Most compression programs use a variation of the LZ adaptive dictionary-based algorithm to shrink files. Here LZ stands for Lempel and Ziv, the creators of the algorithm, and dictionary refers to cataloging pieces of data. In the simplified scheme illustrated here, a dictionary is built during compression and saved along with the compressed file; real LZ variants typically let the decompressor rebuild the dictionary from the compressed data itself.
Now let us see how to make a dictionary. The dictionary could be as simple as a numbered list, as given below:
1. ask
2. what
3. your
4. country
5. can
6. do
7. for
8. you
Our sentence now reads: 1 not 2 3 4 5 6 7 8; 1 2 8 5 6 7 3 4.
This sentence now takes 37 memory units, and the dictionary saved along with it takes another 37 memory units (29 for the words and 8 for the numbers). So the file size reduces from 79 memory units to 74. Of course, we have not reduced the file size much here, but this is just one sentence. In a large document these words may repeat many times, and the same idea can then reduce the file size to a much greater extent.
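The word-dictionary idea above can be tried out with a short Python sketch. This is only an illustration of the toy scheme in this post, not how any real compression program works, and the function names (build_dictionary, compress, decompress) are made up for this example. It assumes the text contains only lowercase words and no digits.

```python
import re

SENTENCE = "ask not what your country can do for you; ask what you can do for your country."

def build_dictionary(text):
    """Number every word that occurs more than once, in order of first appearance."""
    words = re.findall(r"[a-z]+", text)
    repeated = [w for w in dict.fromkeys(words) if words.count(w) > 1]
    return {word: str(i) for i, word in enumerate(repeated, start=1)}

def compress(text, dictionary):
    """Replace each dictionary word with its number; other words stay as they are."""
    return re.sub(r"[a-z]+", lambda m: dictionary.get(m.group(), m.group()), text)

def decompress(text, dictionary):
    """Replace each number with its word (assumes the original text had no digits)."""
    reverse = {number: word for word, number in dictionary.items()}
    return re.sub(r"\d+", lambda m: reverse[m.group()], text)

dictionary = build_dictionary(SENTENCE)
packed = compress(SENTENCE, dictionary)
# Cost of storing the dictionary: one unit per letter and per digit.
dictionary_size = sum(len(word) + len(number) for word, number in dictionary.items())

print(packed)
print(len(SENTENCE), len(packed) + dictionary_size)
```

Running it should report 79 units for the original and 74 for the compressed sentence plus its dictionary, the same counts as worked out by hand.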
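In practice, a compression program sees it quite differently. It does not look for repeated words but for repeated patterns of characters. For example, the pattern “you” occurs four times in our sentence, because it is also the start of “your”. If we look at patterns instead of words, we get a different dictionary.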
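To give a feel for pattern-based dictionaries, here is a minimal sketch in the style of LZ78, one member of the Lempel-Ziv family. It is an assumption that this particular variant is a good illustration; WinZip, WinRAR, and 7-Zip use their own, more elaborate variations, and the function names here are made up for this example. The dictionary is built adaptively while reading the text, so the decompressor can rebuild it without it being stored.

```python
def lz78_compress(text):
    """Encode text as (dictionary_index, next_char) pairs; index 0 means 'no prefix'."""
    dictionary = {}   # pattern -> index, built adaptively while reading
    output = []
    current = ""
    for ch in text:
        if current + ch in dictionary:
            current += ch                       # keep extending a known pattern
        else:
            output.append((dictionary.get(current, 0), ch))
            dictionary[current + ch] = len(dictionary) + 1
            current = ""
    if current:                                 # leftover pattern already in the dictionary
        output.append((dictionary[current], ""))
    return output

def lz78_decompress(pairs):
    """Rebuild the text and the dictionary from the (index, char) pairs alone."""
    patterns = [""]   # index 0 is the empty pattern
    pieces = []
    for index, ch in pairs:
        pattern = patterns[index] + ch
        pieces.append(pattern)
        patterns.append(pattern)
    return "".join(pieces)

sentence = "ask not what your country can do for you; ask what you can do for your country."
pairs = lz78_compress(sentence)
print(len(sentence), "characters ->", len(pairs), "pairs")
print(lz78_decompress(pairs) == sentence)
```

Each pair stands for a pattern of one or more characters, so the number of pairs is smaller than the number of characters, and the gap widens as more patterns repeat.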
Have you noticed that text files compress a lot, while MP3 files hardly compress at all? Can you tell why?
The reason is that text has a high rate of redundancy, while MP3 files repeat very few patterns, because the MP3 format has already compressed the audio. So text files compress well, while MP3 files can’t be compressed much further. If you haven’t noticed this yet, try compressing a text file and an MP3 file, and you will see the difference in the file size reduction.
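If you have no MP3 file at hand, the same effect can be seen with a small Python sketch using the standard zlib module. Note the assumption: random bytes from os.urandom are used here as a stand-in for already-compressed data such as an MP3; a real MP3 is not purely random, but it behaves in a similar way. The helper name ratio is made up for this example.

```python
import os
import zlib

# Highly redundant "text file": the same sentence repeated 200 times.
text = ("ask not what your country can do for you; "
        "ask what you can do for your country. " * 200).encode("ascii")

# Stand-in for an MP3: random bytes have no repeated patterns to exploit.
noise = os.urandom(len(text))

def ratio(data):
    """Compressed size divided by original size (smaller means better compression)."""
    return len(zlib.compress(data, 9)) / len(data)

print("text :", round(ratio(text), 3))
print("noise:", round(ratio(noise), 3))
```

The text should shrink to a small fraction of its size, while the random data stays about the same size or even grows slightly.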
If a file has a lot of repeated patterns, the rate of reduction typically increases with file size. Above we considered only one sentence, which is a very small file, so we got only a little compression, from 79 units to 74 units. As the file size increases, we will observe still more reduction. Also, a more effective dictionary results in more effective compression.
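A quick way to check this is to repeat our sentence more and more times and compress each version. The sketch below uses Python's standard zlib module as a convenient LZ-based compressor; the repeat counts are arbitrary values picked for illustration.

```python
import zlib

sentence = b"ask not what your country can do for you; ask what you can do for your country. "

ratios = []
for copies in (1, 10, 100, 1000):
    data = sentence * copies                # a bigger file full of the same patterns
    compressed = zlib.compress(data, 9)
    ratios.append(len(compressed) / len(data))
    print(copies, "copies:", len(data), "->", len(compressed), "bytes")
```

The compressed size grows far more slowly than the original size, so the ratio keeps dropping as the file gets bigger.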
Please leave me your comments.
dis s a ‘beautiful’ n informative article….. gud mohan…. keep on update ur blog with these kinds of useful stuffs…
goanna compress the comments…..yet another artistic analysis