Lossless compression algorithm
LZW Algorithm
What is LZW Algorithm?
LZW Algorithm is a universal Lossless
data compression algorithm created by Abraham Lempel , Jacob Ziv and Terry Welch.
This Algorithm is simple to implement and has the potential for very high Throughput in hardware
implementations.
LZW can be made really fast; it grabs a fixed number of bits from input stream, so bit parsing is
very easy. Table lookup is automatic.
[LZW] is a data compression method that takes
advantage of this repetition. The original version of
the method was created by Lempel and Ziv in 1978 (LZ78) and was further refined by Welch in 1984,
hence the LZW acronym. Like any adaptive/dynamic compression method, the idea is to
(1) start with an initial model,
(2) read data piece by piece,
(3) and update the model and encode the data as you go along.
LZW is a "dictionary"-based compression algorithm. This means that instead of tabulating character counts
and building trees (as for Huffman encoding), LZW encodes data by referencing a dictionary. Thus, to encode
a substring, only a single code number, corresponding to that substring's index in the dictionary, needs to
be written to the output file. Although LZW is often explained in the context of compressing text files, it
can be used on any type of file. However, it generally performs best on files with repeated substrings, such as
text files.
##Here I am trying to demonstrate example of compression and decompression of string 'banana_bandana' using LZW algorithm Now trying to create dictionary with index of above individual strings.
Index | Entry |
---|---|
0 | a |
1 | b |
2 | d |
3 | n |
4 | _ |
Compression
Pseudo code
string s;
char ch;
...
s = empty string;
while (there is still data to be read)
{
ch = read a character;
if (dictionary contains s+ch)
{
s = s+ch;
}
else
{
encode s to output file;
add s+ch to dictionary;
s = ch;
}
}
encode s to output file;
code
# Building and initializing the dictionary.
dictionary_size = 256
dictionary = {chr(i): i for i in range(dictionary_size)}
string = "" # String is null.
compressed_data = [] # variable to store the compressed data.
# iterating through the input symbols.
# LZW Compression algorithm
for symbol in data:
string_plus_symbol = string + symbol # get input symbol.
if string_plus_symbol in dictionary:
string = string_plus_symbol
else:
compressed_data.append(dictionary[string])
if(len(dictionary) <= maximum_table_size):
dictionary[string_plus_symbol] = dictionary_size
dictionary_size += 1
string = symbol
if string in dictionary:
compressed_data.append(dictionary[string])
Example
Input | Current String | Encoded output | New Dictionary | Index |
---|---|---|---|---|
b | b | Nothing | none | none |
ba | ba | 1 | ba | 5 |
ban | an | 1,0 | an | 6 |
bana | na | 1,0,3 | na | 7 |
banan | an | - | - | - |
banana | ana | 1,0,3,6 | ana | 8 |
banana_ | a_ | 1,0,3,6,0 | a_ | 9 |
banana_b | _b | 1,0,3,6,0,4 | _b | 10 |
banana_ba | ba | - | - | - |
banana_ban | ban | 1,0,3,6,0,4,5 | ban | 11 |
banana_band | nd | 1,0,3,6,0,4,5,3 | nd | 12 |
banana_banda | da | 1,0,3,6,0,4,5,3,2 | da | 13 |
banana_bandan | an | - | - | - |
banana_bandana | ana | 1,0,3,6,0,4,5,3,2,8 | - | - |
Decompression
Pseudocode
string entry;
char ch;
int prevcode, currcode;
...
prevcode = read in a code;
decode/output prevcode;
while (there is still data to read)
{
currcode = read in a code;
entry = translation of currcode from dictionary;
output entry;
ch = first char of entry;
add ((translation of prevcode)+ch) to dictionary;
prevcode = currcode;
}
##code
maximum_table_size = pow(2,int(n))
file = open(input_file, "rb")
compressed_data = []
next_code = 256
decompressed_data = ""
string = ""
# Reading the compressed file.
while True:
rec = file.read(2)
if len(rec) != 2:
break
(data, ) = unpack('>H', rec)
compressed_data.append(data)
# Building and initializing the dictionary.
dictionary_size = 256
dictionary = dict([(x, chr(x)) for x in range(dictionary_size)])
# iterating through the codes.
# LZW Decompression algorithm
for code in compressed_data:
if not (code in dictionary):
dictionary[code] = string + (string[0])
decompressed_data += dictionary[code]
if not(len(string) == 0):
dictionary[next_code] = string + (dictionary[code][0])
next_code += 1
string = dictionary[code]
Example
Encoded input | Dictionary Translation | Decoded Input | Current String | New Dictionary | Index |
---|---|---|---|---|---|
1 | 1=b | b | - | none | none |
1,0 | 0=a | ba | b | ba | 5 |
1,0,3 | 3=n | ban | a | an | 6 |
1,0,3,6 | 6=an | banan | n | na | 7 |
1,0,3,6,0 | 0=a | banana | an | ana | 8 |
1,0,3,6,0,4 | 4=_ | banana_ | a | a_ | 8 |
1,0,3,6,0,4 | 5=ba | banana_ba | _ | _b | 10 |
1,0,3,6,0,4,5,3 | 3=n | banana_ban | ba | ban | 11 |
1,0,3,6,0,4,5,3 | 2=d | banana_band | n | nd | 12 |
1,0,3,6,0,4,5,3,2,8 | 8=ana | banana_bandana | d | da | 13 |
Join the conversation