LZW Algorithm

What is LZW Algorithm?

    LZW Algorithm is a universal Lossless 
    data compression algorithm created by Abraham Lempel , Jacob Ziv and Terry Welch.

    This Algorithm is simple to implement and has the potential for very high Throughput in hardware 
    implementations.

    LZW can be made really fast; it grabs a fixed number of bits from input stream, so bit parsing is
    very easy. Table lookup is automatic. 

    [LZW] is a data compression method that takes 
    advantage of this repetition. The original version of
    the method was created by Lempel and Ziv in 1978 (LZ78) and was further refined by Welch in 1984,
    hence the LZW acronym. Like any adaptive/dynamic compression method, the idea is to 
        (1) start with an initial model,
        (2) read data piece by piece,
        (3) and update the model and encode the data as you go along. 
    LZW is a "dictionary"-based compression algorithm. This means that instead of tabulating character counts 
    and building trees (as for Huffman encoding), LZW encodes data by referencing a dictionary. Thus, to encode
    a substring, only a single code number, corresponding to that substring's index in the dictionary, needs to
    be written to the output file. Although LZW is often explained in the context of compressing text files, it 
    can be used on any type of file. However, it generally performs best on files with repeated substrings, such as 
    text files.

##Here I am trying to demonstrate example of compression and decompression of string 'banana_bandana' using LZW algorithm Now trying to create dictionary with index of above individual strings.

Index	Entry
0	a
1	b
2	d
3	n
4	_

Compression

Pseudo code

string s;
char ch;
...

s = empty string;
while (there is still data to be read)
{                                    
    ch = read a character;
    if (dictionary contains s+ch)     
    {
        s = s+ch;                     
    }
    else
    {
        encode s to output file;
        add s+ch to dictionary;
        s = ch;
    }
}
encode s to output file;

code

    # Building and initializing the dictionary.
    dictionary_size = 256                   
    dictionary = {chr(i): i for i in range(dictionary_size)}    
    string = ""             # String is null.
    compressed_data = []    # variable to store the compressed data.

    # iterating through the input symbols.
    # LZW Compression algorithm
    for symbol in data:                     
        string_plus_symbol = string + symbol # get input symbol.
        if string_plus_symbol in dictionary: 
            string = string_plus_symbol
        else:
            compressed_data.append(dictionary[string])
            if(len(dictionary) <= maximum_table_size):
                dictionary[string_plus_symbol] = dictionary_size
                dictionary_size += 1
            string = symbol

    if string in dictionary:
        compressed_data.append(dictionary[string])

Example

Input	Current String	Encoded output	New Dictionary	Index
b	b	Nothing	none	none
ba	ba	1	ba	5
ban	an	1,0	an	6
bana	na	1,0,3	na	7
banan	an	-	-	-
banana	ana	1,0,3,6	ana	8
banana_	a_	1,0,3,6,0	a_	9
banana_b	_b	1,0,3,6,0,4	_b	10
banana_ba	ba	-	-	-
banana_ban	ban	1,0,3,6,0,4,5	ban	11
banana_band	nd	1,0,3,6,0,4,5,3	nd	12
banana_banda	da	1,0,3,6,0,4,5,3,2	da	13
banana_bandan	an	-	-	-
banana_bandana	ana	1,0,3,6,0,4,5,3,2,8	-	-

Decompression

Pseudocode

string entry;
char ch;
int prevcode, currcode;
...

prevcode = read in a code;
decode/output prevcode;
while (there is still data to read)
{
    currcode = read in a code;
    entry = translation of currcode from dictionary;
    output entry;
    ch = first char of entry;
    add ((translation of prevcode)+ch) to dictionary;
    prevcode = currcode;
}

##code

    maximum_table_size = pow(2,int(n))
    file = open(input_file, "rb")
    compressed_data = []
    next_code = 256
    decompressed_data = ""
    string = ""

    # Reading the compressed file.
    while True:
        rec = file.read(2)
        if len(rec) != 2:
            break
        (data, ) = unpack('>H', rec)
        compressed_data.append(data)

    # Building and initializing the dictionary.
    dictionary_size = 256
    dictionary = dict([(x, chr(x)) for x in range(dictionary_size)])

    # iterating through the codes.
    # LZW Decompression algorithm
    for code in compressed_data:
        if not (code in dictionary):
            dictionary[code] = string + (string[0])
        decompressed_data += dictionary[code]
        if not(len(string) == 0):
            dictionary[next_code] = string + (dictionary[code][0])
            next_code += 1
        string = dictionary[code]

Example

Encoded input	Dictionary Translation	Decoded Input	Current String	New Dictionary	Index
1	1=b	b	-	none	none
1,0	0=a	ba	b	ba	5
1,0,3	3=n	ban	a	an	6
1,0,3,6	6=an	banan	n	na	7
1,0,3,6,0	0=a	banana	an	ana	8
1,0,3,6,0,4	4=_	banana_	a	a_	8
1,0,3,6,0,4	5=ba	banana_ba	_	_b	10
1,0,3,6,0,4,5,3	3=n	banana_ban	ba	ban	11
1,0,3,6,0,4,5,3	2=d	banana_band	n	nd	12
1,0,3,6,0,4,5,3,2,8	8=ana	banana_bandana	d	da	13

Ref 1 Ref 2 Source Code

Py Data

Lossless compression algorithm

LZW Algorithm

Compression

Pseudo code

code

Example

Decompression

Pseudocode

Example

Post a Comment

Digital Signal Processing

In-Depth Python Topics: Going Beyond the Basics

Git & GitHub

Data Masking

Py Data