Lossless compression algorithm

LZW Algorithm

What is LZW Algorithm?

    LZW Algorithm is a universal Lossless 
    data compression algorithm created by Abraham Lempel , Jacob Ziv and Terry Welch.

    This Algorithm is simple to implement and has the potential for very high Throughput in hardware 
    implementations.

    LZW can be made really fast; it grabs a fixed number of bits from input stream, so bit parsing is
    very easy. Table lookup is automatic. 

    [LZW] is a data compression method that takes 
    advantage of this repetition. The original version of
    the method was created by Lempel and Ziv in 1978 (LZ78) and was further refined by Welch in 1984,
    hence the LZW acronym. Like any adaptive/dynamic compression method, the idea is to 
        (1) start with an initial model,
        (2) read data piece by piece,
        (3) and update the model and encode the data as you go along. 
    LZW is a "dictionary"-based compression algorithm. This means that instead of tabulating character counts 
    and building trees (as for Huffman encoding), LZW encodes data by referencing a dictionary. Thus, to encode
    a substring, only a single code number, corresponding to that substring's index in the dictionary, needs to
    be written to the output file. Although LZW is often explained in the context of compressing text files, it 
    can be used on any type of file. However, it generally performs best on files with repeated substrings, such as 
    text files.

##Here I am trying to demonstrate example of compression and decompression of string 'banana_bandana' using LZW algorithm Now trying to create dictionary with index of above individual strings.

Index Entry
0 a
1 b
2 d
3 n
4 _

Compression

Pseudo code

string s;
char ch;
...

s = empty string;
while (there is still data to be read)
{                                    
    ch = read a character;
    if (dictionary contains s+ch)     
    {
        s = s+ch;                     
    }
    else
    {
        encode s to output file;
        add s+ch to dictionary;
        s = ch;
    }
}
encode s to output file;

code

    # Building and initializing the dictionary.
    dictionary_size = 256                   
    dictionary = {chr(i): i for i in range(dictionary_size)}    
    string = ""             # String is null.
    compressed_data = []    # variable to store the compressed data.

    # iterating through the input symbols.
    # LZW Compression algorithm
    for symbol in data:                     
        string_plus_symbol = string + symbol # get input symbol.
        if string_plus_symbol in dictionary: 
            string = string_plus_symbol
        else:
            compressed_data.append(dictionary[string])
            if(len(dictionary) <= maximum_table_size):
                dictionary[string_plus_symbol] = dictionary_size
                dictionary_size += 1
            string = symbol

    if string in dictionary:
        compressed_data.append(dictionary[string])

Example

Input Current String Encoded output New Dictionary Index
b b Nothing none none
ba ba 1 ba 5
ban an 1,0 an 6
bana na 1,0,3 na 7
banan an - - -
banana ana 1,0,3,6 ana 8
banana_ a_ 1,0,3,6,0 a_ 9
banana_b _b 1,0,3,6,0,4 _b 10
banana_ba ba - - -
banana_ban ban 1,0,3,6,0,4,5 ban 11
banana_band nd 1,0,3,6,0,4,5,3 nd 12
banana_banda da 1,0,3,6,0,4,5,3,2 da 13
banana_bandan an - - -
banana_bandana ana 1,0,3,6,0,4,5,3,2,8 - -

Decompression

Pseudocode

string entry;
char ch;
int prevcode, currcode;
...

prevcode = read in a code;
decode/output prevcode;
while (there is still data to read)
{
    currcode = read in a code;
    entry = translation of currcode from dictionary;
    output entry;
    ch = first char of entry;
    add ((translation of prevcode)+ch) to dictionary;
    prevcode = currcode;
}

##code

    maximum_table_size = pow(2,int(n))
    file = open(input_file, "rb")
    compressed_data = []
    next_code = 256
    decompressed_data = ""
    string = ""

    # Reading the compressed file.
    while True:
        rec = file.read(2)
        if len(rec) != 2:
            break
        (data, ) = unpack('>H', rec)
        compressed_data.append(data)

    # Building and initializing the dictionary.
    dictionary_size = 256
    dictionary = dict([(x, chr(x)) for x in range(dictionary_size)])

    # iterating through the codes.
    # LZW Decompression algorithm
    for code in compressed_data:
        if not (code in dictionary):
            dictionary[code] = string + (string[0])
        decompressed_data += dictionary[code]
        if not(len(string) == 0):
            dictionary[next_code] = string + (dictionary[code][0])
            next_code += 1
        string = dictionary[code]

Example

Encoded input Dictionary Translation Decoded Input Current String New Dictionary Index
1 1=b b - none none
1,0 0=a ba b ba 5
1,0,3 3=n ban a an 6
1,0,3,6 6=an banan n na 7
1,0,3,6,0 0=a banana an ana 8
1,0,3,6,0,4 4=_ banana_ a a_ 8
1,0,3,6,0,4 5=ba banana_ba _ _b 10
1,0,3,6,0,4,5,3 3=n banana_ban ba ban 11
1,0,3,6,0,4,5,3 2=d banana_band n nd 12
1,0,3,6,0,4,5,3,2,8 8=ana banana_bandana d da 13

Ref 1 Ref 2 Source Code

Mausam ... Welcome to WhatsApp chat
How can we help you today?
Type here...