Burrows-Wheeler Transform

Burrows-Wheeler Transform (BWT) does not compress data, but it simplifies the structure of input data and makes more effective further compression. One of the distinctive feature of this method is operation on the block of data (as a rule of size 512kB - 2 mB). The main idea of this method is block sorting which groups symbols with a similar context. Let us consider how BWT works on the input data block ‘abracadabra'. The first step is to create a matrix containing all its possible cyclic permutations. The first row is input string, the second is created by shifting it to the left by one symbol and so on:

abracadabra
bracadabraa
racadabraab
acadabraabr
cadabraabra
adabraabrac
dabraabraca
abraabracad
braabracada
raabracadab
aabracadabr

Then all rows are sorted in accordance with the lexicographic order:

 0 aabracadabr
 1 abraabracad
 2 abracadabra
 3 acadabraabr
 4 adabraabrac
 5 braabracada
 6 bracadabraa
 7 cadabraabra
 8 dabraabraca
 9 raabracadab
10 racadabraab

The last step is to write out the last column and the index of the input string: rdarcaaaabb, 2 - this is a result of the forward BWT transform.

Inverse BTW is performed as follows:

elements of the input string are numbered in ascending order

 0  r
 1  d
 2  a
 3  r
 4  c
 5  a
 6  a
 7  a
 8  a
 9  b
10  b

and sorted in accordance with the lexicographic order:

 2  a
 5  a
 6  a
 7  a
 8  a
 9  b
10  b
 4  c
 1  d
 0  r
 3  r

This index array is a vector of the inverse transform (Inv), the further reconstruction of the string is performed in the following manner:

src[] = "rdarcaaaabb";
Inv[] = {2,5,6,7,8,9,10,4,1,0,3};
index = 2; // index
of the initial string is known from the forward BWT
for( i = 0; i < len; i++ ) {
  index = Inv[index];
  dst[i] = src[index];
}

Submit feedback on this help topic

Copyright © 2000 - 2011, Intel Corporation. All rights reserved.