GenCompress

and the GenDistance program for computing distance matrices


Table of Contents

  1. What's GenCompress?
  2. What file format for DNA sequence?
  3. Where can I find some references to GenCompress?
  4. Can I get a Linux version of GenCompress to run on my computer?
  5. Are there any applications employing our GenCompress algorithm?
  6. What's GenDistance?

What's GenCompress?

GenCompress is an efficient compression algorithm for DNA sequences. Data compression has recently become a tool for retrieving information hidden in genetic sequences. From GenCompress results, an approximate shared information distance is calculated, which measures the relatedness between each pair of DNA sequences. Shared information distance is defined in Kolmogorov complexity theory. GenCompress can be downloaded from here.
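As a rough illustration of how compressed sizes can stand in for Kolmogorov complexity, the sketch below approximates one formulation of shared information distance from the literature, d(x, y) = 1 - (K(x) - K(x|y)) / K(xy). Several things here are assumptions rather than a description of GenCompress itself: zlib is swapped in as the compressor, K(x|y) is approximated by C(yx) - C(y), and the function names are hypothetical.

```python
import random
import zlib


def csize(data: bytes) -> int:
    """Compressed size in bytes. zlib is a stand-in here, NOT GenCompress."""
    return len(zlib.compress(data, 9))


def shared_info_distance(x: bytes, y: bytes) -> float:
    """Approximate d(x, y) = 1 - (K(x) - K(x|y)) / K(xy) with compressed sizes."""
    cx, cy = csize(x), csize(y)
    cxy = csize(x + y)  # stands in for K(xy)
    cyx = csize(y + x)
    # Assumption: K(x|y) is approximated by the extra bytes needed for x after y.
    k_x_given_y = cyx - cy
    return 1.0 - (cx - k_x_given_y) / cxy


# Two unrelated random sequences over {a, c, g, t}, with a fixed seed.
rng = random.Random(0)
a = "".join(rng.choice("acgt") for _ in range(2000)).encode()
b = "".join(rng.choice("acgt") for _ in range(2000)).encode()

d_same = shared_info_distance(a, a)  # a sequence compared with itself
d_diff = shared_info_distance(a, b)  # two unrelated sequences
print(f"d(a, a) = {d_same:.3f}, d(a, b) = {d_diff:.3f}")
```

A sequence compared with itself should come out much closer to 0 than two unrelated sequences do. A general-purpose compressor such as zlib is far weaker on DNA than a dedicated one, so the absolute values are only indicative.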


What file format for DNA sequence?

GenCompress treats a file as a DNA sequence if it contains only the four lower-case characters { a, c, g, t }. Any other file is treated as an ASCII file to be compressed.
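The rule above can be sketched as a small pre-check to run before handing a file to GenCompress. This is a hypothetical helper, not part of GenCompress. It assumes the rule is applied strictly, so line breaks, headers, or upper-case letters all disqualify the input; whether GenCompress tolerates any of these is not stated here. Treating empty input as non-DNA is also an assumption.

```python
DNA_ALPHABET = frozenset("acgt")


def looks_like_dna(text: str) -> bool:
    """True if text is non-empty and uses only lower-case a, c, g, t.

    Assumption: the check is strict, so whitespace and newlines fail it.
    """
    return bool(text) and set(text) <= DNA_ALPHABET


for sample in ["acgtacgt", "ACGT", "acgn", "acgt\n"]:
    kind = "DNA" if looks_like_dna(sample) else "ASCII"
    print(repr(sample), "->", kind)
```

If the sequences come from a format with header lines or wrapped lines, they would need to be reduced to a bare lower-case string first under this strict reading.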


Where can I find some references to GenCompress?


Can I get a Linux version of GenCompress to run on my computer?

Yes. Please click here to get a Linux executable of GenCompress, and here for a SunOS 5.7 executable.


Are there any applications employing our GenCompress algorithm?

We use GenCompress results to approximately compute a distance (similarity) measure between DNA sequences based on shared information, and from the resulting distance matrix we can successfully construct biological evolutionary trees (please refer to our paper published in Bioinformatics). The same approach can also be used to infer the phylogeny of English texts (please refer to the paper by Charles Bennett, Ming Li, and Bin Ma, "Linking chain letters", to appear in Scientific American, accepted May 2000).


What's GenDistance?

GenDistance calculates an n × n distance matrix for n input sequences. The (i, j) element is the distance between the i-th and j-th sequences, obtained from the GenCompress program described above. In theory, this distance matrix should be symmetric. The GenDistance program can be downloaded from here.
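The matrix layout and the symmetry remark can be illustrated with a short sketch. It is not the GenDistance implementation: the function names are hypothetical, and `placeholder_distance` is a deliberately simple zlib-based ratio chosen only because it is not symmetric in general, which makes the gap between theory and practice easy to measure. Averaging the two triangles is one common way to force symmetry and is likewise an assumption, not something GenDistance is known to do.

```python
import zlib
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def placeholder_distance(x: bytes, y: bytes) -> float:
    """Hypothetical stand-in: share of y's compressed size still needed after x."""
    c = lambda s: len(zlib.compress(s, 9))
    return (c(x + y) - c(x)) / c(y)


def distance_matrix(seqs: Sequence[bytes],
                    dist: Callable[[bytes, bytes], float]) -> Matrix:
    """Entry (i, j) is dist(seqs[i], seqs[j]); all n*n entries are computed."""
    n = len(seqs)
    return [[dist(seqs[i], seqs[j]) for j in range(n)] for i in range(n)]


def max_asymmetry(m: Matrix) -> float:
    """Largest |m[i][j] - m[j][i]| over all pairs i < j."""
    n = len(m)
    return max((abs(m[i][j] - m[j][i])
                for i in range(n) for j in range(i + 1, n)), default=0.0)


def symmetrize(m: Matrix) -> Matrix:
    """Average each entry with its mirror so the result is exactly symmetric."""
    n = len(m)
    return [[(m[i][j] + m[j][i]) / 2 for j in range(n)] for i in range(n)]


seqs = [b"acgtacgtacgt" * 20, b"ttgacca" * 30, b"acgtacgtacgg" * 20]
m = distance_matrix(seqs, placeholder_distance)
sym = symmetrize(m)
print("max asymmetry before:", max_asymmetry(m))
print("max asymmetry after: ", max_asymmetry(sym))
```

Checking `max_asymmetry` on a real GenDistance output would show how far the computed matrix departs from the symmetry expected in theory.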


Any suggestions or comments? Please mail xinchen@cs.ucr.edu.
Revised: March 23, 2003.