DNACompress [1] is a fast and efficient compression algorithm for DNA sequences. In the spirit of the Lempel-Ziv compression scheme, it first uses PatternHunter [2] to find all approximate repeats that would yield a compression gain, and then encodes each of them as a pointer to its previous occurrence. On standard benchmark sequences, DNACompress achieves a lower (better) average compression ratio than Biocompress-2, GenCompress, and CTW+LZ (see the following table, which also lists encoding times).

| Sequence | Size | Biocompress-2 | GenCompress | CTW+LZ | DNACompress | Encoding time |
|---|---|---|---|---|---|---|
| CHMPXX | 121024 | 1.6848 | 1.673 | 1.6690 | 1.6716 | 6.21s |
| CHNTXX | 155844 | 1.6172 | 1.6146 | 1.6129 | 1.6127 | 5.58s |
| HEHCMVCG | 229354 | 1.848 | 1.847 | 1.8414 | 1.8492 | 5.41s |
| HUMDYSTROP | 38770 | 1.9262 | 1.9226 | 1.9175 | 1.9116 | 3.21s |
| HUMGHCSA | 66495 | 1.307 | 1.1048 | 1.0972 | 1.0272 | 7.45s |
| HUMHBB | 73323 | 1.88 | 1.8204 | 1.8082 | 1.7897 | 4.04s |
| HUMDABCD | 58864 | 1.877 | 1.8192 | 1.8218 | 1.7951 | 6.13s |
| HUMHPRTB | 56737 | 1.9066 | 1.8466 | 1.8433 | 1.8165 | 5.08s |
| MPOMTCG | 186608 | 1.9378 | 1.9058 | 1.9000 | 1.8920 | 5.84s |
| PANMTPACGA | 100314 | 1.8752 | 1.8624 | 1.8555 | 1.8556 | 4.22s |
| VACCG | 191737 | 1.7614 | 1.7614 | 1.7616 | 1.7580 | 6.60s |
| average | --- | 1.7837 | 1.7434 | 1.7389 | 1.7254 | ---* |
* DNACompress is implemented in Java. Encoding times were measured by running DNACompress locally and may vary from run to run because it is a Java program.
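
The repeat-and-pointer idea described at the top of this page can be illustrated with a toy sketch. This is not the DNACompress implementation: it handles exact repeats only, finds them by brute-force search rather than with PatternHunter, and emits symbolic tokens instead of a bit stream. The names `encode`, `decode`, and `min_len` are hypothetical and chosen only for this example.

```python
from typing import List, Tuple

# A token is either ("lit", base) or ("copy", start, length).
Token = Tuple


def encode(seq: str, min_len: int = 4) -> List[Token]:
    """Greedy toy encoder: replace exact repeats with pointers.

    Assumption: only exact, non-overlapping repeats of at least
    `min_len` bases are replaced. DNACompress itself works with
    approximate repeats, which this sketch does not attempt.
    """
    tokens: List[Token] = []
    i = 0
    while i < len(seq):
        best_len, best_pos = 0, -1
        # Brute-force search for the longest earlier match.
        for j in range(i):
            k = 0
            while (i + k < len(seq) and j + k < i
                   and seq[j + k] == seq[i + k]):
                k += 1
            if k > best_len:
                best_len, best_pos = k, j
        if best_len >= min_len:
            # Pointer to the previous occurrence.
            tokens.append(("copy", best_pos, best_len))
            i += best_len
        else:
            # No repeat long enough: emit the base itself.
            tokens.append(("lit", seq[i]))
            i += 1
    return tokens


def decode(tokens: List[Token]) -> str:
    """Rebuild the sequence from literal and copy tokens."""
    out: List[str] = []
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, pos, length = tok
            out.extend(out[pos:pos + length])
    return "".join(out)
```

For the input `acgtacgt`, the encoder emits four literal tokens followed by a single `("copy", 0, 4)` token, and `decode` restores the original string. A real compressor would additionally decide whether each pointer costs fewer bits than the bases it replaces, which is the "compression gain" test mentioned above.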
DNACompress can be run online or downloaded from here for local use. Two DNA sequences that can be used as input to DNACompress are Humghcsa.seq and Hehcmvcg.seq. The input sequence file may contain only the four lowercase characters a, c, g, and t, and its filename must end in .seq.
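
Before submitting a file, it may help to check it against the two input rules above. The following is a minimal sketch, not part of DNACompress; the function name `check_sequence_file` is hypothetical. It reads the rules strictly, so line breaks and uppercase letters are reported as disallowed. Whether DNACompress itself tolerates line breaks is not stated here.

```python
from typing import List

# The only characters the page says are allowed in an input file.
ALLOWED = set("acgt")


def check_sequence_file(filename: str, content: str) -> List[str]:
    """Return a list of problems; an empty list means the file passes.

    Assumption: the rules are applied literally, so any character
    other than a, c, g, t (including newlines) is flagged.
    """
    problems: List[str] = []
    if not filename.endswith(".seq"):
        problems.append("filename does not end in .seq")
    bad = sorted(set(content) - ALLOWED)
    if bad:
        listed = ", ".join(repr(c) for c in bad)
        problems.append("disallowed characters: " + listed)
    return problems
```

For example, `check_sequence_file("Humghcsa.seq", "acgtacgt")` returns an empty list, while a file named `x.txt` containing `ACGT` is reported for both its suffix and its uppercase letters.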
02/28/2001