• Niko Popitsch and Arndt von Haeseler
    NGC: lossless and lossy compression of aligned high-throughput sequencing data
    Nucl. Acids Res. first published online October 12, 2012 doi:10.1093/nar/gks939
    (html), (pdf)


NGC can be downloaded here (for non-commercial use only!):

Start NGC with
java -jar -Xmx4G ngc-core-0.0.1-standalone.jar <params>


Example: compress a BAM file that was aligned to hg19, using standard compression parameters and 4 GB of dedicated RAM

java -jar -Xmx4G ngc-core-0.0.1-standalone.jar compress -i data.bam -o data.ngc -r hg19.fa
Please note that you can adjust the stringency of the Picard SAM/BAM parser with the -validationStringency parameter. For example, set this parameter to "SILENT" if NGC/Picard complains about "MAPQ not being zero for unmapped reads" and similar format inaccuracies.

Example: decompress the resulting NGC file

java -jar -Xmx4G ngc-core-0.0.1-standalone.jar decompress -i data.ngc -o data-decompressed.bam -r hg19.fa
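
The two calls above can be chained into a single round trip. The following is only a minimal sketch, not part of NGC: the names NGC_JAR, NGC_MEM, DRY_RUN, ngc and roundtrip are hypothetical helpers introduced here for illustration, and only the flags already shown above (-i, -o, -r) are used. It assumes the input file name ends in ".bam".

```shell
#!/bin/sh
# Round-trip sketch: compress a BAM file with NGC, then decompress it again.
# NGC_JAR, NGC_MEM, DRY_RUN, ngc and roundtrip are hypothetical names
# chosen for this example; they are not part of NGC itself.

NGC_JAR="${NGC_JAR:-ngc-core-0.0.1-standalone.jar}"   # path to the NGC jar
NGC_MEM="${NGC_MEM:-4G}"                              # Java heap size (-Xmx)

# Run one NGC invocation, or only print it when DRY_RUN=1.
ngc() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "java -jar -Xmx${NGC_MEM} ${NGC_JAR} $*"
    else
        java -jar "-Xmx${NGC_MEM}" "$NGC_JAR" "$@"
    fi
}

# roundtrip <input.bam> <reference.fa>
# Writes <input>.ngc and <input>-decompressed.bam next to the input file.
roundtrip() {
    input="$1"
    ref="$2"
    base="${input%.bam}"   # strip the .bam suffix
    # Decompress only if compression succeeded.
    ngc compress -i "$input" -o "$base.ngc" -r "$ref" &&
    ngc decompress -i "$base.ngc" -o "$base-decompressed.bam" -r "$ref"
}
```

With DRY_RUN set to 1, calling "roundtrip data.bam hg19.fa" only prints the two command lines shown in the examples above, which is a cheap way to check paths before starting a long run.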

Example: compress a SAM file and decompress the resulting NGC file, using parameters for (i) various per-base quality quantization schemes, (ii) maximum (bzip2) compression, and (iii) run-length encoding (RLE) of quality values. Read names are dropped, and base qualities are preserved at the variant positions listed in the supplied VCF file. Finally, the input SAM/BAM is not validated (the fastest option). Please refer to the paper for details on the various quality quantization and compression strategies.

java -jar -Xmx4G ngc-core-0.0.1-standalone.jar compress -i data.sam -r hg19.fa -best -q1levels 30,50 -q2levels standard -qvalRleEncoding -truncateNames -variantList list.vcf -validationStringency SILENT
java -jar -Xmx4G ngc-core-0.0.1-standalone.jar decompress -i data.sam.ngc -r hg19.fa

Evaluation data

The following data sets were used in the NGC evaluation. The data were mapped with bwa, and unmapped reads were pruned. The resulting BAM files are also linked here: