[an error occurred while processing this directive]

Center for Integrative Bioinformatics Vienna
Max F. Perutz Laboratories
Dr. Bohr Gasse 9
A-1030 Vienna, Austria

	Home
	People
	Publications
	Research
	Teaching
	Software
	Services/Databases

	Max F. Perutz Laboratories
	University of Vienna
	Medical University, Vienna

	Deep Metazoan Phylogeny
	MaBS group
	evolVienna
	Max Perutz Library

Welcome to Bisulfite Sequencing Scorer (BiSS)

Introduction:

BiSS(Bisulfite Sequencing Scorer) is an analysis pipeline for the deep sequencing data after bisulfite conversion (BS-Seq).

Mapping procedure:

using banded Smith-Waterman local alignment algorithm

implementated in the Graphical Processing Unit based package NextGenMap.

special asymetric scoring function

3.5 letter hashing table.

Downtream analysis (methylation calling):

using widely-used binomial test

using a comprehensive adaptive error estimate that accounts for

sequencing errors

erroneous bisulfite conversion

and also wrongly mapped reads.

Implemented in Perl and R scripts.

Quick guide of using latest version:

installation and configuration: please refer to NextGenMap.

using new parameter configuration file as an example example_parameters .

Running instruction

Indexing: ngmc 0 0 read_filename AND ngmc 1000 10000 reference_filename

Running: ngm -c bs_config -r reference_filename -q read_filename -t number_of_threads_used -o output_filename --bs-mapping

Running with paired-end data: ngm -c bs_config -1 read1_filename -2 read2_filename -r reference_filename -o output_filename -t number_of_threads_used --bs-mapping

Availability:

Latest version BiSS-on-NextGenMap:

Please read the User Manual carefully before using NextGenMap the first time or when you upgrade the new version!
The current version of NextGenMap is currently only tested and released under Linux.
!!!IMPORTANT: We are releasing the new version which support multi-core computing for hashing procedure for upgrading the speed. See detail instructions about installation and configuration in the NextGenMap page.

Download:

Download the CUDA Developer Driver for Linux, either 32 or 64 bit.
Download sourcecode for Linux: NextGenMap2BS.zip.
Download executable file for Linux: NextGenMap.
The MAC OS and WINDOWS version will be available soon.
You also need to install programming language R and scripting language Perl if they are not yet installed.
Download the supplementary excel file for the publication

Installation:

Following the instruction for installation and configuration on NVIDIA Accelerated Linux Graphics Driver README and Installation Guide
Extracting the NextGenMap2BS.zip file to the NextGenMap2 folder.
Opening a console/terminal and changing the current folder to the NextGenMap2 folder.
Typing make and pressing enter to install automatically.
The NextGenMap_v0.0.1 executable file will be in the NextGenMap2/bin folder.
You can copy it to your bin folder in order to run it from every folder.
In case of problems, please contact your local system administrators first.

How to use BiSS by an example:

Sample input data:

Reference genome in single/multiple FASTA format, e.g.: example_reference.fas

Bisulfite deep sequencing library in FASTA or FASTQ format, e.g.: example_read.fa (single-end 56bp short read library). Currently, the pair-end version is on testing phase and will be released soon.

Download the sample data to your working folder directory.

For lookup table, you can download the default 12-mer hashing table .

Or create your own k-mer by using a C++ bs_lookup.cpp program.

If you created your own table, you need to compile it first g++ bs_lookup.cpp -o bs_lookup

And then run the program as bs_lookup -k your_k_mer. Two files bs_k-mer_CT.txt and bs_k-mer_GA.txt will be generated at the same folder.

Parameters:

You need to specify the parameters for NextGenMap as well as for BiSS in the same parameter file.
You can download an example parameter file: example_parameters .
The order of parameters shown in the example file, the odd line numbers are for comments (starting with #, for example #wordsize) that briefly explain the parameter, and is followed by that specific parameter value.
Description of parameter file is as below (please pay attention on the bold parameters that are specficic for BiSS!)

Parameters	Description	Default value	Parameters	Description	Default value
#word-size	the k-mer for hashing procedure	12	#Alignment-mode	Alignment-type option (0:BiSS)	0
#SW-match-score	SW match score	4	#BiSS-scoreT-T:	SW match accounts for T-T matches	4
#SW-mismatch-score	SW mismatch score	-2	#BiSS-scoreC-T:	SW match accounts for bisulfite conversion C-T (reference-read)	4
#SW-gap-score(read)	SW penalty for gap in the read	-10	#identity-cutoff(0..1):	the identity cutoff for short read alignment	0
#SW-gap-score(reference)	SW penalty for gap in the read	-10	#k-mer-hits_SWcutoff:	cutoff for SW alignment	2
#expanding-thres/corridor	extension of the SW local alignment	10	#GPU-ID:	for setting up which GPU to use, it has to be unique	0
#N-best-hits	(others: disabled)	1	#Forward/Backward:	+ or - mapping direction	2
#Minimal-Score	SW score cutoff	2	#BS-Seq-protocol:	1: only forward direction (referring Lister et al, 2008), other: both strands	1
#Max-reference-length-per-run	maximal length of reference for each run	10000010	#Multiple-Hits-allowed	maximal multiple mapping allowed	100
#Max-number-of-reads-per-run	maximal number of reads for each run	300000	#Output-format:	(0== NextGenMap, 1== SAM)	0
#K-mer-step-size		2	#BiSS-Watson-lookup	lookup table file for Watson genome	lookup/bs_12-mer_CT.txt
			#BiSS-Crick-lookup	lookup table file for Crick genome	lookup/bs_12-mer_GA.txt

Running BiSS on NextGenMap package

Please make sure that you have all necessary files (parameter file, reference genome, short read library) in the working folder and following the below template.
NextGenMap_v0.0.1 <Parameter File> <Reference Genome> <Short read file>

NextGenMap_v0.0.1 example_parameters example_reference.fas example_read.fa example.output

Output:

The output format (e.g example.output.) is NextGenMap output format as below (We also support to procedure the SAM format (parameter #Output-format) for visualization)

Example Output:

Seq ID: 0 Match ID: 0 Name: ref_W_35047_0:0:0_0/0 forward matches at: 35046 at scaffold: example_reference Score: 224
times: 1 identity: 0.732143 ++ Q_start: 0 Q_end: 0
TTTCTTCTCTCTCTCCTCTTTTGACAACTCCTATACACCAAAATAGGCAATTTAAC
TTTTTTTTCTTTTTTCTTTTTTGATAATTTTTATATATTAAAATAGGTAATTTAAT

For each short read, 4 output lines are reported:

Line 1		Seq ID: NextGenMap ID. Match ID: index for each equal scored alignment. Name: short read name. Strand: mapped strand (forward (+) or backward (-)). matches at: mapped location in reference (0-base coordinates). at scaffold: mapped chromosome. Score: SW local alignment score.
Line 2		times: number of equally best scored alignments. identity: local alignment identity (excluding any insertion or deletion site). ++/+-/-+/--: indicate the genome strand and direction of mapping. Q_start: start mapped location in the short read. Q_end: end mapped location in the short read.
Line 3		Aligned Reference: reference sequence of the alignment.
Line 4		Aligned Read: short read sequence of the alignment.

Downstream analysis

Parsing the NextGenMap output mapping to produce the alignment profile for each cytosine by a Perl script biss_downstream.pl

perl biss_downstream.pl -r <reference_genome> -m <output_alignment> -c <identity_cutoff>
(identity_cutoff is to remove the alignment whose less identity from short read SW alignment excluding C-T(G-A) mismatches.)

perl biss_downstream.pl -r example_reference.fas -m example.output -c 85

example.output gives the alignment profile for cytosines (both strand, e.g example_W.profile and example_C.profile).
It consists of 4 columns:
(1) corresponding genomic coordinate,
(2) total number of mapped reads,
(3) total number of mapped reads with C,
(4) total number of mapped reads with T
at that coordinate.

Below is one example

(1)       (2)     (3)     (4)
1058    0       12     12
1068    0       9       9
1070    0       8       9
1088    1       8       9
1097    7       0       7
1101    2       4       7
1102    4       2       6
1107    1       4       5
1112    2       2       4
1118    1       1       2
1125    2       0       2
1132    1       0       1
1138    0       0       0

Performing the methylation calling test using a methylation_calling.R .

methylation_calling.R --profile <alignment_profile_file> --output <methylation_profile_file> --errrate <default_error_rate> [0.04] --cov <minimal_coverage>
( is the parameter for the error of binomial test used, default by 0.04)

R --vanilla < methylation_calling.R --profile outW.profile --output outW.meth -errrate 0.4 --cov 5

The program computes the methylation profile (e.g example_W.meth) for the input alignment profile.
It includes 4 lines: (1) Genomic coordinate, (2) number of mapped reads with C at that corrdinate,
(3) number of mapped reads with T at that corrdinate, (4) total coverage, and (5) Benjamini-Hochberg FDR-corrected p.value
Below is one example

(1)       (2)     (3)     (4)    (5)
1058    0       12      12    1
1068    0       9       9       1
1070    0       8       9       1
1088    1       8       9       1
1097    7       0       7       4.70988912692589e-09
1101    2       4       7       0.366540175861658
1102    4       2       6       0.000460400569779035
1107    1       4       5       1
1112    2       2       4       NA
1118    1       1       2       NA
1125    2       0       2       NA
1132    1       0       1       NA
1138    0       0       0       NA

You can customize the sinlge-cytosine methylation profiles for your own purpose.

Note:

Please, let us know if you download or have questions about using this pipeline by sending an email to {fritz.sedlazeck,huy.dinh}@univie.ac.at.

Reference:

Huy Q. Dinh, Manu Dubin, Fritz J. Sedlazeck, Nicole Lettner, Ortrun Mittelsten Scheid, Arndt von Haesler. Advanced methylome analysis after bisulfite deep sequencing: an example in Arabidopsis. PLoS ONE, 7, e41528

Lister R, O'Malley RC, Tonti-Filippini J, Gregory BD, Berry CC, Millar AH, Ecker JR. Highly integrated single-base resolution maps of the epigenome in Arabidopsis Cell. 2008 May 2;133(3):523-36.

Supplementary data: An example on real data from floral-bud tissue Arabidopsis thaliana:

You can download our results on re-analysis of floral bud Arabidopsis ( TAIR7 genome version ) that include the comparison with previously published data.

[an error occurred while processing this directive]

contact imprint .