[an error occurred while processing this directive]
logo
Center for Integrative Bioinformatics Vienna
Max F. Perutz Laboratories
Dr. Bohr Gasse 9
A-1030 Vienna, Austria
printable version  
   
   Home
   People
   Publications
   Research
   Teaching
   Software
   Services/Databases

   Max F. Perutz Laboratories
   University of Vienna
   Medical University, Vienna

   Deep Metazoan Phylogeny
   MaBS group
   evolVienna
   Max Perutz Library
 

Welcome to Bisulfite Sequencing Scorer (BiSS)


Introduction:

BiSS(Bisulfite Sequencing Scorer) is an analysis pipeline for the deep sequencing data after bisulfite conversion (BS-Seq).

Mapping procedure:

  • using banded Smith-Waterman local alignment algorithm
  • implementated in the Graphical Processing Unit based package NextGenMap.
  • special asymetric scoring function
  • 3.5 letter hashing table.
  • Downtream analysis (methylation calling):

  • using widely-used binomial test
  • using a comprehensive adaptive error estimate that accounts for
  • sequencing errors
  • erroneous bisulfite conversion
  • and also wrongly mapped reads.
  • Implemented in Perl and R scripts.
  • Quick guide of using latest version:

  • installation and configuration: please refer to NextGenMap.
  • using new parameter configuration file as an example example_parameters .
  • Running instruction
  • Indexing: ngmc 0 0 read_filename AND ngmc 1000 10000 reference_filename
  • Running: ngm -c bs_config -r reference_filename -q read_filename -t number_of_threads_used -o output_filename --bs-mapping
  • Running with paired-end data: ngm -c bs_config -1 read1_filename -2 read2_filename -r reference_filename -o output_filename -t number_of_threads_used --bs-mapping


  • Availability:

    Latest version BiSS-on-NextGenMap:

    • Please read the User Manual carefully before using NextGenMap the first time or when you upgrade the new version! 
    • The current version of NextGenMap is currently only tested and released under Linux.
    • !!!IMPORTANT: We are releasing the new version which support multi-core computing for hashing procedure for upgrading the speed. See detail instructions about installation and configuration in the NextGenMap page.

    Download:

    Installation:


    How to use BiSS by an example:

    Sample input data:

  • Reference genome in single/multiple FASTA format, e.g.: example_reference.fas
  • Bisulfite deep sequencing library in FASTA or FASTQ format, e.g.: example_read.fa (single-end 56bp short read library). Currently, the pair-end version is on testing phase and will be released soon.
  • Download the sample data to your working folder directory.
  •  For lookup table, you can download the default 12-mer hashing table .
  • Or create your own k-mer by using a C++ bs_lookup.cpp program.
  • If you created your own table, you need to compile it first g++ bs_lookup.cpp -o bs_lookup
  • And then run the program as bs_lookup -k your_k_mer. Two files bs_k-mer_CT.txt and bs_k-mer_GA.txt will be generated at the same folder.

  • Parameters:

    You need to specify the parameters for NextGenMap as well as for BiSS in the same parameter file.
    You can download an example parameter file: example_parameters .
    The order of parameters shown in the example file, the odd line numbers are for comments (starting with #, for example #wordsize) that briefly explain the parameter, and is followed by that specific parameter value.
    Description of parameter file is as below (please pay attention on the bold parameters that are specficic for BiSS!)

    Parameters Description Default value Parameters Description Default value
    #word-size the k-mer for hashing procedure 12 #Alignment-mode Alignment-type option (0:BiSS) 0
    #SW-match-score SW match score 4 #BiSS-scoreT-T: SW match accounts for T-T matches 4
    #SW-mismatch-score SW mismatch score -2 #BiSS-scoreC-T: SW match accounts for bisulfite conversion C-T (reference-read) 4
    #SW-gap-score(read) SW penalty for gap in the read -10 #identity-cutoff(0..1): the identity cutoff for short read alignment 0
    #SW-gap-score(reference) SW penalty for gap in the read -10 #k-mer-hits_SWcutoff: cutoff for SW alignment 2
    #expanding-thres/corridor extension of the SW local alignment 10 #GPU-ID: for setting up which GPU to use, it has to be unique 0
    #N-best-hits (others: disabled) 1 #Forward/Backward: + or - mapping direction 2
    #Minimal-Score SW score cutoff 2 #BS-Seq-protocol: 1: only forward direction (referring Lister et al, 2008), other: both strands 1
    #Max-reference-length-per-run maximal length of reference for each run 10000010 #Multiple-Hits-allowed maximal multiple mapping allowed 100
    #Max-number-of-reads-per-run maximal number of reads for each run 300000 #Output-format: (0== NextGenMap, 1== SAM) 0
    #K-mer-step-size
    2 #BiSS-Watson-lookup lookup table file for Watson genome
    lookup/bs_12-mer_CT.txt

    #BiSS-Crick-lookup lookup table file for Crick genome
    lookup/bs_12-mer_GA.txt

    Running BiSS on NextGenMap package

    Please make sure that you have all necessary files (parameter file, reference genome, short read library) in the working folder and following the below template.
    NextGenMap_v0.0.1 <Parameter File> <Reference Genome> <Short read file>

    NextGenMap_v0.0.1 example_parameters example_reference.fas example_read.fa example.output

    Output:

    The output format (e.g example.output.) is NextGenMap output format as below (We also support to procedure the SAM format (parameter #Output-format) for visualization)

    Example Output:

    Seq ID: 0 Match ID: 0 Name: ref_W_35047_0:0:0_0/0 forward matches at: 35046 at scaffold: example_reference Score: 224
    times: 1 identity: 0.732143 ++ Q_start: 0 Q_end: 0
    TTTCTTCTCTCTCTCCTCTTTTGACAACTCCTATACACCAAAATAGGCAATTTAAC
    TTTTTTTTCTTTTTTCTTTTTTGATAATTTTTATATATTAAAATAGGTAATTTAAT

    For each short read, 4 output lines are reported:

    Line 1
  • Seq ID: NextGenMap ID.
  • Match ID: index for each equal scored alignment.
  • Name: short read name.
  • Strand: mapped strand (forward (+) or backward (-)).
  • matches at: mapped location in reference (0-base coordinates).
  • at scaffold: mapped chromosome.
  • Score: SW local alignment score.
  • Line 2
  • times: number of equally best scored alignments.
  • identity: local alignment identity (excluding any insertion or deletion site).
  • ++/+-/-+/--: indicate the genome strand and direction of mapping.
  • Q_start: start mapped location in the short read.
  • Q_end: end mapped location in the short read.
  • Line 3
  • Aligned Reference: reference sequence of the alignment.
  • Line 4
  • Aligned Read: short read sequence of the alignment.

  • Downstream analysis

  • Parsing the NextGenMap output mapping to produce the alignment profile for each cytosine by a Perl script biss_downstream.pl
  • perl biss_downstream.pl -r <reference_genome> -m <output_alignment> -c <identity_cutoff>
    (identity_cutoff is to remove the alignment whose less identity from short read SW alignment excluding C-T(G-A) mismatches.)

    perl biss_downstream.pl -r example_reference.fas -m example.output -c 85

    example.output gives the alignment profile for cytosines (both strand, e.g example_W.profile and example_C.profile).
    It consists of 4 columns:
    (1) corresponding genomic coordinate,
    (2) total number of mapped reads,
    (3) total number of mapped reads with C,
    (4) total number of mapped reads with T
    at that coordinate.

    Below is one example

    (1)       (2)     (3)     (4)
    1058    0       12     12
    1068    0       9       9
    1070    0       8       9
    1088    1       8       9
    1097    7       0       7
    1101    2       4       7
    1102    4       2       6
    1107    1       4       5
    1112    2       2       4
    1118    1       1       2
    1125    2       0       2
    1132    1       0       1
    1138    0       0       0

  • Performing the methylation calling test using a methylation_calling.R .
  • methylation_calling.R --profile <alignment_profile_file> --output <methylation_profile_file> --errrate <default_error_rate> [0.04] --cov <minimal_coverage>
    ( is the parameter for the error of binomial test used, default by 0.04)
    R --vanilla < methylation_calling.R --profile outW.profile --output outW.meth -errrate 0.4 --cov 5

    The program computes the methylation profile (e.g example_W.meth) for the input alignment profile.
    It includes 4 lines: (1) Genomic coordinate, (2) number of mapped reads with C at that corrdinate,
    (3) number of mapped reads with T at that corrdinate,  (4) total coverage, and  (5) Benjamini-Hochberg FDR-corrected p.value
    Below is one example

    (1)       (2)     (3)     (4)    (5)
    1058    0       12      12    1
    1068    0       9       9       1
    1070    0       8       9       1
    1088    1       8       9       1
    1097    7       0       7       4.70988912692589e-09
    1101    2       4       7       0.366540175861658
    1102    4       2       6       0.000460400569779035
    1107    1       4       5       1
    1112    2       2       4       NA
    1118    1       1       2       NA
    1125    2       0       2       NA
    1132    1       0       1       NA
    1138    0       0       0       NA

    You can customize the sinlge-cytosine methylation profiles for your own purpose.

    Note:
    Please, let us know if you download or have questions about using this pipeline by sending an email to {fritz.sedlazeck,huy.dinh}@univie.ac.at.

    Reference:

  • Huy Q. Dinh, Manu Dubin, Fritz J. Sedlazeck, Nicole Lettner, Ortrun Mittelsten Scheid, Arndt von Haesler. Advanced methylome analysis after bisulfite deep sequencing: an example in Arabidopsis. PLoS ONE, 7, e41528
  • Lister R, O'Malley RC, Tonti-Filippini J, Gregory BD, Berry CC, Millar AH, Ecker JR. Highly integrated single-base resolution maps of the epigenome in Arabidopsis Cell. 2008 May 2;133(3):523-36.
  • Supplementary data: An example on real data from floral-bud tissue Arabidopsis thaliana:
  • You can download our results on re-analysis of floral bud Arabidopsis ( TAIR7 genome version ) that include the comparison with previously published data.

    [an error occurred while processing this directive]

    contact imprint .