Next Generation Sequencing Simulation (NGSS)


Next Generation Sequencing Simulation (NGSS) was developed to simulate the process of Next Generation Sequencing. NGSS takes an input sequence and mutates it according to a user-defined evolutionary distance, applying the Jukes-Cantor model of sequence evolution. In addition, NGSS includes insertions and deletions: given a mutation event, an insertion and a deletion each occur with a probability of 2.5%. On the one hand, this process models mutants and distantly related species for which no reference sequence is available. On the other hand, it accounts for annotation errors in existing reference sequences. The resulting mutated sequence is then taken as the target genome, which is the sequence that is subsequently sequenced in silico.
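
The mutation step described above can be sketched roughly as follows. This is an illustrative Python sketch, not the NGSS implementation: the function name `mutate`, the per-site interpretation of the mutation rate, and the choice to place an inserted base after the current one are all assumptions. Only the 2.5% insertion and deletion probabilities and the Jukes-Cantor assumption (every other base is equally likely as a substitution) come from the description.

```python
import random

BASES = "ACGT"

def mutate(reference, mutation_rate, indel_prob=0.025, rng=None):
    """Return (target, events); events is a list of (reference_position, kind).

    Hypothetical helper: each site independently undergoes a mutation event
    with probability `mutation_rate` (an assumption about how NGSS applies -m).
    """
    rng = rng or random.Random()
    target = []
    events = []
    for pos, base in enumerate(reference):
        if rng.random() >= mutation_rate:
            target.append(base)  # no mutation event at this site
            continue
        draw = rng.random()
        if draw < indel_prob:
            # Insertion: keep the base and add one random base after it.
            target.append(base)
            target.append(rng.choice(BASES))
            events.append((pos, "insertion"))
        elif draw < 2 * indel_prob:
            # Deletion: the base is dropped from the target sequence.
            events.append((pos, "deletion"))
        else:
            # Jukes-Cantor substitution: the three other bases are equally likely.
            target.append(rng.choice([b for b in BASES if b != base]))
            events.append((pos, "substitution"))
    return "".join(target), events
```

The returned event list mirrors the idea of tracking every mutation so that an assembly can later be compared against the known truth.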

In the second step of the simulation, NGSS takes the target genome and simulates the sequencing step. All reads are sampled randomly across the entire genome, assuming a uniform distribution. For each read, sequencing errors are introduced with a user-defined probability.
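
A minimal Python sketch of this sampling step is shown below. It is not the NGSS code. It assumes that the number of reads follows the usual coverage relation (coverage × genome length / read length), that read start positions are drawn uniformly, and that the sequencing error is applied per base as a substitution. The function name `simulate_reads` and the returned tuple layout are hypothetical.

```python
import random

BASES = "ACGT"

def simulate_reads(target, read_length, coverage, error_rate, rng=None):
    """Return a list of (start, read_sequence, error_positions) tuples."""
    rng = rng or random.Random()
    if read_length > len(target):
        raise ValueError("read length exceeds target length")
    # Assumed relation: coverage = n_reads * read_length / genome_length.
    n_reads = round(coverage * len(target) / read_length)
    reads = []
    for _ in range(n_reads):
        # Uniformly distributed start position over all valid offsets.
        start = rng.randrange(len(target) - read_length + 1)
        bases = list(target[start:start + read_length])
        errors = []
        for i, base in enumerate(bases):
            if rng.random() < error_rate:
                # Assumed error model: replace the base with a different one.
                bases[i] = rng.choice([b for b in BASES if b != base])
                errors.append(i)
        reads.append((start, "".join(bases), errors))
    return reads
```

Under these assumptions, a 400-base target with read length 36 and 20-fold coverage yields round(20 × 400 / 36) = 222 reads.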

NGSS keeps track of any mutations that occur during sequence evolution as well as simulated sequencing errors in reads, which enables subsequent performance evaluation of different assembly programs.

Requirements:

NGSS requires a PC equipped with at least 4 GB of memory and the latest version of the Java Runtime Environment.

Installation:

No installation required.

Parameters:

-c Desired coverage (default: 15)

-i Path to reference sequence file (FASTA, single sequence)

-l Read length (default: 72)

-m Mutation rate (default: 0.01=1%)

-n Prefix for generated output files (default: name of input file)

-o Path to folder for output files (the folder must already exist)

-r Read sequencing error (default: 0.002 = 0.2%)

A call of NGSS looks like this:

java -Xmx8000m -jar NGSS.jar -c 20 -i Reference_file -l 36 -m 0.05 -n mySeq -o testRun -r 0.03

The option -Xmx8000m allows the Java runtime to use up to 8000 MB of RAM. You can adjust this value to suit your hardware.
This call mutates the reference sequence with a mutation rate of 5% and creates reads of length 36 with 20-fold coverage and a 3% sequencing error rate. All results are written to the folder 'testRun' in the same directory as the NGSS executable, and all created files are named with the prefix 'mySeq'.
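
For scripted runs, a small wrapper could assemble the same command and create the output folder beforehand, since NGSS expects it to exist. The following Python sketch uses only the flags listed above. The helper names `build_ngss_command` and `run_ngss` are hypothetical, and the jar is assumed to be in the current directory under the name 'NGSS.jar'.

```python
import os
import subprocess

def build_ngss_command(reference, output_dir, coverage=None, read_length=None,
                       mutation_rate=None, prefix=None, error_rate=None,
                       jar="NGSS.jar", max_heap_mb=8000):
    """Build the argument list for an NGSS call; unset options are omitted
    so that NGSS falls back to its own defaults."""
    cmd = ["java", f"-Xmx{max_heap_mb}m", "-jar", jar]
    options = [("-c", coverage), ("-i", reference), ("-l", read_length),
               ("-m", mutation_rate), ("-n", prefix), ("-o", output_dir),
               ("-r", error_rate)]
    for flag, value in options:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd

def run_ngss(reference, output_dir, **options):
    """Create the output folder first, then launch NGSS."""
    os.makedirs(output_dir, exist_ok=True)
    cmd = build_ngss_command(reference, output_dir, **options)
    return subprocess.run(cmd, check=True)
```

Calling `run_ngss("Reference_file", "testRun", coverage=20, read_length=36, mutation_rate=0.05, prefix="mySeq", error_rate=0.03)` would issue the example call shown above.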

Further adjustments:

Additional adjustments can be made in the file 'default.properties'. The option 'maxBuffSize' determines how many characters (bases) are read into the buffer and processed simultaneously. Memory usage can be fine-tuned by adjusting this parameter, but this is advised only for experienced users. 'maxBuffSize' MUST exceed the read length!
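
The constraint on the buffer size can be checked before starting a run. The Python sketch below assumes that 'default.properties' uses the common `key=value` properties syntax and that the key is spelled 'maxBuffSize'; both are assumptions about the file, and the helper names are hypothetical.

```python
def read_properties(text):
    """Parse simple key=value lines, skipping blank lines and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props

def check_buffer_size(props, read_length):
    """Return maxBuffSize, or raise if it does not exceed the read length."""
    max_buff = int(props["maxBuffSize"])
    if max_buff <= read_length:
        raise ValueError(
            f"maxBuffSize ({max_buff}) must exceed the read length ({read_length})")
    return max_buff
```

A typical use would be `check_buffer_size(read_properties(open("default.properties").read()), 72)` before launching NGSS with `-l 72`.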

NGSS Output Files:

{prefix}_Coverage:
This file contains the simulated per-base coverage of the target sequence in comma-separated format, ready to be plotted in R or any other software.

{prefix}_log:
This file logs all simulation parameters as well as the file paths to the result files. It also serves as the input file for subsequent analysis with NGE.

{prefix}_RefMutSeq:
The target sequence (mutated reference sequence)

{prefix}_RefMutSeqInt:
Contains position flags for tracking mutations in the reference sequence.

{prefix}_RefReads:
The simulated reads in FASTQ format.

{prefix}_RefReadsInt:
Contains position flags for tracking mutations in the reference sequence and sequencing errors, for evaluation of reference assembly.

{prefix}_DenovoReadsInt:
Contains only position flags for tracking sequencing errors, for evaluation of de novo assembly.
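
As a quick sanity check on the {prefix}_Coverage file, the per-base values can be summarized before plotting. The Python sketch below assumes the file contains only integer coverage values separated by commas and/or line breaks, with no header; the exact layout is not specified here, so treat this as an assumption. The function name `summarize_coverage` is hypothetical.

```python
import re

def summarize_coverage(text):
    """Summarize per-base coverage values given as comma-separated integers."""
    values = [int(v) for v in re.split(r"[,\s]+", text.strip()) if v]
    if not values:
        raise ValueError("no coverage values found")
    return {
        "positions": len(values),
        "mean": sum(values) / len(values),
        "min": min(values),
        "max": max(values),
    }
```

With a run requested at 20-fold coverage, the reported mean should lie close to 20 if the layout assumption holds.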
