Next Generation Sequencing Evaluation (NGSE)
Next Generation Sequencing Evaluation (NGSE) is a program that evaluates
the performance of different mapping programs on the basis of the simulated read set
created by Next Generation Sequencing Simulation (NGSS). Supported mapping programs are
BWA (and any other program producing SAM format output that includes the
MD:Z field; see the SAM format specification), Shrimp, NextGenMap, SSaha2, Bowtie, and Blastn.
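Because SAM input must carry the MD:Z field, it can help to check a SAM file before evaluation. The following Python sketch is not part of NGSE; the function name is hypothetical. It relies only on the SAM layout (header lines start with '@', eleven mandatory tab-separated columns, optional TAG:TYPE:VALUE fields afterwards). Skipping unmapped reads (FLAG bit 0x4) is an assumption of this sketch.

```python
def sam_records_missing_md(lines):
    """Return the read names of mapped SAM records that have no MD:Z field."""
    missing = []
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip blank lines and header lines, which start with '@' in SAM.
        if not line or line.startswith("@"):
            continue
        fields = line.split("\t")
        # Column 2 is the bitwise FLAG; bit 0x4 marks an unmapped read.
        if int(fields[1]) & 0x4:
            continue
        # Columns 1-11 are mandatory; optional fields start at column 12.
        optional = fields[11:]
        if not any(field.startswith("MD:Z:") for field in optional):
            missing.append(fields[0])
    return missing
```

Calling sam_records_missing_md(open(path)) on a SAM file returns an empty list when every mapped record has an MD:Z field.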
Requirements:
NGSE requires a PC equipped with at least 4 GB of memory and the latest version of the
Java Runtime Environment.
Installation:
No installation required.
Parameters:
-d Assembly method to be evaluated (supported: BWA (SAM format), Shrimp, NextGenMap,
SSaha2, Bowtie, Blastn)
-o Path of output file to be created
-p Path of property file (prefix_log) from simulation with NGSS
-r Path to the result file produced by the assembly program
A call of NGSE looks like this:
java -Xmx8000m -jar NGSE.jar -d BWA -o result -p /home/workspace/testRun/mySeq_log -r /scratch/data/out
The option -Xmx8000m allows the Java Runtime to use up to 8000 MB of RAM.
Adjust this value according to your hardware.
This call evaluates the results of a reference assembly with BWA in the file
'/scratch/data/out' against the data produced by NGSS in the directory
'/home/workspace/testRun/'.
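When several mapping results are to be evaluated, the call can be assembled from a script. The following Python sketch only builds the argument list from the four parameters listed above; the function name and its keyword defaults are hypothetical and not part of NGSE.

```python
def build_ngse_command(method, output, property_file, result_file,
                       max_heap_mb=8000, jar="NGSE.jar"):
    """Build the argument list for one NGSE call (hypothetical helper)."""
    return [
        "java",
        f"-Xmx{max_heap_mb}m",   # upper limit for the Java heap, in MB
        "-jar", jar,
        "-d", method,            # assembly method to be evaluated
        "-o", output,            # output file to be created
        "-p", property_file,     # prefix_log file written by NGSS
        "-r", result_file,       # result file of the assembly program
    ]
```

The returned list can be passed to subprocess.run(command, check=True) from the Python standard library, which raises an error if the Java process exits with a non-zero status.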
Further adjustments:
Additional adjustments can be made in the file 'default.properties'.
The option 'maxLineLength' sets the maximum length of any line in the
input files. If a line is longer than this value, the program stops
prematurely. 'maxReadLength' sets the maximum length of a read. If any
read is longer than this threshold, the program stops prematurely.
Changing either of these values is not recommended. 'logPath' sets the
path of the log files the program creates during evaluation.
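Since NGSE stops when an input line is longer than 'maxLineLength', an input file can be checked beforehand. The following Python sketch is not part of NGSE; the function name is hypothetical. It assumes that the length is counted in characters without the line terminator, which may differ from how NGSE counts.

```python
def lines_exceeding(lines, max_line_length):
    """Return (line_number, length) pairs for lines longer than the limit."""
    too_long = []
    for number, line in enumerate(lines, start=1):
        # Assumption: the line terminator does not count towards the length.
        length = len(line.rstrip("\r\n"))
        if length > max_line_length:
            too_long.append((number, length))
    return too_long
```

Pass an open file handle as 'lines' and the 'maxLineLength' value from your own 'default.properties' as the limit. An empty result means that no line exceeds it.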
NGSE Output Files:
Log directory:
Application log files and a debug log file
Output file:
File containing one row for each read, with 4 columns:
% bases mapped correctly, % bases mapped wrongly, % unmapped bases, read ID
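The per-read rows can be condensed into one summary per mapping program. The following Python sketch is not part of NGSE; the function names are hypothetical. It assumes that the four columns are separated by whitespace and that the percentages are plain numbers without a '%' sign. Adjust the parsing if your output file differs.

```python
def parse_ngse_row(line):
    """Split one output row into three percentages and the read ID."""
    # Assumed column order: correct, wrong, unmapped, read ID.
    correct, wrong, unmapped, read_id = line.split(None, 3)
    return float(correct), float(wrong), float(unmapped), read_id.strip()


def summarize(lines):
    """Return the mean of each percentage column, or None for empty input."""
    totals = [0.0, 0.0, 0.0]
    count = 0
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        correct, wrong, unmapped, _ = parse_ngse_row(line)
        totals[0] += correct
        totals[1] += wrong
        totals[2] += unmapped
        count += 1
    if count == 0:
        return None
    return {
        "reads": count,
        "mean_correct": totals[0] / count,
        "mean_wrong": totals[1] / count,
        "mean_unmapped": totals[2] / count,
    }
```

Calling summarize(open("result")) on the output file of the example call above returns the number of reads and the mean percentage per column.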