
Center for Integrative Bioinformatics Vienna
Max F. Perutz Laboratories
Dr. Bohr Gasse 9
A-1030 Vienna, Austria


Welcome to Epi-Speller

Introduction:

Epi-Speller is a program for analyzing multiple genome-wide epigenomic profiling data sets. It includes:

Signal discretization based on automatic inference of cut-offs for genome-wide signal levels.

Clustering based on the letter representation.

Summarizing the frequent signals as sequence logos using WebLogo.

Availability:

Source code:

Epi-Speller package

Installation:

Unzip the package and compile the epi_letter.cpp program with the following command (run in the same folder):

g++ epi_letter.cpp -o epi_letter

How to use?

Input data:

A list of genomic coordinates (e.g. windows, bins, tiles, ...) and the corresponding signals (microarray intensities or numbers of mapped short reads, with or without normalization) for the chromatin marks, in the following tab-separated format:

- First row: Names of chromatin marks

- Each following row: the genomic coordinate (format start:end), followed by the corresponding signals for the chromatin marks named in the first row.

- Example input file: example.txt consists of 21K tiles from 12 tiling arrays for histone modification marks and DNA methylation in Arabidopsis.
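
The input format described above can be read with a few lines of Python. This is only an illustrative sketch and not part of the Epi-Speller package: the function name read_signal_table, the mark names markA/markB and the sample values are made up, and it assumes that coordinates are integers and that the header row may or may not carry a label above the coordinate column.

```python
import csv
import io

def read_signal_table(handle):
    """Parse the tab-separated table: a header row of mark names, then one
    row per tile holding 'start:end' followed by one signal per mark."""
    rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    if len(rows) < 2:
        raise ValueError("expected a header row and at least one data row")
    header, body = rows[0], rows[1:]
    n_marks = len(body[0]) - 1
    if n_marks < 1:
        raise ValueError("each data row needs a coordinate and at least one signal")
    # Assumption: the header may or may not have a label above the
    # coordinate column, so keep only its last n_marks entries.
    marks = header[-n_marks:]
    tiles = []
    for row in body:
        start, end = (int(part) for part in row[0].split(":"))
        signals = [float(value) for value in row[1:]]
        if len(signals) != n_marks:
            raise ValueError(f"tile {row[0]} has {len(signals)} signals, expected {n_marks}")
        tiles.append(((start, end), signals))
    return marks, tiles

# Made-up two-mark example, not taken from example.txt.
example = "markA\tmarkB\n1000:1500\t2.5\t-0.3\n1500:2000\t0.1\t1.8\n"
marks, tiles = read_signal_table(io.StringIO(example))
print(marks)
print(tiles[0])
```

Checking the number of signals per row against the header catches truncated or misaligned rows before they reach the discretization step.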

Running Epi-Speller step by step

Grouping of epigenetic signatures with input data example.txt
Assigning epi-letters

R --vanilla < alphabet_chrom.R --f <input_file> --k <number_of_epi_letter> --d <dictionary_file> --r <0|1> --o <multiple_epigenome_filename>

Example: R --vanilla < alphabet_chrom.R --f example.txt --k 3 --d epi_letter.dict --r 0 --o example.epi

Create a text file with the acronyms you want to use for the epi-letters, one letter per row, and pass it with the --d parameter (e.g. epi_letter.dict). The --r parameter controls whether random epi-letter-represented epigenomes are created (0 = no, 1 = yes; default 0).

This step creates the multiple epigenomes for all chromatin marks in epi-letter representation in a single file (the --o parameter sets the output file).

It also creates the look-up dictionary (.dict), which lists all tiles with their coordinates, signals and assigned letter_ID, and an epi-letter string file (.dna) for each individual mark. The coordinate file (.coor) is created for use in the next step.
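
To give a feel for what turning signals into epi-letters means, here is a minimal Python sketch. It is not the method implemented in alphabet_chrom.R: the cut-offs here are plain quantiles, swapped in as a stand-in for the automatic cut-off inference, and the function names, the letters L/M/H and the signal values are assumptions made for illustration.

```python
import bisect

def quantile_cutoffs(values, k):
    """Return k-1 cut-offs that split the sorted values into k groups of
    roughly equal size (a stand-in, not Epi-Speller's inference method)."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]

def assign_letters(values, letters):
    """Map each signal to a letter; letters are ordered from low to high."""
    cutoffs = quantile_cutoffs(values, len(letters))
    # bisect_right counts the cut-offs that are <= the value, which is the
    # index of the letter for that value.
    return "".join(letters[bisect.bisect_right(cutoffs, v)] for v in values)

# Made-up signals for one mark along nine consecutive tiles.
signals = [5.0, 0.1, 1.1, 0.3, 6.0, 1.0, 0.2, 5.5, 1.2]
epi_string = assign_letters(signals, "LMH")
print(epi_string)
```

With k = 3 each tile ends up as Low, Middle or High, which gives one letter string per mark in the same spirit as the .dna files.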

Searching/clustering for epigenetic signatures, using either conventional profiling signals or the epi-letter representation, as follows:

3.1 Scanning for the epigenetic patterns

perl epimotif_scanning.pl -f <multiple_epigenome_filename> (currently only column patterns are supported)

Example: perl epimotif_scanning.pl -f example.epi

This creates a file with the suffix ".cols" that lists all column patterns, and a file with the suffix ".cols.freq" that gives the frequency of appearance of each pattern.
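
The idea of a column pattern can be sketched in Python as follows. This is an illustration only, not a reimplementation of epimotif_scanning.pl: it assumes that each mark is given as one letter string of equal length (one letter per tile), and the function name and the three short strings are made up.

```python
from collections import Counter

def column_patterns(mark_strings):
    """Read the letter strings column-wise: the pattern of a tile is the
    sequence of letters that all marks carry at that tile."""
    if len({len(s) for s in mark_strings}) != 1:
        raise ValueError("all mark strings must cover the same number of tiles")
    return ["".join(column) for column in zip(*mark_strings)]

# Made-up epi-letter strings for three marks over four tiles.
epigenome = ["HHLH", "LHLL", "MHLM"]
cols = column_patterns(epigenome)
freq = Counter(cols)  # how often each column pattern appears
print(cols)
print(freq.most_common())
```

Keeping both the ordered list (cols) and the counts (freq) mirrors the split between listing every tile's pattern and summarizing how often each pattern occurs.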

3.2 Use R to make a file of unique column patterns; removing the repeated patterns makes the computation of Hamming distances between patterns more efficient. For example:

write.table(unique(read.table("example.epi.cols.freq")), "example.epi.cols.freq.uniq", sep = "\t", quote=F, row.names=F, col.names=F)

Or use the shell command line as follows:

sort example.epi.cols.freq | uniq > example.epi.cols.freq.uniq

example.epi.cols.freq.uniq is the file of unique column patterns. The original pattern file (example.epi.cols) is still needed to trace the patterns back to their locations in the genome.

3.3 Computing Hamming distance matrix for clustering

perl hamming_distance.pl -f <column_pattern_file>

Example: perl hamming_distance.pl -f example.epi.cols.freq.uniq

This outputs the .hamming file, which can be used for clustering, for example with the k-means method in R in the next step.
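
For reference, the Hamming distance between two patterns of equal length is the number of positions at which their letters differ. The following Python sketch computes the pairwise matrix for a few made-up patterns; it illustrates the definition and makes no claim about the layout of the .hamming file written by hamming_distance.pl.

```python
def hamming(a, b):
    """Number of positions at which two equal-length patterns differ."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same length")
    return sum(x != y for x, y in zip(a, b))

def hamming_matrix(patterns):
    """Symmetric matrix of pairwise Hamming distances."""
    n = len(patterns)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = hamming(patterns[i], patterns[j])
    return matrix

# Made-up unique column patterns over three marks.
patterns = ["HLM", "HHH", "LLL"]
for row in hamming_matrix(patterns):
    print(row)
```

The number of pairs grows quadratically with the number of patterns, which is why repeated patterns are removed before this step.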

3.4 Clustering

R --vanilla < try_clustering.R --f <hamming_distance_file> --u <unique_pattern_file> --c <column_pattern_file> --k <number_of_cluster>

Example: R --vanilla < try_clustering.R --f example.epi.cols.freq.uniq.hamming --u example.epi.cols.freq.uniq --c example.epi.cols --k 4

This outputs one file per cluster (named cluster_xx, where xx is the cluster_id) consisting of the patterns, coordinates and cluster_id. It also extracts the patterns (the 2nd column in the .logo file) for the logo representation in the next step.
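
As a rough illustration of clustering on a distance matrix, the Python sketch below groups patterns with a very simple k-medoids loop. Note that this is a different technique from the k-means used in try_clustering.R, chosen here only because it works directly on pairwise distances; the function name, the naive choice of the first k patterns as starting medoids and the example matrix are all assumptions, and the result depends on that starting choice.

```python
def k_medoids(dist, k, max_iter=100):
    """Simple k-medoids on a precomputed distance matrix.
    Assumes distinct patterns, i.e. dist[i][j] > 0 whenever i != j."""
    n = len(dist)
    medoids = list(range(k))  # naive start: the first k patterns
    labels = [0] * n
    for _ in range(max_iter):
        # Assign every pattern to its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster became empty
                new_medoids.append(medoids[c])
                continue
            # The new medoid is the member closest to all other members.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels, medoids

# Hamming distances between the made-up patterns HHH, LLL, HHM, LLM.
dist = [[0, 3, 1, 3],
        [3, 0, 3, 1],
        [1, 3, 0, 2],
        [3, 1, 2, 0]]
labels, medoids = k_medoids(dist, k=2)
print(labels, medoids)
```

In this toy case the mostly-High patterns end up in one cluster and the mostly-Low patterns in the other.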

Logo representation using the WebLogo 3.2 program (download the source code or here; you have to unzip the files to use it)

Example: ./weblogo-3.2/weblogo --format pdf --ylabel '' --show-xaxis no --alphabet 'LMH' --errorbars no --color red H 'High' --color green L 'Low' --color blue M 'Middle' <cluster_1.logo >cluster_1.pdf

If everything works, this produces the logo for the input cluster 1, which should look like cluster_1.pdf.
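
A sequence logo is built from the letter frequencies at each position of the aligned patterns. The short Python sketch below computes these frequencies for a made-up cluster; it only illustrates what the logo summarizes and does not reproduce WebLogo, which additionally scales the letter heights by information content. The function name and the patterns are assumptions.

```python
from collections import Counter

def position_frequencies(patterns, alphabet="LMH"):
    """Relative frequency of each letter at each position of the patterns."""
    length = len(patterns[0])
    if any(len(p) != length for p in patterns):
        raise ValueError("all patterns must have the same length")
    table = []
    for pos in range(length):
        counts = Counter(p[pos] for p in patterns)
        table.append({letter: counts[letter] / len(patterns) for letter in alphabet})
    return table

# Made-up patterns of one cluster (one letter per chromatin mark).
cluster = ["HLM", "HLL", "HHM", "HLM"]
freqs = position_frequencies(cluster)
for pos, row in enumerate(freqs, start=1):
    print(pos, row)
```

A position where one letter has frequency close to 1 shows up as a single tall letter in the logo, while mixed positions show several shorter letters.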

References:

Dinh HQ, Mittelsten Scheid O, von Haeseler A. Epi-Speller - a bioinformatic tool for epigenomic signature discovery. (submitted)

Crooks et al., WebLogo: a sequence logo generator. Genome Res. 2004 Jun;14(6):1188-90.

Roudier et al., Integrative epigenomic mapping defines four main chromatin states in Arabidopsis. EMBO J. 2011 May 18;30(10):1928-38.

Note:

Please let us know if you download this program by sending an email to {huy.dinh,arndt.von.haeseler}@univie.ac.at
