


Chromatin immunoprecipitation coupled to next generation sequencing (ChIP-Seq) is a widely used molecular method to investigate the function of chromatin-related proteins by identifying their associated DNA sequences on a genomic scale. ChIP-Seq generates large quantities of data that are difficult to process and analyze. This is particularly true for organisms with contig-based sequenced genomes, which typically have minimal annotation for their genes beyond coordinates predicted primarily by gene-finding programs. Poorly annotated genome sequence makes comprehensive analysis of ChIP-Seq data difficult, and as a result standardized analysis pipelines are lacking.

We present a one-stop computational pipeline, “Rapid Analysis of ChIP-Seq data” (RACS), that combines traditional High-Performance Computing (HPC) techniques with open source tools to process and analyze raw ChIP-Seq data. RACS is an open source computational pipeline available from any of the following repositories or. RACS is particularly useful for ChIP-Seq in organisms with contig-based genomes that have poor gene annotation, where it can aid protein function discovery. To test the performance and efficiency of RACS, we analyzed previously published ChIP-Seq data from the model organism Tetrahymena thermophila, which has a contig-based genome. We assessed the generality of RACS by analyzing a previously published data set generated using the model organism Oxytricha trifallax, whose genome sequence is also contig-based and poorly annotated.

The RACS computational pipeline presented in this report is an efficient and reliable tool for analyzing genome-wide raw ChIP-Seq data generated in model organisms with poorly annotated, contig-based genome sequence. Because RACS segregates the read accumulations it finds into genic and intergenic regions, it is particularly efficient for rapid downstream analyses of proteins involved in gene expression.

In the last few years, traditional HPC centers, such as SciNet at the University of Toronto, have seen a growing number of workflows from disciplines that are not typical users of computational science. Among these, bioinformatics-related disciplines appear to be the most prominent, both in their demand for resources and in the complexity of the biological questions they tackle, for example understanding the mechanisms that underlie transcription. Some of these biological questions are being answered by Next Generation Sequencing (NGS). For example, NGS-based methodologies are helping to address biological questions in efforts such as the Human Genome Project and the Human Microbiome Project, as well as through RNA-Seq to analyze gene expression and ChIP-Seq to assess genome-wide DNA-binding sites. The advantage of these NGS methodologies for researchers is that high-throughput sequencing allows millions of DNA molecules to be read at the same time. The output of NGS is therefore substantial and can be overwhelming to analyze. These analyses are facilitated in model organisms with well-annotated genomes, such as humans and yeast, whose genomic sequence is available in full chromosomal form, with the DNA sequence of each chromosome provided as an individual file.
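The idea of segregating read accumulations into genic and intergenic regions on a contig-based genome can be illustrated with a short sketch. This is not the RACS implementation. It is a hypothetical Python illustration that makes several assumptions. Coordinates are half-open `[start, end)` intervals on named contigs. Gene coordinates come from a gene-finding program. An accumulation counts as genic if it overlaps any predicted gene on the same contig by at least one base. The names `Interval`, `GeneIndex`, `segregate`, and the contig labels are invented for this example.

```python
from bisect import bisect_left
from collections import defaultdict
from itertools import accumulate
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Interval(NamedTuple):
    """Hypothetical record: half-open interval [start, end) on a named contig."""
    contig: str
    start: int
    end: int


class GeneIndex:
    """Per-contig index of (assumed) predicted gene coordinates for overlap queries."""

    def __init__(self, genes: Iterable[Interval]) -> None:
        by_contig: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
        for gene in genes:
            by_contig[gene.contig].append((gene.start, gene.end))
        self._starts: Dict[str, List[int]] = {}
        self._max_end: Dict[str, List[int]] = {}
        for contig, spans in by_contig.items():
            spans.sort()  # sort genes by start coordinate
            self._starts[contig] = [start for start, _ in spans]
            # Running maximum of gene ends, so nested or overlapping
            # predicted genes are still detected by a single lookup.
            self._max_end[contig] = list(accumulate((end for _, end in spans), max))

    def overlaps_gene(self, region: Interval) -> bool:
        starts = self._starts.get(region.contig)
        if not starts:
            return False  # contig has no predicted genes
        # Genes at indices [0, i) all start before the region ends.
        i = bisect_left(starts, region.end)
        # One of them overlaps iff the largest end among them passes region.start.
        return i > 0 and self._max_end[region.contig][i - 1] > region.start


def segregate(
    regions: Iterable[Interval], genes: Iterable[Interval]
) -> Tuple[List[Interval], List[Interval]]:
    """Split read accumulations into (genic, intergenic) lists."""
    index = GeneIndex(genes)
    genic: List[Interval] = []
    intergenic: List[Interval] = []
    for region in regions:
        (genic if index.overlaps_gene(region) else intergenic).append(region)
    return genic, intergenic
```

For example, with genes at `contig_1:100-500` and `contig_1:800-1200`, an accumulation at `contig_1:450-600` would be classified as genic and one at `contig_1:600-700` as intergenic. The prefix maximum of gene ends is a design choice that keeps each query at a single binary search while still handling predicted genes that overlap or nest.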
