Availability
The first release of GRC is now available for download. It currently requires a Linux OS with Perl (included in almost all distributions) and, if you want to compile from source, g++ and Make. SVN checkout is available through VBI's GForge (registration required). A copy of the user's guide can be found here. Any comments or questions are welcome (anwarren@vt.edu).

Publication
(To reference this work please cite the following)
The Genome Reverse Compiler: an explorative annotation tool
Andrew S Warren and Joao C Setubal BMC Bioinformatics 2009, 10:35

Description
The Genome Reverse Compiler (GRC) is an annotation tool for prokaryotic genomes. Its name and philosophy are based on an analogy with a compiler for a high-level programming language. In this analogy, the genome is a program written in a low-level language that humans cannot understand. Given the sequence of any prokaryotic genome, GRC should output the corresponding "high-level program": its annotation. GRC works in a completely automatic manner, using standard input and output formats. The goal is to provide an open-source, easy-to-run, very efficient annotation program.

Automated annotation pipelines are common in laboratories that work on genomes, but to our knowledge there is no other open-source, stand-alone, easy-to-run, lab-independent annotation tool; therein lies GRC's novelty. GRC finds genes in prokaryotic genomes and assigns them functional annotations based on sequence similarity. In addition to the genome sequence, GRC requires a multiFASTA file of annotated genes. GRC performs best when these genes are well annotated and come from organisms closely related to the target genome.

To perform sequence comparisons, and ultimately function assignment, GRC incorporates an open-source version of BLAST (FSA-BLAST). First, all possible ORFs are generated for the given genome. The translated sequences are then checked for sequence similarity against the user-specified database using FSA-BLAST. Some ORFs are then discarded, and start sites are adjusted, based on overlap, BLAST score, length, and sequence composition. The resulting putative genes are assigned functions based on the annotations of their best BLAST hits and, when Gene Ontology is used, on the consensus of these functions. GRC also provides a test module to easily gauge performance against reference annotations.
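
The first step above, generating candidate ORFs, can be illustrated with a short Python sketch. This is not GRC's code: the function names, the stop-to-stop definition of an ORF, the standard stop codons (TAA, TAG, TGA), and the `min_codons` threshold are all assumptions made for illustration. It scans the three reading frames of each strand and reports every stretch between consecutive stop codons that meets the length threshold.

```python
# Illustrative sketch only; NOT GRC's implementation.
# Assumptions: standard stop codons, an ORF is a stop-to-stop open stretch,
# and stretches that run off the end of the sequence are ignored.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_codons):
    """Yield (start, end) for open stretches in the three frames of seq.

    Coordinates are 0-based and end-exclusive, relative to seq.
    The terminating stop codon is not included in the stretch.
    """
    for frame in range(3):
        start = frame
        for pos in range(frame, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    yield start, pos
                start = pos + 3  # next stretch begins after the stop


def all_orfs(genome, min_codons=30):
    """Return (strand, start, end, dna) tuples for all six reading frames.

    start and end are always forward-strand coordinates; dna is read 5'->3'
    on the strand where the ORF lies. min_codons is a hypothetical cutoff.
    """
    genome = genome.upper()
    n = len(genome)
    results = []
    for start, end in orfs_in_strand(genome, min_codons):
        results.append(("+", start, end, genome[start:end]))
    rc = reverse_complement(genome)
    for start, end in orfs_in_strand(rc, min_codons):
        # Map reverse-strand coordinates back onto the forward strand.
        results.append(("-", n - end, n - start, rc[start:end]))
    return results
```

In a pipeline like the one described, each reported stretch would then be translated and searched against the user-supplied database; the filtering and start-site adjustment that follow depend on BLAST results and are not attempted here.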