Institute for Molecular Biotechnology, Jena (Germany)
Much effort is currently being invested in making gene prediction more accurate. Traditionally, techniques such as the analysis of sequence composition statistics
or homology searches against protein or EST databases are employed. Only recently, with the launch of comparative genomic sequencing projects,
has a novel method for the prediction of exons, genes and regulatory elements become available.
Gapped alignments of homologous genomic sequences from different species at an appropriate phylogenetic distance usually show a patchwork pattern of
conserved and less conserved fragments, the so-called phylogenetic footprint. Simple filtering based on fragment length and match percentage retrieves
a subset of candidate exons. To optimize the filter parameters, we extracted samples of annotated homologous human/rodent genomic sequences from GenBank. We
show that optimized filtering of alignments alone achieves an approximate correlation (AC) value between predicted and true exons of at least 0.73, a value
in the range of that obtained by customary prediction methods. For the prediction of complete gene models, candidate exons are corrected such that the quality of the sequence signals and
the similarity of the predicted amino acid sequences are maximized.
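
The filtering and scoring steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the fragment representation, the function names and the thresholds (`min_length`, `min_identity`) are hypothetical, since the abstract does not state the optimized values. The AC value is computed at the nucleotide level using the commonly cited definition, AC = 2 (ACP - 0.5), where ACP averages the four conditional probabilities TP/(TP+FN), TP/(TP+FP), TN/(TN+FP) and TN/(TN+FN). Skipping undefined terms is an assumption of this sketch.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical representation of a conserved alignment fragment:
# (start, end_exclusive, percent_identity) in coordinates of one sequence.
Fragment = Tuple[int, int, float]
Interval = Tuple[int, int]


def filter_fragments(fragments: Iterable[Fragment],
                     min_length: int = 50,
                     min_identity: float = 70.0) -> List[Interval]:
    """Keep fragments that are long enough and similar enough.

    The default thresholds are placeholders, not values from the source.
    """
    return [(start, end)
            for start, end, identity in fragments
            if end - start >= min_length and identity >= min_identity]


def _covered_positions(intervals: Iterable[Interval]) -> Set[int]:
    """Return the set of nucleotide positions covered by the intervals."""
    positions: Set[int] = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions


def approximate_correlation(predicted: Iterable[Interval],
                            true: Iterable[Interval],
                            seq_length: int) -> float:
    """Nucleotide-level approximate correlation (AC) of two exon sets."""
    pred = _covered_positions(predicted)
    real = _covered_positions(true)
    tp = len(pred & real)               # predicted and truly coding
    fp = len(pred - real)               # predicted but non-coding
    fn = len(real - pred)               # coding but missed
    tn = seq_length - tp - fp - fn      # correctly left unpredicted

    # Average only the conditional probabilities that are defined
    # (assumption of this sketch for zero denominators).
    terms = [num / den
             for num, den in ((tp, tp + fn), (tp, tp + fp),
                              (tn, tn + fp), (tn, tn + fn))
             if den > 0]
    if not terms:
        raise ValueError("AC is undefined for an empty sequence")
    acp = sum(terms) / len(terms)
    return (acp - 0.5) * 2
```

In practice the two thresholds would be tuned on annotated sequence pairs, for example by a grid search that keeps the combination giving the highest AC against the annotation.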
The examples studied in human, mouse and Fugu suggest that similarity-based gene prediction may become a valid alternative to the more conventional methods.