Journal Articles Nucleic Acids Research Year : 2012

A Fast Ab Initio Method for Predicting miRNA Precursors in Genomes


MicroRNAs (miRNAs) are non-coding RNAs only 21-25 nt long that are present in all sequenced higher eukaryotes ([1]). miRNA genes are cleaved into 40-940 nt long precursors of miRNA sequences (pre-miRNAs). Pre-miRNAs, structured as hairpins, are transported into the cytoplasm and cleaved into mature miRNAs ([1]). They act as negative regulators of gene expression by binding to specific mRNA targets ([1]). Bioinformatics methods that predict pre-miRNAs fall into three approaches: comparative genomics, homology-based approaches and ab initio approaches. Comparative genomics and homology-based approaches cannot detect miRNAs of unknown families and/or miRNAs with no close homologs in genomes. Furthermore, comparative approaches do not work on new genomes that lack a closely related sequenced species. Ab initio methods are therefore needed to predict new miRNAs in genomes. To our knowledge, there are very few ab initio algorithms that search for pre-miRNA structures in whole genomes, and all are specific to one or a few genomes. We present a new ab initio method, called miRNAFold, for predicting pre-miRNA structures in any genome. Our method considers a sliding window of a given size L, sufficiently long to contain a pre-miRNA. In a first step, we search for long exact Watson-Crick stems that satisfy some criteria. In a second step, we extend the selected stem to obtain the longest symmetrical non-exact Watson-Crick stem satisfying some criteria. This longest symmetrical non-exact stem can correspond to a large portion of a pre-miRNA. Possible pre-miRNA hairpins are then searched for in the subsequence associated with the selected symmetrical non-exact stem. At each step, several selection criteria are used, corresponding to several features observed on the exact stems, the symmetrical non-exact stems and the hairpins. Some of these criteria, for example G; ratio A, U, C and G, are also used in ([2,4]).
Because a miRNA hairpin can present some of these features but not all, an exact stem, a symmetrical non-exact stem or a hairpin is selected when a certain percentage of the criteria are verified. This percentage is a parameter that can be set by the user. We compared our algorithm miRNAFold with RNALFold ([3]), which searches genomic sequences for all possible non-coding RNA secondary structures, including hairpins. We thus compared the hairpins predicted by RNALFold with the ones predicted by miRNAFold. We used RNALFold version 1.8.4, downloaded from the Vienna RNA Package (www.tbi.univie.ac.at/RNA/), and ran it with its default parameters. We used a sliding window of 150 nt for each of the two programs. We tested miRNAFold and RNALFold on human, mouse, zebrafish and sea squirt genomic sequences. Each sequence contains a cluster of several known miRNAs. miRNAFold was run with a threshold of 70% for the minimum percentage of verified criteria. miRNAFold achieves better sensitivity and selectivity than RNALFold on the human, mouse, zebrafish and sea squirt genomic sequences. Moreover, miRNAFold is the faster of the two algorithms. Its average execution time is 57 seconds for a sequence of 1 million nucleotides, whereas RNALFold has an average execution time of 5 minutes and 46 seconds. miRNAFold is thus about 6 times faster than RNALFold.
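
The first step of the method (exact Watson-Crick stems in a sliding window, kept when a given percentage of criteria is verified) can be illustrated with a minimal Python sketch. This is not the miRNAFold implementation: the function names, the minimum stem and loop lengths, the brute-force search and the three example criteria in the usage note are all assumptions made for illustration. Only the 150 nt window and the 70% threshold come from the text above.

```python
from typing import Callable, List, Sequence, Set, Tuple

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

# (5' start, 3' end, number of base pairs); position start+k pairs with end-k.
Stem = Tuple[int, int, int]


def exact_stems(window: str, min_pairs: int = 4, min_loop: int = 3) -> List[Stem]:
    """Brute-force search for maximal exact Watson-Crick stems in one window.

    min_pairs and min_loop are illustrative defaults, not values from the paper.
    """
    n = len(window)
    stems: List[Stem] = []
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            # Skip stems that could be extended outward: they are not maximal.
            if i > 0 and j + 1 < n and (window[i - 1], window[j + 1]) in WATSON_CRICK:
                continue
            k = 0
            # Extend inward while bases pair and at least min_loop bases stay unpaired.
            while i + k < j - k - min_loop and (window[i + k], window[j - k]) in WATSON_CRICK:
                k += 1
            if k >= min_pairs:
                stems.append((i, j, k))
    return stems


def passes(candidate: str,
           criteria: Sequence[Callable[[str], bool]],
           min_fraction: float = 0.70) -> bool:
    """Keep a candidate when at least min_fraction of the criteria are verified.

    criteria is assumed to be non-empty.
    """
    verified = sum(1 for criterion in criteria if criterion(candidate))
    return verified / len(criteria) >= min_fraction


def scan(sequence: str,
         criteria: Sequence[Callable[[str], bool]],
         window_size: int = 150,
         step: int = 1) -> List[Stem]:
    """Slide a window over the sequence and collect the stems that pass the criteria."""
    rna = sequence.upper().replace("T", "U")  # accept genomic DNA input
    hits: Set[Stem] = set()                   # overlapping windows report the same stem
    last_start = max(0, len(rna) - window_size)
    for start in range(0, last_start + 1, step):
        window = rna[start:start + window_size]
        for i, j, k in exact_stems(window):
            # The criteria are applied to the 5' strand of the stem.
            if passes(window[i:i + k], criteria):
                hits.add((start + i, start + j, k))
    return sorted(hits)
```

A call might look like `scan(genome, criteria)`, where `criteria` is a list of predicates on the stem sequence, for instance a minimum length, a minimum G+C fraction and the absence of ambiguous bases (all hypothetical). Stems are maximal only within their window, so a stem cut by a window edge may also be reported in a shorter form.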

Dates and versions

hal-00667075, version 1 (06-02-2012)



Sébastien Tempel, Fariza Tahi. A Fast Ab Initio Method for Predicting miRNA Precursors in Genomes. Nucleic Acids Research, 2012, 40 (11), pp.e80. ⟨10.1093/nar/gks146⟩. ⟨hal-00667075⟩