Although tens of thousands of mouse lncRNAs have been annotated in existing databases (RefSeq, Gencode and the UCSC mouse annotation) and in publications, many are poorly defined and have little or no expression data. To address this problem, we developed a stringent computational pipeline to reliably identify the lncRNAs represented on the Arraystar Mouse LncRNA Microarray V3.0.
Genome-wide Discovery of LncRNAs from Databases by Arraystar Scientists
Arraystar's computational approach first removes transcripts that match known coding RNAs, small ncRNAs, and structural RNAs such as tRNAs and rRNAs. Next, an evaluation step checks whether each candidate transcript contains a significant open reading frame (ORF). Finally, only multiexonic transcripts longer than 200 nt are retained.
Figure 1. Reliable collection of LncRNAs from databases. RNAs collected from databases are taken as input. Next, transcripts with known coding, small ncRNA or structural RNA annotations, or with positive coding potential, are filtered out. Finally, transcripts shorter than 200 nt are discarded, and only LncRNAs that pass the final evaluation step are included on the Arraystar Mouse LncRNA Microarray V3.0.
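The three filtering steps described above can be sketched in a few lines of Python. This is only an illustration, not Arraystar's actual pipeline: the `Transcript` record, the list of excluded annotation labels, and the 300 nt ORF cutoff are assumptions chosen for the example, and a simple longest-ORF scan stands in for whatever coding-potential evaluation the real pipeline uses.

```python
from dataclasses import dataclass

STOP_CODONS = {"TAA", "TAG", "TGA"}
# Hypothetical annotation labels for transcripts that should be filtered out.
EXCLUDED_ANNOTATIONS = {"protein_coding", "tRNA", "rRNA", "snRNA", "snoRNA", "miRNA"}


@dataclass
class Transcript:
    name: str
    sequence: str
    exon_count: int
    annotation: str = "unknown"


def longest_orf_nt(seq: str) -> int:
    """Length (nt) of the longest ATG..stop ORF in the three forward frames."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def is_lncrna_candidate(t: Transcript, max_orf_nt: int = 300, min_len: int = 200) -> bool:
    """Apply the three filters in order; thresholds are illustrative assumptions."""
    if t.annotation in EXCLUDED_ANNOTATIONS:      # step 1: known annotations
        return False
    if longest_orf_nt(t.sequence) >= max_orf_nt:  # step 2: significant ORF
        return False
    # step 3: multiexonic and longer than 200 nt
    return t.exon_count >= 2 and len(t.sequence) > min_len


candidates = [
    Transcript("noncoding_example", "ACGT" * 60, exon_count=2),
    Transcript("orf_example", "ATG" + "GCC" * 120 + "TAA", exon_count=3),
    Transcript("short_example", "ACGT" * 40, exon_count=2),
]
kept = [t.name for t in candidates if is_lncrna_candidate(t)]
```

In this toy input only `noncoding_example` is kept: `orf_example` carries a 366 nt ORF and `short_example` is 160 nt long.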
Reliable Collection of LncRNAs from Publications
Intergenic lncRNAs (lincRNAs) identified by Guttman et al.
The mouse genome encodes 3,289 large intergenic non-coding RNAs (LincRNAs) that are clearly conserved across mammals, which suggests that they are functional. Each LincRNA is named after the nearest protein-coding gene on its 3’ side. Gene expression patterns have implicated these LincRNAs in diverse biological processes, including cell-cycle regulation, immune surveillance, and embryonic stem cell pluripotency. These 3,289 LincRNAs are represented on the Arraystar Mouse LncRNA Microarray V3.0.
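The naming rule can be illustrated with a small Python sketch. It assumes that "3’" is read relative to the LincRNA's own strand, that coordinates are simple start/end positions on one chromosome, and that names take a `linc-<gene>` form. These assumptions, the `Feature` record, and the gene names are all hypothetical and not taken from the array annotation.

```python
from typing import List, NamedTuple, Optional


class Feature(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def nearest_three_prime_gene(lnc: Feature, genes: List[Feature]) -> Optional[Feature]:
    """Return the closest protein-coding gene lying 3' of the LincRNA, if any."""
    best, best_dist = None, None
    for gene in genes:
        if gene.chrom != lnc.chrom:
            continue
        # On the + strand, 3' means higher coordinates; on the - strand, lower.
        dist = gene.start - lnc.end if lnc.strand == "+" else lnc.start - gene.end
        if dist < 0:
            continue  # gene overlaps the LincRNA or lies on its 5' side
        if best_dist is None or dist < best_dist:
            best, best_dist = gene, dist
    return best


def linc_name(lnc: Feature, genes: List[Feature]) -> Optional[str]:
    """Build an illustrative 'linc-<gene>' name; None if no 3' gene is found."""
    gene = nearest_three_prime_gene(lnc, genes)
    return f"linc-{gene.name}" if gene else None


genes = [
    Feature("GeneA", "chr1", 500, 900, "+"),
    Feature("GeneB", "chr1", 2500, 3000, "+"),
    Feature("GeneC", "chr1", 5000, 6000, "-"),
    Feature("GeneD", "chr2", 2100, 2200, "+"),
]
plus_name = linc_name(Feature("example", "chr1", 1000, 2000, "+"), genes)
minus_name = linc_name(Feature("example", "chr1", 1000, 2000, "-"), genes)
```

With these toy coordinates the same locus is named after `GeneB` on the plus strand and after `GeneA` on the minus strand, which shows why the strand has to be known before the rule can be applied.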
Ultra-conserved Regions encoding LncRNAs (T-UCRs)
The UCRs are a subset of conserved sequences located in both intragenic and intergenic regions. Bejerano et al. identified 481 UCRs that are absolutely conserved (100% identity) between orthologous regions of the human, rat and mouse genomes. UCRs are frequently located at fragile sites and in genomic regions involved in cancers. A large fraction of UCRs encode a particular set of ncRNAs (T-UCRs) whose expression is altered in human cancers. To help discover potential non-coding transcripts from these regions, we have designed 962 probes to target both strands of these UCRs (http://users.soe.ucsc.edu/~jill/ultra.html).
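The probe count follows from targeting both strands: 481 UCRs × 2 strands = 962. The Python sketch below shows only that bookkeeping, pairing each UCR sequence with its reverse complement. The identifiers and sequences are placeholders, and real probe design (probe length, melting temperature, specificity checks) is not modelled here.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def strand_targets(ucrs: dict) -> list:
    """Return one (id, strand, sequence) target per strand of each UCR."""
    targets = []
    for ucr_id, seq in ucrs.items():
        targets.append((ucr_id, "+", seq))
        targets.append((ucr_id, "-", reverse_complement(seq)))
    return targets


# Placeholder input: 481 hypothetical UCR records with dummy sequences.
example_ucrs = {f"ucr_example_{i}": "AACGTTGCA" for i in range(1, 482)}
targets = strand_targets(example_ucrs)
```

Applied to 481 input records, `strand_targets` returns 962 entries, matching the number of probes stated above.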
1. Guttman, M., et al., Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature, 2009. 458(7235): p. 223-7.
2. Khalil, A.M., et al., Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc Natl Acad Sci U S A, 2009. 106(28): p. 11667-72.
3. Ramos, A.D., et al., Integration of Genome-wide Approaches Identifies lncRNAs of Adult Neural Stem Cells and Their Progeny In Vivo. Cell Stem Cell, 2013. 12(5): p. 616-28.
4. Sigova, A.A., et al., Divergent transcription of long noncoding RNA/mRNA gene pairs in embryonic stem cells. Proc Natl Acad Sci U S A, 2013. 110(8): p. 2876-81.
5. Bejerano, G., et al., Ultraconserved elements in the human genome. Science, 2004. 304(5675): p. 1321-5.
6. Willingham, A.T., et al., A strategy for probing the function of noncoding RNAs finds a repressor of NFAT. Science, 2005. 309(5740): p. 1570-3.
7. Mercer, T.R., et al., Expression of distinct RNAs from 3' untranslated regions. Nucleic Acids Res, 2011. 39(6): p. 2393-403.
8. Pruitt, K.D., T. Tatusova, and D.R. Maglott, NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res, 2005. 33(Database issue): p. D501-4.
9. Pang, K.C., et al., RNAdb--a comprehensive mammalian noncoding RNA database. Nucleic Acids Res, 2005. 33(Database issue): p. D125-30.
10. Pang, K.C., et al., RNAdb 2.0--an expanded database of mammalian non-coding RNAs. Nucleic Acids Res, 2007. 35(Database issue): p. D178-82.
11. Carninci, P., et al., The transcriptional landscape of the mammalian genome. Science, 2005. 309(5740): p. 1559-63.
12. Dinger, M.E., et al., NRED: a database of long noncoding RNA expression. Nucleic Acids Res, 2009. 37(Database issue): p. D122-6.
13. Benson, D.A., et al., GenBank: update. Nucleic Acids Res, 2004. 32(Database issue): p. D23-6.
14. Hsu, F., et al., The UCSC Known Genes. Bioinformatics, 2006. 22(9): p. 1036-46.
15. Mercer, T.R., et al., Specific expression of long noncoding RNAs in the mouse brain. Proc Natl Acad Sci U S A, 2008. 105(2): p. 716-21.
16. Amaral, P.P., et al., lncRNAdb: a reference database for long noncoding RNAs. Nucleic Acids Res, 2011. 39(Database issue): p. D146-51.