LncRNA Research

V3.0 LncRNA Collection


Although tens of thousands of mouse lncRNAs have been annotated in public databases (RefSeq, GENCODE and the UCSC mouse annotation) and in publications, many are poorly defined and have little or no supporting expression data. To address this, we developed a stringent computational pipeline to reliably identify the lncRNAs represented on the Arraystar Mouse LncRNA Microarray V3.0.

Genome-wide Discovery of LncRNAs from Databases by Arraystar Scientists

Arraystar's computational approach first removes transcripts that match known coding RNAs, small ncRNAs and structural RNAs such as tRNAs and rRNAs. Next, each remaining candidate transcript is evaluated for a significant open reading frame (ORF), and transcripts with coding potential are discarded. Finally, only multi-exonic transcripts longer than 200 nt are retained.


Figure 1. Reliable collection of lncRNAs from databases. Transcripts collected from the databases serve as input and are first filtered against known annotations and for positive coding potential. Transcripts shorter than 200 nt are then discarded, and only lncRNAs that pass this final evaluation step are included on the Arraystar Mouse LncRNA Microarray V3.0.
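The three filtering steps can be sketched as a single predicate applied to each candidate transcript. This is a minimal illustration, not Arraystar's actual pipeline: the `Transcript` record, the biotype labels in `EXCLUDED_BIOTYPES` and the 100-codon cutoff for a "significant" ORF are all assumptions, and the ORF scan only checks the three forward frames. Only the 200 nt length limit and the multi-exonic requirement come from the text.

```python
from dataclasses import dataclass

# Hypothetical transcript record; the field names are illustrative assumptions.
@dataclass
class Transcript:
    id: str
    sequence: str    # spliced transcript sequence, 5'->3', uppercase A/C/G/T
    exon_count: int
    biotype: str     # annotation label carried over from the source database

# ASSUMED labels for known coding RNAs, small ncRNAs and structural RNAs.
EXCLUDED_BIOTYPES = {"protein_coding", "miRNA", "snoRNA", "snRNA", "tRNA", "rRNA"}
STOP_CODONS = {"TAA", "TAG", "TGA"}
MIN_LENGTH_NT = 200    # from the text: only transcripts > 200 nt are kept
MAX_ORF_CODONS = 100   # ASSUMED cutoff for a "significant" ORF; not given in the source


def longest_orf_codons(seq: str) -> int:
    """Longest ATG...stop ORF (in codons, stop excluded) across the 3 forward frames.

    ORFs that run off the end without a stop codon are ignored in this sketch.
    """
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


def passes_filters(t: Transcript) -> bool:
    # Step 1: drop transcripts matching known coding, small or structural RNAs.
    if t.biotype in EXCLUDED_BIOTYPES:
        return False
    # Step 2: drop transcripts with a significant ORF (positive coding potential).
    if longest_orf_codons(t.sequence) >= MAX_ORF_CODONS:
        return False
    # Step 3: keep only multi-exonic transcripts longer than 200 nt.
    return t.exon_count > 1 and len(t.sequence) > MIN_LENGTH_NT


def collect_lncrnas(transcripts: list[Transcript]) -> list[Transcript]:
    return [t for t in transcripts if passes_filters(t)]


if __name__ == "__main__":
    demo = [
        Transcript("keep", "ACGT" * 60, 2, "lncRNA"),                 # 240 nt, 2 exons, no ATG
        Transcript("mono", "ACGT" * 60, 1, "lncRNA"),                 # single exon
        Transcript("short", "ACGT" * 40, 3, "lncRNA"),                # 160 nt
        Transcript("orf", "ATG" + "AAA" * 150 + "TAA", 2, "lncRNA"),  # 151-codon ORF
        Transcript("known", "ACGT" * 60, 2, "protein_coding"),        # known coding RNA
    ]
    print([t.id for t in collect_lncrnas(demo)])  # ['keep']
```

Ordering the cheap annotation check before the ORF scan keeps the sketch simple; a production pipeline would more likely use a dedicated coding-potential tool than a bare ORF-length cutoff.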

Reliable Collection of LncRNAs from Publications

Intergenic lncRNAs (lincRNAs) identified by Guttman et al.

Guttman et al. identified 3,289 large intergenic non-coding RNAs (lincRNAs) in the mouse genome that are conserved across mammals, which suggests that they are functional. Each lincRNA is named after the nearby protein-coding gene on its 3' side. Their expression patterns implicate these lincRNAs in diverse biological processes, including cell-cycle regulation, immune surveillance and embryonic stem cell pluripotency. All 3,289 lincRNAs are represented on the Arraystar Mouse LncRNA Microarray V3.0.

Ultra-conserved Regions encoding LncRNAs (T-UCRs)

UCRs are a subset of conserved sequences located in both intragenic and intergenic regions. Bejerano et al. identified 481 UCRs that are absolutely conserved (100% identity) across the orthologous regions of the human, rat and mouse genomes. UCRs are frequently located at fragile sites and in genomic regions involved in cancers. A large fraction of UCRs encode a particular set of ncRNAs (T-UCRs) whose expression is altered in human cancers. To help discover potential non-coding transcripts from these regions, we designed 962 probes targeting both strands of these UCRs (481 UCRs × 2 strands).
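The probe count follows from covering each of the 481 UCRs on both strands. The sketch below is a minimal illustration that assumes each UCR is supplied as its plus-strand genomic sequence and derives one target sequence per strand. The function names, IDs and placeholder sequences are hypothetical, and the actual selection of a probe within each target is not shown.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N)."""
    return seq.translate(COMPLEMENT)[::-1]


def strand_targets(ucrs: dict[str, str]) -> dict[str, str]:
    """Two targets per UCR: the plus-strand sequence and its reverse complement.

    A transcript from the plus strand carries the plus-strand sequence, while a
    transcript from the minus strand carries its reverse complement, so covering
    both strands needs one target per strand.
    """
    targets = {}
    for ucr_id, plus_seq in ucrs.items():
        targets[f"{ucr_id}_plus"] = plus_seq                        # plus-strand transcript
        targets[f"{ucr_id}_minus"] = reverse_complement(plus_seq)   # minus-strand transcript
    return targets


if __name__ == "__main__":
    # 481 placeholder UCRs (hypothetical IDs and sequences) -> 962 strand-specific targets.
    ucrs = {f"uc.{i}": "AACG" * 50 for i in range(1, 482)}
    targets = strand_targets(ucrs)
    print(len(targets))                  # 962
    print(targets["uc.1_minus"][:8])     # CGTTCGTT
```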

