What Are the Limitations of RNA-seq for LncRNA Profiling?

 

LncRNAs are often expressed, and function, at low abundance, buried among other, far more abundant classes of RNA (Fig. 1A and 1B) [1]. This creates serious limitations for RNA-seq as a lncRNA profiling method.

Poor quantification of lncRNAs due to low abundance and incomplete lncRNA annotation

For mere detection of a lncRNA, a few reproducible sequencing reads should suffice. Quantification, however, requires at least hundreds of reads to represent the RNA level reliably, because RNA-seq count data carry inherent Poisson sampling error compounded by over-dispersion [2]. LncRNAs are generally ~10X less abundant than mRNAs [3]. At these low levels, RNA-seq quantification is unacceptably poor and not nearly sufficient for differential expression analysis [1, 4] (Fig. 1C and 1D). Although increasing the sequencing depth can improve quantification of better expressed transcripts such as mRNAs to a certain extent, the improvement for lowly expressed transcripts such as lncRNAs is not significant. Even if the coverage is increased to an unaffordable depth (dotted curve, several hundred times the normal RNA-seq coverage of 40 million reads), a large proportion (40%) of transcripts can never be reliably quantified (Fig. 1C) [4]. Additionally, the FPKM (Fragments Per Kilobase of transcript per Million mapped reads) calculation in RNA-seq depends on accurate transcript model lengths, and many lncRNA annotations are still incomplete [5]. In contrast, LncRNA Microarray oligo probes hybridize to the target RNA with high affinity, independently of other abundant RNAs. The microarrays are highly sensitive and accurate even for low abundance lncRNAs [6] (Fig. 1D).
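The link between read count and quantification error can be made concrete with a small calculation. The sketch below is illustrative only: it assumes the negative binomial mean-variance form used for count data in [2] (variance = mean + dispersion × mean²), and the function name `count_cv` and the dispersion value of 0.01 are assumptions chosen for the example, not values taken from the figures.

```python
import math


def count_cv(mean_count, dispersion=0.0):
    """Coefficient of variation (sd / mean) of a read count.

    Assumes a negative binomial model with
    variance = mean + dispersion * mean**2.
    dispersion=0.0 reduces to pure Poisson sampling noise,
    where the CV is simply 1 / sqrt(mean).
    """
    if mean_count <= 0:
        raise ValueError("mean_count must be positive")
    variance = mean_count + dispersion * mean_count ** 2
    return math.sqrt(variance) / mean_count


if __name__ == "__main__":
    # Hypothetical read counts: a lowly expressed lncRNA (5 reads),
    # a moderately covered transcript (50) and a well covered one (500).
    for reads in (5, 50, 500):
        poisson_cv = count_cv(reads)
        overdispersed_cv = count_cv(reads, dispersion=0.01)  # assumed value
        print(f"{reads:>4} reads: Poisson CV = {poisson_cv:.3f}, "
              f"over-dispersed CV = {overdispersed_cv:.3f}")
```

Under pure Poisson sampling the relative error falls only with the square root of the count, so a transcript covered by a handful of reads carries a relative error several times larger than one covered by hundreds of reads, before any over-dispersion is added.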


Figure 1. (A) The median lncRNA expression level is approximately 10X lower than that of mRNAs (based on GENCODE data) [3]. (B) The top 1% of the most highly expressed genes, such as housekeeping genes, occupy ~40% of the RNA-seq signal. Lowly expressed lncRNAs receive very little sequencing coverage [1]. (C) At a typical mRNA-seq depth of 40 million reads, fewer than 10% of lncRNAs can be reliably quantified [4]. Even if the coverage is increased to an unaffordable depth (dotted curve, several hundred times the normal RNA-seq coverage of 40 million reads), a large proportion (40%) of transcripts can never be reliably quantified. (D) While the quantitative error of RNA-seq becomes unacceptably high in the low range of RNA levels, the microarray continues to perform very well [6].
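The dependence of FPKM on the transcript model length, noted in this section, follows directly from its definition. The snippet below is a minimal sketch: the function name `fpkm` and all numbers are hypothetical, and the fragment count is held fixed while the annotated length changes, which is a simplification (in practice the number of fragments assigned to a transcript also depends on the model).

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    FPKM = fragments / (length in kb) / (mapped fragments in millions)
         = fragments * 1e9 / (length in bp * mapped fragments)
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


if __name__ == "__main__":
    # Hypothetical lncRNA: 40 fragments in a library of 40 million
    # mapped fragments. The "true" transcript is assumed to be 2000 bp,
    # while an incomplete annotation records only 1200 bp.
    library_size = 40_000_000
    with_true_length = fpkm(40, 2000, library_size)
    with_truncated_model = fpkm(40, 1200, library_size)
    print(f"FPKM with full-length model: {with_true_length:.3f}")
    print(f"FPKM with truncated model:   {with_truncated_model:.3f}")
```

Because the length sits in the denominator, any error in the annotated transcript length propagates proportionally into the reported expression value.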

Ambiguous detection of lncRNA isoforms caused by weak splice profiles and missing connectivity information

LncRNAs often have multiple transcript isoforms, which are more flexible and modular than those of mRNAs because they are not constrained to maintain a continuous open reading frame [1]. These isoforms can function differently in complex genomic and regulatory relationships with their target mRNA genes, so profiling lncRNAs at the transcript-specific level is important. However, RNA-seq coverage of splice junctions is weak and non-uniform, particularly for non-predominant isoforms [1] (Fig. 2A). Even at saturating coverage, accurate reconstruction of transcript isoforms is inherently challenging, because short reads carry no connectivity information linking distant exons on the same RNA molecule [1]. Together, these factors make the reconstruction and quantification of lncRNA transcript isoforms very difficult [7-10]. For LncRNA Microarrays, the transcript-specific probes are designed from well established transcript models for each lncRNA isoform, so isoform detection and quantification are unambiguous and highly accurate (Fig. 2B).


Figure 2. (A) Compared with better expressed mRNAs, the exons of lowly expressed lncRNA transcript isoforms are not covered well enough by short RNA-seq reads to reconstruct their spliced architectures or to quantify them [1]. (B) Arraystar LncRNA Microarray transcript-specific probes unambiguously and accurately detect and quantify the transcript isoforms BCL-XL, BCL-XS, and ENST412972, which have distinct oncogenic functions. “Gene-specific” probes that are not designed for lncRNA isoforms cannot make this distinction. The arrows indicate the direction of transcription.
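The isoform ambiguity described in this section can be illustrated with a toy model. The sketch below uses three hypothetical isoforms built from shared exons (they are not the BCL-X isoforms in Fig. 2B) and lists which exons and exon-exon junctions occur in only one isoform. The function names are invented for the example, and this is not a description of any actual probe design or assembly algorithm.

```python
def junctions(exons):
    """Return the set of consecutive exon-exon junctions of one isoform."""
    return {(a, b) for a, b in zip(exons, exons[1:])}


def unique_features(isoforms):
    """For each isoform, find exons and junctions present in no other isoform.

    `isoforms` maps an isoform name to its ordered list of exon identifiers.
    Only reads (or probes) overlapping such unique features can assign
    signal to a single isoform without ambiguity.
    """
    result = {}
    for name, exons in isoforms.items():
        other_exons = set()
        other_junctions = set()
        for other_name, other in isoforms.items():
            if other_name != name:
                other_exons.update(other)
                other_junctions |= junctions(other)
        result[name] = {
            "exons": set(exons) - other_exons,
            "junctions": junctions(exons) - other_junctions,
        }
    return result


if __name__ == "__main__":
    # Hypothetical gene with four exons and three isoforms.
    toy_isoforms = {
        "isoform_A": ["e1", "e2", "e3", "e4"],
        "isoform_B": ["e1", "e3", "e4"],  # skips e2
        "isoform_C": ["e1", "e2", "e4"],  # skips e3
    }
    for name, features in unique_features(toy_isoforms).items():
        print(name, features)
```

In this toy case no exon is unique to any isoform, and each isoform is identified by a single junction. A read that falls entirely within an exon is therefore compatible with at least two isoforms, and only reads spanning the one distinguishing junction are informative, which is where weak junction coverage hurts most.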

Systematic and functional lncRNA annotation databases are not publicly available for RNA-seq analysis

Unlike protein-coding genes, which are well established and curated, lncRNAs still lack well-resourced, consolidated reference databases for mapping and annotating raw RNA-seq data; such resources are not readily publicly available. In addition, several factors often leave lncRNA annotations incomplete at the 5’ or 3’ end: the short length of RNA-seq reads, library preparation steps that create non-uniform read coverage (particularly at the 5’ and 3’ ends), RNA degradation, and reverse transcription that does not always copy the entire RNA through to the 5’ end [1].

Arraystar Microarray lncRNA content is built on high quality proprietary Arraystar lncRNA transcriptome databases, which extensively collect lncRNAs from all major public databases and repositories, knowledge-based mining of scientific publications, and our lncRNA collection pipelines. The microarray annotation and analyses are rich, detailed, and comprehensive, unrivaled by any other profiling platform.

Table 1. LncRNA Microarray vs RNA-seq for lncRNA profiling

| LncRNA Microarray | RNA-Seq |
| --- | --- |
| High sensitivity and quantification accuracy for lncRNAs as low as 1 transcript/cell. | Most lncRNAs at low levels cannot be accurately and reliably quantified. |
| Natively specific for RNA strandedness for both sense and antisense lncRNAs. | Stranded RNA-sequencing library prep required. |
| Unambiguous and specific lncRNA isoform detection/quantification. | Poor sensitivity and accuracy for lncRNA isoforms. |
| Arraystar LncRNA Microarray premium lncRNA collection, annotation and analyses. Entire coding mRNA gene set also included. | Public lncRNA reference databases can be deficient. Systematic lncRNA annotation and analyses are not readily available for the RNA-seq data. |

 



References

1. Deveson, I.W., et al., The Dimensions, Dynamics, and Relevance of the Mammalian Noncoding Transcriptome. Trends Genet, 2017. 33(7): p. 464-478.
2. Anders, S. and W. Huber, Differential expression analysis for sequence count data. Genome Biol, 2010. 11(10): p. R106.
3. Derrien, T., et al., The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res, 2012. 22(9): p. 1775-89.
4. Labaj, P.P., et al., Characterization and improvement of RNA-Seq precision in quantitative transcript expression profiling. Bioinformatics, 2011. 27(13): p. i383-91.
5. Uszczynska-Ratajczak, B., et al., Towards a complete map of the human long non-coding RNA transcriptome. Nat Rev Genet, 2018. 19(9): p. 535-548.
6. Zhang, X., et al., Maternally expressed gene 3 (MEG3) noncoding ribonucleic acid: isoform structure, expression, and functions. Endocrinology, 2010. 151(3): p. 939-47.
7. SEQC/MAQC-III Consortium, A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium. Nat Biotechnol, 2014. 32(9): p. 903-14.
8. Liu, Y., et al., Evaluating the impact of sequencing depth on transcriptome profiling in human adipose. PLoS One, 2013. 8(6): p. e66883.
9. Steijger, T., et al., Assessment of transcript reconstruction methods for RNA-seq. Nat Methods, 2013. 10(12): p. 1177-84.
10. Baruzzo, G., et al., Simulation-based comprehensive benchmarking of RNA-seq aligners. Nat Methods, 2017. 14(2): p. 135-139.

 
