Limitations of RNA-Seq

Gene expression studies often benefit from testing large numbers of samples, from obtaining repeatable and accurate results, and from low per-sample sequencing and analysis costs. Analysis platforms have evolved from hybridization array-based technologies to RNA-Seq, and quantitative real-time reverse transcription PCR (qRT-PCR) has evolved into highly multiplexed platforms. Although RNA-Seq has become a gold standard and can be used as a quantitative assay to determine relative transcript abundance, it is costly and onerous, and assay design, running the assay, and data analysis are all time-intensive.

Generating libraries for mRNA sequencing is a difficult and often error-prone process involving many steps, with loss of sample at every step. The RNA must be extracted and reverse transcribed, then processed further to generate the sequencing library. The presence of high-abundance RNAs (rRNA, etc.) requires additional steps to reduce background RNA and/or enrich for mRNAs. Although these methods can improve data quality, they add to the labor, cost, and time required and deplete the original sample, which is especially problematic when working with needle biopsies, rare transcripts, or single cells. To address these issues, many investigators resort to pre-amplification of the RNA as well as deeper sequencing to increase the number of reads. This complicates data analysis, reduces the number of samples that can be batched together in a single library, and increases both cost per sample and time.

RNA-Seq is best used to identify new biomarkers or mutations, but it is especially inefficient when gene markers have already been identified and all the information of interest therefore comes from a focused subset of genes. While qRT-PCR-based methods are perfectly acceptable for measuring small numbers of targets, they are impractical when large numbers of targets need to be analyzed with high-throughput sample processing and, like RNA-Seq, they require RNA to be extracted and reverse transcribed. Measuring more than one gene in the same PCR reaction requires extensive optimization and is limited to at most 4 genes per reaction. Microfluidic and microplate platforms are available that permit a sample to be split across multiple PCR reactions for many different genes, but when configured, for instance, to measure 96 samples across 96 genes, the cost per sample is very high and the amount of sample is limiting.

Targeted sequencing methods have been developed that range from capturing a subset of targeted genes on an array and then releasing and sequencing them, to using targeted PCR primers or reverse transcriptase primers to selectively amplify and process subsets of targeted genes. However, primer amplification-based methods of targeted sequencing have proven difficult to develop because a set of primers must be developed for each target, and as a result content is typically limited to 500 to 1000 genes. These approaches still require extraction and reverse transcription.

These serious limitations of qPCR, RNA-Seq, and targeted sequencing methods have driven the need for a higher throughput, higher content, simpler, and more sensitive targeted sequencing approach that is not limited in the number of genes that can be measured, or by the complexity of developing assays with different content, and which reduces the complexity of the transcriptome to short read sequences for each targeted gene.

Figure 1: TempO-Seq Biochemistry

Introducing TempO-Seq

To address these challenges, BioSpyder Technologies has developed a novel product for targeted sequencing called TempO-Seq™, designed to monitor hundreds to thousands of genes at once in high throughput from as little as 10 pg of total RNA (the amount from a single cell) without pre-amplification, to maximize utilization of precious or limited samples. Based on BioSpyder Technologies’ proprietary Templated Oligo Detection Assay, TempO-Seq can quantitate targeted transcripts in an easy to follow workflow that does not require dedicated equipment. It can be run in a standard PCR instrument or microplate incubator manually or using standard pipetting platforms. The assay is highly amenable to automation, enabling implementation on 96-, 384-, and 1536-well formats. Sample barcoding, together with sequencing of short templates to measure each gene, allows pooling up to 6,144 samples in one sequencing run. Assay content is flexible and customizable, from focused panels monitoring specific genes or cellular pathways up to the whole transcriptome, delivering unprecedented accuracy and sensitivity for low level inputs. Investigators can select focused content from an archive of detector oligos measuring the whole transcriptome, and add additional custom content such as the measurement of specific isotypes, fusions, or mutations. Together with robust probe design and simplified data analysis that eliminates the need for bioinformatics, TempO-Seq assays deliver an easy to use solution for customers doing expression profiling for any species.

TempO-Seq is unique in its capacity to avoid RNA purification or reverse transcription, by targeting RNAs with detector oligos and removing excess probes and enzymatic inhibitors before the first enzymatic step. Correctly hybridized detector oligos are ligated, then amplified through primer landing sites that are shared among all probes (Figure 1).

This approach permits high target multiplexing because although the central part of the ligated oligos contains diverse sequences, only two PCR primers are needed for any sample, eliminating the primer cross-hybridization and competition inherent in multiplex PCR. Dual sequence tags are incorporated during PCR, to identify up to 6,144 samples in one sequencing library.
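The dual-tag scheme can be illustrated with a small sketch. The tag sequences, tag names, and the 2 x 3 layout below are invented for illustration; the text states only that up to 6,144 samples can be identified in one library, not how the tags are divided between the two primers.

```python
from itertools import product

# Hypothetical tag sequences; the real TempO-Seq tag sets are not given in the text.
FORWARD_TAGS = {"ACGTAC": "F1", "TGCATG": "F2"}
REVERSE_TAGS = {"GATCGA": "R1", "CTAGCT": "R2", "AGTCAG": "R3"}


def build_sample_sheet(forward, reverse):
    """Map every (forward tag, reverse tag) pair to a unique sample name."""
    return {
        (f_seq, r_seq): f"{f_name}-{r_name}"
        for (f_seq, f_name), (r_seq, r_name) in product(forward.items(), reverse.items())
    }


def assign_sample(sheet, forward_tag, reverse_tag):
    """Return the sample name for a tag pair, or None if the pair is unknown."""
    return sheet.get((forward_tag, reverse_tag))


sheet = build_sample_sheet(FORWARD_TAGS, REVERSE_TAGS)
```

The number of distinguishable samples is the product of the two tag counts (here 2 x 3 = 6). A split such as 64 x 96 would give 6,144, although the actual split used by the assay is not stated here.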

A key advantage of TempO-Seq is the capacity to definitively assign correctly ligated products to their RNA targets, because the product is sequenced rather than read out on an array. Mis-ligated products can also be detected, unlike on arrays. As a result, and due to BioSpyder optimization efforts, the background reads for no-sample controls are nearly zero. Another advantage is that the assay reports only the intended targets, so there is no need to eliminate globin or ribosomal RNAs. In addition, the requirement for ligation of two hybridized detector oligos gives the assay excellent specificity, with 95% or more single-base differential detection. Consequently, the assay selectively measures and discriminates between all members of highly homologous gene families, such as the CYP450 family. The assay does not require dedicated machinery and is amenable to automation for high-throughput applications.

Figure 2: Typical TempO-Seq Performance

The assay demonstrates excellent reproducibility (Figure 2), with log2 R² values routinely exceeding 0.99. The data shown are raw reads, without normalization, for triplicates of total RNA preps from two cell lines (left and center panels). Comparing these cell lines shows dramatic differences in expression, as expected (right panel). Finally, because the sequencing reports the number of ligated probes, the data analysis is simple. Rather than being aligned to the genome, TempO-Seq reads are compared to a lookup table of the ligated detector oligos input to the assay, a task that can be completed on a standard PC within minutes. BioSpyder has developed a data analysis pipeline that converts FASTQ files to data tables and reports assay quality metrics, eliminating the need for investigators to have bioinformatics support to perform analysis, generate tables of gene identity versus abundance, or normalize data between replicates and treatments.
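As a rough illustration of why a lookup table is cheaper than genome alignment, the sketch below counts reads by exact dictionary match and then computes R² on log2-transformed counts for two replicates. It is not BioSpyder's pipeline: the sequences, gene names, exact-match rule, and pseudocount of 1 are all assumptions made for the example.

```python
import math


def count_reads(reads, lookup):
    """Count reads that exactly match a ligated detector oligo sequence.

    `lookup` maps oligo sequence -> gene name (hypothetical format).
    Returns (counts per gene, number of unassigned reads).
    """
    counts = {gene: 0 for gene in lookup.values()}
    unassigned = 0
    for read in reads:
        gene = lookup.get(read)
        if gene is None:
            unassigned += 1
        else:
            counts[gene] += 1
    return counts, unassigned


def log2_r_squared(x, y, pseudocount=1):
    """Squared Pearson correlation of log2(count + pseudocount) values."""
    lx = [math.log2(v + pseudocount) for v in x]
    ly = [math.log2(v + pseudocount) for v in y]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    syy = sum((b - my) ** 2 for b in ly)
    return sxy * sxy / (sxx * syy)


# Toy data: short stand-ins for real detector oligo sequences.
lookup = {"ACGTACGT": "GENE_A", "TTGGCCAA": "GENE_B"}
reads = ["ACGTACGT", "ACGTACGT", "TTGGCCAA", "NNNNNNNN"]
counts, unassigned = count_reads(reads, lookup)
```

Each read costs one dictionary lookup, so run time grows linearly with the number of reads and does not depend on genome size.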

Figure 3: Expression Levels for ~900 Probes in Triplicate TempO-Seq Assays.

Detector Oligo Design

TempO-Seq’s whole transcriptome probe design pipeline creates detector oligos by maximizing the number of isoforms targeted per gene, selecting for optimal GC content and thermodynamic properties, selecting against hairpins, avoiding homopolymer stretches and repetitive elements, and avoiding detector oligo pairs that overlap SNPs. Additionally, the pipeline designs detector oligo pairs with maximum ligation efficiency and specificity. It takes advantage of the ligation properties of the TempO-Seq assay to design probes that can specifically detect all genes in the transcriptome, with the ability to distinguish a 1 base-pair mismatch with 99% specificity. The design pipeline also minimizes background noise by controlling for nonspecific interactions, even at transcriptome-level multiplexing. The output of the probe design pipeline is a pair of detector oligos that produce a robust signal with little non-specific background. Additionally, our probe design pipeline, like the TempO-Seq assay, is highly modular: detector oligo pairs can be added to or removed from the TempO-Seq assay without affecting comparison to previously run data. Options permit the design of detector oligos to measure different isotypes of the same gene, exon junctions, splice variants, fusions, and mutations.
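Two of the sequence-level criteria listed above, GC content and homopolymer avoidance, are simple enough to sketch. The thresholds below (40 to 60% GC, runs of at most 4 identical bases) are illustrative assumptions, not BioSpyder's design parameters, and the sketch omits hairpin, repeat, SNP, and ligation-efficiency checks.

```python
import re


def gc_fraction(seq):
    """Fraction of bases in `seq` that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_homopolymer(seq):
    """Length of the longest run of a single repeated base."""
    return max(len(m.group(0)) for m in re.finditer(r"(.)\1*", seq.upper()))


def passes_filters(seq, gc_min=0.4, gc_max=0.6, max_run=4):
    """Apply the hypothetical GC-content and homopolymer thresholds."""
    if not seq:
        return False
    return gc_min <= gc_fraction(seq) <= gc_max and longest_homopolymer(seq) <= max_run


# Toy candidates; real detector oligos are longer.
candidates = ["ACGTACGTAC", "ACGGGGGTAC", "AAAAACGCGC"]
kept = [s for s in candidates if passes_filters(s)]
```

In this toy set only the first candidate survives: the second exceeds the GC ceiling and the third contains a run of five A bases.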

Figure 4: Principal Component Analysis of Expression Profiles.

Data Analysis

Data analysis for the TempO-Seq assay is fast, accurate, and simple. Our ligated 50-mer detector oligos are designed to maximize the sequence distance between all probes, making each sequence unique. This design greatly simplifies data analysis because reads produced by the assay are unambiguously assigned to a specific 50-mer associated with a gene, reducing alignment to a lookup-table match. Using our TempO-Seq analysis software package, reads can be assigned to each targeted gene and a table of gene identity versus abundance quickly generated on a standard laptop computer. TempO-Seq accurately maps > 95% of HTS reads with few or no false positives (reads mapping to the wrong genes). The analysis software is sequencing platform independent and available as an easy-to-install standalone package that aligns reads, outputs standard quality control metrics for HTS data, normalizes the data, calculates the average and standard deviation for each gene between replicate measurements, and provides several types of data visualization, including heat maps (Figure 3) and PCA (Figure 4). Future versions of the software will include differential expression analysis routines, pathway analysis, and higher dimensions of data visualization.
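The normalization and replicate summary steps can be sketched as follows. The text does not say which normalization the software applies, so scaling each sample to a fixed total is an assumption made here, as are the function names and the dictionary-based count format.

```python
from statistics import mean, stdev


def normalize_to_total(counts, scale=1_000_000):
    """Scale one sample's gene counts so that they sum to `scale`."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: c * scale / total for gene, c in counts.items()}


def replicate_summary(replicates):
    """Per-gene (mean, sample standard deviation) across two or more replicates."""
    genes = replicates[0].keys()
    summary = {}
    for gene in genes:
        values = [rep[gene] for rep in replicates]
        summary[gene] = (mean(values), stdev(values))
    return summary


# Toy counts for two replicates of the same sample.
raw = [{"GENE_A": 10, "GENE_B": 30}, {"GENE_A": 20, "GENE_B": 20}]
normalized = [normalize_to_total(r, scale=100) for r in raw]
summary = replicate_summary(normalized)
```

Scaling to a common total makes replicates sequenced to different depths comparable before the per-gene mean and standard deviation are computed.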