TreeTerminus - Creating transcript trees using inferential replicate counts

Abstract

The accuracy and robustness of many types of analyses performed using RNA-seq data are directly impacted by the quality of the transcript and gene abundance estimates inferred from this data. However, a certain degree of uncertainty is always associated with the transcript abundance estimates. This uncertainty may make many downstream analyses, such as differential testing, difficult for certain transcripts. Conversely, gene-level analysis, though less ambiguous, is often too coarse-grained. To circumvent this problem, methods have proposed grouping transcripts together into distinct inferential units that should be used as a base unit for analysis. However, these methods don9t take downstream analysis into account. We introduce TreeTerminus, a data-driven approach for grouping transcripts into a tree structure where leaves represent individual transcripts and internal nodes represent an aggregation of a transcript set. TreeTerminus constructs trees such that, on average, the inferential uncertainty decreases as we ascend the tree topology. The tree provides the flexibility to analyze data at nodes that are at different levels of resolution in the tree and can be tuned depending on the analysis of interest. To obtain fixed groups for the downstream analysis, we provide a dynamic programming (DP) approach that can be used to find a cut through the tree that optimizes one of several different objectives. We evaluated TreeTerminus on two simulated and two experimental datasets, and observed an improved performance compared to transcripts (leaves) and other methods under several different metrics..

Publication
bioRxiv