Long-read sequencing technologies, such as those from Oxford Nanopore Technologies (ONT) and PacBio, are pivotal for advancing genomic and transcriptomic analyses because they enable full-length isoform sequencing and simplify novel transcript discovery. However, despite the expectation that long reads span entire transcripts, many reads are shorter than anticipated and therefore compatible with multiple transcripts. This ambiguity, coupled with the traditionally lower throughput of long-read platforms and the presence of technical artifacts, has limited the accuracy of current long-read quantification methods.
In this talk, I will introduce oarfish, a novel tool that addresses these challenges by refining the probabilistic model used for long-read transcript quantification. oarfish incorporates a new transcript coverage probability model, which improves the assignment of fragments to transcripts and, in turn, quantification accuracy. I will present benchmarking results from both simulated and experimental datasets across ONT and PacBio platforms, in which oarfish outperforms existing methods across a range of metrics.