RNA Coding Potential Prediction Using Alignment-Free Logistic Regression Model

Ying Li, Liguo Wang

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

1 Scopus citation


CPAT (Coding-Potential Assessment Tool) is a logistic regression model-based classifier that accurately and quickly distinguishes protein-coding from noncoding RNAs using purely linguistic features calculated from the RNA sequences. CPAT takes as input the nucleotide sequences or genomic coordinates of RNAs and outputs, for each RNA, a probability p (0 ≤ p ≤ 1) that measures the likelihood that the RNA is protein coding. Users can run CPAT online (http://lilab.research.bcm.edu/cpat/) or on a local computer after installation. CPAT provides prebuilt logistic models to recognize RNAs originating from the human (Homo sapiens), mouse (Mus musculus), zebrafish (Danio rerio), and fly (Drosophila melanogaster) genomes. Instructions on how to train models for other genomes are given on the CPAT website (http://rna-cpat.sourceforge.net/) and in this chapter.
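As a rough illustration of how a logistic regression classifier turns sequence-derived features into a probability p between 0 and 1, the following minimal Python sketch applies the standard logistic function to a weighted sum of features. It is not CPAT's code: the function name, the two feature values, the weights, and the intercept are all hypothetical placeholders chosen only to show the arithmetic.

```python
import math


def coding_probability(features, weights, intercept):
    """Standard logistic model: p = 1 / (1 + exp(-(intercept + sum(w_i * x_i))))."""
    # Linear combination of the feature values and their coefficients.
    z = intercept + sum(w * x for w, x in zip(weights, features))
    # The logistic function maps any real z to a value strictly between 0 and 1.
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical numbers for illustration only; these are NOT CPAT's
# features or fitted coefficients.
example_features = [0.8, 0.3]
example_weights = [2.0, -1.0]
example_intercept = -0.5

p = coding_probability(example_features, example_weights, example_intercept)
print(f"p = {p:.3f}")
```

A larger p would be read as stronger evidence for protein coding. The cutoff used to call an RNA coding or noncoding is a separate choice that this sketch does not cover.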

Original language: English (US)
Title of host publication: Methods in Molecular Biology
Publisher: Humana Press Inc.
Number of pages: 13
State: Published - 2021

Publication series

Name: Methods in Molecular Biology
ISSN (Print): 1064-3745
ISSN (Electronic): 1940-6029


  • LincRNA
  • LncRNA
  • Logistic regression
  • Noncoding RNA
  • Prediction
  • Protein coding

ASJC Scopus subject areas

  • Molecular Biology
  • Genetics

