Computer prediction of allergen proteins from sequence-derived protein structural and physicochemical properties

Juan Cui, Lian Yi Han, Hu Li, Choong Yong Ung, Zhi Qun Tang, Chan Juan Zheng, Zhi Wei Cao, Yu Zong Chen

Research output: Contribution to journal › Article › peer-review

60 Scopus citations


Background: Computational methods have been developed for predicting allergen proteins from sequence segments that show identity, homology, or a motif match to a known allergen. These methods achieve good prediction accuracies but are less effective for novel proteins with no similarity to any known allergen. Methods: This work tests the feasibility of using a statistical learning method, support vector machines (SVM), to predict allergens, including novel ones, from sequence-derived structural and physicochemical properties. The prediction system is trained and tested using 1005 allergen proteins from the Allergome database and 22,469 non-allergen proteins from 7871 Pfam families. Results: On an independent test set of 229 allergen and 6717 non-allergen proteins from 7871 Pfam families, 93.0% of the allergens and 99.9% of the non-allergens are correctly predicted, which is comparable to the best results of other methods. Of the 18 novel allergen proteins that are not homologous to any other protein in the Swissprot database, 88.9% are correctly predicted. A further screening of 168,128 Swissprot proteins predicts 2.9% of them to be allergens, which is consistent with estimates from motif-based methods. Conclusions: Our study suggests that SVM is a potentially useful method for predicting allergen proteins and that it has some capability to predict novel allergen proteins. Our software can be accessed at
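The abstract names the ingredients (sequence-derived physicochemical descriptors, an SVM classifier, and per-class accuracy on allergens and non-allergens) but not the exact descriptor set, kernel, or solver. The sketch below is a hedged illustration under stated assumptions, not the authors' system: it assumes a single three-class hydrophobicity grouping encoded with composition and transition features (in the style of CTD descriptors), and it swaps in a plain linear SVM trained by full-batch subgradient descent on the hinge loss. The names `ctd_features`, `train_linear_svm`, `predict`, and `sensitivity_specificity` and the parameters `lam`, `lr`, `epochs` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# ASSUMPTION: a three-class hydrophobicity grouping (polar / neutral /
# hydrophobic) of the kind used in CTD-style descriptors. The paper's actual
# property set is not given in the abstract.
PROPERTY_CLASSES: Tuple[str, ...] = ("RKEDQN", "GASTPHY", "CLVIMFW")
RESIDUE_CLASS: Dict[str, int] = {
    residue: index
    for index, residues in enumerate(PROPERTY_CLASSES)
    for residue in residues
}


def ctd_features(sequence: str) -> List[float]:
    """Hypothetical encoder: composition (3 values) + transition (3 values)."""
    classes = [RESIDUE_CLASS[r] for r in sequence.upper() if r in RESIDUE_CLASS]
    if len(classes) < 2:
        raise ValueError("need at least two standard residues")
    # Composition: fraction of residues falling in each class.
    composition = [classes.count(c) / len(classes) for c in range(3)]
    # Transition: fraction of adjacent residue pairs that switch between
    # classes 1<->2, 1<->3 and 2<->3 (direction ignored).
    pairs = list(zip(classes, classes[1:]))
    transition = [
        sum(1 for a, b in pairs if {a, b} == {i, j}) / len(pairs)
        for i, j in ((0, 1), (0, 2), (1, 2))
    ]
    return composition + transition


def train_linear_svm(
    X: Sequence[Sequence[float]], y: Sequence[int],
    lam: float = 0.01, lr: float = 0.1, epochs: int = 200,
) -> Tuple[List[float], float]:
    """Full-batch subgradient descent on the L2-regularized hinge loss.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b))), y in {-1, +1}.
    """
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1:
                # Margin violated: hinge term adds -y*x/n (and -y/n for b).
                for j in range(dim):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """+1 = predicted allergen, -1 = predicted non-allergen."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Fraction of allergens (+1) and of non-allergens (-1) predicted correctly."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == -1]
    sens = sum(p == 1 for p in pos) / len(pos) if pos else float("nan")
    spec = sum(p == -1 for p in neg) / len(neg) if neg else float("nan")
    return sens, spec
```

For example, `ctd_features("KKLL")` yields composition `[0.5, 0.0, 0.5]` and transitions `[0.0, 1/3, 0.0]`. On the evaluation side, the 88.9% reported for the 18 novel allergens corresponds to 16 correct predictions, i.e. a per-class accuracy of 16/18 on that subset. A kernel SVM (for example one trained with a dedicated SVM library) would replace `train_linear_svm` in practice; the linear version is used here only to keep the sketch dependency-free.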

Original language: English (US)
Pages (from-to): 514-520
Number of pages: 7
Journal: Molecular Immunology
Issue number: 4
State: Published - Jan 1 2007


  • Allergen
  • Immunology
  • Statistical learning method
  • Support vector machine

ASJC Scopus subject areas

  • Immunology
  • Molecular Biology

