Identification of small molecule aggregators from large compound libraries by support vector machines

Hanbing Rao; Zerong Li; Xiangyuan Li; Xiaohua Ma; Choongyong Ung; Hu Li; Xianghui Liu; Yuzong Chen

doi:10.1002/jcc.21347

Pharmacology

Research output: Contribution to journal › Article › peer-review

19 Scopus citations

Abstract

Small molecule aggregators non-specifically inhibit multiple unrelated proteins, rendering them therapeutically useless. They frequently appear as false hits and thus need to be eliminated in high-throughput screening campaigns. Computational methods for identifying aggregators have been explored, but they have not been tested in screening large compound libraries. We used 1,319 aggregators and 128,325 non-aggregators to develop a support vector machine (SVM) aggregator identification model, which was evaluated in four ways. The first, five-fold cross-validation, showed aggregator identification rates comparable to those of earlier studies and significantly improved non-aggregator identification rates. The second, an independent test on 17 aggregators discovered independently of the training aggregators, correctly identified 71% of them. The third, retrospective screening of 13M PUBCHEM and 168K MDDR compounds, predicted 97.9% of the PUBCHEM compounds and 98.7% of the MDDR compounds to be non-aggregators. The fourth, retrospective screening of 5,527 MDDR compounds similar to the known aggregators, predicted 1.14% of them to be aggregators. In five-fold cross-validation under the same settings, SVM performed slightly better overall than two other machine learning methods. Molecular features of aggregation, extracted by a feature selection method, are consistent with published profiles. SVM thus showed substantial capability in identifying aggregators from large libraries at low false-hit rates.
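The figures in the abstract can be checked with a little arithmetic. The Python sketch below is illustrative only and is not code from the paper. The helper names `identification_rates` and `counts_matching` are hypothetical, and the sketch assumes that the "aggregator" and "non-aggregator identification rates" correspond to the usual per-class rates TP/(TP+FN) and TN/(TN+FP). It shows that aggregators make up only about 1% of the training set, why a trivial all-non-aggregator baseline (a hypothetical comparison, not one reported by the authors) reaches high accuracy while identifying no aggregators, and which whole-number counts are consistent with the rounded 71% and 1.14% figures.

```python
# Hypothetical helpers (assumed names) for reading the abstract's figures;
# this is not the authors' code.


def identification_rates(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Assumed per-class rates: aggregators TP/(TP+FN), non-aggregators TN/(TN+FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def counts_matching(total: int, percent: float, decimals: int) -> list[int]:
    """All integer counts k out of `total` whose percentage 100*k/total
    rounds to `percent` when reported with `decimals` decimal places."""
    half_step = 0.5 * 10 ** -decimals
    return [k for k in range(total + 1)
            if abs(100 * k / total - percent) <= half_step]


if __name__ == "__main__":
    # Training set sizes stated in the abstract.
    aggregators, non_aggregators = 1319, 128325
    total = aggregators + non_aggregators
    print(f"aggregator share: {aggregators / total:.2%}")  # 1.02%

    # Hypothetical trivial model that labels every compound a non-aggregator.
    sens, spec = identification_rates(tp=0, fn=aggregators, tn=non_aggregators, fp=0)
    accuracy = non_aggregators / total
    print(f"trivial baseline: accuracy {accuracy:.2%}, "
          f"aggregator rate {sens:.0%}, non-aggregator rate {spec:.0%}")

    # Whole-number counts consistent with the rounded percentages.
    print(counts_matching(17, 71, 0))      # [12]
    print(counts_matching(5527, 1.14, 2))  # [63]
```

Under this rounding assumption, 12 of the 17 independent-test aggregators and 63 of the 5,527 similar MDDR compounds are the only whole-number counts that match the reported percentages; the abstract itself does not state these counts. The trivial baseline illustrates why per-class rates, rather than overall accuracy, are the informative measure on data this imbalanced.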

Original language	English (US)
Pages (from-to)	752-763
Number of pages	12
Journal	Journal of Computational Chemistry
Volume	31
Issue number	4
DOIs	https://doi.org/10.1002/jcc.21347
State	Published - Mar 2010

Keywords

Active compound
Aggregation
Aggregator
Drug discovery
High throughput screening
Machine learning method
Recursive feature elimination
Support vector machine
Virtual screening

ASJC Scopus subject areas

General Chemistry
Computational Mathematics

Cite this

@article{c117883f6efe4a9e98fa1c7d9f2a283f,

title = "Identification of small molecule aggregators from large compound libraries by support vector machines",

abstract = "Small molecule aggregators non-specifically inhibit multiple unrelated proteins, rendering them therapeutically useless. They frequently appear as false hits and thus need to be eliminated in high-throughput screening campaigns. Computational methods for identifying aggregators have been explored, but they have not been tested in screening large compound libraries. We used 1,319 aggregators and 128,325 non-aggregators to develop a support vector machine (SVM) aggregator identification model, which was evaluated in four ways. The first, five-fold cross-validation, showed aggregator identification rates comparable to those of earlier studies and significantly improved non-aggregator identification rates. The second, an independent test on 17 aggregators discovered independently of the training aggregators, correctly identified 71% of them. The third, retrospective screening of 13M PUBCHEM and 168K MDDR compounds, predicted 97.9% of the PUBCHEM compounds and 98.7% of the MDDR compounds to be non-aggregators. The fourth, retrospective screening of 5,527 MDDR compounds similar to the known aggregators, predicted 1.14% of them to be aggregators. In five-fold cross-validation under the same settings, SVM performed slightly better overall than two other machine learning methods. Molecular features of aggregation, extracted by a feature selection method, are consistent with published profiles. SVM thus showed substantial capability in identifying aggregators from large libraries at low false-hit rates.",

keywords = "Active compound, Aggregation, Aggregator, Drug discovery, High throughput screening, Machine learning method, Recursive feature elimination, Support vector machine, Virtual screening",

author = "Hanbing Rao and Zerong Li and Xiangyuan Li and Xiaohua Ma and Choongyong Ung and Hu Li and Xianghui Liu and Yuzong Chen",

year = "2010",

month = mar,

doi = "10.1002/jcc.21347",

language = "English (US)",

volume = "31",

pages = "752--763",

journal = "Journal of Computational Chemistry",

issn = "0192-8651",

publisher = "John Wiley and Sons Inc.",

number = "4",

}

TY - JOUR

T1 - Identification of small molecule aggregators from large compound libraries by support vector machines

AU - Rao, Hanbing

AU - Li, Zerong

AU - Li, Xiangyuan

AU - Ma, Xiaohua

AU - Ung, Choongyong

AU - Li, Hu

AU - Liu, Xianghui

AU - Chen, Yuzong

PY - 2010/3

Y1 - 2010/3

N2 - Small molecule aggregators non-specifically inhibit multiple unrelated proteins, rendering them therapeutically useless. They frequently appear as false hits and thus need to be eliminated in high-throughput screening campaigns. Computational methods for identifying aggregators have been explored, but they have not been tested in screening large compound libraries. We used 1,319 aggregators and 128,325 non-aggregators to develop a support vector machine (SVM) aggregator identification model, which was evaluated in four ways. The first, five-fold cross-validation, showed aggregator identification rates comparable to those of earlier studies and significantly improved non-aggregator identification rates. The second, an independent test on 17 aggregators discovered independently of the training aggregators, correctly identified 71% of them. The third, retrospective screening of 13M PUBCHEM and 168K MDDR compounds, predicted 97.9% of the PUBCHEM compounds and 98.7% of the MDDR compounds to be non-aggregators. The fourth, retrospective screening of 5,527 MDDR compounds similar to the known aggregators, predicted 1.14% of them to be aggregators. In five-fold cross-validation under the same settings, SVM performed slightly better overall than two other machine learning methods. Molecular features of aggregation, extracted by a feature selection method, are consistent with published profiles. SVM thus showed substantial capability in identifying aggregators from large libraries at low false-hit rates.

AB - Small molecule aggregators non-specifically inhibit multiple unrelated proteins, rendering them therapeutically useless. They frequently appear as false hits and thus need to be eliminated in high-throughput screening campaigns. Computational methods for identifying aggregators have been explored, but they have not been tested in screening large compound libraries. We used 1,319 aggregators and 128,325 non-aggregators to develop a support vector machine (SVM) aggregator identification model, which was evaluated in four ways. The first, five-fold cross-validation, showed aggregator identification rates comparable to those of earlier studies and significantly improved non-aggregator identification rates. The second, an independent test on 17 aggregators discovered independently of the training aggregators, correctly identified 71% of them. The third, retrospective screening of 13M PUBCHEM and 168K MDDR compounds, predicted 97.9% of the PUBCHEM compounds and 98.7% of the MDDR compounds to be non-aggregators. The fourth, retrospective screening of 5,527 MDDR compounds similar to the known aggregators, predicted 1.14% of them to be aggregators. In five-fold cross-validation under the same settings, SVM performed slightly better overall than two other machine learning methods. Molecular features of aggregation, extracted by a feature selection method, are consistent with published profiles. SVM thus showed substantial capability in identifying aggregators from large libraries at low false-hit rates.

KW - Active compound

KW - Aggregation

KW - Aggregator

KW - Drug discovery

KW - High throughput screening

KW - Machine learning method

KW - Recursive feature elimination

KW - Support vector machine

KW - Virtual screening

UR - http://www.scopus.com/inward/record.url?scp=76249102379&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=76249102379&partnerID=8YFLogxK

U2 - 10.1002/jcc.21347

DO - 10.1002/jcc.21347

M3 - Article

C2 - 19569201

AN - SCOPUS:76249102379

SN - 0192-8651

VL - 31

SP - 752

EP - 763

JO - Journal of Computational Chemistry

JF - Journal of Computational Chemistry

IS - 4

ER -
