GeneAI 3.0: powerful, novel, generalized hybrid and ensemble deep learning frameworks for miRNA species classification of stationary patterns from nucleotides

Jaskaran Singh; Narendra N. Khanna; Ranjeet K. Rout; Narpinder Singh; John R. Laird; Inder M. Singh; Mannudeep K. Kalra; Laura E. Mantella; Amer M. Johri; Esma R. Isenovic; Mostafa M. Fouda; Luca Saba; Mostafa Fatemi; Jasjit S. Suri

doi:10.1038/s41598-024-56786-9

Physiology & Biomedical Engineering

Research output: Contribution to journal › Article › peer-review

Abstract

Because of the intricate relationships among small non-coding ribonucleic acid (miRNA) sequences, classifying miRNA by species (Human, Gorilla, Rat, and Mouse) is challenging. Previous methods are neither robust nor accurate. In this study, we present AtheroPoint’s GeneAI 3.0, a powerful, novel, and generalized method that extracts features from the fixed patterns of purines and pyrimidines in each miRNA sequence and classifies them within ensemble machine learning (EML) and convolutional neural network (CNN)-based ensemble deep learning (EDL) frameworks. GeneAI 3.0 used five conventional features (Entropy, Dissimilarity, Energy, Homogeneity, and Contrast) and three contemporary features (Shannon entropy, Hurst exponent, and Fractal dimension) to generate a composite feature set from the given miRNA sequences, which was then passed into our ML and DL classification framework. A set of 11 new classifiers, consisting of 5 EML and 6 EDL models, was designed for binary/multiclass classification. It was benchmarked against 9 solo ML (SML), 6 solo DL (SDL), and 12 hybrid DL (HDL) models, for a total of 11 + 27 = 38 models. Four hypotheses were formulated and validated using explainable AI (XAI) as well as reliability/statistical tests. The order of mean performance by accuracy (ACC)/area under the curve (AUC) across the 24 DL classifiers was EDL > HDL > SDL. The mean performance of EDL models with CNN layers was superior to that of EDL models without CNN layers by 0.73%/0.92%. The mean performance of EML models was superior to that of SML models, with ACC/AUC improvements of 6.24%/6.46%. EDL models performed significantly better than EML models, with a mean increase in ACC/AUC of 7.09%/6.96%. The GeneAI 3.0 tool produced the expected XAI feature plots, and the statistical tests showed significant p-values. Ensemble models with composite features are highly effective and generalized for classifying miRNA sequences.
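As a rough illustration of one ingredient named in the abstract, the sketch below maps a nucleotide string to purine/pyrimidine symbols and computes its Shannon entropy. It is not the GeneAI 3.0 implementation: the function names, the R/Y symbol choice, and the toy sequence are assumptions made here for illustration, and the abstract does not say how the authors encode sequences or compute this feature.

```python
import math
from collections import Counter

# Purines are adenine and guanine; pyrimidines are cytosine, uracil, thymine.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U", "T"}


def purine_pyrimidine_encode(seq: str) -> str:
    """Map each nucleotide to 'R' (purine) or 'Y' (pyrimidine).

    The R/Y labels are a hypothetical encoding chosen for this sketch.
    """
    encoded = []
    for base in seq.upper():
        if base in PURINES:
            encoded.append("R")
        elif base in PYRIMIDINES:
            encoded.append("Y")
        else:
            raise ValueError(f"unexpected nucleotide: {base!r}")
    return "".join(encoded)


def shannon_entropy(symbols: str) -> float:
    """Shannon entropy, in bits, of the symbol frequencies in a string."""
    if not symbols:
        return 0.0
    total = len(symbols)
    counts = Counter(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy sequence invented for illustration; it is not a real miRNA.
toy_sequence = "AUGCAUGC"
ry_pattern = purine_pyrimidine_encode(toy_sequence)

features = {
    "entropy_nucleotides": shannon_entropy(toy_sequence),
    "entropy_purine_pyrimidine": shannon_entropy(ry_pattern),
}
print(ry_pattern, features)
```

In a pipeline of the kind the abstract describes, values like these would be concatenated with the other seven features into one composite vector per sequence before classification; how that is done in GeneAI 3.0 is not stated here.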

Original language	English (US)
Article number	7154
Journal	Scientific reports
Volume	14
Issue number	1
DOIs	https://doi.org/10.1038/s41598-024-56786-9
State	Published - Dec 2024

Keywords

Composite feature extraction
Ensemble deep learning
Ensemble machine learning
Statistical tests
Validation
miRNA classification

ASJC Scopus subject areas

General

Access to Document

10.1038/s41598-024-56786-9

Cite this

Singh, J., Khanna, N. N., Rout, R. K., Singh, N., Laird, J. R., Singh, I. M., Kalra, M. K., Mantella, L. E., Johri, A. M., Isenovic, E. R., Fouda, M. M., Saba, L., Fatemi, M., & Suri, J. S. (2024). GeneAI 3.0: powerful, novel, generalized hybrid and ensemble deep learning frameworks for miRNA species classification of stationary patterns from nucleotides. Scientific reports, 14(1), Article 7154. https://doi.org/10.1038/s41598-024-56786-9

Singh, J, Khanna, NN, Rout, RK, Singh, N, Laird, JR, Singh, IM, Kalra, MK, Mantella, LE, Johri, AM, Isenovic, ER, Fouda, MM, Saba, L, Fatemi, M & Suri, JS 2024, 'GeneAI 3.0: powerful, novel, generalized hybrid and ensemble deep learning frameworks for miRNA species classification of stationary patterns from nucleotides', Scientific reports, vol. 14, no. 1, 7154. https://doi.org/10.1038/s41598-024-56786-9

@article{3c7684bab7b74f40815e0e1d8d77e61d,

title = "GeneAI 3.0: powerful, novel, generalized hybrid and ensemble deep learning frameworks for miRNA species classification of stationary patterns from nucleotides",

keywords = "Composite feature extraction, Ensemble deep learning, Ensemble machine learning, Statistical tests, Validation, miRNA classification",

author = "Jaskaran Singh and Khanna, {Narendra N.} and Rout, {Ranjeet K.} and Narpinder Singh and Laird, {John R.} and Singh, {Inder M.} and Kalra, {Mannudeep K.} and Mantella, {Laura E.} and Johri, {Amer M.} and Isenovic, {Esma R.} and Fouda, {Mostafa M.} and Luca Saba and Mostafa Fatemi and Suri, {Jasjit S.}",

note = "Publisher Copyright: {\textcopyright} The Author(s) 2024.",

year = "2024",

month = dec,

doi = "10.1038/s41598-024-56786-9",

language = "English (US)",

volume = "14",

journal = "Scientific reports",

issn = "2045-2322",

publisher = "Nature Publishing Group",

number = "1",

}

TY - JOUR

T1 - GeneAI 3.0

T2 - powerful, novel, generalized hybrid and ensemble deep learning frameworks for miRNA species classification of stationary patterns from nucleotides

AU - Singh, Jaskaran

AU - Khanna, Narendra N.

AU - Rout, Ranjeet K.

AU - Singh, Narpinder

AU - Laird, John R.

AU - Singh, Inder M.

AU - Kalra, Mannudeep K.

AU - Mantella, Laura E.

AU - Johri, Amer M.

AU - Isenovic, Esma R.

AU - Fouda, Mostafa M.

AU - Saba, Luca

AU - Fatemi, Mostafa

AU - Suri, Jasjit S.

N1 - Publisher Copyright: © The Author(s) 2024.

PY - 2024/12

Y1 - 2024/12

KW - Composite feature extraction

KW - Ensemble deep learning

KW - Ensemble machine learning

KW - Statistical tests

KW - Validation

KW - miRNA classification

UR - http://www.scopus.com/inward/record.url?scp=85188603038&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85188603038&partnerID=8YFLogxK

U2 - 10.1038/s41598-024-56786-9

DO - 10.1038/s41598-024-56786-9

M3 - Article

C2 - 38531923

AN - SCOPUS:85188603038

SN - 2045-2322

VL - 14

JO - Scientific reports

JF - Scientific reports

IS - 1

M1 - 7154

ER -
