
(688h) Robustness of Machine Learning-Based Classifiers for Disease Diagnostics


Joshua Chuah1,3, Pingkun Yan1,3, Ge Wang1,3, and Juergen Hahn1,2,3

1Department of Biomedical Engineering

2Department of Chemical & Biological Engineering

3Center for Biotechnology & Interdisciplinary Studies

Rensselaer Polytechnic Institute, Troy NY 12180

The amount of data available for diagnosing diseases and disorders is increasing at a substantial rate due to developments in various -omics technologies. For example, a single blood sample can now yield concentration measurements for many hundreds, and sometimes even thousands, of biochemical molecules, e.g., metabolites [1]. The traditional approach has been to evaluate the concentrations of these molecules individually, comparing measurements from individuals with a diagnosis against data collected from a control group. However, this rarely leads to diagnostically meaningful results for most complex health conditions, for several reasons: (1) there is strong heterogeneity in the concentrations of these molecules within the control group, i.e., a concentration measurement far from the control-group mean does not automatically mean that the measurement came from someone with the health condition under investigation [2]; (2) many of these measurements are not independent, as they come from the same biological pathways or from pathways that interact with each other [3]; and (3) there is rarely a single measurement that can serve as a biomarker for diagnosis of complex conditions, because most biological pathways involve feedback loops that keep concentrations within a certain range [4].

While individual measurements of some variables have been successfully used as biomarkers for certain health conditions, e.g., blood glucose measurements for diagnosing diabetes, this approach is not feasible for conditions such as autism [5], Alzheimer's disease [6], depression [7], and schizophrenia [8], to name just a few examples where observation-based diagnosis remains the standard. Instead, approaches involving multivariate statistical analysis to find patterns among a number of measurements must be used. In fact, there has been a significant amount of recent research in this realm, which has also resulted in a number of patents for diagnosing health conditions via AI/ML approaches [9]. However, this creates a challenge for the Food and Drug Administration (FDA), as there are few clear guidelines on how to evaluate these proposed AI/ML-based diagnostic tools with regard to their robustness. The National Institutes of Health has identified the same problem and recently posted a call for proposals for the "validation of digital health and artificial intelligence tools for improved assessment in epidemiological, clinical, and intervention research" [10].

This work addresses the need to evaluate AI/ML-based diagnostic algorithms with regard to their robustness without recomputing the classifier itself. It does so by proposing components of a framework for comparing the robustness of already developed classifiers. The framework consists of factor analysis and a Monte Carlo approach that evaluates accuracy, and changes in accuracy, under a number of different noise levels and scenarios. In contrast to other studies that compare different AI/ML-based algorithms with regard to their accuracy, the focus of this work is on (a) a general evaluation framework and (b) the robustness of the accuracy under changes in the data set that are common among patient populations. To illustrate the framework, five commonly used machine learning techniques (linear discriminant analysis [11], support vector machines [12], random forest [13], partial least squares discriminant analysis [14], and logistic regression [15]) are investigated in a detailed case study; a sketch of the Monte Carlo comparison is given below.
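As a concrete illustration of the Monte Carlo component of this framework, the sketch below trains the five classifiers once and then repeatedly scores them on noise-perturbed copies of a held-out test set. It is a minimal sketch, assuming scikit-learn, a two-class problem with 0/1 labels, additive Gaussian noise scaled to each metabolite's standard deviation, and a 0.5 decision threshold for PLS-DA; none of these specific choices are taken from the abstract itself.

```python
# Illustrative Monte Carlo robustness comparison; model settings and the
# noise model are assumptions, not the authors' exact implementation.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def pls_da_predict(model, X):
    """Threshold continuous PLS regression output at 0.5 (labels 0/1)."""
    return (model.predict(X).ravel() > 0.5).astype(int)

def monte_carlo_robustness(X, y, noise_levels, n_trials=100, seed=0):
    """Score each pre-trained classifier on noise-perturbed test data.

    X: (n_samples, n_metabolites) NumPy array; y: 0/1 labels.
    Returns {model: {noise_level: (mean accuracy, std of accuracy)}}.
    """
    rng = np.random.default_rng(seed)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)
    models = {
        "LDA": LinearDiscriminantAnalysis().fit(X_tr, y_tr),
        "SVM": SVC().fit(X_tr, y_tr),
        "RF": RandomForestClassifier(random_state=seed).fit(X_tr, y_tr),
        "PLS-DA": PLSRegression(n_components=2).fit(X_tr, y_tr),
        "LR": LogisticRegression(max_iter=1000).fit(X_tr, y_tr),
    }
    scale = X_te.std(axis=0)  # per-metabolite noise scale
    results = {name: {} for name in models}
    for level in noise_levels:
        for name, model in models.items():
            accs = []
            for _ in range(n_trials):
                X_noisy = X_te + rng.normal(0.0, level * scale, X_te.shape)
                if name == "PLS-DA":
                    y_hat = pls_da_predict(model, X_noisy)
                else:
                    y_hat = model.predict(X_noisy)
                accs.append(accuracy_score(y_te, y_hat))
            # Mean accuracy and its spread summarize robustness at this level.
            results[name][level] = (np.mean(accs), np.std(accs))
    return results
```

The mean and standard deviation of accuracy across trials at each noise level support exactly the kind of comparison described above: a classifier whose accuracy varies strongly across trials would be flagged as less robust even if its nominal accuracy matches the others.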

The application makes use of a metabolomics data set comprising 24 measured metabolites from 159 study participants, used to determine whether a classifier can correctly predict a medical diagnosis. Overall, all five supervised classification techniques performed similarly with regard to accuracy; however, significant differences were observed with regard to robustness, and random forest in particular showed stronger variation in performance than the other four techniques. This point is especially relevant because the evaluation framework indicated that the random forest-based classifier should be less robust even before the testing-set data were analyzed to corroborate the finding. An additional finding is that the amount of replacement noise a classifier can tolerate while staying above a desired classification accuracy can be predicted from the best prediction accuracy achieved on the nominal data set, i.e., the data set the classifier was developed for, without further perturbation. Lastly, a high variance in prediction performance, or in the estimated parameter values of the classifier, in response to small perturbations in the data may indicate that a classifier will not generalize well to new data not used for classifier development.
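The abstract does not define replacement noise precisely; one plausible reading, sketched below under that stated assumption, is that a fraction p of the test measurements is replaced with values resampled from the same metabolite's observed distribution. The helper apply_replacement_noise is hypothetical, not the authors' implementation.

```python
# Hypothetical replacement-noise model: each entry of X is, with
# probability p, replaced by a value resampled from the same column.
import numpy as np

def apply_replacement_noise(X, p, rng=None):
    """Replace a fraction p of entries in X, column by column, with
    values drawn uniformly from the observed values of that column."""
    rng = np.random.default_rng(rng)
    X_out = X.copy()
    n, m = X.shape
    for j in range(m):
        mask = rng.random(n) < p                     # entries to corrupt
        donors = rng.integers(0, n, size=mask.sum()) # donor row indices
        X_out[mask, j] = X[donors, j]                # resample in-column
    return X_out
```

Sweeping p from 0 toward 1 and recording the largest fraction at which mean accuracy stays above a desired threshold yields the tolerable replacement-noise level discussed above, which the abstract reports can be anticipated from the nominal accuracy.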

References:

  1. Krassowski, M., Das, V., Sahu, S. K., & Misra, B. B. (2020). State of the field in multi-omics research: From computational needs to data mining and sharing. Frontiers in Genetics, 11. https://doi.org/10.3389/fgene.2020.610798
  2. Ghosh, T., Zhang, W., Ghosh, D., & Kechris, K. (2020). Predictive modeling for metabolomics data. Computational Methods and Data Analysis for Metabolomics, 313–336. https://doi.org/10.1007/978-1-0716-0239-3_16
  3. Johnson, C. H., Ivanisevic, J., & Siuzdak, G. (2016). Metabolomics: Beyond biomarkers and towards mechanisms. Nature Reviews Molecular Cell Biology, 17(7), 451–459. https://doi.org/10.1038/nrm.2016.25
  4. Liebal, U. W., Phan, A. N., Sudhakar, M., Raman, K., & Blank, L. M. (2020). Machine learning applications for mass spectrometry-based metabolomics. Metabolites, 10(6), 243. https://doi.org/10.3390/metabo10060243
  5. Vargason, T., Grivas, G., Hollowood-Jones, K. L., & Hahn, J. (2020). Towards a multivariate biomarker-based diagnosis of autism spectrum disorder: Review and discussion of recent advancements. Seminars in Pediatric Neurology, 34, 100803. https://doi.org/10.1016/j.spen.2020.100803
  6. Blennow, K., & Zetterberg, H. (2018). Biomarkers for Alzheimer's disease: Current status and prospects for the future. Journal of Internal Medicine, 284(6), 643–663. https://doi.org/10.1111/joim.12816
  7. Schmidt, H. D., Shelton, R. C., & Duman, R. S. (2011). Functional biomarkers of depression: Diagnosis, treatment, and pathophysiology. Neuropsychopharmacology, 36(12), 2375–2394. https://doi.org/10.1038/npp.2011.151
  8. Lai, C.-Y., Scarr, E., Udawela, M., Everall, I., Chen, W. J., & Dean, B. (2016). Biomarkers in schizophrenia: A focus on blood-based diagnostics and theranostics. World Journal of Psychiatry, 6(1), 102. https://doi.org/10.5498/wjp.v6.i1.102
  9. Wu, E., Wu, K., Daneshjou, R., Ouyang, D., Ho, D. E., & Zou, J. (2021). How medical AI devices are evaluated: Limitations and recommendations from an analysis of FDA approvals. Nature Medicine, 27(4), 582–584. https://doi.org/10.1038/s41591-021-01312-x
  10. U.S. Department of Health and Human Services. (n.d.). NOT-CA-22-037: Notice of special interest (NOSI): Validation of digital health and artificial intelligence tools for improved assessment in epidemiological, clinical, and intervention research. National Institutes of Health. Retrieved February 11, 2022, from https://grants.nih.gov/grants/guide/notice-files/NOT-CA-22-037.html
  11. Vaclavik, L., Schreiber, A., Lacina, O., Cajka, T., & Hajslova, J. (2011). Liquid chromatography–mass spectrometry-based metabolomics for authenticity assessment of fruit juices. Metabolomics, 8(5), 793–803. https://doi.org/10.1007/s11306-011-0371-7
  12. Mahadevan, S., Shah, S. L., Marrie, T. J., & Slupsky, C. M. (2008). Analysis of metabolomic data using support vector machines. Analytical Chemistry, 80(19), 7562–7570. https://doi.org/10.1021/ac800954c
  13. Chen, T., Cao, Y., Zhang, Y., Liu, J., Bao, Y., Wang, C., Jia, W., & Zhao, A. (2013). Random forest in clinical metabolomics for phenotypic discrimination and biomarker selection. Evidence-Based Complementary and Alternative Medicine, 2013, 1–11. https://doi.org/10.1155/2013/298183
  14. Gromski, P. S., Muhamadali, H., Ellis, D. I., Xu, Y., Correa, E., Turner, M. L., & Goodacre, R. (2015). A tutorial review: Metabolomics and partial least squares-discriminant analysis – a marriage of convenience or a shotgun wedding. Analytica Chimica Acta, 879, 10–23. https://doi.org/10.1016/j.aca.2015.02.012
  15. van der Kloet, F. M., Tempels, F. W., Ismail, N., van der Heijden, R., Kasper, P. T., Rojas-Cherto, M., van Doorn, R., Spijksma, G., Koek, M., van der Greef, J., Mäkinen, V. P., Forsblom, C., Holthöfer, H., Groop, P. H., Reijmers, T. H., & Hankemeier, T. (2011). Discovery of early-stage biomarkers for diabetic kidney disease using MS-based metabolomics (FinnDiane study). Metabolomics, 8(1), 109–119. https://doi.org/10.1007/s11306-011-0291-6