
(688h) Robustness of Machine Learning-Based Classifiers for Disease Diagnostics


Joshua Chuah1,3, Pingkun Yan1,3, Ge Wang1,3, and Juergen Hahn1,2,3

1Department of Biomedical Engineering

2Department of Chemical & Biological Engineering

3Center for Biotechnology & Interdisciplinary Studies

Rensselaer Polytechnic Institute, Troy NY 12180

The amount of data available for diagnosing diseases and disorders is increasing at a substantial rate due to developments in various -omics technologies. For example, a single blood sample can now yield concentration measurements for many hundreds, and sometimes even thousands, of biochemical molecules, e.g., metabolites [1]. The traditional approach has been to evaluate the concentrations of these molecules individually, comparing measurements from individuals with a diagnosis against data collected from a control group. However, this rarely leads to diagnostically meaningful results for most complex health conditions, for several reasons: (1) there is strong heterogeneity in the concentrations of these molecules within the control group, i.e., a concentration measurement far from the control-group mean does not automatically mean that the measurement came from someone with the health condition under investigation [2]; (2) many of these measurements are not independent, as they come from the same biological pathways or from pathways that interact with each other [3]; and (3) there is rarely a single measurement that can serve as a biomarker for diagnosis of complex conditions, because most biological pathways involve feedback loops that keep concentrations within a certain range [4].

While individual measurements of some variables have been successfully used as biomarkers for certain health conditions, e.g., blood glucose measurements for diagnosing diabetes, this approach is not feasible for conditions such as autism [5], Alzheimer's disease [6], depression [7], and schizophrenia [8], to name just a few examples where observation-based diagnosis remains the standard. Instead, approaches involving multivariate statistical analysis to find patterns among a number of measurements must be used. In fact, there has been a significant amount of recent research in this realm, which has also resulted in a number of patents for diagnosing health conditions via AI/ML approaches [9]. However, this creates a challenge for the Food and Drug Administration (FDA), as there are few clear guidelines on how to evaluate these proposed AI/ML-based diagnostic tools with regard to their robustness. The National Institutes of Health has identified the same problem and recently posted a call for proposals for the "validation of digital health and artificial intelligence tools for improved assessment in epidemiological, clinical, and intervention research" [10].

This work addresses the need to evaluate AI/ML-based diagnostic algorithms with regard to their robustness without recomputing the classifier itself. It does so by proposing components of a framework for comparing the robustness of already developed classifiers. The framework consists of factor analysis and a Monte Carlo approach that evaluates accuracy, and changes in accuracy, under a number of different noise levels and scenarios. In contrast to other studies that compare different AI/ML-based algorithms with regard to their accuracy, the focus of this work is on (a) a general evaluation framework and (b) the robustness of the accuracy under changes in the data set that are common among patient populations. To illustrate the framework, five commonly used machine learning techniques (linear discriminant analysis [11], support vector machines [12], random forest [13], partial least squares discriminant analysis [14], and logistic regression [15]) are investigated in a detailed case study; a sketch of the Monte Carlo comparison is given below.
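As a concrete illustration of the Monte Carlo component of this framework, the sketch below trains the five classifiers once and then repeatedly scores them on noise-perturbed copies of a held-out test set. It is a minimal sketch, assuming scikit-learn, a two-class problem with 0/1 labels, additive Gaussian noise scaled to each metabolite's standard deviation, and a 0.5 decision threshold for PLS-DA; none of these specific choices are taken from the abstract itself.

```python
# Illustrative Monte Carlo robustness comparison; model settings and the
# noise model are assumptions, not the authors' exact implementation.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def pls_da_predict(model, X):
    """Threshold continuous PLS regression output at 0.5 (labels 0/1)."""
    return (model.predict(X).ravel() > 0.5).astype(int)

def monte_carlo_robustness(X, y, noise_levels, n_trials=100, seed=0):
    """Score each pre-trained classifier on noise-perturbed test data.

    X: (n_samples, n_metabolites) NumPy array; y: 0/1 labels.
    Returns {model: {noise_level: (mean accuracy, std of accuracy)}}.
    """
    rng = np.random.default_rng(seed)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)
    models = {
        "LDA": LinearDiscriminantAnalysis().fit(X_tr, y_tr),
        "SVM": SVC().fit(X_tr, y_tr),
        "RF": RandomForestClassifier(random_state=seed).fit(X_tr, y_tr),
        "PLS-DA": PLSRegression(n_components=2).fit(X_tr, y_tr),
        "LR": LogisticRegression(max_iter=1000).fit(X_tr, y_tr),
    }
    scale = X_te.std(axis=0)  # per-metabolite noise scale
    results = {name: {} for name in models}
    for level in noise_levels:
        for name, model in models.items():
            accs = []
            for _ in range(n_trials):
                X_noisy = X_te + rng.normal(0.0, level * scale, X_te.shape)
                if name == "PLS-DA":
                    y_hat = pls_da_predict(model, X_noisy)
                else:
                    y_hat = model.predict(X_noisy)
                accs.append(accuracy_score(y_te, y_hat))
            # Mean accuracy and its spread summarize robustness at this level.
            results[name][level] = (np.mean(accs), np.std(accs))
    return results
```

The mean and standard deviation of accuracy across trials at each noise level support exactly the kind of comparison described above: a classifier whose accuracy varies strongly across trials would be flagged as less robust even if its nominal accuracy matches the others.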

The application makes use of a metabolomics data set comprising 24 measured metabolites from 159 study participants, used to determine whether a classifier can correctly predict a medical diagnosis. Overall, all five supervised classification techniques performed similarly with regard to accuracy; however, significant differences were observed with regard to robustness, and random forest in particular showed stronger variation in performance than the other four techniques. This point is especially relevant because the evaluation framework indicated that the random forest-based classifier should be less robust even before the testing-set data were analyzed to corroborate the finding. An additional finding is that the amount of replacement noise a classifier can tolerate while staying above a desired classification accuracy can be predicted from the best prediction accuracy achieved on the nominal data set, i.e., the data set the classifier was developed for, without further perturbation. Lastly, a high variance in prediction performance, or in the estimated parameter values of the classifier, in response to small perturbations in the data may indicate that a classifier will not generalize well to new data not used for classifier development.
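The abstract does not define replacement noise precisely; one plausible reading, sketched below under that stated assumption, is that a fraction p of the test measurements is replaced with values resampled from the same metabolite's observed distribution. The helper apply_replacement_noise is hypothetical, not the authors' implementation.

```python
# Hypothetical replacement-noise model: each entry of X is, with
# probability p, replaced by a value resampled from the same column.
import numpy as np

def apply_replacement_noise(X, p, rng=None):
    """Replace a fraction p of entries in X, column by column, with
    values drawn uniformly from the observed values of that column."""
    rng = np.random.default_rng(rng)
    X_out = X.copy()
    n, m = X.shape
    for j in range(m):
        mask = rng.random(n) < p                     # entries to corrupt
        donors = rng.integers(0, n, size=mask.sum()) # donor row indices
        X_out[mask, j] = X[donors, j]                # resample in-column
    return X_out
```

Sweeping p from 0 toward 1 and recording the largest fraction at which mean accuracy stays above a desired threshold yields the tolerable replacement-noise level discussed above, which the abstract reports can be anticipated from the nominal accuracy.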

References:

  1. Krassowski, M., Das, V., Sahu, S. K., & Misra, B. B. (2020). State of the field in multi-omics research: From computational needs to data mining and sharing. Frontiers in Genetics, 11. https://doi.org/10.3389/fgene.2020.610798
  2. Ghosh, T., Zhang, W., Ghosh, D., & Kechris, K. (2020). Predictive modeling for metabolomics data. Computational Methods and Data Analysis for Metabolomics, 313–336. https://doi.org/10.1007/978-1-0716-0239-3_16
  3. Johnson, C. H., Ivanisevic, J., & Siuzdak, G. (2016). Metabolomics: Beyond biomarkers and towards mechanisms. Nature Reviews Molecular Cell Biology, 17(7), 451–459. https://doi.org/10.1038/nrm.2016.25
  4. Liebal, U. W., Phan, A. N., Sudhakar, M., Raman, K., & Blank, L. M. (2020). Machine learning applications for mass spectrometry-based metabolomics. Metabolites, 10(6), 243. https://doi.org/10.3390/metabo10060243
  5. Vargason, T., Grivas, G., Hollowood-Jones, K. L., & Hahn, J. (2020). Towards a multivariate biomarker-based diagnosis of autism spectrum disorder: Review and discussion of recent advancements. Seminars in Pediatric Neurology, 34, 100803. https://doi.org/10.1016/j.spen.2020.100803
  6. Blennow, K., & Zetterberg, H. (2018). Biomarkers for Alzheimer's disease: Current status and prospects for the future. Journal of Internal Medicine, 284(6), 643–663. https://doi.org/10.1111/joim.12816
  7. Schmidt, H. D., Shelton, R. C., & Duman, R. S. (2011). Functional biomarkers of depression: Diagnosis, treatment, and pathophysiology. Neuropsychopharmacology, 36(12), 2375–2394. https://doi.org/10.1038/npp.2011.151
  8. Lai, C.-Y., Scarr, E., Udawela, M., Everall, I., Chen, W. J., & Dean, B. (2016). Biomarkers in schizophrenia: A focus on blood-based diagnostics and theranostics. World Journal of Psychiatry, 6(1), 102. https://doi.org/10.5498/wjp.v6.i1.102
  9. Wu, E., Wu, K., Daneshjou, R., Ouyang, D., Ho, D. E., & Zou, J. (2021). How medical AI devices are evaluated: Limitations and recommendations from an analysis of FDA approvals. Nature Medicine, 27(4), 582–584. https://doi.org/10.1038/s41591-021-01312-x
  10. U.S. Department of Health and Human Services. (n.d.). NOT-CA-22-037: Notice of special interest (NOSI): Validation of digital health and artificial intelligence tools for improved assessment in epidemiological, clinical, and intervention research. National Institutes of Health. Retrieved February 11, 2022, from https://grants.nih.gov/grants/guide/notice-files/NOT-CA-22-037.html
  11. Vaclavik, L., Schreiber, A., Lacina, O., Cajka, T., & Hajslova, J. (2011). Liquid chromatography–mass spectrometry-based metabolomics for authenticity assessment of fruit juices. Metabolomics, 8(5), 793–803. https://doi.org/10.1007/s11306-011-0371-7
  12. Mahadevan, S., Shah, S. L., Marrie, T. J., & Slupsky, C. M. (2008). Analysis of metabolomic data using support vector machines. Analytical Chemistry, 80(19), 7562–7570. https://doi.org/10.1021/ac800954c
  13. Chen, T., Cao, Y., Zhang, Y., Liu, J., Bao, Y., Wang, C., Jia, W., & Zhao, A. (2013). Random forest in clinical metabolomics for phenotypic discrimination and biomarker selection. Evidence-Based Complementary and Alternative Medicine, 2013, 1–11. https://doi.org/10.1155/2013/298183
  14. Gromski, P. S., Muhamadali, H., Ellis, D. I., Xu, Y., Correa, E., Turner, M. L., & Goodacre, R. (2015). A tutorial review: Metabolomics and partial least squares-discriminant analysis – a marriage of convenience or a shotgun wedding. Analytica Chimica Acta, 879, 10–23. https://doi.org/10.1016/j.aca.2015.02.012
  15. van der Kloet, F. M., Tempels, F. W., Ismail, N., van der Heijden, R., Kasper, P. T., Rojas-Cherto, M., van Doorn, R., Spijksma, G., Koek, M., van der Greef, J., Mäkinen, V. P., Forsblom, C., Holthöfer, H., Groop, P. H., Reijmers, T. H., & Hankemeier, T. (2011). Discovery of early-stage biomarkers for diabetic kidney disease using MS-based metabolomics (FinnDiane study). Metabolomics, 8(1), 109–119. https://doi.org/10.1007/s11306-011-0291-6