Break | AIChE

In 2019, an estimated 1,762,450 new cancer cases will be diagnosed and 606,880 cancer deaths
will occur in the United States [1]. With both numbers increasing every year, it is very
important to diagnose every patient in the most cost-efficient and timely manner. Every patient
suspected of having any type of cancer undergoes multiple tests to reach a correct diagnosis, so
assessing the signs of cancer with the most accurate test available is essential for efficient
diagnosis. Biomarkers, external
symptoms, and other important indicators allow medical professionals to evaluate the cancer
status upon examination. One of the most prominent markers that appear within the blast cells of
patients with Acute Myeloid Leukemia (AML) is myeloperoxidase (MPO) [2]. This project
focused on the use of machine learning methods to evaluate the chances of a patient having AML
or Acute Lymphoblastic Leukemia (ALL) based on the clinical variables obtained via
morphological and cytochemistry tests. The two machine learning methods used in this study
were decision trees and neural networks. Each method requires a training data set to fit the
neural network or decision tree and a separate testing data set to evaluate how accurately the
model identifies each patient as having ALL or AML. The immunophenotyping (IPT)
results serve as the confirmatory test against which classification accuracy is checked. These machine learning
approaches may help determine which symptoms and markers are most important to
look for when diagnosing a patient with either AML or ALL. In the decision tree, a marker is
deemed significant when it is used as a node; in the neural network, a weight is assigned to
each possible marker, with more important markers receiving higher weights. These methods
may also help inform doctors which tests are most efficient in determining a
diagnosis for patients. The decision tree approach correctly diagnosed patients
with AML 97.14% of the time when myeloperoxidase (MPO) was considered. The neural network
created using the same dataset classified patients correctly 100% of the time
while assigning the highest weight to the MPO marker. To check the significance of the MPO
marker, both the decision tree and the neural network were recreated without it.
Both machine learning methods showed a drop in accuracy when classifying patients with AML:
the accuracy of the decision tree fell to 91.43% and that of the neural network to
90.91%.
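
The marker-removal check described above can be illustrated with a minimal sketch. It is not the study's model or data: it fits a one-level decision tree (a decision stump) to synthetic patients, once with all markers and once without MPO, and compares test accuracy. The marker names (`mpo_positive`, `auer_rods`, `high_blast_count`) and every probability in it are assumptions invented for the demonstration, not clinical estimates.

```python
import random

# Hypothetical binary markers; names are illustrative, not from the study.
FEATURES = ["mpo_positive", "auer_rods", "high_blast_count"]


def make_patient(rng):
    """Return (markers, label) for one synthetic patient."""
    label = rng.choice(["AML", "ALL"])
    is_aml = label == "AML"
    markers = {
        # Invented probabilities: MPO is made strongly AML-associated.
        "mpo_positive": rng.random() < (0.95 if is_aml else 0.05),
        "auer_rods": rng.random() < (0.60 if is_aml else 0.10),
        "high_blast_count": rng.random() < 0.50,  # deliberately uninformative
    }
    return markers, label


def gini(labels):
    """Gini impurity of a list of 'AML'/'ALL' labels."""
    if not labels:
        return 0.0
    p = labels.count("AML") / len(labels)
    return 2 * p * (1 - p)


def majority(labels):
    """Most common label; ties and empty lists fall back to 'AML'."""
    return max(("AML", "ALL"), key=labels.count)


def fit_stump(data, feature_names):
    """Choose the single marker whose split minimizes weighted Gini impurity."""
    best = None
    for name in feature_names:
        yes = [lab for m, lab in data if m[name]]
        no = [lab for m, lab in data if not m[name]]
        score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(data)
        if best is None or score < best[0]:
            best = (score, name, majority(yes), majority(no))
    _, name, yes_label, no_label = best
    return name, yes_label, no_label


def accuracy(stump, data):
    """Fraction of patients whose predicted label matches the true label."""
    name, yes_label, no_label = stump
    hits = sum((yes_label if m[name] else no_label) == lab for m, lab in data)
    return hits / len(data)


rng = random.Random(0)
patients = [make_patient(rng) for _ in range(1000)]
train, test = patients[:700], patients[700:]

with_mpo = fit_stump(train, FEATURES)
without_mpo = fit_stump(train, [f for f in FEATURES if f != "mpo_positive"])
acc_with = accuracy(with_mpo, test)
acc_without = accuracy(without_mpo, test)

print(f"root marker with MPO available: {with_mpo[0]}, accuracy {acc_with:.2%}")
print(f"root marker without MPO: {without_mpo[0]}, accuracy {acc_without:.2%}")
```

Because the synthetic MPO marker is constructed to be the most informative, the stump selects it as its only node when it is available, and accuracy drops when it is withheld. The same train, withhold, and re-evaluate pattern would apply to a neural network, with the learned weights inspected in place of the chosen node.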