(516h) Smart Perfusion Machines
AIChE Annual Meeting
2021 Annual Meeting
Computing and Systems Technology Division
Applied Math for Biological and Biomedical Systems
Wednesday, November 10, 2021 - 5:43pm to 6:02pm
Current derivative-based methods for policy improvement in machine learning attempt to maximize expected return using gradient information (i.e., policy gradient methods) and are preferred in applications with uncertainty and complex continuous states and actions, as is the case here. However, policy gradient methods are local methods, often slow to converge, and can become trapped in local maxima. Therefore, in this work we use global optimization methods to generate "smart" policies and, as a result, provide globally optimal protocols for machine perfusion (MP). To do this, we use the terrain/funneling methods of Lucia and coworkers (2001, 2004, 2008) to maximize expected return as a function of states and actions at each time step in policy-based reinforcement learning.
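The distinction between local policy-gradient updates and a global search can be illustrated with a toy one-dimensional expected-return surface. Everything below is a sketch: the return function, the temperatures, and the multistart search (a crude stand-in for the terrain/funneling methods cited above) are illustrative assumptions, not the authors' model.

```python
import numpy as np

def expected_return(T):
    """Toy expected-return surface with a local maximum near T = 36
    and a larger, global maximum near T = 28 (values are illustrative)."""
    return (0.6 * np.exp(-((T - 36.0) / 2.0) ** 2)
            + 1.0 * np.exp(-((T - 28.0) / 3.0) ** 2))

def gradient_ascent(T0, lr=0.5, steps=200, h=1e-5):
    """Local, policy-gradient-style update using a finite-difference
    gradient; it climbs to whichever maximum is nearest the start."""
    T = T0
    for _ in range(steps):
        grad = (expected_return(T + h) - expected_return(T - h)) / (2 * h)
        T += lr * grad
    return T

def multistart_search(starts):
    """Simple global strategy: run local ascent from many starting
    points and keep the best candidate found."""
    candidates = [gradient_ascent(T0) for T0 in starts]
    return max(candidates, key=expected_return)

# A single gradient run started near 37 stalls at the local maximum,
# while the multistart search recovers the global one near 28.
local = gradient_ascent(37.0)
best = multistart_search(np.linspace(20.0, 40.0, 9))
```

The point of the sketch is only that a gradient method's answer depends on its starting point, which is exactly the failure mode a global method is meant to remove.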
The specific example studied in this work is the resuscitation of ischemic livers using machine perfusion. A key challenge in applying machine learning to MP is that the initial state of each liver is different, containing varying amounts of metabolites, cofactors, and enzymes, and much of this information is unknown. Therefore, a technician overseeing machine perfusion faces a new challenge each time. Standard MP protocols usually involve placing the liver in static cold storage (SCS) at 4 °C and then performing MP at a specified temperature for a designated period of time constrained between 4 and 24 hours. Current research favors MP temperatures in the range 21–37 °C, and the average perfusion time is 9 hours.
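The decision variables and bounds in the standard protocol can be collected into a small configuration object. This is a hypothetical container for exposition only; the field names and defaults are not taken from the authors' code, just from the ranges stated above.

```python
from dataclasses import dataclass

@dataclass
class PerfusionProtocol:
    """Hypothetical MP protocol: SCS followed by machine perfusion.
    Defaults reflect the typical values quoted in the abstract."""
    scs_temp_c: float = 4.0    # static cold storage temperature (deg C)
    mp_temp_c: float = 30.0    # perfusion temperature, favored range 21-37 C
    mp_hours: float = 9.0      # perfusion time, constrained to 4-24 h

    def is_valid(self) -> bool:
        """Check the decision variables against the stated bounds."""
        return (21.0 <= self.mp_temp_c <= 37.0
                and 4.0 <= self.mp_hours <= 24.0)

default_protocol = PerfusionProtocol()
```

Framed this way, policy optimization amounts to choosing `mp_temp_c` and `mp_hours` (per time step, in the full problem) subject to these box constraints.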
Using the currently accepted protocol of SCS followed by MP as a starting point, we present numerical results that provide proof of concept that policy-based reinforcement learning can be used to create "smart" policies for MP. It is shown that policies can have multiple maxima and that global optimization is the only way to find truly optimal policies. For example, the policy optimization approach developed in this work shows that temperatures in the sub-normothermic range (21–34 °C) are preferred over temperatures in the normothermic range (35–38 °C). In addition, we show that liver viability constraints, such as maintaining pH > 7.3, can affect policy optimization and perfusion times. Several other numerical illustrations are presented to elucidate key ideas and show improvement in MP performance.
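How a viability constraint like pH > 7.3 can reshape the optimal policy can be sketched with a toy model. The pH drift, the reward, and the grid search below are illustrative assumptions only (none of these functional forms come from the authors' liver model); the point is that folding the constraint into the return moves the optimum to shorter times and cooler, sub-normothermic temperatures.

```python
import numpy as np

def ph(t, T):
    """Toy viability proxy: pH drifts down with perfusion time t (hours),
    faster at higher perfusion temperature T (deg C). Illustrative only."""
    return 7.45 - 0.002 * t * (T - 20.0)

def unconstrained_return(t, T):
    """Toy reward: return grows with perfusion time, peaking at T = 30."""
    return t * np.exp(-((T - 30.0) / 6.0) ** 2)

def constrained_return(t, T):
    """Fold the viability requirement pH > 7.3 in as infeasibility."""
    return unconstrained_return(t, T) if ph(t, T) > 7.3 else -np.inf

# Exhaustive grid search as a stand-in for global policy optimization,
# over the 4-24 h and 21-38 C windows quoted in the abstract.
times = np.linspace(4.0, 24.0, 81)
temps = np.linspace(21.0, 38.0, 69)
best_t, best_T = max(((t, T) for t in times for T in temps),
                     key=lambda p: constrained_return(*p))
```

Without the constraint the toy reward would push to the longest time at T = 30; with it, the optimizer trades temperature against time along the pH boundary, landing at a cooler temperature and a shorter perfusion, qualitatively matching the behavior described above.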