(149ad) A Constraint-Based Modeling Framework with Deep Reinforcement Learning and Multi-Objective Optimization for Control of Mammalian Cell Cultures | AIChE

Authors 

Parulekar, S. - Presenter, Illinois Institute of Technology
Mammalian cell cultures have become the favored production hosts for monoclonal antibodies (MAbs) and therapeutic proteins, since microbial systems cannot carry out the complex post-translational and functional modifications these proteins require, such as glycosylation. Products of these cell cultures have broad applications in vaccination, drug screening and development, and gene therapy. MAbs are important reagents used extensively in diagnostic assays, therapeutic applications, and in vivo imaging. Chinese Hamster Ovary (CHO) cells and hybridoma cells, which share similar metabolic characteristics, have been popular cell types for production of MAbs. Efficient performance of these cell cultures requires highly specialized culture media to enhance MAb yield in vitro, since variations in culture conditions can cause substantial cell death and reduced MAb productivity.

Although some production practices have been employed for decades, cell kinetics is still under investigation in the search for production strategies that are cost-effective both quantitatively and qualitatively. Creating such strategies requires an understanding of how process dynamics affect cell metabolism in the relevant culture environments. Kinetic models describe cell growth and metabolic activity quantitatively, allowing prediction of different cell phenotypes and providing a better understanding of cell physiology, both of which are important in optimizing MAb production in animal cell cultures. These metabolic complexities, however, make kinetic descriptions of mammalian cell cultures difficult, and development of first-principles models (FPMs) is tedious. While FPMs provide qualitative understanding of cell culture processes, their rigid structure limits their application to prediction, optimization, and feedback control of mammalian cell cultures in dynamic operations. A more recent focus has been on statistical methods for modeling mammalian cell cultures, but this approach has had limited success because the resulting models are restricted to particular strains.
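To make the notion of a kinetic model concrete, the following is a minimal sketch of a Monod-type batch model over the six state variables discussed in this abstract (viable cells, glucose, glutamine, lactate, ammonia, MAb), integrated with explicit Euler steps. The model structure, parameter values, and initial state are illustrative assumptions for exposition, not the kinetics used in this work.

```python
import numpy as np

def batch_kinetics(y0, params, dt=0.1, n_steps=1000):
    """Explicit-Euler integration of a minimal Monod-type batch model.
    States: viable cells Xv, glucose G, glutamine Q, lactate L, ammonia A, MAb P.
    Model structure and parameters are illustrative assumptions only."""
    mu_max, Kg, Kq, Yxg, Yxq, Ylg, Yaq, qp = params
    y = np.asarray(y0, dtype=float)
    traj = [y.copy()]
    for _ in range(n_steps):
        Xv, G, Q, L, A, P = y
        mu = mu_max * G / (Kg + G) * Q / (Kq + Q)  # dual-substrate Monod growth
        dG = -mu * Xv / Yxg        # glucose uptake tied to growth
        dQ = -mu * Xv / Yxq        # glutamine uptake tied to growth
        dy = np.array([
            mu * Xv,               # cell growth
            dG,
            dQ,
            Ylg * (-dG),           # lactate produced from glucose consumption
            Yaq * (-dQ),           # ammonia produced from glutamine consumption
            qp * Xv,               # growth-independent MAb production
        ])
        y = np.maximum(y + dt * dy, 0.0)  # concentrations cannot go negative
        traj.append(y.copy())
    return np.array(traj)

# Hypothetical initial state and parameters (units illustrative)
y0 = [0.2, 25.0, 4.0, 0.0, 0.0, 0.0]
params = (0.05, 1.0, 0.3, 0.1, 0.4, 0.9, 0.8, 0.002)
traj = batch_kinetics(y0, params)
```

Even this toy model reproduces the qualitative batch trends noted below: nutrients fall, viable cells and MAb titer rise, and lactate and ammonia accumulate; the difficulty with FPMs lies in identifying realistic structure and parameters, not in the integration itself.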

In this work, we used the Long Short-Term Memory (LSTM) recurrent neural network method for modeling mammalian cell cultures. The LSTM method uses neural network algorithms to model and analyze sequential data and is capable of handling large, nonlinear experimental databases. The method has improved substantially over the past decade, and with the availability of large databases and greater computing power, we are now able to apply it to mammalian cell cultures. The models developed here used data generated from in silico experiments involving a large number of process inputs and outputs. The model simulations predict, with high accuracy, the trends observed in batch and fed-batch mammalian cell cultures for the key nutrients glucose and glutamine, viable cell density, target product (MAb) titer, and the inhibitory metabolites lactate and ammonia.

Optimization of mammalian cell cultures is a crucial step in biopharmaceutical production, but the nonlinearity and complexity of cell cultures make these processes difficult to model and optimize. To overcome these hindrances, we demonstrate the use of Deep Reinforcement Learning with the Deep Deterministic Policy Gradient (DRL-DDPG) method for multi-objective optimization of fed-batch mammalian cell cultures. DDPG is a model-free, actor-critic algorithm that can find optimal control policies for dynamic systems such as bioprocesses. To represent the actor and critic in the DRL-DDPG algorithm, we employed the LSTM method to model and analyze experimental databases for fed-batch cultures; LSTM handles large, nonlinear data sets and can be tuned further for improved predictive ability. The model in this study is trained on in silico data generated using 11 inputs and outputs, including MAb titer, viable cell density, and concentrations of glucose, glutamine, lactate, and ammonia.
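For readers unfamiliar with the LSTM recurrence, a minimal NumPy sketch of a single LSTM step is shown below. The gate equations are the standard ones; the dimensions (six inputs, mirroring the six measured process variables) and random weights are illustrative assumptions, not the architecture trained in this study.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step with gates stacked as [input, forget, cell, output].
    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) bias."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate
    f = sigmoid(z[H:2 * H])        # forget gate
    g = np.tanh(z[2 * H:3 * H])    # candidate cell update
    o = sigmoid(z[3 * H:4 * H])    # output gate
    c = f * c_prev + i * g         # new cell state (long-term memory)
    h = o * np.tanh(c)             # new hidden state (short-term output)
    return h, c

# Roll the cell over a short synthetic sequence of 6 process variables
# (glucose, glutamine, viable cells, lactate, ammonia, MAb) -- illustrative only.
rng = np.random.default_rng(0)
H, D = 8, 6
W = rng.normal(0.0, 0.1, (4 * H, D))
U = rng.normal(0.0, 0.1, (4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(10, D)):
    h, c = lstm_step(x, h, c, W, U, b)
```

The multiplicative forget gate is what lets the cell state carry information across many sampling intervals, which is why LSTMs suit time-course culture data better than plain recurrent networks.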
The proposed framework predicts the trajectories of these process variables with sufficient accuracy and reliability. In addition to multi-objective optimization techniques, we imposed appropriate constraints on state and manipulated variables to ensure that the proposed control policy satisfies limits on nutrients, target metabolites, and undesired metabolites (waste products). The results presented here reveal that the proposed framework can effectively optimize multiple objectives while satisfying these constraints, and the study represents a promising approach to multi-objective optimization and control of mammalian cell cultures using DRL-DDPG. In future work, we will focus on further improving the accuracy and efficiency of the proposed framework by incorporating additional process variables and refining the DRL-DDPG algorithm.
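One common way to combine multiple objectives and state constraints in deep reinforcement learning is to scalarize them into a shaped reward with penalty terms for constraint violations. The sketch below illustrates that idea; the variable names, bounds, and penalty weight are hypothetical and are not taken from this work.

```python
def shaped_reward(state, d_titer, limits, w_penalty=10.0):
    """Scalarized multi-objective reward for a DRL agent (illustrative sketch;
    names, bounds, and weights are assumptions, not this study's values).
    Rewards the per-step MAb titer gain and penalizes violations of bounds
    on nutrients and waste metabolites."""
    penalty = 0.0
    for var, (lo, hi) in limits.items():
        v = state[var]
        penalty += max(0.0, lo - v) + max(0.0, v - hi)  # distance outside bounds
    return d_titer - w_penalty * penalty

# Hypothetical operating limits (e.g. mM) -- illustrative values only
limits = {"glucose": (5.0, 40.0), "lactate": (0.0, 20.0), "ammonia": (0.0, 5.0)}
r = shaped_reward({"glucose": 20.0, "lactate": 10.0, "ammonia": 2.0},
                  d_titer=0.8, limits=limits)
```

Within bounds, the reward reduces to the titer gain alone; any excursion outside a bound subtracts a penalty proportional to the violation, steering the learned feeding policy back inside the constraint set.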