(498a) MODEL-Free Simulation and Fed-Batch Control of Cyanobacterial Phycocyanin Production By Artificial Neural Network and DEEP Reinforcement Learning | AIChE

(498a) MODEL-Free Simulation and Fed-Batch Control of Cyanobacterial Phycocyanin Production By Artificial Neural Network and DEEP Reinforcement Learning


Ma, Y. - Presenter, Louisiana State University
Benton, M. G., Louisiana State University
Romagnoli, J. A., Louisiana State University


Yan Ma*, Michael G. Benton, José A. Romagnoli

Louisiana State University

Baton Rouge, LA 70803


      The optimization of a bioreactor often
involves a cascade of design steps, including obtaining a suitable reaction
kinetic model, tuning the model parameters, and deriving a control strategy by
searching for the optimum reaction conditions (Xiong & Zhang, 2004). Besides
being laborious, this design process faces many constraints and difficulties.
In particular, finding a suitable model that describes the experimental data
can be difficult. Kinetic equations also usually rely on many assumptions that
can be inaccurate under different reaction conditions (Baş et al., 2007).

      Artificial neural networks (ANNs) have been
successfully applied in various fields because of their strong performance in
approximating non-linear dynamics (Basheer & Hajmeer, 2000). An ANN is
composed of numerous interconnected neurons, each of which applies an
activation function to a weighted sum of its inputs. ANNs do not require a
priori knowledge or assumptions about the system; given sample vectors of the
inputs and outputs, an ANN learns the mapping between them automatically,
typically by gradient descent (Tetko et al., 1995; Hippert et al., 2001).
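
      As a minimal sketch of this idea (not the network used in this work), the
following NumPy code fits a one-hidden-layer network to input/output samples by
full-batch gradient descent. The layer size, the tanh activation, the learning
rate, and the function names are illustrative assumptions.

```python
import numpy as np

def train_mlp(x, y, hidden=16, lr=0.02, epochs=2000, seed=0):
    """Fit a one-hidden-layer tanh network to (x, y) by gradient descent.

    All hyperparameters are illustrative assumptions, not values from the text.
    """
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 0.5, size=(x.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.5, size=(hidden, y.shape[1]))
    b2 = np.zeros(y.shape[1])
    losses = []
    for _ in range(epochs):
        h = np.tanh(x @ w1 + b1)       # non-linear hidden layer
        pred = h @ w2 + b2             # linear output layer
        err = pred - y
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the gradient of the mean squared error.
        g_pred = 2.0 * err / err.size
        g_w2 = h.T @ g_pred
        g_b2 = g_pred.sum(axis=0)
        g_h = (g_pred @ w2.T) * (1.0 - h ** 2)
        g_w1 = x.T @ g_h
        g_b1 = g_h.sum(axis=0)
        # Gradient descent update.
        w1 -= lr * g_w1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2
    return (w1, b1, w2, b2), losses

def predict(params, x):
    """Evaluate the trained network on new inputs."""
    w1, b1, w2, b2 = params
    return np.tanh(x @ w1 + b1) @ w2 + b2
```

No assumption about the functional form of the input/output relationship enters
the code; only the sample vectors do.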

      Beyond the successful application of
ANNs to prediction tasks, deep reinforcement learning (DRL) has recently
attracted wide attention in the artificial intelligence community for its
super-human proficiency in control tasks (Mnih et al., 2015). A DRL algorithm
employs an agent, represented by deep neural networks, that iteratively
interacts with an environment; at each time instance, the agent receives an
observation and a reward based on its action. From this experience, the agent
gradually learns to choose actions that maximize the long-term reward (Lewis &
Vrabie, 2009).
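
      One common way to make "long-term reward" concrete is the discounted
return, in which a reward received k steps in the future is weighted by
gamma^k. The sketch below computes it for one recorded episode; the discount
factor and the function name are assumptions, since the text does not state how
the long-term reward is defined here.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the one that follows it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

For example, `discounted_returns([1.0, 1.0, 1.0], gamma=0.5)` gives
`[1.75, 1.5, 1.0]`.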

      In this work, we aim to control a
semi-batch bioreactor using DRL. We propose a strategy called "DRCS"
(Data-driven Reactor Control Scheme), which combines an ANN reaction model
with a DRL control algorithm. In this approach, no kinetic model is needed to
develop the controller, and the controller does not depend on any control
model either. The design process is completely data-driven and uses only
several sets of experimental data obtained at different conditions. An ANN is
trained to predict the reaction outputs from the current measurements, and
this ANN then serves as the environment for control optimization using DRL.
The main advantage of the DRCS approach is that it does not rely on any
kinetic assumptions. In addition, because evaluating a neural network involves
mainly matrix multiplications and simple activation functions, it is
computationally inexpensive compared to traditional approaches that require
non-linear optimization.
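
      The role of the reaction model as the environment can be sketched as a
thin wrapper that exposes reset and step operations around any one-step
predictor. The class name, the state layout, the reward (product concentration
at the end of the episode), and the placeholder dynamics in `toy_model` are all
assumptions made for illustration; in the scheme described above the trained
ANN would take the place of `toy_model`.

```python
from typing import Callable, Tuple

State = Tuple[float, float, float]  # assumed layout: (biomass, nitrate, product)

class SurrogateReactorEnv:
    """Episode wrapper that uses a learned one-step model as the environment."""

    def __init__(self, model: Callable[[State, float], State],
                 initial_state: State, horizon: int):
        self.model = model
        self.initial_state = initial_state
        self.horizon = horizon
        self.t = 0
        self.state = initial_state

    def reset(self) -> State:
        """Start a new simulated run from the initial state."""
        self.t = 0
        self.state = self.initial_state
        return self.state

    def step(self, dose: float):
        """Advance one time step; return (state, reward, done)."""
        self.state = self.model(self.state, dose)
        self.t += 1
        done = self.t >= self.horizon
        # Assumed reward: product concentration, paid only at the final step.
        reward = self.state[2] if done else 0.0
        return self.state, reward, done

def toy_model(state: State, dose: float) -> State:
    """Placeholder dynamics standing in for the trained ANN (made-up numbers)."""
    biomass, nitrate, product = state
    nitrate = nitrate + dose
    uptake = 0.1 * nitrate
    return (biomass + 0.5 * uptake, nitrate - uptake, product + 0.01 * biomass)
```

Because the controller only ever calls `reset` and `step`, the predictor behind
the wrapper can be swapped without changing the control algorithm.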

Bioreactor system

      C-phycocyanin (C-PC) is a valuable bioproduct
for the food industry and a potentially valuable pharmaceutical product (del
Rio-Chanona et al., 2016). In the experiments, the cyanobacterium Plectonema
sp. is used for C-phycocyanin production. The batch process is performed in an
incubator at constant light intensity and a constant temperature of 23 °C. The
culture volume is kept constant at 110 mL, with equal initial biomass
concentration in all runs. Experiments are performed at different initial
nitrate concentrations. Three cultures are grown for 17 days, until biomass
accumulation reaches the stationary phase, while one extended experiment is
run with a much higher initial nitrate concentration and continues until the
nitrate is completely depleted. The biomass, nitrate, and C-PC concentrations
are measured daily during the experiments.


      Experimental data are collected at 4 different initial
nitrate concentrations (300 mg/L, 600 mg/L, 900 mg/L, and 3000 mg/L); 3 sets
are used for training the network, and the remaining set is used for
validation. Because of experimental disturbances and unequal time intervals
between measurements, we train on data computed from a polynomial fit to the
actual experimental data, which reduces noise and makes the time intervals
equal. The time interval used to train the network is 1 day. The validation
set is used to prevent the network from over-fitting, and a reliable reaction
model is obtained when the validation cost converges.
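
      The smoothing and resampling step can be sketched with NumPy's
polynomial fitting routines. The polynomial degree and the function name are
assumptions (the text does not state the degree used), and the numbers in the
usage note are synthetic, not experimental data.

```python
import numpy as np

def resample_daily(t_days, values, degree=3):
    """Fit a polynomial to irregular samples and evaluate it on a 1-day grid.

    `degree` is an illustrative assumption; the grid stays inside the measured
    time range so the polynomial is never extrapolated.
    """
    t_days = np.asarray(t_days, dtype=float)
    values = np.asarray(values, dtype=float)
    coeffs = np.polyfit(t_days, values, deg=degree)
    grid = np.arange(np.ceil(t_days.min()), np.floor(t_days.max()) + 1.0)
    return grid, np.polyval(coeffs, grid)
```

Calling `resample_daily([0, 0.9, 2.1, 3.0, 4.2, 5.0], y, degree=1)` on
synthetic samples `y` returns values on the grid 0, 1, ..., 5 days, which can
then be paired as (day t, day t+1) examples for training a one-step model.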

      The algorithm we implement for this
application is the Asynchronous Advantage Actor-Critic (A3C) algorithm from
deep reinforcement learning (Mnih et al., 2016). A3C uses multi-threading on
the CPU to run parallel actors (policy networks) that apply exploration
policies simultaneously, which significantly enhances the efficiency and
stability of controller training. In our setup, we build an A3C agent with 4
parallel policy networks, each of which interacts with an environment
simultaneously.
At each time step, a policy network outputs a probability distribution over
control actions while interacting with the simulated environment. The control
action with the highest probability is selected and fed back to the
corresponding environment.
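
      The action selection described above can be sketched as a softmax over
the policy network's raw outputs followed by picking the most probable action.
The function names and the use of a softmax output layer are assumptions; the
text states only that the network outputs a probability distribution.

```python
import numpy as np

def action_probabilities(logits):
    """Convert raw policy outputs into a probability distribution (softmax)."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()                 # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def greedy_action(logits):
    """Return the index of the most probable action and the distribution."""
    probs = action_probabilities(logits)
    return int(np.argmax(probs)), probs
```

With one logit per discrete control action, `greedy_action` returns the index
that would be passed to the environment for the next step.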

      For training the controller, the state
input to the network is a tuple of historical measurements including time,
biomass, nitrate concentration, phycocyanin concentration, and total nitrate
allowed until the current time point. The output of the policy network is the
nutrient addition to be made to the reactor at the current time step. In the
simulation, we simulate the experiment with a total allowed nitrate nutrient
addition of 2000 mg/L. At each time step, the action is chosen from 6
different nutrient dosages ranging from 0 mg/L to 250 mg/L.
After the allowance of 2000 mg/L is consumed, further nutrient addition is
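
      The discrete action set and the nitrate allowance can be sketched as
simple bookkeeping. The even spacing of the 6 dosages is an assumption (the
text gives only the range and the count), and the sketch only reports whether a
dose would exceed the allowance; it does not decide what the simulation does in
that case.

```python
import numpy as np

TOTAL_ALLOWANCE = 2000.0  # mg/L, total nitrate addition allowed (from the text)

# Assumption: the 6 dosages are evenly spaced between 0 and 250 mg/L.
DOSE_LEVELS = np.linspace(0.0, 250.0, 6)

def dose_for_action(action_index):
    """Map a discrete action index (0..5) to a nitrate dose in mg/L."""
    return float(DOSE_LEVELS[action_index])

def allowance_used(action_history):
    """Total nitrate added so far, given the list of past action indices."""
    return float(sum(dose_for_action(a) for a in action_history))

def exceeds_allowance(action_history, next_action):
    """True if taking `next_action` would push the total above the allowance."""
    total = allowance_used(action_history) + dose_for_action(next_action)
    return total > TOTAL_ALLOWANCE
```

Under these assumptions, eight consecutive maximum doses use up the full
allowance, so any further non-zero dose would be flagged by
`exceeds_allowance`.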

Figure 1. Schematic diagram of A3C control


      Training of the DRL controller is
carried out on an Intel Core i7-3770K CPU; 30,000 iterations take
approximately 4 minutes. The fast training speed is attributed to the parallel
computation in the A3C setup, whose multi-threading significantly improves
training efficiency compared with reading simulation results of past
iterations from a replay buffer (Mnih et al., 2016). The experiment is a
30-day fed-batch culture in which the nitrate additions are controlled by the
A3C controller for the first 20 days, and the cyanobacteria are harvested on
day 30. The resulting phycocyanin yield is compared to that of a control group
in which nitrate is replenished in a 2-step addition. The results are shown in
Fig. 2 and Fig. 3. The A3C controller group shows a 45.20% higher phycocyanin
yield than the 2-step nitrate addition control group.

Figure 2. Nitrate concentration profile
(simulation) controlled by the A3C controller (red), and a comparison group
with a 2-step nitrate addition (green).

Figure 3. C-PC yield obtained (red: A3C, green:
2-step addition).


      This document presents the DRCS approach,
which combines an artificial neural network (ANN) as the reaction simulator
with deep reinforcement learning (DRL) to obtain a fed-batch control strategy
for a bioreactor that produces the valuable bioproduct C-phycocyanin in
Plectonema. The proposed method is entirely model-free: it does not depend on
reaction kinetics assumptions or control models. The resulting controller,
obtained via self-learning in an environment simulated by the ANN,
demonstrates robust performance in the fed-batch reaction: experimental
validation shows a 45.14% higher C-PC yield than a control group that received
the same total amount of nitrate via a 2-step addition during the semi-batch
experiment. The proposed "DRCS" approach demonstrates great potential for the
chemical manufacturing industries, as it is able to learn control strategies
without domain-specific human knowledge or models and to perform control tasks
in diverse settings.


D. Baş, F. C. Dudak, and I. H. Boyacı, “Modeling and optimization III:
Reaction rate estimation using artificial neural network (ANN) without a
kinetic model,” Journal of Food Engineering, vol. 79, no. 2, pp. 622–628, 2007.

E. A. del Rio-Chanona, E. Manirafasha, D. Zhang, Q. Yue, and K. Jing,
“Dynamic modeling and optimization of cyanobacterial C-phycocyanin production
process by artificial neural network,” Algal Research, vol. 13, pp. 7–15, 2016.

I. A. Basheer and M. Hajmeer, “Artificial neural networks: fundamentals,
computing, design, and application,” Journal of Microbiological Methods,
vol. 43, no. 1, pp. 3–31, 2000.

H. S. Hippert, C. E. Pedreira, and R. C. Souza, “Neural networks for
short-term load forecasting: A review and evaluation,” IEEE Transactions on
Power Systems, vol. 16, no. 1, pp. 44–55, 2001.

F. L. Lewis and D. Vrabie, “Reinforcement learning and adaptive dynamic
programming for feedback control,” IEEE Circuits and Systems Magazine, vol. 9,
no. 3, 2009.

V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley, D. Silver,
and K. Kavukcuoglu, “Asynchronous methods for deep reinforcement learning,” in
International Conference on Machine Learning, 2016, pp. 1928–1937.

V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare,
A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., “Human-level
control through deep reinforcement learning,” Nature, vol. 518, no. 7540,
p. 529, 2015.

I. V. Tetko, D. J. Livingstone, and A. I. Luik, “Neural network studies. 1.
Comparison of overfitting and overtraining,” Journal of Chemical Information
and Computer Sciences, vol. 35, no. 5, pp. 826–833, 1995.

Z. Xiong and J. Zhang, “Modelling and optimal control of fed-batch processes
using a novel control affine feedforward neural network,” Neurocomputing,
vol. 61, pp. 317–337, 2004.

