(629h) Continuous Control of a Polymerization System with Deep Reinforcement Learning | AIChE



Ma, Y. - Presenter, Louisiana State University
Zhu, W., Chemical Engineering Department, Louisiana State University
Benton, M. G., Louisiana State University
Romagnoli, J. A., Louisiana State University
Improving and deriving novel control laws for polymerization reactions has long been a central task in the chemical industry. Extensive studies in process control, from proportional control to various forms of model predictive control, have benefited current chemical processes and plants [1]. Recent breakthroughs in deep learning have begun to inspire the development of Artificial Intelligence (AI)-based controllers for chemical reaction control, thanks to deep learning's well-known success in applications ranging from gaming to robotics [2]. These applications use deep reinforcement learning (Deep RL), whose ultimate goal is to enable computers to make human-like policy decisions based on the agent's exploration of its environment [3]. However, applying Deep RL to the control of chemical reactions still faces challenges: the inputs of chemical reactors are usually high-dimensional, and the system dynamics are often sensitive and exhibit considerable time-delay effects [2].

Previous studies have controlled a free-radical polymerization process by following real-time measurements of the weight-average molar mass, where a specified molar mass distribution can be achieved by tracking a trajectory of weight-average molar mass over time [1]. In this work, we developed a deep-learning-based controller for a free-radical polyacrylamide polymerization system using the Deep Deterministic Policy Gradient (DDPG) algorithm. DDPG uses an actor-critic structure that can output actions in a continuous (infinite-dimensional) action space [4]. At each time step t, the controller computes a control action a_t, consisting of the monomer flow rate F_m and the initiator flow rate F_i to apply, and the system's response to the action is recorded as the state s_t. The network is trained to maximize the cumulative reward r, which accounts for the distance between the current output and the target output at each iteration [5]. Training is performed on an established kinetic model of the polymerization reaction, and the controller gradually learns the policy through exploration of the system; convergence is reached when the average cumulative reward exceeds a desired threshold. In our experiments, the controller successfully learned a control policy that follows the target trajectory of the weight-average molar mass.
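To make the state-action-reward formulation concrete, the following is a minimal toy environment in Python. The dynamics, coefficients, and normalization here are all hypothetical placeholders for illustration; they are not the established kinetic model the controller was actually trained on.

```python
import numpy as np

class PolymerizationEnv:
    """Toy semi-batch polymerization environment (hypothetical dynamics).

    State s_t: current (normalized) weight-average molar mass Mw.
    Action a_t: [F_m, F_i], monomer and initiator flow rates.
    Reward r: negative distance between current and target output.
    """

    def __init__(self, target_trajectory, dt=1.0):
        self.target = np.asarray(target_trajectory, dtype=float)
        self.dt = dt
        self.reset()

    def reset(self):
        self.t = 0
        self.mw = 0.0
        return np.array([self.mw])

    def step(self, action):
        f_m, f_i = np.clip(action, 0.0, 1.0)
        # Placeholder first-order response: monomer feed raises Mw;
        # initiator feed lowers it (more chains, shorter on average).
        self.mw += self.dt * (0.5 * f_m - 0.3 * f_i * self.mw)
        reward = -abs(self.mw - self.target[self.t])
        self.t += 1
        done = self.t >= len(self.target)
        return np.array([self.mw]), reward, done
```

An episode rolls out by calling `reset()` once and `step()` until `done`, accumulating the reward that the RL agent then maximizes.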

Overall, the smart controller has shown robust control over a range of operating conditions, which indicates the capability of the deep reinforcement learning based approach to control a nonlinear, dynamic semi-batch system.
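The deterministic-policy-gradient update at the heart of DDPG [4, 5] can be sketched with linear stand-ins for the deep actor and critic networks. Everything below (dimensions, learning rates, and the omission of DDPG's replay buffer and target networks) is a simplification for illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear actor mu(s) and critic Q(s, a); dimensions are illustrative.
S, A = 1, 2                          # state dim (Mw), action dim (F_m, F_i)
W_actor = rng.normal(scale=0.1, size=(A, S))
w_critic = rng.normal(scale=0.1, size=S + A)

def actor(s):
    return W_actor @ s               # deterministic policy mu(s)

def critic(s, a):
    return w_critic @ np.concatenate([s, a])   # Q(s, a)

alpha, gamma = 1e-3, 0.99

def ddpg_update(s, a, r, s2):
    """One actor-critic step on a single transition (s, a, r, s')."""
    global W_actor, w_critic
    # Critic: TD(0) step toward the target r + gamma * Q(s', mu(s')).
    target = r + gamma * critic(s2, actor(s2))
    td_err = target - critic(s, a)
    w_critic = w_critic + alpha * td_err * np.concatenate([s, a])
    # Actor: ascend dQ/da * da/dtheta (the deterministic policy gradient).
    dq_da = w_critic[S:]             # gradient of the linear Q w.r.t. action
    W_actor = W_actor + alpha * np.outer(dq_da, s)
```

Full DDPG replaces the linear maps with deep networks and adds experience replay and slowly updated target networks for stability [4].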


[1] N. Ghadipasha, W. Zhu, J. A. Romagnoli, T. McAfee, T. Zekoski, and W. F. Reed, "Online Optimal Feedback Control of Polymerization Reactors: Application to Polymerization of Acrylamide-Water-Potassium Persulfate (KPS) System," 2017.

[2] S. P. K. Spielberg, R. B. Gopaluni, and P. D. Loewen, "Deep Reinforcement Learning Approaches for Process Control," 2017.

[3] V. Mnih et al., "Playing Atari with Deep Reinforcement Learning," arXiv:1312.5602, 2013.

[4] T. P. Lillicrap et al., "Continuous control with deep reinforcement learning," ICLR, 2016.

[5] D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller, "Deterministic Policy Gradient Algorithms," ICML, 2014.

