(59b) Control System Design Using Reinforcement Learning (Poster) | AIChE

Tzorakoleftherakis, E., The MathWorks, Inc.
Reinforcement learning (RL) is a technique for computing state-based decisions that lead to maximal rewards. It has much in common with optimal control theory.

As an example, consider the question: what schedule of control setpoints achieves plant start-up with minimum total energy consumption? This is the type of extraordinarily complex problem that RL and related techniques endeavor to solve.

In contrast to supervised machine learning methods, which require training data, RL requires a mathematical model that describes how sequences of decisions cumulatively contribute to an overall reward. By running experiments on this model and generating data dynamically, an RL algorithm learns reward-maximizing strategies as its "experience" accumulates. Since process simulation is a foundational aspect of plant design, such models are commonly available to the process engineer.
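The idea of decisions contributing cumulatively to an overall reward can be written down compactly. The formulation below is the standard discounted-return objective from the RL literature, offered as a sketch rather than as the formulation used in this work; the symbols (state s_t, action a_t, per-step reward r, discount factor γ, policy π) are assumptions introduced here for illustration.

```latex
% Cumulative (discounted) reward collected along one simulated trajectory
G = \sum_{t=0}^{T} \gamma^{t} \, r(s_t, a_t), \qquad 0 < \gamma \le 1

% The learned strategy is the policy that maximizes the expected return
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ G \right]
```

For the start-up example, r(s_t, a_t) could be taken as the negative of the energy consumed at step t, so that maximizing G corresponds to minimizing total energy consumption.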

In this work we focus our attention on control design. Multiple-input, multiple-output (MIMO) processes are a feature of almost all chemical plants. The design of robust control strategies is critical for maintaining consistent product quality, ensuring safe operations, minimizing downtime, and generating profit. The design process typically involves the comparative evaluation of alternative control loop configurations for interacting process units, applying domain expertise and using techniques such as the relative gain array and decouplers. Could RL offer an alternative?
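For readers less familiar with the relative gain array (RGA) mentioned above, the following minimal sketch computes it for a 2x2 process from steady-state gains. The function name `rga_2x2` and the gain values are hypothetical, chosen only for illustration; they do not come from this work.

```python
def rga_2x2(gain):
    """Relative gain array of a 2x2 steady-state gain matrix.

    For a 2x2 system the RGA is fully determined by
    lambda_11 = g11*g22 / (g11*g22 - g12*g21); every row and column sums to 1.
    """
    (g11, g12), (g21, g22) = gain
    det = g11 * g22 - g12 * g21
    if det == 0:
        raise ValueError("gain matrix is singular; the RGA is undefined")
    lam = g11 * g22 / det
    return [[lam, 1.0 - lam],
            [1.0 - lam, lam]]


# Hypothetical steady-state gains (rows: outputs y1, y2; columns: inputs u1, u2).
example_gain = [[12.8, -18.9],
                [6.6, -19.4]]
example_rga = rga_2x2(example_gain)
for row in example_rga:
    print([round(value, 3) for value in row])
```

A diagonal element close to 1 suggests that the y1-u1 / y2-u2 pairing has little loop interaction, while values far from 1 (as with these illustrative gains) indicate interacting loops, which is where decouplers or an alternative design approach become relevant.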

In this talk we start with a generic example to introduce the elements of RL, including agents, policies, the "environment", update algorithms, and reward functions. We then describe a MIMO process control design problem and demonstrate how RL can be used to generate a design solution. We conclude by comparing the RL results to those derived from a traditional design approach and offering some observations about the promise of RL for broader applications.
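The elements of RL listed above can be made concrete with a very small sketch. The code below is not the generic example from the talk; it is a toy, single-variable setpoint-tracking problem assumed here purely for illustration, using tabular Q-learning as the update algorithm. All names (`step`, `choose_action`, `train`, `greedy_policy`) and all numerical settings are hypothetical.

```python
import random

TARGET = 2              # hypothetical setpoint on a discretized level scale
N_STATES = 5            # discretized levels 0..4
ACTIONS = (-1, 0, 1)    # lower, hold, raise


def step(state, action):
    """Environment: apply the action and return (next_state, reward)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = -abs(next_state - TARGET)   # reward function: penalize deviation
    return next_state, reward


def choose_action(q, state, epsilon, rng):
    """Policy: epsilon-greedy over the agent's current Q-table."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: q[state][a])


def train(episodes=2000, steps=20, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Agent: learn a Q-table from simulated experience."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        state = rng.randrange(N_STATES)          # random initial condition
        for _ in range(steps):
            a = choose_action(q, state, epsilon, rng)
            next_state, reward = step(state, ACTIONS[a])
            # Update algorithm: one-step Q-learning
            td_target = reward + gamma * max(q[next_state])
            q[state][a] += alpha * (td_target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Read out the learned decision (an action) for each state."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda a: q[s][a])]
            for s in range(N_STATES)]


learned_policy = greedy_policy(train())
print(learned_policy)
```

With these settings the learned policy is expected to raise the level below the setpoint, hold at the setpoint, and lower it above. A MIMO process control problem would replace the toy `step` function with a process simulation and would typically need function approximation rather than a lookup table.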