(344f) A Deep Reinforcement Learning Approach for Production Scheduling
demand uncertainty, to pricing changes, among others. For many supply chain scheduling problems, decisions
need to be made in real-time when the situation in the plant changes, leading industrial operations to
employ human schedulers in the decision-making process who react to pressures in the organization, often
rescheduling and creating suboptimal schedules. Shobrys and White [2002] estimate that "good" decisions
for these scheduling problems can increase the profit margin by at least $10/ton of product. Given the
thousands of tons produced by modern industrial chemical operations each day, there is a large financial
incentive to improve scheduling processes and decision making under uncertainty.
Although there is a long history of optimization under uncertainty, many techniques are difficult to
implement due to high computational costs, sources of uncertainty (endogenous vs exogenous), and their
measurement [Grossmann et al., 2016]. The stochastic optimization approach deals with uncertainty in
stages whereby a decision is made, then the uncertainty is revealed, which enables a recourse decision to
be made given the new information. For scheduling applications, Jung et al. [2004] developed a multi-stage
stochastic optimization model to determine safety stock levels to maintain a given customer satisfaction
level with stochastic demand. Sand and Engell [2004] developed a two-stage stochastic mixed-integer linear
program to address the scheduling of a chemical batch process with a rolling horizon while accounting for the
risk associated with their decisions. Englberger et al. [2016] developed and implemented a two-stage stochastic
optimization problem on a rolling horizon for integrated master production scheduling that reduced delays
at the expense of higher safety stock.
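
For orientation, the decide-observe-recourse structure described above is commonly written in the following generic textbook form. This is a sketch of the standard two-stage stochastic linear program with fixed recourse, not the specific model of any work cited here; the symbols x (first-stage decision), y (recourse decision), xi (uncertain data), and the matrices A, T, W are notation assumed for illustration.

```latex
\begin{align}
  \min_{x}\quad & c^{\top} x + \mathbb{E}_{\xi}\!\left[ Q(x, \xi) \right] \\
  \text{s.t.}\quad & A x = b, \qquad x \ge 0, \\[4pt]
  Q(x, \xi) \;=\; \min_{y}\quad & q(\xi)^{\top} y \\
  \text{s.t.}\quad & W y = h(\xi) - T(\xi)\, x, \qquad y \ge 0 .
\end{align}
```

The first-stage decision x is fixed before xi is revealed, and Q(x, xi) is the cost of the best recourse decision y once it is known. Mixed-integer variants additionally restrict some components of x or y to integer values.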
In this paper, we explore a new approach to scheduling, motivated by a real-world example, that uses deep
reinforcement learning (DRL) to train an agent to schedule a multi-product reactor under uncertainty while
meeting service-level and profitability targets for the supply chain organization. The deep reinforcement learning
model recasts the scheduling problem as a Markov Decision Process (MDP) where the state is defined as the
demand, forecast, inventory levels, current production, schedule, and time, and the decisions are related to
products to be scheduled at given time intervals. This redefinition as an MDP enables a natural representation
of uncertainty for the model. While training must take place off-line using a simulation, once trained, the
model can then be deployed on-line in a production system to yield real-time schedules as new orders are
entered into the enterprise resource planning (ERP) system and as the situation in the plant changes due to
delays and unplanned events.
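
As an illustration of the MDP framing above, the following is a minimal sketch of a state and a transition function. It is not the authors' simulator: the constants, the `State` fields, the Gaussian demand model, and the reward terms are all hypothetical, and the state is simplified to time, current product, inventory, and forecast (open orders and the existing schedule are omitted).

```python
import random
from dataclasses import dataclass, field

# All constants below are hypothetical values chosen for illustration.
N_PRODUCTS = 3        # number of products the reactor can make
BATCH_SIZE = 10.0     # tons produced per time step
PRICE = 1.0           # revenue per ton shipped
SHORTAGE_COST = 0.5   # penalty per ton of unmet demand


@dataclass
class State:
    """Simplified MDP state: time, current product, inventory, forecast."""
    t: int = 0
    current_product: int = 0
    inventory: list = field(default_factory=lambda: [0.0] * N_PRODUCTS)
    forecast: list = field(default_factory=lambda: [5.0] * N_PRODUCTS)


def step(state, action, rng):
    """Apply one scheduling decision and return (next_state, reward).

    `action` is the index of the product to produce in this interval.
    """
    inventory = list(state.inventory)
    inventory[action] += BATCH_SIZE  # produce the chosen product
    reward = 0.0
    for p in range(N_PRODUCTS):
        # Uncertain demand is revealed only after the decision is made.
        demand = max(0.0, rng.gauss(state.forecast[p], 1.0))
        shipped = min(demand, inventory[p])
        inventory[p] -= shipped
        reward += PRICE * shipped - SHORTAGE_COST * (demand - shipped)
    next_state = State(state.t + 1, action, inventory, list(state.forecast))
    return next_state, reward
```

An agent would call `step` repeatedly, for example `step(State(), 1, random.Random(0))`, choosing each action from the current state. Because demand is sampled inside the transition, uncertainty enters the model without any change to the decision interface.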
A drawback of DRL is the lack of theoretical guarantees of performance. To address this, we benchmark
the trained DRL agent against stochastic mixed-integer linear programs (MILP) and deterministic models to
understand how well the DRL solutions measure up. All models are validated using Monte Carlo simulations,
which show that DRL performs quite well in comparison to MILP approaches. In addition, we explore
integration of MILPs to leverage the optimality guarantees of MILPs with the efficient time-to-solution of
DRL by first training the agent to make optimal decisions in a supervised setting before allowing it to
continue training in reinforcement learning mode.
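
The Monte Carlo validation mentioned above can be sketched as follows. This is an assumed, generic evaluation loop rather than the authors' benchmark: `monte_carlo_evaluate`, `toy_episode`, and the two toy policies are hypothetical names, and the toy episode (a reward of 1 whenever the produced product matches a randomly demanded one) stands in for a full plant simulation. Reusing the same seeds for every policy means each one is scored on identical demand scenarios.

```python
import random
import statistics


def monte_carlo_evaluate(policy, simulate_episode, n_runs=1000, seed=0):
    """Estimate the mean episode return of a policy and its standard error."""
    returns = []
    for i in range(n_runs):
        # Same seed sequence for every policy: common random numbers.
        rng = random.Random(seed + i)
        returns.append(simulate_episode(policy, rng))
    mean = statistics.fmean(returns)
    stderr = statistics.stdev(returns) / len(returns) ** 0.5
    return mean, stderr


def toy_episode(policy, rng, horizon=20):
    """Hypothetical stand-in for a plant simulation with two products."""
    total = 0.0
    for t in range(horizon):
        action = policy(t)                         # decide before demand is known
        demanded = 0 if rng.random() < 0.7 else 1  # product 0 is demanded more often
        total += 1.0 if action == demanded else 0.0
    return total


def always_zero(t):
    """Toy policy: always produce product 0."""
    return 0


def alternate(t):
    """Toy policy: alternate between the two products."""
    return t % 2


if __name__ == "__main__":
    for name, policy in [("always_zero", always_zero), ("alternate", alternate)]:
        mean, stderr = monte_carlo_evaluate(policy, toy_episode)
        print(f"{name}: mean return {mean:.2f} +/- {stderr:.2f}")
```

In a real comparison, the policies would be the trained DRL agent and the MILP-based schedulers, and `simulate_episode` would wrap the plant simulation used for training.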
J. Englberger, F. Herrmann, and M. Manitz. Two-stage stochastic master production scheduling under
demand uncertainty in a rolling planning environment. International Journal of Production Research, 54
(20):6192–6215, 2016. ISSN 1366588X. doi: 10.1080/00207543.2016.1162917.
I. E. Grossmann, R. M. Apap, B. A. Calfa, P. García-Herreros, and Q. Zhang. Recent advances in mathematical
programming techniques for the optimization of process systems under uncertainty. Computers
and Chemical Engineering, 91:3–14, 2016. ISSN 00981354. doi: 10.1016/j.compchemeng.2016.03.002.
J. Y. Jung, G. Blau, J. F. Pekny, G. V. Reklaitis, and D. Eversdyk. A simulation based optimization
approach to supply chain management under demand uncertainty. Computers and Chemical Engineering,
28(10):2087–2106, 2004. ISSN 00981354. doi: 10.1016/j.compchemeng.2004.06.006.
G. Sand and S. Engell. Modeling and solving real-time scheduling problems by stochastic integer programming.
Computers and Chemical Engineering, 28(6-7):1087–1103, 2004. ISSN 00981354. doi:
D. E. Shobrys and D. C. White. Planning, scheduling and control systems: Why cannot they work together.
Computers and Chemical Engineering, 26(2):149–160, 2002. ISSN 00981354. doi: 10.1016/S0098-1354(01)