(165c) A Reinforcement Learning Approach for the Exact Guillotine Cutting Stock Problem

Authors 

Kang, J. L., National Yunlin University of Science and Technology
Hung, Y. F., National Tsing Hua University
Jang, S. S., National Tsing Hua University
The cutting stock problem (CSP) is a well-studied optimization problem that minimizes trim loss while cutting stock materials into pieces of required sizes to satisfy customer demand. The two-dimensional CSP has various industrial applications, such as the production of paper, metal, and glass. In production scheduling for the chemical industries, the production cost is directly affected by inventory levels, material waste, and back-orders. The two-dimensional CSP can be considered an example of such a complex scheduling problem, which attempts to minimize production costs and enhance economic efficiency.

The two-dimensional CSP is divided into guillotine cutting and non-guillotine cutting. Guillotine cutting can be further classified into exact and non-exact guillotine cutting (Fig 1) [1], which adds to the complexity and difficulty of the two-dimensional CSP. The computation times of traditional mathematical optimization methods, such as mixed-integer programming (MIP), rise dramatically with the problem size and complexity, making it difficult to meet the limited rescheduling time required by factories.

Reinforcement learning (RL) is a promising approach for scheduling problems, as it is expected to provide near-optimal solutions within a very short computation time [2]. However, few studies have focused on solving the CSP via RL, and those have addressed only the one-dimensional CSP [3,4]. Hence, the purpose of this study is to adopt an RL approach to solve a two-dimensional exact guillotine CSP in a paper-cutting factory.

The industrial case is more complex and challenging to solve than a conventional CSP due to three main differences. First, the length of each item to be cut is a decision variable that must fall within the required length range of the corresponding order. Second, each order contains numerous item length requirements and is placed on a total-weight basis: any number of items may be produced as long as their overall weight satisfies the demand and each item's length satisfies one of the length requirements, so the number of items is not fixed in advance. Third, many more constraints must be satisfied during production, arising from the processes and the customers, which makes RL training difficult and its performance unstable.
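To make the problem structure concrete, the following is a minimal sketch of how the orders and bins described above might be represented in Python. The class and field names (e.g., `length_ranges`, `weight_demand`, `weight_per_area`) are illustrative assumptions, not the data model used in the study.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Order:
    """One customer order: demand is given as a total weight, and every
    produced item's length must fall inside one of the allowed ranges."""
    order_id: str
    item_width: float                         # width required for items of this order
    length_ranges: List[Tuple[float, float]]  # allowed (min_length, max_length) ranges
    weight_demand: float                      # total weight to satisfy


@dataclass
class Bin:
    """One stock roll/sheet available on the scheduling day."""
    bin_id: str
    length: float
    width: float
    weight_per_area: float                    # converts cut area into produced weight
```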

For each episode, which represents the scheduling of one day, there are several bins with variable lengths, widths, and weights per unit area to be cut into various items. Patterns are generated at the beginning of each episode by enumerating all feasible combinations of items in the width dimension and arranging them in ascending order of width. The RL agent then schedules the bins one by one: at each time step, it selects a pattern and decides the length of that pattern to be cut on the current bin. Depending on the bin length, at most six patterns can be cut on each bin in the length dimension. Pre-generating the patterns reduces the number of constraints the RL agent has to handle and guarantees that the results conform to the exact guillotine requirement.
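A minimal sketch of this width-wise pattern enumeration is shown below. It assumes a pattern is a multiset of item widths whose sum fits within the bin width; the cap on items per pattern and the function name are illustrative assumptions rather than the exact procedure used in the study.

```python
from itertools import combinations_with_replacement
from typing import List, Tuple


def generate_patterns(item_widths: List[float], bin_width: float,
                      max_items_per_pattern: int = 8) -> List[Tuple[float, ...]]:
    """Enumerate feasible width-wise combinations of items (patterns).

    A pattern is kept if its total width does not exceed the bin width;
    patterns are returned in ascending order of total width, as described
    in the abstract. The item cap is only to bound the enumeration here.
    """
    unique_widths = sorted(set(item_widths))
    patterns = []
    for n in range(1, max_items_per_pattern + 1):
        for combo in combinations_with_replacement(unique_widths, n):
            if sum(combo) <= bin_width:
                patterns.append(combo)
    patterns.sort(key=sum)  # ascending order of total pattern width
    return patterns
```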

We developed a supervised deep Q-network (DQN) model based on MolDQN that overcomes the problem of a non-fixed action space [5]. MIP was used to produce data with optimal solutions for supervised learning, training the model to choose the pattern that minimizes trim loss while satisfying the remaining constraints.
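The key idea borrowed from MolDQN is to score each (state, action) pair individually rather than output one Q-value per fixed action slot, so the set of candidate actions can change at every step. The PyTorch sketch below illustrates this mechanism under assumed feature sizes; the network architecture and names are illustrative, not the model reported in the study.

```python
import torch
import torch.nn as nn


class PairwiseQNetwork(nn.Module):
    """Q-network that scores one (state, action) pair at a time.

    Because the action is part of the input, the number of candidate
    actions (feasible pattern/length choices) may differ at every step.
    """
    def __init__(self, state_dim: int = 32, action_dim: int = 8, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # scalar Q-value for this (state, action) pair
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)


def select_action(q_net: PairwiseQNetwork, state: torch.Tensor,
                  candidate_actions: torch.Tensor) -> int:
    """Evaluate Q for every currently feasible action and return the argmax index."""
    with torch.no_grad():
        states = state.unsqueeze(0).expand(candidate_actions.size(0), -1)
        q_values = q_net(states, candidate_actions)
    return int(torch.argmax(q_values).item())
```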

After training, we compared the performance of the RL model and MIP via the mean utilization rate and the bin-used ratio. The mean utilization rate is computed by dividing the total weight of the items produced in the episode by the total weight of the used bins; the higher the mean utilization rate, the lower the trim loss per bin. The bin-used ratio is calculated by dividing the number of used bins by the total number of bins; a high bin-used ratio leads to a low inventory level. Fig 2 shows that although the mean utilization rate of the RL model is slightly lower than that of MIP, the RL model achieves a much higher bin-used ratio. It should be noted that the mean utilization rate of the RL model averages 0.67, which is sufficient for the industry even though it is lower than that of MIP. The average computation times per episode for RL and MIP are 10.6 seconds and 1208.3 seconds, respectively, demonstrating that the RL model can solve this complex problem significantly faster, which is essential for rescheduling.
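For clarity, the two metrics reduce to the simple ratios sketched below; the function names are illustrative.

```python
from typing import List


def mean_utilization_rate(item_weights: List[float], used_bin_weights: List[float]) -> float:
    """Total weight of produced items divided by total weight of the bins actually used."""
    return sum(item_weights) / sum(used_bin_weights)


def bin_used_ratio(num_used_bins: int, num_total_bins: int) -> float:
    """Fraction of the available bins consumed in the episode."""
    return num_used_bins / num_total_bins
```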

In conclusion, the proposed supervised DQN with pre-generated patterns provides a high bin-used ratio, which reduces the inventory level, and an acceptable utilization rate within an extremely short computation time, making it practical for industrial rescheduling. The results show the potential of RL to address complex, constrained industrial problems across various industries. In future work, more constraints will be implemented to fit the needs of a real-world factory, and the solutions of the RL model will serve as initial guesses for the MIP to speed up its calculation.

References

[1] Andrade, R., Birgin, E. G., & Morabito, R. (2016). Two-stage two-dimensional guillotine cutting stock problems with usable leftover. International Transactions in Operational Research, 23(1-2), 121-145.

[2] Hubbs, C. D., Li, C., Sahinidis, N. V., Grossmann, I. E., & Wassick, J. M. (2020). A deep reinforcement learning approach for chemical production scheduling. Computers & Chemical Engineering, 141, 106982.

[3] Pitombeira-Neto, A. R., & Murta, A. H. (2022). A reinforcement learning approach to the stochastic cutting stock problem. EURO Journal on Computational Optimization, 10, 100027.

[4] Fang, J., Rao, Y., Luo, Q., & Xu, J. (2023). Solving One-Dimensional Cutting Stock Problems with the Deep Reinforcement Learning. Mathematics, 11(4), 1028.

[5] Zhou, Z., Kearnes, S., Li, L., Zare, R. N., & Riley, P. (2019). Optimization of molecules via deep reinforcement learning. Scientific Reports, 9(1), 1-10.