(38b) Dealing with Small Data in Biopharmaceutical Batch Process Monitoring: A Machine-Learning Approach
AIChE Spring Meeting and Global Congress on Process Safety
2018
2018 Spring Meeting and 14th Global Congress on Process Safety
Industry 4.0 Topical Conference
Big Data Analytics and Statistics
Monday, April 23, 2018 - 4:00pm to 4:30pm
To efficiently monitor and control biopharmaceutical processes, multivariate statistical techniques are commonly deployed for batch process monitoring (BPM). A BPM framework uses multivariate statistical models (e.g., principal component analysis (PCA) and partial least squares (PLS)) to capture common-cause variations in the batch [2]. Control charts (e.g., the Hotelling T² and squared prediction error (SPE) statistics) and their control limits are then used to determine whether a new batch exhibits normal operating behavior. An alarm is raised if a batch is statistically different from normal operation.
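As an illustration of the monitoring logic described above, the following is a minimal sketch of PCA-based monitoring with Hotelling T² and SPE statistics. The data, the number of retained components, and the use of empirical percentile control limits are all assumptions for illustration; industrial BPM implementations typically use unfolded batch trajectories and theoretically derived limits.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated "normal operation" training data: 50 batches x 10 variables.
# (In practice this would be unfolded batch trajectory data; the random
# mixing below just induces correlation between variables.)
X = rng.normal(size=(50, 10)) @ rng.normal(size=(10, 10))

# Mean-center and scale to unit variance
mu, sigma = X.mean(axis=0), X.std(axis=0)
Xs = (X - mu) / sigma

# PCA via SVD; retain k principal components (k = 3 is an assumption)
U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
k = 3
P = Vt[:k].T                            # loadings, shape (10, k)
lam = (S[:k] ** 2) / (Xs.shape[0] - 1)  # variances of the retained scores

def t2_spe(x):
    """Hotelling T^2 and squared prediction error for one sample."""
    xs = (x - mu) / sigma
    t = xs @ P                  # scores in the PCA subspace
    t2 = np.sum(t ** 2 / lam)   # Mahalanobis distance within the model
    resid = xs - t @ P.T        # part of the sample the model cannot explain
    spe = resid @ resid
    return t2, spe

# Empirical control limits from the training batches (95th percentile)
stats = np.array([t2_spe(x) for x in X])
t2_lim, spe_lim = np.percentile(stats, 95, axis=0)

# Monitor a new batch: alarm if either statistic exceeds its limit
x_new = np.full(10, 10.0)       # a deliberately abnormal sample
t2_new, spe_new = t2_spe(x_new)
alarm = (t2_new > t2_lim) or (spe_new > spe_lim)
print(alarm)
```

The two statistics are complementary: T² flags unusual variation inside the subspace the model explains, while SPE flags variation the model cannot explain at all.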
Despite over two decades of research in BPM, the biopharmaceutical BPM framework suffers from a unique challenge: the Low-N problem (or small-data problem). The Low-N problem describes a scenario in which a product has a limited production history, denoted here by N. It is common for companies to have only one or two runs of a new drug product at a manufacturing facility. While a limited number of runs may suffice to meet clinical or early commercial demand, it also creates a Low-N scenario for the product. In terms of BPM, a Low-N scenario poses several challenges. First, under Low-N conditions, it is nontrivial to capture the common-cause variations in their entirety. Second, the predictive capabilities of PCA and PLS are less accurate under a Low-N scenario. Further, model over-fitting becomes much harder to avoid and the effects of outliers are much more pronounced.
The Low-N problem is a longstanding, industry-wide problem in biopharmaceutical manufacturing that challenges the theoretical foundations and practical applicability of the existing BPM platform. We propose an approach to transition from a Low-N scenario to a Large-N scenario by generating an arbitrarily large number of in silico batch data sets. The proposed method combines hardware exploitation with algorithm development. To this effect, we propose a block-learning method for a Bayesian non-parametric model of a batch process, and then use probabilistic programming to generate an arbitrarily large number of dynamic in silico campaign data sets. The proposed solution not only alleviates the monitoring issues associated with a Low-N scenario but is also compatible with the industrial BPM framework. To the best of the authors' knowledge, this is the first method that describes a systematic approach to addressing the small-data problem using the tools of big data. The efficacy of the proposed solution is demonstrated on an industrial biopharmaceutical process.
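The generative idea above can be sketched in simplified form: fit a probabilistic model of the batch trajectory to the few observed batches, then sample from it to produce an arbitrarily large in silico campaign. The abstract's actual method is a Bayesian non-parametric model learned block-wise; the Gaussian-process-style model below, with its assumed lengthscale and saturating mean profile, is only a hypothetical stand-in to show the shape of the workflow.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical Low-N history: N = 3 observed batches, 100 time points,
# one process variable (e.g., a titer trajectory over the batch)
T = 100
t = np.linspace(0.0, 1.0, T)
true_mean = 10.0 * t / (0.3 + t)              # assumed saturating profile
obs = true_mean + rng.normal(scale=0.3, size=(3, T))

# Fit a simple generative model: the empirical mean trajectory plus a
# smooth squared-exponential covariance over time. The lengthscale and
# variance below are assumptions, not fitted hyperparameters.
mean_hat = obs.mean(axis=0)
ell = 0.1
s2 = obs.var(axis=0).mean()
K = s2 * np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / ell ** 2)
K += 1e-6 * np.eye(T)                         # jitter for numerical stability

# Sample an arbitrarily large in silico campaign (here, 500 batches);
# these synthetic batches can then feed a standard PCA/PLS-based
# BPM model as if they were a Large-N production history.
insilico = rng.multivariate_normal(mean_hat, K, size=500)
print(insilico.shape)  # (500, 100)
```

The key point is that the monitoring pipeline downstream is unchanged: the in silico batches are consumed by the same industrial BPM framework, which is what makes the approach drop-in compatible.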
References
[1] G. Hamilton, "The Biotechnology Market Outlook: Growth Opportunities and Effective Strategies for Licensing and Collaboration". Dublin: Research and Markets, 2005.
[2] P. Nomikos and J. F. MacGregor, "Monitoring batch processes using multiway principal component analysis," AIChE Journal, vol. 40, no. 8, pp. 1361-1375, 1994.