(259e) Kernel Mean Embedding of Distributions: A New Data-Driven Model Approach for Modelling a Continuous Pharmaceutical Twin-Screw Granulation Process | AIChE


Stock, M., Ghent University
De Beer, T., Ghent University
Nopens, I., Ghent University
Traditional manufacturing of pharmaceutical solid oral dosage forms comprises a series of batch unit operations. More recently, the industry has begun moving from batch processing to continuous manufacturing to cope with the inefficiencies and high costs of process development and scale-up. Twin-screw wet granulation is an emerging continuous pharmaceutical process that is being assessed for its performance in solid dosage manufacturing. In this research, the twin-screw wet granulation unit is a unit operation of the ConsiGma™-25 continuous powder-to-tablet process line from GEA Pharma Systems. However, since these continuous processes are still under development in the pharmaceutical industry, detailed process knowledge and understanding are still evolving. Models, either mechanistic or data-driven, can help fill knowledge gaps by thoroughly investigating detailed experimental data and unravelling the underlying mechanisms.

Previous work of the authors focussed on population balance modelling of the continuous pharmaceutical twin-screw wet granulation unit [1]. In that work, a novel two-compartment population balance model was calibrated and validated using measurements of particle size after the wetting zone and at the end of the granulator. That population balance model is mechanistic: the physical phenomena of aggregation and breakage of particles are described explicitly in order to transform one particle size distribution into another.

In this work, the following question is posed: can we predict particle size distributions without any assumption on the governing mechanistic phenomena in the system? In other words, can a model learn how to transform one distribution into another using only data-driven techniques?

The model under study is a Hilbert space embedding of distributions, or in short, a kernel mean embedding. The main idea behind this framework is to map distributions into a high-dimensional reproducing kernel Hilbert space (RKHS). It can be viewed as a generalization of the original “feature map” common to support vector machines and other kernel methods. Rather than manipulating individual observations, a distribution of points (in this case a particle size distribution) is represented as a mean in the RKHS. Under mild conditions on the kernel (for example, a characteristic kernel such as the Gaussian kernel), the embedding is injective, so this mean retains all information about the original distribution, including its moments. Kernel mean embedding has found application in fields ranging from kernel machines and probabilistic modelling to statistical inference, causal discovery, and deep learning. The embedding of distributions enables us to apply RKHS methods to probability measures, which opens up a wide range of applications such as learning on distributional data [2].
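To make the idea of representing a distribution as a mean in the RKHS concrete, the following minimal sketch compares empirical mean embeddings of sampled size distributions. It is an illustration only, not the implementation used in this work: the Gaussian kernel, its bandwidth, the function names, and the synthetic log-scale granule sizes are all assumptions. The squared RKHS distance between two empirical mean embeddings is known as the (squared) maximum mean discrepancy.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix between two 1-D sample arrays."""
    diff = x[:, None] - y[None, :]
    return np.exp(-diff**2 / (2.0 * bandwidth**2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased estimate of the squared RKHS distance between the
    empirical mean embeddings of samples x and y."""
    k_xx = gaussian_kernel(x, x, bandwidth).mean()
    k_yy = gaussian_kernel(y, y, bandwidth).mean()
    k_xy = gaussian_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy


# Synthetic, hypothetical granule sizes on a log scale (not measured data).
rng = np.random.default_rng(0)
fine = rng.normal(5.0, 0.4, size=500)
fine_repeat = rng.normal(5.0, 0.4, size=500)
coarse = rng.normal(6.5, 0.6, size=500)

mmd_same = mmd_squared(fine, fine_repeat)  # two samples, same distribution
mmd_diff = mmd_squared(fine, coarse)       # samples from different distributions
print(f"same distribution: {mmd_same:.4f}, different: {mmd_diff:.4f}")
```

Samples drawn from the same distribution lie close together in the RKHS, whereas samples from clearly different distributions are far apart; no binning or parametric form of the size distribution is needed.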

This learning on distributional data is the core of this work. The kernel mean embedding framework allows for a direct feature mapping from the process variables (machine settings such as liquid-to-solid ratio and mass flow rate, as well as blend properties such as the concentration of the active pharmaceutical ingredient and water binding capacity) to the particle size distributions at the end of the granulator. This reveals an advantage of the data-driven approach: there is no need to add an extra layer to the mechanistic model to link it with the process variables. The learning on distributional data can be extended to learning on conditional distributions, i.e. modelling several successive process manipulations of the particle size distribution. This amounts to learning the feature mapping between one (intermediate) distribution, e.g. the particle size distribution in the wetting zone, and another distribution, e.g. the particle size distribution at the outlet of the granulator. Similar to the population balance model, a compartmental model can be constructed that describes the distribution transformations in every stage of the granulator.
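One common way to learn a mapping from process variables to an embedded distribution is kernel ridge regression onto the training embeddings (a conditional mean embedding estimator). The sketch below illustrates that general technique under stated assumptions; it is not necessarily the model used in this work. The single input variable (a scaled liquid-to-solid ratio), the `simulate_run` data generator, the kernel bandwidths, and the regularization value are all hypothetical.

```python
import numpy as np


def rbf(a, b, bandwidth):
    """Gaussian kernel matrix between row sets a (n, d) and b (m, d)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth**2))


def embed_on_grid(samples, grid, bandwidth=0.3):
    """Empirical mean embedding of 1-D samples, evaluated on a size grid."""
    return rbf(grid[:, None], samples[:, None], bandwidth).mean(axis=1)


def predict_embedding(z_train, embeddings, z_new, bandwidth=0.2, reg=1e-2):
    """Kernel ridge regression from process settings to mean embeddings.

    The prediction is a weighted sum of the training embeddings, with
    weights (K_zz + reg * I)^{-1} k_z(z_new).
    """
    k_zz = rbf(z_train, z_train, bandwidth)
    k_zn = rbf(z_train, z_new, bandwidth)
    weights = np.linalg.solve(k_zz + reg * np.eye(len(z_train)), k_zn)
    return weights.T @ embeddings


def simulate_run(ls_ratio, rng, n_granules=200):
    """Hypothetical stand-in for an experiment: log-sizes shift with the setting."""
    return rng.normal(5.0 + 2.0 * ls_ratio, 0.3, size=n_granules)


rng = np.random.default_rng(1)
grid = np.linspace(3.0, 9.0, 121)
z_train = np.linspace(0.0, 1.0, 12)[:, None]  # 12 synthetic training runs
embeddings = np.stack(
    [embed_on_grid(simulate_run(z[0], rng), grid) for z in z_train]
)

z_new = np.array([[0.5]])  # setting not present in the training runs
predicted = predict_embedding(z_train, embeddings, z_new)[0]
reference = embed_on_grid(simulate_run(0.5, rng), grid)

err_pred = np.linalg.norm(predicted - reference)
err_baseline = np.linalg.norm(embeddings[0] - reference)  # reuse run at z = 0
print(f"prediction error: {err_pred:.3f}, baseline error: {err_baseline:.3f}")
```

Note that the prediction here is an embedding evaluated on a grid, not a size distribution itself; recovering a distribution from an embedding is a separate pre-image step that this sketch does not cover. Replacing the process settings `z` with the embedding of an intermediate distribution would give the distribution-to-distribution variant described above.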

Further, it is interesting to investigate the relation between the data-driven model in this work and the mechanistic population balance model of the previous work. The calibrated feature mapping could potentially provide insight into the physical phenomena governing the twin-screw wet granulator by linking it with the aggregation and breakage descriptions of the population balance model.

The results of this work are promising: using the kernel mean embedding, particle size distributions at the end of the granulator can be predicted. The model is fast to evaluate and requires few computational resources to train. The speed of the model allows for future applications such as real-time soft sensors, model-based control, and optimal experimental design for the twin-screw wet granulator.

[1] Van Hauwermeiren, D., Verstraeten, M., Doshi, P., am Ende, M. T., Turnbull, N., Lee, K., ... Nopens, I. (2018). On the modelling of granule size distributions in twin-screw wet granulation: Calibration of a novel compartmental population balance model. Powder Technology, 341, 116–125. https://doi.org/10.1016/j.powtec.2018.05.025

[2] Muandet, K., Fukumizu, K., Sriperumbudur, B., & Schölkopf, B. (2016). Kernel Mean Embedding of Distributions: A Review and Beyond. https://doi.org/10.1561/2200000060