(200s) Stoichiometry Identification in Pharmaceutical Reactions Using Dynamic Response Surface Methodology and Target Factor Analysis

Dong, Y. - Presenter, Tufts University
Georgakis, C., Tufts University
Santos-Marques, J., Tufts University
Mustakis, J., Pfizer Inc.
Wang, K., Pfizer Inc.
McMullen, J. P., Merck & Co. Inc.
Grosser, S. T., Merck & Co. Inc.
In many pharmaceutical reactions under development, the stoichiometry of the main reaction is known, but the experiments reveal the presence of several other unidentified compounds. There is then an urgent need to identify the additional reactions that generate these unwanted secondary compounds and to find ways to minimize their presence. The secondary compounds are either intermediates or reaction by-products. To identify the candidate reaction stoichiometries that agree with the data, and those that can be refuted, we perform a target factor analysis (TFA) [1,2] in combination with the development of a dynamic response surface methodology (DRSM) model [3,4].

The DRSM model is a recently proposed data-driven modeling method that generalizes the Response Surface Methodology (RSM) used to quantify the results of Design of Experiments (DoE). We use the constrained version of the DRSM-2 model together with TFA to identify the stoichiometry of the reactions under study. We update the TFA approach by using a statistical test to determine the number of significant singular values, and we propose a new test to evaluate statistically whether the projection of a candidate stoichiometry is successful.
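As a rough illustration of what a time-resolved, data-driven model offers, the sketch below fits a low-order Legendre polynomial in time to a handful of noisy concentration measurements and differentiates the fit to estimate the rate of concentration change. This is not the DRSM-2 model: it covers a single experiment, has no dependence on experimental factors, and imposes no constraints. The first-order decay, the rate constant `k`, the noise level, the sampling times, and the helper name `smoothed_rate` are all assumptions made for the example.

```python
import numpy as np
from numpy.polynomial import Legendre


def smoothed_rate(t_meas, c_meas, t_final, degree=3):
    """Fit a Legendre series in time and return (fit, time derivative of fit).

    Hypothetical helper: a single-experiment stand-in for a time-resolved
    data-driven model, not the DRSM-2 formulation.
    """
    fit = Legendre.fit(t_meas, c_meas, deg=degree, domain=[0.0, t_final])
    return fit, fit.deriv()


# Assumed test system: first-order decay c(t) = exp(-k t), sparsely sampled.
rng = np.random.default_rng(0)
k = 2.0
t_final = 1.0
t_meas = np.linspace(0.0, t_final, 12)
c_meas = np.exp(-k * t_meas) + rng.normal(0.0, 0.005, t_meas.size)

conc_fit, rate_fit = smoothed_rate(t_meas, c_meas, t_final)

# Compare the estimated rate with the analytical rate away from the endpoints,
# where polynomial derivatives are least reliable.
t_grid = np.linspace(0.1, 0.9, 50)
true_rate = -k * np.exp(-k * t_grid)
rate_error = np.max(np.abs(rate_fit(t_grid) - true_rate))
print(f"max rate error on interior grid: {rate_error:.3f}")
```

Differentiating a smooth fit gives rate estimates at any time point, including times with no measurement, which is why such a model can be more informative than finite differences of sparse data.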

The use of the DRSM model enables the generation of a more informative set of data on the rate of concentration changes than relying directly on the original concentration data. This is especially important when measurements are infrequent or some data are missing. We use an F test [5] to decide the number of significant eigenvalues, and we propose a new statistical test for deciding whether the difference vector, obtained by subtracting the response vector from the target vector in TFA, is close to the zero vector. We also compare the new test against a previously proposed statistical test [5]. The two statistical tests are:

  1. A t-test that each individual component of the difference vector lies inside an interval whose bound is a value close to zero.
  2. An F test of the hypothesis that the difference between the target and response vectors is of the same magnitude as the normal variability of the process.
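The projection step behind these tests can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-reaction network, the random reaction extents, the noise level, and the helper name `project_target` are hypothetical, and the number of significant factors is simply taken as known rather than selected with an F test. The statistical tests themselves are not implemented; only the difference vector they would examine is computed.

```python
import numpy as np


def project_target(data, target, n_factors):
    """Project a candidate stoichiometry onto the estimated reaction space.

    data      : (n_observations, n_species) matrix of concentration changes
    target    : (n_species,) candidate stoichiometric vector
    n_factors : number of significant singular values (assumed known here)
    Returns the response (projected) vector and the difference vector.
    """
    _, _, vt = np.linalg.svd(data, full_matrices=False)
    basis = vt[:n_factors]                # rows span the estimated reaction space
    response = basis.T @ (basis @ target)
    return response, target - response


# Hypothetical network over species A, B, C, D:
#   A + B -> C   and   B + C -> D
stoich = np.array([[-1.0, -1.0, 1.0, 0.0],
                   [0.0, -1.0, -1.0, 1.0]])

rng = np.random.default_rng(1)
extents = rng.uniform(0.0, 1.0, size=(50, 2))          # assumed reaction extents
data = extents @ stoich + rng.normal(0.0, 0.005, size=(50, 4))

true_target = stoich[1]                                 # B + C -> D
false_target = np.array([-1.0, 0.0, 0.0, 1.0])          # A -> D, not in the network

_, diff_true = project_target(data, true_target, n_factors=2)
_, diff_false = project_target(data, false_target, n_factors=2)

true_norm = np.linalg.norm(diff_true)
false_norm = np.linalg.norm(diff_false)
print(f"difference norm, true candidate:  {true_norm:.3f}")
print(f"difference norm, false candidate: {false_norm:.3f}")
```

A candidate that belongs to the network leaves a difference vector comparable to the noise, while a false candidate leaves a clearly nonzero one. Either of the two tests listed above would then be applied to decide whether the difference is statistically indistinguishable from zero.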

We apply the projection algorithm in both a parallel and a sequential mode, in a blind test of a complex pharmaceutical reaction network involving ten species in eight reactions. In the parallel test we identify six of the eight reactions when the measurement error is 0.005 mol/L. Under the same conditions we refute all six of the proposed false reaction stoichiometries. One more reaction is identified in the sequential TFA. The two reactions that were not initially identified proceed at slow rates, and the concentrations of the participating species are of the same order of magnitude as the measurement error.


  1. Bonvin D, Rippin DWT. Target Factor Analysis for the Identification of Stoichiometric Models. Chem Eng Sci. 1990;45(12):3417-3426.
  2. Malinowski ER. Factor analysis in chemistry. 3rd ed. New York: Wiley; 2002.
  3. Klebanov N, Georgakis C. Dynamic Response Surface Models: A Data-Driven Approach for the Analysis of Time-Varying Process Outputs. Ind Eng Chem Res. 2016;55(14):4022-4034.
  4. Wang ZY, Georgakis C. New Dynamic Response Surface Methodology for Modeling Nonlinear Processes over Semi-infinite Time Horizons. Ind Eng Chem Res. 2017;56(38):10770-10782.
  5. Malinowski ER. Statistical F-tests for abstract factor analysis and target testing. J Chemom. 1989;3(1):49-60.