(747a) Fragrance Product Design/Screening Methods Using an Integrated Machine Learning and CAMD Model
Fragrances are used in a wide variety of daily products such as perfumes, cosmetics, toiletries and household cleaners. The flavor and fragrance business has become a multibillion-dollar market with worldwide impact, yet the design of fragrances remains mostly empirical, relying on the experience and knowledge of experts. Potentially better fragrance products can be missed when design relies on experience and knowledge alone. A systematic computer-aided design method is therefore needed to design and screen fragrance molecules against various product needs as a first step, before focused experiment-based verification is performed. However, the lack of relevant property models prevents computer-aided design methods from being applied to fragrance products. In chemical and pharmaceutical research, large amounts of experimental data are available, and knowledge can be extracted from such data, for example by deriving models that predict properties of new compounds. Machine learning techniques are therefore used to establish Quantitative Structure-Property Relationships (QSPR), and a growing number of potentially useful machine learning (ML) methods are being applied in chemistry. Although many property prediction models have been developed and many more are needed, very few fragrance property models have been developed with machine learning methods, even though sufficient fragrance data can be found.
In this paper, an integrated model of ML and Computer-Aided Molecular Design (CAMD) is developed for the design of new fragrance molecules. The odors of the molecules are predicted using a data-driven ML approach in which molecular structures are described by a group-based representation. The database contains 480 different molecules, which are classified using 20 odor characters. A Convolutional Neural Network (CNN) is used to model the odor of organic molecules. The established CNN model structure consists of a 50×64 embedding layer, two convolutional layers (with outputs of 47×128 and 44×128), a 22×128 max-pooling layer, a 22×128 dropout layer, a 2816×1 flatten layer, a 128×1 dense layer, another 128×1 dropout layer and a 20×1 dense layer. Odor characters and odor pleasantness are then predicted with this model structure, whose parameters are regressed using the available data for training and verification. The average correctness of the predicted odor characters is 92.9%, while the average prediction error of odor pleasantness is 18.4%. Next, a CAMD model is developed based on the established workflow for molecular design and on group-contribution-based models for the other needed properties. The optimization model includes an objective function, molecular structural constraints and property constraints covering odor character, odor pleasantness, diffusion coefficient, vapor pressure, normal boiling and melting points, solubility parameter, viscosity, density and LC50. A decomposition-based solution approach, in which the established MILP/MINLP model is decomposed into an ordered set of sub-problems, is used to obtain the optimal design results. Finally, two case studies are presented to demonstrate the method for the optimal design of fragrance molecules.
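The layer sizes quoted for the CNN can be checked with simple shape arithmetic. The sketch below is only an illustration: the abstract does not state the kernel size, padding, stride or pooling window, so the values used here (1-D convolutions without padding, stride 1, kernel size 4, pooling window 2) are assumptions chosen because they reproduce the reported sizes. The function names are hypothetical.

```python
# Shape walk-through for the CNN layer sizes reported in the abstract.
# Assumptions (NOT stated in the abstract): 1-D convolutions with no padding,
# stride 1, kernel size 4, and a non-overlapping pooling window of 2.

def conv1d_len(length, kernel, stride=1):
    """Output length of a 1-D convolution without padding."""
    return (length - kernel) // stride + 1

def pool1d_len(length, window):
    """Output length of non-overlapping 1-D max pooling."""
    return length // window

def cnn_shapes(seq_len=50, embed_dim=64, filters=128, kernel=4, window=2,
               dense_units=128, n_classes=20):
    """Return (layer name, output shape) pairs for the described layer stack."""
    shapes = [("embedding", (seq_len, embed_dim))]
    n = conv1d_len(seq_len, kernel)
    shapes.append(("conv1", (n, filters)))
    n = conv1d_len(n, kernel)
    shapes.append(("conv2", (n, filters)))
    n = pool1d_len(n, window)
    shapes.append(("max_pool", (n, filters)))
    shapes.append(("dropout1", (n, filters)))    # dropout keeps the shape
    shapes.append(("flatten", (n * filters,)))
    shapes.append(("dense", (dense_units,)))
    shapes.append(("dropout2", (dense_units,)))
    shapes.append(("output", (n_classes,)))      # one unit per odor character
    return shapes

if __name__ == "__main__":
    for name, shape in cnn_shapes():
        print(f"{name:10s} {shape}")
```

Under these assumptions the sequence length shrinks from 50 to 47 to 44 across the two convolutions, halves to 22 after pooling, and flattens to 22 × 128 = 2816 values, matching the sizes given in the text.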
 Teixeira, M.A., et al., Chapter 1 - A Product Engineering Approach in the Perfume Industry, in Perfume Engineering. 2013, Butterworth-Heinemann: Oxford. p. 1-13.
 Keller, A. and L.B. Vosshall, Olfactory perception of chemically diverse molecules. BMC Neuroscience, 2016. 17(1): p. 55.