(59h) Smells like AI: Harnessing Machine Learning for Advanced Olfactory Experience Reproduction and Odorant Optimization | AIChE

Authors 

Nogueira, I. - Presenter, LA / LSRE - LCM
Viena Santana, V., Norwegian University of Science and Technology
Rodrigues, B., Faculty of Engineering of the University of Porto
Murins, S., SIA Murins
Shardt, N., University of Alberta
The human olfactory system can detect and differentiate a vast array of scents. Neuroscience research suggests that the perception of smell consists not only of the sensation of the scent itself but also of the experiences, memories, and emotions associated with that sensation. The literature indicates that scents can trigger memories more effectively and with greater emotional intensity than audio-visual stimuli [1]. This makes it important to study scents and to develop methods to digitalize and reproduce the olfactory experience they create.

The perception and recognition of scents are influenced by the odorants’ chemical composition and the individual's prior experience with similar odors. Research in olfactory perception has shed light on the underlying mechanisms of odor detection and processing in the olfactory bulb and cortex [2]. Scents result from volatile organic compounds (VOCs) interacting with olfactory receptors in the nose; different VOCs contribute to the perceived scent to different degrees, with some dominating. Understanding the link between a scent's chemical composition and its perceptual qualities is vital for reproducing olfactory experiences.

The synthesis and reconstitution of scents involve determining the appropriate combination of chemical compounds and their respective concentrations to recreate a specific olfactory experience. Usually, this is done by trial and error. Researchers have investigated several methods to achieve this, including the use of natural and synthetic odorants [3] and, more recently, the application of mathematical models and algorithms for optimizing scent composition [4-5].

Despite the advancements in olfactory research, there are still several challenges in reproducing olfactory experiences accurately. One such challenge is the sheer complexity of scents, which often contain numerous chemical compounds with complex interactions [6]. Additionally, individual variations in olfactory perception can make it difficult to create a universally consistent scent experience [7].

Hence, machine learning techniques are increasingly applied to olfactory research, addressing key challenges like odor classification and recognition [8]. Algorithms like neural networks (NN) [9] and random forests (RF) demonstrate potential in accurately handling complex odor datasets. Quantitative structure-odor relationship (QSOR) modeling [10], which predicts odor properties based on molecular structure, has also benefited from machine learning techniques [11], with deep learning showing promise in QSOR modeling.

Therefore, the main objective of this study is to develop a machine-learning-based scent reproduction methodology capable of accurately classifying a scent and reproducing its olfactory experience by identifying an alternative combination of chemical compounds, and their concentrations, that recreates that experience. This opens the door for future researchers in olfactory perception to reproduce scents by finding alternative molecules guided by certain criteria, e.g., synthesizability, production costs, or whether all the odorants can be considered natural molecules. To this end, this work proposes an odorant recommender system based on several machine-learning techniques.

Synthesizing and reconstructing scents involves determining the appropriate mix of chemical compounds and their respective concentrations to recreate a specific olfactory experience. Therefore, the first step in the scent recreation methodology proposed in this work is obtaining a dataset with at least six cases of scent samples, referred to as the “scents dataset”, gathered from [6]. This dataset serves to test the scent reconstruction methodology. Hence, a dataset containing molecular structures and their corresponding odor properties was collected and preprocessed. Each cataloged entry included the defined chemical composition and molecular structure of the odorants (smell-producing chemical compounds) within the scent. It is essential to differentiate between a scent and an odorant: a scent is a blend of odorants.

To reconstruct a scent, one first needs to know its odor properties, i.e., its odor semantic descriptors. One can rely on an experienced perfumer, but this can be expensive. Instead, a curated dataset comprising 2230 unique odorants with their molecular structures and odor properties was constructed by web scraping, organizing, and cleaning information from the website thegoodscentscompany.com. The resulting tabulated file is denoted the curated dataset and is made publicly available by the authors at github.com/viniviena/Projects/blob/main/Data%20Mining/curated_ds.csv.

The subsequent stage in scent recreation involves identifying the odor properties of each odorant in the samples to suggest potential replacements for the original odorants. Scent experts typically use semantic descriptors, such as “floral”, “fruity”, and “woody”, to classify and group similar odorants. As relying on experts to smell thousands of odorants can be very costly, a computational method that can predict the semantic descriptors of an odorant based on its features is a helpful tool. To this end, a QSOR model was developed to predict an odorant's odor properties based on its molecular structure using machine learning and Artificial Intelligence (AI). Specifically, a Graph Convolutional Neural Network (GCNN) model was trained to classify the odor properties of odorants with known molecular structures obtained in the first step of the proposed methodology.

GCNNs are valuable for efficiently handling graph-based molecular representations. Deep neural networks build intermediate data representations, or embeddings, which are fixed-size vectors crucial for making predictions. In scent science, these embeddings help identify similarities between odorant molecules.
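As an illustration of how a graph network turns molecules of different sizes into fixed-size embeddings, the following is a minimal NumPy sketch of a generic graph convolution layer followed by mean pooling. It is not the architecture used in this work: the layer form (symmetric-normalized adjacency with self-loops), the random untrained weights, the two-element atom features, and the function names are all assumptions made for illustration.

```python
import numpy as np

def normalized_adjacency(adj):
    """Add self-loops and apply symmetric degree normalization."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_norm, h, w):
    """One graph convolution: aggregate neighbor features, project, apply ReLU."""
    return np.maximum(a_norm @ h @ w, 0.0)

def molecule_embedding(adj, atom_features, weights):
    """Stack graph convolutions, then mean-pool atoms into one fixed-size vector."""
    a_norm = normalized_adjacency(adj)
    h = atom_features
    for w in weights:
        h = gcn_layer(a_norm, h, w)
    return h.mean(axis=0)

# Untrained random weights (assumption): 2 atom features -> 8 hidden -> 4 embedding dims
rng = np.random.default_rng(0)
weights = [rng.normal(size=(2, 8)), rng.normal(size=(8, 4))]

# Heavy-atom graphs with one-hot atom features [is_carbon, is_oxygen]
ethanol_adj = np.array([[0, 1, 0],
                        [1, 0, 1],
                        [0, 1, 0]], dtype=float)             # C-C-O
ethanol_feats = np.array([[1, 0], [1, 0], [0, 1]], dtype=float)

propanol_adj = np.array([[0, 1, 0, 0],
                         [1, 0, 1, 0],
                         [0, 1, 0, 1],
                         [0, 0, 1, 0]], dtype=float)         # C-C-C-O
propanol_feats = np.array([[1, 0], [1, 0], [1, 0], [0, 1]], dtype=float)

ethanol_emb = molecule_embedding(ethanol_adj, ethanol_feats, weights)
propanol_emb = molecule_embedding(propanol_adj, propanol_feats, weights)
print(ethanol_emb.shape, propanol_emb.shape)
```

Both molecules map to vectors of the same length even though their graphs differ in size, which is what allows embeddings of different odorants to be compared directly.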

After training the AI model, an odorant recommender system was developed based on the QSOR model. In other words, given an odorant's molecular structure, the system recommends a new odorant with similar odor properties. This process was carried out for 6 odors randomly sampled from [6] to validate the methodology. It is important to note that each odorant is classified under a particular function, such as top, middle, or base note, because the dataset originates from perfumes.

The problem of recommending new odorants given the analyzed odorants in a sample is framed as a content-based recommendation problem. Content-based recommender systems provide suggestions by analyzing the features of items, here odorants. These systems recommend items with similar characteristics, thereby ensuring a tailored experience. In scent reconstruction, the tailored experience means recommending odorants with properties similar to those in the original sample. Content-based recommendations require defining a metric of similarity between items. Here, the cosine similarity between the Sample Odorant Embeddings (SOE) and the Curated Dataset Embeddings (CDE) was calculated.
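The similarity step can be sketched as follows. This is a minimal illustration assuming the SOE and CDE are available as NumPy arrays with one embedding per row; the function names, the toy three-dimensional embeddings, and the placeholder odorant names are hypothetical and do not come from the curated dataset.

```python
import numpy as np

def cosine_similarity_matrix(soe, cde):
    """Cosine similarity between every SOE row and every CDE row."""
    soe_unit = soe / np.linalg.norm(soe, axis=1, keepdims=True)
    cde_unit = cde / np.linalg.norm(cde, axis=1, keepdims=True)
    return soe_unit @ cde_unit.T

def recommend(soe, cde, names, exclude):
    """Return (name, score) of the most similar curated odorant for each sample odorant.

    exclude[i] is the CDE row index of sample odorant i itself (or None),
    so that an odorant is never recommended as its own replacement.
    """
    scores = cosine_similarity_matrix(soe, cde)
    results = []
    for i, row in enumerate(scores):
        row = row.copy()
        if exclude[i] is not None:
            row[exclude[i]] = -np.inf
        best = int(np.argmax(row))
        results.append((names[best], float(row[best])))
    return results

# Toy embeddings (assumption); real ones would come from the trained model
names = ["odorant_a", "odorant_b", "odorant_c"]
cde = np.array([[1.0, 0.0, 0.0],
                [0.9, 0.1, 0.0],
                [0.0, 1.0, 0.0]])
soe = cde[[0]]  # the sample contains odorant_a
recommendations = recommend(soe, cde, names, exclude=[0])
print(recommendations)
```

Cosine similarity depends only on the direction of the embedding vectors, not on their magnitude, so two odorants score highly when their embeddings point the same way regardless of vector length.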

In the final stage of scent reconstruction, determining the composition of each recommended odorant is crucial. While this is challenging, knowing the molecular structure and composition eases the process. Assuming equal headspace concentrations for the original and recommended odorants, equivalent compositions can be computed by interpolation using molecular weights.
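The role of molecular weights in this step can be illustrated with a simple unit conversion. The sketch below is not the authors' calculation: it assumes, as a strong simplification, that holding the mole fraction constant stands in for equal headspace concentration (ignoring volatility and mixture non-ideality), and the blend composition is hypothetical. The molar masses are standard values for the compounds named, using the ethyl-vanillin to methyl-vanillate substitution reported in this abstract as the example.

```python
def mass_to_mole_fractions(mass_fractions, molar_masses):
    """Convert mass fractions to mole fractions using molar masses in g/mol."""
    moles = {name: w / molar_masses[name] for name, w in mass_fractions.items()}
    total = sum(moles.values())
    return {name: n / total for name, n in moles.items()}

def mole_to_mass_fractions(mole_fractions, molar_masses):
    """Convert mole fractions back to mass fractions."""
    masses = {name: x * molar_masses[name] for name, x in mole_fractions.items()}
    total = sum(masses.values())
    return {name: m / total for name, m in masses.items()}

def substitute(mass_fractions, molar_masses, original, replacement):
    """Swap one odorant for another at the same mole fraction (simplifying assumption)."""
    mole = mass_to_mole_fractions(mass_fractions, molar_masses)
    mole[replacement] = mole.pop(original)
    return mole_to_mass_fractions(mole, molar_masses)

# Molar masses in g/mol
MOLAR_MASS = {"ethyl vanillin": 166.17, "methyl vanillate": 182.17, "ethanol": 46.07}

# Hypothetical blend, mass fractions (not taken from the scents dataset)
blend = {"ethyl vanillin": 0.10, "ethanol": 0.90}
new_blend = substitute(blend, MOLAR_MASS, "ethyl vanillin", "methyl vanillate")
print(new_blend)
```

Because methyl vanillate is heavier than ethyl vanillin, keeping the same mole fraction requires a slightly larger mass fraction of the replacement.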

The proposed methodology was then applied to reproduce the six selected scents. For each original odorant, we recommended the odorant with the highest score, excluding the cases where the recommendation was the original odorant itself. Analysis of the suggested odorants shows that the AI-based recommender system suggests perceptually similar odorants to replace the original ones. For instance, for ethyl-vanillin, the recommender suggests methyl-vanillate (CAS Number 3943-74-6), with low and high limits for the molecular fraction very close to those of the original. Ethyl-vanillin smells “like sweet creamy vanilla caramel” and methyl-vanillate like “warm spicy vanilla”. To replace lilyall, which smells like “floral muguet watery green powdery cumin”, 3-[3-(prop-1-en-2-yl)phenyl]butanal, which smells like “ozone cortex green floral melon”, was suggested. To replace benzoin, which smells like “balsamic vanilla medicinal”, the system recommended phenethyl salicylate, which smells like “balsamic, floral, medicinal, rose”.

It is crucial to acknowledge that identifying patterns in odor classification can be quite difficult, as each perfumer's choice of words to describe a scent is influenced by various personal factors. Defining possible concentrations for each suggested odorant is a simpler task, because concentration is an objective quantity used directly in the calculation.

Finally, this work developed and validated a machine-learning-based scent reproduction methodology capable of accurately classifying a scent and reproducing its olfactory experience. Future olfactory researchers can use these results to reproduce scents by finding alternative molecules guided by certain criteria, such as production costs or the origin of the odorants (e.g., natural or not).

[1] “The secret of scent: adventures in perfume and the science of smell,” Choice Reviews Online, doi: 10.5860/choice.44-6247.

[2] C. Bushdid et al., “Humans can discriminate more than 1 trillion olfactory stimuli,” Science, doi: 10.1126/science.1249168.

[3] C. S. Sell, “On the unpredictability of odor,” Angewandte Chemie - International Edition, doi: 10.1002/anie.200600782.

[4] X. Zhang, T. Zhou, and K. M. Ng, “Optimization‐based Cosmetic Formulation: Integration of Mechanistic Model, Surrogate Model, and Heuristics,” AIChE Journal, doi: 10.1002/aic.17064.

[5] M. A. A. Teixeira, “Perfume Performance and Classification: Perfumery Quaternary-Quinary Diagram (PQ2D®) and Perfumery Radar,” Futures, doi: 10.1002/fut.

[6] L. Dormont et al., “Human Skin Volatiles: A Review,” Journal of Chemical Ecology, doi: 10.1007/s10886-013-0286-z.

[7] P. M. Wise et al., “Quantification of odor quality,” Chemical Senses, doi: 10.1093/chemse/25.4.429.

[8] E. D. Gutiérrez et al., “Predicting natural language descriptions of mono-molecular odorants,” Nat Commun, doi: 10.1038/s41467-018-07439-9.

[9] S. S. Schiffman et al., “Measuring odor intensity with e-noses and other sensor types,” 9th International Symposium on Olfaction and Electronic Nose.

[10] A. Keller and L. B. Vosshall, “Olfactory perception of chemically diverse molecules,” BMC Neurosci, doi: 10.1186/s12868-016-0287-2.

[11] B. Sanchez-Lengeling et al., “Machine Learning for Scent: Learning Generalizable Perceptual Representations of Small Molecules,” arxiv.org/abs/1910.10685 (accessed Sep. 19, 2020).