(560gv) Rational Catalyst Design: Kinetics Put into Action for Small Open Data
The development of more sustainable processes as the main source of
carbon-based molecules calls for a breakthrough in catalyst design. To this end,
the chemical space can be explored efficiently for new catalyst formulations by
computational means. Such computational research can be conducted via molecular
modelling or via mathematical models based on experimental data. In the latter
approach, a set of catalysts featuring highly diverse formulations and
properties (e.g. combining metals and supports) is first screened
experimentally, prior to any mathematical model development [1]. Only afterwards
are the data exploited to extract the underlying chemical information, leading
finally to the regression and/or training of a mathematical model. This approach
requires a long period of experimental data acquisition. Alternatively, the
upcoming Open Data might be used.
Up to now, research has mainly focused, on the one hand, on automating data
visualization and statistical inference [2] for Big Data and, on the other
hand, on the development of the models themselves for catalytic data, either
black-box (e.g. machine learning based) or kinetic ones. However, to take full
advantage of the Open Data revolution, new methodologies for scientific data
mining, adapted to the characteristics of catalytic small data, have to be
developed.
In Figure 1, a general approach for data-based rational catalyst design is
described. Step 1 is the extraction of chemical (preferably kinetic) information
from the data. In Step 2, this information, together with the data, serves as
input for a mathematical model. Finally, in Step 3, this mathematical model, via
relationships with catalyst structure and/or properties, provides guidelines for
the synthesis of better catalysts (Step 4). However, suitable methodologies are
lacking both to convert catalytic data into useful kinetic information (as in
Step 1) and to turn that information into knowledge on optimized catalyst
structure (as in Step 3).
Figure 1: Data-based rational catalyst design cycle. Focus of this work: green.
Hence, this work aims at shedding light on the state of the art of Open Access
catalytic data and at developing suitable tools to evaluate the underlying
kinetic information and relate it to catalyst structure or properties by making
use of these small datasets.
Open Access catalytic data
As a starting point, the current storage of data on catalysis was evaluated.
Open Access storage of data has been mainly driven by recent policies from
funding bodies and publishers. Hence, to investigate the current potential, the
focus should be on a reaction widely investigated in the past few years.
Hydrodeoxygenation has received significant attention in recent years, but
catalyst design remains an open challenge [3]. Well-known repositories and
search engines were surveyed with a broad search covering whole reaction
families under hydrodeoxygenation (Table 1). The overall number of available
datasets is modest compared to the number of publications in the same field
(ca. 2600, according to WoS at the same moment), particularly if one excludes
Figshare (mainly a repository of article preprints). In the other cases, not all
search results were relevant (e.g. data generated by molecular modelling
calculations), and data were often unstructured and flawed.
Table 1: Number of Open Access datasets on hydrodeoxygenation (October 2018).
In summary, one cannot presently rely on Open Access data, but more data
sharing, better curation and standardized formats, following the FAIR Data
Principles [4], will turn this into reality in the near future. Furthermore,
literature mining software, already employed in biology, can give access to
virtually all data generated to date. In that sense, the (upcoming) Open Access
data will result from the combination of individual datasets as we know them
today. For a given reaction, this will amount to a significant quantity of data,
yet one that remains far from big data in both volume and balance. Conversely,
the existing data visualization and inference tools, designated under the
umbrella term machine learning, have been designed for truly big data, i.e. huge
amounts of well-balanced data [11]. Hence, tools adapted to the considerably
smaller size of catalytic data must be developed.
A way to automate kinetic information extraction
As of today, the first step, information extraction (Figure 1), relies solely on
the researcher's prior knowledge and experience. The methodology under
development aims at filling this gap. In practice, this means that all kinetic
features (e.g. variations in conversion) must be identified and, preferably,
classified in terms of relevance. This can be achieved via the recognition of
patterns and fingerprints [5], e.g. abrupt variations in the performance
indicators. The first step is to visualize the general trends in the dataset.
Typically, a researcher draws a curve, based on intuition, that represents the
overall trend in the data while acknowledging experimental error. The tool
developed herein is hence meant to mimic how the researcher would draw the
overall trend in a dataset.
The developed algorithm is based on the UnivariateSpline class in the SciPy
module of Python [6]. This class implements an iterative procedure in which the
number of piecewise polynomials (which together constitute a spline) is
increased until the residual sum of squares falls below a defined tolerance
level. The results with fictive data (Figure 2.a and .b) indicated that this
state-of-the-art algorithm could not adequately reproduce simple shapes. Tests
on data featuring different trends also made clear that the optimal tolerance
level depends on the dataset.
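The role of the tolerance level can be illustrated with a minimal sketch. It assumes that the tolerance corresponds to the smoothing factor `s` of `UnivariateSpline`; the synthetic data, the helper name `fit_trend` and the three tolerance values are illustrative assumptions, not the datasets or settings used in this work.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Fictive (synthetic) conversion-like data: a saturating trend plus noise.
# These numbers are illustrative assumptions, not data from this work.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 25)
y = 1.0 - np.exp(-0.4 * x) + rng.normal(scale=0.03, size=x.size)


def fit_trend(x, y, tolerance):
    """Fit a cubic smoothing spline; SciPy adds knots until the residual
    sum of squares drops to (about) the smoothing factor s = tolerance."""
    return UnivariateSpline(x, y, k=3, s=tolerance)


for tolerance in (1.0, 0.1, 0.02):
    spline = fit_trend(x, y, tolerance)
    # get_knots() includes both boundary knots, so pieces = knots - 1.
    n_pieces = len(spline.get_knots()) - 1
    rss = spline.get_residual()  # residual sum of squares of the fit
    print(f"tolerance={tolerance:5.2f}  pieces={n_pieces}  RSS={rss:.4f}")
```

A loose tolerance tends to give a single polynomial, while a tight one lets the spline follow the noise, which is one way to see why a fixed tolerance cannot suit every dataset.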
An algorithm featuring lower tolerance was thus developed. To prevent
overfitting, the tolerance level is decreased until a maximal number of
piecewise polynomials is reached. In some cases, particularly for small
datasets, the piecewise polynomials can still be superfluous. Therefore, the
spline is replaced by a lower-degree polynomial if the generated trends are not
physically realistic (e.g. excessive variability) or if that polynomial yields a
goodness-of-fit sufficiently similar to the higher-degree one. In addition, a
feature classification function in terms of shape and variability was
introduced. The results are shown for a fictive dataset with an S-shape
(Figure 2.c) and two real anisole hydrodeoxygenation datasets by Otyuskaya et
al. [7] (Figure 2.e and .f). For the conversion, the trend could be described by
two shapes, while the selectivity could be described by a linear increase. The
developed algorithm is hence able to follow the intuitive trends in data for a
variable number of points, shape,
Figure 2: Performance of state-of-the-art and developed algorithms.
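The tolerance-decreasing procedure with a fallback to a simpler polynomial could be sketched as follows. This is a hedged illustration under stated assumptions, not the implementation developed in this work: the function name `fit_small_data_trend`, the halving factor, the cap of three pieces, the use of R² as goodness-of-fit measure and the 0.02 margin are all hypothetical choices, and the check for physically unrealistic trends is omitted.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline


def r_squared(y, y_fit):
    """Coefficient of determination (y must not be constant)."""
    ss_res = np.sum((y - y_fit) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_small_data_trend(x, y, max_pieces=3, shrink=0.5,
                         poly_degree=1, r2_margin=0.02):
    """Hypothetical sketch: tighten the tolerance until the spline would
    need more than max_pieces polynomial pieces, then prefer a
    lower-degree polynomial if its R^2 is within r2_margin."""
    # Start from a loose tolerance (total sum of squares): one cubic piece.
    tolerance = float(np.sum((y - np.mean(y)) ** 2))
    best = UnivariateSpline(x, y, k=3, s=tolerance)
    for _ in range(50):  # hard cap on the number of tightening steps
        tolerance *= shrink
        candidate = UnivariateSpline(x, y, k=3, s=tolerance)
        if len(candidate.get_knots()) - 1 > max_pieces:
            break  # too many pieces: keep the previous spline
        best = candidate
    # Lower-degree alternative, compared on goodness-of-fit.
    poly = np.poly1d(np.polyfit(x, y, poly_degree))
    if r_squared(y, best(x)) - r_squared(y, poly(x)) <= r2_margin:
        return "polynomial", poly
    return "spline", best


# Synthetic examples (assumed data): a linear and an S-shaped trend.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 30)
y_linear = 0.1 * x + rng.normal(scale=0.01, size=x.size)
y_sigmoid = 1.0 / (1.0 + np.exp(-2.0 * (x - 5.0))) \
    + rng.normal(scale=0.01, size=x.size)

kind_linear, _ = fit_small_data_trend(x, y_linear)
kind_sigmoid, model_sigmoid = fit_small_data_trend(x, y_sigmoid)
print(kind_linear, kind_sigmoid)
```

Capping the number of pieces rather than fixing the tolerance lets the same settings be reused across datasets of different size and noise level, which is the design motivation described above.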
From descriptors to structure: a case study
In order to establish relationships between kinetically relevant catalyst
features and parameters which can be tuned during the synthesis procedure, data
diverse in catalyst performance are required. Fortunately, in the field of
Oxidative Coupling of Methane (OCM), studies involving more than a few catalysts
have been carried out, paving the way for their use in catalyst design even
before the advent of Open Access data. The most prominent case comprises
forty-four catalysts tested at similar operating conditions [8], i.e. small
catalytic data.
To match experimentally observed performances with simulated ones and
potentially draw significant relationships, microkinetic simulations were
carefully combined with statistical tools. The microkinetic simulations were
carried out using a state-of-the-art model [9], varying the catalyst descriptors
by means of Design of Experiments. Combining the simulated catalysts with those
from the referred dataset resulted in the identification of four clusters of
catalysts with distinct performances (Figure 3). Interestingly, a cluster (blue)
of optimally performing catalysts could be identified, but it contained no
tested catalysts. More importantly, by comparing the descriptors of different
clusters, relationships between the composition and properties of the catalysts
tested by Kondratenko et al. [8] and the simulated catalyst descriptors are being
Figure 3: Comparison of experimental [8] (closed symbols) and simulated (open
symbols) data for OCM catalysts at iso-operating conditions. The color code
distinguishes clusters of catalysts with comparable performance.
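The combination of designed simulations with clustering can be illustrated with a small, self-contained sketch. Everything specific in it is an assumption made for illustration: the two descriptor names, the three-level full factorial design, the `toy_performance` function standing in for the microkinetic model, the three "experimental" points, and the choice of Ward hierarchical clustering (the clustering method used in this work is not stated here).

```python
import itertools

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical catalyst descriptors, each at three levels (full factorial).
levels = {
    "descriptor_a": [0.0, 0.5, 1.0],
    "descriptor_b": [0.0, 0.5, 1.0],
}
design = np.array(list(itertools.product(*levels.values())))


def toy_performance(descriptors):
    """Placeholder for a microkinetic simulation (NOT a real model):
    maps descriptors to (conversion, selectivity)."""
    a, b = descriptors
    conversion = 0.1 + 0.4 * a
    selectivity = 0.3 + 0.5 * b * (1.0 - 0.5 * a)
    return conversion, selectivity


simulated = np.array([toy_performance(d) for d in design])
# Synthetic stand-ins for experimentally tested catalysts.
experimental = np.array([[0.15, 0.35], [0.20, 0.40], [0.45, 0.50]])

# Cluster simulated and experimental performances together.
performance = np.vstack([simulated, experimental])
is_simulated = np.array([True] * len(simulated) + [False] * len(experimental))
tree = linkage(performance, method="ward")
labels = fcluster(tree, t=4, criterion="maxclust")  # at most four clusters

for cluster in np.unique(labels):
    members = labels == cluster
    n_sim = int(np.sum(members & is_simulated))
    n_exp = int(np.sum(members & ~is_simulated))
    print(f"cluster {cluster}: {n_sim} simulated, {n_exp} experimental")
```

A cluster that contains simulated points only would correspond to a performance region that none of the tested catalysts has reached.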
The lack of curated and standardized Open Access data on catalysis precludes its
use at present, but the ongoing policies will overcome this obstacle. This will
generate small catalytic data, for which machine learning techniques are not
adapted. To make efficient use of such data, a tool for automated kinetic
information extraction is under development which can, as of today, recognize
the relevant patterns in small data, mimicking the intuition of a researcher.
Finally, a methodology able to extract knowledge for catalyst design from
typical catalytic data has also been
granted by Ghent University BOF (BOF18/PDO/093)
and EU commission (ERC Grant No. 615456).
1. Van der Borght, K. et al., Catalysts 2015, 5, 1948.
2. Chiang, L. et al., Annu. Rev. Chem. Biomol. Eng. 2017, 8, 63-85.
3. Chen, S. et al., Renewable and Sustainable Energy Rev. 2019, 101, 568-589.
4. Wilkinson, M. D. et al., Scientific Data 2016, 3, 160018.
5. Caruthers, J. M. et al., J. Catal. 2003, 216, 98-109.
6. Bellussi, G. et al., Catal. Sci. Technol. 2013, 3, 833-857.
7. Otyuskaya, D. et al., Energy Fuels 2017, 31, 7082-7092.
8. Kondratenko, E. V. et al., Catal. Sci. Technol. 2015, 5, 1668-1677.
9. Pirro, L. et al., Ind. Eng. Chem. Res. 2018, 57, 16295-16307.