(346aq) Topological Data Analysis: Applications to Soft Matter and Molecular Simulations

Authors: 
Smith, A., University of Wisconsin-Madison
Chew, A. K., University of Wisconsin
Van Lehn, R. C., University of Wisconsin-Madison
Abbott, N. L., Cornell University
Zavala, V. M., University of Wisconsin-Madison
Data generated by experiments and complex simulations (e.g., molecular dynamics simulations) are often summarized using descriptive statistics (e.g., averages, moments, and correlation functions) in order to reduce complexity and to facilitate analysis [1]. Unfortunately, descriptive statistics might fail to capture important aspects of complex datasets. Specifically, statistical techniques might fail to capture key geometrical structures (e.g., complex heterogeneous domains). Interesting examples that illustrate these limitations are Anscombe's quartet and the Datasaurus Dozen datasets [2,3]. The datasets within each collection are visually distinct (they define different geometrical spaces) but have nearly identical descriptive statistics (mean, standard deviation, and correlation).
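As an illustration of this limitation, the following minimal sketch computes the summary statistics of the four datasets in Anscombe's quartet [2]. It uses only the Python standard library; the helper names (`pearson`, `summaries`) are illustrative and not part of the original work.

```python
import statistics

# Anscombe's quartet [2]: sets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Per-dataset summary: (mean of y, sample std. dev. of y, correlation of x and y).
summaries = {
    name: (statistics.mean(y), statistics.stdev(y), pearson(x, y))
    for name, (x, y) in quartet.items()
}

for name, (mean_y, std_y, r) in summaries.items():
    print(f"{name:>3}: mean(y)={mean_y:.2f}  std(y)={std_y:.2f}  r={r:.3f}")
```

The printed summaries agree to about two decimal places across the four sets, even though a scatter plot of each set looks entirely different (a line with noise, a parabola, a line with one outlier, and a vertical cluster with one outlier).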

Recent advances in applied topology and geometry have led to the development of a field known as Topological Data Analysis (TDA) [4,5,6]. TDA is a framework that views complex data through the lenses of geometry and topology. A particular method of TDA, known as persistent homology, represents datasets (e.g., point clouds and images) as geometric objects and performs dimensionality reduction by projecting the data onto a low-dimensional space composed of elementary geometric objects (topological features) that persist at different scales [7,8]. The features are quantifiable and stable to basic deformations (e.g., stretching, rotation, bending) and can be used to perform different tasks (e.g., classification, regression) [9,10]. TDA methods have been applied successfully in materials science [11,12], time series and signal analysis [13,14], and bio-sciences [15,16].
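To make the notion of features that persist across scales concrete, the sketch below computes the simplest case: zero-dimensional persistence (connected components) of a small point cloud. It is not the implementation used in the studies cited here. It assumes a Vietoris-Rips-style filtration in which two points become connected once the scale reaches their Euclidean distance (some conventions use half that distance), and the function name `h0_persistence` and the example points are hypothetical.

```python
import math


def h0_persistence(points):
    """Return the scales at which connected components merge (H0 death times).

    Every point is born as its own component at scale 0. Edges are added in
    order of increasing length; each edge that joins two distinct components
    kills one of them at that length. One component never dies.
    """
    n = len(points)
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        # Follow parent links to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # merge the two components
            deaths.append(length)
    return deaths


# Two well-separated clusters: three points near the origin, two near x = 10.
points = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0)]
deaths = h0_persistence(points)
print(deaths)
```

In this example the short-lived components (merging at scale 1) reflect local point spacing, while the single long-lived component (merging at scale 9) reflects the presence of two clusters. Higher-dimensional features such as loops and voids require a full persistent homology computation, which this sketch does not attempt.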

This talk focuses on the application of TDA to complex datasets arising in soft matter and molecular dynamics (MD) simulations. We use TDA to characterize topological features that develop in liquid crystal films when exposed to air contaminants [17]. We also show how TDA can be used to characterize topological features of scatter fields for flow cytometry of emulsions containing liquid crystal droplets [18]. Finally, we show that TDA can be used to characterize the geometry of three-dimensional liquid-phase environments generated by MD [19]. For all of these studies, we show that the topological features are strongly correlated with emerging properties of interest (e.g., concentration of air contaminant or reactivity of a molecule in a solvent environment).

[1] Rahul R Shah and Nicholas L Abbott. Principles for measurement of chemical exposure based on recognition-driven anchoring transitions in liquid crystals. Science, 293(5533):1296–1299, 2001.

[2] Francis J Anscombe. Graphs in statistical analysis. The American Statistician, 27(1):17–21, 1973.

[3] Justin Matejka and George Fitzmaurice. Same stats, different graphs: generating datasets with varied appearance and identical statistics through simulated annealing. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, pages 1290–1294, 2017.

[4] Gunnar Carlsson. Topology and data. Bulletin of the American Mathematical Society, 46(2):255–308, 2009.

[5] Herbert Edelsbrunner and John Harer. Computational topology: an introduction. American Mathematical Soc., 2010.

[6] Afra Zomorodian. Topological data analysis. Advances in applied and computational topology, 70:1–39, 2012.

[7] Robert Ghrist. Barcodes: the persistent topology of data. Bulletin of the American Mathematical Society, 45(1):61–75, 2008.

[8] Gunnar Carlsson, Afra Zomorodian, Anne Collins, and Leonidas J Guibas. Persistence barcodes for shapes. International Journal of Shape Modeling, 11(02):149–187, 2005.

[9] Frédéric Chazal, Vin De Silva, Marc Glisse, and Steve Oudot. The structure and stability of persistence modules. arXiv preprint arXiv:1207.3674, 21, 2012.

[10] David Cohen-Steiner, Herbert Edelsbrunner, and John Harer. Stability of persistence diagrams. Discrete & Computational Geometry, 37(1):103–120, 2007.

[11] Takashi Ichinomiya, Ippei Obayashi, and Yasuaki Hiraoka. Persistent homology analysis of craze formation. Physical Review E, 95(1):012504, 2017.

[12] Mickaël Buchet, Yasuaki Hiraoka, and Ippei Obayashi. Persistent homology and materials informatics. In Nanoinformatics, pages 75–95. Springer, Singapore, 2018.

[13] Jose A Perea and John Harer. Sliding windows and persistence: An application of topological methods to signal analysis. Foundations of Computational Mathematics, 15(3):799–838, 2015.

[14] Bernadette J Stolz, Heather A Harrington, and Mason A Porter. Persistent homology of time-dependent functional networks constructed from coupled time series. Chaos: An Interdisciplinary Journal of Nonlinear Science, 27(4):047410, 2017.

[15] Peter M Kasson, Afra Zomorodian, Sanghyun Park, Nina Singhal, Leonidas J Guibas, and Vijay S Pande. Persistent voids: a new structural metric for membrane fusion. Bioinformatics, 23(14):1753–1759, 2007.

[16] Hyekyoung Lee, Hyejin Kang, Moo K Chung, Bung-Nyun Kim, and Dong Soo Lee. Persistent brain network homology from the perspective of dendrogram. IEEE Transactions on Medical Imaging, 31(12):2267–2277, 2012.

[17] Alexander Smith, Nicholas L Abbott, and Victor M Zavala. Convolutional Network Analysis of Optical Micrographs. doi.org/10.26434/chemrxiv.11688924.v2.

[18] Shengli Jiang, JungHyun Noh, Alexander D Smith, Chulsoon Park, Nicholas L Abbott, and Victor M Zavala. Identification of Endotoxins from Bacterial Species using Liquid Crystal Droplets and Machine Learning. Under Review, 2020.

[19] Theodore W Walker, Alex K Chew, Huixiang Li, Benginur Demir, Z Conrad Zhang, George W Huber, Reid C Van Lehn, and James A Dumesic. Universal kinetic solvent effects in acid-catalyzed reactions of biomass-derived oxygenates. Energy & Environmental Science, 11(3):617–628, 2018.