(109b) Topological Data Analysis: Concepts, Computation, and Applications | AIChE

Authors 

Smith, A. - Presenter, University of Wisconsin-Madison
Zavala, V. M., University of Wisconsin-Madison

Statistics, machine learning, and signal processing are the dominant paradigms used to analyze data; unfortunately, such techniques provide limited capabilities for analyzing certain types of datasets. Two well-known examples that illustrate this limitation are Anscombe's quartet [1] and the Datasaurus Dozen [2]. The datasets in each collection are visually distinct (each defines a different geometric object), yet they share essentially the same descriptive statistics (e.g., mean, standard deviation, and correlation).
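
The limitation is easy to check numerically. The following is a minimal sketch, using only the Python standard library, that computes the descriptive statistics of the four datasets in Anscombe's quartet [1]. The helper name `summary` and the dictionary layout are illustrative choices made here and are not part of the original work.

```python
from math import sqrt

# Anscombe's quartet: datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summary(x, y):
    """Return means, sample standard deviations, and Pearson correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "std_x": sqrt(sxx / (n - 1)),
        "std_y": sqrt(syy / (n - 1)),
        "corr": sxy / sqrt(sxx * syy),
    }


STATS = {name: summary(x, y) for name, (x, y) in QUARTET.items()}

if __name__ == "__main__":
    for name, stats in STATS.items():
        print(name, {key: round(value, 2) for key, value in stats.items()})
```

The four rows printed by this sketch agree to about two decimal places, even though scatter plots of the four datasets look entirely different.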

The recent application of algebraic and computational topology to data science has led to the development of a new field known as Topological Data Analysis (TDA) [3]. TDA techniques are based on the observation that data can be interpreted as elements of a geometric object; as the name suggests, TDA uses techniques from computational topology to quantify the geometry of data [4]. Fundamentally, topology studies geometric and spatial relations that persist (remain stable) under continuous deformations of an object (e.g., stretching, twisting, and bending). This perspective provides multiple advantages over other techniques [3, 5]:

• Topology studies the geometry of the data in a manner that is independent of the chosen coordinates.

• Topology studies the geometry of the data in a way that minimizes sensitivity to the metric chosen.

• Topology generalizes well to high-dimensional spaces.

The main focus of this talk is a TDA technique known as persistent homology [6, 7]. The goal of persistent homology is to extract the dominant topological features of the data, such as connected components, loops (holes), and voids. This feature information can be quantified and leveraged by statistical and machine learning techniques to perform regression, classification, hypothesis testing, and clustering tasks [8, 9, 10, 11, 12, 13].
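
To make the idea concrete, the sketch below computes the simplest case of persistent homology for a point cloud: the 0-dimensional features (connected components) of a Vietoris-Rips filtration. It is an illustration written for this summary under stated assumptions, not the implementation used in the talk. It assumes Euclidean distance, takes the filtration scale to be the pairwise distance at which two points become connected (some libraries use half that value), and uses a hypothetical function name `h0_persistence`. Loops and voids require boundary-matrix reduction, which dedicated TDA libraries provide.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return (birth, death) bars for connected components of a point cloud.

    Every point is born as its own component at scale 0. As the scale grows,
    edges are added in order of length; each edge that joins two distinct
    components kills one of them at that scale. One component never dies.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest over the points

    def find(i):
        # Find the root of i, shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    bars = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j      # merge the two components
            bars.append((0.0, length))   # one component dies at this scale
    bars.append((0.0, math.inf))         # the last surviving component
    return bars


if __name__ == "__main__":
    # Two tight pairs of points that are far apart from each other.
    cloud = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    for bar in h0_persistence(cloud):
        print(bar)
```

In the small example, two bars end at scale 1 (points merging within each pair), one bar ends at scale 10 (the two pairs merging), and one bar is infinite. The long bar is the kind of dominant feature persistent homology is designed to surface: it indicates two well-separated clusters.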

TDA can be seen as a dimensionality reduction technique that maps data from its original high-dimensional space to a low-dimensional space that is easier to understand and visualize. This is similar in spirit to principal component analysis (PCA), which projects the data into a low-dimensional space by extracting latent variables (principal components) that contain information in terms of variance. In TDA, the latent variables are homologies that contain information in terms of topological features.

In this talk, we review relevant concepts and computational methods of TDA from the perspective of chemical engineering applications. We show how to apply persistent homology to analyze datasets described by point clouds and functions in high dimensions, and we discuss fundamental stability results of topological features in the face of perturbations. We present multiple case studies with complex synthetic and experimental datasets to demonstrate the concepts and advantages of TDA. Specifically, we show that TDA extracts informative features from complex datasets that correlate strongly with emergent features of practical interest. For instance, we demonstrate the application of these techniques in the analysis of time series and state space geometry, and in the geometric analysis of 2-dimensional diffusion scalar fields. Our work seeks to open new research directions and applications of TDA in chemical engineering.
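
For data described by a function rather than a point cloud, persistent homology is commonly computed over sublevel sets of the function. The sketch below shows this for the simplest setting, a function sampled on a 1-dimensional grid, and tracks only connected components. It is an assumption-laden illustration rather than the method used in the case studies: the function name `sublevel_persistence_1d` is hypothetical, only grid neighbors are considered adjacent, and the 2-dimensional scalar fields mentioned above would need a 2-dimensional neighborhood and higher-dimensional features.

```python
import math


def sublevel_persistence_1d(values):
    """Return sorted (birth, death) pairs for sublevel-set components.

    Samples are added in order of increasing value. A local minimum starts
    a new component. When two components meet, the one with the higher
    birth value dies (the elder rule). Zero-length pairs are omitted.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    parent = [None] * n   # None marks samples not yet in the sublevel set
    birth = {}            # birth value of each component, keyed by root

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    for i in order:
        parent[i] = i
        birth[i] = values[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                root_i, root_j = find(i), find(j)
                if root_i == root_j:
                    continue
                # The younger component (larger birth value) dies here.
                if birth[root_i] > birth[root_j]:
                    young, old = root_i, root_j
                else:
                    young, old = root_j, root_i
                if values[i] > birth[young]:
                    pairs.append((birth[young], values[i]))
                parent[young] = old
    # The component containing the global minimum never dies.
    pairs.append((birth[find(order[0])], math.inf))
    return sorted(pairs)


if __name__ == "__main__":
    signal = [0.0, 2.0, 1.0, 3.0, 0.5]
    print(sublevel_persistence_1d(signal))
```

For the sample signal, the local minimum at 1.0 merges into an older component at the peak of height 2.0, and the minimum at 0.5 merges at the peak of height 3.0. A small perturbation of the signal moves these births and deaths by at most the size of the perturbation, which is the intuition behind the stability results mentioned above.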

[1] Francis J Anscombe. Graphs in statistical analysis. The American Statistician, 27(1):17–21, 1973.

[2] Justin Matejka and George Fitzmaurice. Same stats, different graphs: generating datasets with varied appearance and identical statistics through simulated annealing. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, pages 1290–1294, 2017.

[3] Gunnar Carlsson. Topology and data. Bulletin of the American Mathematical Society, 46(2):255–308, 2009.

[4] Herbert Edelsbrunner and John Harer. Computational topology: an introduction. American Mathematical Society, 2010.

[5] Afra Zomorodian. Topological data analysis. Advances in Applied and Computational Topology, 70:1–39, 2012.

[6] Robert Ghrist. Barcodes: the persistent topology of data. Bulletin of the American Mathematical Society, 45(1):61–75, 2008.

[7] Gunnar Carlsson, Afra Zomorodian, Anne Collins, and Leonidas J Guibas. Persistence barcodes for shapes. International Journal of Shape Modeling, 11(02):149–187, 2005.

[8] Peter Bubenik. Statistical topology using persistence landscapes. arXiv preprint arXiv:1207.6437, 2012.

[9] Peter Bubenik, Gunnar Carlsson, Peter T Kim, and Zhi-Ming Luo. Statistical topology via Morse theory persistence and nonparametric estimation. Algebraic Methods in Statistics and Probability II, 516:75–92, 2010.

[10] Peter Bubenik and Paweł Dłotko. A persistence landscapes toolbox for topological statistics. Journal of Symbolic Computation, 78:91–114, 2017.

[11] Andrew J Blumberg, Itamar Gal, Michael A Mandell, and Matthew Pancia. Robust statistics, hypothesis testing, and confidence intervals for persistent homology on metric measure spaces. Foundations of Computational Mathematics, 14(4):745–789, 2014.

[12] Henry Adams, Tegan Emerson, Michael Kirby, Rachel Neville, Chris Peterson, Patrick Shipman, Sofya Chepushtanova, Eric Hanson, Francis Motta, and Lori Ziegelmeier. Persistence images: A stable vector representation of persistent homology. The Journal of Machine Learning Research, 18(1):218–252, 2017.

[13] Jan Reininghaus, Stefan Huber, Ulrich Bauer, and Roland Kwitt. A stable multi-scale kernel for topological machine learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4741–4748, 2015.