(190h) Linking Phenotypes to Genotypes by Population-Based Mathematical Algorithms and Clustering Methods

San Miguel, A., Georgia Institute of Technology
Shen, K., Stanford University
Lu, H., Georgia Institute of Technology

Since the advent of fluorescent biomolecules for studying gene expression, protein localization, and protein function, biological studies have relied heavily on microscopy. Furthermore, current high-throughput techniques enable the acquisition of large data sets in which the relevant information is hidden and difficult to analyze. One way to tackle such data sets is to use mathematical and statistical methods to extract the relevant information, particularly in problems where phenotypes are subtle and the characteristic features are not known a priori.

Here, we use logistic regression models and clustering algorithms to analyze large data sets of phenotypic profiles of synaptic patterning in C. elegans. We recently developed high-throughput technologies that enable quantitative characterization of subcellular fluorescent reporters of synapses in live C. elegans. By combining microfluidics, computer vision, and automation, we performed automated image acquisition and genetic screens in which animals with interesting phenotypes are isolated. In particular, we applied this system to find mutants with very subtle differences in synaptic patterning. The phenotypic differences of the novel mutants range from a reduction in the number of micron-sized synaptic puncta to subtle changes in puncta density. However, without prior knowledge of the phenotypic differences, establishing whether these are true mutants, and in which features the differences reside, is exceedingly difficult when dealing with a multidimensional feature space. Moreover, because of stochastic gene expression and environmental noise, even isogenic populations of animals are spread across the phenotypic landscape, which further complicates the identification of true subtle mutants. This is especially true when the features of interest are micron-sized synaptic sites. Our automated genetic screens have yielded several putative mutants. To determine whether these are true mutants of interest, and to predict which genetic pathways could be altered in them, we apply two separate mathematical approaches that analyze phenotype-to-genotype relationships from the perspective of whole-population phenotypic profiles. With these statistical and mathematical analyses, we can discern whether population-level differences are significant and thus discriminate true subtle mutant phenotypes from false positives.
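
The abstract does not state exactly how the logistic regression models are used to judge significance. A minimal sketch, assuming one common choice: train a logistic regression classifier to separate wild-type from candidate animals using per-animal feature vectors, then compare its accuracy against a label-permutation null distribution. The feature names (puncta count, puncta density), population sizes, and simulated values are hypothetical, and the helper names (`fit_logistic`, `population_test`) are illustrative, not the authors' code.

```python
import math
import random


def standardize(rows):
    """Z-score each feature column so gradient descent is well conditioned."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
           for j in range(d)]
    return [[(r[j] - means[j]) / sds[j] for j in range(d)] for r in rows]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, iters=100):
    """Fit weights and intercept by full-batch gradient descent on mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def accuracy(X, y, w, b):
    """Fraction of animals whose predicted label (score > 0) matches y."""
    hits = sum((b + sum(wj * xj for wj, xj in zip(w, xi)) > 0) == (yi == 1)
               for xi, yi in zip(X, y))
    return hits / len(y)


def population_test(wild_type, candidate, n_perm=99, seed=0):
    """Permutation test: can a logistic model separate candidate from wild type
    better than it separates randomly relabeled animals?"""
    X = standardize(wild_type + candidate)
    y = [0] * len(wild_type) + [1] * len(candidate)
    observed = accuracy(X, y, *fit_logistic(X, y))
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)
        if accuracy(X, y_perm, *fit_logistic(X, y_perm)) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Hypothetical per-animal features: [puncta count, puncta density per micron]
rng = random.Random(42)
wild_type = [[rng.gauss(20, 3), rng.gauss(1.0, 0.15)] for _ in range(40)]
mutant = [[rng.gauss(15, 3), rng.gauss(1.0, 0.15)] for _ in range(40)]
acc, p = population_test(wild_type, mutant)
print(f"training accuracy {acc:.2f}, permutation p = {p:.2f}")
```

Training accuracy works as the test statistic here because every permutation refits the same model on shuffled labels, so overfitting inflates the null distribution as much as the observed value. With many more features than in this toy example, a cross-validated accuracy or a regularized model would be a more conservative choice.
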
In addition, we apply custom clustering techniques to determine whether the true mutants thus identified correlate with known genetic pathways.
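
The clustering techniques are described only as custom. As a stand-in, here is a minimal sketch of average-linkage agglomerative (hierarchical) clustering on per-strain mean feature profiles, assuming each strain has already been summarized as a vector of feature shifts from wild type in standard-deviation units. The strain names (`pathA-1`, `screen-1`, and so on), the three features, and all profile values are hypothetical placeholders. A candidate that co-clusters with mutants of a known pathway would suggest, but not prove, that the pathway is altered in it.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, profiles):
    """Mean pairwise distance between the members of two clusters."""
    dists = [euclidean(profiles[a], profiles[b]) for a in c1 for b in c2]
    return sum(dists) / len(dists)


def agglomerate(profiles, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]], profiles),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


def co_clustered(profiles, query, n_clusters):
    """Names of the strains that end up in the same cluster as `query`."""
    for cluster in agglomerate(profiles, n_clusters):
        if query in cluster:
            return sorted(name for name in cluster if name != query)
    raise KeyError(query)


# Hypothetical mean profiles: shift from wild type (in wild-type SD units) for
# [puncta count, puncta density, puncta size]; names and values are placeholders.
profiles = {
    "pathA-1": [-1.5, 0.1, 0.0],
    "pathA-2": [-1.3, -0.1, 0.2],
    "pathB-1": [0.1, 1.2, -0.8],
    "pathB-2": [-0.2, 1.0, -1.0],
    "screen-1": [-1.45, 0.05, 0.05],
}
print(co_clustered(profiles, "screen-1", n_clusters=2))  # ['pathA-1', 'pathA-2']
```

Average linkage is only one possible criterion; for larger strain collections, SciPy's `scipy.cluster.hierarchy.linkage` with `method='average'` computes the same kind of hierarchy far more efficiently.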

This method identifies genotypes of interest by comparing whole-population phenotypic profiles and allows prediction of putatively altered genetic pathways.