(633d) Developing a Machine Learning Model for Fitness Prediction in Multiplex Knockout S. cerevisiae Mutants | AIChE

Authors 

Zhao, H., University of Illinois-Urbana
Metabolic engineering of yeasts allows modification of industrially relevant phenotypes, including robustness phenotypes such as improved growth rates and resistance to inhibitors in feedstock broth. Traditional in silico design tools that rely on mechanistic representations of the cell have shown some success, but they typically represent only a fraction of the total genome: the most up-to-date genome-scale model for Saccharomyces cerevisiae covers ~20% of the genome. The most common approach for engineering robustness phenotypes is adaptive laboratory evolution (ALE), in which successive passages of mutating cells are selected for improved growth phenotypes. While ALE has proven reliable for engineering growth phenotypes, it requires lengthy experimentation that can last months, followed by whole-genome next-generation sequencing (NGS) of checkpoint mutants to identify mutations, which must then be reintroduced into the parent strain to deduce which are causal. Additionally, ALE is restricted to evolving phenotypes that depend on growth, and it can only take advantage of single nucleotide polymorphisms and small insertions or deletions. Recent work in machine learning for biosystems design has demonstrated that a data-driven approach can improve predictions for engineering desirable phenotypes. We aimed to construct a genome-scale machine learning model that accounts for nearly the entire genome (~5,845 genes in Saccharomyces cerevisiae) and can predict targets for improved fitness without ALE experimentation or NGS analysis. To do so, we developed a multi-relational graph neural network (GNN) to predict high-fitness Saccharomyces cerevisiae multiplexed knockout mutants, along with an associated experimental pipeline for constructing and profiling mutants.
Specifically, we have combined gene-gene interactions, protein-protein interactions, the regulome, gene ontology, and mined gene features in a gene-graph to predict growth phenotypes for yeast with n-order gene deletions. The GNN models genes as nodes and their corresponding interactions as edges, and it has been generalized to construct representations at the node, edge, and global levels of the gene-graph. Growth prediction corresponds to a global prediction on the gene-graph. The machine learning problem is formulated as a ranking problem in which the GNN provides a rank-ordered list of gene deletions for high-fitness mutants. When compared against the known top 200 fitness mutants, preliminary results achieved a break-even precision of 0.59. Highly ranked mutants from the GNN model are then profiled with a liquid growth assay and compared against colony size measurements from synthetic genetic array data. In addition to global fitness prediction, the node representations from the GNN are used to suggest gene expression modifications, unachievable by ALE mechanisms, that confer tolerance against furfural, a common chemical inhibitor found in lignocellulosic hydrolysates. We envision this GNN serving as an intermediate design tool that can warn against genetic designs with high-order synthetic lethality when constructing multiplexed knockout mutants and can suggest tuned gene expression to maintain mutant fitness under environmental stress. Importantly, the GNN feature representation is generalized so that it can transfer to other systems biology and metabolic engineering prediction tasks that are unrelated to growth phenotypes. These prediction tasks could include, but are not limited to, gene function classification for undercharacterized genes, prediction of novel regulatory interactions, and other global cell states such as changes in cell morphology resulting from gene knockouts.
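
The abstract does not state how the break-even precision was computed. A minimal sketch, assuming the standard definition (precision at the cutoff where precision equals recall, which for a ranked list is the cutoff k equal to the number of known positives), is given below. The function name `break_even_precision` and the mutant labels are hypothetical and serve only as illustration. Under this reading, a value of 0.59 against 200 known mutants would correspond to 118 of the 200 top-ranked predictions being among the known top 200.

```python
def break_even_precision(ranked_mutants, known_top):
    """Precision at cutoff k = number of known positives.

    At this cutoff precision (hits / k) equals recall (hits / number of
    positives), which is the usual meaning of the break-even point.
    Assumes ranked_mutants has no duplicates and at least k entries.
    """
    known = set(known_top)
    if not known:
        raise ValueError("known_top must not be empty")
    k = len(known)
    hits = sum(1 for mutant in ranked_mutants[:k] if mutant in known)
    return hits / k


# Toy data: each mutant is a tuple of deleted genes (hypothetical labels).
ranked = [
    ("geneA", "geneB"),  # rank 1, model's best predicted fitness
    ("geneC",),
    ("geneD", "geneE"),
    ("geneF",),
    ("geneG",),
]
known_top = [("geneA", "geneB"), ("geneD", "geneE"), ("geneG",)]

score = break_even_precision(ranked, known_top)
print(round(score, 3))  # 2 of the top 3 predictions are known positives
```

Representing each mutant as a tuple of deleted genes keeps the labels hashable, so the same function applies to deletions of any order as long as the gene order within each tuple is canonical (for example, sorted).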