(3ht) Protein Structure Prediction Using Equivariant Convoluted Networks with Applications in Drug Design and Next Generation Biomaterials | AIChE


Chowdhury, R. - Presenter, Harvard Medical School
Research Interests

Over the past six decades, researchers have determined and reported the three-dimensional geometries of proteins using experimental techniques such as cryo-electron microscopy, nuclear magnetic resonance, and X-ray crystallography. However, each method depends on considerable trial and error, which leads to both time and monetary overhead (often thousands of dollars per protein structure). This is why biologists are turning to AI-based methods as a substitute for this cumbersome process. A sophisticated protocol will accelerate research in drug design and in the discovery of novel biocatalytic scaffolds and other biomaterials. To this end, there exist three state-of-the-art tools (as of March 2020): AlphaFold (Google DeepMind), trRosetta (David Baker Lab), and RGN (from our lab, by Mohammed AlQuraishi). These use convolutional and/or recurrent geometric neural networks to predict the most probable dihedrals from a sequence, which are then used to build a protein structure.
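For readers unfamiliar with the quantity these tools predict, a dihedral (torsion) angle is defined by four consecutive atoms. The following is a minimal sketch using only the Python standard library; the function name `dihedral_deg` and the coordinates in the usage note are illustrative assumptions, not code from AlphaFold, trRosetta, or RGN.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond.

    For the peptide-bond angle omega, the four atoms would be
    CA(i), C(i), N(i+1), CA(i+1).
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    s0 = _dot(b0, b1)
    s2 = _dot(b2, b1)
    v = (b0[0] - s0 * b1[0], b0[1] - s0 * b1[1], b0[2] - s0 * b1[2])
    w = (b2[0] - s2 * b1[0], b2[1] - s2 * b1[1], b2[2] - s2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

With hypothetical planar coordinates, `dihedral_deg((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))` places the outer atoms on the same side of the central bond (a cis-like arrangement), while moving the last point to `(-1, 1, 0)` gives the trans-like arrangement.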

However, none of these methods is able to resolve (a) loop structures and (b) cis/trans orientations of amino acids (because it is difficult to learn from the bimodal distribution of ω (omega), which is 0° for cis and 180° for trans amino acids). We propose a novel method that will learn rotations and torsions, in addition to inter-residue distances and dihedrals, to predict a distogram that encodes information not only between residues i and i+1 but also for all N-choose-2, that is N(N−1)/2, residue pairs, using an equivariant neural network (see Figure 1). Subsequently, the best-fit Cα trace that meets the distance, dihedral, rotational, and torsional constraints would be obtained. Finally, built-in functions of PyRosetta would be used to build a PDB structure of the protein, with appropriate rotamer repacking, to obtain the lowest-energy structure.
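To make the "all residue pairs" idea concrete, the sketch below bins the distance between every unordered pair of residues into discrete distance classes, which is the basic shape of a distogram target. It is a toy illustration, not the proposed network: the function name `distogram_bins`, the bin edges, and the idealized 3.8 Å spacing in the usage note are assumptions chosen for the example.

```python
import bisect
import math
from itertools import combinations

def distogram_bins(coords, bin_edges):
    """Map each unordered residue pair (i, j), i < j, to a distance-bin index.

    coords    : list of (x, y, z) positions, one per residue (e.g. C-alpha atoms)
    bin_edges : sorted list of upper bin edges in the same units as coords;
                an index equal to len(bin_edges) means "beyond the last edge"
    """
    bins = {}
    for (i, a), (j, b) in combinations(enumerate(coords), 2):
        d = math.dist(a, b)  # Euclidean distance (Python 3.8+)
        bins[(i, j)] = bisect.bisect_right(bin_edges, d)
    return bins

# Hypothetical straight chain of four residues spaced 3.8 angstroms apart.
chain = [(3.8 * k, 0.0, 0.0) for k in range(4)]
pair_bins = distogram_bins(chain, bin_edges=[4.0, 8.0, 12.0])
```

For a chain of N residues the returned dictionary has N(N−1)/2 entries, so the sequential pairs (i, i+1) are only a small fraction of the information encoded.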

My primary focus would be to devise efficient computational protein design tools that will first treat sequence as language A and structure as language B and then discern rules that govern sequence-dependent protein folding, function, and stability. I will then use these principles to design efficient biophysical channels for aqueous separation of solutes or DNA sequencing, ion-channels and kinases as drug targets for various diseases, novel enzymatic scaffolds for production of industrially relevant chemicals, and biomimetic scaffolds for chimeric antigen receptor T-cell therapy (CAR-T).

Teaching Statement

Engineering curricula equip students with technical know-how on the conventional aspects of their particular engineering stream. However, the emergence of interdisciplinary approaches to modern industrial and academic research goals has quietly pushed students toward online learning platforms (such as Coursera for coding skills). Chemical engineering, specifically, falls under this paradigm, as chemical engineers are hired to work in sectors ranging from biopharmaceuticals, systems optimization, and process engineering to the fabrication of biomaterials or computer chips and the production of paints and cosmetics. Thus, it is imperative to strike a balance between fundamentals and application-oriented courses in engineering, which would benefit students in their professional careers. To this end, I propose incorporating courses that are designed to: (a) include discussion, group projects, and lectures that introduce emerging technologies in industry and academia alike (for example, an introduction to machine learning in a process optimization course), (b) encourage lateral thinking on several topics, (c) enable discourse between diverse students by cross-listing the courses across engineering and the sciences, and (d) aid students’ holistic growth by encouraging lifelong learning.
Discussion about emerging technologies/contemporary issues: When I was teaching ‘Design of Chemical Plants’ (course ID: CHE470) at Penn State, I found that most students, while doing the cost analysis of a process plant they had designed, had little idea whether the operating costs of various units (such as scrubbers and pumps) were economically feasible. I felt a pressing need for at least one introductory lecture in this course that would inform students about modern industrial practices in process plant layout and about the overall operating costs of some leading companies (from publicly reported datasheets). This is aligned with ABET learning outcome ‘j’, which focuses on knowledge of contemporary issues, and it adds the dimension of making students aware of industrial standards and practices. I will make a concerted effort to survey industrial (along with academic) practices and company figures, put together lecture material on them, and hold discussions on contemporary technologies and standards throughout the duration of the courses. Finally, I will draft course projects that include comparative analyses of technologies of contemporary interest.
Lateral and critical thinking: Decision making is a crucial aspect of any engineering role, be it industrial process optimization or an academic research project. Based on my experience as a Teaching Assistant for the ‘Design of Chemical Plants’ course, a collaborative, knowledge-sharing environment promotes better decision making and a holistic understanding of the problem to be solved and the tools at hand. This active learning practice facilitates the critical thinking needed to solve design problems and also meets two ABET learning outcomes, ‘c’ and ‘k’ (namely, the ability to design systems and the use of modern tools for engineering practice).

Prospective Courses to Teach: Since both my undergraduate and Ph.D. majors were in Chemical Engineering, I am comfortable teaching a broad range of core chemical engineering courses and special courses pertaining to mathematical modeling, optimization, biophysics, graph theory, molecular mechanics, protein engineering, and metabolic engineering. Among the core undergraduate chemical engineering courses, I am particularly confident teaching chemical kinetics, numerical methods, and heat and mass transfer. Additionally, I would be comfortable developing course materials at the interface of optimization and protein engineering, covering optimization in biological networks, protein engineering, machine learning, statistical analysis, and systems biology. I believe the proposed courses will not only complement the standard chemical engineering courses offered but also show students the role of optimization in the modern pharmaceutical industry. The proposed courses are enumerated below.

Optimization in biological networks will cover the history and theory of resource allocation, algorithm formulation, linear algebraic nomenclature, implementation scripts (GAMS or Python 3.6), an introduction to optimization solvers (CPLEX, Gurobi), and the application of these frameworks to encode and analyze cellular metabolite interconversion networks (referred to as genome-scale models), protein interaction networks, signal transduction networks, and metabolism and expression (ME) models.
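As a flavor of the linear algebraic bookkeeping such a course would start from, the sketch below encodes a toy metabolite interconversion network as a stoichiometric matrix S and checks the steady-state mass balance S·v = 0 for a candidate flux vector v. The three-reaction network, the names `net_production` and `is_steady_state`, and the tolerance are all assumptions made for illustration; a real genome-scale model would be handed to a solver such as CPLEX or Gurobi rather than checked by hand.

```python
# Toy network (hypothetical): uptake -> A, A -> B, B -> export.
# Rows are metabolites (A, B); columns are reactions (uptake, conversion, export).
S = [
    [1, -1, 0],   # A: produced by uptake, consumed by conversion
    [0, 1, -1],   # B: produced by conversion, consumed by export
]

def net_production(stoich, fluxes):
    """Return S.v: the net production rate of each metabolite."""
    return [sum(s * v for s, v in zip(row, fluxes)) for row in stoich]

def is_steady_state(stoich, fluxes, tol=1e-9):
    """True if no metabolite accumulates or is depleted (S.v = 0 within tol)."""
    return all(abs(rate) <= tol for rate in net_production(stoich, fluxes))

balanced = is_steady_state(S, [2.0, 2.0, 2.0])    # every flux carries 2 units
unbalanced = is_steady_state(S, [2.0, 1.0, 1.0])  # A accumulates at 1 unit
```

In an optimization setting, this balance becomes the equality constraint of a linear program, with flux bounds as inequality constraints and a flux of interest as the objective.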

Machine learning and optimization in protein engineering will cover the foundations of machine learning approaches (recurrent neural networks, recurrent geometric networks, convolutional neural networks) in the context of protein structure prediction. Next, it will explain how to use existing machine learning tools published on GitHub. Finally, there will be lectures and projects involving protein design tasks using integer optimization, Monte Carlo sampling, and rotamer repacking, along with an introduction to energy minimization using molecular mechanics and knowledge-based Rosetta potentials.
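The Monte Carlo sampling mentioned above typically rests on the Metropolis acceptance rule: a proposed change is always accepted if it lowers the energy, and otherwise accepted with probability exp(−ΔE/kT). The sketch below shows only that rule, in plain Python; the function names and the choice of units are assumptions for illustration, and this is not Rosetta's implementation.

```python
import math
import random

def acceptance_probability(delta_e, kT):
    """Metropolis probability of accepting a move that changes energy by delta_e."""
    if delta_e <= 0:
        return 1.0  # downhill (or neutral) moves are always accepted
    return math.exp(-delta_e / kT)

def metropolis_accept(delta_e, kT, rng):
    """Draw a uniform random number and apply the Metropolis criterion.

    rng is any object with a .random() method, e.g. random.Random(seed).
    """
    return rng.random() < acceptance_probability(delta_e, kT)

rng = random.Random(0)  # seeded so a classroom run is reproducible
downhill_ok = metropolis_accept(-5.0, 1.0, rng)
```

In a design task, `delta_e` would be the energy change from a trial mutation or rotamer swap, and lowering `kT` over time makes the sampler increasingly greedy.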

Fundamentals of peptide therapeutics will be a project-based foray into peptide-based drug design. It will begin with a few lectures that build upon basic undergraduate biochemistry and introduce protein-protein docking, leading into the design objective of creating peptides or antibodies that form complexes with another protein. The students will then be provided with the code of RosettaAntibodyDesign (or OptMAVEn-2.0) and split into groups. Each group will design an antibody against a target antigen, analyze the results by examining the structures in PyMOL, and present its findings to the rest of the class.

I will try to establish strong mentor-mentee relationships with students by not only implementing the proposed plans (as deemed fit per the department’s outlook) but also offering one-on-one career-advising sessions to any student in the department and to anyone enrolled in my courses. Furthermore, I will implement a feedback system in which I discuss all course-related input with the students twice during each course, for their benefit.