(284c) Hierarchical Graph-Based Representation Drives Prediction of Stapled Peptide Drug-like Properties | AIChE

Bilodeau, C., Massachusetts Institute of Technology
Thurber, G., University of Michigan
Sequence-based models like AlphaFold and UniRep have led to transformational improvements in globular protein structure prediction and quantitative function prediction, respectively. However, there is a wealth of applications where proteins built from the 20 canonical amino acids are insufficient to address biomedical, biosensing, and biotherapeutic challenges. Non-standard proteins often carry modifications that go beyond the linear information contained in a protein sequence, such as intramolecular disulfide bonds, and these are difficult to incorporate into sequence-based models. One such application is stapled peptides, chemically modified peptides designed to overcome the pharmacokinetic barriers that a therapeutic must clear in the human body. These barriers include maintaining structural integrity despite high proteolytic activity, penetrating the highly hydrophobic cellular membrane, and having the affinity and specificity to efficiently engage disease-related proteins. To overcome these challenges, a “stapled” peptide can be formed by covalently linking two amino acids, locking the peptide into its alpha-helical state. Recently, the Thurber Lab has pioneered a technique known as Stabilized Peptide Engineering by E. coli Display (SPEED) to rapidly characterize and develop >10^9 unique stapled peptide therapeutics, minimizing the need for expensive and low-throughput solid-phase synthesis. However, narrowing the selection from 10^9 library members to a small handful for in vitro or in vivo analysis is non-trivial.

In this work, we design and validate a hierarchical, graph-based model to predict and optimize properties of stapled peptides, and we apply it to identify lead compounds for translation to in vitro models. Because the complex topology of stapled peptides is challenging to represent, we employed a message passing graph neural network (MP-GNN) to produce a machine-interpretable, fully differentiable representation. First, the atoms of canonical and non-canonical amino acids alike are represented as vectors, initialized with quantum chemical descriptors such as charge, hydrophobicity, and bond order. Then, information about local chemical environments is encoded by passing messages between atoms, updating the vector representations at each step. Next, because a peptide’s properties are a function of its amino acids, each amino acid is represented as the sum of its atom vectors, and the same message passing process is repeated at the amino acid level. This encodes global information about the peptide, which in principle includes intramolecular NH3+/COO- salt bridges between charged amino acids and π-π interactions between aromatic amino acids.
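The two-level scheme described above can be sketched in a few lines of NumPy. This is only an illustrative toy, not the authors' model: the function names (`message_pass`, `embed_peptide`, `chain_adjacency`), the tanh update rule, the random untrained weights, the random atom features, and the six-atom, three-residue "peptide" are all assumptions made for the example. The staple is modeled simply as one extra edge at both the atom level and the residue level.

```python
import numpy as np

def message_pass(h, adj, w_self, w_nbr, steps=2):
    """Update node vectors `steps` times from the sum of their neighbors' vectors."""
    for _ in range(steps):
        messages = adj @ h                          # row i = sum of neighbor vectors of node i
        h = np.tanh(h @ w_self + messages @ w_nbr)  # assumed update rule (hypothetical)
    return h

def embed_peptide(atom_feats, atom_adj, atom_to_res, res_adj, params, steps=2):
    """Atom-level passing, sum-pool atoms into residues, residue-level passing."""
    h_atoms = message_pass(atom_feats, atom_adj,
                           params["atom_self"], params["atom_nbr"], steps)
    n_atoms, n_res = atom_feats.shape[0], res_adj.shape[0]
    pool = np.zeros((n_res, n_atoms))
    pool[atom_to_res, np.arange(n_atoms)] = 1.0     # pool[r, a] = 1 if atom a is in residue r
    h_res = pool @ h_atoms                          # each residue = sum of its atom vectors
    h_res = message_pass(h_res, res_adj,
                         params["res_self"], params["res_nbr"], steps)
    return h_res.sum(axis=0)                        # peptide-level vector (assumed sum readout)

def chain_adjacency(n, extra_edges=()):
    """Adjacency of a linear chain of n nodes, plus optional extra (i, j) edges."""
    adj = np.zeros((n, n))
    for i in range(n - 1):
        adj[i, i + 1] = adj[i + 1, i] = 1.0
    for i, j in extra_edges:
        adj[i, j] = adj[j, i] = 1.0
    return adj

# Toy data: 6 atoms, 2 per residue, 3 residues; features are random placeholders.
rng = np.random.default_rng(0)
d = 4
params = {name: rng.normal(scale=0.5, size=(d, d))
          for name in ("atom_self", "atom_nbr", "res_self", "res_nbr")}
atom_feats = rng.normal(size=(6, d))
atom_to_res = np.array([0, 0, 1, 1, 2, 2])

# Linear peptide versus the same peptide with a hypothetical staple
# (atom 0 bonded to atom 5, i.e. residue 0 linked to residue 2).
emb_linear = embed_peptide(atom_feats, chain_adjacency(6), atom_to_res,
                           chain_adjacency(3), params)
emb_stapled = embed_peptide(atom_feats, chain_adjacency(6, [(0, 5)]), atom_to_res,
                            chain_adjacency(3, [(0, 2)]), params)
```

Because the staple enters only through the adjacency matrices, the same code handles linear and stapled topologies, which is the property a purely sequence-based encoding lacks. Every operation here is differentiable, so in a real setting the weights would be trained rather than drawn at random.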

We demonstrate the power of this model by designing stapled peptides that target the Bcl-2 (B cell lymphoma 2) proteins, which regulate apoptosis within cells and are often overexpressed in cancer cells. Because these proteins are highly related in sequence but play distinct roles in apoptosis regulation, methods and models that optimize specificity are in great demand. We designed a library of stapled peptides that inhibit Bcl-2 proteins and thereby induce cancer cells to undergo apoptosis, screened it for desired properties on the bacterial cell surface, and generated quantitative property labels using next-generation sequencing. These labels are combined with the unsupervised MP-GNN representation of the stapled peptides to train and validate the model. Using this model, we can explore the sequence-function space of stapled peptides. Finally, lead compounds are identified by optimizing peptides to the frontier of idealized target properties. Importantly, the peptides identified through modeling had improved affinity and specificity compared to those identified directly through sorting. The ability to improve the affinity, specificity, and stability of stapled peptides demonstrates the value of a model that can handle proteins with geometries beyond those of naturally occurring proteins. This model could eventually lead to better prediction and generation of similar molecules, such as cyclic peptides or glycoproteins.
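One common way to read "the frontier of idealized target properties" is as a Pareto front over predicted properties, although the abstract does not say which selection procedure was used. The sketch below shows that idea under this assumption. The function name `pareto_front` and the (affinity, specificity) scores are hypothetical and are not data from this work.

```python
import numpy as np

def pareto_front(scores):
    """Return indices of non-dominated rows; higher is better in every column."""
    scores = np.asarray(scores, dtype=float)
    keep = []
    for i, s in enumerate(scores):
        # Row i is dominated if some row is >= in all columns and > in at least one.
        at_least_as_good = np.all(scores >= s, axis=1)
        strictly_better = np.any(scores > s, axis=1)
        if not np.any(at_least_as_good & strictly_better):
            keep.append(i)
    return keep

# Hypothetical predicted (affinity, specificity) scores for five candidate peptides.
predicted = [
    [0.9, 0.2],
    [0.7, 0.7],
    [0.4, 0.9],
    [0.6, 0.6],
    [0.3, 0.3],
]
front = pareto_front(predicted)  # -> [0, 1, 2]
```

Candidates 3 and 4 are dropped because candidate 1 is at least as good on both properties. The remaining three represent different trade-offs between affinity and specificity, from which leads could then be chosen.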