P06-05

Improved Method of Predicting Protein Allosteric Site Based on Atomistic Bond-to-bond Interaction by Using GNN

Chaowen OU*, Takashi ISHIDA

School of Computing, Tokyo Institute of Technology
( * E-mail: ocw24680@gmail.com )

The allosteric effect is a fundamental aspect of protein function, in which the binding of a ligand at a site distinct from the active site, known as an allosteric site, induces a conformational change that affects protein activity. This mode of regulation is a promising avenue for drug discovery, because allosteric modulators offer the potential for higher specificity and fewer side effects than orthosteric drugs, which target the active site directly. However, identifying allosteric sites is challenging because protein structures are complex and dynamic.
Traditional methods for discovering allosteric sites rely on experimental techniques such as X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy. These methods provide valuable insights, but they are often time-consuming and resource-intensive. Recent advances in machine learning, particularly deep learning, have substantially advanced computational biology. Graph neural networks (GNNs) such as graph convolutional networks (GCNs) have emerged as powerful tools for modeling complex interactions within biological systems.
We propose a GNN-based method for predicting protein allosteric sites. Its key innovation is the use of energy-weighted graphs to model the strength and distribution of atomic interactions within proteins. By focusing on the energetic properties of bonds, we aim to identify regions of the protein structure that are critical for allosteric signaling. Our model combines features derived from the protein graph with features of each candidate pocket, allowing for a comprehensive analysis of potential allosteric sites.
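As an illustration only, the idea of an energy-weighted graph feeding a graph convolution can be sketched as follows. This is a minimal sketch and not the implementation evaluated in this work: the edge weight (magnitude of an interaction energy), the single GCN-style layer, the mean pooling, and all function names are assumptions introduced for the example.

```python
import math

def normalized_adjacency(n_nodes, weighted_edges):
    """Build D^-1/2 (A + I) D^-1/2 from (i, j, energy) triples."""
    # Self-loops let every node keep its own features during propagation.
    a = [[1.0 if i == j else 0.0 for j in range(n_nodes)] for i in range(n_nodes)]
    for i, j, energy in weighted_edges:
        w = abs(energy)  # assumption: weight = magnitude of the interaction energy
        a[i][j] += w
        a[j][i] += w
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n_nodes)]
            for i in range(n_nodes)]

def matmul(x, y):
    """Plain matrix product of two lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]

def gcn_layer(a_hat, features, weights):
    """One graph-convolution step: ReLU(A_hat @ X @ W)."""
    h = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, v) for v in row] for row in h]

def mean_pool(h):
    """Average node embeddings into a single graph-level vector."""
    return [sum(col) / len(h) for col in zip(*h)]

def pocket_input(weighted_edges, node_features, weights, pocket_features):
    """Concatenate a pooled graph embedding with candidate-pocket features."""
    a_hat = normalized_adjacency(len(node_features), weighted_edges)
    graph_embedding = mean_pool(gcn_layer(a_hat, node_features, weights))
    return graph_embedding + list(pocket_features)

# Toy example: two nodes joined by one interaction of energy -1.0,
# plus a single hypothetical pocket descriptor.
vector = pocket_input(
    weighted_edges=[(0, 1, -1.0)],
    node_features=[[1.0, 0.0], [0.0, 1.0]],
    weights=[[1.0, 0.0], [0.0, 1.0]],
    pocket_features=[3.0],
)
```

In this sketch, stronger interactions contribute more to the propagated features, and the concatenated vector would then be passed to a classifier that scores the candidate pocket.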
To train and validate our model, we used the Allosteric Database (ASD), a curated collection of experimentally verified allosteric sites. We applied preprocessing steps to ensure the quality and relevance of the data, yielding a dataset suited to machine learning. Our approach addresses class imbalance, a common issue in biological datasets, by employing a sampling strategy that increases the representation of positive allosteric site samples.
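One common way to increase the representation of positive samples is random oversampling with replacement. The sketch below shows that technique as an assumption; the abstract does not specify which sampling strategy was used, and the function name and the 1:1 target ratio are hypothetical.

```python
import random

def oversample_positives(samples, labels, seed=0):
    """Duplicate positive samples at random until both classes are the same size."""
    rng = random.Random(seed)  # fixed seed keeps the resampling reproducible
    pos = [s for s, y in zip(samples, labels) if y == 1]
    neg = [s for s, y in zip(samples, labels) if y == 0]
    # Nothing to do if there are no positives or positives are not the minority.
    if not pos or len(pos) >= len(neg):
        return list(samples), list(labels)
    # Draw extra positives with replacement to match the negative count.
    extra = [rng.choice(pos) for _ in range(len(neg) - len(pos))]
    balanced = [(s, 1) for s in pos + extra] + [(s, 0) for s in neg]
    rng.shuffle(balanced)
    xs, ys = zip(*balanced)
    return list(xs), list(ys)

# Toy example: one allosteric pocket among four candidates.
pockets, targets = oversample_positives(["p1", "n1", "n2", "n3"], [1, 0, 0, 0])
```

Resampling of this kind is normally applied to the training split only, so that validation and test metrics still reflect the original class distribution.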
The evaluation shows that our method outperforms existing models such as PASSer2.0 on metrics including accuracy, precision, recall, and F1 score. The proposed method not only achieves higher accuracy but also substantially improves recall, indicating an enhanced ability to detect true allosteric sites.
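For reference, the four metrics named above follow from the counts of true and false positives and negatives. The sketch below uses the standard definitions; the function name and the toy labels are illustrative and are not results from this work.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = allosteric)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Guard each ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Toy labels: one true site among ten pockets, and a model that predicts
# "not allosteric" everywhere.
scores = classification_metrics([1] + [0] * 9, [0] * 10)
```

With these toy labels the accuracy is 0.9 while the recall is 0, which is why recall and F1 are reported alongside accuracy when positive samples are rare.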
In conclusion, our research presents a novel and effective approach for predicting protein allosteric sites using deep learning techniques. By integrating graph-based protein representations with energy-weighted bond interactions, we provide a powerful tool for identifying potential drug targets. This work advances the computational prediction of allosteric sites and underscores the importance of considering atomic-level interactions in understanding protein function and regulation.