P06-08

Development of docking simulation with high-speed graph neural network scoring function

Kohei HOASHI *, Takashi ISHIDA

Department of Computer Science, School of Computing, Tokyo Institute of Technology
( * E-mail: hoashi.k.aa@m.titech.ac.jp )

Docking simulation is a primary method for narrowing down candidate compounds in drug development. It predicts the binding pose of a protein-ligand complex and the corresponding binding affinity. During the simulation, a search algorithm optimizes the binding pose so that the predicted binding affinity becomes as favorable as possible. The binding affinity is generally calculated from the binding pose by a scoring function.

There are currently two types of scoring functions: classical and machine learning (ML)-based. Classical methods use a manually designed linear equation whose terms represent the chemical and physical properties involved in binding. In contrast, ML-based methods output the predicted binding affinity directly from an ML model. Accuracy is one of the most critical properties of a scoring function, but execution speed is also essential because the function is evaluated many times during docking. Many ML-based methods have been reported to offer superior prediction accuracy. However, classical methods are considered faster because of their lower computational requirements.
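As a minimal sketch of what "a manually designed linear equation" means in practice, the snippet below computes a score as a weighted sum of interaction terms. The term names and weights are hypothetical values chosen only for illustration; they are not the parameters of any particular docking tool.

```python
from typing import Dict

# Hypothetical weights and term names, chosen only for illustration.
WEIGHTS: Dict[str, float] = {
    "steric": -0.035,
    "hydrophobic": -0.04,
    "hydrogen_bond": -0.6,
}


def linear_score(terms: Dict[str, float],
                 weights: Dict[str, float] = WEIGHTS) -> float:
    """Return the weighted sum of precomputed interaction terms.

    Each value in `terms` is the raw contribution of one interaction type
    for a given pose; a lower (more negative) total is treated as better.
    """
    return sum(weights[name] * value for name, value in terms.items())


if __name__ == "__main__":
    pose_terms = {"steric": 10.0, "hydrophobic": 5.0, "hydrogen_bond": 2.0}
    print(linear_score(pose_terms))
```

Evaluating such a function costs only a handful of multiplications and additions per pose, which is why classical scoring is cheap enough to call many times during the search.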

Although ML-based scoring methods are expected to improve the accuracy of docking simulation, existing docking tools generally use a classical scoring method because of its calculation speed. Recently, several ML-based scoring methods that are fast enough for docking simulation have been proposed. GenScore is one such method and is based on a mixture density network. The parameters of the mixture density network are learned by a graph neural network. Once these parameters have been obtained, the score for each pair can be computed from them without running the ML model again.
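The idea that a pair score needs no further ML inference once the mixture parameters are known can be sketched as follows. This assumes, for illustration only, that each pair is described by a Gaussian mixture over the pair distance and that the pose score is the summed negative log-likelihood; the function names and the exact functional form are assumptions and are not taken from GenScore's implementation.

```python
import math
from typing import List, Sequence, Tuple

# One mixture component: (weight pi, mean mu, standard deviation sigma).
Component = Tuple[float, float, float]


def mixture_density(distance: float, components: Sequence[Component]) -> float:
    """Probability density of `distance` under a 1-D Gaussian mixture."""
    return sum(
        pi * math.exp(-0.5 * ((distance - mu) / sigma) ** 2)
        / (sigma * math.sqrt(2.0 * math.pi))
        for pi, mu, sigma in components
    )


def pair_score(distance: float, components: Sequence[Component],
               eps: float = 1e-12) -> float:
    """Negative log-likelihood of one pair distance (lower is better)."""
    return -math.log(mixture_density(distance, components) + eps)


def pose_score(distances: Sequence[float],
               params: Sequence[Sequence[Component]]) -> float:
    """Sum of pair scores; `params[k]` holds the mixture for pair k."""
    return sum(pair_score(d, comps) for d, comps in zip(distances, params))


if __name__ == "__main__":
    mixture: List[Component] = [(1.0, 4.0, 1.0)]
    print(pose_score([4.0, 6.0], [mixture, mixture]))
```

Only arithmetic on the stored parameters is needed at scoring time, so the cost per pose is closer to that of a classical scoring function than to a full neural network forward pass.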

In this research, we used GenScore as the scoring function and AutoDock Vina as the docking engine, because AutoDock Vina is one of the most widely used docking tools. Docking with the proposed method involves two steps. First, GenScore calculates the parameters of the mixture density network using the protein and ligand structures as input. This step is performed only once for each new protein-ligand pair. Subsequently, the calculated parameters are loaded into AutoDock Vina, and docking is executed using GenScore's score calculation.
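The two-step structure can be illustrated with a toy sketch: parameters are produced once, then reused for every pose evaluated during the search. Everything here is a hypothetical stand-in: `predict_pair_parameters` replaces the graph neural network with fixed values, the score assumes a Gaussian mixture over pair distances, and `random_search` is a simple placeholder, not AutoDock Vina's optimizer or its interface.

```python
import math
import random
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
Component = Tuple[float, float, float]  # (weight, mean, std)
PairParams = Dict[Tuple[int, int], List[Component]]


def predict_pair_parameters(n_protein: int, n_ligand: int) -> PairParams:
    """Step 1 (run once): hypothetical stand-in for the GNN inference.

    Returns the same single Gaussian component for every pair.
    """
    return {(i, j): [(1.0, 4.0, 1.0)]
            for i in range(n_protein) for j in range(n_ligand)}


def score_pose(protein: List[Point], ligand: List[Point],
               params: PairParams) -> float:
    """Step 2 (run per pose): score from stored parameters only."""
    total = 0.0
    for (i, j), comps in params.items():
        d = math.dist(protein[i], ligand[j])
        density = sum(
            pi * math.exp(-0.5 * ((d - mu) / sigma) ** 2)
            / (sigma * math.sqrt(2.0 * math.pi))
            for pi, mu, sigma in comps
        )
        total -= math.log(density + 1e-12)
    return total


def random_search(protein: List[Point], ligand: List[Point],
                  params: PairParams, steps: int = 200, seed: int = 0):
    """Placeholder search: keep random rigid translations that lower the score."""
    rng = random.Random(seed)
    best, best_score = ligand, score_pose(protein, ligand, params)
    for _ in range(steps):
        dx, dy, dz = (rng.uniform(-0.5, 0.5) for _ in range(3))
        cand = [(x + dx, y + dy, z + dz) for x, y, z in best]
        cand_score = score_pose(protein, cand, params)
        if cand_score < best_score:
            best, best_score = cand, cand_score
    return best, best_score


if __name__ == "__main__":
    protein = [(0.0, 0.0, 0.0)]
    ligand = [(10.0, 0.0, 0.0)]
    params = predict_pair_parameters(len(protein), len(ligand))  # once
    print(random_search(protein, ligand, params)[1])             # many poses
```

The point of the split is that the expensive inference is paid once per protein-ligand pair, while the inner loop of the search touches only the cached parameters.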

For evaluation, we use the PoseBusters benchmark dataset, which consists of complexes from PDBbind released since 2021. This dataset does not overlap with GenScore's training dataset. Our evaluation differs from the original GenScore evaluation in the stage at which the scoring function is applied. The GenScore authors used it for re-docking and re-ranking, whereas we use it during docking itself. Using it during docking often produces binding poses that are far from the native pose.