O07_03

Efficient docking simulation-based generation of bioactive compounds with deep generative models

Hideto HOSHINO *, Chen LI, Yoshihiro YAMANISHI

Graduate School of Informatics, Nagoya University
( * E-mail: hoshino.hideto.g7@s.mail.nagoya-u.ac.jp)

Identifying bioactive compounds that regulate the function of a therapeutic target protein is important in drug development, but conventional experimental methods are costly and time-consuming. Deep learning-based structure generators have therefore been studied as a more efficient alternative[1]. Most previous studies use, in the reward function of the structure generator, a quantitative structure-activity relationship (QSAR) model trained on chemical structures and their corresponding bioactivities. However, when the bioactivity data available for training are insufficient, the accuracy of the QSAR model tends to be low, and the quality of the compounds produced by the structure generator tends to be poor. A possible solution is to use docking simulations[2] in the reward function to calculate the binding affinity against the three-dimensional structure of the therapeutic target protein, but this incurs a huge computational cost.
In this study, we developed an efficient docking simulation-based structure generator that produces new bioactive compounds with high binding affinity to a therapeutic target protein. This was made possible by incorporating a binding affinity QSAR model into a pure transformer encoder-based generative adversarial network (TenGAN)[3]. First, docking simulations against a given target protein were performed with pre-selected compounds. Next, a QSAR model was trained to predict binding affinity scores from chemical structures, using the scores calculated by the docking simulations as training labels. Finally, new compounds with high binding affinity were generated by reinforcement learning in the structure generator, with the predicted binding affinity as the reward. As a case study, we demonstrated the usefulness of the proposed method in designing new bioactive compounds for various target proteins, such as epidermal growth factor receptor (EGFR). Introducing the binding affinity QSAR model eliminated the need for docking simulations during GAN training, which significantly reduced the computational cost. For example, the proposed method requires just one day to generate 5,000 compounds, whereas conventional methods require approximately 280 days. The proposed method is expected to be useful for the rapid structure design of bioactive compounds for any target protein for which a three-dimensional structure is available.
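The three steps above (dock a pre-selected library once, fit a surrogate on the docking scores, then use the surrogate's prediction as the reward) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the method used in this study: `toy_docking_score` is a placeholder for a real docking run, the bag-of-characters `featurize` and the nearest-neighbour `KnnSurrogate` stand in for the actual QSAR model, the `reward` scaling bounds are arbitrary, and the generator (TenGAN) itself is not shown.

```python
import math
from collections import Counter


def featurize(smiles):
    """Bag-of-characters counts: a crude, hypothetical stand-in for real descriptors."""
    return Counter(smiles)


def toy_docking_score(smiles):
    """Placeholder for an expensive docking simulation (more negative = stronger binding).

    NOT a real scoring function: it only counts N and O characters.
    """
    return -(5.0 + 0.5 * smiles.count("N") + 0.3 * smiles.count("O"))


def distance(a, b):
    """Euclidean distance between two character-count vectors."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))


class KnnSurrogate:
    """k-nearest-neighbour regressor used here as a toy binding affinity QSAR model."""

    def __init__(self, k=2):
        self.k = k
        self.data = []

    def fit(self, smiles_list, scores):
        self.data = [(featurize(s), y) for s, y in zip(smiles_list, scores)]
        return self

    def predict(self, smiles):
        x = featurize(smiles)
        nearest = sorted(self.data, key=lambda pair: distance(x, pair[0]))[: self.k]
        return sum(y for _, y in nearest) / len(nearest)


def reward(predicted_score, best=-12.0, worst=-4.0):
    """Map a predicted score to [0, 1]; stronger predicted binding gives a higher reward.

    The bounds `best` and `worst` are arbitrary illustrative values.
    """
    scaled = (worst - predicted_score) / (worst - best)
    return max(0.0, min(1.0, scaled))


# Step 1: run the (placeholder) docking once on a pre-selected library.
library = ["CCO", "CCN", "c1ccccc1", "CC(=O)N", "NCCN", "CC(=O)O"]
docked = [toy_docking_score(s) for s in library]

# Step 2: train the surrogate on the docking scores.
surrogate = KnnSurrogate(k=2).fit(library, docked)

# Step 3: score generated candidates with the surrogate only; no docking is run here.
candidates = ["NCCO", "CCCC"]
rewards = {s: reward(surrogate.predict(s)) for s in candidates}
```

In this arrangement the expensive scoring function is called only in step 1, so the cost of the reinforcement learning loop depends on the surrogate's prediction time and not on the docking time.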

[1] Kaitoh et al., Journal of Chemical Information and Modeling, 61, 4303-4320, 2021
[2] Jerome et al., Journal of Chemical Information and Modeling, 61, 3891-3898, 2021
[3] Li et al., International Conference on Artificial Intelligence and Statistics, 238, 361-369, 2024