O01_04

Scaffold-retained molecule generation considering gene expression profiles with deep learning

Yuki MATSUKIYO* (1, 2), Yuko SAKAJIRI (2), Kaho YAMABE (3), Saki OHSHIMA (3), Chika TOHZAWA (4), Tomokazu SHIBATA (1), Ryusuke SAWADA (5), Takuya OKADA (3, 4), Hisashi MORI (6, 7), Naoki TOYOOKA (3, 4), Yoshihiro YAMANISHI (2)

(1) Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology
(2) Department of Complex Systems Science, Graduate School of Informatics, Nagoya University
(3) Graduate School of Pharma-Medical Sciences, University of Toyama
(4) Faculty of Engineering, University of Toyama
(5) Department of Pharmacology, Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama University
(6) Department of Molecular Neuroscience, Faculty of Medicine, University of Toyama
(7) Research Center for Idling Brain Science, University of Toyama
(* E-mail: matsukiyo.yuki566@mail.kyutech.jp)

In lead optimization, molecular substructures are modified while the molecular scaffold (i.e., the core structure) is retained, with the aim of improving molecular properties. Conventionally, lead optimization relies on experimental methods that are resource-intensive and time-consuming. To make this process more efficient, deep learning models such as recurrent neural networks (RNNs) and variational autoencoders (VAEs) have been used for scaffold-retained molecule generation (i.e., generating molecular structures while retaining a scaffold). However, previous scaffold-retained molecule generation methods primarily used chemical information and did not consider biological information [1-3]. Some previous molecule generation methods use gene expression profiles to incorporate comprehensive biological information from within the cell, but they are designed specifically for generating molecules from scratch [4,5]. To the best of our knowledge, no study has addressed scaffold-retained molecule generation using gene expression profiles.
In this study, we present a novel computational method to generate molecules from gene expression profiles in a scaffold-retained manner. The proposed method consists of two deep learning models, a VAE and an RNN. The VAE extracts latent vectors from gene expression profiles, and the RNN generates new chemical structures from the latent vectors in a scaffold-retained manner. We also carried out docking simulations of the generated molecules to select those with desirable binding affinity to a target protein. We applied the proposed method to design a novel inhibitor of glutaminase 1 (GLS1), an attractive target for anticancer treatments [6]. We synthesized and experimentally evaluated some of the generated molecules, and one of them proved to be a promising novel GLS1 inhibitor. These findings highlight the potential of gene expression data-driven molecule generation in the lead optimization process.
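
The data flow described above (expression profile to latent vector, latent vector to scaffold-retained structure, then a docking-based filter) can be illustrated with a toy sketch. This is not the authors' implementation: the weights are random and untrained, the benzene scaffold, the three-atom vocabulary, the latent size, and every function name are hypothetical, and `placeholder_score` merely stands in for a real docking program.

```python
import math
import random

TOKENS = ["C", "N", "O", "<end>"]  # toy decoration vocabulary (assumption)


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng, scale=0.5):
    """Untrained random weights; a real model would learn these."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def encode(profile, w_mu, w_logvar, rng):
    """VAE-style encoding with the reparameterization z = mu + sigma * eps."""
    mu = matvec(w_mu, profile)
    logvar = matvec(w_logvar, profile)
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def sample_decoration(z, w_hh, w_xh, w_out, rng, max_len=6):
    """RNN-style sampling: the hidden state starts from z, one token per step."""
    hidden = [math.tanh(v) for v in z]
    token_idx, atoms = None, []
    for _ in range(max_len):
        pre = matvec(w_hh, hidden)
        if token_idx is not None:  # feed the previous token back in
            pre = [p + w_xh[i][token_idx] for i, p in enumerate(pre)]
        hidden = [math.tanh(p) for p in pre]
        weights = [math.exp(logit) for logit in matvec(w_out, hidden)]
        token_idx = rng.choices(range(len(TOKENS)), weights=weights)[0]
        if TOKENS[token_idx] == "<end>":
            break
        atoms.append(TOKENS[token_idx])
    return "".join(atoms)


def generate(profile, scaffold, n_samples, score_fn, threshold, seed=0, latent_dim=4):
    """Sample decorations for the scaffold's '*' site and keep low-scoring ones."""
    rng = random.Random(seed)
    w_mu = random_matrix(latent_dim, len(profile), rng)
    w_logvar = random_matrix(latent_dim, len(profile), rng)
    w_hh = random_matrix(latent_dim, latent_dim, rng)
    w_xh = random_matrix(latent_dim, len(TOKENS), rng)
    w_out = random_matrix(len(TOKENS), latent_dim, rng)
    kept = []
    for _ in range(n_samples):
        z = encode(profile, w_mu, w_logvar, rng)
        decoration = sample_decoration(z, w_hh, w_xh, w_out, rng)
        if not decoration:
            continue  # nothing was attached, so skip this sample
        smiles = scaffold.replace("*", decoration, 1)  # scaffold atoms are untouched
        if score_fn(smiles) <= threshold:  # lower score = better, as in docking
            kept.append(smiles)
    return kept


def placeholder_score(smiles):
    """Stand-in for a docking score; NOT a real binding estimate."""
    return -float(len(smiles))


# Toy 3-gene expression profile and an arbitrary benzene scaffold (assumptions).
hits = generate([0.8, -1.2, 0.3], "c1ccccc1*", 20, placeholder_score, threshold=-10.0)
print(hits)
```

In this sketch the scaffold is retained by construction, since only the `*` attachment point is ever rewritten; a trained model would instead learn which decorations are consistent with the desired expression profile.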

[1] He, J. et al. J. Cheminform. 2021, 13, 26.
[2] Langevin, M. et al. J. Chem. Inf. Model. 2020, 60, 5637-5646.
[3] Kaitoh, K. and Yamanishi, Y. J. Chem. Inf. Model. 2022, 62, 2212-2225.
[4] Méndez-Lucio, O. et al. Nat. Commun. 2020, 11.
[5] Kaitoh, K. and Yamanishi, Y. J. Chem. Inf. Model. 2021, 61, 4303-4320.
[6] Okada, T. et al. Bioorg. Med. Chem. Lett. 2023, 93, 129438.