P03-17

Natural product-like compound generation with chemical language models

Koh SAKANO*, Kairi FURUI, Masahito OHUE

School of Computing, Institute of Science Tokyo
(*E-mail: sakano@li.c.titech.ac.jp)

Natural products are substances produced by living organisms and are often known for their biological activity and structural diversity. Although natural products have long been used in drug development, their complex structures make both structure determination and synthesis difficult. Compared with the more efficient high-throughput screening of synthetic compounds, natural product drug discovery tends to be avoided because of its cost.
In recent years, deep learning-based methods have been applied to molecular generation. In particular, chemical language models, which apply natural language processing techniques to chemistry, have made remarkable progress. In this study, we fine-tuned pre-trained chemical language models on a natural product dataset and used them to generate natural product-like compounds.
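Chemical language models operate on text representations of molecules, so fine-tuning starts by turning each molecule into a token sequence. The following is a minimal sketch of that step, assuming the molecules are written as SMILES strings; the abstract does not name the representation or tokenizer used. The names `tokenize`, `build_vocab`, and `encode`, and the `<bos>`/`<eos>` markers, are hypothetical and chosen only for illustration.

```python
import re

# Assumed SMILES token pattern (illustrative, not the authors' tokenizer).
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"            # bracket atoms such as [C@@H] or [NH3+]
    r"|Br|Cl"                # two-letter atoms of the organic subset
    r"|%\d{2}"               # two-digit ring-closure labels
    r"|[BCNOPSFIbcnops]"     # one-letter atoms (lower case = aromatic)
    r"|[=#$:/\\\-+().*~@]"   # bonds, branches, and other symbols
    r"|\d"                   # single-digit ring-closure labels
)


def tokenize(smiles):
    """Split a SMILES string into tokens; reject unknown characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


def build_vocab(corpus):
    """Map every token seen in the corpus to an integer id."""
    tokens = sorted({tok for smiles in corpus for tok in tokenize(smiles)})
    return {tok: i for i, tok in enumerate(["<bos>", "<eos>"] + tokens)}


def encode(smiles, vocab):
    """Convert a SMILES string into the id sequence a model would consume."""
    ids = [vocab[tok] for tok in tokenize(smiles)]
    return [vocab["<bos>"]] + ids + [vocab["<eos>"]]


if __name__ == "__main__":
    vocab = build_vocab(["CCO", "C[C@@H](N)Cl"])
    print(tokenize("C[C@@H](N)Cl"))
    print(encode("CCO", vocab))
```

In a fine-tuning setting, id sequences of this kind built from the natural product dataset would be the training examples, and sampling from the fine-tuned model would yield new token sequences to decode back into molecules.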
In total, 100 million molecules were generated, and the distribution of the generated compounds was similar to that of natural products. We also evaluated the potential of the generated compounds as drug candidates. This study proposes a method for exploring the vast chemical space and reducing the time and cost of natural product drug discovery.
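The abstract does not state how similarity between the generated and natural product distributions was measured. As one hedged illustration, a generated set is commonly summarized by its uniqueness, its novelty relative to a reference set, and a divergence between descriptor distributions. The sketch below assumes molecules are given as already canonicalized strings, so that string equality means the same molecule, and uses string length as a toy stand-in for a real molecular descriptor. All function names are hypothetical.

```python
import math
from collections import Counter


def uniqueness(generated):
    """Fraction of generated molecules that are distinct."""
    return len(set(generated)) / len(generated)


def novelty(generated, reference):
    """Fraction of distinct generated molecules absent from the reference set."""
    unique = set(generated)
    return len(unique - set(reference)) / len(unique)


def js_divergence(sample_a, sample_b):
    """Jensen-Shannon divergence (base 2, between 0 and 1) of two
    samples of a discrete descriptor; 0 means identical distributions."""
    count_a, count_b = Counter(sample_a), Counter(sample_b)
    n_a, n_b = sum(count_a.values()), sum(count_b.values())
    total = 0.0
    for value in set(count_a) | set(count_b):
        p = count_a[value] / n_a
        q = count_b[value] / n_b
        m = 0.5 * (p + q)
        if p > 0:
            total += 0.5 * p * math.log2(p / m)
        if q > 0:
            total += 0.5 * q * math.log2(q / m)
    return total


if __name__ == "__main__":
    # Toy data only; not taken from the study.
    reference = ["CCO", "CCN", "c1ccccc1"]
    generated = ["CCO", "CCO", "CCC", "c1ccncc1"]
    print(uniqueness(generated))
    print(novelty(generated, reference))
    # String length is a placeholder for a real descriptor.
    print(js_divergence([len(s) for s in generated],
                        [len(s) for s in reference]))
```

In practice the descriptor would be a chemically meaningful quantity computed with a cheminformatics toolkit, and continuous descriptors would need to be binned before a histogram-based divergence such as this one is applied.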