P06-07

Compound Retrosynthesis Analysis Using Consensus Estimate

Akira SHINOHARA *Takashi ISHIDA

Department of Computer Science, School of Computing, Tokyo Institute of Technology
( * E-mail: shinohara.a.ae@m.titech.ac.jp )

Research on compound synthesis methods is an important theme in organic chemistry. Retrosynthesis analysis designs a synthetic route by repeatedly applying chemically reasonable disconnections to a target compound until it is reduced to starting materials that can be obtained easily and inexpensively, which makes it a very useful tool for synthesis planning. To improve the prediction accuracy of each step in multi-step retrosynthesis analysis, many studies have therefore focused on single-step retrosynthesis analysis, which considers only one retrosynthetic reaction at a time. Depending on whether reaction templates are used, single-step methods can be broadly divided into two types: "template-based" and "template-free" methods.
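The recursive procedure described above (disconnect the target, then recurse on each precursor until everything is obtainable) can be sketched as a depth-first search. This is a minimal illustration under stated assumptions, not the authors' implementation: the single-step model is replaced by a hypothetical lookup table `toy_model` that maps a product to ranked reactant sets, `purchasable` stands in for a catalog of readily available compounds, and `plan_route` and `max_depth` are illustrative names. The SMILES strings form a toy two-step route (acetanilide from aniline and acetic anhydride, aniline from nitrobenzene); reagents such as the reducing agent are omitted.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical single-step model: product -> candidate reactant sets, best first.
# A real system would query a trained template-based or template-free model here.
SingleStepModel = Dict[str, List[Tuple[str, ...]]]
Route = List[Tuple[str, Tuple[str, ...]]]


def plan_route(target: str, single_step: SingleStepModel,
               purchasable: Set[str], max_depth: int = 5) -> Optional[Route]:
    """Return (product, reactants) steps reducing `target` to purchasable
    compounds, or None if no route is found within `max_depth` steps."""
    if target in purchasable:
        return []  # already obtainable: no further disconnection needed
    if max_depth == 0:
        return None  # search budget exhausted
    for reactants in single_step.get(target, []):
        route: Route = [(target, reactants)]
        for r in reactants:
            sub = plan_route(r, single_step, purchasable, max_depth - 1)
            if sub is None:
                break  # this disconnection yields an unobtainable precursor
            route.extend(sub)
        else:
            return route  # every precursor was resolved
    return None  # no candidate disconnection worked


# Hypothetical toy data (reagents omitted for brevity).
toy_model: SingleStepModel = {
    "CC(=O)Nc1ccccc1": [("Nc1ccccc1", "CC(=O)OC(C)=O")],
    "Nc1ccccc1": [("O=[N+]([O-])c1ccccc1",)],
}
purchasable = {"O=[N+]([O-])c1ccccc1", "CC(=O)OC(C)=O"}
print(plan_route("CC(=O)Nc1ccccc1", toy_model, purchasable))
```

In this sketch, the quality of the whole route depends on the candidates the single-step model proposes at each call, which is why improving single-step accuracy matters for multi-step planning.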
Template-based methods perform well on reactions covered by their templates, but they generalize poorly to other reactions and require considerable time and effort to build the templates. For this reason, template-free methods based on machine learning have been actively developed since 2017. Although template-free methods overcome these drawbacks, they tend to achieve slightly lower prediction accuracy. Within each category, methods further differ in whether they use atom mapping and in how they represent molecules, for example as SMILES strings, as substructures, or as graphs. Because each approach has its own strengths and weaknesses, it is difficult to develop a single model that strikes a balance between the two.
In this research, we propose a method that uses consensus estimation to improve the accuracy of single-step retrosynthesis analysis. We select several template-based and template-free models and obtain the prediction results of each. Then, for the candidate compounds predicted by all models, we re-rank the candidates using schemes such as averaging their ranks across the models. Comparing the prediction accuracy obtained in this way with that of the original models and of the most accurate model reported in the literature, we confirmed an improvement in accuracy.
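The average-rank re-ranking step can be sketched as follows. This assumes each model returns a ranked list of candidate reactant sets (rank 1 first); the model names, the candidate labels `r1` to `r4`, and the functions `rank_table` and `consensus_rerank` are hypothetical, and the tie-breaking rule (best single rank, then alphabetical order) is an assumption rather than something stated in this abstract. Only candidates that appear in every model's list are kept, matching the restriction to compounds predicted by all models.

```python
from statistics import mean
from typing import Dict, List


def rank_table(ranked: List[str]) -> Dict[str, int]:
    """Map each candidate to its 1-based rank in one model's output."""
    table: Dict[str, int] = {}
    for i, cand in enumerate(ranked, start=1):
        table.setdefault(cand, i)  # keep the best rank if a candidate repeats
    return table


def consensus_rerank(predictions: Dict[str, List[str]]) -> List[str]:
    """Hypothetical helper: re-rank candidates by mean rank across models."""
    if not predictions:
        return []
    tables = [rank_table(ranked) for ranked in predictions.values()]
    # Keep only candidates predicted by every model.
    common = set(tables[0]).intersection(*tables[1:])

    def sort_key(cand: str):
        ranks = [t[cand] for t in tables]
        # Lower mean rank first; ties broken by best single rank, then by name.
        return (mean(ranks), min(ranks), cand)

    return sorted(common, key=sort_key)


# Hypothetical outputs of three single-step models (rank 1 first).
predictions = {
    "template_based": ["r1", "r2", "r3"],
    "template_free_smiles": ["r2", "r1", "r4"],
    "template_free_graph": ["r2", "r3", "r1"],
}
print(consensus_rerank(predictions))  # ['r2', 'r1']
```

Here `r3` and `r4` are dropped because one model did not predict them, and `r2` (mean rank 4/3) moves ahead of `r1` (mean rank 2). A variant that assigns a penalty rank to missing candidates instead of discarding them would also be a form of rank aggregation, but that choice is not specified in the abstract.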