O02_06

Fragment descriptors in predictive modeling for molecules and reactions

Pavel SIDOROV *

Hokkaido University, Institute for Chemical Reaction Design and Discovery (ICReDD)
( * E-mail: pavel.sidorov@icredd.hokudai.ac.jp )

Applying machine learning in chemistry requires encoding chemical structures as numerical vectors (features, or descriptors) on which predictive models are trained. Many types of descriptors are in use, from physicochemical parameters to those derived from 2D or 3D molecular structures. However, calculating some of these descriptors can be complicated and may make the modeling process too slow. Fragment descriptors encode a molecule as the numbers of occurrences of different substructures, translating the chemical structure directly into a numerical format. Because they are derived from 2D structures (molecular graphs), they allow cheap and fast modeling. In this presentation, we show two applications of fragment descriptors in predictive modeling in chemistry. First, in collaboration with Dr Hashim (RIES), 2D fragment descriptors were used to predict the absorption spectra of photoswitches; in a benchmark against other 2D descriptors, the fragments showed superior performance. Second, a collaboration with Dr Tsuji (WPI-ICReDD, List group) involved the modeling and design of novel potent catalysts for organic synthesis. While costly 3D calculations are traditionally used for such tasks, we predicted a new highly selective catalyst using simple 2D fragments for both catalysts and reactions. Moreover, these descriptors are compatible with model interpretation methods, facilitating the rational, guided design of new compounds with desired properties. For example, in the catalyst study, the ColorAtom technique was used to identify the substructures most important for the high selectivity of the new catalysts.
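
As an illustration of the general idea of encoding a molecule as substructure counts, the following is a minimal sketch in pure Python. It is not the software used in this work: the function names `count_path_fragments` and `to_vectors`, the toy graph representation (heavy atoms as element symbols, bonds as index pairs), and the restriction to linear atom paths that ignore bond orders are all assumptions made for the example.

```python
from collections import Counter


def count_path_fragments(atoms, bonds, max_atoms=3):
    """Count linear fragments (atom paths) with 1..max_atoms atoms.

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j) index pairs; bond orders are ignored here
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    counts = Counter()
    seen = set()  # each path is found in both directions; count it once

    def extend(path):
        key = min(tuple(path), tuple(reversed(path)))
        if key not in seen:
            seen.add(key)
            labels = [atoms[i] for i in path]
            # Direction-independent fragment name
            name = min("-".join(labels), "-".join(reversed(labels)))
            counts[name] += 1
        if len(path) < max_atoms:
            for nxt in neighbors[path[-1]]:
                if nxt not in path:
                    extend(path + [nxt])

    for start in range(len(atoms)):
        extend([start])
    return counts


def to_vectors(fragment_counts):
    """Turn per-molecule fragment counts into fixed-length count vectors."""
    vocabulary = sorted(set().union(*fragment_counts))
    vectors = [[c.get(name, 0) for name in vocabulary] for c in fragment_counts]
    return vocabulary, vectors


# Toy heavy-atom graphs: ethanol (C-C-O) and dimethyl ether (C-O-C)
ethanol = count_path_fragments(["C", "C", "O"], [(0, 1), (1, 2)])
ether = count_path_fragments(["C", "O", "C"], [(0, 1), (1, 2)])
vocabulary, vectors = to_vectors([ethanol, ether])
```

In this sketch the two isomers share the same atom counts but differ in the counts of fragments such as `C-C` and `C-O`, so their descriptor vectors differ. Each vector column corresponds to a named substructure, which is the property that atom-level interpretation methods rely on.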