P03-13

Data-driven design of visible-light photoswitches using structural features

Said BYADI*, Pavel SIDOROV

Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University
( * E-mail: saidbyadi@icredd.hokudai.ac.jp )

In this study, we present a computational approach based on quantitative structure-property relationship (QSPR) modeling to predict two properties of visible-light photoswitches: λmax (the wavelength of maximum light absorption) and t1/2 (the thermal half-life of the metastable photoisomer). Photoswitches, which undergo reversible changes in structure and properties when exposed to light, have important applications in materials science and biology. Traditional methods for predicting these properties rely on time-consuming density functional theory (DFT) calculations, which creates a need for more efficient computational techniques.

To address this, we developed machine learning (ML) models using a dataset of azobenzenes and azoheteroarenes collected from literature sources, comprising 798 unique compounds with measured absorption maxima and 134 compounds with measured half-lives. The ML models use structural descriptors (including CircuS fragments, Morgan fingerprints, and other structural and topological parameters) derived directly from 2D representations of the compounds, which makes modeling faster. We conducted a benchmark study to identify the most relevant structural descriptors for predicting λmax and t1/2. To build and validate our models, we used the Descriptors and Optimization tools (DOPtools) platform, a Python library for calculating chemical descriptors and optimizing the hyperparameters of three methods: SVM, Random Forest, and XGBoost [*]. The selected descriptors, molecular fingerprints and fragment counts, were used for model training. The final models were built with Support Vector Machines (SVM), chosen for their ability to handle smaller datasets with high accuracy. The best-performing model was validated by repeated 10-fold cross-validation and showed predictive precision similar to that of DFT calculations, with a significant reduction in inference time. The best predictive accuracy for the absorption maximum (λmax) was achieved by models based on CircuS fragments.
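The validation protocol described above (support vector regression on binary structural descriptors, scored by repeated 10-fold cross-validation) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: it uses scikit-learn directly rather than DOPtools, the descriptor matrix and the λmax-like target are synthetic random data standing in for fingerprints and measured values, and the number of repeats, the kernel, and the hyperparameters are arbitrary choices that the abstract does not specify.

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold, cross_validate
from sklearn.svm import SVR

# ASSUMPTION: synthetic stand-in data, not the dataset from the abstract.
rng = np.random.default_rng(0)
n_compounds, n_bits = 300, 64

# Random 0/1 matrix playing the role of fingerprint bits or fragment flags.
X = rng.integers(0, 2, size=(n_compounds, n_bits)).astype(float)

# Synthetic "lambda_max"-like target: a linear function of the bits plus noise.
weights = rng.normal(0.0, 5.0, size=n_bits)
y = 400.0 + X @ weights + rng.normal(0.0, 5.0, size=n_compounds)

# Repeated 10-fold cross-validation; 5 repeats is an assumed value.
cv = RepeatedKFold(n_splits=10, n_repeats=5, random_state=0)

# Support vector regression; kernel, C and epsilon are illustrative only.
model = SVR(kernel="linear", C=10.0, epsilon=1.0)

scores = cross_validate(
    model, X, y, cv=cv, scoring=("r2", "neg_mean_absolute_error")
)

# One score per fold per repeat: 10 folds x 5 repeats = 50 values each.
r2_mean = scores["test_r2"].mean()
mae_mean = -scores["test_neg_mean_absolute_error"].mean()
print(f"mean R2 over {len(scores['test_r2'])} folds: {r2_mean:.3f}")
print(f"mean MAE over all folds: {mae_mean:.2f}")
```

In a real setting, the synthetic matrix would be replaced by descriptors computed from the compounds' 2D structures, and the hyperparameters would be tuned rather than fixed.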

Our study demonstrates the potential of QSPR modeling for predicting the key properties of photoswitches with good precision. This advance represents a significant step toward the fast and efficient design of functional materials, with implications for diverse scientific and technological applications.

[*]: ChemRxiv, DOI: 10.26434/chemrxiv-2024-23v3c. https://chemrxiv.org/engage/chemrxiv/article-details/6694790901103d79c508aaea