P03-15

Development of the data management system to acquire the strategic data for AI

Miwa SATO *¹, Mari OHTA¹, Shion HOSODA¹, Akira KIMURA², Takahiro MIMORI², Michiaki HAMADA², Daisuke KIGA², Kazuhide AIKOH¹, Miaomei LEI¹, Tanabe MAIKO¹, Ito KIYOTO¹, Akihiko KANDORI¹

¹Center for Exploratory Research, Research and Development Group, Hitachi, Ltd.
²Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University
( * E-mail: miwa.sato.jr@hitachi.com )

Greenhouse gases are one of the main causes of climate change, and reducing their emissions is a global challenge. To achieve carbon neutrality, we are developing substance production technologies that utilize biological functions. Developing the enzymes responsible for substance production requires designing their amino acid sequences appropriately. Because biological functions are complex, artificial modification takes many trials, so the Design-Build-Test-Learn (DBTL) cycle is repeated: design the enzyme sequences (Design), build the actual enzymes (Build), evaluate the desired function (Test), interpret the results (Learn), and feed the findings back into the next round of enzyme design. Since the number of possible amino acid combinations is vast and the search space is enormous, AI is expected to speed up the process.
However, utilizing AI poses challenges, such as ensuring that data and information are interpreted uniformly, and collecting and organizing the large amount of training data required for AI development. In addition, there is still a gap between the number of sequences predicted by generative AI and the number that can actually be verified experimentally, so not all AI-predicted sequences can be tested. To proceed effectively with the Build and Test phases of the DBTL cycle, it is necessary to evaluate and select the sequences that will be beneficial for AI training.
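As an illustration of the selection problem described above, the following is a minimal sketch of choosing which predicted sequences to send to the Build and Test phases when experimental capacity is limited. It is not the method used in this work. The function names (`hamming`, `select_for_testing`), the `budget` and `diversity_weight` parameters, the greedy score-plus-diversity rule, and the assumption of equal-length sequences with a precomputed predicted score are all hypothetical choices made for the example.

```python
from typing import Dict, List


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def select_for_testing(scores: Dict[str, float], budget: int,
                       diversity_weight: float = 0.5) -> List[str]:
    """Greedily pick up to `budget` sequences for wet-lab testing.

    Hypothetical rule: utility = predicted score
    + diversity_weight * (normalized distance to the nearest sequence
    already selected), so near-duplicates of earlier picks are penalized.
    """
    selected: List[str] = []
    remaining = dict(scores)

    while remaining and len(selected) < budget:
        def utility(seq: str) -> float:
            if not selected:
                # First pick: no diversity term yet, use the score alone.
                return scores[seq]
            nearest = min(hamming(seq, s) for s in selected)
            return scores[seq] + diversity_weight * nearest / len(seq)

        best = max(remaining, key=utility)
        selected.append(best)
        del remaining[best]

    return selected


# Toy candidates: amino acid sequence -> predicted score (made-up values).
candidates = {"AAAA": 0.9, "AAAT": 0.85, "TTTT": 0.6, "AATT": 0.5}
print(select_for_testing(candidates, budget=2))
```

With `diversity_weight=0.0` the rule reduces to taking the top-scoring sequences; a larger weight favors sequences that differ from those already chosen, which is one simple way to make each experiment more informative for retraining.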
To solve these data issues, we constructed a data management system that strategically acquires the data the AI needs, in order to realize an efficient enzyme development cycle (DBTL cycle) through AI-based sequence design and experimental validation. We identified and addressed issues for improving AI performance, and by building a prototype we obtained prospects for enzyme improvement through the linkage of AI and wet experiments.
This research is based on results obtained from a project, JPNP14004, commissioned by the New Energy and Industrial Technology Development Organization (NEDO).