P07-25
Predicting Antibody Stability pH Values from Amino Acid Sequences: Leveraging Protein Language Models for Formulation Optimization
Takuya TSUTAOKA *1, Noriji KATO1, Toru NISHINO1, Yuanzhong LI1, Masahito OHUE2
1Bio Science & Engineering Laboratory, FUJIFILM Corporation
2School of Computing, Institute of Science Tokyo
( * E-mail: takuya.tsutaoka@fujifilm.com )
Monoclonal antibodies (mAbs) offer significant therapeutic benefits; however, their formulation requires careful optimization to prevent instability, such as aggregation and thermal degradation. Standard practice for determining optimal formulation conditions relies on time-consuming and costly wet lab experiments. We therefore developed a machine learning-based approach that predicts the optimal pH value for stabilizing a mAb from its amino acid sequence alone. Briefly, amino acid sequences were input into a protein language model to extract features, which were then used in a regression model to predict the pH values. We compiled an original dataset of 56 commercially available mAbs and obtained their pH values from publicly available FDA documentation. Performance was evaluated by 10-fold cross-validation, using the Pearson correlation coefficient between the predicted and actual pH values. Because no directly comparable methods exist, we established a baseline by comparing various combinations of elements, including different antibody domains, protein language models, and regression models. We also conducted feature engineering, incorporating structural information and descriptors, to enhance predictive performance. Our approach achieved a high Pearson correlation coefficient of 0.88. This result complements wet lab experiments and highlights the potential to make the optimization of mAb formulation conditions more efficient and cost-effective.
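The evaluation loop described above (sequence-derived features, a regression model, 10-fold cross-validation, Pearson correlation) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' implementation: the abstract does not name the regression model, so ridge regression is swapped in; the feature matrix is random placeholder data standing in for protein language model embeddings; and the names `make_placeholder_data`, `ridge_fit_predict`, `cross_validated_predictions`, and `pearson_r`, as well as the embedding size of 128, are hypothetical.

```python
import numpy as np


def make_placeholder_data(n_samples=56, n_features=128, seed=0):
    """Random stand-ins for PLM embeddings and pH targets (NOT real data)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_features))
    # Synthetic target: depends on one feature plus noise, for illustration only.
    y = 6.0 + 0.5 * X[:, 0] + 0.1 * rng.normal(size=n_samples)
    return X, y


def ridge_fit_predict(X_train, y_train, X_test, alpha=1.0):
    """Fit ridge regression on the training fold and predict the test fold.

    Uses the dual form, which is convenient when there are far fewer
    samples than features (as with 56 antibodies and wide embeddings).
    """
    x_mean = X_train.mean(axis=0)
    y_mean = y_train.mean()
    Xc = X_train - x_mean
    yc = y_train - y_mean
    gram = Xc @ Xc.T
    dual = np.linalg.solve(gram + alpha * np.eye(len(yc)), yc)
    weights = Xc.T @ dual
    return (X_test - x_mean) @ weights + y_mean


def cross_validated_predictions(X, y, n_folds=10, alpha=1.0, seed=0):
    """Return one out-of-fold prediction per sample using k-fold CV."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    preds = np.empty(len(y), dtype=float)
    for test_idx in np.array_split(order, n_folds):
        train_idx = np.setdiff1d(order, test_idx)
        preds[test_idx] = ridge_fit_predict(
            X[train_idx], y[train_idx], X[test_idx], alpha
        )
    return preds


def pearson_r(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


if __name__ == "__main__":
    X, y = make_placeholder_data()
    preds = cross_validated_predictions(X, y)
    print("Out-of-fold Pearson r on placeholder data:", pearson_r(y, preds))
```

Pooling the out-of-fold predictions and computing a single correlation over all 56 samples is one common convention; averaging per-fold correlations is another, and the abstract does not say which was used. With real data, `X` would be replaced by embeddings extracted from the chosen protein language model and `y` by the pH values taken from the FDA documentation.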