P03-19

REALM: Region-Empowered Antibody Language Model for Antibody Property Prediction

Toru NISHINO*1, Noriji KATO1, Takuya TSUTAOKA1, Yuanzhong LI1, Masahito OHUE2

1Bio Science & Engineering Laboratory, FUJIFILM Corporation
2Institute of Science Tokyo, School of Computing
( * E-mail: toru.nishino@fujifilm.com )

To reduce the manufacturing costs of antibody drugs, it is crucial to predict antibody properties from antibody sequences.
Recently developed protein language models (pLMs) make it possible to build property prediction models simply by fine-tuning on a small amount of antibody property data.
However, accurate prediction of antibody properties with pLMs remains challenging because their pretraining primarily focuses on learning antibody co-evolution from large antibody sequence databases.
In this study, we propose the Region-Empowered Antibody Language Model (REALM), an antibody language model pretrained from scratch with a novel pretraining strategy that incorporates not only co-evolution but also region information of antibodies.
Region information within the variable region of antibodies, particularly loop structures such as complementarity-determining regions (CDRs) and strand structures, is important for understanding the characteristics of antibodies.
Moreover, we propose a strategy for determining masking positions that enables antibody structural information to be embedded more appropriately in the language model.
We evaluate REALM on a dataset covering three assays: hydrophobicity, thermal stability, and specificity.
The results show that REALM improves prediction accuracy on two of the assays, hydrophobicity and thermal stability, compared with the previous antibody language model.
In addition, we analyze the internal behavior of our antibody language model and show that REALM focuses on residues that carry important region information.
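
The abstract does not specify how masking positions are determined. As a purely illustrative sketch, one way a region-aware masking step could look is to sample masked positions with a higher weight on CDR residues than on framework residues. Everything below is an assumption made for illustration and is not the REALM procedure: the function names, the 15% mask rate, the CDR weight, the region labels, and the toy sequence with its region boundaries.

```python
import random


def choose_mask_positions(region_labels, mask_rate=0.15, cdr_weight=3.0, seed=0):
    """Pick positions to mask, favoring residues labeled as CDR.

    Hypothetical helper: region_labels holds one label per residue
    (e.g. "FR1", "CDR1"); mask_rate and cdr_weight are assumed values.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(mask_rate * len(region_labels)))
    weights = [cdr_weight if label.startswith("CDR") else 1.0
               for label in region_labels]
    # Weighted sampling without replacement (Efraimidis-Spirakis):
    # draw key = u ** (1 / w) per position and keep the n_mask largest keys.
    keys = [(rng.random() ** (1.0 / w), i) for i, w in enumerate(weights)]
    keys.sort(reverse=True)
    return sorted(i for _, i in keys[:n_mask])


def apply_mask(sequence, positions, mask_token="<mask>"):
    """Replace the residues at the chosen positions with a mask token."""
    tokens = list(sequence)
    for i in positions:
        tokens[i] = mask_token
    return tokens


# Toy heavy-chain fragment; the region boundaries are illustrative only.
toy_sequence = "EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMSWVRQAPGKGLEWVSA"
toy_labels = ["FR1"] * 25 + ["CDR1"] * 8 + ["FR2"] * 17

positions = choose_mask_positions(toy_labels)
masked_tokens = apply_mask(toy_sequence, positions)
```

Setting cdr_weight to 1.0 reduces this sketch to uniform random masking, which makes it easy to compare a region-aware choice of masking positions against a standard masked-language-model baseline.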