Abstract
Fast and accurate prediction of changes in folding stability and binding affinity caused by protein mutations is a critical need in structural biology. While experimental methods are accurate, they are often time-consuming and costly. Computational approaches such as FoldX offer fast and inexpensive alternatives by estimating free energy changes (ΔΔG) from a single 3D structure, but their accuracy remains limited. In this work, we present dFX, a neural-network-based residual correction framework that improves FoldX free energy predictions by learning from its underlying energy terms. We first assembled a collection of experimentally measured ΔΔG values and their corresponding 3D structures. Using those structures, we obtained baseline ΔΔG predictions with FoldX, along with their contributing energy terms. We then trained dFX using these FoldX energy terms as input features and a correction factor as the target, where the correction factor is the difference between the experimental and FoldX ΔΔG values. Adding the predicted correction to the FoldX estimate improves the Pearson correlation between experimental and predicted ΔΔG relative to FoldX alone across folding and binding tasks. Models trained on lower-order mutations retain predictive capability for higher-order mutations in binding. Our dFX models also improve epistasis prediction relative to FoldX, suggesting that they better capture nonadditive effects. To test the generalizability of our models, we used an external SARS-CoV-2 data set and found that the dFX model trained on single mutations for binding outperformed FoldX and other machine-learning approaches. Once trained, our neural network models add minimal computational time while improving accuracy, making them a valuable addition to any FoldX free energy prediction pipeline. Our dFX approach could be further optimized to predict antibody escape, aiding in the efficient development of watch lists.
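
The residual-correction idea described above can be illustrated with a minimal sketch. It is not the dFX implementation: the neural network is replaced here by a plain linear model fitted by gradient descent so that the example needs only the Python standard library, and all numbers are synthetic toy values rather than data from this work. The function names (`residual_targets`, `fit_linear`, `corrected_ddg`, `pearson`) and the two-column "energy terms" are hypothetical. The sketch shows only the three steps: build the target as experimental minus FoldX ΔΔG, fit a regressor from energy terms to that target, and add the predicted correction back to the FoldX estimate.

```python
from math import sqrt


def residual_targets(ddg_exp, ddg_foldx):
    """Correction target per mutation: experimental minus FoldX ddG."""
    return [e - f for e, f in zip(ddg_exp, ddg_foldx)]


def fit_linear(features, targets, lr=0.1, epochs=5000):
    """Stand-in regressor (NOT the paper's neural network): a linear model
    trained by full-batch gradient descent on squared error."""
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, t in zip(features, targets):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - t
            for j in range(d):
                grad_w[j] += err * x[j] / n
            grad_b += err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b


def corrected_ddg(ddg_foldx, features, model):
    """Corrected prediction: FoldX ddG plus the predicted correction."""
    return [f + model(x) for f, x in zip(ddg_foldx, features)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


# Synthetic toy inputs (assumption): two hypothetical energy terms per mutation.
energy_terms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [0.0, 2.0], [2.0, 0.0]]
ddg_foldx = [0.8, 1.5, 0.2, 2.0, 1.1, 0.4]
# Synthetic "experimental" values: FoldX plus a known linear offset, so the
# stand-in model can recover the correction exactly. Not measured data.
ddg_exp = [f + 0.5 * x[0] - 0.3 * x[1] + 0.1 for f, x in zip(ddg_foldx, energy_terms)]

model = fit_linear(energy_terms, residual_targets(ddg_exp, ddg_foldx))
corrected = corrected_ddg(ddg_foldx, energy_terms, model)

print("Pearson, FoldX vs experiment:    ", pearson(ddg_foldx, ddg_exp))
print("Pearson, corrected vs experiment:", pearson(corrected, ddg_exp))
```

Because the toy residual is exactly linear in the features, the stand-in model recovers it almost perfectly; real residuals are noisy and nonlinear, which is the motivation for a more expressive regressor such as a neural network.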