(A) We chose landscapes that cover different types of sequence diversity. The GB1 landscape focuses on simultaneous mutation of four epistatic sites, with nearly complete coverage of the possible combinations (PDB ID: 1PGA). The AAV capsid protein landscape is large and covers multiple mutations across the binding interface (PDB ID: 6IH9). The Meltome landscape measures a property shared by diverse proteins: thermostability. (B) We also provide up to four suggested data splits for each landscape, which are described in greater detail in the manuscript and GitHub repository.
Below are the active landscapes. Clicking on an image opens a description page with information about where the landscape data were collected, what data were included, how the different splits were created, and which licenses apply to both the original and the processed data.
We may deprecate a landscape if its experimental measurements do not fully align with the goals of this benchmark, or if we judge its discriminative ability to be insufficient.