Validation sets for computational chemistry
This page provides protein-ligand binding datasets from the BindingDB collection that are intended for parameterizing or testing protein-ligand modeling codes. Each set contains roughly 10-50 congeneric small molecules spanning a range of affinities for a single protein target. At least one complex in each series has a structure in the PDB, which can serve as a basis for modeling the rest of the series.
Each row in the following table corresponds to one dataset and provides the article(s) from which the data were drawn, a freely downloadable SDfile containing the compounds and their data, and a link to preview the compounds online. Note that no effort has been made to assign appropriate protonation or tautomer states to the compounds.
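Because each dataset is distributed as an SDfile, a short script is often enough to pull out the per-compound data. The sketch below is a minimal, standard-library-only reader for the generic SDfile layout (a molfile block, then `> <FIELD>` data items separated by blank lines, then a `$$$$` terminator). It is not a BindingDB tool, and the function name `parse_sdf` and the `_title` key are hypothetical; check the actual data-item names in the file you download.

```python
from typing import Dict, List, Optional


def parse_sdf(text: str) -> List[Dict[str, str]]:
    """Return one dict per SDfile record: the title line plus all data items.

    Records not closed by a "$$$$" line are ignored.
    """
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    expect_title = True          # the first line of each molfile is its title
    in_data = False              # True once "M  END" closes the connection table
    field: Optional[str] = None  # data item whose value lines are being read
    values: List[str] = []

    for line in text.splitlines():
        if line.strip() == "$$$$":
            # End of record: flush any unterminated data item and start afresh.
            if field is not None:
                current[field] = "\n".join(values)
            records.append(current)
            current, expect_title, in_data, field, values = {}, True, False, None, []
        elif expect_title:
            current["_title"] = line.strip()
            expect_title = False
        elif not in_data:
            # Skip header, atom, bond and property lines until the molfile ends.
            if line.startswith("M  END"):
                in_data = True
        elif field is None:
            # Data headers look like "> <Name>" or ">  <Name>  (1)".
            if line.startswith(">") and "<" in line:
                start = line.index("<") + 1
                end = line.find(">", start)
                field = line[start:end] if end != -1 else line[start:].strip()
                values = []
        elif line.strip() == "":
            # A blank line terminates the current data item.
            current[field] = "\n".join(values)
            field, values = None, []
        else:
            values.append(line.strip())

    return records
```

For example, `parse_sdf(open(path).read())` returns a list with one dict per compound, keyed by data-item name, with the molfile title line stored under `_title`. The 3D coordinates are skipped; a cheminformatics toolkit is the better choice when the structures themselves are needed.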
We have not yet reviewed these datasets for accuracy, so users should check them before relying on them. If you find any of them particularly useful, or problematic, please share your thoughts via the User Comments and/or let us know by email.
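Since the sets have not been reviewed, a few automated checks can catch obvious problems before a set is used. The following sketch assumes each compound is represented as a dict of SD data items, with a hypothetical `_title` key holding the compound name. It flags series outside the stated 10-50 size range, duplicate titles, missing or non-numeric affinities, and a narrow affinity range. The affinity field name and the one-log-unit threshold are illustrative assumptions, not BindingDB conventions.

```python
import math
from collections import Counter
from typing import Dict, List


def check_series(records: List[Dict[str, str]], affinity_field: str) -> List[str]:
    """Return human-readable warnings for one congeneric series (a sketch)."""
    warnings: List[str] = []

    # The sets are described as roughly 10-50 compounds; flag anything outside.
    n = len(records)
    if not 10 <= n <= 50:
        warnings.append(f"series has {n} compounds (expected roughly 10-50)")

    # Duplicate titles may indicate repeated entries or ambiguous naming.
    titles = Counter(r.get("_title", "") for r in records)
    for title, count in sorted(titles.items()):
        if count > 1:
            warnings.append(f"duplicate compound title {title!r} ({count} records)")

    values: List[float] = []
    for r in records:
        name = r.get("_title", "?")
        raw = r.get(affinity_field, "").strip()
        if not raw:
            warnings.append(f"{name!r}: missing {affinity_field}")
            continue
        try:
            # Censored values such as ">10000" are kept as their bound (assumption).
            values.append(float(raw.lstrip("<>=~ ")))
        except ValueError:
            warnings.append(f"{name!r}: non-numeric {affinity_field} {raw!r}")

    # Hypothetical threshold: a useful series should span at least one log unit.
    positive = [v for v in values if v > 0]
    if len(positive) >= 2:
        span = math.log10(max(positive)) - math.log10(min(positive))
        if span < 1.0:
            warnings.append(f"affinities span only {span:.2f} log units")

    return warnings
```

Treating censored values as their bounds keeps the sketch simple; a stricter check might report them separately instead.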
The 3D SDfiles on this page provide conformations of the free (unbound) molecules, generated with the program Vconf.
Sets listed in green have been computationally docked, and their docked poses can be retrieved from their Compounds pages.
If a docked Set appears more than once, its compounds were docked into multiple crystal structures; each set of dockings is listed as a separate Set.
Thanks to OpenEye Scientific Software for providing the maximal common substructure software used in constructing these sets.