Similarity Score
Overview
This module provides 3D, electron-density-based similarity scores for pairs of molecules. Molecules are uploaded using SDF files that may contain multiple conformations. Pairs of molecules are then optimally aligned based on their electrostatic character and these structures are provided as output along with their similarity scores. The alignment for each pair of molecules is performed through rigid-body rotations and translations before final electrostatic similarity scores are computed.
Alignment is accomplished by maximizing the density overlap in the space of rigid-body rotations and translations (6-dimensional space). The density overlap is evaluated at the density functional theory (DFT) level given a particular rotation/translation. The optimization is done globally by first evaluating overlap on a 3D grid in the rotational space and then locally optimizing (with BFGS) starting from a set of grid points that have high overlap.
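The two-stage search described above (a coarse grid scan followed by local refinement from the best grid points) can be sketched on a toy problem. This is a minimal illustration under loud assumptions, not the platform's implementation: it works in 2D with a single rotation angle instead of the full 6-dimensional space, it uses a sum of pairwise Gaussians as a stand-in for the DFT-level density overlap, and it replaces BFGS with a simple derivative-free compass search so that the example needs only the standard library. All function names and parameters (`transform`, `overlap`, `local_refine`, `align`, `n_grid`, `n_starts`, `sigma`) are hypothetical.

```python
import math

def transform(points, theta, tx, ty):
    """Rotate 2D points by theta (radians) about the origin, then translate."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def overlap(ref, mobile, params, sigma=1.0):
    """Toy overlap: sum of Gaussian terms over all atom pairs (NOT a DFT overlap)."""
    moved = transform(mobile, *params)
    return sum(
        math.exp(-((x1 - x2) ** 2 + (y1 - y2) ** 2) / (2.0 * sigma ** 2))
        for x1, y1 in ref
        for x2, y2 in moved
    )

def local_refine(f, start, step=0.2, tol=1e-6, max_sweeps=10000):
    """Compass search (stand-in for BFGS): try +/- step on each parameter,
    keep any improvement, and halve the step when a full sweep finds none."""
    best, best_val = list(start), f(start)
    for _ in range(max_sweeps):
        if step <= tol:
            break
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                val = f(trial)
                if val > best_val:
                    best, best_val, improved = trial, val, True
        if not improved:
            step /= 2.0
    return best, best_val

def align(ref, mobile, n_grid=24, n_starts=3):
    """Scan a coarse grid of rotation angles, then refine the best few locally."""
    def f(params):
        return overlap(ref, mobile, params)
    # Coarse grid over the rotation angle only; translations start at zero.
    grid = [(2.0 * math.pi * k / n_grid, 0.0, 0.0) for k in range(n_grid)]
    starts = sorted(grid, key=f, reverse=True)[:n_starts]
    return max((local_refine(f, s) for s in starts), key=lambda r: r[1])
```

Starting the local search from several high-overlap grid points, rather than only the single best one, reduces the chance of settling in a local maximum of the overlap.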
Once the optimal alignment for each pair of conformers is found, the aligned pair is scored by evaluating the electrostatic potential difference on a grid. The grid is constructed separately for each molecule by taking a Cartesian grid and retaining only those grid points that fall between inner and outer multiples of the van der Waals radii of the molecule's atoms. Finally, the electrostatic potential difference is computed. The electrostatic score is calculated on the grids of both molecules.
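The grid construction described above can be illustrated with a short sketch. Several details here are assumptions rather than documented behavior: a point is kept when it lies at least `inner` times the van der Waals radius from every atom and within `outer` times the radius of at least one atom; the values `spacing=0.5`, `inner=1.4`, and `outer=2.0` are placeholders; and a point-charge Coulomb sum stands in for the DFT electrostatic potential. The function names are hypothetical.

```python
import math

def shell_grid(atoms, spacing=0.5, inner=1.4, outer=2.0):
    """Keep Cartesian grid points in a shell around the molecule.

    atoms: list of (x, y, z, vdw_radius). The inclusion rule and the default
    multiples are assumptions made for this illustration.
    """
    pad = outer * max(a[3] for a in atoms)
    lo = [min(a[k] for a in atoms) - pad for k in range(3)]
    hi = [max(a[k] for a in atoms) + pad for k in range(3)]
    counts = [int((hi[k] - lo[k]) / spacing) + 1 for k in range(3)]
    points = []
    for i in range(counts[0]):
        for j in range(counts[1]):
            for k in range(counts[2]):
                p = (lo[0] + i * spacing, lo[1] + j * spacing, lo[2] + k * spacing)
                dists = [(math.dist(p, a[:3]), a[3]) for a in atoms]
                outside_inner = all(d >= inner * r for d, r in dists)
                inside_outer = any(d <= outer * r for d, r in dists)
                if outside_inner and inside_outer:
                    points.append(p)
    return points

def potential(point, charges):
    """Point-charge Coulomb potential (stand-in for a DFT potential).

    charges: list of (x, y, z, q) located at the atom positions.
    """
    return sum(q / math.dist(point, (x, y, z)) for x, y, z, q in charges)

def potential_differences(points, charges_a, charges_b):
    """Potential of molecule A minus potential of molecule B at each grid point."""
    return [potential(p, charges_a) - potential(p, charges_b) for p in points]
```

Because the inner cutoff excludes points close to the nuclei, the point-charge potential in this sketch is never evaluated at an atom position.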
The Similarity Score Task List
When you enter the platform, you will be presented with the Similarity Score Task List, a list of calculations (Tasks) that you have previously set up and/or run, as well as a dialog to create a new Task. Clicking on a Task will bring you to the setup/results page for that Task.
For more details on this Task List, see the chapter QSP Life User Interface.
Similarity Score Setup
When you click on a new batch and enter the calculation page, you will first be presented with a section to upload your inputs and submit the calculation.
Upload 3D Structures as SDFs
The expected inputs are 3D SDF files containing compound structures. Any SDF file uploaded can contain either a single compound or multiple compounds. You can upload the files by either pressing the browse button and navigating to your files, or dragging and dropping the files into the Choose file box. After the file or files are staged in the box to the left, press the Upload button to complete the upload. Even once you have uploaded compounds you can upload more by the same procedure.
If you upload 100 or fewer compounds, the table below will populate with your compounds. There is one row in the table for each unique compound, as determined by its SMILES string. In each row, you will find the provided name of the compound and the number of conformers provided for that compound. There is also a red “X” that allows you to remove that compound from your set. You can clear all uploaded compounds simultaneously by pressing the red “X” in the header of the table.
Clicking on any row will highlight that compound and display its 2D structure to the right.
The Download report button will download a compressed file containing the structures of all uploaded compounds and a CSV file with information on each one. The CSV contains a line for each unique conformer of each compound. The fields are the compound name (with the conformer count appended after a hyphen), the SMILES string for that compound, and a message that either states that the compound is valid or explains why it is invalid.
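As an illustration of working with the report, the sketch below counts conformer rows per compound by stripping the hyphenated conformer suffix from each name. The column header `name`, the sample rows, and the message text are made up for this example; check the header row of your own report before relying on them. The split is done on the last hyphen only, and only when the suffix is numeric, so that compound names that themselves contain hyphens are preserved.

```python
import csv
import io
from collections import Counter

def conformers_per_compound(csv_text, name_field="name"):
    """Count report rows per compound.

    name_field is a hypothetical column header; adjust it to match the report.
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        full_name = row[name_field]
        compound, sep, suffix = full_name.rpartition("-")
        if not (sep and suffix.isdigit()):
            compound = full_name  # no conformer suffix found; keep the name whole
        counts[compound] += 1
    return dict(counts)

# Made-up sample content, for illustration only.
sample = (
    "name,smiles,message\n"
    "aspirin-1,CC(=O)Oc1ccccc1C(=O)O,valid\n"
    "aspirin-2,CC(=O)Oc1ccccc1C(=O)O,valid\n"
    "2-propanol-1,CC(C)O,valid\n"
)
print(conformers_per_compound(sample))
```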
Any compounds that the platform could not parse or accept for any reason will appear in red at the top of this list. Clicking on one of them will populate the Information section to the right with the reason why the compound could not be accepted. This information can also be found in the message field of the compound in the downloadable report. You can press the Delete invalid button to remove all flagged compounds. You can then address the problems with the input and upload an amended compound in a new file, and it will be appended to the list.
If more than 100 compounds are uploaded, the platform will instead show a count of the valid, invalid, and total compounds. In this case, please refer to the downloadable report for more information about each compound.
Once you have uploaded your compounds and have resolved or removed all invalid compounds, you can submit the calculation by pressing the Start simulation button.
Simulation Status
This panel provides the status of the calculation once it has been started using the Start simulation button. It also provides the ability to stop and restart a calculation that has been submitted.
Stop: Stop a calculation that was previously submitted and is in progress. A stopped calculation is saved in the cloud storage associated with your account, and can be restarted later, using the Run command.
Run: Run a previously stopped job. If the job has not previously been started using Start simulation, then the Run button has the same effect as Start simulation, i.e., it will begin the calculation.
Below the control buttons you will find information about the status of your job. The total estimated virtual CPU usage (vCPU) is given, as are progress bars for each of the stages of the simulation: DFT/Analysis. Color coding is as follows:
Light Grey: That segment of the calculation workflow has not been run.
Green + Light Grey: That segment of the workflow is in progress, and the green portion of the segment reflects a progress bar.
Dark Grey: That segment of the workflow has completed.
Red: That segment of the workflow completed with errors.
Results
In the final section you will find a download for the complete results and a preview.
The Results Download furnishes a compressed file containing:
A CSV file: This contains the overlap and similarity scores for all of the compound pairs. This includes:
overlap: The maximum overlap in the electrostatic potential comparing the two aligned compounds. This maximum is taken across all conformers and therefore is the overlap from the conformer with the best overlap after alignment.
score_1: The difference in the electrostatic potential, evaluated on a grid generated from the first molecule in the pair. The score is the RMS squared value. Also included are the maximum difference in the electrostatic potentials and the Pearson and Spearman coefficients.
score_2: The difference in the electrostatic potential, evaluated on a grid generated from the second molecule in the pair. The score is the RMS squared value. Also included are the maximum difference in the electrostatic potentials and the Pearson and Spearman coefficients.
similarity_avg: The average of the score_1 and score_2 results. The score is the RMS squared value. Also included are the maximum difference in the electrostatic potentials and the Pearson and Spearman coefficients.
similarity_joint: The similarity score based on a grid constructed from the grids of both molecules. The score is the RMS squared value. Also included are the maximum difference in the electrostatic potentials and the Pearson and Spearman coefficients.
Directories for each pair of compounds: Within each directory there are a few files:
An SDF file of the aligned structure for each compound. If multiple conformers were used for a given compound, this is the aligned structure of the conformer with the best overlap.
A Molden file for each compound with the electron density surface, so that you can visualize the overlap yourself if you wish. This can be viewed using tools like Jmol.
You can also take a glance at the results in the Results Preview. This includes a table of all compound pairs with their overlap and a selection of the similarity scores found in the Results Download. Clicking on a row of the table shows the 3D structure of the best overlap of the pair of compounds, including both the atoms and bonds as well as the electron density surfaces.
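To make the statistics reported with each score in the CSV more concrete, the sketch below computes them for two lists of electrostatic potential values sampled at the same grid points. It is an illustration only: the function and key names are hypothetical, "RMS squared" is read here as the mean of the squared differences (this reading is an assumption), and the platform's exact definitions may differ. The Spearman coefficient is computed as the Pearson coefficient of the ranks, with tied values given their average rank.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def compare_potentials(v1, v2):
    """Summary statistics for two potentials sampled on the same grid points."""
    diffs = [a - b for a, b in zip(v1, v2)]
    mean_sq = sum(d * d for d in diffs) / len(diffs)
    return {
        "rms": math.sqrt(mean_sq),
        "rms_squared": mean_sq,  # assumed reading of "RMS squared"
        "max_diff": max(abs(d) for d in diffs),
        "pearson": pearson(v1, v2),
        "spearman": pearson(ranks(v1), ranks(v2)),
    }
```

Under this reading, a similarity_avg-style value would be the mean of the statistics obtained on the first molecule's grid and on the second molecule's grid.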