LigClean: The Ligand Preparation Tool

Overview

Multiple modules within the QSimulate platform require input of 3D molecular structures, including the QUELO FEP module and QuValent. It is very common to have only a 1D (SMILES) or 2D (2D-SDF) representation, or else a 3D model that you may not be confident in.

This module provides three functionalities:

  • Generate energy optimized 3D coordinates for an input ligand

  • Evaluate the maximum common substructure overlap between the 3D coordinates of the ligand and a reference ligand

  • Output 3D coordinates that are optimized and in the appropriate ligand-binding reference frame, suitable for direct use in QUELO/QuValent

3D Conformer Generation Workflow Details

The workflow consists of two general steps: 1) 3D conformer generation and 2) energetic and similarity filtering. The filtering process can be seen as a funnel: the top of the funnel holds a larger pool of conformers that is progressively refined at each stage, eventually leading to a smaller pool of distinct, low-energy structures. By reducing the number of conformers at each subsequent step, the funnel is designed to optimize throughput, since fewer conformers reach the more computationally expensive approach (semiempirical xTB).

../_images/LP_Workflow.jpg

Conformer Generation

From the input SMILES string (or structure), a bounds matrix based on the input topology and atom types is calculated, and this is used as input to a distance geometry (DG) calculation. If the user instead uploads an SDF file with structural information, the bounds matrix is obtained from that input structure.

The bounds matrix, along with a random number seed, is used to create a specific distance matrix consistent with the bounds. This specific distance matrix is then used by DG to produce a starting structure. A series of N random number seeds is used to create N different specific distance matrices, which lead to N different DG-generated starting structures. The pool of N structures is then pruned to remove structures that are very similar to one another. Similarity among structures is evaluated using a root-mean-squared (RMS) coordinate metric.

N is defined by the user using the Max. number of conformers input (below).
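The RMS-based pruning described above can be illustrated with a minimal sketch. This is not LigClean's implementation: the function names, the toy coordinates, and the `rms_threshold` value are assumptions made for illustration, and the sketch assumes the conformers are already superimposed and list their atoms in the same order.

```python
import math

def rms_distance(coords_a, coords_b):
    """RMS deviation between two pre-aligned, equally ordered coordinate lists."""
    if len(coords_a) != len(coords_b):
        raise ValueError("conformers must have the same number of atoms")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))

def prune_similar(conformers, rms_threshold):
    """Keep a conformer only if it is at least rms_threshold away from every kept one."""
    kept = []
    for coords in conformers:
        if all(rms_distance(coords, other) >= rms_threshold for other in kept):
            kept.append(coords)
    return kept

# Three toy "conformers" of a two-atom molecule (hypothetical data)
pool = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.05, 0.0)],  # near-duplicate of the first
    [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
]
unique_pool = prune_similar(pool, rms_threshold=0.5)  # threshold is an assumed value
print(len(unique_pool))
```

In this toy pool the second conformer is discarded as a near-duplicate of the first, so two of the three structures remain.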

Note that the protonation state will remain as defined by the input molecule. If you want to investigate multiple protonation states, you should input separate molecules that reflect those protonation states.

Molecular Mechanics (MM) Optimization and Filtering

Starting with the conformer set from the first step (3D Conformer Generation), the conformers are subjected to a geometry optimization using the Universal Force Field (UFF). Conformers with an MM energy that exceeds the minimum-energy structure by more than a cutoff value are filtered out at this stage, as are structures that are too similar to another (lower energy) structure. The energy cutoff and the RMS similarity threshold are automatically assigned by the program.

Semi-Empirical Quantum Mechanical (QM) Optimization and Filtering

The surviving conformers from the MM stage are then subjected to geometry optimization using the state-of-the-art semi-empirical QM method GFN-xTB. As with the MM step, high-energy and similar conformers are discarded from the pool, and the values that control these filters are defined by the program.

Note that due to the energy and similarity filtering, the number of ligand conformers that is output for a particular input ligand will typically be smaller than the total that was originally generated via DG.
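A single funnel stage (energy window followed by similarity pruning) might look like the sketch below. The same pattern would apply after the MM optimization and again after the xTB optimization. The function names, the `energy_window` and `rms_threshold` values, and the toy data are all hypothetical; the actual thresholds are chosen by the program and are not documented here.

```python
import math

def rms(a, b):
    """RMS deviation between two pre-aligned coordinate lists of equal length."""
    return math.sqrt(
        sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v)) / len(a)
    )

def filter_stage(conformers, energy_window, rms_threshold):
    """Apply one funnel stage to a list of (energy, coords) tuples.

    Conformers are ranked by energy, so whenever two structures are too
    similar it is always the higher-energy one that is discarded.
    """
    if not conformers:
        return []
    ranked = sorted(conformers, key=lambda c: c[0])
    e_min = ranked[0][0]
    survivors = []
    for energy, coords in ranked:
        if energy - e_min > energy_window:
            break  # everything after this point is even higher in energy
        if all(rms(coords, kept) >= rms_threshold for _, kept in survivors):
            survivors.append((energy, coords))
    return survivors

# Hypothetical optimized conformers: (energy, coordinates)
pool = [
    (2.0, [(0, 0, 0), (1, 0, 0)]),
    (0.0, [(0, 0, 0), (0, 1, 0)]),
    (0.5, [(0, 0, 0), (0, 1, 0.01)]),  # near-duplicate of the lowest-energy one
    (9.0, [(0, 0, 0), (0, 0, 1)]),     # outside the energy window
]
survivors = filter_stage(pool, energy_window=5.0, rms_threshold=0.25)
print([energy for energy, _ in survivors])
```

Here four input conformers shrink to two, which mirrors why the final output usually contains fewer conformers than were generated by DG.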

Maximum Common Substructure (MCS) Overlap Optimization

For each minimized conformer that results from conformer generation, MCS optimization is performed relative to the reference ligand. This is a 3D coordinate optimization that maximizes atomic coordinate overlap between the generated conformation and the reference ligand while maintaining a reasonable energy for the generated conformation. The output of this process is an MCS region, which consists of (ligand-reference) atom pairs within 0.01 Angstroms of one another. In a free energy calculation, these are the “single topology” atoms. The remaining atoms, which do not closely overlap after refinement, fall outside the MCS and will be treated as “dual topology” atoms during QUELO FEP. Free energy calculations tend to converge better, and give more reliable results, when the number of atoms outside the MCS is lower.
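The final bookkeeping step, deciding which atoms count as “single topology” and which as “dual topology”, can be sketched as follows. This covers only the distance test on already-aligned coordinates; it does not perform the substructure search or the coordinate optimization itself. The function name `mcs_pairs` and the toy coordinates are assumptions, while the 0.01 Angstrom tolerance comes from the description above.

```python
import math

def mcs_pairs(ligand, reference, tolerance=0.01):
    """Pair each ligand atom with an unused reference atom within tolerance (Angstroms)."""
    pairs = []
    used_reference = set()
    for i, pos_l in enumerate(ligand):
        for j, pos_r in enumerate(reference):
            if j in used_reference:
                continue
            if math.dist(pos_l, pos_r) <= tolerance:
                pairs.append((i, j))
                used_reference.add(j)
                break
    return pairs

# Hypothetical aligned coordinates (Angstroms)
reference = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0), (2.1, 1.2, 0.0)]
ligand = [
    (0.0, 0.0, 0.004),  # overlaps reference atom 0
    (1.4, 0.003, 0.0),  # overlaps reference atom 1
    (2.1, 1.2, 0.5),    # too far from reference atom 2
    (3.0, 2.0, 0.0),    # no counterpart in the reference
]
pairs = mcs_pairs(ligand, reference)
paired_ligand_atoms = {i for i, _ in pairs}
dual = [i for i in range(len(ligand)) if i not in paired_ligand_atoms]
print("single topology pairs:", pairs)
print("dual topology ligand atoms:", dual)
```

The MCS size in this toy case is 2, and the two unpaired ligand atoms would be handled as dual topology atoms.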

The LigClean Task List

When you enter the platform, you will be presented with the LigClean Task List, a list of calculations (tasks) that you have previously set up and/or run, as well as a dialog to create a new task. Clicking on a task will bring you to the setup/results page for that task.

The LigClean task list offers the same options for renaming, deleting, and cloning jobs as QUELO’s task list, as described in the section The Task List.

Expert Mode

A small number of options (described below) are only shown in Expert Mode. The options shown in Standard Mode (the default) are sufficient for most users to run a reliable simulation. To enable Expert Mode, use the toggle in the User Settings panel.

For more details on enabling Expert Mode, see the chapter QSP Life User Interface.

File Input Specification

../_images/LP_Input.jpg

File Uploads

When you click on a task with New status in the LigClean Task table, you enter the setup dialogs for that task. At the top of the setup, you will need to specify:

  • Reference ligand: the reference ligand that is bound to the receptor and that will serve as the target for aligning other ligands. This ligand must be specified using a 3D format file, either SDF or MOL2.

  • All ligands (excluding reference): A set of additional ligands for which 3D conformations will be generated and aligned to the reference ligand. These ligands can be specified as either SMILES strings, or with SDF files. As many ligands as desired can appear in the input file. Multiple ligand sets can be input by repeating the Browse/Upload process for multiple files.

  • Generate all stereoisomers for ligands with undetermined chiral centers (EXPERT MODE ONLY): This option is available only in Expert Mode and is unchecked by default. If you check this box, then for any input ligand in the All Ligands set that has atom(s) with ambiguous chirality, both possible chiralities will be sampled for those atoms.

  • Download Input Report: Allows the user to download a CSV file with information on the list of ligands that was imported.

Once ligands are uploaded, they are parsed for correctness, and a list of the input ligands will be reported, as shown below:

../_images/LP_Input_Populated.jpg

Each ligand is annotated here with its name, which is read from the input file or generated if the file does not provide one, as well as the name of the file it came from.

Clicking on any line of the table will pop up a 2D view of the associated ligand.

../_images/LP_Input_2D_popup.jpg

Beside each ligand, you find the Information field. This field is populated if an error was identified for a particular ligand. To the right of that field you will find a red X. This X allows you to delete the associated ligand from the input. In addition, there is a bolder X at the top of the column of Xs for the All Ligands input. This bolder X can be used to delete all the ligands in the All Ligands section at once.

Options

  • Max. number of conformers: The maximum number of 3D conformers to initially be generated for each input molecule, using a randomized distance geometry process. The default is 100. Conformers will subsequently be filtered by energy and similarity, so that the number of conformers that remain after the 3D generation process is typically much smaller than the number of conformers initially generated.

Start Simulation

Clicking this button will start the ligand preparation.

Simulation Status

This portion of the panel provides the status of the calculation once it has been started using the Start Simulation button. It also provides the ability to stop and restart a preparation that has been submitted.

../_images/LP_Status.jpg

  • Stop: Stop a calculation that was previously submitted and is in progress. A stopped calculation is saved in the cloud storage associated with your account, and can be restarted later, using the “Run” command.

  • Run: Resume a previously Stopped job.

Below, and also to the right of the control buttons, you will find information about the status of your job. The total estimated virtual CPU usage (vCPU) is given, as are progress bars for generation of the 3D conformers, energy optimization of the 3D conformers (MM and xTB) and alignment of the conformers with the reference ligand.

Ligand preparation is generally reasonably fast.

Results

../_images/LP_Results_Closed.jpg

Below the Simulation Status section, you will find the results of your calculation. For every input ligand, you will find a row. The row will be labeled with the name field that was supplied with the ligand.

Clicking on any row will expand it to show information on that ligand. Clicking on that row again will close the information report for that ligand.

../_images/LP_Results_Open.jpg

There are several features of the view:

  • 2D representation: A 2D representation of the input ligand appears on the left.

  • Conformer boxes: Below the 2D structure is a series of conformer boxes. Each box represents one of the conformers generated for this structure that survived the energy and similarity filters. The boxes are organized so that the first box corresponds to the lowest energy structure for this input ligand, and the last box corresponds to the highest energy structure. You can choose a different box by clicking on it, or by using the left and right arrows below the 2D structure, which will step through the structures one at a time.

  • Left and right arrows: Step through the conformers for this ligand, one at a time. Provides the same behavior as clicking on the conformer boxes one step at a time.

  • Select: Select the currently shown conformer. This commits the conformer to the output set, which will be exported as a concatenated SDF file using the Export button. The red outline will move to the selected conformer. If you make a mistake or change your mind, simply choose a different conformer and press the Select button again.

  • Optimized structure: This is a 3D view of the conformation of the molecule corresponding to the selected box. You can rotate this molecule in three dimensions using the mouse. The relative energy of the structure is given, relative to an energy of 0.00 for the lowest energy structure for this ligand.

  • Aligned structure with reference: This is a 3D view of the maximum common substructure overlap between the optimized structure and the reference ligand. The MCS size is the total number of atoms that are effectively invariant in the overlapped representation, and that will be treated as “single topology” during FEP. Atoms of the optimized structure that are not in the MCS are colored green. A higher value of MCS is generally better, as it will allow for better convergence during the FEP calculation. You can rotate the molecule in three dimensions using the mouse.

An example of an aligned structure is shown below. Note the atoms shown in green, which do not align for the two structures. In this case, this lack of overlap is due to a cyclohexyl ring that has flipped from “chair” to “boat” conformation. This reduces the MCS size for this conformation. There are other conformations for the same molecule where the conformation of the ring matches the reference, and these would be preferable as input.

../_images/LP_Results_ChairBoat.jpg

Ideally, one chooses the structure with both the lowest relative energy and the greatest MCS score. Often, both qualities are retained by the leftmost structure in the box table (i.e. the structure with the lowest energy also has the highest MCS score).

Carefully examine the aligned overlap between the structure you intend to select and the reference, to ensure the overlap makes chemical sense.

Before you can export structural information, you need to select a conformer for every input ligand. You can do this either by picking the conformer for each ligand individually and pressing the “Select” button for that molecule, or by using the automated selection buttons at the top of the Results part of the panel, where the Export button is also located:

  • Select Lowest Energy Conformers: Choose the lowest energy conformer for each ligand (the first box) and select it automatically. For each ligand, the red outline will move to the selected conformer.

  • Select Highest MCS Conformers: Choose the conformer with the highest MCS value for each ligand, and select it automatically. If multiple conformers share the same highest MCS value, the lowest energy conformer among them will be chosen. For each ligand, the red outline will move to the selected conformer.

  • Export: Exports a single concatenated SDF file containing one structure per input ligand, reflecting the selected conformer for each input molecule. The Export button does not become active until you have made a conformer selection for every ligand, either individually or using one of the automated selection buttons described above.
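The selection rules behind these buttons can be expressed compactly. The sketch below is illustrative only: the dictionary keys (`id`, `rel_energy`, `mcs_size`), the function names, and the sample values are hypothetical and do not reflect LigClean's internal data model.

```python
def select_lowest_energy(conformers):
    """Pick the conformer with the lowest relative energy."""
    return min(conformers, key=lambda c: c["rel_energy"])

def select_highest_mcs(conformers):
    """Pick the highest MCS size; break ties by lowest relative energy."""
    return min(conformers, key=lambda c: (-c["mcs_size"], c["rel_energy"]))

def export_ready(selections, ligand_names):
    """Export is possible only when every ligand has a selected conformer."""
    return all(selections.get(name) is not None for name in ligand_names)

# Hypothetical conformers for one ligand
conformers = [
    {"id": 1, "rel_energy": 0.00, "mcs_size": 18},
    {"id": 2, "rel_energy": 0.75, "mcs_size": 22},
    {"id": 3, "rel_energy": 1.40, "mcs_size": 22},
]
lowest = select_lowest_energy(conformers)
best_overlap = select_highest_mcs(conformers)
print(lowest["id"], best_overlap["id"])

# Hypothetical selection state for two ligands; lig_B has no selection yet
selections = {"lig_A": best_overlap["id"], "lig_B": None}
print(export_ready(selections, ["lig_A", "lig_B"]))
```

In this example the two rules disagree: the lowest energy conformer has a smaller MCS than conformers 2 and 3, which is exactly the situation where visual inspection of the aligned structure is worthwhile before selecting.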