Explicit solvation (small molecule)

ASH allows you to easily create explicit solvation models for small molecules that can then be either used either for classical MD simulations or QM/MM MD simulations.

Creating a small-molecule forcefield

An important step is to acquire a forcefield for the small molecule of interest. Ideally we want a forcefield in the OpenMM XML format that can be easily used to define an OpenMMTheory object.

There are 2 approaches possible, a nonbonded-only forcefield and full-fledged forcefield.

write_nonbonded_FF_for_ligand

This function can create a simple nonbonded forcefield. This is useful for inorganic molecules (e.g. metal complexes) where a full forcefield is hard to create. Typically the easiest option if the final goal is to perform QM/MM MD simulations. Note that for a classical MD simulation with a nonbonded forcefield, the internal degrees of freedom of the molecule should be constrained (either by freezing atoms or constraining internal coordinates).

small_molecule_parameterizer

This function is capable of creating/deriving a full forcefield of an organic molecule with both bonded and nonbonded parameters. Only available for molecules with common light maingroup elements. Options are the GAFF or OpenFF forcefields.

See OpenMM interface for more details on how to use these functions.

Example 1. Modelling of an organic molecule in explicit water with a ligand forcefield

Note

Do note that when running OpenMM simulations you typically want to run simulations on the GPU if possible (much faster than the default CPU). You then have to set the platform keyword equal to either platform='CUDA' or platform='OpenCL' when defining the OpenMM_Theory object (default is 'CPU').

Create a forcefield file

create_ff.py:

from ash import *

#Parameterize small molecule using OpenFF using an XYZ-file present in current directory
#Note that isobutyraldehyde.xyz can be found in ASH database (change xyzfile to databasefile below)
small_molecule_parameterizer(xyzfile="isobutyraldehyde.xyz", forcefield_option="OpenFF", charge=0)

The function will create a file, called openff_LIG.xml.

Solvate the system

solvate.py:

from ash import *

mol = Fragment(xyzfile="isobutyraldehyde.xyz", charge=0, mult=1)
#Solvate molecule in a 30x30x30 Å TIP3P water box.
solvate_small_molecule(fragment=mol, xmlfile="openff_LIG.xml", watermodel='tip3p', solvent_boxdims=[30,30,30])

This creates files: system_aftersolvent.xyz and system_aftersolvent.pdb

Perform initial classical MD simulation to warmup and equilibrate

run_mm_md.py:

from ash import *

#Read in coordinates of the full system
pdbfile="system_aftersolvent.pdb"
fragment = Fragment(pdbfile=pdbfile)
#Create an OpenMMTheory object based on PDB-file and XML-files for water and small-molecule
omm =OpenMMTheory(xmlfiles=["openff_LIG.xml", "amber14/tip3p_standard.xml"],
            pdbfile=pdbfile, periodic=True, rigidwater=True, autoconstraints='HBonds')

#Gently warms up the system
Gentle_warm_up_MD(fragment=fragment, theory=omm)

#Equilibrates the system via a multi-step NPT simulation.
#This changes the box size of the system until volume and density have converged.
#Note that thresholds for volume and density may have to be adjusted
OpenMM_box_equilibration(fragment=fragment, theory=omm, datafilename="nptsim.csv", timestep=0.004,
                            numsteps_per_NPT=10000,max_NPT_cycles=10,traj_frequency=100,
                            volume_threshold=1.0, density_threshold=0.01, temperature=300)

You can use the script plot_md_data.py (present in the scripts directory of the main ASH directory) to conveniently plot the data (requires Matplotlib).

#Plot density, volume, temperature from nptsim.csv via MatplotLib
python3 plot_md_data.py nptsim.csv

Re-imaging of the trajectory is often desirable for visualization purposes. This centers the system on the molecule. This is also required for QM/MM MD simulation (where the molecule must be in the middle)

#Re-image trajectory so that protein is in middle
MDtraj_imagetraj("equilibration_NPT.dcd", "equilibration_NPT.pdb", format='DCD')

#Sometimes the procedure fails unless you specify that solute_anchor=True
MDtraj_imagetraj("warmup_MD_cycle2.pdb","warmup_MD_cycle2.pdb", solute_anchor=True)

Run QM/MM MD simulation

run_qmmm_md.py:

from ash import *

#Read in coordinates of the full system
#Note that for QM/MM you must use a box where the molecule is centered. Re-image the file using MDtraj_imagetraj if necessary
pdbfile="equilibration_NPT_imaged.pdb"
fragment = Fragment(pdbfile=pdbfile)
#Create an OpenMMTheory object based on PDB-file and XML-files for water and small-molecule
omm =OpenMMTheory(xmlfiles=["openff_LIG.xml", "amber14/tip3p_standard.xml"],
            pdbfile=pdbfile, periodic=True, rigidwater=True, autoconstraints='HBonds')
#Create a QM/MM object
qm = xTBTheory(xtbmethod='GFN2')
#Defining QM-atoms to be the solute.  Note that the atom indices are 0-based
qmatomlist = list(range(0,13))
#QM/MM from QM and MM objects. Setting QM-region charge and multiplicity
qm_mm = QMMMTheory(qm_theory=qm, mm_theory=omm, fragment=fragment, qmatoms=qmatomlist,
        qm_charge=0, qm_mult=1)

#Run a NVT MD simulation (NPT could also be performed if you add a barostat)
#Note: timesteps for QM/MM must be much smaller than in MM
OpenMM_MD(fragment=fragment, theory=qm_mm, timestep=0.001, simulation_time=10, traj_frequency=10,
    temperature=300, integrator='LangevinMiddleIntegrator', coupling_frequency=1,
    trajfilename='QM_MM_NVT-MD',trajectory_file_option='DCD')

Example 2. Modelling of an inorganic molecule in explicit water using a simple non-bonded forcefield

For inorganic molecules (e.g. metal complexes) it is trickier to parameterize a full forcefield and would have to be performed to some extent manually. However, it is possible to use a simple non-bonded forcefield, i.e. without any bonded parameters. This then requires classical simulations to be performed with internal degrees of freedom frozen (bonds, angles, dihedrals) while QM/MM MD simulations can be performed as normal.

Here we use the write_nonbonded_FF_for_ligand function to define the nonbonded parameters (charges and LJ parameters) and create an OpenMM XML-file. The molecule will be an FeCl4- complex (S=5/2).

Create a forcefield file

create_ff.py:

from ash import *

#Create a nonbonded FF for molecule
frag = Fragment(xyzfile="fecl4.xyz", charge=-1, mult=6)
#Defining QM-theory to be used for charge calculation
orca_theory = ORCATheory(orcasimpleinput="! r2SCAN-3c tightscf")
#
write_nonbonded_FF_for_ligand(fragment=frag, resname="LIG", theory=orca_theory,
        coulomb14scale=1.0, lj14scale=1.0, charge_model="CM5_ORCA")

The function will create a file, here called: "LIG.xml". By default it uses the ff_type to be "AMBER". This means the XML-file will use a form of the nonbonded potential that is compatible with Amber-style forcefield. This is recommended im general but can be changed to "CHARMM" or "None" if required. Do note that if you choose CHARMM option, the XML-file will be in CHARMM-format (includes both NonbondedForce and LennardJonesForce definitions), which will require a CHARMM-style water XML-file to be used later.

Solvate the system

We can solvate the system as we did before

solvate.py:

from ash import *

mol = Fragment(xyzfile="fecl4.xyz", charge=-1, mult=6)
#Solvate molecule in a 30x30x30 Å TIP3P water box.
solvate_small_molecule(fragment=mol, xmlfile="LIG.xml", watermodel='tip3p', solvent_boxdims=[30,30,30])

This creates files: system_aftersolvent.xyz and system_aftersolvent.pdb

Perform initial classical MD simulation to warmup and equilibrate.

Here we run some initial classical MD. Unlike before, however, we have to constrain the internal degrees of freedom of the ligand as there are no bonded parameters available. The simplest way is to add constraints for all the Fe-Cl bonds. Additional angle constraints or dihedral constraints may also be required for other molecules (or )

run_mm_md.py:

from ash import *

#Read in coordinates of the full system
pdbfile="system_aftersolvent.pdb"
fragment = Fragment(pdbfile=pdbfile)

#Constrain the bonds of the ligand.
#Note: additional angle and dihedral constraints may also be appropriate
bondconstraints = [[0,1],[0,2],[0,3],[0,4]]

#Create an OpenMMTheory object based on PDB-file and XML-files for water and small-molecule
omm =OpenMMTheory(xmlfiles=["LIG.xml", "amber14/tip3p_standard.xml"],
            pdbfile=pdbfile, periodic=True, rigidwater=True, autoconstraints='HBonds',
            constraints=bondconstraints)

#Gently warms up the system
Gentle_warm_up_MD(fragment=fragment, theory=omm, use_mdtraj=False)

#Equilibrates the system via a multi-step NPT simulation.
#This changes the box size of the system until volume and density have converged.
#Note that thresholds for volume and density may have to be adjusted
OpenMM_box_equilibration(fragment=fragment, theory=omm, datafilename="nptsim.csv", timestep=0.004,
                            numsteps_per_NPT=10000,max_NPT_cycles=10,traj_frequency=100,
                            volume_threshold=1.0, density_threshold=0.01, temperature=300)

You can use the script plot_md_data.py (present in the scripts directory of the main ASH directory) to conveniently plot the data (requires Matplotlib).

#Plot density, volume, temperature from nptsim.csv via MatplotLib
python3 plot_md_data.py nptsim.csv

Re-imaging of the trajectory is often desirable for visualization purposes. This centers the system on the molecule. This is also required for QM/MM MD simulation (where the molecule must be in the middle)

#Re-image trajectory so that protein is in middle
MDtraj_imagetraj("equilibration_NPT.dcd", "equilibration_NPT.pdb", format='DCD')
#Sometimes the procedure fails unless you specify that solute_anchor=True
MDtraj_imagetraj("warmup_MD_cycle2.pdb","warmup_MD_cycle2.pdb", solute_anchor=True)

This is required for the QM/MM MD.

Run QM/MM MD simulation

run_qmmm_md.py:

from ash import *

#Read in coordinates of the full system
#Note that for QM/MM you must use a box where the molecule is centered. Re-image the file using MDtraj_imagetraj if necessary
pdbfile="equilibration_NPT_imaged.pdb"
fragment = Fragment(pdbfile=pdbfile)
#No constraints necessary anymore as the solute will be in the QM-region

#Create an OpenMMTheory object based on PDB-file and XML-files for water and small-molecule
omm =OpenMMTheory(xmlfiles=["LIG.xml", "amber14/tip3p_standard.xml"],
            pdbfile=pdbfile, periodic=True, rigidwater=True, autoconstraints='HBonds')
#Create a QM/MM object
qm = xTBTheory(xtbmethod='GFN2')
#Defining QM-atoms to be the solute.  Note that the atom indices are 0-based
qmatomlist = list(range(0,5))
#QM/MM from QM and MM objects. Setting QM-region charge and multiplicity
qm_mm = QMMMTheory(qm_theory=qm, mm_theory=omm, fragment=fragment, qmatoms=qmatomlist,
        qm_charge=0, qm_mult=1)

#Run a NVT MD simulation (NPT could also be performed if you add a barostat)
#Note: timesteps for QM/MM must be much smaller than in MM
OpenMM_MD(fragment=fragment, theory=qm_mm, timestep=0.001, simulation_time=10, traj_frequency=10,
    temperature=300, integrator='LangevinMiddleIntegrator', coupling_frequency=1,
    trajfilename='QM_MM_NVT-MD',trajectory_file_option='DCD')

Example 3. Setting up an explicit non-aqueous solution system (with an inorganic solute)

It is also possible to easily setup a non-aqueous solution system thanks to an interface to Packmol program. Here we will use acetonitrile as an organic solvent and a metal complex as a solute. We will utilize a full forcefield for the solvent but a nonbonded forcefield for the solute. This type of workflow is e.g. suitable for modelling transition metal catalysis in explicit solution.

Acquire the forcefield for the solvent (acetonitrile).

This can be done as in Example 1.

create_solvent_ff.py:

#Parameterize small molecule using OpenFF
small_molecule_parameterizer(forcefield_option="OpenFF", xyzfile="acetonitrile.xyz",
            charge=0,resname="ACN")

The function will create an XML file, called openff_ACN.xml and a PDB-file called ACN.pdb

Creating a solvation box with a non-aqueous solvent via Packmol interface

See Helper-programs interfaces for details on the Packmol interface.

Starting from a PDB-file containing a single acetonitrile molecule, we can create a cubic box of acetonitrile molecules with a specified density. Here we specify the minimum and maximum coordinates of the box to be [0,0,0] and [50,50,50] Å.

#Create a 50 Å cubic box of acetonitrile molecules corresponding to a density of 0.786 g/ml
packmol_solvate(inputfiles=["ACN.pdb"], density=0.786,
    min_coordinates=[0,0,0], max_coordinates=[50,50,50])

#NPT equilibration. Will give optimal box dimensions
pdbfile="final_withcon.pdb"
fragment = Fragment(pdbfile=pdbfile)
#Note: using slightly larger box dimensions (55 instead of 50) to avoid initial periodicity problems at boundary
omm = OpenMMTheory(xmlfiles=["openff_ACN.xml"],pdbfile=pdbfile, platform="OpenCL",
            periodic=True, autoconstraints='HBonds', periodic_cell_dimensions=[55.0,55.0,55.0,90.0,90.0,90.0])
#NPT equilibration. Note: platform='OpenCL' (or CUDA if NVIDIA GPU) runs OpenMM on GPU, should run quite fast even on laptop.
#Use platform='CPU' if no GPU available
OpenMM_box_equilibration(fragment=fragment, theory=omm, datafilename="nptsim.csv", timestep=0.001,
                                numsteps_per_NPT=10000,max_NPT_cycles=10,traj_frequency=100,
                                volume_threshold=1.0, density_threshold=0.01, temperature=300)

packmol_solvate will create a PDB-file called "final_withcon.pdb" containing coordinates for a 50 Å cubic box of acetonitrile molecules. Note that connectivity lines are written at the end of the file which are necessary for OpenMM to recognize the topology. Here we also immediately launch a classical NPT-equilibration simulation both to check that the solvent XML-file works and to get optimal box dimensions. OpenMM_box_equilibration will create file equilibration_NPT_lastframe.pdb containing last frame of NPT-workflow and will also contain the equilibrated PBC box dimensions.

Note

It is important to read in the same single-molecule solvent PDB-file (here ACN.pdb) that you got from the small_molecule_parameterizer function, otherwise OpenMMTheory will probably not be able to recognize the topology.

Create a nonbonded forcefield for the solute

This can be done as in Example 2 above.

create_solute_ff.py:

from ash import *

#Create a nonbonded FF for molecule
frag = Fragment(xyzfile="fecl4.xyz", charge=-1, mult=6)
#Defining QM-theory to be used for charge calculation
orca_theory = ORCATheory(orcasimpleinput="! r2SCAN-3c tightscf")
#
write_nonbonded_FF_for_ligand(fragment=frag, resname="TMC", theory=orca_theory,
        coulomb14scale=1.0, lj14scale=1.0, charge_model="CM5_ORCA")
#Write PDB-file (TMC.pdb) for later use, skipping connectivity lines
frag.write_pdbfile_openmm("TMC", skip_connectivity=True)

This will create an XML-file of the solute called TMC.xml and a basic PDB-file called TMC.pdb.

Note

The PDB-file created here for the metal complex should not contain any connectivity lines (CONECT records), which is why we select the skip_connectivity=True option. This is because we have a nonbonded FF and no bonded parameters in the XML-file and presence of connectivity lines would cause OpenMM to expect bond terms present.

Inserting the solute into the box of solvent

See documentation of insert_solute_into_solvent at OpenMM interface for more details.

Finally we want to combine the solute and solvent into a single PDB-file of the solution with the solute in the center of the box. For this we need PDB-files for the single solute-molecule and solvent-box.

from ash import *

#Input files
solute_pdbfile="TMC.pdb" #should have no CONECT lines
solvent_pdbfile="equilibration_NPT_lastframe.pdb" # using NPT-equilibrated solvent-box
solute_xmlfile="TMC.xml"
solvent_xmlfile="openff_ACN.xml"

#Inserting solute into solvent-box and get new solution fragment and file solution.pdb
solution = insert_solute_into_solvent(solvent_pdb=solvent_pdbfile, solute_pdb=solute_pdbfile,
            write_pdb=True)

#Test to see if OpenMMTheory object can be defined from XML-files and final system file (solution.pdb)
omm = OpenMMTheory(xmlfiles=[solute_xmlfile,solvent_xmlfile],pdbfile="solution.pdb", periodic=True)

insert_solute_into_solvent will create a PDB-file called "solution.pdb" containing the set-up system. Note that the solution.pdb file will serve as both a topology for OpenMMTheory and as starting coordinates for an MD simulation.

We can now in principle run a QM/MM MD simulation of the system.

run_qmmm_md.py:

from ash import *

#Read in coordinates of the full system
pdbfile="solution.pdb"
fragment = Fragment(pdbfile=pdbfile)

solute_xmlfile="TMC.xml"
solvent_xmlfile="openff_ACN.xml"

omm = OpenMMTheory(xmlfiles=[solute_xmlfile,solvent_xmlfile],pdbfile=pdbfile, platform='OpenCL',
            periodic=True, autoconstraints='HBonds')
#Create a QM/MM object
qm = xTBTheory(xtbmethod='GFN2')
#Defining QM-atoms to be the solute.  Note that the atom indices are 0-based
qmatomlist = list(range(0,5))
#QM/MM from QM and MM objects. Setting QM-region charge and multiplicity
qm_mm = QMMMTheory(qm_theory=qm, mm_theory=omm, fragment=fragment, qmatoms=qmatomlist,
        qm_charge=-1, qm_mult=6)

#Run a NVT QM/MM MD simulation
OpenMM_MD(fragment=fragment, theory=qm_mm, timestep=0.001, simulation_time=10, traj_frequency=1,
    temperature=300, integrator='LangevinMiddleIntegrator', coupling_frequency=1,
    trajfilename='QM_MM_NVT-MD',trajectory_file_option='DCD')

Note

Rather than starting QM/MM MD directly like above, it may also be a good idea to equilibrate the system after insertion by e.g. a classical NVT simulation of the solution system with solute frozen.

Running QM/MM MD simulations with periodic boundary conditions

Do note that in the QM/MM MD simulation examples above, we typically run the MM-part in a periodic fashion while the QM-part runs without periodic boundary conditions (since the QM-code xtb does not support PBC). This is an approximation that is reasonably accurate as long as the QM-system remains in the center of the box (reasonably homogeneous QM-MM interactions). As the molecule may start to migrate to the box boundaries during a longer simulation it is important to deal with this issue in real-world simulations. The options are:

Run non-periodic simulations by setting periodicity=False in OpenMMTheory.

This may also require freezing parts of the box boundaries to avoid solvent molecules from evaporating or adding a boundary force. As the solute may migrate towards the frozen boundaries, resulting in artificial dynamics, a centering potential may also be required for the solute. Non-periodic MM may result in somewhat slower MM-part execution (which may not matter if the QM-code execution is the bottleneck).

Stick with periodic MM-part but force the molecule to be in the center of the box.

To make sure of this one can add a flat-bottom centering potential like below. See section "Adding a flat-bottom centering potential" in Molecular dynamics for more information.

# Here assuming a qm_mm theory, a solute+solvent system have already been created and that solute has indices 0-5.
# Force constant : 10 kcal/mol/Å^2 Distance from where force acts: 10.0 Å
MolecularDynamics(theory=qm_mm, fragment=solution, simulation_time=50, timestep=0.001, traj_frequency=100,
  add_centerforce=True, centerforce_atoms=[0,1,2,3,4,5], centerforce_constant=10, centerforce_distance=10.0)

Use a QM-code supporting periodic boundary conditions.

Currently CP2K is the primary QM-code in ASH supporting periodic boundary conditions.

Issues

If you get an error like this from OpenMM:

ValueError: Found multiple NonbondedForce tags with different 1-4 scales

This indicates that there is an incompatibility between the small-molecule XML-file and the solvent forcefield XML-file. Most likely you have selected the wrong XML-file for your solvent in OpenMMTheory. For XML-files created for the GAFF and OpenFF forcesfield (created by small_molecule_parameterizer ) you typically want to select the Amber water XML-file: e.g. for TIP3P "amber14/tip3p_standard.xml" . For nonbondedFF XML-files (created by write_nonbonded_FF_for_ligand), by default the files are also in Amber-style format and you should select: "amber14/tip3p_standard.xml"

If for some reason CHARMM-style format was selected (not recommended usually for explicit solvation) you then should select "charmm36/water.xml".

The OpenMM built-in Amber and CHARMM XML forcefield files are available in your environment and will be automatically found by ASH (if OpenMM was installed correctly) when you select: "amber14/tip3p_standard.xml" or "charmm36/water.xml" . These files can also be found and inspected on your system like this:

# Go into OpenMM data directory within conda environment (make sure the environment is loaded)
cd $(dirname $(which test-openmm-platforms))/../lib/python3.11/site-packages/openmm/app/data/
ls
# Result:
DLPC.pdb              amber03.xml           amber99Test.xml       amoeba2013.xml        glycam-hydrogens.xml  spce.pdb              tip4pew.pdb
DLPE.pdb              amber03_obc.xml       amber99_obc.xml       amoeba2013_gk.xml     hydrogens.xml         spce.xml              tip4pew.xml
DMPC.pdb              amber10.xml           amber99sb.xml         amoeba2018.xml        iamoeba.xml           swm4ndp.pdb           tip4pfb.xml
DOPC.pdb              amber10_obc.xml       amber99sbildn.xml     amoeba2018_gk.xml     implicit              swm4ndp.xml           tip5p.pdb
DPPC.pdb              amber14               amber99sbnmr.xml      charmm36              opc.xml               test.pdb              tip5p.xml
POPC.pdb              amber14-all.xml       amberfb15.xml         charmm36.xml          opc3.xml              tip3p.pdb
POPE.pdb              amber96.xml           amoeba2009.xml        charmm_polar_2013.xml pdbNames.xml          tip3p.xml
absinth.xml           amber96_obc.xml       amoeba2009_gk.xml     charmm_polar_2019.xml residues.xml          tip3pfb.xml

For CHARMM and normal OpenMM XML-files the NonbondedForce line should look like this:

<NonbondedForce coulomb14scale="1.0" lj14scale="1.0">

where coulomb14scale and lj14scale are set to 1.0

For Amber, GAFF and OpenFF XML-files the NonbondedForce line should look like this:

<NonbondedForce coulomb14scale="0.8333333333333334" lj14scale="0.5">

Additionally, CHARMM XML files contain in addition to NonBondedForce an extra block:

<LennardJonesForce lj14scale="1.0">