DL-FIND Optimizer
ASH also features an interface to the powerful DL-FIND library . DL-FIND is developed by Prof. Johannes Kaestner and includes various powerful optimization algorithms. If you use the interface, make sure to cite the DL-FIND article The ASH interface to DL-FIND utilizes the libdlfind C/Python API .
The interface supports most of the algorithms available in DL-FIND including :
Geometry optimizations using Cartesian, HDLC internal coordinates using L-BFGS algorithm.
Saddlepoint optimizations using P-RFO algorithm
Dimer-method for saddle-point searches requiring only gradients
Nudged elastic band calculations
Instanton calculations
Optimizations with active/frozen atoms as well as bond, angle and dihedral constraints.
The DL-FIND Optimizer can be used in ASH surface scans, see Surface Scan Since the interface supports active regions, DLFIND_optimizer can e.g. be used for QM/MM geometry optimizations in ASH.
As DL-FIND is written in Fortran, the execution speed of the optimizer is very fast compared to geomeTRIC. DLFIND_optimizer in ASH can hence be particulary useful if the execution speed of the energy+gradient by the Theory object is very fast (e.g. an OpenMMTheory, a semiempirical xTB-theory, an ML potential) as the speed of the Optimizer will then start to play a role for the overall wall-time and DL-FIND execution can be very fast.
Installation
To install DL-FIND one needs the DL-FIND Fortran library as well as the C-API/Python-API extension, libdlfind It is easiest to install both using:
pip install geometric
For problems associated with installation, see: DL-FIND and libdlfind Github repository
The DLFIND_optimizer function
def DLFIND_optimizer(jobtype=None, theory=None, fragment=None, fragment2=None, charge=None, mult=None,
maxcycle=250, tolerance=4.5E-4, tolerance_e=1E-6,
actatoms=None, frozenatoms=None, residues=None, constraints=None,
printlevel=2, NumGrad=False, delta=0.01,
icoord=None, iopt=None, nimage=None,
hessian_choice="numfreq", inithessian=0,
numfreq_npoint=1, numfreq_displacement=0.005, numfreq_hessatoms=None,
numfreq_force_projection=None, print_atoms_list=None,
force_noPBC=False, PBC_format_option='CIF'):
Keyword |
Type |
Default value |
Details |
|---|---|---|---|
|
ASH THeory |
None |
An ASH Theory. This will be used to supply energy+gradient information during the optimization. |
|
ASH Fragment |
None |
An ASH fragment. |
|
ASH Fragment |
None |
Some DL-FIND jobtypes require 2 Fragment objects. |
|
string |
None |
Type of DL-FIND job. Options: 'opt', 'tsopt', 'neb', 'dimer', 'qts'. |
|
integer |
None |
DLFIND-code: Alternative to jobtype, choose code for optimization algorithm, e.g. iopt=3 (L-BFGS optimization) |
|
integer |
None |
DLFIND-code: Alternative to jobtype, choose code for job-type, e.g. icoord=120 (NEB with frozen endpoints) |
|
integer |
None |
Number of images for NEB job. |
|
integer |
250 |
Max number of cycles(iterations). |
|
float |
4.5E-4 |
Tolerance on gradient in Eh/Bohr |
|
float |
1e-6 |
Tolerance on gradient in Eh. |
|
list |
None |
List of indices that should be active (rest is frozen) |
|
list |
None |
List of indices that should be active (rest is active) |
|
list of lists |
None |
For HDLCs, it is possible to provide HDLC residue definitions as a list of lists. |
|
dictionary |
None |
Dictionary of constraints. See section. |
|
Boolean |
False |
Whether to perform a numerical gradient optimization (uses NumGradClass) |
|
float |
0.01 |
Delta parameter used for dimer method. |
|
integer |
0 |
For P-RFO TS optimizations, the DL-FIND code for Hessian option. Options: 0 (external), 1(1-point FD), 2(2-point FD), 3(diagonal 1-point FD), 4(identity matrix) |
|
string or Numpy array |
numfreq |
The input Hessian, only if inithessian=0. If "numfreq" then an ASH Numfreq Hessian is calculated. Options : 'anfreq', 'xtb', 'file:hessianfile' or a Numpy array. |
|
integer |
1 |
For a NumFreq Hessian, whether to use 1-point or 2-point Hessian (see Numfreq documentation) |
|
float |
0.005 |
For a NumFreq Hessian, displacement (in Bohrs). |
|
list |
None |
For a NumFreq Hessian, list of atoms to use in a partial Hessian calculation. |
|
Boolean |
None |
For a NumFreq Hessian, whether to do translation-rotation projection or not. |
|
Boolean |
None |
For a NumFreq Hessian, whether to do translation-rotation projection or not. |
|
list |
None |
What atoms to use when printing coordinates in output. |
|
Boolean |
False |
Whether to force PBCs to not be activated. |
|
string |
'CIF' |
For a PBC optimization, what type of PBC fileformat to print in the end. |
Controlling DL-FIND
Controlling DL-FIND via jobtype
It is easiest to use DLFIND_optimizer together with the jobtype keyword, which will select recommended optimization algorithm and coordinate system. The available options are:
'opt'. Selects HDLC coordinates and L-BFGS optimizer.
'tsopt'. Selects HDLC coordinates and P-RFO optimizer.
'neb'. Selects a frozen endpoint NEB job with a L-BFGS optimizer.
'dimer'. Selects a dimer job with an L-BFGS optimizer.
'qts' (or 'instanton'). Selects instanton optimization with an L-BFGS optimizer.
Controlling DL-FIND via jobtype
Alternatively is also possible to control DL-FIND behaviour by not using a jobtype (jobtype=None) and specify both icoord and iopt keywords. Consult the DL-FIND manual or libdlfind README in this case:
See DL-FIND manual See also libdlfind README
Defining constraints
The DL-FIND library supports bond, angle, dihedral constraints as well as frozen atoms.
Similar to the geomeTRIC interface, a dictionary defining constraints (constraints keyword) should be provided. Syntax to use for the constraints dictionary:
constraints={'bond':[[0,1]]} #This defines a bond/distance constraint between atoms 0 and 1
constraints={'bond':[[0,1],[3,4]]} #This defines multiple bond constraints: between atoms 0 and 1 AND also between atoms 3 and 4
constraints={'angle':[[98,99,100]]} #This defines a angle constraint between atoms 98,99 and 100
constraints={'dihedral':[[98,99,100,101]]} #This defines a dihedral constraint between atoms 98,99,100 and 101.
constraints={'bond':[[0,1],[3,4]], 'angle':[[98,99,100]]} #This defines 2 bond constraints and 1 angle constraint.
constraints={'xyz':[5,6]} #This defines XYZ constraints for the indicated atoms (i.e. freeze atoms). Alternative to frozenatoms option.
constraints={'x':[5,6]} #This defines a partial X Cartesian constraint for the indicated atoms.
constraints={'xy':[5,6]} #This defines a partial XY Cartesian constraint for the indicated atoms.
Note that, unlike the geomeTRIC library, DL-FIND constraints act directly on the initial geometry provided. It is not possible to specify what the constraint should be (i.e. like the constrainvalue option in geomeTRIC).
If you want a specific value of a constraint to be set (e.g. a specific dihedral angle), and the initial geometry does not have that value of the constraint, you have to first change the geometry so that the geometry has that value, prior to starting the minimization. This can be done in a few ways:
Manually (e.g. in a GUI molecular editor).
Use the geomeTRIC Optimizer with the constrainvalue option.
Do a RestraintTheory minimization first. See more information on Geometry optimization page.
To freeze atoms in space it is easiest to provide a list of frozen atoms by the frozenatoms keyword. Alternatively, if a large number of atoms should be frozen it may be easier to provide a list of active atoms by the actatoms keyword.
HDLC coordinates in DL-FIND and residue definitions
The hybrid delocalized internal coordinates (HDLCs) in DL-FIND are generally recommended for geometry optimizations. Some information on HDLCs can be found on the HDLC Chemshell page (where HDLCs were first implemented) and in the original article on HDLCs .
By default, the ASH interface to DL-FIND will, however, define a single system of internal coordinates which are close to internal delocalized coordinate (with 6 external degrees of freedom retained). This is fine for small systems. For larger systems, this will not scale very well due to a large diagonalization step of the G-matrix (see article). To circumvent this unfavorable scaling it is possible to split the molecular system into fragments or residues, and generate internal coordinates for each residue. This ensures linear scaling of geometry optimizations in internal coordinates. Cartesian coordinates are used to specify position and relative orientation of the residues. This procedure defines HDLCs and requires a definition of residues, that currently has to be user-specified.
Generally, residue definitions should range from 3-50 atoms in size and one should avoid cutting through rings and multiple-bonds. Furthermore only one molecule should be in a residue. The residue definitions should be given as a list-of-lists and passed to DLFIND_optimizer via the residues keyword. A manual way of specifying residues is shown below:
# Manual definition of residues for an 11-atom system: two 3-atom residues and one 5-atom residue
# This might be a system of 2 water molecules and a formic acid molecule
residues = [[0,1,2], [3,4,5], [6,7,8,9,10]]
# Call the optimizer with the residue definition
DLFIND_optimizer(theory=theory, fragment=frag, jobtype="opt", residues=residues)
For larger systems (where the need for HDLC residue definitions is greater) it obviously becomes laborious to manually specify residues. For biomolecules like proteins it is often possible to automatically get the residue definitions from the forcefield definition (i.e. amino acid residues). It is also possible to use define_residues , a function that automatically fragments a system according to connectivity and where one can specify minimum and maximum size.
def define_residues(fragment=None, min_size=5, max_size=15):
The function can be used like this:
from ash.interfaces.interface_dlfind import define_residues
fragment = Fragment(xyzfile="somelargesystem.xyz")
# Automatic residue definition
residues = define_residues(fragment=fragment, min_size=5, max_size=15)
# Call the optimizer with the residue definition
DLFIND_optimizer(theory=theory, fragment=frag, jobtype="opt", residues=residues)
Periodic Boundary Condition Optimizations with DL-FIND
DL-FIND is normally intended only for molecular systems (i.e. no translational symmetry), not systems with periodic boundary conditions. In previous ASH versions, if the system was described by a Theory-interface supporting PBCs (e.g. CP2K) but optimized with DLFIND_optimizer, a frozen-lattice optimization was automatically performed (i.e. only atoms of the cell were optimized).
More recently, the ASH interface to DL-FIND, supports a way of coaxing the DL-FIND library to simultaneously minimize atom positions and cell vectors of a periodic system. This is performed by converting the cell-vectors into dummy atoms that are simultaneously optimized in DL-FIND's internal coordinates.
To utilize this option, one only needs to provide a Theory object that has native support for PBCs and has enabled PBCs in object (usually via periodic = True keyword) The DLFIND_optimizer will then automatically perform an atom+cellvector optimization, otherwise a molecular calculation is carried out.
If this behaviour is not desired one can turn it off (force_noPBC = False in geomeTRICOptimizer) which should then correspond to a frozen lattice calculation.
During the optimization ASH will write XYZ-files and XYZ trajectories as normal. Once the optimization finishes will additionally write the coordinates in a file-format suited for PBCs. This format can be chosen by keyword PBC_format_option which is by default set to 'CIF' (see CIF file format but other other options are 'XSF' (see XSF file format) or 'POSCAR' (see POSCAR file format)
See also: Periodic boundary conditions in ASH .
Example below shows a way of using geomeTRIC to optimize an ammonia crystal using CP2K as a PBC-DFT theory.
from ash import *
# Defining an ASH fragment by reading an XYZ-file
frag = Fragment(xyzfile="ammonia.xyz", charge=0, mult=1)
# Defining the cell vectors
cell_vectors = np.array([[5.01336,0.0,0.0],[0.0,5.01336,0.0],[0.0,0.0,5.01336]])
#periodic_cell_dimensions=[5.01336, 5.01336, 5.01336, 90.0, 90.0,90.0]
#CP2K basis set and pseudopotential information
basis_dict={'C':'DZVP-MOLOPT-SR-GTH','O':'DZVP-MOLOPT-SR-GTH','H':'DZVP-MOLOPT-SR-GTH', 'N':'DZVP-MOLOPT-SR-GTH'}
potential_dict={'C':'GTH-PBE-q4','O':'GTH-PBE-q6','H':'GTH-PBE-q1', 'N':'GTH-PBE-q5'}
#Periodic CP2KTheory definition with specified cell dimensions
theory = CP2KTheory(cp2k_bin_name="cp2k.psmp",basis_dict=basis_dict,potential_dict=potential_dict,
basis_method='GPW', functional='PBE', ngrids=4, cutoff=600, numcores=numcores,
periodic=True,cell_vectors=cell_vectors, psolver='periodic', stress_tensor=True)
# Calling the DL-FIND optimizer.
# The optimizer will check for PBC support of the theory object and enable PBC optimization in HDLC (only coordsystem supported)
DLFIND_optimizer(theory=theory, fragment=frag, PBC_format_option="XSF")
Examples
Example: Default geometry optimization in HDLC internal coordinates
Here we use jobtype="opt" option.
from ash import *
frag = Fragment(databasefile="h2o.xyz", charge=0, mult=1)
theory = xTBTheory()
DLFIND_optimizer(theory=theory, fragment=frag, jobtype="opt", maxcycle=300)
Example: Geometry optimization in HDLC internal coordinates using icoord/iopt syntax and with constraints
Here we manually select the DL-FIND options by specifying icoord=1 (HDLC coordinates) and iopt=3 (L-BFGS algorithm). Also showing how constraints can be provided by providing a dictionary (same format as used in geomeTRIC interface).
from ash import *
frag = Fragment(databasefile="h2o.xyz", charge=0, mult=1)
theory = xTBTheory()
DLFIND_optimizer(theory=theory, fragment=frag, icoord=1, iopt=3, maxcycle=300, constraints={'bond':[[0,1]]})
Example: Dimer saddlepoint optimization
from ash import *
frag = Fragment(xyzfile="knarr_saddle.xyz", charge=0, mult=1)
frag2 = Fragment(xyzfile="system102-sp.xyz", charge=0, mult=1)
theory = xTBTheory(xtbmethod="GFN2")
DLFIND_optimizer(theory=theory, fragment=frag, fragment2=frag2, icoord=210, iopt=3, maxcycle=300)
Example: P-RFO saddlepoint optimization
A P-RFO saddlepoint job requires an input-Hessian.
inithessian controls what Hessian DL-FIND will use:
0: external program. if failure we go to 2-point FD
1: 1-point FD
2: 2-point FD
3: diagonal 1-point FD
4: identity matrix
For inithessian=0, ASH computes the Hessian in one of various ways. The hessian_choice keyword can be set to "numfreq", "anfreq", "xtb", "file:Hessianfilename" (read from file) or defined as a 2d numpy-array. For the "numfreq" option we can control the approximate Hessian calculated via numfreq_npoint, numfreq_displacement, numfreq_hessatoms keywords.
from ash import *
frag = Fragment(xyzfile="saddle_guess.xyz", charge=0, mult=1)
theory = xTBTheory(xtbmethod="GFN2")
# Read previously calculated Hessian from file
hessian = np.loadtxt("Hessian")
# Start P-RFO job with this input Hessian
DLFIND_optimizer(theory=theory, fragment=frag, jobtype="tsopt", inithessian=0,
hessian_choice=hessian, maxcycle=300)
Example: NEB
jobtype="neb" selects a climbing-image NEB job with frozen endpoints (same as icoord=120).
from ash import *
frag = Fragment(xyzfile="system10-react.xyz", charge=0, mult=1)
frag2 = Fragment(xyzfile="system10-prod.xyz", charge=0, mult=1)
theory = xTBTheory(xtbmethod="GFN2")
DLFIND_optimizer(theory=theory, fragment=frag, fragment2=frag2, jobtype="neb", maxcycle=300, nimage=30)
Example: QM/MM geometry optimization of an active-region around a metalloprotein active-site
Here we provide a list of active atoms to the DL-FIND optimizer (everything else will be frozen).
from ash import *
#Define number of cores variable
numcores=1
#Fe(SCH2)4 indices (inspect system_aftersolvent.pdb file to get indices)
qmatoms=[93,94,95,96,133,134,135,136,564,565,566,567,604,605,606,607,755]
#Defining fragment containing coordinates (can be read from XYZ-file, ASH fragment, PDB-file)
lastpdbfile="final_MDfrag_laststep_imaged.pdb"
fragment=Fragment(pdbfile=lastpdbfile)
#Creating new OpenMM object from OpenMM XML files (built-in CHARMM36 and a user-defined one)
omm = OpenMMTheory(xmlfiles=["charmm36.xml", "charmm36/water.xml", "./specialresidue.xml"], pdbfile=lastpdbfile, periodic=True,
platform='CPU', numcores=numcores, autoconstraints=None, rigidwater=False)
#QM theory
xtbobject = xTBTheory(xtbmethod="GFN1", numcores=numcores)
#QM/MM theory
qmmm = QMMMTheory(qm_theory=xtbobject, mm_theory=omm, fragment=fragment,
embedding="Elstat", qmatoms=qmatoms, printlevel=1, qm_charge=-1, qm_mult=6)
# QM/MM geometry optimization
actatoms=read_intlist_from_file("active_atoms")
DLFIND_optimizer(jobtype="opt", theory=qmmm, fragment=fragment, actatoms=actatoms, maxcycle=200)
The DLFIND_optimizerClass
The DLFIND_optimizer described above is actually a wrapper function around a class: DLFIND_optimizerClass.
It is generally recommended to use the function call above directly, most of the time. However, if you do require more flexibility for your ASH script then it is also possible to create an object from the class directly and use the built-in run method.
class DLFIND_optimizerClass:
def __init__(self,jobtype=None, fragment=None, fragment2=None, theory=None, charge=None, mult=None,
maxcycle=250, tolerance=4.5E-4, tolerance_e=1E-6,
printlevel=2, result_write_to_disk=True, actatoms=None, frozenatoms=None, residues=None, constraints=None,
icoord=None, iopt=None, nimage=None, delta=0.01,
hessian_choice='numfreq', inithessian=None,
numfreq_npoint=1,numfreq_displacement=0.005,numfreq_force_projection=None,
numfreq_hessatoms=None, print_atoms_list=None,
force_noPBC=False, PBC_format_option='CIF'):
Example on how to use:
#Create optimizer object
optimizer = DLFIND_optimizerClass(theory=theory, fragment=fragment)
#Run the optimizer object
result = optimizer.run()