CP2K interface

CP2K is a very popular and powerful periodic electronic structure program, particulary known for its unique mixed Gaussian and plane waves (GPW) approach and efficient parallelization which enables large-scale DFT MD simulations to be carried out.

ASH features a reasonably flexible interface to CP2K allowing the use of CP2K both as a QM-only code within ASH or as part of QM/MM Theory in ASH. Energies and gradients are available in the interface so the CP2KTheory class in ASH can be used for single-point energies, geometry optimizations, numerical frequencies, surface scans, NEB and molecular dynamics. Neither the CP2K optimizer,MD or NEB routines are used, instead ASH handles all of this. Both periodic and non-periodic systems are supported to some extent. Additionally pointcharge-embedding is supported by the interface via the GEEP (Gaussian Expansion of Electrostatic Potential) approach in CP2K. This allows QM/MM calculations with CP2K as QM-code and OpenMM as MM-code within ASH.

If the purpose is to primarily carry out periodic DFT MD simulations, running CP2K via the ASH interface will not offer many benefits over CP2K directly, however, running geometry optimizations, surface scans and NEB via ASH may be more convenient than using CP2K directly. Furthermore the QM/MM capabilities within ASH and the flexible forcefield support by OpenMM may be preferable to the CP2K options.

CP2KTheory class:

class CP2KTheory:
    def __init__(self, cp2kdir=None, cp2k_bin_name=None, filename='cp2k', printlevel=2, basis_dict=None, potential_dict=None, label="CP2K",
                periodic=False, periodic_type='XYZ', qm_periodic_type=None,cell_dimensions=None, cell_vectors=None,
                qm_cell_dims=None, qm_cell_shift_par=6.0, wavelet_scf_type=40,
                functional=None, psolver='wavelet', potential_file='POTENTIAL', basis_file='BASIS',
                basis_method='GAPW', ngrids=4, cutoff=250, rel_cutoff=60,
                method='QUICKSTEP', numcores=1, parallelization='OMP', mixed_mpi_procs=None, mixed_omp_threads=None,
                center_coords=True, scf_maxiter=50, outer_SCF=True, outer_scf_maxiter=10, scf_convergence=1e-6, eps_default=1e-10,
                coupling='GAUSSIAN', GEEP_num_gauss=6, MM_radius_scaling=1, mm_radii=None,
                OT=True, OT_minimizer='DIIS', OT_preconditioner='FULL_ALL',
                OT_linesearch='3PNT', outer_SCF_optimizer='DIIS', OT_energy_gap=0.08):

Keyword	Type	Default value	Details
`cp2kdir`	string	None	Directory where CP2K binaries are.
`cp2k_bin_name`	string	None	Name of CP2K binary to use. If not provided, ASH will search for cp2k.ssmp, cp2k.sopt, cp2k.psmp in the cp2kdir directory or PATH.
`printlevel`	integer	2	Printlevel
`filename`	string	'cp2k'	Name of CP2K inputfile created by ASH.
`numcores`	integer	1	The number of CPU cores used.
`parallelization`	string	'OMP'	Parallelization strategy. Options: 'OMP', 'MPI', 'Mixed'
`mixed_mpi_procs`	integer	1	If parallelization='Mixed': The number of MPI processes active.
`mixed_omp_threads`	integer	None	If parallelization='Mixed': The number of OpenMP threads per MPI process.
`basis_dict`	dict	None	Dictionary defining basis set information for each element.
`potential_dict`	dict	None	Dictionary defining pseudopotential information for each element.
`label`	string	'CP2K'	Label for the CP2KTheory object.
`periodic`	Boolean	False	Whether to use periodic boundary conditions or not.
`periodic_type`	string	'XYZ'	Type of PBC used. Options: 'XYZ', 'XY', 'XZ', 'YZ' etc.
`cell_dimensions`	list	None	The cell dimensions defining the box (applies to both PBC and no-PBC). List of cell-lengths in Angstroms and cell-angles in degrees.
`cell_vectors`	list of lists	None	Alternative way of specifying the cell dimensions. List of lists of cell-vectors in Angstroms.
`wavelet_scf_type`	integer	40	Wavelet-only: Scaling function. Possible values: 8,14,16,20,24,30,40,50,60,100
`functional`	string	None	Name of DFT functional to use.
`psolver`	string	'wavelet'	Name of Poisson solver to use. Options: 'PERIODIC', 'MT', 'WAVELET', 'MULTIPOLE'.
`potential_file`	string	'POTENTIAL'	Name of potential file.
`basis_file`	string	'BASIS'	Name of basis set file.
`basis_method`	string	'GAPW'	Type of CP2K basis-set method to use. Options: 'GPW' (Gaussian-planewave with pseudopoentials), 'GAPW' (Gaussian augmented planewave).
`ngrids`	integer	4	Number of real-space grids to use.
`cutoff`	integer	250	Planewave cutoff (in Ry) used.
`rel_cutoff`	integer	60	Relative cutoff (in Ry) used. Controls which product Gaussians are mapped onto which level of the multi-grid.
`center_coords`	Boolean	True	Whether CP2K centers coordinates or not. Usually necessary.
`scf_convergence`	float	1e-6	SCF convergence in Hartree.
`scf_maxiter`	integer	50	Max number of (inner) SCF iterations
`eps_default`	float	1e-10	Overall CP2K convergence setting (see manual). Probably more useful than scf_convergence
`OT`	Boolean	True	Whether the OT SCF method is on or not
`OT_minimizer`	string	'DIIS'	Type of minimizer method. Options: 'DIIS', 'CG', 'SD', 'BROYDEN'
`OT_preconditioner`	string	'FULL_ALL'	OT preconditioner. Options: 'FULL_ALL', 'FULL_SINGLE_INVERSE' and more
`OT_linesearch`	string	'3PNT'	OT linesearch option. Options: '3PNT', '2PNT', 'NONE', 'GOLD'.
`outer_SCF`	Boolean	True	Whether outer-SCF loop is on or not.
`outer_scf_maxiter`	integer	10	How many outer SCF iterations to perform.
`outer_SCF_optimizer`	string	'DIIS'	Outer SCF optimizer method. Options: 'DIIS', 'CG'
`OT_energy_gap`	float	0.08	Energy gap to use in OT method.
`method`	string	'QUICKSTEP'	The CP2K module to use.
`qm_periodic_type`	string	None	QM/MM only: Type of PBC used for QM-part. Options: 'XYZ', 'XY', 'XZ', 'YZ' etc.
`qm_cell_dims`	list	None	QM/MM only: List of cell-lengths in Angstroms for the QM region. Cell dimensions estimated if not provided.
`qm_cell_shift_par`	float	None	QM/MM only: Parameter used to shift the QM cell dimensions based on molecule size. Only used if qm_cell_dims is not provided.
`coupling`	string	'GAUSSIAN'	QM/MM only: The type of QM/MM coupling used. Only 'GAUSSIAN' is supported for DFT. 'COULOMB' available for semi-empiricical systems.
`GEEP_num_gauss`	integer	6	QM/MM only: Number of Gaussians used to expand each MM point charge.
`MM_radius_scaling`	float	1.0	QM/MM only: Optional scaling factor of the MM radii.
`mm_radii`	dict	None	QM/MM only: Optional dictionary of MM radii for each element. If not provided, radii are estimated internal table.

CP2K installation

CP2K can be installed in several different ways, see https://www.cp2k.org/download It can be tricky to compile. It is easiest to either download binaries (see link, though MPI-parallel version not typically available) or install via conda/mamba (see https://anaconda.org/conda-forge/cp2k). Alternatively you can compile CP2K from source: https://github.com/cp2k/cp2k/blob/master/INSTALL.md

Note that downloaded or compiled CP2K binaries may come in a few different forms: e.g. cp2k.ssmp, cp2k.sopt, ccp2k.popt, cp2k.psmp where sopt means serial-optimized, ssmp means single-process with OpenMP, popt means parallel-optimized with MPI and psmp means parallel-optimized with MPI and OpenMP. The cp2k.psmp binary is the most flexible and is recommended to use if available.

We have had success installing the latest CP2K version (2024.1) via conda/mamba like this:

#CP2K 2024.1 with OpenMPI and OpenBLAS: installs cp2k.psmp binary
#Note: mamba install cp2k will install non-MPI version: cp2k.ssmp binary
mamba install cp2k=2024.1=openblas_openmpi_h7c9ef3d_1

ASH will then try find the CP2K installation to use according to this logic:

if cp2kdir variable provided (containing path to where the binaries are) and cp2k_bin_name provided: use that binary in that directory
if cp2kdir variable provided but cp2k_bin_name NOT provided: search for cp2k.X executables in the cp2kdir directory
if cp2kdir variable NOT provided but cp2k_bin_name provided: search for cp2k_bin_name in PATH
if cp2kdir variable NOT provided and cp2k_bin_name not provided: search for cp2k.X executables in PATH

Note that the search for executables will only work if the binaries are named: cp2k.X where X is one of: ssmp, sopt, popt, psmp. ASH will search for executables in this order: ["cp2k.psmp", "cp2k.popt", "cp2k.ssmp","cp2k.sopt"]

Parallelization

CP2K binaries differ in their parallelization capabilities:

sopt: no parallelization
ssmp: only OpenMP parallelization
popt: only MPI parallelization
psmp: mixed MPI and OpenMP parallelization. Primarily useful for massive parallelization (>10K cores).

The CP2K manual advises to use the cp2k.psmp executable as it is the most flexible (can be used for serial, MPI, OpenMP and Mixed calculations). If this executable is available it is best to specify this in the CP2KTheory object using the cp2k_bin_name keyword.

The parallelization that ASH will tell CP2K to use is controlled by the parallelization keyword when defining the CP2KTheory object in ASH. It can be se set to 'OMP' (OpenMP threading, default), 'MPI' or 'Mixed'. The best parallelization strategy depends on the system and the hardware and you may have to do your own benchmarks.

OpenMP parallelization

This is the easiest parallelization strategy to start using and is hence the default (parallelization='OMP') in ASH. It requires either a cp2k.ssmp or cp2k.psmp executable. One simply has to specify the number of CPU cores to be used via the numcores keyword in CP2KTheory. 1 CP2K process (either cp2k.ssmp or cp2k.psmp) executable will be launched which will be capable of OpenMP threading up to the chosen number of cores.

MPI parallelization

For MPI-parallelization one should set parallelization='MPI'. It requires either the cp2k.popt or cp2k.psmp executable. Additionally a CP2K-compatible MPI program needs to be installed and in PATH. ASH assumes OpenMPI and will search for this. If numcores=4 and parallelization='MPI' then 4 CP2K processes will be launched by the mpirun program.

As discussed on https://www.cp2k.org/faq:mpi_vs_openmp CP2K is primarily MPI-parallelized and is thus probably a faster option than OpenMP overall.

Mixed OMP/MPI parallelization

The mixed OMP/MPI parallelization is only possible using the cp2k.psmp executable. This parallelization strategy is particularly useful for massively parallel calculations (thousands of cores) and may not be as beneficial for small systems or a small amount of CPU cores. But this depends on the system and hardware so test it out.

To use one should set: parallelization='Mixed' , specify the total number of CPU cores by the numcores keyword and additionally one must specify how many MPI processes and how many OMP threads per process via the mixed_mpi_procs and mixed_omp_threads keywords. For example, if numcores=8, mixed_mpi_procs=4 and mixed_omp_threads=2 then 4 MPI processes will be used with 2 OMP threads used per process, for a total of 8 utilized CPU cores. Note that ASH will give an error if numcores is not equal to mixed_mpi_procs*mixed_omp_threads.

Controlling the basis set

The primary purpose of using CP2K is probably to take advantage of the efficient mixed Gaussian and plane wave (GPW) approach where Gaussians are used to calculate the 1-electron integrals and plane waves are used to calculate the 2-electron integrals. Furthermore the user should specify whether the standard GPW (Gaussian and planewaves) or GAPW (Gaussian augmented GPW) method should be used. Pseudopotential-based calculations can be performed with both methods, however, all-electron calculations can only be performed with GAPW. GAPW may have more stable forces and require reduced cutoff but may be more expensive.

Depending on whether GPW or GAPW is used, suitable basis set and pseudopotential information should be provided. This is controlled by defining the basis_dict and potential_dict keywords in the CP2KTheory object. The chosen basis sets and pseudopotentials must be available in the specified basis and potential files. For all-electron GAPW calculations one should set value for each element in the potential_dict to 'ALL'.

#Defining MOLOPT basis sets and GTH pseudopotentials for each element
basis_dict={'C':'DZVP-MOLOPT-SR-GTH','O':'DZVP-MOLOPT-SR-GTH','H':'DZVP-MOLOPT-SR-GTH'}
potential_dict={'C':'GTH-PBE-q4','O':'GTH-PBE-q6','H':'GTH-PBE-q1'}
cp2k_object = CP2KTheory(basis_method='GPW', basis_dict=basis_dict,potential_dict=potential_dict,
        potential_file='POTENTIAL', basis_file='BASIS',)

Note that if the specified basis-file or potential-file is not in the current dir (or parent dir) then ASH will automatically copy a file containing GTH pseudopotentials (renamed from GTH_POTENTIALS to POTENTIAL) and MOLOPT basis sets (renamed from BASIS_MOLOPT to BASIS). This will only work if MOLOPT basis sets are being used. For all other basis sets, then the user must provide the basis and potential files.

For the planewave part of the basis set, the cutoff and rel_cutoff keywords can be used to control the cutoffs used. The number of grids also play a role in the accuracy of the calculation and can be controlled by the ngrids keyword (default is 4). Suitable cutoff values and grids require some experience or testing. See https://www.cp2k.org/howto:converging_cutoff for some information on how to choose cutoffs and grids. A reasonable value for the Cutoff is 250 Ry and a good value for the rel_cutoff is usually 60 Ry. These cutoff should be varied simultaneously. These are the ASH defaults but we don't have a lot of experience with CP2K so don't rely on these values. Some system setups (depends on elements, basis set and pseudopotential) may require larger values and other systems will run more efficiently with smaller values.

Periodic vs. non-periodic calculations

CP2K is a code first and foremost developed for the purpose of periodic calculations. It is nonetheless possible to perform non-periodic calculations and this is probably preferable for calculations on molecules in vacuum (to avoid PBC artifacts) and may also be beneficial for some QM/MM applications within ASH.

Regardless of whether the system is periodic or not, the system cell needs to be specified. The cell_dimensions (e.g. cell_dimensions=[10.0,10.0,10.0,90.0,90.0,90.0] or cell_vectors (cell_vectors=[[10.0,0.0,0.0],[0.0,10.0,0.0],[0.0,0.0,10.0]]) keywords should be used to define the box size. For non-periodic calculations this is necessary as the basis set and solver are based on the box dimensions. However, if cell information is not provided, then by default a cell size will be automatically estimated (by ASH) based on the molecule size and the qm_cell_shift_par parameter will extend the box by an additional amount (6.0 Angstrom by default).

Non-periodic calculations

For non-periodic calculations, the CP2KTheory object should be defined with periodic=False, this is the default. The cell should be specified as described above. Poisson solver should also be specified with the psolver keyword. The default is 'wavelet' which is probably the most efficient for non-periodic calculations. The accuracy of the solver can be controlled by the wavelet_scf_type keyword (see CP2K-manual-wavelet ). Another Poisson solver option is 'MT' (CP2K-manual-MT ). In the case of the MT solver the cell should be at least 2 as large as the charge density (i.e. the molecule). The cell can be smaller for the wavelet solver.

Periodic calculations

For periodic calculations, the CP2KTheory object should be defined with periodic=True. The periodic_type is by default 'XYZ' (i.e. PBC in all directions). The cell size should be specified as described above. Poisson solver options are : 'PERIODIC', 'WAVELET', 'MULTIPOLE' or 'IMPLICIT'. The PERIODIC solver is recommended (only available for full 3D periodicity).

QM/MM

QM/MM calculations are possible in the ASH interface to CP2K. Unlike most other QM-codes, however, regular electrostatic embedding is not available for DFT-methods in CP2K so instead we use the GEEP (Gaussian Expansion of Electrostatic Potential) approach available in CP2K. This approach expands the MM pointcharges as Gaussians. The GEEP approach is overall an improvement over traditional electrostatic embedding as it should prevent charge-leakage onto MM atoms (electron spill-out effect) which can be a larger problem for planewave-basis calculations. The GEEP approach, however, requires definition of radii on the MM-atoms which control the width of the Gaussians used to expand the MM pointcharges.

To use CP2K as QM-code in an ASH QM/MM calculation one needs be aware of a few things:

The cell size must be specified (either cell_dimension or cell_vectors) but counterintuitively it needs to be specified for the whole system (QM+MM) and not just the QM-part as CP2K needs this information.
Additionally the QM-cell size should be specified (where the electrons and basis sets are) and this should be a box encompassing the whole QM-region (slightly larger). The qm_cell_dims keyword can be used to specify this or alternatively ASH can also estimate the QM-cell size based on the QM-region size and the qm_cell_shift_par extension parameter (default 6). If the Poisson solver is wavelet, the QM-cell needs to be cubic (automatically done if the QM-cell size is estimated from the QM-region).
A QM/MM job with CP2K in ASH can either be periodic or non-periodic. For non-periodic calculations it is recommended to use the wavelet Poisson solver.
For periodic QM/MM calculations, one should typically set: periodic=True and psolver='PERIODIC'

One then should specify the QM/MM electrostatic coupling. For DFT only the Gaussian-based GEEP approach is available (coupling='GAUSSIAN') while coupling='COULOMB' is available for semi-empirical systems. GEEP can only be used with the wavelet or periodic Poisson solver (not 'MT'). The number of Gaussians used to expand each MM-center is controlled by the GEEP_num_gauss keyword (default=6). The width of the Gaussians depends on the defined MM-radius for each MM site which should vary according to the element. Element information of the MM-region is automatically passed onto CP2K and default MM-radii will be used:

#Element radii in Angstrom (will be converted to Bohrs by CP2K)
element_radii_for_cp2k = {'H':0.44,'He':0.44,'Li':0.6,'Be':0.6,'B':0.78,'C':0.78,'N':0.78,'O':0.78,'F':0.78,'Ne':0.78,
                    'Na':1.58,'Mg':1.58,'Al':1.67,'Si':1.67,'P':1.67,'S':1.67,'Cl':1.67,'Ar':1.67,
                    'K':1.52,'Ca':1.6,'Sc':1.6,'Ti':1.6,'V':1.6,'Cr':1.6,'Mn':1.6,'Fe':1.6,'Co':1.6,
                    'Ni':1.6,'Cu':1.6,'Zn':1.6,'Br':1.6, 'Mo':1.7}

If the user wants to use different MM radii for each element, then this can be specified with the mm_radii keyword which should point to a dictionary containing radii for each element present in the system. It is also possible to use the MM_radius_scaling keyword to scale the radii by a factor (default=1.0).

ASH handles the creation of linkatoms and charge-shifting at a QM-MM boundary and this information is provided to CP2K as a modified XYZ-file. It is unclear whether the automatic dipole-correction (addition of charges to maintain dipole), commonly employed in charge-shifted electrostatic embedding QM/MM is useful when combined with the GEEP procedure of CP2K. It is thus possible to turn it off with the dipole_correction keyword in the QMMMTheory object.

Examples

In the examples below it is assumed that the CP2K binaries are already in PATH (no need to use cp2kdir)

Minimal non-periodic geometry optimization of MeOH in vacuum:

from ash import *

numcores=2
frag = Fragment(xyzfile="MeOH.xyz",charge=0, mult=1)

#Basis set and pseudopotential information per element
basis_dict={'C':'DZVP-MOLOPT-SR-GTH','O':'DZVP-MOLOPT-SR-GTH','H':'DZVP-MOLOPT-SR-GTH'}
potential_dict={'C':'GTH-PBE-q4','O':'GTH-PBE-q6','H':'GTH-PBE-q1'}

#Minimal CP2KTheory definition: no periodicity, psolver=wavelet by default, basis_method='GAPW', cutoff=250,rel_cutoff=60
#Cell dimensions are estimated from molecule size
qm = CP2KTheory(cp2k_bin_name="cp2k.ssmp",basis_dict=basis_dict,potential_dict=potential_dict,functional='PBE',numcores=numcores)
#Geometry optimization
Optimizer(theory=qm, fragment=frag)

Periodic geometry optimization of MeOH in vacuum:

from ash import *

numcores=2
frag = Fragment(xyzfile="MeOH.xyz",charge=0, mult=1)

#Basis set and pseudopotential information per element
basis_dict={'C':'DZVP-MOLOPT-SR-GTH','O':'DZVP-MOLOPT-SR-GTH','H':'DZVP-MOLOPT-SR-GTH'}
potential_dict={'C':'GTH-PBE-q4','O':'GTH-PBE-q6','H':'GTH-PBE-q1'}

#Periodic CP2KTheory definition with specified cell dimensions
qm = CP2KTheory(cp2k_bin_name="cp2k.ssmp",basis_dict=basis_dict,potential_dict=potential_dict,functional='PBE',numcores=numcores,
                periodic=True,cell_dimensions=[10,10,10,90,90,90], psolver='periodic',basis_method='GPW', ngrids=4, cutoff=450, rel_cutoff=50)
#Geometry optimization
Optimizer(theory=qm, fragment=frag)

Non-periodic QM/MM geometry optimization and frequencies of a protein

Here using the simple solvated lysozyme protein as a test system with a threonine sidechain in the QM-region.

from ash import *

numcores=4

#Defining path to dir containing forcefield files and coordinates
forcefielddir="./"
psffile=forcefielddir+"step3_pbcsetup.psf"
topfile=forcefielddir+"top_all36_prot.rtf"
prmfile=forcefielddir+"par_all36_prot.prm"
xyzfile=forcefielddir+"coordinates.xyz"

#Read coordinates from XYZ-file
frag = Fragment(xyzfile=xyzfile)

#Creating OpenMM object from CHARMM-files
openmmobject = OpenMMTheory(psffile=psffile, CHARMMfiles=True, charmmtopfile=topfile,
    charmmprmfile=prmfile, periodic=True, periodic_cell_dimensions=[80.0, 80.0, 80.0, 90.0, 90.0, 90.0],
    autoconstraints=None, rigidwater=False)
#CP2KTheory object
basis_dict={'C':'DZVP-MOLOPT-SR-GTH','N':'DZVP-MOLOPT-SR-GTH','O':'DZVP-MOLOPT-SR-GTH','H':'DZVP-MOLOPT-SR-GTH'}
potential_dict={'C':'GTH-PBE-q4','N':'GTH-PBE-q5', 'O':'GTH-PBE-q6','H':'GTH-PBE-q1'}
functional='PBE'
#cell_dimensions are for full system (slight expansion was necessary
#QM-cell dimensions here defined manually
qm = CP2KTheory(basis_dict=basis_dict,potential_dict=potential_dict,functional=functional, psolver='wavelet', coupling='GAUSS',
    periodic=False, cell_dimensions=[82.0, 82.0, 82.0, 90.0, 90.0, 90.0], qm_cell_dims=[12.0,12.0,12.0], numcores=numcores)

#act and qmatoms lists. Defines QM-region (atoms described by QM) and Active-region (atoms allowed to move)
#IMPORTANT: atom indices begin at 0.
#Here selecting the side-chain of a threonine residue
qmatoms = [569,570,571,572,573,574,575,576]
actatoms = qmatoms

# Create QM/MM OBJECT by combining QM and MM objects above. Dipole-correction turned off.
qmmmobject = QMMMTheory(qm_theory=qm, mm_theory=openmmobject, printlevel=2, dipole_correction=False,
                        fragment=frag, embedding="Elstat", qmatoms=qmatoms, qm_charge=0, qm_mult=1)

#Run geometry optimization using geomeTRIC optimizer and HDLC coordinates. Using QM-region as active-region.
Optimizer(theory=qmmmobject, fragment=frag, ActiveRegion=True, actatoms=actatoms,
                    maxiter=200, coordsystem='hdlc', charge=0,mult=1)

#Partial numerical Hessian calculation
NumFreq(theory=qmmmobject, fragment=frag, hessatoms=actatoms)

Periodic QM/MM MD simulation of a protein

from ash import *

numcores=2

#Defining path to dir containing forcefield files and coordinates
forcefielddir="./"
psffile=forcefielddir+"step3_pbcsetup.psf"
topfile=forcefielddir+"top_all36_prot.rtf"
prmfile=forcefielddir+"par_all36_prot.prm"
xyzfile=forcefielddir+"coordinates.xyz"

#Read coordinates from XYZ-file
frag = Fragment(xyzfile=xyzfile)

#Creating OpenMM object
openmmobject = OpenMMTheory(psffile=psffile, CHARMMfiles=True, charmmtopfile=topfile,
    charmmprmfile=prmfile, periodic=True, periodic_cell_dimensions=[80.0, 80.0, 80.0, 90.0, 90.0, 90.0],
    do_energy_decomposition=True, autoconstraints=None, rigidwater=False)
#QM
basis_dict={'C':'DZVP-MOLOPT-SR-GTH','N':'DZVP-MOLOPT-SR-GTH','O':'DZVP-MOLOPT-SR-GTH','H':'DZVP-MOLOPT-SR-GTH'}
potential_dict={'C':'GTH-PBE-q4','N':'GTH-PBE-q5', 'O':'GTH-PBE-q6','H':'GTH-PBE-q1'}
functional='PBE'
qm = CP2KTheory(basis_dict=basis_dict,potential_dict=potential_dict,functional=functional, psolver='periodic', coupling='GAUSS',
    periodic=True, cell_dimensions=[82.0, 82.0, 82.0, 90.0, 90.0, 90.0], qm_cell_dims=[12.0,12.0,12.0], numcores=numcores)

#act and qmatoms lists. Defines QM-region (atoms described by QM) and Active-region (atoms allowed to move)
#IMPORTANT: atom indices begin at 0.
#Here selecting the side-chain of threonine
qmatoms = [569,570,571,572,573,574,575,576]
actatoms = qmatoms


# Create QM/MM OBJECT by combining QM and MM objects above
qmmmobject = QMMMTheory(qm_theory=qm, mm_theory=openmmobject, printlevel=2, dipole_correction=False,
                        fragment=frag, embedding="Elstat", qmatoms=qmatoms, qm_charge=0, qm_mult=1)

#MD simulation with 0.5 fs timestep for 2 ps. enforcePeriodicBox=False to prevent OpenMM PBC wrapping
OpenMM_MD(fragment=frag, theory=qmmmobject, timestep=0.0005, simulation_time=2, traj_frequency=1, temperature=300,
    integrator='LangevinIntegrator', coupling_frequency=1, enforcePeriodicBox=False)