CP2K interface ====================================== `CP2K `_ is a very popular and powerful periodic electronic structure program, particulary known for its unique mixed Gaussian and plane waves (GPW) approach and efficient parallelization which enables large-scale DFT MD simulations to be carried out. ASH features a reasonably flexible interface to CP2K allowing the use of CP2K both as a QM-only code within ASH or as part of QM/MM Theory in ASH. Energies and gradients are available in the interface so the CP2KTheory class in ASH can be used for single-point energies, geometry optimizations, numerical frequencies, surface scans, NEB and molecular dynamics. Neither the CP2K optimizer,MD or NEB routines are used, instead ASH handles all of this. Both periodic and non-periodic systems are supported to some extent. Additionally pointcharge-embedding is supported by the interface via the GEEP (Gaussian Expansion of Electrostatic Potential) approach in CP2K. This allows QM/MM calculations with CP2K as QM-code and OpenMM as MM-code within ASH. If the purpose is to primarily carry out periodic DFT MD simulations, running CP2K via the ASH interface may not offer many benefits over CP2K directly, however, running geometry optimizations, surface scans and NEB via ASH may be more convenient than using CP2K directly. Furthermore the QM/MM capabilities within ASH and the flexible forcefield support by OpenMM may be preferable to the CP2K options. **CP2KTheory class:** .. code-block:: python class CP2KTheory: def __init__(self, cp2kdir=None, cp2k_bin_name=None, filename='cp2k', printlevel=2, basis_dict=None, potential_dict=None, label="CP2K", periodic=False, periodic_type='XYZ', qm_periodic_type=None, xtb_periodic=False, xtb_type='GFN2', xtb_tblite=False, user_input_dft=None, vdwpotential=None, cell_dimensions=None, cell_vectors=None, qm_cell_dims=None, qm_cell_shift_par=6.0, wavelet_scf_type=40, functional=None, psolver='wavelet', potential_file='POTENTIAL', basis_file='BASIS', basis_method='GAPW', ngrids=4, cutoff=250, rel_cutoff=60, method='QUICKSTEP', numcores=1, parallelization='OMP', mixed_mpi_procs=None, mixed_omp_threads=None, center_coords=True, scf_maxiter=50, outer_scf_maxiter=10, scf_convergence=1e-6, eps_default=1e-10, coupling='GAUSSIAN', GEEP_num_gauss=6, MM_radius_scaling=1, mm_radii=None, OT=True, OT_minimizer='DIIS', OT_preconditioner='FULL_ALL', OT_linesearch='3PNT', outer_SCF=True, outer_SCF_optimizer='SD', OT_energy_gap=0.08): .. list-table:: :widths: 15 15 15 60 :header-rows: 1 * - Keyword - Type - Default value - Details * - ``cp2kdir`` - string - None - Directory where CP2K binaries are. * - ``cp2k_bin_name`` - string - None - Name of CP2K binary to use. If not provided, ASH will search for cp2k.ssmp, cp2k.sopt, cp2k.psmp in the cp2kdir directory or PATH. * - ``printlevel`` - integer - 2 - Printlevel * - ``filename`` - string - 'cp2k' - Name of CP2K inputfile created by ASH. * - ``numcores`` - integer - 1 - The number of CPU cores used. * - ``parallelization`` - string - 'OMP' - Parallelization strategy. Options: 'OMP', 'MPI', 'Mixed' * - ``mixed_mpi_procs`` - integer - 1 - If parallelization='Mixed': The number of MPI processes active. * - ``mixed_omp_threads`` - integer - None - If parallelization='Mixed': The number of OpenMP threads per MPI process. * - ``basis_dict`` - dict - None - Dictionary defining basis set information for each element. * - ``potential_dict`` - dict - None - Dictionary defining pseudopotential information for each element. * - ``label`` - string - 'CP2K' - Label for the CP2KTheory object. * - ``periodic`` - Boolean - False - Whether to use periodic boundary conditions or not. * - ``periodic_type`` - string - 'XYZ' - Type of PBC used. Options: 'XYZ', 'XY', 'XZ', 'YZ' etc. * - ``cell_dimensions`` - list - None - The cell dimensions defining the box (applies to both PBC and no-PBC). List of cell-lengths in Angstroms and cell-angles in degrees. * - ``cell_vectors`` - list of lists - None - Alternative way of specifying the cell dimensions. List of lists of cell-vectors in Angstroms. * - ``wavelet_scf_type`` - integer - 40 - Wavelet-only: Scaling function. Possible values: 8,14,16,20,24,30,40,50,60,100 * - ``functional`` - string - None - Name of DFT functional to use. * - ``psolver`` - string - 'wavelet' - Name of Poisson solver to use. Options: 'PERIODIC', 'MT', 'WAVELET', 'MULTIPOLE'. * - ``potential_file`` - string - 'POTENTIAL' - Name of potential file. * - ``basis_file`` - string - 'BASIS' - Name of basis set file. * - ``basis_method`` - string - 'GAPW' - Type of CP2K basis-set method to use. Options: 'GPW' (Gaussian-planewave with pseudopoentials), 'GAPW' (Gaussian augmented planewave) or 'XTB'. * - ``xtb_type`` - string - 'GFN2' - xTB-method type. Options: 'GFN2', 'GFN1', 'GFN0'. Only valid if basis_method='XTB'. * - ``xtb_tblite`` - Boolean - False - Whether to use the tblite library for xTB calculations instead of built-in xTB. Only valid if basis_method='XTB'. * - ``xtb_periodic`` - Boolean - False - Whether to use periodic boundary conditions for xTB calculations. Only valid if basis_method='XTB'. * - ``user_input_dft`` - string - None - Whether to use a user-defined DFT section in the CP2K input file. If None, ASH will generate the DFT section based on the other keywords. If a string is provided, it should either be a multi-line string containing the DFT section or a filename pointing to a file containing the DFT section. * - ``vdwpotential`` - string - None - Whether to use a built-in dispersion correction. Options: 'DFTD3', 'DFTD3BJ', 'DFTD4'. * - ``ngrids`` - integer - 4 - Number of real-space grids to use. * - ``kpoint_settings`` - list - None - k-point values to use, e.g. [2,2,2]. * - ``cutoff`` - integer - 250 - Planewave cutoff (in Ry) used. * - ``rel_cutoff`` - integer - 60 - Relative cutoff (in Ry) used. Controls which product Gaussians are mapped onto which level of the multi-grid. * - ``center_coords`` - Boolean - True - Whether CP2K centers coordinates or not. Usually necessary. * - ``scf_convergence`` - float - 1e-6 - SCF convergence in Hartree. * - ``scf_maxiter`` - integer - 50 - Max number of (inner) SCF iterations * - ``eps_default`` - float - 1e-10 - Overall CP2K convergence setting (see manual). Probably more useful than scf_convergence * - ``OT`` - Boolean - True - Whether the OT SCF method is on or not * - ``OT_minimizer`` - string - 'DIIS' - Type of minimizer method. Options: 'DIIS', 'CG', 'SD', 'BROYDEN' * - ``OT_preconditioner`` - string - 'FULL_ALL' - OT preconditioner. Options: 'FULL_ALL', 'FULL_SINGLE_INVERSE' and more * - ``OT_linesearch`` - string - '3PNT' - OT linesearch option. Options: '3PNT', '2PNT', 'NONE', 'GOLD'. * - ``outer_SCF`` - Boolean - True - Whether outer-SCF loop is on or not. * - ``outer_scf_maxiter`` - integer - 10 - How many outer SCF iterations to perform. * - ``outer_SCF_optimizer`` - string - 'DIIS' - Outer SCF optimizer method. Options: 'DIIS', 'CG' * - ``OT_energy_gap`` - float - 0.08 - Energy gap to use in OT method. * - ``method`` - string - 'QUICKSTEP' - The CP2K module to use. * - ``qm_periodic_type`` - string - None - QM/MM only: Type of PBC used for QM-part. Options: 'XYZ', 'XY', 'XZ', 'YZ' etc. * - ``qm_cell_dims`` - list - None - QM/MM only: List of cell-lengths in Angstroms for the QM region. Cell dimensions estimated if not provided. * - ``qm_cell_shift_par`` - float - None - QM/MM only: Parameter used to shift the QM cell dimensions based on molecule size. Only used if qm_cell_dims is not provided. * - ``coupling`` - string - 'GAUSSIAN' - QM/MM only: The type of QM/MM coupling used. Only 'GAUSSIAN' is supported for DFT. 'COULOMB' available for semi-empiricical systems. * - ``GEEP_num_gauss`` - integer - 6 - QM/MM only: Number of Gaussians used to expand each MM point charge. * - ``MM_radius_scaling`` - float - 1.0 - QM/MM only: Optional scaling factor of the MM radii. * - ``mm_radii`` - dict - None - QM/MM only: Optional dictionary of MM radii for each element. If not provided, radii are estimated internal table. ################################################################################ CP2K installation ################################################################################ CP2K can be installed in several different ways, see https://www.cp2k.org/download It can be tricky to compile. It is easiest to either download binaries (see link, though MPI-parallel version not typically available) or install via conda/mamba (see https://anaconda.org/conda-forge/cp2k). Github releases are here: https://github.com/cp2k/cp2k/releases/tag/v2026.1 Alternatively you can compile CP2K from source: https://github.com/cp2k/cp2k/blob/master/INSTALL.md Note that downloaded or compiled CP2K binaries may come in a few different forms: e.g. cp2k.ssmp, cp2k.sopt, ccp2k.popt, cp2k.psmp where sopt means serial-optimized, ssmp means single-process with OpenMP, popt means parallel-optimized with MPI and psmp means parallel-optimized with MPI and OpenMP. The cp2k.psmp binary is the most flexible and is recommended to use if available. We have had success installing CP2K via conda/mamba. However, the latest CP2K versions may not yet be available on conda-forge. and may have to be downloaded manually (Linux binaries are available on https://github.com/cp2k/cp2k/releases/tag/v2026.1 ). .. code-block:: shell #CP2K 2024.1 with OpenMPI and OpenBLAS: installs cp2k.psmp binary #Note: mamba install cp2k will install non-MPI version: cp2k.ssmp binary mamba install cp2k=2024.1=openblas_openmpi_h7c9ef3d_1 Once CP2K is installed you can try to use the ASH interface. ASH will try find the CP2K installation to use according to this logic: 1. if cp2kdir variable provided (containing path to where the binaries are) and cp2k_bin_name provided: use that binary in that directory 2. if cp2kdir variable provided but cp2k_bin_name NOT provided: search for cp2k.X executables in the cp2kdir directory 3. if cp2kdir variable NOT provided but cp2k_bin_name provided: search for cp2k_bin_name in PATH 4. if cp2kdir variable NOT provided and cp2k_bin_name not provided: search for cp2k.X executables in PATH Note that the search for executables will only work if the binaries are named: cp2k.X where X is one of: ssmp, sopt, popt, psmp. ASH will search for executables in this order: ["cp2k.psmp", "cp2k.popt", "cp2k.ssmp","cp2k.sopt"] ################################################################################ Parallelization ################################################################################ CP2K binaries differ in their parallelization capabilities: - **sopt**: no parallelization - **ssmp**: only OpenMP parallelization - **popt**: only MPI parallelization - **psmp**: mixed MPI and OpenMP parallelization. Primarily useful for massive parallelization (>10K cores). The CP2K manual advises to use the cp2k.psmp executable as it is the most flexible (can be used for serial, MPI, OpenMP and Mixed calculations). If this executable is available it is best to specify this in the CP2KTheory object using the *cp2k_bin_name* keyword. The parallelization that ASH will tell CP2K to use is controlled by the *parallelization* keyword when defining the CP2KTheory object in ASH. It can be se set to 'OMP' (OpenMP threading, default), 'MPI' or 'Mixed'. The best parallelization strategy depends on the system and the hardware and you may have to do your own benchmarks. **OpenMP parallelization** This is the easiest parallelization strategy to start using and is hence the default (*parallelization='OMP'*) in ASH. It requires either a cp2k.ssmp or cp2k.psmp executable. One simply has to specify the number of CPU cores to be used via the *numcores* keyword in CP2KTheory. 1 CP2K process (either cp2k.ssmp or cp2k.psmp) executable will be launched which will be capable of OpenMP threading up to the chosen number of cores. **MPI parallelization** For MPI-parallelization one should set *parallelization='MPI'*. It requires either the cp2k.popt or cp2k.psmp executable. Additionally a CP2K-compatible MPI program needs to be installed and in PATH. ASH assumes OpenMPI and will search for this. If *numcores=4* and *parallelization='MPI'* then 4 CP2K processes will be launched by the mpirun program. As discussed on https://www.cp2k.org/faq:mpi_vs_openmp CP2K is primarily MPI-parallelized and is thus probably a faster option than OpenMP overall. **Mixed OMP/MPI parallelization** The mixed OMP/MPI parallelization is only possible using the cp2k.psmp executable. This parallelization strategy is particularly useful for massively parallel calculations (thousands of cores) and may not be as beneficial for small systems or a small amount of CPU cores. But this depends on the system and hardware so test it out. To use one should set: *parallelization='Mixed'* , specify the total number of CPU cores by the *numcores* keyword and additionally one must specify how many MPI processes and how many OMP threads per process via the *mixed_mpi_procs* and *mixed_omp_threads* keywords. For example, if *numcores=8*, *mixed_mpi_procs=4* and *mixed_omp_threads=2* then 4 MPI processes will be used with 2 OMP threads used per process, for a total of 8 utilized CPU cores. Note that ASH will give an error if numcores is not equal to mixed_mpi_procs*mixed_omp_threads. ################################################################################ xTB in CP2K ################################################################################ CP2K is traditionally a DFT-code but also offer some semi-empirical tightbinding functionality, e.g. xTB. xTB methods come with a built-in basis set. To use CP2K for xTB calculations in ASH, the *basis_method* keyword should be set to 'XTB' (instead of default GAPW). and specify the method to use (e.g. 'GFN2-xTB') in the *functional* keyword. By default the internal xTB implementations in CP2K will be used (check CP2K manual for what is available) unless the *xtb_tblite* is set to True, in which case xtB will be run using the tblite library. Do note that xTB functionality in CP2K is undergoing changes. .. code-block:: python # GFN1-xTB built-in example cp2k_object = CP2KTheory(basis_method='XTB', xtb_type='GFN1') # GFN1-xTB built-in example with periodicity cp2k_object = CP2KTheory(basis_method='XTB', xtb_type='GFN1', xtb_periodic=True) # GFN2-xTB tblite example cp2k_object = CP2KTheory(basis_method='XTB', xtb_type='GFN2', xtb_tblite=True) ################################################################################ Controlling the basis set in DFT calculations ################################################################################ The primary purpose of using CP2K is probably to take advantage of the efficient mixed Gaussian and plane wave (GPW) approach where Gaussians are used to calculate the 1-electron integrals and plane waves are used to calculate the 2-electron integrals. Furthermore the user should specify whether the standard GPW (Gaussian and planewaves) or GAPW (Gaussian augmented GPW) method should be used via the *basis_method* keyword. Pseudopotential-based calculations can be performed with both methods, however, all-electron calculations can only be performed with GAPW. GAPW may have more stable forces and require reduced cutoff but may be more expensive. Depending on whether GPW or GAPW is used, suitable basis set and pseudopotential information should be provided. This is controlled by defining the *basis_dict* and *potential_dict* keywords in the CP2KTheory object. The chosen basis sets and pseudopotentials must be available in the specified basis and potential files. For all-electron GAPW calculations one should set value for each element in the potential_dict to 'ALL'. .. code-block:: python #Defining MOLOPT basis sets and GTH pseudopotentials for each element basis_dict={'C':'DZVP-MOLOPT-SR-GTH','O':'DZVP-MOLOPT-SR-GTH','H':'DZVP-MOLOPT-SR-GTH'} potential_dict={'C':'GTH-PBE-q4','O':'GTH-PBE-q6','H':'GTH-PBE-q1'} cp2k_object = CP2KTheory(basis_method='GPW', basis_dict=basis_dict,potential_dict=potential_dict, potential_file='POTENTIAL', basis_file='BASIS',) Note that if the specified basis-file or potential-file is not in the current dir (or parent dir) then ASH will automatically copy a file containing GTH pseudopotentials (renamed from GTH_POTENTIALS to POTENTIAL) and MOLOPT basis sets (renamed from BASIS_MOLOPT to BASIS). This will only work if MOLOPT basis sets are being used. For all other basis sets, then the user must provide the basis and potential files. For the planewave part of the basis set, the *cutoff* and *rel_cutoff* keywords can be used to control the cutoffs used. The number of grids also play a role in the accuracy of the calculation and can be controlled by the *ngrids* keyword (default is 4). Suitable cutoff values and grids require some experience or testing. See https://www.cp2k.org/howto:converging_cutoff for some information on how to choose cutoffs and grids. A reasonable value for the Cutoff is 250 Ry and a good value for the rel_cutoff is usually 60 Ry. These cutoff should be varied simultaneously. These are the ASH defaults but we don't have a lot of experience with CP2K so don't rely on these values. Some system setups (depends on elements, basis set and pseudopotential) may require larger values and other systems will run more efficiently with smaller values. ################################################################################ Periodic vs. non-periodic calculations ################################################################################ CP2K is a code first and foremost developed for the purpose of periodic calculations. It is nonetheless possible to perform non-periodic calculations and this is probably preferable for calculations on molecules in vacuum (to avoid PBC artifacts) and may also be beneficial for some QM/MM applications within ASH. Regardless of whether the system is periodic or not, the system cell needs to be specified. The *cell_dimensions* (e.g. *cell_dimensions=[10.0,10.0,10.0,90.0,90.0,90.0]* or *cell_vectors* (*cell_vectors=[[10.0,0.0,0.0],[0.0,10.0,0.0],[0.0,0.0,10.0]]*) keywords should be used to define the box size. For non-periodic calculations this is necessary as the basis set and solver are based on the box dimensions. However, if cell information is not provided, then by default a cell size will be automatically estimated (by ASH) based on the molecule size and the *qm_cell_shift_par* parameter will extend the box by an additional amount (6.0 Angstrom by default). **Non-periodic calculations** For non-periodic calculations, the CP2KTheory object should be defined with *periodic=False*, this is the default. The cell should be specified as described above. Poisson solver should also be specified with the psolver keyword. The default is 'wavelet' which is probably the most efficient for non-periodic calculations. The accuracy of the solver can be controlled by the *wavelet_scf_type* keyword (see `CP2K-manual-wavelet `_ ). Another Poisson solver option is 'MT' (`CP2K-manual-MT `_ ). In the case of the MT solver the cell should be at least 2 as large as the charge density (i.e. the molecule). The cell can be smaller for the wavelet solver. **Periodic calculations** For periodic calculations, the CP2KTheory object should be defined with *periodic=True*. The *periodic_type* is by default 'XYZ' (i.e. PBC in all directions). The cell size should be specified as described above. Poisson solver options are : 'PERIODIC', 'WAVELET', 'MULTIPOLE' or 'IMPLICIT'. The PERIODIC solver is recommended (only available for full 3D periodicity). **Dispersion corrections** CP2K supports a few dispersion correction options (see manual) including D3, D3BJ and D4 that can be used for functionals that have parameters available. To use these in ASH, use the *vdwpotential* keyword which should be set to either 'DFTD3', 'DFTD3BJ' or 'DFTD4'. For other dispersion corrections, the user has to provide the necessary input via the *user_input_dft* keyword as described in the next section. **K-point sampling** For large unit cells, the default gamma-point calculations are sufficient. For smaller or anisotropic unit cells, k-point sampling may be required. k-point sampling settings can be adjusted in ASH by the *kpoint_settings* keyword. By setting the *kpoint_settings* keyword to be equal to [2,2,2] in CP2KTheory, the CP2K inputfile is modified to include the block: .. code-block:: text &KPOINTS SCHEME MONKHORST-PACK 2 2 2 &END KPOINTS If more advanced k-point settings are required, see CP2K manual and use the advanced control option discussed below. ################################################################################ Advanced control of the CP2K input ################################################################################ The CP2K inputfile is complicated and the ASH interface was written partially for the purpose of simplifying the use of CP2K for common use-cases but also to make sure the ASH controls specifically how a CP2K energy+gradient calculation is carried out. However, since CP2K has a lot of options, it is impossible to cover all possible use-cases via the keyword syntax. To allow for more flexibility for advanced CP2K usage, ASH allows the user to provide their own DFT-section of the CP2K input file (the section that begins and ends with the &DFT ... &END DFT block). This option is enabled by the *user_input_dft* keyword and the user can provide either a file or a string containing the DFT section of the CP2K input file. If this option is used then ASH will not write it's own DFT-section (all DFT-related keywords will be ignored) but will still write the rest of the CP2K input file according to the ASH settings. Below is an example of how to use the *user_input_dft* keyword to provide a custom DFT section for a CP2KTheory object in ASH. In this example, the DFT section is defined via a multi-line string but it could also be read from a file. .. code-block:: python from ash import * numcores=2 frag = Fragment(xyzfile="meoh.xyz",charge=0, mult=1) #Basis set and pseudopotential information per element basis_dict={'C':'DZVP-MOLOPT-SR-GTH','O':'DZVP-MOLOPT-SR-GTH','H':'DZVP-MOLOPT-SR-GTH'} potential_dict={'C':'GTH-PBE-q4','O':'GTH-PBE-q6','H':'GTH-PBE-q1'} # Define whole DFT-section via multi-line string user_input_dft=""" &DFT &SCF SCF_GUESS RESTART MAX_SCF 500 EPS_SCF 1e-06 &OUTER_SCF OPTIMIZER SD MAX_SCF 10 &END OUTER_SCF &END SCF CHARGE 0 MULTIPLICITY 1 BASIS_SET_FILE_NAME BASIS POTENTIAL_FILE_NAME POTENTIAL &POISSON PERIODIC NONE PSOLVER wavelet &WAVELET wavelet SCF_TYPE 40 &END WAVELET wavelet &END POISSON &PRINT &MO EIGENVALUES .TRUE. &END MO &END PRINT &XC &XC_FUNCTIONAL None &END XC_FUNCTIONAL &END XC &END DFT """ #NOTE ALTERNATIVE: Define DFT section in a separate file and point to it with the user_input_dft keyword #user_input_dft="dftsection.inp" #CP2KTheory definition with user-defined DFT section qm = CP2KTheory(cp2k_bin_name="cp2k.ssmp",basis_dict=basis_dict,potential_dict=potential_dict, functional='PBE',numcores=numcores, user_input_dft=user_input_dft, periodic=True,cell_dimensions=[10,10,10,90,90,90], psolver='periodic',basis_method='GPW', ngrids=4, cutoff=450, rel_cutoff=50) Only the &DFT section can currently be modified by the user, the rest of the input file is still generated by ASH and cannot be modified by the user (this is partially to make sure user-options won't interfere with ASH's CP2K control). To make sure the user-input DFT section is compatible with ASH's control of the CP2K input one can first run a regular single-point dummy job without *user_input_dft* (the job can be stopped after 2 seconds), copy the ASH-generated CP2K input file (cp2k.inp) to a new file dftsection.inp, delete everything except the &DFT section (i.e. delete &GLOBAL, &FORCE_EVAL, &SUBSYS sections). The file should begin with "&DFT" and end with "&END DFT" and the indentation can be kept as is. This makes sure that all the necessary settings for the DFT section are included (e.g. charge and multiplicity) and compatible with ASH's control of the CP2K input file. This file can then be modified as desired (SCF convergence settings, grids, alternative methods etc.) and will be read by ASH as long as the *user_input_dft* keyword is set to point to this file. ################################################################################ QM/MM ################################################################################ QM/MM calculations are possible in the ASH interface to CP2K. Unlike most other QM-codes, however, regular electrostatic embedding is not available for DFT-methods in CP2K so instead we use the GEEP (Gaussian Expansion of Electrostatic Potential) approach available in CP2K. This approach expands the MM pointcharges as Gaussians. The GEEP approach is overall an improvement over traditional electrostatic embedding as it should prevent charge-leakage onto MM atoms (electron spill-out effect) which can be a larger problem for planewave-basis calculations. The GEEP approach, however, requires definition of radii on the MM-atoms which control the width of the Gaussians used to expand the MM pointcharges. To use CP2K as QM-code in an ASH QM/MM calculation one needs be aware of a few things: - The cell size must be specified (either *cell_dimension* or *cell_vectors*) but counterintuitively it needs to be specified for the whole system (QM+MM) and not just the QM-part as CP2K needs this information. - Additionally the QM-cell size should be specified (where the electrons and basis sets are) and this should be a box encompassing the whole QM-region (slightly larger). The *qm_cell_dims* keyword can be used to specify this or alternatively ASH can also estimate the QM-cell size based on the QM-region size and the *qm_cell_shift_par* extension parameter (default 6). If the Poisson solver is wavelet, the QM-cell needs to be cubic (automatically done if the QM-cell size is estimated from the QM-region). - A QM/MM job with CP2K in ASH can either be periodic or non-periodic. For non-periodic calculations it is recommended to use the wavelet Poisson solver. - For periodic QM/MM calculations, one should typically set: *periodic=True* and *psolver='PERIODIC'* One then should specify the QM/MM electrostatic coupling. For DFT only the Gaussian-based GEEP approach is available (*coupling='GAUSSIAN'*) while *coupling='COULOMB'* is available for semi-empirical systems. GEEP can only be used with the wavelet or periodic Poisson solver (not 'MT'). The number of Gaussians used to expand each MM-center is controlled by the *GEEP_num_gauss keyword* (default=6). The width of the Gaussians depends on the defined MM-radius for each MM site which should vary according to the element. Element information of the MM-region is automatically passed onto CP2K and default MM-radii will be used: .. code-block:: python #Element radii in Angstrom (will be converted to Bohrs by CP2K) element_radii_for_cp2k = {'H':0.44,'He':0.44,'Li':0.6,'Be':0.6,'B':0.78,'C':0.78,'N':0.78,'O':0.78,'F':0.78,'Ne':0.78, 'Na':1.58,'Mg':1.58,'Al':1.67,'Si':1.67,'P':1.67,'S':1.67,'Cl':1.67,'Ar':1.67, 'K':1.52,'Ca':1.6,'Sc':1.6,'Ti':1.6,'V':1.6,'Cr':1.6,'Mn':1.6,'Fe':1.6,'Co':1.6, 'Ni':1.6,'Cu':1.6,'Zn':1.6,'Br':1.6, 'Mo':1.7} If the user wants to use different MM radii for each element, then this can be specified with the *mm_radii* keyword which should point to a dictionary containing radii for each element present in the system. It is also possible to use the *MM_radius_scaling* keyword to scale the radii by a factor (default=1.0). ASH handles the creation of linkatoms and charge-shifting at a QM-MM boundary and this information is provided to CP2K as a modified XYZ-file. It is unclear whether the automatic dipole-correction (addition of charges to maintain dipole), commonly employed in charge-shifted electrostatic embedding QM/MM is useful when combined with the GEEP procedure of CP2K. It is thus possible to turn it off with the *dipole_correction* keyword in the QMMMTheory object. ################################################################################ Examples ################################################################################ In the examples below it is assumed that the CP2K binaries are already in PATH (no need to use cp2kdir) **Minimal non-periodic geometry optimization of MeOH in vacuum:** .. code-block:: python from ash import * numcores=2 frag = Fragment(xyzfile="MeOH.xyz",charge=0, mult=1) #Basis set and pseudopotential information per element basis_dict={'C':'DZVP-MOLOPT-SR-GTH','O':'DZVP-MOLOPT-SR-GTH','H':'DZVP-MOLOPT-SR-GTH'} potential_dict={'C':'GTH-PBE-q4','O':'GTH-PBE-q6','H':'GTH-PBE-q1'} #Minimal CP2KTheory definition: no periodicity, psolver=wavelet by default, basis_method='GAPW', cutoff=250,rel_cutoff=60 #Cell dimensions are estimated from molecule size qm = CP2KTheory(cp2k_bin_name="cp2k.ssmp",basis_dict=basis_dict,potential_dict=potential_dict,functional='PBE',numcores=numcores) #Geometry optimization Optimizer(theory=qm, fragment=frag) **Periodic geometry optimization of MeOH in vacuum:** .. code-block:: python from ash import * numcores=2 frag = Fragment(xyzfile="MeOH.xyz",charge=0, mult=1) #Basis set and pseudopotential information per element basis_dict={'C':'DZVP-MOLOPT-SR-GTH','O':'DZVP-MOLOPT-SR-GTH','H':'DZVP-MOLOPT-SR-GTH'} potential_dict={'C':'GTH-PBE-q4','O':'GTH-PBE-q6','H':'GTH-PBE-q1'} #Periodic CP2KTheory definition with specified cell dimensions qm = CP2KTheory(cp2k_bin_name="cp2k.ssmp",basis_dict=basis_dict,potential_dict=potential_dict,functional='PBE',numcores=numcores, periodic=True,cell_dimensions=[10,10,10,90,90,90], psolver='periodic',basis_method='GPW', ngrids=4, cutoff=450, rel_cutoff=50) #Geometry optimization Optimizer(theory=qm, fragment=frag) **Non-periodic QM/MM geometry optimization and frequencies of a protein** Here using the simple solvated lysozyme protein as a test system with a threonine sidechain in the QM-region. .. code-block:: python from ash import * numcores=4 #Defining path to dir containing forcefield files and coordinates forcefielddir="./" psffile=forcefielddir+"step3_pbcsetup.psf" topfile=forcefielddir+"top_all36_prot.rtf" prmfile=forcefielddir+"par_all36_prot.prm" xyzfile=forcefielddir+"coordinates.xyz" #Read coordinates from XYZ-file frag = Fragment(xyzfile=xyzfile) #Creating OpenMM object from CHARMM-files openmmobject = OpenMMTheory(psffile=psffile, CHARMMfiles=True, charmmtopfile=topfile, charmmprmfile=prmfile, periodic=True, periodic_cell_dimensions=[80.0, 80.0, 80.0, 90.0, 90.0, 90.0], autoconstraints=None, rigidwater=False) #CP2KTheory object basis_dict={'C':'DZVP-MOLOPT-SR-GTH','N':'DZVP-MOLOPT-SR-GTH','O':'DZVP-MOLOPT-SR-GTH','H':'DZVP-MOLOPT-SR-GTH'} potential_dict={'C':'GTH-PBE-q4','N':'GTH-PBE-q5', 'O':'GTH-PBE-q6','H':'GTH-PBE-q1'} functional='PBE' #cell_dimensions are for full system (slight expansion was necessary #QM-cell dimensions here defined manually qm = CP2KTheory(basis_dict=basis_dict,potential_dict=potential_dict,functional=functional, psolver='wavelet', coupling='GAUSS', periodic=False, cell_dimensions=[82.0, 82.0, 82.0, 90.0, 90.0, 90.0], qm_cell_dims=[12.0,12.0,12.0], numcores=numcores) #act and qmatoms lists. Defines QM-region (atoms described by QM) and Active-region (atoms allowed to move) #IMPORTANT: atom indices begin at 0. #Here selecting the side-chain of a threonine residue qmatoms = [569,570,571,572,573,574,575,576] actatoms = qmatoms # Create QM/MM OBJECT by combining QM and MM objects above. Dipole-correction turned off. qmmmobject = QMMMTheory(qm_theory=qm, mm_theory=openmmobject, printlevel=2, dipole_correction=False, fragment=frag, embedding="Elstat", qmatoms=qmatoms, qm_charge=0, qm_mult=1) #Run geometry optimization using geomeTRIC optimizer and HDLC coordinates. Using QM-region as active-region. Optimizer(theory=qmmmobject, fragment=frag, ActiveRegion=True, actatoms=actatoms, maxiter=200, coordsystem='hdlc', charge=0,mult=1) #Partial numerical Hessian calculation NumFreq(theory=qmmmobject, fragment=frag, hessatoms=actatoms) **Periodic QM/MM MD simulation of a protein** .. code-block:: python from ash import * numcores=2 #Defining path to dir containing forcefield files and coordinates forcefielddir="./" psffile=forcefielddir+"step3_pbcsetup.psf" topfile=forcefielddir+"top_all36_prot.rtf" prmfile=forcefielddir+"par_all36_prot.prm" xyzfile=forcefielddir+"coordinates.xyz" #Read coordinates from XYZ-file frag = Fragment(xyzfile=xyzfile) #Creating OpenMM object openmmobject = OpenMMTheory(psffile=psffile, CHARMMfiles=True, charmmtopfile=topfile, charmmprmfile=prmfile, periodic=True, periodic_cell_dimensions=[80.0, 80.0, 80.0, 90.0, 90.0, 90.0], do_energy_decomposition=True, autoconstraints=None, rigidwater=False) #QM basis_dict={'C':'DZVP-MOLOPT-SR-GTH','N':'DZVP-MOLOPT-SR-GTH','O':'DZVP-MOLOPT-SR-GTH','H':'DZVP-MOLOPT-SR-GTH'} potential_dict={'C':'GTH-PBE-q4','N':'GTH-PBE-q5', 'O':'GTH-PBE-q6','H':'GTH-PBE-q1'} functional='PBE' qm = CP2KTheory(basis_dict=basis_dict,potential_dict=potential_dict,functional=functional, psolver='periodic', coupling='GAUSS', periodic=True, cell_dimensions=[82.0, 82.0, 82.0, 90.0, 90.0, 90.0], qm_cell_dims=[12.0,12.0,12.0], numcores=numcores) #act and qmatoms lists. Defines QM-region (atoms described by QM) and Active-region (atoms allowed to move) #IMPORTANT: atom indices begin at 0. #Here selecting the side-chain of threonine qmatoms = [569,570,571,572,573,574,575,576] actatoms = qmatoms # Create QM/MM OBJECT by combining QM and MM objects above qmmmobject = QMMMTheory(qm_theory=qm, mm_theory=openmmobject, printlevel=2, dipole_correction=False, fragment=frag, embedding="Elstat", qmatoms=qmatoms, qm_charge=0, qm_mult=1) #MD simulation with 0.5 fs timestep for 2 ps. enforcePeriodicBox=False to prevent OpenMM PBC wrapping OpenMM_MD(fragment=frag, theory=qmmmobject, timestep=0.0005, simulation_time=2, traj_frequency=1, temperature=300, integrator='LangevinIntegrator', coupling_frequency=1, enforcePeriodicBox=False)