Skip to content
Snippets Groups Projects
Commit f931a9df authored by James Graham's avatar James Graham
Browse files

Mostly complete tutorial - needs rgyr section

parent 2656bfb5
No related branches found
No related tags found
No related merge requests found
......@@ -2,7 +2,8 @@ PyCGTOOL Tutorial
=================
This tutorial follows the complete process of parametrising a new molecule within the MARTINI forcefield, covering aspects of mapping design, model generation and model validation.
PyCGTOOL is used at multiple stages, showing its use in several different situations.
PyCGTOOL is used at multiple stages, showing its use in several different situations
All files required to follow this tutorial are present in the ``doc/tutorial_files`` directory.
The molecule chosen as a target for this parametrisation is the :math:`\beta_1` antagonist atenolol.
......@@ -10,12 +11,12 @@ The molecule chosen as a target for this parametrisation is the :math:`\beta_1`
Atomistic Simulation
--------------------
The reference simulation for the parametrisation of atenolol was performed using the GROMOS 54A7 united atom forcefield with a topology from the `ATB database <https://atb.uq.edu.au/molecule.py?molid=23433>`_.
A single molecule of atenolol was solvated and equilibrated, before collecting a 50 ns trajectory.
Currently, PyCGTOOL is limited to trajectories in the GROMACS XTC format, with a single frame GRO for residue information.
A single molecule of atenolol was solvated and equilibrated, before collecting a 50 ns trajectory using the GROMACS molecular dynamics simulator.
A reduced copy of this trajectory is provided in the tutorial files since the original is prohibitively large.
Mapping Design
--------------
Designing a suitable mapping from the atomistic to the coarse-grained representation requires some experience and a degree of `chemical intuition`, but the ease with which the mapping may be modified using PyCGTOOL allows the mapping to be iterated much more quickly.
Designing a suitable mapping from the atomistic to the coarse-grained representation requires some experience and a degree of `chemical intuition`, but the ease with which the mapping may be modified using PyCGTOOL allows the mapping to be iterated much more quickly than if the construction of the CG model were to be performed manually.
The process of designing a mapping involves splitting the molecule into fragments, each of which contains approximately four heavy atoms.
Start by finding functional groups such as amides or carboxylates, each of which may become a single bead.
......@@ -24,18 +25,110 @@ Finally, divide any remaining parts of the molecule into beads of four heavy ato
The ideal bead will contain four heavy atoms and be nearly spherical, but this is not always possible.
If any atoms remain after clusters of four have been allocated, it may be required to use a mapping of three-to-one for some beads.
After the atoms have been allocated to beads, determine which beads should be bonded by copying the bonds between their component atoms.
After the atoms have been allocated to beads, determine which beads should be bonded by inspection of the atomistic structure; bonds between functional groups in the atomistic structure become bonds between beads in the CG structure.
It will probably be the case that there is no obviously best mapping and bond topology, which is not at this point a major issue as multiple mappings can be assessed easily.
Once the mapping and bond topology have been determined, they must be put into a format readable by PyCGTOOL.
This format is as described in the introduction to PyCGTOOL.
The mapping file must contain a molecule header which is the residue name enclosed in square brackets.
Each subsequent line represents a single CG bead and is of the structure (where items in square brackets are optional)::
<bead name> <MARTINI type> [<bead charge>] <component atom 1> [<component atom 2> ...]
The bonding file has a similar structure with a molecule header followed by lines defining each bond in the format::
<bead 1> <bead 2>
It is not necessary to define angles since PyCGTOOL is able to infer these from the bond network.
If only a subset of the possible angles require parameters then they may be listed, preventing automatic enumeration of all possible angles.
The mapping used in the paper to describe atenolol (just one of the several possible mappings) is described by the following input files.
``atenolol.map``::
[36KB]
N1 P1 N1 C12 C13 C14
O1 P1 O3 C1 C2 C3
O2 SP3 O2 C4
C1 SC3 C5 C6
C2 SC3 C8 C9
C3 SC3 C7 C10
N2 P5 C11 O1 N2
``atenolol.bnd``::
[36KB]
N1 O1
O1 O2
O2 C1
O2 C2
C1 C2
C1 C3
C2 C3
C3 N2
Model Generation
----------------
The process of model generation after having created the mapping and bond definition files is automated by PyCGTOOL.
In the simplest case, a parameter set may be generated simply by passing the four input files to PyCGTOOL::
pycgtool.py -g atenolol.gro -x atenolol.xtc -m atenolol.map -b atenolol.bnd
This will create two output files ``out.gro``, the mapped CG coordinates, and ``out.itp``, the calculated CG model parameters.
Running the CG Simulation
-------------------------
Note: These instructions assume the use of GROMACS 5.0 or newer.
From this stage the usual preparation of MARTINI/GROMACS simulations applies.
The output coordinates ``out.gro`` must be solvated using the GROMACS tool `gmx solvate` with the options::
gmx solvate -cp out.gro -cs ../../data/water.gro -o solv.gro -radius 0.21
Since MARTINI water cannot be automatically added to the `.top` file, this must be done manually.
Add the line "W 251" to the bottom of the `.top` file, since 251 should be the number of water molecules added by `gmx solvate`.
The three stages of simulation: minimisation, equilibration, and production may then be run::
gmx grompp -f em.mdp -c solv.gro -p topol.top -o em.tpr
gmx mdrun -deffnm em
gmx grompp -f npt.mdp -c em.gro -p topol.top -o npt.tpr
gmx mdrun -deffnm npt -v
gmx grompp -f md.mdp -c em.gro -p topol.top -o md.tpr
gmx mdrun -deffnm md -v
These simulations should take a few minutes on a modern desktop.
Model Validation
----------------
It is recommended to perform validation before using a generated CG model for production simulations, so that we may have confidence in its ability to replicate the behaviour of the molecule being studied.
The methods of validation applied in the PyCGTOOL paper are a comparison of the distribution of bonded term measurements between the CG test simulation and the atomistic reference simulation, and a comparison of the radius of gyration between these two simulations.
Additionally, other methods of validation should be applied relevant to the class of molecule being studied; for instance, validation of membrane lipids should compare the membrane thickness and surface area per lipid to the atomistic reference.o
To compare the distribution of bonded terms, we must first rerun PyCGTOOL to generate samples of the bonded measurements.
For the atomistic reference simulation, this can be done by running::
pycgtool.py -g atenolol.gro -x atenolol.xtc -m atenolol.map -b atenolol.bnd --advanced
In the menu, set the advanced option `dump_measurements` to `True` by selecting it with the arrow keys and toggling with the enter key.
Once this option has been set, continue by pressing the q key.
PyCGTOOL will now output a sample of each measured bond length and angle (but since the reference trajectory is short, the target sample size is not met and all values are collected), in the files ``36KB_length.dat`` and ``36KB_angle.dat``.
Since we will be collecting samples of the same measurements from the CG simulation, these files should be renamed to, for instance, `ref_length.dat` and `ref_angle.dat`.
Collect the same samples for the CG simulation using::
pycgtool.py -g md.gro -x md.xtc -b atenolol.bnd
Since we provide a bond file, but not a mapping file, PyCGTOOL will know that this is intended to simply collect bond measurements and will automatically set the `dump_measurements` option to `True`.
Again, the files created will be called ``36KB_length.dat`` and ``36KB_angle.dat``.
These samples were compared in the paper using an R script to generate a series of boxplots, but a simpler Python script is provided which may be used to compare the mean and standard deviations of the samples::
average_columns.py ref_length.dat 36KB_length.dat
average_columns.py ref_angle.dat 36KB_angle.dat
If the automatically generated parameters provide an accurate representation of the reference structure, the percentage error between the two samples will be small.
Validation of the more general molecular conformation may be performed by comparison of the radius of gyration of the reference and CG models.
all: pycgtool cgprepare em npt md
pycgtool:
../../pycgtool.py -g ref.gro -x ref.xtc -m atenolol.map -b atenolol.bnd
cgprepare:
gmx solvate -cp out.gro -cs ../../data/water.gro -o solv.gro -radius 0.21
cp template.top topol.top
echo "W 251" >> topol.top
em:
gmx grompp -f em.mdp -c solv.gro -p topol.top -o em.tpr
gmx mdrun -deffnm em
npt:
gmx grompp -f npt.mdp -c em.gro -p topol.top -o npt.tpr
gmx mdrun -deffnm npt -v
md:
gmx grompp -f md.mdp -c npt.gro -p topol.top -o md.tpr
gmx mdrun -deffnm md -v
rgyr:
gmx gyrate -f ref.xtc -s ref.tpr -o ref-gyr.xvg
gmx gyrate -f md.xtc -s md.tpr -o cg-gyr.xvg
clean:
git clean -i
[36KB]
N1 O1
O1 O2
O2 C1
O2 C2
C1 C2
C1 C3
C2 C3
C3 N2
[36KB]
N1 P1 N1 C12 C13 C14
O1 P1 O3 C1 C2 C3
O2 SP3 O2 C4
C1 SC2 C5 C6
C2 SC2 C8 C9
C3 SC3 C7 C10
N2 P5 C11 O1 N2
#!/usr/bin/env python3
"""
Calculate mean and standard deviation of column data using Welford's one pass algorithm
"""
import sys
import numpy as np
np.set_printoptions(formatter={"float": lambda x: "{0:10.3f}".format(x)})
def average_columns(filename):
with open(filename) as f:
mean, sdev = None, None
nlines = 0
for line in f:
if not line.strip():
continue
try:
cols = np.array(list(map(float, line.split())))
except ValueError:
continue
if mean is None:
mean = np.zeros(len(cols))
sdev = np.zeros(len(cols))
nlines += 1
delta = cols - mean
mean += delta / nlines
sdev += delta * (cols - mean)
sdev = np.sqrt(sdev / (nlines - 1))
print("File: {0}".format(filename))
print("Mean: {0}".format(mean))
print("Sdev: {0}".format(sdev))
print()
return (mean, sdev)
def compare_files(filename1, filename2):
means = (average_columns(filename1)[0], average_columns(filename2)[0])
print("Diff: {0}".format(means[1] - means[0]))
print("%Dif: {0}".format(100. * (means[1] - means[0]) / means[0]))
if __name__ == "__main__":
if len(sys.argv) == 2:
average_columns(sys.argv[1])
elif len(sys.argv) == 3:
compare_files(sys.argv[1], sys.argv[2])
else:
print("Give one filename to average columns, or two filenames to compare.")
; 7.3.2 Preprocessing
define = ; defines to pass to the preprocessor
; 7.3.3 Run Control
integrator = steep ; steepest descents energy minimization
nsteps = 2000 ; maximum number of steps to integrate
; 7.3.5 Energy Minimization
emtol = 0 ; [kJ/mol/nm] minimization is converged when max force is < emtol
emstep = 0.01 ; [nm] initial step-size
; 7.3.8 Output Control
nstxout = 100 ; [steps] freq to write coordinates to trajectory
nstvout = 100 ; [steps] freq to write velocities to trajectory
nstfout = 100 ; [steps] freq to write forces to trajectory
nstlog = 1 ; [steps] freq to write energies to log file
nstenergy = 1 ; [steps] freq to write energies to energy file
energygrps = System ; group(s) to write to energy file
; 7.3.9 Neighbor Searching
nstlist = 1 ; [steps] freq to update neighbor list
ns_type = grid ; method of updating neighbor list
pbc = xyz ; periodic boundary conditions in all directions
rlist = 0.8 ; [nm] cut-off distance for the short-range neighbor list
cutoff-scheme = verlet
; 7.3.10 Electrostatics
coulombtype = cut-off ; Particle-Mesh Ewald electrostatics
rcoulomb = 0.8 ; [nm] distance for Coulomb cut-off
; 7.3.11 VdW
vdwtype = cut-off ; twin-range cut-off with rlist where rvdw >= rlist
rvdw = 0.8 ; [nm] distance for LJ cut-off
;
; STANDARD MD INPUT OPTIONS FOR MARTINI 2.x
; Updated 15 Jul 2015 by DdJ
;
; for use with GROMACS 5
; For a thorough comparison of different mdp options in combination with the Martini force field, see:
; D.H. de Jong et al., Martini straight: boosting performance using a shorter cutoff and GPUs, submitted.
title = Martini
; TIMESTEP IN MARTINI
; Most simulations are numerically stable with dt=40 fs,
; however better energy conservation is achieved using a
; 20-30 fs timestep.
; Time steps smaller than 20 fs are not required unless specifically stated in the itp file.
integrator = md
dt = 0.02
nsteps = 2500000
nstcomm = 100
comm-grps =
nstxout = 0
nstvout = 0
nstfout = 0
nstlog = 1000
nstenergy = 1000
nstxout-compressed = 1000
compressed-x-precision = 100
compressed-x-grps = System
energygrps = System
imd-group = Other
; NEIGHBOURLIST and MARTINI
; To achieve faster simulations in combination with the Verlet-neighborlist
; scheme, Martini can be simulated with a straight cutoff. In order to
; do so, the cutoff distance is reduced 1.1 nm.
; Neighborlist length should be optimized depending on your hardware setup:
; updating ever 20 steps should be fine for classic systems, while updating
; every 30-40 steps might be better for GPU based systems.
; The Verlet neighborlist scheme will automatically choose a proper neighborlist
; length, based on a energy drift tolerance.
;
; Coulomb interactions can alternatively be treated using a reaction-field,
; giving slightly better properties.
; Please realize that electrostVatic interactions in the Martini model are
; not considered to be very accurate to begin with, especially as the
; screening in the system is set to be uniform across the system with
; a screening constant of 15. When using PME, please make sure your
; system properties are still reasonable.
;
; With the polarizable water model, the relative electrostatic screening
; (epsilon_r) should have a value of 2.5, representative of a low-dielectric
; apolar solvent. The polarizable water itself will perform the explicit screening
; in aqueous environment. In this case, the use of PME is more realistic.
cutoff-scheme = Verlet
nstlist = 20
ns_type = grid
pbc = xyz
verlet-buffer-tolerance = 0.005
coulombtype = reaction-field
rcoulomb = 1.1
epsilon_r = 15 ; 2.5 (with polarizable water)
epsilon_rf = 0
vdw_type = cutoff
vdw-modifier = Potential-shift-verlet
rvdw = 1.1
; MARTINI and TEMPERATURE/PRESSURE
; normal temperature and pressure coupling schemes can be used.
; It is recommended to couple individual groups in your system separately.
; Good temperature control can be achieved with the velocity rescale (V-rescale)
; thermostat using a coupling constant of the order of 1 ps. Even better
; temperature control can be achieved by reducing the temperature coupling
; constant to 0.1 ps, although with such tight coupling (approaching
; the time step) one can no longer speak of a weak-coupling scheme.
; We therefore recommend a coupling time constant of at least 0.5 ps.
; The Berendsen thermostat is less suited since it does not give
; a well described thermodynamic ensemble.
;
; Pressure can be controlled with the Parrinello-Rahman barostat,
; with a coupling constant in the range 4-8 ps and typical compressibility
; in the order of 10e-4 - 10e-5 bar-1. Note that, for equilibration purposes,
; the Berendsen barostat probably gives better results, as the Parrinello-
; Rahman is prone to oscillating behaviour. For bilayer systems the pressure
; coupling should be done semiisotropic.
tcoupl = v-rescale
tc-grps = System
tau_t = 1.0
ref_t = 310
pcoupl = parrinello-rahman
pcoupltype = isotropic
tau_p = 12.0 ;parrinello-rahman is more stable with larger tau-p, DdJ, 20130422
compressibility = 3e-4 3e-4
ref_p = 1.0 1.0
refcoord_scaling = all
gen_vel = no
; MARTINI and CONSTRAINTS
; for ring systems and stiff bonds constraints are defined
; which are best handled using Lincs.
constraints = none
constraint_algorithm = Lincs
;
; STANDARD MD INPUT OPTIONS FOR MARTINI 2.x
; Updated 15 Jul 2015 by DdJ
;
; for use with GROMACS 5
; For a thorough comparison of different mdp options in combination with the Martini force field, see:
; D.H. de Jong et al., Martini straight: boosting performance using a shorter cutoff and GPUs, submitted.
title = Martini
; TIMESTEP IN MARTINI
; Most simulations are numerically stable with dt=40 fs,
; however better energy conservation is achieved using a
; 20-30 fs timestep.
; Time steps smaller than 20 fs are not required unless specifically stated in the itp file.
integrator = md
dt = 0.01
nsteps = 1000000
nstcomm = 100
comm-grps =
nstxout = 0
nstvout = 0
nstfout = 0
nstlog = 1000
nstenergy = 1000
nstxout-compressed = 1000
compressed-x-precision = 100
compressed-x-grps =
energygrps = System
; NEIGHBOURLIST and MARTINI
; To achieve faster simulations in combination with the Verlet-neighborlist
; scheme, Martini can be simulated with a straight cutoff. In order to
; do so, the cutoff distance is reduced 1.1 nm.
; Neighborlist length should be optimized depending on your hardware setup:
; updating ever 20 steps should be fine for classic systems, while updating
; every 30-40 steps might be better for GPU based systems.
; The Verlet neighborlist scheme will automatically choose a proper neighborlist
; length, based on a energy drift tolerance.
;
; Coulomb interactions can alternatively be treated using a reaction-field,
; giving slightly better properties.
; Please realize that electrostVatic interactions in the Martini model are
; not considered to be very accurate to begin with, especially as the
; screening in the system is set to be uniform across the system with
; a screening constant of 15. When using PME, please make sure your
; system properties are still reasonable.
;
; With the polarizable water model, the relative electrostatic screening
; (epsilon_r) should have a value of 2.5, representative of a low-dielectric
; apolar solvent. The polarizable water itself will perform the explicit screening
; in aqueous environment. In this case, the use of PME is more realistic.
cutoff-scheme = Verlet
nstlist = 20
ns_type = grid
pbc = xyz
verlet-buffer-tolerance = 0.005
coulombtype = reaction-field
rcoulomb = 1.1
epsilon_r = 15 ; 2.5 (with polarizable water)
epsilon_rf = 0
vdw_type = cutoff
vdw-modifier = Potential-shift-verlet
rvdw = 1.1
; MARTINI and TEMPERATURE/PRESSURE
; normal temperature and pressure coupling schemes can be used.
; It is recommended to couple individual groups in your system separately.
; Good temperature control can be achieved with the velocity rescale (V-rescale)
; thermostat using a coupling constant of the order of 1 ps. Even better
; temperature control can be achieved by reducing the temperature coupling
; constant to 0.1 ps, although with such tight coupling (approaching
; the time step) one can no longer speak of a weak-coupling scheme.
; We therefore recommend a coupling time constant of at least 0.5 ps.
; The Berendsen thermostat is less suited since it does not give
; a well described thermodynamic ensemble.
;
; Pressure can be controlled with the Parrinello-Rahman barostat,
; with a coupling constant in the range 4-8 ps and typical compressibility
; in the order of 10e-4 - 10e-5 bar-1. Note that, for equilibration purposes,
; the Berendsen barostat probably gives better results, as the Parrinello-
; Rahman is prone to oscillating behaviour. For bilayer systems the pressure
; coupling should be done semiisotropic.
tcoupl = v-rescale
tc-grps = System
tau_t = 1.0
ref_t = 310
pcoupl = berendsen
pcoupltype = isotropic
tau_p = 12.0 ;parrinello-rahman is more stable with larger tau-p, DdJ, 20130422
compressibility = 3e-4
ref_p = 1.0
refcoord_scaling = all
gen_vel = no
; MARTINI and CONSTRAINTS
; for ring systems and stiff bonds constraints are defined
; which are best handled using Lincs.
constraints = none
constraint_algorithm = Lincs
File added
Giant Rising Ordinary Mutants for A Clerical Setup in water
29
136KB HN2 1 1.183 2.091 3.336 0.4004 3.0041 0.1665
136KB N2 2 1.086 2.097 3.307 0.3140 0.3540 -0.3132
136KB HN1 3 1.030 2.159 3.362 2.2814 1.9625 0.0126
136KB C11 4 1.044 2.011 3.212 -0.3294 -0.2374 0.4940
136KB O1 5 1.123 1.933 3.159 0.1558 0.0691 0.7519
136KB C10 6 0.893 2.015 3.195 -0.2902 0.4953 0.2717
136KB C7 7 0.816 1.942 3.085 0.1506 0.2969 0.0944
136KB C8 8 0.842 1.807 3.057 0.1665 0.4792 -0.8131
136KB H8 9 0.894 1.751 3.134 0.6246 1.2449 -0.5555
136KB C6 10 0.715 2.005 3.012 -0.0948 -0.2399 -0.0337
136KB H6 11 0.689 2.108 3.036 -0.6736 -0.5463 0.6814
136KB C5 12 0.636 1.937 2.920 -0.4231 0.1308 -0.0297
136KB H5 13 0.552 1.987 2.872 -0.3641 1.3976 1.1161
136KB C4 14 0.675 1.806 2.887 0.0542 0.3155 -0.2048
136KB O2 15 0.614 1.751 2.776 0.3195 -0.0247 -0.1830
136KB C9 16 0.778 1.739 2.954 -0.1834 -0.0778 -0.2348
136KB H9 17 0.792 1.632 2.941 -1.4326 -0.0191 -2.4834
136KB C1 18 0.652 1.627 2.715 -0.0574 0.0433 -0.5566
136KB C2 19 0.551 1.526 2.661 -0.3248 0.0834 -0.1427
136KB H2 20 0.504 1.575 2.573 2.1871 2.6350 -0.2157
136KB O3 21 0.453 1.486 2.757 -0.1564 0.0997 0.0375
136KB HO 22 0.393 1.562 2.775 -0.7999 -0.4396 0.1706
136KB C3 23 0.627 1.400 2.615 -0.5193 -0.0460 -0.1100
136KB N1 24 0.731 1.419 2.513 -0.6071 -0.2673 -0.2422
136KB HN 25 0.820 1.457 2.546 -2.0855 3.1942 0.2367
136KB C12 26 0.750 1.318 2.407 0.4805 -0.2676 -0.0564
136KB H121 27 0.831 1.247 2.428 0.2918 -1.0021 -1.6561
136KB C14 28 0.776 1.394 2.277 -0.7249 -0.3819 -0.3773
136KB C13 29 0.623 1.237 2.379 0.0035 0.4471 0.0400
3.33938 3.33938 3.33938
File added
; Include forcefield parameters
#include "../../data/martini_v2.2.itp"
#include "../../data/w.itp"
#include "out.itp"
[ system ]
36KB
[ molecules ]
36KB 1
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please register or to comment