MDPOW documentation¶
MDPOW is a python package that automates the calculation of solvation free energies via molecular dynamics (MD) simulations. In particular, it facilitates the computation of partition coeffcients. Currently implemented:
- water-octanol partition coefficient (POW)
- water-cyclohexane partition coefficient (PCW)
Calculations are performed with the Gromacs MD software package [#GromacsWrapper]_. Currently, OPLS-AA force field parameters are supported.
As input, the user only needs to provide a structure file (PDB or GRO) and a Gromacs ITP file containing the parametrization of the small molecule (e.g. from LigandBook or ParamChem).
Contents:
Quick installation instructions for POW¶
Use pip.
Standard installation¶
Example for installation as a user:
pip install --user MDPOW/
Check that you can import the module:
python
>>> import mdpow
>>> help(mdpow)
In case of problems file an issue at https://github.com/Becksteinlab/MDPOW/issues
Developer installation¶
A development install is useful while hacking away on the code:
cd MDPOW
python setup.py develop --user
mdpow
— Computing the octanol/water partitioning coefficient¶
The mdpow
module helps in setting up and analyzing absolute
free energy calculations of small molecules by molecular dynamics (MD)
simulations. By computing the hydration free energy and the solvation
free energy in octanol one can compute the octanol/water partitioning
coefficient, an important quantity that is used to characterize
drug-like compounds.
The MD simulations are performed with Gromacs 4.x
How to use the module¶
Before you can start you will need
- a coordinate file for the small molecule
- a Gromacs OPLS/AA topology (itp) file
- an installation of Gromacs 4.6.x.
Basic work flow¶
You will typically calculate two solvation free energies (free energy of transfer of the solute from the liquid into the vacuum phase):
- solvent = water
- set up a short equilibrium simulation of the molecule in a water box (and run the MD simulation);
- set up a free energy perturbation calculation of the ligand in water , which will yield the hydration free energy;
- solvent = octanol
- set up a short equilibrium simulation of the molecule in a octanol box (and run the MD simulation);
- set up a free energy perturbation calculation of the ligand in octanol , which will yield the solvation free energy in octanol;
- run these simulations on a cluster;
- analyze the output and combine the free energies to arrive at an estimate of the octanol-water partition coefficient;
- plot results using
mdpow.analysis.plot_exp_vs_comp()
.
Customized submission scripts for queuing systems¶
One can also generate run scripts for various queuing systems; check
the documentation for gromacs.qsub
and in particular the
section on writing queuing system templates . You will have to
add a template script to your private GromacsWrapper template directory (
~/.gromacswrapper/qscripts
); in this example we call itmy_script.sge
;add the keyword qscript to the
mdpow.equil.Simulation.MD()
andmdpow.fep.Gsolv.setup()
invocations; e.g. asqscript = [‘my_script.sge’, ‘local.sh’]
submit the generated queuing system script to your queuing system, e.g.
cd Equilibrium/water qsub my_script.sh
The mdpow scripts¶
Some tasks are simplified by using scripts, which are installed in a bin
directory (or the directory pointed to by python setup.py
--install-scripts
). See The mdpow-* scripts for details and
Tutorial: Using the mdpow scripts to compute log POW of benzene for an example. The scripts essentially execute
the steps shown in Tutorial: Manual session: 1-octanol as a solute so in order to gain a better
understanding of MDpow it is suggested to look at both tutorials.
Tutorial: Using the mdpow scripts to compute log POW of benzene¶
The most straightforward use of MDpow is through the Python scripts described in The mdpow-* scripts. This tutorial shows how to use them to calculate log POW for benzene.
The examples/benzene
directory of the distribution contains a Gromacs
OPLS/AA topology for benzene (benzene.itp
) and a starting structure
(benzene.pdb
) together with a
run input configuration file for the mdpow scripts
(benzene.yml
).

Make a directory
benzene
and collect the files in the directory.Move into the parent directory of
benzene
; in this tutorial all files will be generated underbenzene
but the configuration file has recorded the location of the input files asbenzene/filename
. (For more on to make best use of file names in the configuration file see Advanced: Separating input from output data.)Set up Gromacs (e.g. by sourcing
GMXRC
).Note
Gromacs must be set up so that all Gromacs commands (such as mdrun or grompp can be found on the shell’s
PATH
). If this is not the case then the following will fail. Look at the log file (named mdpow.log) for error messages.)Generate the input files for a equilibrium simulation of benzene in water (TIP4P) and run the simulation:
mdpow-equilibrium --solvent water benzene/benzene.yml
The example file provided will only set up very short simulations and it will also directly call mdrun to run the simulations. This is configured via the
runlocal = True
parameter in the[MD_relaxed]
and[MD_NPT]
sections.In fact, the following steps are carried out:
- generate a system topology (section
[Setup]
) - solvate the compound (section
[Setup]
) - energy minimise the system (section
[Setup]
) - run a short relaxation with a time step of 0.1 fs in order to remove any
steric clashes; this step was found to enable much more robust
simulations, especially when using octanol as a solvent. Section
[MD_relaxed]
controls this step. - run a equilibrium MD simulation at constant pressure and temperature
using a time step of 2 fs. Section
[MD_NPT]
controls this step.
- generate a system topology (section
Generate the input files for a equilibrium simulation of benzene in octanol (OPLS/AA parameters) and run the simulation:
mdpow-equilibrium --solvent octanol benzene/benzene.yml
The steps are analogous to the ones described under 4.
Use the last frame of the equilibrium simulation as a starting point for the free energy perturbation (FEP) calculations. This step is controlled by the
[FEP]
section in the run input configuration file.The example only runs very short windows (
runtime = 25
ps) but it will run all 21 individual simulations sequentially (runlocal = True
). Thus it is recommended to run this example on a fast multi-core workstation and at least Gromacs 4.5.x, which has thread support for mdrun.The FEP windows for benzene in water are generated and run by
mdpow-fep --solvent water benzene/benzene.yml
(In order to run the FEP windows on a cluster see Advanced: Generating queuing system scripts.)
The FEP windows for benzene in octanol are generated and run by
mdpow-fep --solvent octanol benzene/benzene.yml
Analyse the simulation output with mdpow-pow. It collects the raw data from the FEP simulations and computes the free energies of solvation using thermodynamic integration (TI) together with error estimates. log POW is calculated from the difference of the octanol and water solvation free energies:
mdpow-pow benzene
benzene
is the directory name under which the FEP simulations are stored. By default, results are appended to the filesenergies.txt
andpow.txt
.See also
The output formats are explained with examples in Output data file formats.
Advanced: Generating queuing system scripts¶
(To be written)
See also
gromacs.qsub
for the reference on how queuing system script
templates are handled and processed
Advanced: Separating input from output data¶
Make another directory under which simulation data will be stored; in this
tutorial it will be called WORK
and we assume that all directories reside
in ~/Projects/POW
. We will end up with a directory layout under
~/Projects/POW
like this:
benzene/
benzene.itp
benzene.pdb
benzene.yml
WORK/
benzene/
water/
octanol/
Edit benzene.yml
to put in absolute paths to the input
files: In the setup
section use absolute paths to the itp
and pdb files of benzene:
[setup]
name = benzene
molecule = BNZ
structure = ~/Projects/POW/benzene/benzene.pdb
itp = ~/Projects/POW/benzene/benzene.itp
With absolute paths defined, it is easy to generate all other files under
WORK/benzene
(the directory name “benzene” is the name entry from the
setup
section of the configuration file):
cd WORK
mdpow-equilibrium --solvent water ../benzene/benzene.yml
mdpow-equilibrium --solvent octanol ../benzene/benzene.yml
mdpow-fep --solvent water ../benzene/benzene.yml
mdpow-fep --solvent octanol ../benzene/benzene.yml
Finally, calculate log POW:
cd WORK
mdpow-pow benzene
Tutorial: Manual session: 1-octanol as a solute¶
In the following interactive python session we use octanol as an example for a solute; all files are present in the package so one can work through the example immediately.
Before starting python (preferrably ipython) make sure that the
Gromacs 4.6.x tools can be found, e.g. which grompp
should show you the
path to grompp.
Water¶
Equilibrium simulation¶
Make a directory octanol
and copy the octanol.itp
and
octanol.gro
file into it. Launch ipython from this
directory and type:
import mdpow.equil
S = mdpow.equil.WaterSimulation(molecule="OcOH")
S.topology(itp="octanol.itp")
S.solvate(struct="octanol.gro")
S.energy_minimize()
S.MD_relaxed()
# run the simulation in the MD_relaxed/ directory
S.MD(runtime=50, qscript=['my_script.sge', 'local.sh']) # only run for 50 ps in this tutorial
S.save("water.simulation") # save setup for later (analysis stage)
Background (Ctrl-Z) or quit (Ctrl-D) python and run the
simulations in the MD_relaxed
and MD_NPT
subdirectory. You can modify
the local.sh
script to your ends or use qscript to generate queuing system
scripts.
Note
Here we only run 50 ps equilibrium MD for testing. For production this should be substantially longer, maybe even 50 ns if you want to extract thermodynamic data.
Hydration free energy¶
Reopen the python session and set up a Ghyd
object:
import mdpow.fep
gwat = mdpow.fep.Ghyd(molecule="OcOH", top="Equilibrium/water/top/system.top", struct="Equilibrium/water/MD_NPT/md.pdb", ndx="Equilibrium/water/solvation/main.ndx", runtime=100)
Alternatively, one can save some typing if we continue the last session and use
the mdpow.equil.Simulation
object (which we can re-load from its saved
state file from disk):
import mdpow.equil
S = mdpow.equil.WaterSimulation(filename="water.simulation") # only needed when quit
gwat = mdpow.fep.Ghyd(simulation=S, runtime=100)
This generates all the input files under FEP/water
.
Note
Here we only run 100 ps per window for testing. For production this should be rather something like 5-10 ns (the default is 5 ns).
Then set up all input files:
gwat.setup(qscript=['my_script.sge', 'local.sh'])
(The details of the FEP runs can be customized by setting some keywords (such as
lambda_vdw, lamda_coulomb, see mdpow.fep.Gsolv
for details) or by
deriving a new class from the mdpow.fep.Ghyd
base class but this is not
covered in this tutorial.)
Octanol¶
Equilibrium simulation¶
Almost identical to the water case:
O = mdpow.equil.OctanolSimulation(molecule="OcOH")
O.topology(itp="octanol.itp")
O.solvate(struct="octanol.gro")
O.energy_minimize()
O.MD_relaxed()
O.MD(runtime=50, qscript=['my_script.sge', 'local.sh']) # only run for 50 ps in this tutorial
O.save()
Note
Here we only run 50 ps equilibrium MD for testing. For production this should be substantially longer, maybe even 50 ns if you want to extract thermodynamic data.
Octanol solvation free energy¶
Almost identical setup as in the water case:
goct = mdpow.fep.Goct(simulation=O, runtime=100)
goct.setup(qscript=['my_script.sge', 'local.sh'])
This generates all the input files under FEP/octanol
.
Note
Here we only run 100 ps per window for testing. For production this should be rather something like 5-10 ns (the default is 5 ns)
Running the FEP simulations¶
The files are under the FEP/water
and FEP/octanol
directories
in separate sub directories.
Either run job arrays that should have been generated from the
my_script.sge
template
qsub Coul_my_script.sge
qsub VDW_my_script.sge
Or run each job in its own directory. Note that mdrun should be called with at least the following options
mdrun -deffnm $DEFFNM -dgdl
where DEFFNM is typically “md”; see the run local.sh
script in
each direcory for hints on what needs to be done.
Analyze output and log POW calculation¶
For the water and octanol FEPs do
gwat.collect()
gwat.analyze()
goct.collect()
goct.analyze()
The analyze step reports the estimate for the free energy difference.
Calculate the free energy for transferring the solute from water to octanol and octanol-water partition coefficient log P_OW
mdpow.fep.pOW(gwat, goct)
(see mdpow.fep.pOW()
for details and definitions).
All individual results can also accessed as a dictionary
gwat.results.DeltaA
Free energy of transfer from water to octanol:
goct.results.DeltaA.Gibbs - gwat.results.DeltaA.Gibbs
The individual components are
- Gibbs
total free energy difference of transfer from solvent to vacuum at the Ben-Naim standard state (i.e. 1M solution/1M gas phase) in kJ/mol;
DeltaG0 = (G_solv - G_vac)We calculate the Gibbs free energy (at constant pressure P) by using the NPT ensemble for all MD simulations.
- coulomb
- contribution of the de-charging process to DeltaG
- vdw
- contribution of the de-coupling process to DeltaG
To plot the data (de-charging and de-coupling):
import pylab
gwat.plot()
pylab.figure()
goct.plot()
For comparison to experimental values see mdpow.analysis
.
Error analysis¶
The data points are the (time) average <G> of G = dV/dl over each
window. The error bars s_G are the error of the mean <G>. They are computed
from the auto-correlation time of the fluctuations and the standard deviation
(see Frenkel and Smit, p526 and numkit.timeseries.tcorrel()
):
s_G**2 = 2*tc*acf(0)/T
where tc is the decay time of the ACF of <(G-<G>)**2> (assumed to follow f(t) = exp(-t/tc) and hence calculated from the integral of the ACF to its first root); T is the total runtime.
Errors on the energies are calculated via the propagation of the errors s_G
through the thermodynamic integration and the subsequent thermodynamic sums
(see numkit.integration.simps_error()
numkit.observables.QuantityWithError
for details).
- If the graphs do not look smooth or the errors are large then a longer runtime is definitely required. It might also be necessary to add additional lambda values in regions where the function changes rapidly.
- The errors on the Coulomb and VDW free energies should be of similar magnitude because there is no point in being very accurate in one if the other is inaccurate.
- For water the “canonical” lambda schedule produces errors <0.5 kJ/mol (or sometimes much better) in the Coulomb and VDW free energy components.
- For octanol the errors on the coulomb dis-charging free energy can become large (up to 4 kJ/mol) and thus completely swamp the final estimate. Additional lambdas 0.125 and 0.375 should improve the precision of the calculations.
The mdpow-*
scripts¶
A number of python scripts are installed together with the
mdpow
package. They simplify some common tasks (especially at
the analysis stage) but they make some assumptions about directory
layout and filenames. If one uses defaults for all directory and
filename options then it should “just work”.
In particular, a directory hierarchy such as the following is assumed:
moleculename/
water.simulation
octanol.simulation
Equilibrium/
water/
octanol/
FEP/
water/
Ghyd.fep
Coulomb/
VDW/
octanol/
Goct.fep
Coulomb/
VDW/
moleculename is, for instance, “benzene” or “amantadine”; in the run
input file (see Equilibrium simulations) is the value of the
variable name
in the [setup]
section.
Equilibrium simulations¶
The mdpow-equilibrium script
- sets up equilibrium MD simulations for the solvents water or octanol
- runs energy minimization, MD_relaxed, and MD_NPT protocols; the user can choose if she wants to launch mdrun herself (e.g. on a cluster) or let the script do it locally on the workstation
The script runs essentially the same steps as described in the tutorial Tutorial: Manual session: 1-octanol as a solute but it gathers all required parameters from a run input file and it allows one to stop and continue and the protocol transparently.
It requires as at least Gromacs 4.0.5 ready to run (check that all commands can be found). The required input is
- a run configuration file;
- a structure file (PDB or GRO) for the compound
- a Gromacs ITP file for the compound (OPLS/AA force field)
A template RUNFILE can be generated with
mdpow-equilibrium --get-template
.
For an example of a RUNFILE see benzene_runinput.cfg
. It is recommended to use absolute paths
to file names. The run input file uses ConfigParser
, which describes the
syntax of the file.
The script keeps track of the stages of the simulation protocol and allows the user to restart from the last completed stage. For instance, one can use the script to set up a simulation, then run the simulation on a cluster, transfer back the generated files, and start mdpow-equilibrium again with the exact same input to finish the protocol. Since Gromacs 4.5 it is also possible to interrupt a running mdrun process (e.g. with Control-c) and then resume the simulation at the last saved trajectory checkpoint by running mdpow-equilibrium again.
If in doubt, just try running mdpow-equilibrium running again and let it figure out the best course of action. Look at the log file to see what has been done. Lines reading “Fast forwarding: stage” indicate that results from stage are available.
Usage of the command:
mdpow-equilibrium [options] RUNFILE
Set up (and possibly run) the equilibration equilibrium simulation for one compound and one solvent. All parameters except the solvent are specified in the RUNFILE.
Options:
-h
,
--help
¶
show this help message and exit
--get-template
¶
generate a template RUNFILE and exit
-S
<NAME>
,
--solvent
=<NAME>
¶solvent
<NAME>
for compound, can be ‘water’ or ‘octanol’ [water]
-d
<DIRECTORY>
,
--dirname
=<DIRECTORY>
¶generate files and directories in
<DIRECTORY>
, which is created if it does not already exist. The default is to use the molecule name from the run input file.
FEP simulations¶
The mdpow-fep script sets up (and can also run) the free energy perturbation calculations for one compound and one solvent. It uses the results from mdpow-equilibrium together with the run input file.
You will require:
- at least Gromacs 4.0.5 ready to run (check that all commands can be found)
- the results from a previous complete run of mdpow-equilibrium
Usage of the command:
mdpow-fep [options] RUNFILE
Options:
-h
,
--help
¶
show this help message and exit
--get-template
¶
generate a template RUNFILE and exit
-S
<NAME>
,
--solvent
=<NAME>
¶solvent
<NAME>
for compound, can be ‘water’ or ‘octanol’ [water]
-d
<DIRECTORY>
,
--dirname
=<DIRECTORY>
¶generate files and directories in
<DIRECTORY>
, which is created if it does not already exist. The default is to use the molecule name from the run input file.
Running analysis¶
The mdpow-pow script
- collects data from FEP simulations
- calculates desolvation free energies for octanol –> vacuum and water –> vacuum via thermodynamic integration (TI)
- calculates transfer free energy water –> octanol
- calculates log POW
- plots dV/dlambda graphs
- appends results to
pow.txt
andenergies.txt
(when the default names are chosen), see Output data file formats.
Usage of the command:
mdpow-pow [options] DIRECTORY [DIRECTORY …]
Run the free energy analysis for water and octanol in DIRECTORY/FEP and return the octanol-water partition coefficient log POW.
DIRECTORY should contain all the files resulting from running
mdpow.fep.Goct.setup()
andmdpow.fep.Goct.setup()
and the results of the MD FEP simulations. It relies on the canonical naming scheme (basically: just use the defaults as in the tutorial).The dV/dlambda plots can be produced automatically (
mdpow-pow -p
auto). If multiple DIRECTORY arguments are provided then one has to choose the auto option (orNone
).Options:
-h
,
--help
¶
show this help message and exit
-p
<FILE>
,
--plotfile
=<FILE>
¶plot dV/dlambda to
FILE
; use png or pdf suffix to determine the file type. ‘auto’ generates a pdf fileDIRECTORY/dVdl.pdf
and ‘None’ disables it [auto]
-o
<FILE>
,
--outfile
=<FILE>
¶append one-line results summary to
<FILE>
[pow.txt
]
-e
<FILE>
,
--energies
=<FILE>
¶append solvation free energies to
<FILE>
[energies.txt
]
--force
¶
force rereading all data [False]
--ignore-corrupted
¶
skip lines in the md.xvg files that cannot be parsed, perhaps because to an intermittent file system problem. In order to salvage the existing data, this option is provided for diagnostic purposes. [False]
Warning
Only use this option with care; skipped lines can indicate that other parts of the file have been corrupted but still pass the line corruption test. USE AT YOUR OWN RISK.
It is recommended to simply rerun the affected simulation(s).
Output data file formats¶
Results are appended to data files.
Note
Energies are always output in kJ/mol.
POW summary file¶
The pow.txt
output file summarises the results from the water and
octanol solvation calculations. Its name set with the option mdpow-pow -o
.
It contains fixed column data in the following order and all energies are
in kJ/mol. See the Table of computed water-octanol transfer energies
and logPow as an example.
itp_name
molecule identifier from the itp file
DeltaG0
transfer free energy from water to octanol (difference between DeltaG0 for octanol and water), in kJ/mol; >0: partitions into water, <0: partitions into octanol,
errDG0
error on DeltaG0; errors are calculated through propagation of errors from the errors on the means <dV/dlambda>
logPOW
log POW, base-10 logarithm of the octanol-water partition coefficient; >0: partitions into octanol, <0: partitions preferrably into water
errlogP
error on logPow
directory
directory under which all data files are stored; by default this is the name of the molecule and hence it can be used to identify the compound.
itp_name | DeltaG0 | errDG0 | logPow | errlogP | directory |
---|---|---|---|---|---|
BNZ | -12.87 | 0.43 | +2.24 | 0.07 | benzene |
OC9 | -16.24 | 1.12 | +2.83 | 0.20 | octanol |
URE | +4.66 | 1.13 | -0.81 | 0.20 | urea |
902 | +9.35 | 1.06 | -1.63 | 0.18 | water_TIP4P |
Energy file¶
The energy.txt
output file collects all computed energy terms together
with the results also found in the summary file pow.txt
. Its name is
set with the option mdpow-pow -e
. It contains fixed column data in
the following order and all energies are in kJ/mol. As an example see
Table of Solvation Energies for the same
compounds as above.
molecule
molecule identifier from the itp file
solvent
solvent name (water, octanol)
DeltaG0
solvation free energy difference in kJ/mol (Ben-Naim standard state, i.e. 1M gas/1M solution)
errDG0
error on DeltaG0, calculated through propagation of errors from the errors on the mean <dV/dlambda>
coulomb
Coulomb (discharging) contribution to the solvation free energy DeltaG0
errCoul
error on coulomb
VDW
Van der Waals/Lennard-Jones (decoupling) contribution to DeltaG0
errVDW
error on VDW
Vdp
correction because the FEP are run at constant volume but the desired solvation free energy is the Gibbs energy (currently neglected and set to 0)
errVdp
error on Vdp
w2oct
transfer free energy from water to octanol (difference between DeltaG0 for octanol and water)
errw2oct
error on w2oct
logPOW
log POW
errlogP
error on logPow
directory
directory under which all data files are stored; by default this is the name of the molecule and hence it can be used to identify the compound.
molecule | solvent | DeltaG0 | errDG0 | coulomb | errCoul | VDW | errVDW | Vdp | errVdp | w2oct | errw2oct | logPOW | errlogP | directory |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
BNZ | water | -2.97 | 0.21 | +7.65 | 0.07 | -4.68 | 0.20 | +0.00 | 0.00 | -12.87 | 0.43 | +2.24 | 0.07 | benzene |
BNZ | octanol | -15.84 | 0.37 | +1.40 | 0.19 | +14.44 | 0.32 | +0.00 | 0.00 | -12.87 | 0.43 | +2.24 | 0.07 | benzene |
OC9 | water | -16.03 | 0.32 | +27.35 | 0.09 | -11.32 | 0.31 | +0.00 | 0.00 | -16.24 | 1.12 | +2.83 | 0.20 | octanol |
OC9 | octanol | -32.28 | 1.08 | +11.32 | 0.92 | +20.96 | 0.56 | +0.00 | 0.00 | -16.24 | 1.12 | +2.83 | 0.20 | octanol |
URE | water | -53.52 | 0.17 | +56.94 | 0.11 | -3.41 | 0.14 | +0.00 | 0.00 | +4.66 | 1.13 | -0.81 | 0.20 | urea |
URE | octanol | -48.86 | 1.12 | +35.75 | 1.09 | +13.11 | 0.25 | +0.00 | 0.00 | +4.66 | 1.13 | -0.81 | 0.20 | urea |
902 | water | -25.46 | 0.11 | +34.93 | 0.10 | -9.48 | 0.06 | +0.00 | 0.00 | +9.35 | 1.06 | -1.63 | 0.18 | water_TIP4P |
902 | octanol | -16.11 | 1.05 | +21.16 | 1.05 | -5.05 | 0.09 | +0.00 | 0.00 | +9.35 | 1.06 | -1.63 | 0.18 | water_TIP4P |
House-keeping scripts¶
A number of scripts are provided to complete simple tasks; they can be used to check that all required files are present or they can help in fixing small problems without one having to write Python code to do this oneself (e.g. by manipulating the checlpoint files). They generally make the same assumptions about file system layout as the other mdpow scripts.
Checking if the simulation is complete¶
Run mdpow-check in order to check if all necessary files are available.
Usage of the command:
mdpow-check [options] DIRECTORY [DIRECTORY …]
Check status of the progress of the project in DIRECTORY.
Output is only written to the status file (
status.txt
). A quick way to find problems is to do acat status.txt | gawk -F '|' '$2 !~ /OK/ {print $0}'Options:
-h
,
--help
¶
show this help message and exit
-o
<FILE>
,
--outfile
=<FILE>
¶write status results to
FILE
[status.txt
]
Changing paths in water.simulation
and octanol.simulation
¶
It can become necessary to recreate the solvent.simulation
status/checkpoint files in order to change paths, e.g. when one moved
the directories or transferred all files to a different file system.
Typically, one would execute the mdpow-rebuild-simulation command in the parent directory of moleculename.
Usage of the command:
Re-building Ghyd.fep
and Goct.fep
¶
It can become necessary to recreate the name.fep
status/checkpoint
files (e.g. if the files became corrupted due to a software error or in order
to change paths).
Typically, one would execute the mdpow-rebuild-fep command in the parent directory of moleculename.
Usage of the command:
mdpow-rebuild-fep [options] DIRECTORY [DIRECTORY …]
Re-create the
Goct.fep
orGhyd.fep
file using the appropriate equilibrium simulation file under DIRECTORY/FEP.This should only be necessary when the fep file was destroyed due to a software error or when the files are transferred to a different file system and some of the paths stored in the
name.fep
files have to be changed.Options:
-h
,
--help
¶
show this help message and exit
--solvent
=<NAME>
¶rebuild fep for ‘water’, ‘octanol’, or ‘all’ [all]
--setup
=<LIST>
¶Re-generate queuing system scripts with appropriate paths: runs
fep.Gsolv.setup()
with argument qscript=[LIST] after fixing Gsolv.
LIST
should contain a comma-separated list of queing system templates. For example:'icsn_8pd.sge,icsn_2pd.sge,local.sh'
. It is an error if--setup
is set without aLIST
.
mdpow.equil
— Setting up and running equilibrium MD¶
The mdpow.equil
module facilitates the setup of equilibrium
molecular dynamics simulations of a compound molecule in a simulation
box of water or other solvent such as octanol.
It requires as input
- the itp file for the compound
- a coordinate (structure) file (in pdb or gro format)
By default it uses the OPLS/AA forcefield and the TIP4P water model.
Warning
Other forcefields than OPLS/AA are currently not officially supported; it is not hard to do but requires tedious changes to a few paths in template scripts.
-
class
mdpow.equil.
Simulation
(molecule=None, **kwargs)¶ Simple MD simulation of a single compound molecule in water.
Typical use
S = Simulation(molecule='DRUG') S.topology(itp='drug.itp') S.solvate(struct='DRUG-H.pdb') S.energy_minimize() S.MD_relaxed() S.MD()
Note
The OPLS/AA force field and the TIP4P water molecule is the default; changing this is possible but will require provision of customized itp, mdp and template top files at various stages.
Set up Simulation instance.
The molecule of the compound molecule should be supplied. Existing files (which have been generated in previous runs) can also be supplied.
Keywords: - molecule
Identifier for the compound molecule. This is the same as the entry in the
[ molecule ]
section of the itp file. [“DRUG”]- filename
If provided and molecule is
None
then load the instance from the pickle file filename, which was generated withsave()
.- dirname
base directory; all other directories are created under it
- solvent
‘water’ or ‘octanol’ or ‘cyclohexane’
- solventmodel
None
chooses the default (e.g,mdpow.forcefields.DEFAULT_WATER_MODEL
forsolvent == "water"
. Other options are the models defined inmdpow.forcefields.GROMACS_WATER_MODELS
. At the moment, there are no alternative parameterizations included for other solvents.- mdp
dict with keys corresponding to the stages
energy_minimize
,MD_restrained
,MD_relaxed
,MD_NPT
and values mdp file names (if no entry then the package defaults are used)- kwargs
advanced keywords for short-circuiting; see
mdpow.equil.Simulation.filekeys
.
-
MD
(**kwargs)¶ Short NPT MD simulation.
See documentation of
gromacs.setup.MD()
for details such as runtime or specific queuing system options. The following keywords can not be changed: top, mdp, ndx, mainselection.Note
If the system crashes (with LINCS errors), try initial equilibration with timestep dt = 0.0001 ps (0.1 fs instead of 2 fs) and runtime = 5 ps as done in
MD_relaxed()
Keywords: - struct
starting conformation; by default, the struct is the last frame from the position restraints run, or, if this file cannot be found (e.g. because
Simulation.MD_restrained()
was not run) it falls back to the relaxed and then the solvated system.- mdp
MDP run parameter file for Gromacs
- runtime
total run time in ps
- qscript
list of queuing system scripts to prepare; available values are in
gromacs.config.templates
or you can provide your own filename(s) in the current directory (seegromacs.qsub
for the format of the templates)- qname
name of the job as shown in the queuing system
- startdir
advanced uses: path of the directory on a remote system, which will be hard-coded into the queuing system script(s); see
gromacs.setup.MD()
andgromacs.manager.Manager
-
MD_NPT
(**kwargs)¶ Short NPT MD simulation.
See documentation of
gromacs.setup.MD()
for details such as runtime or specific queuing system options. The following keywords can not be changed: top, mdp, ndx, mainselection.Note
If the system crashes (with LINCS errors), try initial equilibration with timestep dt = 0.0001 ps (0.1 fs instead of 2 fs) and runtime = 5 ps as done in
MD_relaxed()
Keywords: - struct
starting conformation; by default, the struct is the last frame from the position restraints run, or, if this file cannot be found (e.g. because
Simulation.MD_restrained()
was not run) it falls back to the relaxed and then the solvated system.- mdp
MDP run parameter file for Gromacs
- runtime
total run time in ps
- qscript
list of queuing system scripts to prepare; available values are in
gromacs.config.templates
or you can provide your own filename(s) in the current directory (seegromacs.qsub
for the format of the templates)- qname
name of the job as shown in the queuing system
- startdir
advanced uses: path of the directory on a remote system, which will be hard-coded into the queuing system script(s); see
gromacs.setup.MD()
andgromacs.manager.Manager
-
MD_relaxed
(**kwargs)¶ Short MD simulation with timestep = 0.1 fs to relax strain.
Energy minimization does not always remove all problems and LINCS constraint errors occur. A very short runtime = 5 ps MD with very short integration time step dt tends to solve these problems.
Keywords: - struct
starting coordinates (typically guessed)
- mdp
MDP run parameter file for Gromacs
- qscript
list of queuing system submission scripts; probably a good idea to always include the default “local.sh” even if you have your own [“local.sh”]
- qname
name of the job as shown in the queuing system
- startdir
advanced uses: path of the directory on a remote system, which will be hard-coded into the queuing system script(s); see
gromacs.setup.MD()
andgromacs.manager.Manager
-
MD_restrained
(**kwargs)¶ Short MD simulation with position restraints on compound.
See documentation of
gromacs.setup.MD_restrained()
for details. The following keywords can not be changed: top, mdp, ndx, mainselectionNote
Position restraints are activated with
-DPOSRES
directives forgromacs.grompp()
. Hence this will only work if the compound itp file does indeed contain a[ posres ]
section that is protected by a#ifdef POSRES
clause.Keywords: - struct
starting coordinates (leave empty for inspired guess of file name)
- mdp
MDP run parameter file for Gromacs
- qscript
list of queuing system submission scripts; probably a good idea to always include the default “local.sh” even if you have your own [“local.sh”]
- qname
name of the job as shown in the queuing system
- startdir
advanced uses: path of the directory on a remote system, which will be hard-coded into the queuing system script(s); see
gromacs.setup.MD()
andgromacs.manager.Manager
-
energy_minimize
(**kwargs)¶ Energy minimize the solvated structure on the local machine.
kwargs are passed to
gromacs.setup.energ_minimize()
but ifsolvate()
step has been carried out previously all the defaults should just work.
-
get_last_checkpoint
()¶ Returns the checkpoint of the most advanced step in the protocol. Relies on md.gro being present from previous simulation, assumes that checkpoint is then present.
-
get_last_structure
()¶ Returns the coordinates of the most advanced step in the protocol.
-
load
(filename=None)¶ Re-instantiate class from pickled file.
-
make_paths_relative
(prefix='.')¶ Hack to be able to copy directories around: prune basedir from paths.
Warning
This is not guaranteed to work for all paths. In particular, check :attrib:`mdpow.equil.Simulation.dirs.includes` and adjust manually if necessary.
-
processed_topology
(**kwargs)¶ Create a portable topology file from the topology and the solvated system.
-
save
(filename=None)¶ Save instance to a pickle file.
The default filename is the name of the file that was last loaded from or saved to.
-
solvate
(struct=None, **kwargs)¶ Solvate structure struct in a box of solvent.
The solvent is determined with the solvent keyword to the constructor.
Keywords: - struct
pdb or gro coordinate file (if not supplied, the value is used that was supplied to the constructor of
Simulation
)- kwargs
All other arguments are passed on to
gromacs.setup.solvate()
, but set to sensible default values. top and water are always fixed.
-
topology
(itp='drug.itp', **kwargs)¶ Generate a topology for compound molecule.
Keywords: - itp
Gromacs itp file; will be copied to topology dir and included in topology
- dirname
name of the topology directory [“top”]
- kwargs
see source for top_template, topol
-
class
mdpow.equil.
WaterSimulation
(molecule=None, **kwargs)¶ Equilibrium MD of a solute in a box of water.
Set up Simulation instance.
The molecule of the compound molecule should be supplied. Existing files (which have been generated in previous runs) can also be supplied.
Keywords: - molecule
Identifier for the compound molecule. This is the same as the entry in the
[ molecule ]
section of the itp file. [“DRUG”]- filename
If provided and molecule is
None
then load the instance from the pickle file filename, which was generated withsave()
.- dirname
base directory; all other directories are created under it
- solvent
‘water’ or ‘octanol’ or ‘cyclohexane’
- solventmodel
None
chooses the default (e.g,mdpow.forcefields.DEFAULT_WATER_MODEL
forsolvent == "water"
. Other options are the models defined inmdpow.forcefields.GROMACS_WATER_MODELS
. At the moment, there are no alternative parameterizations included for other solvents.- mdp
dict with keys corresponding to the stages
energy_minimize
,MD_restrained
,MD_relaxed
,MD_NPT
and values mdp file names (if no entry then the package defaults are used)- kwargs
advanced keywords for short-circuiting; see
mdpow.equil.Simulation.filekeys
.
-
class
mdpow.equil.
OctanolSimulation
(molecule=None, **kwargs)¶ Equilibrium MD of a solute in a box of octanol.
Set up Simulation instance.
The molecule of the compound molecule should be supplied. Existing files (which have been generated in previous runs) can also be supplied.
Keywords: - molecule
Identifier for the compound molecule. This is the same as the entry in the
[ molecule ]
section of the itp file. [“DRUG”]- filename
If provided and molecule is
None
then load the instance from the pickle file filename, which was generated withsave()
.- dirname
base directory; all other directories are created under it
- solvent
‘water’ or ‘octanol’ or ‘cyclohexane’
- solventmodel
None
chooses the default (e.g,mdpow.forcefields.DEFAULT_WATER_MODEL
forsolvent == "water"
. Other options are the models defined inmdpow.forcefields.GROMACS_WATER_MODELS
. At the moment, there are no alternative parameterizations included for other solvents.- mdp
dict with keys corresponding to the stages
energy_minimize
,MD_restrained
,MD_relaxed
,MD_NPT
and values mdp file names (if no entry then the package defaults are used)- kwargs
advanced keywords for short-circuiting; see
mdpow.equil.Simulation.filekeys
.
-
mdpow.equil.
DIST
= {'cyclohexane': 1.5, 'octanol': 1.5, 'water': 1.0}¶ dict() -> new empty dictionary dict(mapping) -> new dictionary initialized from a mapping object’s
(key, value) pairs- dict(iterable) -> new dictionary initialized as if via:
d = {} for k, v in iterable:
d[k] = v- dict(**kwargs) -> new dictionary initialized with the name=value pairs
- in the keyword argument list. For example: dict(one=1, two=2)
mdpow.fep
– Calculate free energy of solvation¶
Set up and run free energy perturbation (FEP) calculations to calculate the free energy of hydration of a solute in a solvent box. The protocol follows the works of D. Mobley (Free Energy Tutorial) and M. Shirts, and uses Gromacs 4.0.x.
Required Input:
- topology
- equilibrated structure of the solvated molecule
See the docs for Gsolv
for details on what is calculated.
Differences to published protocols¶
Some notes on how the approach encoded here differs from what others (notabley Mobley) did:
We use Gromacs 4.x and use the new decoupling feature
couple-intramol = no
which indicates that “intra-molecular interactions remain”. It should (as I understand it) allow us to only calculate the free energy changes in solution so that we do not have to do an extra calculation in vacuo. http://www.mail-archive.com/gmx-users@gromacs.org/msg18803.html
Mobley does an extra discharging calculation in vacuo and calculates
\[\Delta A = \Delta A_{\mathrm{coul}}(\mathrm{vac}) - (\Delta A_{\mathrm{coul}}(\mathrm{sol}) + \Delta A_{\mathrm{vdw}}(\mathrm{sol}))\](but also annihilates the interactions on the solute, corresponding to
couple-intramol = yes
) whereas we do\[\Delta A = - (\Delta A_{\mathrm{coul}}(\mathrm{sol}) + \Delta A_{\mathrm{vdw}}(\mathrm{sol}))\]simulations are run as NPT (but make sure to use Langevin dynamics for temperature control)
User reference¶
Simulation setup and analysis of all FEP simulations is encapsulated
by a mdpow.fep.Gsolv
object. For the hydration free energy
there is a special class Ghyd
and for the
solvation free energy in octanol there is
Goct
. See the description of
Gsolv
for methods common to both.
-
class
mdpow.fep.
Gsolv
(molecule=None, top=None, struct=None, method='BAR', **kwargs)¶ Simulations to calculate and analyze the solvation free energy.
\(\Delta A\) is computed from the decharging and the decoupling step. With our choice of
lambda=0
being the fully interacting andlambda=1
the non-interacting state, it is computed as\[\Delta A = -(\Delta A_{\mathrm{coul}} + \Delta A_{\mathrm{vdw}})\]With this protocol, the concentration in the liquid and in the gas phase is the same. (Under the assumption of ideal solution/ideal gas behaviour this directly relates to the Ben-Naim 1M/1M standard state.)
(We neglect the negligible correction \(-kT \ln V_x/V_{\mathrm{sim}} = -kT \ln(1 - v_s/V_{\mathrm{sim}})\) where \(V_x\) is the volume of the system without the solute but the same number of water molecules as in the fully interacting case [see Michael Shirts’ Thesis, p82].)
Typical work flow:
G = Gsolv(simulation='drug.simulation') # continue from :mod:`mdpow.equil` G.setup(qscript=['my_template.sge', 'local.sh']) # my_template.sge is user supplied G.qsub() # run SGE job arrays as generated from my_template.sge G.analyze() G.plot()
See
gromacs.qsub
for notes on how to write templates for queuing system scripts (in particular queuing system templates).Set up Gsolv from input files or a equilibrium simulation.
Arguments: - molecule
name of the molecule for which the hydration free energy is to be computed (as in the gromacs topology) [REQUIRED]
- top
topology [REQUIRED]
- struct
solvated and equilibrated input structure [REQUIRED]
- ndx
index file
- dirname
directory to work under [‘FEP/solvent’]
- solvent
name of the solvent (only used for path names); will not affect the simulations because the input coordinates and topologies are determined by either simulation or molecule, top, and struct. If solvent is not provided then it is set to
solvent_default
.- lambda_coulomb
list of lambdas for discharging: q+vdw –> vdw
- lambda_vdw
list of lambdas for decoupling: vdw –> none
- runtime
simulation time per window in ps [5000]
- temperature
temperature in Kelvin of the simulation [300.0]
- qscript
template or list of templates for queuing system scripts (see
gromacs.config.templates
for details) [local.sh]- includes
include directories
- simulation
Instead of providing the required arguments, obtain the input files from a
mdpow.equil.Simulation
instance.- method
“TI” for thermodynamic integration or “BAR” for Bennett acceptance ratio; using “BAR” in Gromacs also writes TI data so this is the default.
- mdp
MDP file name (if no entry then the package defaults are used)
- filename
Instead of providing the required arguments, load from pickle file. If either simulation or molecule, top, and struct are provided then simply set the attribute
filename
of the default checkpoint file to filename.- basedir
Prepend basedir to all filenames;
None
disables [.]- permissive
Set to
True
if you want to read past corrupt data in output xvg files (seegromacs.formats.XVG
for details); note that permissive*=``True`` can lead to **wrong results*. Overrides the value set in a loaded pickle file. [False
]- stride
collect every stride data line, see
Gsolv.collect()
[1]- kwargs
other undocumented arguments (see source for the moment)
Note
Only a subset of the FEP parameters can currently be set with keywords (lambdas); other parameters such as the soft cores are currently fixed. For advanced use: pass a dict with keys “coulomb” and “vdw” and values of
FEPschedule
in the keyword schedules. The lambdas will override these settings (which typically come from a cfg). This method allows setting all parameters.-
analyze
(force=False, stride=None, autosave=True)¶ Extract dV/dl from output and calculate dG by TI.
Thermodynamic integration (TI) is performed on the individual component window calculation (typically the Coulomb and the VDW part, \(\Delta A_{\mathrm{coul}}\) and \(\Delta A_{\mathrm{vdW}}\)). \(\Delta A_{\mathrm{coul}}\) is the free energy component of discharging the molecule and \(\Delta A_{\mathrm{vdW}}\) of decoupling (switching off LJ interactions with the environment). The free energy components must be interpreted in this way because we defined
lambda=0
as interaction switched on andlambda=1
as switched off.\[\Delta A* &= -(\Delta A_{\mathrm{coul}} + \Delta A_{\mathrm{vdw}})\\]Data are stored in
Gsolv.results
.The dV/dlambda graphs are integrated with the composite Simpson’s rule (and if the number of datapoints are even, the first interval is evaluated with the trapezoidal rule); see
scipy.integrate.simps()
for details). Note that this implementation of Simpson’s rule does not require equidistant spacing on the lambda axis.For the Coulomb part using Simpson’s rule has been shown to produce more accurate results than the trapezoidal rule [Jorge2010].
Errors are estimated from the errors of the individual <dV/dlambda>:
- The error of the mean <dV/dlambda> is calculated via the decay time
of the fluctuation around the mean (see
gromacs.formats.XVG.error
). - The error on the integral is calculated analytically via
propagation of errors through Simpson’s rule (with the
approximation that all spacings are assumed to be equal; taken as
the average over all spacings as implemented in
numkit.integration.simps_error()
).
Note
For the Coulomb part, which typically only contains about 5 lambdas, it is recommended to have a odd number of lambda values to fully benefit from the higher accuracy of the integration scheme.
[Jorge2010] M. Jorge, N.M. Garrido, A.J. Queimada, I.G. Economou, and E.A. Macedo. Effect of the integration method on the accuracy and computational efficiency of free energy calculations using thermodynamic integration. Journal of Chemical Theory and Computation, 6 (4):1018–1027, 2010. 10.1021/ct900661c. Keywords: - force
reload raw data even though it is already loaded
- stride
read data every stride lines,
None
uses the class default- autosave
save to the pickle file when results have been computed
- The error of the mean <dV/dlambda> is calculated via the decay time
of the fluctuation around the mean (see
-
arraylabel
(component)¶ Batch submission script name for a job array.
-
collect
(stride=None, autosave=True, autocompress=True)¶ Collect dV/dl from output.
Keywords: - stride
read data every stride lines,
None
uses the class default- autosave
immediately save the class pickle fep file
- autocompress
compress the xvg file with bzip2 (saves >70% of space)
-
compress_dgdl_xvg
()¶ Compress all dgdl xvg files with bzip2.
-
contains_corrupted_xvgs
()¶ Check if any of the source datafiles had reported corrupted lines.
Returns: True
if any of the xvg dgdl files were produced with the permissive=True flag and had skipped lines.For debugging purposes, the number of corrupted lines are stored in
Gsolv._corrupted
as dicts of dicts with the component as primary and the lambda as secondary key.
-
dgdl_xvg
(*args)¶ Return filename of the dgdl XVG file.
Recognizes uncompressed, gzipped (gz), and bzip2ed (bz2) files.- g
Arguments: - args
joins the arguments into a path and adds the default filename for the dvdl file
Returns: Checks if a compressed file exists and returns the appropriate filename.
Raises: IOError
with error code ENOENT if no file could be found
-
fep_dirs
()¶ Generator for all simulation sub directories
-
frombase
(*args)¶ Return path with
Gsolv.basedir
prefixed.
-
has_dVdl
()¶ Check if dV/dl data have already been collected.
Returns: True
if the dV/dl data have bee aquired (Gsolv.collect()
) for all FEP simulations.
-
label
(component)¶ Simple label for component, e.g. for use in filenames.
-
logger_DeltaA0
()¶ Print the free energy contributions (errors in parentheses).
-
plot
(**kwargs)¶ Plot the TI data with error bars.
Run
mdpow.fep.Gsolv.analyze()
first.All kwargs are passed on to
pylab.errorbar()
.
-
qsub
(script=None)¶ Submit a batch script locally.
If script ==
None
then take the first script (works well if only one template was provided).
-
setup
(**kwargs)¶ Prepare the input files for all Gromacs runs.
Keywords: - qscript
(List of) template(s) for batch submission scripts; if not set then the templates are used that were supplied to the constructor.
- kwargs
Most kwargs are passed on to
gromacs.setup.MD()
although some are set to values that are required for the FEP functionality.- mdrun_opts
list of options to mdrun;
-dhdl
is always added to this list as it is required for the thermodynamic integration calculations- includes
list of directories where Gromacs input files can be found
The following keywords cannot be changed: dirname, jobname, prefix, runtime, deffnm. The last two can be specified when constructing
Gsolv
.
See also
gromacs.setup.MD()
andgromacs.qsub.generate_submit_scripts()
Changed in version 0.6.0: Gromacs now uses option
-dhdl
instead of-dgdl
.
-
summary
()¶ Return a string that summarizes the energetics.
Each energy component is followed by its error.
Format:
. ------ kJ/mol ----- molecule solvent total coulomb vdw
-
tasklabel
(component, lmbda)¶ Batch submission script name for a single task job.
-
wdir
(component, lmbda)¶ Return rooted path to the work directory for component and lmbda.
(Constructed from
frombase()
andwname()
.)
-
wname
(component, lmbda)¶ Return name of the window directory itself.
Typically something like
VDW/0000
,VDW/0500
, …,Coulomb/1000
-
write_DeltaA0
(filename, mode='w')¶ Write free energy components to a file.
Arguments: - filename
name of the text file
- mode
‘w’ for overwrite or ‘a’ for append [‘w’]
- Format::
- . —– kJ/mol —— molecule solvent total coulomb vdw
-
class
mdpow.fep.
Ghyd
(molecule=None, top=None, struct=None, method='BAR', **kwargs)¶ Sets up and analyses MD to obtain the hydration free energy of a solute.
Set up Gsolv from input files or a equilibrium simulation.
Arguments: - molecule
name of the molecule for which the hydration free energy is to be computed (as in the gromacs topology) [REQUIRED]
- top
topology [REQUIRED]
- struct
solvated and equilibrated input structure [REQUIRED]
- ndx
index file
- dirname
directory to work under [‘FEP/solvent’]
- solvent
name of the solvent (only used for path names); will not affect the simulations because the input coordinates and topologies are determined by either simulation or molecule, top, and struct. If solvent is not provided then it is set to
solvent_default
.- lambda_coulomb
list of lambdas for discharging: q+vdw –> vdw
- lambda_vdw
list of lambdas for decoupling: vdw –> none
- runtime
simulation time per window in ps [5000]
- temperature
temperature in Kelvin of the simulation [300.0]
- qscript
template or list of templates for queuing system scripts (see
gromacs.config.templates
for details) [local.sh]- includes
include directories
- simulation
Instead of providing the required arguments, obtain the input files from a
mdpow.equil.Simulation
instance.- method
“TI” for thermodynamic integration or “BAR” for Bennett acceptance ratio; using “BAR” in Gromacs also writes TI data so this is the default.
- mdp
MDP file name (if no entry then the package defaults are used)
- filename
Instead of providing the required arguments, load from pickle file. If either simulation or molecule, top, and struct are provided then simply set the attribute
filename
of the default checkpoint file to filename.- basedir
Prepend basedir to all filenames;
None
disables [.]- permissive
Set to
True
if you want to read past corrupt data in output xvg files (seegromacs.formats.XVG
for details); note that permissive*=``True`` can lead to **wrong results*. Overrides the value set in a loaded pickle file. [False
]- stride
collect every stride data line, see
Gsolv.collect()
[1]- kwargs
other undocumented arguments (see source for the moment)
Note
Only a subset of the FEP parameters can currently be set with keywords (lambdas); other parameters such as the soft cores are currently fixed. For advanced use: pass a dict with keys “coulomb” and “vdw” and values of
FEPschedule
in the keyword schedules. The lambdas will override these settings (which typically come from a cfg). This method allows setting all parameters.
-
class
mdpow.fep.
Goct
(molecule=None, top=None, struct=None, method='BAR', **kwargs)¶ Sets up and analyses MD to obtain the solvation free energy of a solute in octanol.
The coulomb lambda schedule is enhanced compared to water as the initial part of the dV/dl curve is quite sensitive. By adding two additional points we hope to reduce the overall error on the dis-charging free energy.
Set up Gsolv from input files or a equilibrium simulation.
Arguments: - molecule
name of the molecule for which the hydration free energy is to be computed (as in the gromacs topology) [REQUIRED]
- top
topology [REQUIRED]
- struct
solvated and equilibrated input structure [REQUIRED]
- ndx
index file
- dirname
directory to work under [‘FEP/solvent’]
- solvent
name of the solvent (only used for path names); will not affect the simulations because the input coordinates and topologies are determined by either simulation or molecule, top, and struct. If solvent is not provided then it is set to
solvent_default
.- lambda_coulomb
list of lambdas for discharging: q+vdw –> vdw
- lambda_vdw
list of lambdas for decoupling: vdw –> none
- runtime
simulation time per window in ps [5000]
- temperature
temperature in Kelvin of the simulation [300.0]
- qscript
template or list of templates for queuing system scripts (see
gromacs.config.templates
for details) [local.sh]- includes
include directories
- simulation
Instead of providing the required arguments, obtain the input files from a
mdpow.equil.Simulation
instance.- method
“TI” for thermodynamic integration or “BAR” for Bennett acceptance ratio; using “BAR” in Gromacs also writes TI data so this is the default.
- mdp
MDP file name (if no entry then the package defaults are used)
- filename
Instead of providing the required arguments, load from pickle file. If either simulation or molecule, top, and struct are provided then simply set the attribute
filename
of the default checkpoint file to filename.- basedir
Prepend basedir to all filenames;
None
disables [.]- permissive
Set to
True
if you want to read past corrupt data in output xvg files (seegromacs.formats.XVG
for details); note that permissive*=``True`` can lead to **wrong results*. Overrides the value set in a loaded pickle file. [False
]- stride
collect every stride data line, see
Gsolv.collect()
[1]- kwargs
other undocumented arguments (see source for the moment)
Note
Only a subset of the FEP parameters can currently be set with keywords (lambdas); other parameters such as the soft cores are currently fixed. For advanced use: pass a dict with keys “coulomb” and “vdw” and values of
FEPschedule
in the keyword schedules. The lambdas will override these settings (which typically come from a cfg). This method allows setting all parameters.
-
class
mdpow.fep.
Gcyclo
(molecule=None, top=None, struct=None, method='BAR', **kwargs)¶ Set up Gsolv from input files or a equilibrium simulation.
Arguments: - molecule
name of the molecule for which the hydration free energy is to be computed (as in the gromacs topology) [REQUIRED]
- top
topology [REQUIRED]
- struct
solvated and equilibrated input structure [REQUIRED]
- ndx
index file
- dirname
directory to work under [‘FEP/solvent’]
- solvent
name of the solvent (only used for path names); will not affect the simulations because the input coordinates and topologies are determined by either simulation or molecule, top, and struct. If solvent is not provided then it is set to
solvent_default
.- lambda_coulomb
list of lambdas for discharging: q+vdw –> vdw
- lambda_vdw
list of lambdas for decoupling: vdw –> none
- runtime
simulation time per window in ps [5000]
- temperature
temperature in Kelvin of the simulation [300.0]
- qscript
template or list of templates for queuing system scripts (see
gromacs.config.templates
for details) [local.sh]- includes
include directories
- simulation
Instead of providing the required arguments, obtain the input files from a
mdpow.equil.Simulation
instance.- method
“TI” for thermodynamic integration or “BAR” for Bennett acceptance ratio; using “BAR” in Gromacs also writes TI data so this is the default.
- mdp
MDP file name (if no entry then the package defaults are used)
- filename
Instead of providing the required arguments, load from pickle file. If either simulation or molecule, top, and struct are provided then simply set the attribute
filename
of the default checkpoint file to filename.- basedir
Prepend basedir to all filenames;
None
disables [.]- permissive
Set to
True
if you want to read past corrupt data in output xvg files (seegromacs.formats.XVG
for details); note that permissive*=``True`` can lead to **wrong results*. Overrides the value set in a loaded pickle file. [False
]- stride
collect every stride data line, see
Gsolv.collect()
[1]- kwargs
other undocumented arguments (see source for the moment)
Note
Only a subset of the FEP parameters can currently be set with keywords (lambdas); other parameters such as the soft cores are currently fixed. For advanced use: pass a dict with keys “coulomb” and “vdw” and values of
FEPschedule
in the keyword schedules. The lambdas will override these settings (which typically come from a cfg). This method allows setting all parameters.
-
mdpow.fep.
pOW
(G1, G2, **kwargs)¶ Compute water-octanol partition coefficient from two
Gsolv
objects.transfer free energy from water into octanol:
DeltaDeltaG0 = DeltaG0_oct - DeltaG0_water
octanol/water partition coefficient:
log P_oct/wat = log [X]_oct/[X]_wat
Arguments: Returns: (transfer free energy, log10 of the octanol/water partition coefficient = log Pow)
-
mdpow.fep.
pCW
(G1, G2, **kwargs)¶ Compute water-cyclohexane partition coefficient from two
Gsolv
objects.transfer free energy from water into cyclohexane:
DeltaDeltaG0 = DeltaG0_cyclohexane - DeltaG0_water
cyclohexane/water partition coefficient:
log P_CW = log [X]_cyclohexane/[X]_water
Arguments: Returns: (transfer free energy, log10 of the cyclohexane/water partition coefficient = log Pcw)
Developer notes¶
A user really only needs to access classes derived from
mdpow.fep.Gsolv
; all other classes and functions are auxiliary and
only of interest to developers.
Additional objects that support mdpow.fep.Gsolv
.
-
class
mdpow.fep.
FEPschedule
¶ Describe mdp parameter choices as key - value pairs.
The FEP schedule can be loaded from a configuration parser with the static method
FEPschedule.load()
.See the example runinput file for an example. It contains the sections:
[FEP_schedule_Coulomb] name = Coulomb description = dis-charging vdw+q --> vdw label = Coul couple_lambda0 = vdw-q couple_lambda1 = vdw # soft core alpha: linear scaling for coulomb sc_alpha = 0 lambdas = 0, 0.25, 0.5, 0.75, 1.0 [FEP_schedule_VDW] name = vdw description = decoupling vdw --> none label = VDW couple_lambda0 = vdw couple_lambda1 = none # recommended values for soft cores (Mobley, Shirts et al) sc_alpha = 0.5 sc_power = 1.0 sc_sigma = 0.3 lambdas = 0.0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1
-
static
load
(cfg, section)¶ Initialize a
FEPschedule
from the section in the configuration cfg
-
mdp_dict
¶ Dict of key-values that can be set in a mdp file.
-
static
-
mdpow.fep.
molar_to_nm3
(c)¶ Convert a concentration in Molar to nm|^-3|.
-
mdpow.fep.
kcal_to_kJ
(x)¶ Convert a energy in kcal to kJ.
-
mdpow.fep.
kJ_to_kcal
(x)¶ Convert a energy in kJ to kcal.
-
mdpow.fep.
N_AVOGADRO
¶ Avogadro’s constant NA in mol-1 (NA NIST value).
-
mdpow.fep.
kBOLTZ
¶ Boltzmann’s constant kB in kJ mol-1 (kB NIST value).
TODO¶
run minimization, NVT-equil, NPT-equil prior to production (probably use preprocessed topology from
grompp -pp
for portability)See Free Energy Tutorial.
mdpow.analysis
— Collection of analysis and plotting functions¶
Simple functions to quickly plot data. Typically, it works best if ran interactively from the top level of the POW directory!
Experimental values are loaded from the targets list (targets.csv
)
and computed values from the table in pow.txt
. See
plot_exp_vs_comp()
for details.
Prepare data¶
First copy the computed results , the pow.txt
and energies.txt
files that are produced by mdpow-pow, into the data
directories.
Then format them:
./lib/scripts/make_tables.sh data/*
The experimental data are taken from targets.numbers. In
numbers export the table as UTF-8 in CSV format to
experimental/targets.csv
. This is only necessary if the experimental data
were changed. We only plot entries for which
- a id number (first column no) is defined (should be unique)
- a logPow value exists
- a itp_name exists, which must correspond to the molecule name used in
mdpow.fep.Gsolv
Making graphs¶
For the following, import the module:
import mdpow.analysis
Octanol-water partition coefficients¶
Plot results and save to a pdf file with plot_exp_vs_comp()
:
mdpow.analysis.plot_exp_vs_comp(figname="figs/logPow.pdf")
By default we also include the SAMPL2 and reference (Ref) set results. In practice some manual adjustments are required, e.g.
mdpow.analysis.plot_exp_vs_comp()
# resize window so that (huge!) legend fits
ylim(-22,10)
savefig("figs/logPow.pdf")
Using a file named exclusions.txt
in the same directory as the
data file, one can exclude certain runs from appearing in the graph:
set the exclusions keyword to True
:
pylab.clf()
mdpow.analysis.plot_exp_vs_comp(exclusions=True, figname='figs/logPow_best.pdf')
In practice, manual fiddling is required such as resizing the graph:
mdpow.analysis.plot_exp_vs_comp(exclusions=True)
ylim(-14,10)
savefig('figs/logPow_best.pdf')
The exclusions.txt
files must contain a table such as
Table[exclusions]: These sims are ignored.
======== ===========================================
itp_name directory_regex
======== ===========================================
AXX .*
5FH benzylhyd$
======== ===========================================
then any simulation of a molecule equalling itp_name and which is recorded with a directory matching the regular expression directory_regex will be excluded from the analysis.
Solvation energies¶
Plots that compare experimental hydration and octanol-solvation free energies to computed values. DeltaG_hyd are only available for a few compounds so we only plot a subset of all the compounds that we have done.
Experimental octanol solvation free energies are computed from experimental logPow and DeltaGhyd from
logPow = -(DeltaGoct-DeltaGhyd)/kT * log10(e)
(see also gsolv2logpow()
) as
DeltaGoct = DeltaGhyd - kT*logPow / log10(e)
Warning
In principle the temperature T of the logPow measurement and the DeltaGhyd measurement must be the same. For the time being we just assume that both were done at T=300K. Also, the error on DeltaGoct is not calculated properly at the moment, either, because the error on logPow is hard to quantify (based on the logKow data). We are estimating the error on kT*logPow/log10(e) error as 0.5 kcal/mol and combine it with the known experimental error for DeltaGhyd.
Plotting uses the GsolvData.plot()
method from GsolvData
:
from pylab import *
clf()
G = mdpow.analysis.GsolvData()
G.plot('hyd')
# adjust things such as manually increasing window ...
ylim(-180,20)
savefig("figs/ghyd.pdf")
clf()
G.plot('oct')
ylim(-150,20)
savefig("figs/goct.pdf")
Right now, the plots are a bit messy but I opted to include the legend to make it easier for us to understand the data. I had to manually increase the plotting window to make things fit properly.
GsolvData
also honours the exlusions = True
keyword argument.
Functions¶
-
mdpow.analysis.
plot_exp_vs_comp
(**kwargs)¶ Plot computed logPow against experimental values from default files.
Experimental values are stored in the reST table referenced byt the experiments keyword. data contains a list of
pow.txt
tables for the calculated values.
-
class
mdpow.analysis.
GsolvData
(exp='experimental/targets.csv', **kwargs)¶ Solvation energies organized as a database.
Plot either “hyd” or “oct” with
GsolvData.plot()
.Load experimental and simulation data.
The defaults load all the data generated in the project so far. See
DEFAULTS_E
in the python code.Keywords: - exp
path to the experimental
targets.csv
file- data
list of simulation results (typically stored as reST
energies.txt
).- temperature
temperature at which experimental measurements of logPow were presumed to be taken; needed for the calculations of the experimental DeltaG_oct from logPow and experimental [300.0] DeltaG_hyd.
- exclusions
False
does nothing special.True
: look for exclusions.txt in same directory as each data file. If it contains a table such as:Table[exclusions]: These sims are ignored. ======== =========================================== itp_name directory_regex ======== =========================================== AXX .* 5FH benzylhyd$ ======== ===========================================
then any simulation of a molecule equalling itp_name and which is recorded with a directory matching the regular expression directory_regex will be excluded from the analysis. [
False
]
-
plot
(mode, **kwargs)¶ Plot individual data points with legend.
- mode
- “hyd” or “oct”
- compoundnames
False
puts the directory names in the legend,True
uses the chemical compound names; “auto” choosesTrue
if exclusions were applied [“auto”]- figname
- write figure to filename; suffix determines file type
- ymin, ymax
- limits of the plot in the y direction (=computational results)
-
class
mdpow.analysis.
ExpComp
(**kwargs)¶ Database combining experimental with computed values.
Keywords: - experiments
path to
targets.csv
- data
list of
pow.txt
paths; default are the files for Ref, run01, SAMPL2 (stored inDEFAULTS_POW
)- exclusions
False
Does nothing special.
True
Look for exclusions.txt in same directory as each data file. If it contains a table such as:
Table[exclusions]: These sims are ignored. ======== =========================================== itp_name directory_regex ======== =========================================== AXX .* 5FH benzylhyd$ ======== ===========================================
then any simulation of a molecule equalling itp_name and which is recorded with a directory matching the regular expression directory_regex will be excluded from the analysis. [
False
]
-
plot
(**kwargs)¶ Plot individual data points with legend.
By default, the following should work:
Run from the top
mdpow
directory.Prepare
data/run01/pow.txt
(must prepend header and append footer so that it is proper table). Seeload_data()
.Prepare
experimental/targets.csv
if it does not exist or if something changed. Seeload_exp()
for details.- compoundnames
False
puts the directory names in the legend,True
uses the chemical compound names; “auto” choosesTrue
if exclusions were applied [“auto”]- figname
write figure to filename; suffix determines file type
- ymin, ymax
limits of the plot in the y direction (=computational results)
-
class
mdpow.analysis.
ExpData
(filename='experimental/targets.csv', **kwargs)¶ Object that represents our experimental data.
Access the raw data via
ExpData.rawdata
and a table enriched with statistics asExpData.data
(which is arecsql.SQLarray
).Load experimental values table and return
recsql.SQLarray
.To obtain
targets.csv
exporttargets.numbers
in Numbers as UTF-8 (important!) in the **CSV* format (File->Export)
-
class
mdpow.analysis.
ComputedData
(filename='data/run01/pow.txt', **kwargs)¶ Object that represents computed data.
Access via
ComputedData.data
.Load computed POW table and return
recsql.SQLarray
.The data file is typically
pow.txt
. It must contain a proper reST table. Use the_header
and_footer
files if you only have the raw output from mdpow-pow.Furthermore, the column names are important because we use them here.
-
mdpow.analysis.
load_data
(filename='data/run01/pow.txt', **kwargs)¶ Load computed POW table and return
recsql.SQLarray
.The data file is typically
pow.txt
. It must contain a proper reST table. Use the_header
and_footer
files if you only have the raw output from mdpow-pow.Furthermore, the column names (defined in the header and footer files!) are important because we use them here.
-
mdpow.analysis.
load_exp
(filename='experimental/targets.csv', **kwargs)¶ Load experimental values table and return
recsql.SQLarray
.To obtain
targets.csv
exporttargets.numbers
in Numbers as UTF-8 (important!) in the **CSV* format (File->Export)
-
mdpow.analysis.
gsolv2logpow
(Gwat, Goct, unit='kcal/mol', temperature=300.0)¶ Calculate logPow from the solvation free energies.
logPow = -(Goct-Ghyd)/kT * log10(e)Note
Default unit is kcal/mol, unlike the rest of mdpow, which uses kJ/mol. The reason is that most solvation free energies in the literature are quoted in kcal/mol.
Arguments: - Gwat
hydration free energy
- Goct
octanol solvation free energy
- temperature
temperature in K [300]
- unit
unit of the energies, either “kcal/mol” or “kJ/mol”; [“kcal/mol”]
Helper modules¶
The code described here is only relevant for developers.
mdpow.config
– Configuration for POW¶
The config module provides configurable options for the whole package; eventually it might grow into a more sophisticated configuration system but right now it mostly serves to define which gromacs tools and other scripts are offered in the package and where template files are located. If the user wants to change anything they will still have to do it here in source until a better mechanism with a global configuration file has been implemented.
Force field¶
By default, MDPOW uses a collection of OPLS/AA force field files based on the Gromacs 4.5.3 distribution, with the following differences:
- For ions we use the new alkali and halide ion parameters from Table 2 in [Jensen2006] which had shown some small improvements in the paper. They should only be used with the TIP4P water model.
- OPLS/AA parameters for 1-octanol were added. These parameters were validated against experimental data by computing the density (neat), hydration free energy and logP (the latter being a self consistency check).
The force field files are found in the directory pointed to by the environment
variable GMXLIB
. By default, mdpow.config
sets
GMXLIB
to includedir
unless GMXLIB
has already been
set. This mechanism allows the user to override the choice of location of force
field.
At the moment, only OPLS/AA is tested with MDPOW although in principle it is possible to use other force fields by supplying appropriately customized template files.
References
[Jensen2006] | K.P. Jensen and W.L. Jorgensen, J Comp Theor Comput 2 (2006), 1499. doi:10.1021/ct600252r |
Location of template files¶
Template variables list files in the package that can be used as templates such as run input files. Because the package can be a zipped egg we actually have to unwrap these files at this stage but this is completely transparent to the user.
-
mdpow.config.
templates
= {'NPT_opls.mdp': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/templates/NPT_opls.mdp', 'bar_opls.mdp': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/templates/bar_opls.mdp', 'em_opls.mdp': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/templates/em_opls.mdp', 'fep_opls.mdp': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/templates/fep_opls.mdp', 'runinput.cfg': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/templates/runinput.cfg', 'runinput.yml': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/templates/runinput.yml', 'system.top': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/templates/system.top'}¶ dict() -> new empty dictionary dict(mapping) -> new dictionary initialized from a mapping object’s
(key, value) pairs- dict(iterable) -> new dictionary initialized as if via:
d = {} for k, v in iterable:
d[k] = v- dict(**kwargs) -> new dictionary initialized with the name=value pairs
- in the keyword argument list. For example: dict(one=1, two=2)
-
mdpow.config.
topfiles
= {'1cyclo.gro': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top/1cyclo.gro', '1cyclo.itp': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top/oplsaa.ff/1cyclo.itp', '1oct.gro': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top/1oct.gro', '1oct.itp': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top/oplsaa.ff/1oct.itp', '1propanol.itp': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top/oplsaa.ff/1propanol.itp', 'aminoacids.c.tdb': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top/oplsaa.ff/aminoacids.c.tdb', 'aminoacids.hdb': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top/oplsaa.ff/aminoacids.hdb', 'aminoacids.n.tdb': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top/oplsaa.ff/aminoacids.n.tdb', 'aminoacids.r2b': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top/oplsaa.ff/aminoacids.r2b', 'aminoacids.rtp': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top/oplsaa.ff/aminoacids.rtp', 'aminoacids.vsd': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top/oplsaa.ff/aminoacids.vsd', 'atommass.dat': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top/atommass.dat', 'atomname2type.n2t': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top/oplsaa.ff/atomname2type.n2t', 'atomtypes.atp': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top/oplsaa.ff/atomtypes.atp', 'elements.dat': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top/elements.dat', 'ethanol.itp': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top/oplsaa.ff/ethanol.itp', 'ffbonded.itp': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top/oplsaa.ff/ffbonded.itp', 'ffnonbonded.itp': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top/oplsaa.ff/ffnonbonded.itp', 'ffoplsaa.itp': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top/ffoplsaa.itp', 'forcefield.doc': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top/oplsaa.ff/forcefield.doc', 'forcefield.itp': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top/oplsaa.ff/forcefield.itp', 'gbsa.itp': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top/oplsaa.ff/gbsa.itp', 'ions.itp': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top/oplsaa.ff/ions.itp', 'ions_opls.itp': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top/oplsaa.ff/ions_opls.itp', 'methanol.itp': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top/oplsaa.ff/methanol.itp', 'oplsaa.ff': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top/oplsaa.ff', 'residuetypes.dat': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top/residuetypes.dat', 'spc.itp': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top/oplsaa.ff/spc.itp', 'spc216.gro': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top/spc216.gro', 'spce.itp': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top/oplsaa.ff/spce.itp', 'tip3p.itp': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top/oplsaa.ff/tip3p.itp', 'tip4p.gro': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top/tip4p.gro', 'tip4p.itp': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top/oplsaa.ff/tip4p.itp', 'tip5p.gro': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top/tip5p.gro', 'tip5p.itp': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top/oplsaa.ff/tip5p.itp', 'vdwradii.dat': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top/vdwradii.dat', 'watermodels.dat': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top/oplsaa.ff/watermodels.dat'}¶ dict() -> new empty dictionary dict(mapping) -> new dictionary initialized from a mapping object’s
(key, value) pairs- dict(iterable) -> new dictionary initialized as if via:
d = {} for k, v in iterable:
d[k] = v- dict(**kwargs) -> new dictionary initialized with the name=value pairs
- in the keyword argument list. For example: dict(one=1, two=2)
-
mdpow.config.
includedir
= '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/top'¶ str(object=’’) -> string
Return a nice string representation of the object. If the argument is a string, the return value is the same object.
-
mdpow.config.
defaults
= {'runinput': '/home/docs/.cache/Python-Eggs/POW-0.6.0-py2.7.egg-tmp/mdpow/templates/runinput.yml'}¶ dict() -> new empty dictionary dict(mapping) -> new dictionary initialized from a mapping object’s
(key, value) pairs- dict(iterable) -> new dictionary initialized as if via:
d = {} for k, v in iterable:
d[k] = v- dict(**kwargs) -> new dictionary initialized with the name=value pairs
- in the keyword argument list. For example: dict(one=1, two=2)
Functions¶
The following functions can be used to access configuration data.
-
mdpow.config.
get_template
(t)¶ Find template file t and return its real path.
t can be a single string or a list of strings. A string should be one of
- a relative or absolute path,
- a filename in the package template directory (defined in the template dictionary
gromacs.config.templates
) or - a key into
templates
.
The first match (in this order) is returned. If the argument is a single string then a single string is returned, otherwise a list of strings.
Arguments: t : template file or key (string or list of strings) Returns: os.path.realpath(t) (or a list thereof) Raises: ValueError
if no file can be located.
-
mdpow.config.
get_templates
(t)¶ Find template file(s) t and return their real paths.
t can be a single string or a list of strings. A string should be one of
- a relative or absolute path,
- a filename in the package template directory (defined in the template dictionary
gromacs.config.templates
) or - a key into
templates
.
The first match (in this order) is returned for each input argument.
Arguments: t : template file or key (string or list of strings) Returns: list of os.path.realpath(t) Raises: ValueError
if no file can be located.
-
mdpow.config.
get_configuration
(filename=None)¶ Reads and parses a run input config file.
Uses the package-bundled defaults as a basis.
Developer note
Templates have to be extracted from the egg because they are used by external
code. All template filenames are stored in config.templates
or
config.topfiles
.
Sub-directories are extracted (see Resource extraction) but the file names themselves will not appear in the template dict. Thus, only store files in subdirectories that don’t have to be explicitly found by the package (e.g. the Gromacs force field files are ok).
-
mdpow.config.
_generate_template_dict
(dirname)¶ Generate a list of included top-level files and extract them to a temp space.
Only lists files and directories at the top level of the dirname; however, all directories are extracted recursively and will be available.
mdpow.log
— Configure logging for POW analysis¶
Import this module if logging is desired in application code and
create the logger in __init__.py
:
import log
logger = log.create(logname, logfile)
In modules simply use:
import logging
logger = logging.getLogger(logname)
mdpow.restart
— Restarting and checkpointing¶
The module provides classes and functions to keep track of which stages of a
simulation protocol have been completed. It uses a Journal
class for
the book-keeping. Together with saving the current state of a protocol to a
checkpoint file (using Journalled.save()
) it is possible to implement
restartable simulation protocols (for example mdpow-equilibrium).
-
exception
mdpow.restart.
JournalSequenceError
¶ Raised when a stage is started without another one having been completed.
-
class
mdpow.restart.
Journal
(stages)¶ Class that keeps track of the stage in a protocol.
Transaction blocks have to be bracketed by calls to
start()
andcompleted()
. If a block is started before completion, aJournalSequenceError
will be raised.Other methods such as
has_completed()
andhas_not_completed()
can be used to query the status. The attributeincomplete
flags the state of the current stage (current
).All completed stages are recorded in the attribute
history
.The current (incomplete) stage can be reset to its initial state with
Journal.clear()
.Example:
J = Journal(['pre', 'main', 'post']) J.start('pre') ... J.completed('pre') J.start('main') ... # main does not finish properly print J.incomplete # --> 'main' J.start('post') # raises JournalSequenceError
Initialise the journal that keeps track of stages.
Arguments: - stages
list of the stage identifiers, in the order that they should per performed. Stage identifiers are checked against this list before they are accepted as arguments to most methods.
-
clear
()¶ Reset incomplete status and current stage
-
completed
(stage)¶ Record completed stage and reset
Journal.current
-
current
¶ Current stage identifier
-
has_completed
(stage)¶ Returns
True
if the stage has been started and completed at any time.
-
has_not_completed
(stage)¶ Returns
True
if the stage had been started but not completed yet.
-
history
¶ List of stages completed
-
incomplete
¶ This last stage was not completed.
-
start
(stage)¶ Record that stage is starting.
-
class
mdpow.restart.
Journalled
(*args, **kwargs)¶ A base class providing methods for journalling and restarts.
It installs an instance of
Journal
in the attributeJournalled.journal
if it does not exist already.-
get_protocol
(protocol)¶ Return method for protocol.
If protocol is a real method of the class then the method is returned.
If protocol is a registered protocol name but no method of the name exists (i.e. protocol is a “dummy protocol”) then a wrapper function is returned. The wrapper has the signature
-
dummy_protocol
(func, *args, **kwargs)¶ Runs func with the arguments and keywords between calls to
Journal.start()
andJournal.completed()
, with the stage set to protocol.
-
Raises a
ValueError
if the protocol is not registered (i.e. not found inJournalled.protocols
).
-
load
(filename=None)¶ Re-instantiate class from pickled file.
If no filename was supplied then the filename is taken from the attribute
filename
.
-
save
(filename=None)¶ Save instance to a pickle file.
The default filename is the name of the file that was last loaded from or saved to. Also sets the attribute
filename
to the absolute path of the saved file.
-
-
mdpow.restart.
checkpoint
(name, sim, filename)¶ Execute the
Journalled.save()
method and log the event.
mdpow.run
— Performing complete simulation protocols¶
The module provides building blocks for complete simulation protocols (or pipelines). Each protocol is written as a function that takes a run input file and the solvent type as input.
The mdpow-* scripts make use of the building blocks.
Typically, journalling is enabled, i.e. the tasks remember which stages have already been completed and can be restarted directly from the last completed stage. (Restarts are only implemeneted at the level of individual steps in a MDPOW protocol, not at the level of continuing interrupted simulations using the Gromacs restart files.)
Input is read from the run input cfg file.
See also
mdpow.restart
for the journalling required for restarts.
Protocols¶
-
mdpow.run.
equilibrium_simulation
(cfg, solvent, **kwargs)¶ Set up and run equilibrium simulation.
See tutorial for the individual steps.
Note
Depending on the settings in the run input file (
runlocal
), mdrun is executed at various stages, and hence this process can take a while.
-
mdpow.run.
fep_simulation
(cfg, solvent, **kwargs)¶ Set up and run FEP simulation.
See tutorial for the individual steps.
Note
Depending on the settings in the run input file (
runlocal
), mdrun is executed sequentially for all windows and hence this can take a long time. It is recommended to userunlocal = False
in the run input file and submit all window simulations to a cluster.
Support¶
-
class
mdpow.run.
MDrunnerSimple
(dirname='.', **kwargs)¶ Gromacs mdrun.
Use Gromacs 4.5.x with threaded mdrun to get max performance and just leave everything as it is. For older version you can change the name and add a mpiexec binary to launch a multiprocessor job. See
gromacs.run
for details.Set up a simple run with
mdrun
.Keywords: - dirname
Change to this directory before launching the job. Input files must be supplied relative to this directory.
- keywords
All other keword arguments are used to construct the
mdrun
commandline. Note that only keyword arguments are allowed.
-
mdpow.run.
setupMD
(S, protocol, cfg)¶ setup MD simulation protocol using the runinput file cfg
-
mdpow.run.
get_mdp_files
(cfg, protocols)¶ Get file names of MDP files from cfg for all protocols
-
mdpow.run.
runMD_or_exit
(S, protocol, params, cfg, **kwargs)¶ run simulation
Can launch mdrun itself (
MDrunnerSimple
) or exit so that the user can run the simulation independently. Checks if the simulation has completed and sets the return value toTrue
if this is the case.- Configuration parameters are taken from the section protocol of the runinput configuration cfg.
- The directory in which the simulation input files reside can be provided as keyword argument dirname or taken from S.dirs[protocol].
- Other kwargs are interpreted as options for
Mdrun
.
It never returns
False
but instead does asys.exit()
.
Tables of hard-coded values used in mdpow¶
TODO: Move these data into files in the top directory.
See also
mdpow.forcefields
Footnotes
[1] | The package is built on top of the `GromacsWrapper`_ framework (which is automatically installed). |