3. The mdpow-*
scripts¶
A number of python scripts are installed together with the
mdpow
package. They simplify some common tasks (especially at
the analysis stage) but they make some assumptions about directory
layout and filenames. If one uses defaults for all directory and
filename options then it should “just work”.
In particular, a directory hierarchy such as the following is assumed:
moleculename/
water.simulation
octanol.simulation
Equilibrium/
water/
octanol/
FEP/
water/
Ghyd.fep
Coulomb/
VDW/
octanol/
Goct.fep
Coulomb/
VDW/
moleculename is, for instance, “benzene” or “amantadine”; in the run
input file (see Equilibrium simulations) is the value of the
variable name
in the [setup]
section.
3.1. Run input file¶
The mdpow-*
scripts are the commandline interface (CLI) to the
Python API functionality in the mdpow
package. Whereas the
Python API requires passing of parameters to classes and functions and
therefore exposes the full flexibility of MDPOW, the CLI works with a
narrower set of options which are collected in a run input
file. This run input file or RUNFILE is named runinput.yml
by default.
A template RUNFILE can be generated with mdpow-get-runinput
mdpow-get-runinput runinput.yml
which will copy the default run input file bundled with MDPOW and put
it in the current directory under the name runinput.yml
.
For an example of a RUNFILE see benzene.yml
. The comments in the file serve as the
documentation.
The run input file uses YAML syntax (and is parsed by yaml
).
Note
It is recommended to use absolute paths to file names.
Note
It is recommended to enclose all strings in the input file in quotes, especially if they can be interpreted as numbers. For example, a name “005” would be interpreted as the number 5 unless explicitly quoted.
3.2. Equilibrium simulations¶
The mdpow-equilibrium script
- sets up equilibrium MD simulations for the solvents (e.g., water or octanol)
- runs energy minimization, MD_relaxed, and MD_NPT protocols; the user can choose if she wants to launch mdrun herself (e.g. on a cluster) or let the script do it locally on the workstation
The script runs essentially the same steps as described in the tutorial Tutorial: Manual session: 1-octanol as a solute but it gathers all required parameters from the run input file and it allows one to stop and continue and the protocol transparently.
It requires as at least Gromacs 4.6.5 ready to run (check that all commands can be found). The required input is
- a run configuration file (runinput.yml);
- a structure file (PDB or GRO) for the compound
- a Gromacs ITP file for the compound (OPLS/AA force field)
The script keeps track of the stages of the simulation protocol (in
the state files water.simulation
, octanol.simulation
etc) and allows the user to restart from the last completed
stage. For instance, one can use the script to set up a simulation,
then run the simulation on a cluster, transfer back the generated
files, and start mdpow-equilibrium again with the exact
same input to finish the protocol. Since Gromacs 4.5 it is also
possible to interrupt a running mdrun process (e.g. with
Control-c) and then resume the simulation at the last saved
trajectory checkpoint by running mdpow-equilibrium again.
If in doubt, just try running mdpow-equilibrium running again and let it figure out the best course of action. Look at the log file to see what has been done. Lines reading “Fast forwarding: stage” indicate that results from stage are available.
Usage of the command:
mdpow-equilibrium [options] RUNFILE
Set up (and possibly run) the equilibration equilibrium simulation for one compound and one solvent. All parameters except the solvent are specified in the RUNFILE.
Arguments:
RUNFILE
¶
The runfile
runinput.yml
with all configuration parameters.Options:
-h
,
--help
¶
show this help message and exit
-S
<NAME>
,
--solvent
=<NAME>
¶solvent
<NAME>
for compound, can be ‘water’, ‘octanol’, ‘cyclohexane’ [water]
-d
<DIRECTORY>
,
--dirname
=<DIRECTORY>
¶generate files and directories in
<DIRECTORY>
, which is created if it does not already exist. The default is to use the molecule name from the run input file.
3.3. FEP simulations¶
The mdpow-fep script sets up (and can also run) the free energy perturbation calculations for one compound and one solvent. It uses the results from mdpow-equilibrium together with the run input file.
You will require:
- at least Gromacs 4.6.5 ready to run (check that all commands can be found)
- the results from a previous complete run of mdpow-equilibrium
Usage of the command:
mdpow-fep [options] RUNFILE
Arguments:
RUNFILE
¶
The runfile
runinput.yml
with all configuration parameters.Options:
-h
,
--help
¶
show this help message and exit
-S
<NAME>
,
--solvent
=<NAME>
¶solvent
<NAME>
for compound, can be ‘water’ or ‘octanol’ [water]
-d
<DIRECTORY>
,
--dirname
=<DIRECTORY>
¶generate files and directories in
<DIRECTORY>
, which is created if it does not already exist. The default is to use the molecule name from the run input file.
3.4. Running analysis¶
3.4.1. Solvation free energy¶
The mdpow-solvationenergy calculatest the solvation free energy. It
- collects data from FEP simulations (converting EDR files to XVG.bz2 when necessary)
- calculates desolvation free energies for solvent –> vacuum via thermodynamic integration (TI)
- plots dV/dlambda graphs
- appends results to
energies.txt
(when the default names are chosen), see Output data file formats.
Usage of the command:
mdpow-solvationenergy [options] DIRECTORY [DIRECTORY …]
Run the free energy analysis for a solvent in
<DIRECTORY>/FEP
and returnDeltaG
.
DIRECTORY
should contain all the files resulting from runningmdpow.fep.Ghyd.setup()
(or the correspondingGoct.setup()
orGcyclohexane.setup()
and the results of the MD FEP simulations. It relies on the canonical naming scheme (basically: just use the defaults as in the tutorial).The dV/dlambda plots can be produced automatically (
--plotfile auto
). If multipleDIRECTORY
arguments are provided then one has to choose the “auto” option (or “none”).The total solvation free energy is calculated as
\[\Delta G^{*} = -(\Delta G_{\text{coul}} + \Delta G_{\text{vdw}})\]Note that the standard state refers to the “Ben-Naim” standard state of transferring 1 M of compound in the gas phase to 1 M in the aqueous phase.
Results are appended to a data file with Output file format:
. ---------- kJ/mol --- molecule solvent DeltaG* coulomb vdwAll observables are quoted with an error estimate, which is derived from the correlation time and error propagation through Simpson’s rule (see
mdpow.fep.Gsolv()
). It ultimately comes from the error on <dV/dlambda>, which is estimated as the error to determine the average.
molecule: molecule name as used in the itp DeltaG*: solvation free energy vacuum –> solvent, in kJ/mol coulomb: discharging contribution to the DeltaG* vdw: decoupling contribution to the DeltaG* directory: folder in which the simulations were stored positional arguments:
DIRECTORY
[DIRECTORY ...]
¶directory or directories which contain all the files resulting from running
mdpow.fep.Ghyd.setup()
optional arguments:
-h
,
--help
¶
show this help message and exit
--plotfile
FILE
¶plot dV/dlambda to FILE; use png or pdf suffix to determine the file type. ‘auto’ generates a pdf file
DIRECTORY/dVdl_molname_solvent.pdf
and none disables it The plot function is only available for mdpow estimator,and is disabled when using alchemlyb estimators. (default: none)
--solvent
NAME
,
-S
NAME
¶solvent NAME for compound, ‘water’, ‘octanol’, or ‘cyclohexane’ (default: water)
-o
FILE
,
--outfile
FILE
¶append one-line results summary to FILE (default:
gsolv.txt
)
-e
FILE
,
--energies
FILE
¶append solvation free energies to FILE (default:
energies.txt
)
--estimator
{mdpow,alchemlyb}
¶choose the estimator to be used, alchemlyb or mdpow estimators (default: alchemlyb)
--method
{TI,MBAR,BAR}
¶choose the method to calculate free energy (default: MBAR)
--force
¶
force rereading all data (default: False)
--SI
¶
enable statistical inefficiency (SI) analysis. Statistical inefficiency analysis is performed by default when usingalchemlyb estimators and is always disabled when using mdpow estimator. (default: True)
--no-SI
¶
disable statistical inefficiency analysis. Statistical inefficiency analysis is performed by default when usingalchemlyb estimators and is disabled when using mdpow estimator. (default: False)
-s
N
,
--stride
N
¶use every N-th datapoint from the original dV/dlambda data. (default: 1)
--start
START
¶start point for the data used from the original dV/dlambda data. (default: 0)
--stop
STOP
¶stop point for the data used from the original dV/dlambda data. (default: None)
--ignore-corrupted
¶
skip lines in the md.xvg files that cannot be parsed. (default: False)
Warning
Other lines in the file might have been corrupted in such a way that they appear correct but are in fact wrong. WRONG RESULTS CAN OCCUR! USE AT YOUR OWN RISK
It is recommended to simply rerun the affected simulation(s).
3.4.2. Partition coefficients¶
The mdpow-pow script
- collects data from FEP simulations
- calculates desolvation free energies for octanol –> vacuum and water –> vacuum via thermodynamic integration (TI)
- calculates transfer free energy water –> octanol
- calculates log POW
- plots dV/dlambda graphs
- appends results to
pow.txt
andenergies.txt
(when the default names are chosen), see Output data file formats.
An equivalent script mdpow-pcw for the water-cyclohexane partition coefficient is also included.
Usage of the command:
mdpow-pow [options] DIRECTORY [DIRECTORY …]
Run the free energy analysis for water and octanol in DIRECTORY/FEP and return the octanol-water partition coefficient log POW.
DIRECTORY should contain all the files resulting from running
mdpow.fep.Goct.setup()
andmdpow.fep.Goct.setup()
and the results of the MD FEP simulations. It relies on the canonical naming scheme (basically: just use the defaults as in the tutorial).The dV/dlambda plots can be produced automatically (
--plotfile auto
). If multiple DIRECTORY arguments are provided then one has to choose the “auto” option (or “none”).positional arguments:
DIRECTORY
[DIRECTORY ...]
¶One or more directories that contain the state pickle files (
water.simulation
,octanol.simulation
) for the solvation free energy calculations in water and octanol. These directory or directories should contain all the files resulting from runningmdpow.fep.Ghyd.setup()
andmdpow.fep.Goct.setup()
and the results of the MD FEP simulations.optional arguments:
-h
,
--help
¶
show this help message and exit
--plotfile
FILE
¶plot dV/dlambda to FILE; use png or pdf suffix to determine the file type. ‘auto’ generates a pdf file
DIRECTORY/dVdl_molname_pow.pdf
and none disables it The plot function is only available for mdpow estimator,and is disabled when using alchemlyb estimators. (default: none)
-o
FILE
,
--outfile
FILE
¶append one-line results summary to FILE (default:
pow.txt
)
-e
FILE
,
--energies
FILE
¶append solvation free energies to FILE (default:
energies.txt
)
--estimator
{mdpow,alchemlyb}
¶choose the estimator to be used, alchemlyb or mdpow estimators (default: alchemlyb)
--method
{TI,MBAR,BAR}
¶choose the method to calculate free energy (default: MBAR)
--force
¶
force rereading all data (default: False)
--SI
¶
enable statistical inefficiency (SI) analysis. Statistical inefficiency analysis is performed by default when usingalchemlyb estimators and is always disabled when using mdpow estimator. (default: True)
--no-SI
¶
disable statistical inefficiency analysis. Statistical inefficiency analysis is performed by default when usingalchemlyb estimators and is disabled when using mdpow estimator. (default: False)
-s
N
,
--stride
N
¶use every N-th datapoint from the original dV/dlambda data. (default: 1)
--start
START
¶start point for the data used from the original dV/dlambda data. (default: 0)
--stop
STOP
¶stop point for the data used from the original dV/dlambda data. (default: None)
--ignore-corrupted
¶
skip lines in the md.xvg files that cannot be parsed. (default: False)
Warning
Other lines in the file might have been corrupted in such a way that they appear correct but are in fact wrong. WRONG RESULTS CAN OCCUR! USE AT YOUR OWN RISK
It is recommended to simply rerun the affected simulation(s).
3.4.3. Output data file formats¶
Results are appended to data files.
Note
Energies are always output in kJ/mol.
3.4.3.1. POW summary file¶
The pow.txt
output file summarises the results from the water and
octanol solvation calculations. Its name set with the option mdpow-pow -o
.
It contains fixed column data in the following order and all energies are
in kJ/mol. See the Table of computed water-octanol transfer energies
and logPow as an example.
itp_name
molecule identifier from the itp file
DeltaG0
transfer free energy from water to octanol (difference between DeltaG0 for octanol and water), in kJ/mol; >0: partitions into water, <0: partitions into octanol,
errDG0
error on DeltaG0; errors are calculated through propagation of errors from the errors on the means <dV/dlambda>
logPOW
log POW, base-10 logarithm of the octanol-water partition coefficient; >0: partitions into octanol, <0: partitions preferrably into water
errlogP
error on logPow
directory
directory under which all data files are stored; by default this is the name of the molecule and hence it can be used to identify the compound.
itp_name | DeltaG0 | errDG0 | logPow | errlogP | directory |
---|---|---|---|---|---|
BNZ | -12.87 | 0.43 | +2.24 | 0.07 | benzene |
OC9 | -16.24 | 1.12 | +2.83 | 0.20 | octanol |
URE | +4.66 | 1.13 | -0.81 | 0.20 | urea |
902 | +9.35 | 1.06 | -1.63 | 0.18 | water_TIP4P |
3.4.3.2. Energy file¶
The energy.txt
output file collects all computed energy terms together
with the results also found in the summary file pow.txt
. Its name is
set with the option mdpow-pow -e
. It contains fixed column data in
the following order and all energies are in kJ/mol. As an example see
Table of Solvation Energies for the same
compounds as above.
molecule
molecule identifier from the itp file
solvent
solvent name (water, octanol)
DeltaG0
solvation free energy difference in kJ/mol (Ben-Naim standard state, i.e. 1M gas/1M solution)
errDG0
error on DeltaG0, calculated through propagation of errors from the errors on the mean <dV/dlambda>
coulomb
Coulomb (discharging) contribution to the solvation free energy DeltaG0
errCoul
error on coulomb
VDW
Van der Waals/Lennard-Jones (decoupling) contribution to DeltaG0
errVDW
error on VDW
w2oct
transfer free energy from water to octanol (difference between DeltaG0 for octanol and water)
errw2oct
error on w2oct
logPOW
log POW
errlogP
error on logPow
directory
directory under which all data files are stored; by default this is the name of the molecule and hence it can be used to identify the compound.
molecule | solvent | DeltaG0 | errDG0 | coulomb | errCoul | VDW | errVDW | w2oct | errw2oct | logPOW | errlogP | directory |
---|---|---|---|---|---|---|---|---|---|---|---|---|
BNZ | water | -2.97 | 0.21 | +7.65 | 0.07 | -4.68 | 0.20 | -12.87 | 0.43 | +2.24 | 0.07 | benzene |
BNZ | octanol | -15.84 | 0.37 | +1.40 | 0.19 | +14.44 | 0.32 | -12.87 | 0.43 | +2.24 | 0.07 | benzene |
OC9 | water | -16.03 | 0.32 | +27.35 | 0.09 | -11.32 | 0.31 | -16.24 | 1.12 | +2.83 | 0.20 | octanol |
OC9 | octanol | -32.28 | 1.08 | +11.32 | 0.92 | +20.96 | 0.56 | -16.24 | 1.12 | +2.83 | 0.20 | octanol |
URE | water | -53.52 | 0.17 | +56.94 | 0.11 | -3.41 | 0.14 | +4.66 | 1.13 | -0.81 | 0.20 | urea |
URE | octanol | -48.86 | 1.12 | +35.75 | 1.09 | +13.11 | 0.25 | +4.66 | 1.13 | -0.81 | 0.20 | urea |
902 | water | -25.46 | 0.11 | +34.93 | 0.10 | -9.48 | 0.06 | +9.35 | 1.06 | -1.63 | 0.18 | water_TIP4P |
902 | octanol | -16.11 | 1.05 | +21.16 | 1.05 | -5.05 | 0.09 | +9.35 | 1.06 | -1.63 | 0.18 | water_TIP4P |
3.5. House-keeping scripts¶
A number of scripts are provided to complete simple tasks; they can be used to check that all required files are present or they can help in fixing small problems without one having to write Python code to do this oneself (e.g. by manipulating the checlpoint files). They generally make the same assumptions about file system layout as the other mdpow scripts.
3.5.1. Checking if the simulation is complete¶
Run mdpow-check in order to check if all necessary files are available.
Usage of the command:
mdpow-check [options] DIRECTORY [DIRECTORY …]
Check status of the progress of the project in DIRECTORY.
Output is only written to the status file (
status.txt
). A quick way to find problems is to do acat status.txt | gawk -F '|' '$2 !~ /OK/ {print $0}'Options:
-h
,
--help
¶
show this help message and exit
-o
<FILE>
,
--outfile
=<FILE>
¶write status results to
FILE
[status.txt
]
3.5.2. Changing paths in water.simulation
and octanol.simulation
¶
It can become necessary to recreate the solvent.simulation
status/checkpoint files in order to change paths, e.g. when one moved
the directories or transferred all files to a different file system.
Typically, one would execute the mdpow-rebuild-simulation command in the parent directory of moleculename.
Usage of the command:
3.5.3. Re-building Ghyd.fep
and Goct.fep
¶
It can become necessary to recreate the name.fep
status/checkpoint
files (e.g. if the files became corrupted due to a software error or in order
to change paths).
Typically, one would execute the mdpow-rebuild-fep command in the parent directory of moleculename.
Usage of the command:
mdpow-rebuild-fep [options] DIRECTORY [DIRECTORY …]
Re-create the
Goct.fep
orGhyd.fep
file using the appropriate equilibrium simulation file under DIRECTORY/FEP.This should only be necessary when the fep file was destroyed due to a software error or when the files are transferred to a different file system and some of the paths stored in the
name.fep
files have to be changed.Options:
-h
,
--help
¶
show this help message and exit
--solvent
=<NAME>
¶rebuild fep for ‘water’, ‘octanol’, or ‘all’ [all]
--setup
=<LIST>
¶Re-generate queuing system scripts with appropriate paths: runs
fep.Gsolv.setup()
with argument qscript=[LIST] after fixing Gsolv.
LIST
should contain a comma-separated list of queing system templates. For example:'icsn_8pd.sge,icsn_2pd.sge,local.sh'
. It is an error if--setup
is set without aLIST
.