2. mdpow
— Computing the octanol/water partitioning coefficient¶
The mdpow
module helps in setting up and analyzing absolute
free energy calculations of small molecules by molecular dynamics (MD)
simulations. By computing the hydration free energy and the solvation
free energy in octanol one can compute the octanol/water partitioning
coefficient, an important quantity that is used to characterize
drug-like compounds.
The MD simulations can be performed with all recent versions of Gromacs, starting with Gromacs 4.6.x.
2.1. How to use the module¶
Before you can start you will need
- a coordinate file for the small molecule
- a Gromacs OPLS/AA topology (itp) file
- an installation of Gromacs in a supported version.
2.1.1. Basic work flow¶
You will typically calculate two solvation free energies (free energy of transfer of the solute from the liquid into the vacuum phase):
- solvent = water
- set up a short equilibrium simulation of the molecule in a water box (and run the MD simulation);
- set up a free energy perturbation calculation of the ligand in water , which will yield the hydration free energy;
- solvent = octanol
- set up a short equilibrium simulation of the molecule in a octanol box (and run the MD simulation);
- set up a free energy perturbation calculation of the ligand in octanol , which will yield the solvation free energy in octanol;
- run these simulations on a cluster;
- analyze the output and combine the free energies to arrive at an estimate of the octanol-water partition coefficient;
- plot results using
mdpow.analysis.plot_exp_vs_comp()
.
2.1.2. Customized submission scripts for queuing systems¶
One can also generate run scripts for various queuing systems; check
the documentation for gromacs.qsub
and in particular the
section on writing queuing system templates . You will have to
add a template script to your private GromacsWrapper template directory (
~/.gromacswrapper/qscripts
); in this example we call itmy_script.sge
;add the keyword qscript to the
mdpow.equil.Simulation.MD()
andmdpow.fep.Gsolv.setup()
invocations; e.g. asqscript = [‘my_script.sge’, ‘local.sh’]
submit the generated queuing system script to your queuing system, e.g.
cd Equilibrium/water qsub my_script.sh
2.2. The mdpow scripts¶
Some tasks are simplified by using scripts, which are installed in a bin
directory (or the directory pointed to by python setup.py
--install-scripts
). See The mdpow-* scripts for details and
Tutorial: Using the mdpow scripts to compute log POW of benzene for an example. The scripts essentially execute
the steps shown in Tutorial: Manual session: 1-octanol as a solute so in order to gain a better
understanding of MDpow it is suggested to look at both tutorials.
2.3. Tutorial: Using the mdpow scripts to compute log POW of benzene¶
The most straightforward use of MDpow is through the Python scripts described in The mdpow-* scripts. This tutorial shows how to use them to calculate log POW for benzene.
The examples/benzene
directory of the distribution contains a Gromacs
OPLS/AA topology for benzene (benzene.itp
) and a starting structure
(benzene.pdb
) together with a
run input configuration file for the mdpow scripts
(benzene.yml
) in YAML format.
Make a directory
benzene
and collect the files in the directory.Move into the parent directory of
benzene
; in this tutorial all files will be generated underbenzene
but the configuration file has recorded the location of the input files asbenzene/filename
. (For more on to make best use of file names in the configuration file see Advanced: Separating input from output data.)Set up Gromacs (e.g. by sourcing
GMXRC
).Note
Gromacs must be set up so that all Gromacs commands (such as mdrun or grompp can be found on the shell’s
PATH
). If this is not the case then the following will fail. Look at the log file (named mdpow.log) for error messages.)Generate the input files for a equilibrium simulation of benzene in water (TIP4P) and run the simulation:
mdpow-equilibrium --solvent water benzene/benzene.yml
The example file provided will only set up very short simulations and it will also directly call mdrun to run the simulations. This is configured via the
runlocal = True
parameter in theMD_relaxed
andMD_NPT
sections in thebenzene.yml
file.In fact, the following steps are carried out:
- generate a system topology (section
Setup
) - solvate the compound (section
Setup
) - energy minimise the system (section
Setup
) - run a short relaxation with a time step of 0.1 fs in order to remove any
steric clashes; this step was found to enable much more robust
simulations, especially when using octanol as a solvent. Section
MD_relaxed
controls this step. - run a equilibrium MD simulation at constant pressure and temperature
using a time step of 2 fs. Section
MD_NPT
controls this step.
- generate a system topology (section
Generate the input files for a equilibrium simulation of benzene in octanol (OPLS/AA parameters) and run the simulation:
mdpow-equilibrium --solvent octanol benzene/benzene.yml
The steps are analogous to the ones described under 4.
Use the last frame of the equilibrium simulation as a starting point for the free energy perturbation (FEP) calculations. This step is controlled by the
FEP
section in the run input configuration file.The example only runs very short windows (
runtime: 25
ps) but it will run all 21 individual simulations sequentially (runlocal: True
). Thus it is recommended to run this example on a fast multi-core workstation and at least Gromacs 4.5.x, which has thread support for mdrun.The FEP windows for benzene in water are generated and run by
mdpow-fep --solvent water benzene/benzene.yml
(In order to run the FEP windows on a cluster see Advanced: Generating queuing system scripts.)
The FEP windows for benzene in octanol are generated and run by
mdpow-fep --solvent octanol benzene/benzene.yml
Analyse the simulation output with mdpow-pow. It collects the raw data from the FEP simulations and computes the free energies of solvation using thermodynamic integration (TI) together with error estimates. log POW is calculated from the difference of the octanol and water solvation free energies:
mdpow-pow benzene
benzene
is the directory name under which the FEP simulations are stored. By default, results are appended to the filesenergies.txt
andpow.txt
.See also
The output formats are explained with examples in Output data file formats.
2.3.1. Advanced: Generating queuing system scripts¶
(To be written)
See also
gromacs.qsub
for the reference on how queuing system script
templates are handled and processed
2.3.2. Advanced: Separating input from output data¶
Make another directory under which simulation data will be stored; in this
tutorial it will be called WORK
and we assume that all directories reside
in ~/Projects/POW
. We will end up with a directory layout under
~/Projects/POW
like this:
benzene/
benzene.itp
benzene.pdb
benzene.yml
WORK/
benzene/
water/
octanol/
Edit benzene.yml
to put in absolute paths to the input
files: In the setup
section use absolute paths to the itp
and pdb files of benzene:
setup:
name: "benzene"
molecule: "BNZ"
structure: "~/Projects/POW/benzene/benzene.pdb"
itp: "~/Projects/POW/benzene/benzene.itp"
With absolute paths defined, it is easy to generate all other files under
WORK/benzene
(the directory name “benzene” is the name entry from the
setup
section of the configuration file):
cd WORK
mdpow-equilibrium --solvent water ../benzene/benzene.yml
mdpow-equilibrium --solvent octanol ../benzene/benzene.yml
mdpow-fep --solvent water ../benzene/benzene.yml
mdpow-fep --solvent octanol ../benzene/benzene.yml
Finally, calculate log POW:
cd WORK
mdpow-pow benzene
2.4. Tutorial: Manual session: 1-octanol as a solute¶
In the following interactive python session we use octanol as an example for a solute; all files are present in the package so one can work through the example immediately.
Before starting python (preferrably ipython) make sure that the
Gromacs 4.6.x tools can be found, e.g. which grompp
should show you the
path to grompp.
2.4.1. Water¶
2.4.1.1. Equilibrium simulation¶
Make a directory octanol
and copy the octanol.itp
and
octanol.gro
file into it. Launch ipython from this
directory and type:
import mdpow.equil
S = mdpow.equil.WaterSimulation(molecule="OcOH")
S.topology(itp="octanol.itp")
S.solvate(struct="octanol.gro")
S.energy_minimize()
S.MD_relaxed()
# run the simulation in the MD_relaxed/ directory
S.MD(runtime=50, qscript=['my_script.sge', 'local.sh']) # only run for 50 ps in this tutorial
S.save("water.simulation") # save setup for later (analysis stage)
Background (Ctrl-Z) or quit (Ctrl-D) python and run the
simulations in the MD_relaxed
and MD_NPT
subdirectory. You can modify
the local.sh
script to your ends or use qscript to generate queuing system
scripts.
Note
Here we only run 50 ps equilibrium MD for testing. For production this should be substantially longer, maybe even 50 ns if you want to extract thermodynamic data.
2.4.1.2. Hydration free energy¶
Reopen the python session and set up a Ghyd
object:
import mdpow.fep
gwat = mdpow.fep.Ghyd(molecule="OcOH", top="Equilibrium/water/top/system.top", struct="Equilibrium/water/MD_NPT/md.pdb", ndx="Equilibrium/water/solvation/main.ndx", runtime=100)
Alternatively, one can save some typing if we continue the last session and use
the mdpow.equil.Simulation
object (which we can re-load from its saved
state file from disk):
import mdpow.equil
S = mdpow.equil.WaterSimulation(filename="water.simulation") # only needed when quit
gwat = mdpow.fep.Ghyd(simulation=S, runtime=100)
This generates all the input files under FEP/water
.
Note
Here we only run 100 ps per window for testing. For production this should be rather something like 5-10 ns (the default is 5 ns).
Then set up all input files:
gwat.setup(qscript=['my_script.sge', 'local.sh'])
(The details of the FEP runs can be customized by setting some keywords (such as
lambda_vdw, lamda_coulomb, see mdpow.fep.Gsolv
for details) or by
deriving a new class from the mdpow.fep.Ghyd
base class but this is not
covered in this tutorial.)
2.4.2. Octanol¶
2.4.2.1. Equilibrium simulation¶
Almost identical to the water case:
O = mdpow.equil.OctanolSimulation(molecule="OcOH")
O.topology(itp="octanol.itp")
O.solvate(struct="octanol.gro")
O.energy_minimize()
O.MD_relaxed()
O.MD(runtime=50, qscript=['my_script.sge', 'local.sh']) # only run for 50 ps in this tutorial
O.save()
Note
Here we only run 50 ps equilibrium MD for testing. For production this should be substantially longer, maybe even 50 ns if you want to extract thermodynamic data.
2.4.2.2. Octanol solvation free energy¶
Almost identical setup as in the water case:
goct = mdpow.fep.Goct(simulation=O, runtime=100)
goct.setup(qscript=['my_script.sge', 'local.sh'])
This generates all the input files under FEP/octanol
.
Note
Here we only run 100 ps per window for testing. For production this should be rather something like 5-10 ns (the default is 5 ns)
2.4.3. Running the FEP simulations¶
The files are under the FEP/water
and FEP/octanol
directories
in separate sub directories.
Either run job arrays that should have been generated from the
my_script.sge
template
qsub Coul_my_script.sge
qsub VDW_my_script.sge
Or run each job in its own directory. Note that mdrun should be called with at least the following options
mdrun -deffnm $DEFFNM -dgdl
where DEFFNM is typically “md”; see the run local.sh
script in
each direcory for hints on what needs to be done.
2.4.4. Analyze output and log POW calculation¶
For the water and octanol FEPs do
gwat.collect()
gwat.analyze()
goct.collect()
goct.analyze()
The analyze step reports the estimate for the free energy difference.
Calculate the free energy for transferring the solute from water to octanol and octanol-water partition coefficient log P_OW
mdpow.fep.pOW(gwat, goct)
(see mdpow.fep.pOW()
for details and definitions).
All individual results can also accessed as a dictionary
gwat.results.DeltaA
Free energy of transfer from water to octanol:
goct.results.DeltaA.Gibbs - gwat.results.DeltaA.Gibbs
The individual components are
- Gibbs
total free energy difference of transfer from solvent to vacuum at the Ben-Naim standard state (i.e. 1M solution/1M gas phase) in kJ/mol;
DeltaG0 = (G_solv - G_vac)We calculate the Gibbs free energy (at constant pressure P) by using the NPT ensemble for all MD simulations.
- coulomb
- contribution of the de-charging process to DeltaG
- vdw
- contribution of the de-coupling process to DeltaG
To plot the data (de-charging and de-coupling):
import pylab
gwat.plot()
pylab.figure()
goct.plot()
For comparison to experimental values see mdpow.analysis
.
2.4.5. Error analysis¶
The data points are the (time) average <G> of G = dV/dl over each
window. The error bars s_G are the error of the mean <G>. They are computed
from the auto-correlation time of the fluctuations and the standard deviation
(see Frenkel and Smit, p526 and numkit.timeseries.tcorrel()
):
s_G**2 = 2*tc*acf(0)/T
where tc is the decay time of the ACF of <(G-<G>)**2> (assumed to follow f(t) = exp(-t/tc) and hence calculated from the integral of the ACF to its first root); T is the total runtime.
Errors on the energies are calculated via the propagation of the errors s_G
through the thermodynamic integration and the subsequent thermodynamic sums
(see numkit.integration.simps_error()
numkit.observables.QuantityWithError
for details).
- If the graphs do not look smooth or the errors are large then a longer runtime is definitely required. It might also be necessary to add additional lambda values in regions where the function changes rapidly.
- The errors on the Coulomb and VDW free energies should be of similar magnitude because there is no point in being very accurate in one if the other is inaccurate.
- For water the “canonical” lambda schedule produces errors <0.5 kJ/mol (or sometimes much better) in the Coulomb and VDW free energy components.
- For octanol the errors on the coulomb dis-charging free energy can become large (up to 4 kJ/mol) and thus completely swamp the final estimate. Additional lambdas 0.125 and 0.375 should improve the precision of the calculations.