2. mdpow — Computing the octanol/water partitioning coefficient

The mdpow module helps in setting up and analyzing absolute free energy calculations of small molecules by molecular dynamics (MD) simulations. By computing the hydration free energy and the solvation free energy in octanol one can compute the octanol/water partitioning coefficient, an important quantity that is used to characterize drug-like compounds.

The MD simulations can be performed with all recent versions of Gromacs, starting with Gromacs 4.6.x.

2.1. How to use the module

Before you can start you will need

  • a coordinate file for the small molecule
  • a Gromacs OPLS/AA topology (itp) file
  • an installation of Gromacs in a supported version.

2.1.1. Basic work flow

You will typically calculate two solvation free energies (free energy of transfer of the solute from the liquid into the vacuum phase):

  1. solvent = water
    1. set up a short equilibrium simulation of the molecule in a water box (and run the MD simulation);
    2. set up a free energy perturbation calculation of the ligand in water , which will yield the hydration free energy;
  2. solvent = octanol
    1. set up a short equilibrium simulation of the molecule in a octanol box (and run the MD simulation);
    2. set up a free energy perturbation calculation of the ligand in octanol , which will yield the solvation free energy in octanol;
  3. run these simulations on a cluster;
  4. analyze the output and combine the free energies to arrive at an estimate of the octanol-water partition coefficient;
  5. plot results using mdpow.analysis.plot_exp_vs_comp().

2.1.2. Customized submission scripts for queuing systems

One can also generate run scripts for various queuing systems; check the documentation for gromacs.qsub and in particular the section on writing queuing system templates . You will have to

  • add a template script to your private GromacsWrapper template directory (~/.gromacswrapper/qscripts); in this example we call it my_script.sge;

  • add the keyword qscript to the mdpow.equil.Simulation.MD() and mdpow.fep.Gsolv.setup() invocations; e.g. as

    qscript = [‘my_script.sge’, ‘local.sh’]

  • submit the generated queuing system script to your queuing system, e.g.

    cd Equilibrium/water
    qsub my_script.sh
    

2.2. The mdpow scripts

Some tasks are simplified by using scripts, which are installed in a bin directory (or the directory pointed to by python setup.py --install-scripts). See The mdpow-* scripts for details and Tutorial: Using the mdpow scripts to compute log POW of benzene for an example. The scripts essentially execute the steps shown in Tutorial: Manual session: 1-octanol as a solute so in order to gain a better understanding of MDpow it is suggested to look at both tutorials.

2.3. Tutorial: Using the mdpow scripts to compute log POW of benzene

The most straightforward use of MDpow is through the Python scripts described in The mdpow-* scripts. This tutorial shows how to use them to calculate log POW for benzene.

The examples/benzene directory of the distribution contains a Gromacs OPLS/AA topology for benzene (benzene.itp) and a starting structure (benzene.pdb) together with a run input configuration file for the mdpow scripts (benzene.yml) in YAML format.

_images/benzene.pdb.png
  1. Make a directory benzene and collect the files in the directory.

  2. Move into the parent directory of benzene; in this tutorial all files will be generated under benzene but the configuration file has recorded the location of the input files as benzene/filename. (For more on to make best use of file names in the configuration file see Advanced: Separating input from output data.)

  3. Set up Gromacs (e.g. by sourcing GMXRC).

    Note

    Gromacs must be set up so that all Gromacs commands (such as mdrun or grompp can be found on the shell’s PATH). If this is not the case then the following will fail. Look at the log file (named mdpow.log) for error messages.)

  4. Generate the input files for a equilibrium simulation of benzene in water (TIP4P) and run the simulation:

    mdpow-equilibrium --solvent water benzene/benzene.yml
    

    The example file provided will only set up very short simulations and it will also directly call mdrun to run the simulations. This is configured via the runlocal = True parameter in the MD_relaxed and MD_NPT sections in the benzene.yml file.

    In fact, the following steps are carried out:

    1. generate a system topology (section Setup)
    2. solvate the compound (section Setup)
    3. energy minimise the system (section Setup)
    4. run a short relaxation with a time step of 0.1 fs in order to remove any steric clashes; this step was found to enable much more robust simulations, especially when using octanol as a solvent. Section MD_relaxed controls this step.
    5. run a equilibrium MD simulation at constant pressure and temperature using a time step of 2 fs. Section MD_NPT controls this step.
  5. Generate the input files for a equilibrium simulation of benzene in octanol (OPLS/AA parameters) and run the simulation:

    mdpow-equilibrium --solvent octanol benzene/benzene.yml
    

    The steps are analogous to the ones described under 4.

  6. Use the last frame of the equilibrium simulation as a starting point for the free energy perturbation (FEP) calculations. This step is controlled by the FEP section in the run input configuration file.

    The example only runs very short windows (runtime: 25 ps) but it will run all 21 individual simulations sequentially (runlocal: True). Thus it is recommended to run this example on a fast multi-core workstation and at least Gromacs 4.5.x, which has thread support for mdrun.

    The FEP windows for benzene in water are generated and run by

    mdpow-fep --solvent water benzene/benzene.yml
    

    (In order to run the FEP windows on a cluster see Advanced: Generating queuing system scripts.)

  7. The FEP windows for benzene in octanol are generated and run by

    mdpow-fep --solvent octanol benzene/benzene.yml
    
  8. Analyse the simulation output with mdpow-pow. It collects the raw data from the FEP simulations and computes the free energies of solvation using thermodynamic integration (TI) together with error estimates. log POW is calculated from the difference of the octanol and water solvation free energies:

    mdpow-pow benzene
    

    benzene is the directory name under which the FEP simulations are stored. By default, results are appended to the files energies.txt and pow.txt.

    See also

    The output formats are explained with examples in Output data file formats.

2.3.1. Advanced: Generating queuing system scripts

(To be written)

See also

gromacs.qsub for the reference on how queuing system script templates are handled and processed

2.3.2. Advanced: Separating input from output data

Make another directory under which simulation data will be stored; in this tutorial it will be called WORK and we assume that all directories reside in ~/Projects/POW. We will end up with a directory layout under ~/Projects/POW like this:

benzene/
       benzene.itp
       benzene.pdb
       benzene.yml
WORK/
     benzene/
             water/
             octanol/

Edit benzene.yml to put in absolute paths to the input files: In the setup section use absolute paths to the itp and pdb files of benzene:

setup:
  name: "benzene"
  molecule:  "BNZ"
  structure: "~/Projects/POW/benzene/benzene.pdb"
  itp: "~/Projects/POW/benzene/benzene.itp"

With absolute paths defined, it is easy to generate all other files under WORK/benzene (the directory name “benzene” is the name entry from the setup section of the configuration file):

cd WORK
mdpow-equilibrium --solvent water ../benzene/benzene.yml
mdpow-equilibrium --solvent octanol ../benzene/benzene.yml
mdpow-fep --solvent water ../benzene/benzene.yml
mdpow-fep --solvent octanol ../benzene/benzene.yml

Finally, calculate log POW:

cd WORK
mdpow-pow benzene

2.4. Tutorial: Manual session: 1-octanol as a solute

In the following interactive python session we use octanol as an example for a solute; all files are present in the package so one can work through the example immediately.

Before starting python (preferrably ipython) make sure that the Gromacs 4.6.x tools can be found, e.g. which grompp should show you the path to grompp.

2.4.1. Water

2.4.1.1. Equilibrium simulation

Make a directory octanol and copy the octanol.itp and octanol.gro file into it. Launch ipython from this directory and type:

import mdpow.equil
S = mdpow.equil.WaterSimulation(molecule="OcOH")
S.topology(itp="octanol.itp")
S.solvate(struct="octanol.gro")
S.energy_minimize()
S.MD_relaxed()
# run the simulation in the MD_relaxed/ directory
S.MD(runtime=50, qscript=['my_script.sge', 'local.sh'])  # only run for 50 ps in this tutorial
S.save("water.simulation")                               # save setup for later (analysis stage)

Background (Ctrl-Z) or quit (Ctrl-D) python and run the simulations in the MD_relaxed and MD_NPT subdirectory. You can modify the local.sh script to your ends or use qscript to generate queuing system scripts.

Note

Here we only run 50 ps equilibrium MD for testing. For production this should be substantially longer, maybe even 50 ns if you want to extract thermodynamic data.

2.4.1.2. Hydration free energy

Reopen the python session and set up a Ghyd object:

import mdpow.fep
gwat = mdpow.fep.Ghyd(molecule="OcOH", top="Equilibrium/water/top/system.top.template", struct="Equilibrium/water/MD_NPT/md.pdb", ndx="Equilibrium/water/solvation/main.ndx", runtime=100)

Alternatively, one can save some typing if we continue the last session and use the mdpow.equil.Simulation object (which we can re-load from its saved state file from disk):

import mdpow.equil
S = mdpow.equil.WaterSimulation(filename="water.simulation")  # only needed when quit
gwat = mdpow.fep.Ghyd(simulation=S, runtime=100)

This generates all the input files under FEP/water.

Note

Here we only run 100 ps per window for testing. For production this should be rather something like 5-10 ns (the default is 5 ns).

Then set up all input files:

gwat.setup(qscript=['my_script.sge', 'local.sh'])

(The details of the FEP runs can be customized by setting some keywords (such as lambda_vdw, lamda_coulomb, see mdpow.fep.Gsolv for details) or by deriving a new class from the mdpow.fep.Ghyd base class but this is not covered in this tutorial.)

2.4.2. Octanol

2.4.2.1. Equilibrium simulation

Almost identical to the water case:

O = mdpow.equil.OctanolSimulation(molecule="OcOH")
O.topology(itp="octanol.itp")
O.solvate(struct="octanol.gro")
O.energy_minimize()
O.MD_relaxed()
O.MD(runtime=50, qscript=['my_script.sge', 'local.sh'])    # only run for 50 ps in this tutorial
O.save()

Note

Here we only run 50 ps equilibrium MD for testing. For production this should be substantially longer, maybe even 50 ns if you want to extract thermodynamic data.

2.4.2.2. Octanol solvation free energy

Almost identical setup as in the water case:

goct = mdpow.fep.Goct(simulation=O, runtime=100)
goct.setup(qscript=['my_script.sge', 'local.sh'])

This generates all the input files under FEP/octanol.

Note

Here we only run 100 ps per window for testing. For production this should be rather something like 5-10 ns (the default is 5 ns)

2.4.3. Running the FEP simulations

The files are under the FEP/water and FEP/octanol directories in separate sub directories.

Either run job arrays that should have been generated from the my_script.sge template

qsub Coul_my_script.sge
qsub VDW_my_script.sge

Or run each job in its own directory. Note that mdrun should be called with at least the following options

mdrun -deffnm $DEFFNM  -dgdl

where DEFFNM is typically “md”; see the run local.sh script in each direcory for hints on what needs to be done.

2.4.4. Analyze output and log POW calculation

For the water and octanol FEPs do

gwat.collect()
gwat.analyze()

goct.collect()
goct.analyze()

The analyze step reports the estimate for the free energy difference.

Calculate the free energy for transferring the solute from water to octanol and octanol-water partition coefficient log P_OW

mdpow.fep.pOW(gwat, goct)

(see mdpow.fep.pOW() for details and definitions).

All individual results can also accessed as a dictionary

gwat.results.DeltaA

Free energy of transfer from water to octanol:

goct.results.DeltaA.Gibbs - gwat.results.DeltaA.Gibbs

The individual components are

Gibbs

total free energy difference of transfer from solvent to vacuum at the Ben-Naim standard state (i.e. 1M solution/1M gas phase) in kJ/mol;

DeltaG0 = (G_solv - G_vac)

We calculate the Gibbs free energy (at constant pressure P) by using the NPT ensemble for all MD simulations.

coulomb
contribution of the de-charging process to DeltaG
vdw
contribution of the de-coupling process to DeltaG

To plot the data (de-charging and de-coupling):

import pylab
gwat.plot()
pylab.figure()
goct.plot()

For comparison to experimental values see mdpow.analysis.

2.4.5. Error analysis

The data points are the (time) average <G> of G = dV/dl over each window. The error bars s_G are the error of the mean <G>. They are computed from the auto-correlation time of the fluctuations and the standard deviation (see Frenkel and Smit, p526 and numkit.timeseries.tcorrel()):

s_G**2 = 2*tc*acf(0)/T

where tc is the decay time of the ACF of <(G-<G>)**2> (assumed to follow f(t) = exp(-t/tc) and hence calculated from the integral of the ACF to its first root); T is the total runtime.

Errors on the energies are calculated via the propagation of the errors s_G through the thermodynamic integration and the subsequent thermodynamic sums (see numkit.integration.simps_error() numkit.observables.QuantityWithError for details).

  • If the graphs do not look smooth or the errors are large then a longer runtime is definitely required. It might also be necessary to add additional lambda values in regions where the function changes rapidly.
  • The errors on the Coulomb and VDW free energies should be of similar magnitude because there is no point in being very accurate in one if the other is inaccurate.
  • For water the “canonical” lambda schedule produces errors <0.5 kJ/mol (or sometimes much better) in the Coulomb and VDW free energy components.
  • For octanol the errors on the coulomb dis-charging free energy can become large (up to 4 kJ/mol) and thus completely swamp the final estimate. Additional lambdas 0.125 and 0.375 should improve the precision of the calculations.