===================================================================== :mod:`mdpow` --- Computing the octanol/water partitioning coefficient ===================================================================== The :mod:`mdpow` module helps in setting up and analyzing absolute free energy calculations of small molecules by molecular dynamics (MD) simulations. By computing the hydration free energy and the solvation free energy in octanol one can compute the octanol/water partitioning coefficient, an important quantity that is used to characterize drug-like compounds. The MD simulations can be performed with all recent versions of Gromacs_, starting with Gromacs 4.6.x. .. _Gromacs: http://www.gromacs.org .. |Pow| replace:: *P*\ :sub:`OW` .. |logPow| replace:: log *P*\ :sub:`OW` How to use the module ===================== Before you can start you will need - a coordinate file for the small molecule - a Gromacs OPLS/AA topology (itp) file - an installation of Gromacs_ in a :ref:`supported version`. Basic work flow --------------- You will typically calculate two solvation free energies (free energy of transfer of the solute from the liquid into the vacuum phase): 1. solvent = *water* 1. set up a short equilibrium simulation of the molecule in a *water* box (and run the MD simulation); 2. set up a free energy perturbation calculation of the ligand in water , which will yield the hydration free energy; 2. solvent = *octanol* 1. set up a short equilibrium simulation of the molecule in a *octanol* box (and run the MD simulation); 2. set up a free energy perturbation calculation of the ligand in octanol , which will yield the solvation free energy in octanol; 3. run these simulations on a cluster; 4. analyze the output and combine the free energies to arrive at an estimate of the octanol-water partition coefficient; 5. plot results using :func:`mdpow.analysis.plot_exp_vs_comp`. Customized submission scripts for queuing systems ------------------------------------------------- One can also generate run scripts for various queuing systems; check the documentation for :mod:`gromacs.qsub` and in particular the section on `writing queuing system templates`_ . You will have to - add a template script to your private GromacsWrapper template directory (``~/.gromacswrapper/qscripts``); in this example we call it ``my_script.sge``; - add the keyword *qscript* to the :meth:`mdpow.equil.Simulation.MD` and :meth:`mdpow.fep.Gsolv.setup` invocations; e.g. as *qscript* = ['my_script.sge', 'local.sh'] - submit the generated queuing system script to your queuing system, e.g. :: cd Equilibrium/water qsub my_script.sh .. _writing queuing system templates: http://gromacswrapper.readthedocs.io/en/latest/gromacs/blocks/qsub.html#queuing-system-templates The mdpow scripts ================= Some tasks are simplified by using scripts, which are installed in a bin directory (or the directory pointed to by ``python setup.py --install-scripts``). See :doc:`scripts` for details and :ref:`tutorial-benzene-label` for an example. The scripts essentially execute the steps shown in :ref:`example_octanol-label` so in order to gain a better understanding of MDpow it is suggested to look at both tutorials. .. _tutorial-benzene-label: Tutorial: Using the mdpow scripts to compute |logPow| of benzene ================================================================ The most straightforward use of MDpow is through the Python scripts described in :doc:`scripts`. This tutorial shows how to use them to calculate |logPow| for benzene. The :file:`examples/benzene` directory of the distribution contains a Gromacs OPLS/AA topology for benzene (:download:`benzene.itp <../../examples/benzene/benzene.itp>`) and a starting structure (:download:`benzene.pdb <../../examples/benzene/benzene.pdb>`) together with a *run input configuration file* for the mdpow scripts (:download:`benzene.yml <../../examples/benzene.yml>`) in YAML format. .. image:: ../../examples/benzene/benzene.pdb.png :width: 200 #. Make a directory ``benzene`` and collect the files in the directory. #. Move into the parent directory of ``benzene``; in this tutorial all files will be generated under ``benzene`` but the configuration file has recorded the location of the input files as :file:`benzene/{filename}`. (For more on to make best use of file names in the configuration file see :ref:`advanced-mdpow-filenames-label`.) #. Set up Gromacs_ (e.g. by sourcing :file:`GMXRC`). .. Note:: Gromacs_ must be set up so that all Gromacs commands (such as :program:`mdrun` or :program:`grompp` can be found on the shell's :envvar:`PATH`). If this is not the case then the following will fail. Look at the **log file** (named **mdpow.log**) for error messages.) #. Generate the input files for a equilibrium simulation of benzene in **water** (TIP4P) and run the simulation:: mdpow-equilibrium --solvent water benzene/benzene.yml The example file provided will only set up very short simulations and it will also directly call :program:`mdrun` to run the simulations. This is configured via the ``runlocal = True`` parameter in the ``MD_relaxed`` and ``MD_NPT`` sections in the :file:`benzene.yml` file. In fact, the following steps are carried out: #. generate a system topology (section ``Setup``) #. solvate the compound (section ``Setup``) #. energy minimise the system (section ``Setup``) #. run a short relaxation with a time step of 0.1 fs in order to remove any steric clashes; this step was found to enable much more robust simulations, especially when using octanol as a solvent. Section ``MD_relaxed`` controls this step. #. run a equilibrium MD simulation at constant pressure and temperature using a time step of 2 fs. Section ``MD_NPT`` controls this step. #. Generate the input files for a equilibrium simulation of benzene in **octanol** (OPLS/AA parameters) and run the simulation:: mdpow-equilibrium --solvent octanol benzene/benzene.yml The steps are analogous to the ones described under 4. #. Use the last frame of the equilibrium simulation as a starting point for the free energy perturbation (FEP) calculations. This step is controlled by the ``FEP`` section in the run input configuration file. The example only runs very short windows (``runtime: 25`` ps) but it will run *all* 21 individual simulations sequentially (``runlocal: True``). Thus it is recommended to run this example on a fast multi-core workstation and at least Gromacs 4.5.x, which has thread support for :program:`mdrun`. The FEP windows for benzene in **water** are generated and run by :: mdpow-fep --solvent water benzene/benzene.yml (In order to run the FEP windows on a cluster see :ref:`advanced-mdpow-qscript-label`.) #. The FEP windows for benzene in **octanol** are generated and run by :: mdpow-fep --solvent octanol benzene/benzene.yml #. Analyse the simulation output with :ref:`mdpow-pow `. It collects the raw data from the FEP simulations and computes the free energies of solvation using thermodynamic integration (TI) together with error estimates. |logPow| is calculated from the difference of the octanol and water solvation free energies:: mdpow-pow benzene ``benzene`` is the directory name under which the FEP simulations are stored. By default, results are *appended* to the files :file:`energies.txt` and :file:`pow.txt`. .. SeeAlso:: The output formats are explained with examples in :ref:`mdpow-pow-outputformat-label`. .. _advanced-mdpow-qscript-label: Advanced: Generating queuing system scripts ------------------------------------------- (To be written) .. SeeAlso:: :mod:`gromacs.qsub` for the reference on how queuing system script templates are handled and processed .. _advanced-mdpow-filenames-label: Advanced: Separating input from output data ------------------------------------------- Make another directory under which simulation data will be stored; in this tutorial it will be called ``WORK`` and we assume that all directories reside in ``~/Projects/POW``. We will end up with a directory layout under ``~/Projects/POW`` like this:: benzene/ benzene.itp benzene.pdb benzene.yml WORK/ benzene/ water/ octanol/ Edit :file:`benzene.yml` to put in *absolute paths* to the input files: In the ``setup`` section use absolute paths to the itp and pdb files of benzene: .. code-block:: yaml setup: name: "benzene" molecule: "BNZ" structure: "~/Projects/POW/benzene/benzene.pdb" itp: "~/Projects/POW/benzene/benzene.itp" With absolute paths defined, it is easy to generate all other files under ``WORK/benzene`` (the directory name "benzene" is the *name* entry from the ``setup`` section of the configuration file):: cd WORK mdpow-equilibrium --solvent water ../benzene/benzene.yml mdpow-equilibrium --solvent octanol ../benzene/benzene.yml mdpow-fep --solvent water ../benzene/benzene.yml mdpow-fep --solvent octanol ../benzene/benzene.yml Finally, calculate |logPow|:: cd WORK mdpow-pow benzene .. _example_octanol-label: Tutorial: Manual session: 1-octanol as a solute =============================================== In the following interactive python session we use octanol as an example for a solute; all files are present in the package so one can work through the example immediately. Before starting :program:`python` (preferrably ipython_) make sure that the Gromacs_ 4.6.x tools can be found, e.g. ``which grompp`` should show you the path to :program:`grompp`. .. _ipython: http://ipython.scipy.org .. _Gromacs: http://www.gromacs.org Water ----- Equilibrium simulation ...................... Make a directory ``octanol`` and copy the ``octanol.itp`` and ``octanol.gro`` file into it. Launch :program:`ipython` from this directory and type:: import mdpow.equil S = mdpow.equil.WaterSimulation(molecule="OcOH") S.topology(itp="octanol.itp") S.solvate(struct="octanol.gro") S.energy_minimize() S.MD_relaxed() # run the simulation in the MD_relaxed/ directory S.MD(runtime=50, qscript=['my_script.sge', 'local.sh']) # only run for 50 ps in this tutorial S.save("water.simulation") # save setup for later (analysis stage) Background (:kbd:`Ctrl-Z`) or quit (:kbd:`Ctrl-D`) python and run the simulations in the ``MD_relaxed`` and ``MD_NPT`` subdirectory. You can modify the ``local.sh`` script to your ends or use *qscript* to generate queuing system scripts. .. note:: Here we only run 50 ps equilibrium MD for testing. For production this should be substantially longer, maybe even 50 ns if you want to extract thermodynamic data. Hydration free energy ..................... Reopen the python session and set up a :class:`~mdpow.fep.Ghyd` object:: import mdpow.fep gwat = mdpow.fep.Ghyd(molecule="OcOH", top="Equilibrium/water/top/system.top.template", struct="Equilibrium/water/MD_NPT/md.pdb", ndx="Equilibrium/water/solvation/main.ndx", runtime=100) Alternatively, one can save some typing if we continue the last session and use the :class:`mdpow.equil.Simulation` object (which we can re-load from its saved state file from disk):: import mdpow.equil S = mdpow.equil.WaterSimulation(filename="water.simulation") # only needed when quit gwat = mdpow.fep.Ghyd(simulation=S, runtime=100) This generates all the input files under ``FEP/water``. .. note:: Here we only run 100 ps per window for testing. For production this should be rather something like 5-10 ns (the default is 5 ns). Then set up all input files:: gwat.setup(qscript=['my_script.sge', 'local.sh']) (The details of the FEP runs can be customized by setting some keywords (such as *lambda_vdw*, *lamda_coulomb*, see :class:`mdpow.fep.Gsolv` for details) or by deriving a new class from the :class:`mdpow.fep.Ghyd` base class but this is not covered in this tutorial.) Octanol ------- Equilibrium simulation ...................... Almost identical to the water case:: O = mdpow.equil.OctanolSimulation(molecule="OcOH") O.topology(itp="octanol.itp") O.solvate(struct="octanol.gro") O.energy_minimize() O.MD_relaxed() O.MD(runtime=50, qscript=['my_script.sge', 'local.sh']) # only run for 50 ps in this tutorial O.save() .. note:: Here we only run 50 ps equilibrium MD for testing. For production this should be substantially longer, maybe even 50 ns if you want to extract thermodynamic data. Octanol solvation free energy ............................. Almost identical setup as in the water case:: goct = mdpow.fep.Goct(simulation=O, runtime=100) goct.setup(qscript=['my_script.sge', 'local.sh']) This generates all the input files under ``FEP/octanol``. .. note:: Here we only run 100 ps per window for testing. For production this should be rather something like 5-10 ns (the default is 5 ns) Running the FEP simulations --------------------------- The files are under the ``FEP/water`` and ``FEP/octanol`` directories in separate sub directories. Either run job arrays that should have been generated from the ``my_script.sge`` template :: qsub Coul_my_script.sge qsub VDW_my_script.sge Or run each job in its own directory. Note that :program:`mdrun` should be called with at least the following options :: mdrun -deffnm $DEFFNM -dgdl where DEFFNM is typically "md"; see the run ``local.sh`` script in each direcory for hints on what needs to be done. Analyze output and |logPow| calculation --------------------------------------- For the water and octanol FEPs do :: gwat.collect() gwat.analyze() goct.collect() goct.analyze() The analyze step reports the estimate for the free energy difference. Calculate the free energy for transferring the solute from water to octanol and *octanol-water partition coefficient* log P_OW :: mdpow.fep.pOW(gwat, goct) (see :func:`mdpow.fep.pOW` for details and definitions). All individual results can also accessed as a dictionary :: gwat.results.DeltaA Free energy of transfer from water to octanol:: goct.results.DeltaA.Gibbs - gwat.results.DeltaA.Gibbs The individual components are *Gibbs* total free energy difference of transfer from solvent to vacuum at the Ben-Naim standard state (i.e. 1M solution/1M gas phase) in kJ/mol; DeltaG0 = (G_solv - G_vac) We calculate the Gibbs free energy (at constant pressure P) by using the *NPT* ensemble for all MD simulations. *coulomb* contribution of the de-charging process to DeltaG *vdw* contribution of the de-coupling process to DeltaG To plot the data (de-charging and de-coupling):: import pylab gwat.plot() pylab.figure() goct.plot() For comparison to experimental values see :mod:`mdpow.analysis`. Error analysis -------------- The data points are the (time) **average ** of G = dV/dl over each window. The **error bars** s_G are the error of the mean . They are computed from the auto-correlation time of the fluctuations and the standard deviation (see Frenkel and Smit, p526 and :meth:`numkit.timeseries.tcorrel`):: s_G**2 = 2*tc*acf(0)/T where tc is the decay time of the ACF of <(G-)**2> (assumed to follow f(t) = exp(-t/tc) and hence calculated from the integral of the ACF to its first root); T is the total runtime. Errors on the energies are calculated via the propagation of the errors s_G through the thermodynamic integration and the subsequent thermodynamic sums (see :func:`numkit.integration.simps_error` :class:`numkit.observables.QuantityWithError` for details). * If the graphs do not look smooth or the errors are large then a longer *runtime* is definitely required. It might also be necessary to add additional lambda values in regions where the function changes rapidly. * The errors on the Coulomb and VDW free energies should be of similar magnitude because there is no point in being very accurate in one if the other is inaccurate. * For water the "canonical" lambda schedule produces errors <0.5 kJ/mol (or sometimes much better) in the Coulomb and VDW free energy components. * For octanol the errors on the coulomb dis-charging free energy can become large (up to 4 kJ/mol) and thus completely swamp the final estimate. Additional lambdas 0.125 and 0.375 should improve the precision of the calculations.