5. mdpow.fep – Calculate free energy of solvation

Set up and run free energy perturbation (FEP) calculations to calculate the free energy of hydration of a solute in a solvent box. The protocol follows the works of D. Mobley (Free Energy Tutorial) and M. Shirts, and uses Gromacs 4.0.x.

Required Input:

  • topology
  • equilibrated structure of the solvated molecule

See the docs for Gsolv for details on what is calculated.

5.1. Differences to published protocols

Some notes on how the approach encoded here differs from what others (notabley Mobley) did:

  • We use Gromacs 4.x and use the new decoupling feature

    couple-intramol = no
    

    which indicates that “intra-molecular interactions remain”. It should (as I understand it) allow us to only calculate the free energy changes in solution so that we do not have to do an extra calculation in vacuo. http://www.mail-archive.com/gmx-users@gromacs.org/msg18803.html

    Mobley does an extra discharging calculation in vacuo and calculates

    \[\Delta A = \Delta A_{\mathrm{coul}}(\mathrm{vac}) - (\Delta A_{\mathrm{coul}}(\mathrm{sol}) + \Delta A_{\mathrm{vdw}}(\mathrm{sol}))\]

    (but also annihilates the interactions on the solute, corresponding to couple-intramol = yes) whereas we do

    \[\Delta A = - (\Delta A_{\mathrm{coul}}(\mathrm{sol}) + \Delta A_{\mathrm{vdw}}(\mathrm{sol}))\]
  • simulations are run as NPT (but make sure to use Langevin dynamics for temperature control)

5.2. Example

see mdpow

5.3. User reference

Simulation setup and analysis of all FEP simulations is encapsulated by a mdpow.fep.Gsolv object. For the hydration free energy there is a special class Ghyd and for the solvation free energy in octanol there is Goct. See the description of Gsolv for methods common to both.

class mdpow.fep.Gsolv(molecule=None, top=None, struct=None, method='BAR', **kwargs)[source]

Simulations to calculate and analyze the solvation free energy.

\(\Delta A\) is computed from the decharging and the decoupling step. With our choice of lambda=0 being the fully interacting and lambda=1 the non-interacting state, it is computed as

\[\Delta A = -(\Delta A_{\mathrm{coul}} + \Delta A_{\mathrm{vdw}})\]

With this protocol, the concentration in the liquid and in the gas phase is the same. (Under the assumption of ideal solution/ideal gas behaviour this directly relates to the Ben-Naim 1M/1M standard state.)

(We neglect the negligible correction \(-kT \ln V_x/V_{\mathrm{sim}} = -kT \ln(1 - v_s/V_{\mathrm{sim}})\) where \(V_x\) is the volume of the system without the solute but the same number of water molecules as in the fully interacting case [see Michael Shirts’ Thesis, p82].)

Typical work flow:

G = Gsolv(simulation='drug.simulation')           # continue from :mod:`mdpow.equil`
G.setup(qscript=['my_template.sge', 'local.sh'])  # my_template.sge is user supplied
G.qsub()    # run SGE job arrays as generated from my_template.sge
G.analyze()
G.plot()

See gromacs.qsub for notes on how to write templates for queuing system scripts (in particular queuing system templates).

Set up Gsolv from input files or a equilibrium simulation.

Arguments:
molecule

name of the molecule for which the hydration free energy is to be computed (as in the gromacs topology) [REQUIRED]

top

topology [REQUIRED]

struct

solvated and equilibrated input structure [REQUIRED]

ndx

index file

dirname

directory to work under [‘FEP/solvent’]

solvent

name of the solvent (only used for path names); will not affect the simulations because the input coordinates and topologies are determined by either simulation or molecule, top, and struct. If solvent is not provided then it is set to solvent_default.

lambda_coulomb

list of lambdas for discharging: q+vdw –> vdw

lambda_vdw

list of lambdas for decoupling: vdw –> none

runtime

simulation time per window in ps [5000]

temperature

temperature in Kelvin of the simulation [300.0]

qscript

template or list of templates for queuing system scripts (see gromacs.config.templates for details) [local.sh]

includes

include directories

simulation

Instead of providing the required arguments, obtain the input files from a mdpow.equil.Simulation instance.

method

“TI” for thermodynamic integration or “BAR” for Bennett acceptance ratio; using “BAR” in Gromacs also writes TI data so this is the default.

mdp

MDP file name (if no entry then the package defaults are used)

filename

Instead of providing the required arguments, load from pickle file. If either simulation or molecule, top, and struct are provided then simply set the attribute filename of the default checkpoint file to filename.

basedir

Prepend basedir to all filenames; None disables [.]

permissive

Set to True if you want to read past corrupt data in output xvg files (see gromacs.formats.XVG for details); note that permissive*=``True`` can lead to **wrong results*. Overrides the value set in a loaded pickle file. [False]

stride

collect every stride data line, see Gsolv.collect() [1]

start

Start frame of data analyzed in every fep window.

stop

Stop frame of data analyzed in every fep window.

SI

Set to True if you want to perform statistical inefficiency to preprocess the data.

kwargs

other undocumented arguments (see source for the moment)

Note

Only a subset of the FEP parameters can currently be set with keywords (lambdas); other parameters such as the soft cores are currently fixed. For advanced use: pass a dict with keys “coulomb” and “vdw” and values of FEPschedule in the keyword schedules. The lambdas will override these settings (which typically come from a cfg). This method allows setting all parameters.

analyze(force=False, stride=None, autosave=True, ncorrel=25000)[source]

Extract dV/dl from output and calculate dG by TI.

Thermodynamic integration (TI) is performed on the individual component window calculation (typically the Coulomb and the VDW part, \(\Delta A_{\mathrm{coul}}\) and \(\Delta A_{\mathrm{vdW}}\)). \(\Delta A_{\mathrm{coul}}\) is the free energy component of discharging the molecule and \(\Delta A_{\mathrm{vdW}}\) of decoupling (switching off LJ interactions with the environment). The free energy components must be interpreted in this way because we defined lambda=0 as interaction switched on and lambda=1 as switched off.

\[\Delta A^{*} = -(\Delta A_{\mathrm{coul}} + \Delta A_{\mathrm{vdw}})\]

Data are stored in Gsolv.results.

The dV/dlambda graphs are integrated with the composite Simpson’s rule (and if the number of datapoints are even, the first interval is evaluated with the trapezoidal rule); see scipy.integrate.simps() for details). Note that this implementation of Simpson’s rule does not require equidistant spacing on the lambda axis.

For the Coulomb part using Simpson’s rule has been shown to produce more accurate results than the trapezoidal rule [Jorge2010].

Errors are estimated from the errors of the individual <dV/dlambda>:

  1. The error of the mean <dV/dlambda> is calculated via the decay time of the fluctuation around the mean. ncorrel is the max number of samples that is going to be used to calculate the autocorrelation time via a FFT. See numkit.timeseries.tcorrel().
  2. The error on the integral is calculated analytically via propagation of errors through Simpson’s rule (with the approximation that all spacings are assumed to be equal; taken as the average over all spacings as implemented in numkit.integration.simps_error()).

Note

For the Coulomb part, which typically only contains about 5 lambdas, it is recommended to have a odd number of lambda values to fully benefit from the higher accuracy of the integration scheme.

[Jorge2010]M. Jorge, N.M. Garrido, A.J. Queimada, I.G. Economou, and E.A. Macedo. Effect of the integration method on the accuracy and computational efficiency of free energy calculations using thermodynamic integration. Journal of Chemical Theory and Computation, 6 (4):1018–1027, 2010. 10.1021/ct900661c.
Keywords:
force

reload raw data even though it is already loaded

stride

read data every stride lines, None uses the class default

autosave

save to the pickle file when results have been computed

ncorrel

aim for <= 25,000 samples for t_correl

Notes

The error on the mean of the data \(\epsilon_y\), taking the correlation time into account, is calculated according to [FrenkelSmit2002] p526:

\[\epsilon_y = \sqrt{2 \tau_c \mathrm{acf}(0)/T}\]

where \(\mathrm{acf}()\) is the autocorrelation function of the fluctuations around the mean, \(y - \langle y \rangle\), \(\tau_c\) is the correlation time, and \(T\) the total length of the simulation

[FrenkelSmit2002]D. Frenkel and B. Smit, Understanding Molecular Simulation. Academic Press, San Diego 2002
analyze_alchemlyb(SI=True, start=0, stop=None, stride=None, force=False, autosave=True)[source]

Compute the free energy from the simulation data with alchemlyb.

Unlike analyze(), the MBAR estimator is available (in addition to TI). Note that SI should be enabled for meaningful error estimates.

arraylabel(component)[source]

Batch submission script name for a job array.

collect(stride=None, autosave=True, autocompress=True)[source]

Collect dV/dl from output.

Keywords:
stride

read data every stride lines, None uses the class default

autosave

immediately save the class pickle fep file

autocompress

compress the xvg file with bzip2 (saves >70% of space)

collect_alchemlyb(SI=True, start=0, stop=None, stride=None, autosave=True, autocompress=True)[source]

Collect the data files using alchemlyb.

Unlike collect(), this method can subsample with the statistical inefficiency (parameter SI).

compress_dgdl_xvg()[source]

Compress all dgdl xvg files with bzip2.

Note

After running this method you might want to run collect() to ensure that the results in results.xvg point to the compressed files. Otherwise IOError might occur which fail to find a md.xvg file.

contains_corrupted_xvgs()[source]

Check if any of the source datafiles had reported corrupted lines.

Returns:True if any of the xvg dgdl files were produced with the permissive=True flag and had skipped lines.

For debugging purposes, the number of corrupted lines are stored in Gsolv._corrupted as dicts of dicts with the component as primary and the lambda as secondary key.

convert_edr()[source]

Convert EDR files to compressed XVG files.

dgdl_edr(*args)[source]

Return filename of the dgdl EDR file.

Arguments:
args

joins the arguments into a path and adds the default filename for the dvdl file

Returns:

path to EDR

Raises:

IOError with error code ENOENT if no file could be found

dgdl_total_edr(*args, **kwargs)[source]

Return filename of the combined dgdl EDR file.

Arguments:
args

joins the arguments into a path and adds the default filename for the dvdl file

Keywords:
total_edr_name

Name of the user defined total edr file.

Returns:

path to total EDR

dgdl_tpr(*args)[source]

Return filename of the dgdl TPR file.

Arguments:
args

joins the arguments into a path and adds the default filename for the dvdl file

Returns:

path to TPR

Raises:

IOError with error code ENOENT if no file could be found

dgdl_xvg(*args)[source]

Return filename of the dgdl XVG file.

Recognizes uncompressed, gzipped (gz), and bzip2ed (bz2) files.

Arguments:
args

joins the arguments into a path and adds the default filename for the dvdl file

Returns:

Checks if a compressed file exists and returns the appropriate filename.

Raises:

IOError with error code ENOENT if no file could be found

estimators = {'BAR': {'estimator': <class 'alchemlyb.estimators.bar_.BAR'>, 'extract': <function extract_u_nk>}, 'MBAR': {'estimator': <class 'alchemlyb.estimators.mbar_.MBAR'>, 'extract': <function extract_u_nk>}, 'TI': {'estimator': <class 'alchemlyb.estimators.ti_.TI'>, 'extract': <function extract_dHdl>}}

Estimators in alchemlyb

fep_dirs()[source]

Generator for all simulation sub directories

frombase(*args)[source]

Return path with Gsolv.basedir prefixed.

get_protocol(protocol)

Return method for protocol.

  • If protocol is a real method of the class then the method is returned.

  • If protocol is a registered protocol name but no method of the name exists (i.e. protocol is a “dummy protocol”) then a wrapper function is returned. The wrapper has the signature

    dummy_protocol(func, *args, **kwargs)

    Runs func with the arguments and keywords between calls to Journal.start() and Journal.completed(), with the stage set to protocol.

  • Raises a ValueError if the protocol is not registered (i.e. not found in Journalled.protocols).

has_dVdl()[source]

Check if dV/dl data have already been collected.

Returns:True if the dV/dl data have bee aquired (Gsolv.collect()) for all FEP simulations.
label(component)[source]

Simple label for component, e.g. for use in filenames.

load(filename=None)

Re-instantiate class from pickled file.

If no filename was supplied then the filename is taken from the attribute filename.

Changed in version 0.7.1: Can read pickle files with either Python 2.7 or 3.x, regardless of the Python version that created the pickle.

logger_DeltaA0()[source]

Print the free energy contributions (errors in parentheses).

mdp_default = 'bar_opls.mdp'

Default Gromacs MDP run parameter file for FEP. (The file is part of the package and is found with mdpow.config.get_template().)

plot(**kwargs)[source]

Plot the TI data with error bars.

Run mdpow.fep.Gsolv.analyze() first.

All kwargs are passed on to pylab.errorbar().

Returns:The axes of the subplot.
protocols = ['setup', 'fep_run']

Check list of all methods that can be run as an independent protocol; see also Simulation.get_protocol() and restart.Journal

qsub(script=None)[source]

Submit a batch script locally.

If script == None then take the first script (works well if only one template was provided).

results = None

Results from the analysis

save(filename=None)

Save instance to a pickle file.

The default filename is the name of the file that was last loaded from or saved to. Also sets the attribute filename to the absolute path of the saved file.

scripts = None

Generated run scripts

setup(**kwargs)[source]

Prepare the input files for all Gromacs runs.

Keywords:
qscript

(List of) template(s) for batch submission scripts; if not set then the templates are used that were supplied to the constructor.

kwargs

Most kwargs are passed on to gromacs.setup.MD() although some are set to values that are required for the FEP functionality.

mdrun_opts

list of options to mdrun; -dhdl is always added to this list as it is required for the thermodynamic integration calculations

includes

list of directories where Gromacs input files can be found

The following keywords cannot be changed: dirname, jobname, prefix, runtime, deffnm. The last two can be specified when constructing Gsolv.

Changed in version 0.6.0: Gromacs now uses option -dhdl instead of -dgdl.

summary()[source]

Return a string that summarizes the energetics.

Each energy component is followed by its error.

Format:

.                 ------ kJ/mol -----
molecule solvent  total  coulomb  vdw
tasklabel(component, lmbda)[source]

Batch submission script name for a single task job.

wdir(component, lmbda)[source]

Return rooted path to the work directory for component and lmbda.

(Constructed from frombase() and wname().)

wname(component, lmbda)[source]

Return name of the window directory itself.

Typically something like VDW/0000, VDW/0500, …, Coulomb/1000

write_DeltaA0(filename, mode='w')[source]

Write free energy components to a file.

Arguments:
filename

name of the text file

mode

‘w’ for overwrite or ‘a’ for append [‘w’]

Format::
. —– kJ/mol —— molecule solvent total coulomb vdw
class mdpow.fep.Ghyd(molecule=None, top=None, struct=None, method='BAR', **kwargs)[source]

Sets up and analyses MD to obtain the hydration free energy of a solute.

Set up Gsolv from input files or a equilibrium simulation.

Arguments:
molecule

name of the molecule for which the hydration free energy is to be computed (as in the gromacs topology) [REQUIRED]

top

topology [REQUIRED]

struct

solvated and equilibrated input structure [REQUIRED]

ndx

index file

dirname

directory to work under [‘FEP/solvent’]

solvent

name of the solvent (only used for path names); will not affect the simulations because the input coordinates and topologies are determined by either simulation or molecule, top, and struct. If solvent is not provided then it is set to solvent_default.

lambda_coulomb

list of lambdas for discharging: q+vdw –> vdw

lambda_vdw

list of lambdas for decoupling: vdw –> none

runtime

simulation time per window in ps [5000]

temperature

temperature in Kelvin of the simulation [300.0]

qscript

template or list of templates for queuing system scripts (see gromacs.config.templates for details) [local.sh]

includes

include directories

simulation

Instead of providing the required arguments, obtain the input files from a mdpow.equil.Simulation instance.

method

“TI” for thermodynamic integration or “BAR” for Bennett acceptance ratio; using “BAR” in Gromacs also writes TI data so this is the default.

mdp

MDP file name (if no entry then the package defaults are used)

filename

Instead of providing the required arguments, load from pickle file. If either simulation or molecule, top, and struct are provided then simply set the attribute filename of the default checkpoint file to filename.

basedir

Prepend basedir to all filenames; None disables [.]

permissive

Set to True if you want to read past corrupt data in output xvg files (see gromacs.formats.XVG for details); note that permissive*=``True`` can lead to **wrong results*. Overrides the value set in a loaded pickle file. [False]

stride

collect every stride data line, see Gsolv.collect() [1]

start

Start frame of data analyzed in every fep window.

stop

Stop frame of data analyzed in every fep window.

SI

Set to True if you want to perform statistical inefficiency to preprocess the data.

kwargs

other undocumented arguments (see source for the moment)

Note

Only a subset of the FEP parameters can currently be set with keywords (lambdas); other parameters such as the soft cores are currently fixed. For advanced use: pass a dict with keys “coulomb” and “vdw” and values of FEPschedule in the keyword schedules. The lambdas will override these settings (which typically come from a cfg). This method allows setting all parameters.

analyze(force=False, stride=None, autosave=True, ncorrel=25000)

Extract dV/dl from output and calculate dG by TI.

Thermodynamic integration (TI) is performed on the individual component window calculation (typically the Coulomb and the VDW part, \(\Delta A_{\mathrm{coul}}\) and \(\Delta A_{\mathrm{vdW}}\)). \(\Delta A_{\mathrm{coul}}\) is the free energy component of discharging the molecule and \(\Delta A_{\mathrm{vdW}}\) of decoupling (switching off LJ interactions with the environment). The free energy components must be interpreted in this way because we defined lambda=0 as interaction switched on and lambda=1 as switched off.

\[\Delta A^{*} = -(\Delta A_{\mathrm{coul}} + \Delta A_{\mathrm{vdw}})\]

Data are stored in Gsolv.results.

The dV/dlambda graphs are integrated with the composite Simpson’s rule (and if the number of datapoints are even, the first interval is evaluated with the trapezoidal rule); see scipy.integrate.simps() for details). Note that this implementation of Simpson’s rule does not require equidistant spacing on the lambda axis.

For the Coulomb part using Simpson’s rule has been shown to produce more accurate results than the trapezoidal rule [Jorge2010].

Errors are estimated from the errors of the individual <dV/dlambda>:

  1. The error of the mean <dV/dlambda> is calculated via the decay time of the fluctuation around the mean. ncorrel is the max number of samples that is going to be used to calculate the autocorrelation time via a FFT. See numkit.timeseries.tcorrel().
  2. The error on the integral is calculated analytically via propagation of errors through Simpson’s rule (with the approximation that all spacings are assumed to be equal; taken as the average over all spacings as implemented in numkit.integration.simps_error()).

Note

For the Coulomb part, which typically only contains about 5 lambdas, it is recommended to have a odd number of lambda values to fully benefit from the higher accuracy of the integration scheme.

[Jorge2010]M. Jorge, N.M. Garrido, A.J. Queimada, I.G. Economou, and E.A. Macedo. Effect of the integration method on the accuracy and computational efficiency of free energy calculations using thermodynamic integration. Journal of Chemical Theory and Computation, 6 (4):1018–1027, 2010. 10.1021/ct900661c.
Keywords:
force

reload raw data even though it is already loaded

stride

read data every stride lines, None uses the class default

autosave

save to the pickle file when results have been computed

ncorrel

aim for <= 25,000 samples for t_correl

Notes

The error on the mean of the data \(\epsilon_y\), taking the correlation time into account, is calculated according to [FrenkelSmit2002] p526:

\[\epsilon_y = \sqrt{2 \tau_c \mathrm{acf}(0)/T}\]

where \(\mathrm{acf}()\) is the autocorrelation function of the fluctuations around the mean, \(y - \langle y \rangle\), \(\tau_c\) is the correlation time, and \(T\) the total length of the simulation

[FrenkelSmit2002]D. Frenkel and B. Smit, Understanding Molecular Simulation. Academic Press, San Diego 2002
analyze_alchemlyb(SI=True, start=0, stop=None, stride=None, force=False, autosave=True)

Compute the free energy from the simulation data with alchemlyb.

Unlike analyze(), the MBAR estimator is available (in addition to TI). Note that SI should be enabled for meaningful error estimates.

arraylabel(component)

Batch submission script name for a job array.

collect(stride=None, autosave=True, autocompress=True)

Collect dV/dl from output.

Keywords:
stride

read data every stride lines, None uses the class default

autosave

immediately save the class pickle fep file

autocompress

compress the xvg file with bzip2 (saves >70% of space)

collect_alchemlyb(SI=True, start=0, stop=None, stride=None, autosave=True, autocompress=True)

Collect the data files using alchemlyb.

Unlike collect(), this method can subsample with the statistical inefficiency (parameter SI).

compress_dgdl_xvg()

Compress all dgdl xvg files with bzip2.

Note

After running this method you might want to run collect() to ensure that the results in results.xvg point to the compressed files. Otherwise IOError might occur which fail to find a md.xvg file.

contains_corrupted_xvgs()

Check if any of the source datafiles had reported corrupted lines.

Returns:True if any of the xvg dgdl files were produced with the permissive=True flag and had skipped lines.

For debugging purposes, the number of corrupted lines are stored in Gsolv._corrupted as dicts of dicts with the component as primary and the lambda as secondary key.

convert_edr()

Convert EDR files to compressed XVG files.

dgdl_edr(*args)

Return filename of the dgdl EDR file.

Arguments:
args

joins the arguments into a path and adds the default filename for the dvdl file

Returns:

path to EDR

Raises:

IOError with error code ENOENT if no file could be found

dgdl_total_edr(*args, **kwargs)

Return filename of the combined dgdl EDR file.

Arguments:
args

joins the arguments into a path and adds the default filename for the dvdl file

Keywords:
total_edr_name

Name of the user defined total edr file.

Returns:

path to total EDR

dgdl_tpr(*args)

Return filename of the dgdl TPR file.

Arguments:
args

joins the arguments into a path and adds the default filename for the dvdl file

Returns:

path to TPR

Raises:

IOError with error code ENOENT if no file could be found

dgdl_xvg(*args)

Return filename of the dgdl XVG file.

Recognizes uncompressed, gzipped (gz), and bzip2ed (bz2) files.

Arguments:
args

joins the arguments into a path and adds the default filename for the dvdl file

Returns:

Checks if a compressed file exists and returns the appropriate filename.

Raises:

IOError with error code ENOENT if no file could be found

fep_dirs()

Generator for all simulation sub directories

frombase(*args)

Return path with Gsolv.basedir prefixed.

get_protocol(protocol)

Return method for protocol.

  • If protocol is a real method of the class then the method is returned.

  • If protocol is a registered protocol name but no method of the name exists (i.e. protocol is a “dummy protocol”) then a wrapper function is returned. The wrapper has the signature

    dummy_protocol(func, *args, **kwargs)

    Runs func with the arguments and keywords between calls to Journal.start() and Journal.completed(), with the stage set to protocol.

  • Raises a ValueError if the protocol is not registered (i.e. not found in Journalled.protocols).

has_dVdl()

Check if dV/dl data have already been collected.

Returns:True if the dV/dl data have bee aquired (Gsolv.collect()) for all FEP simulations.
label(component)

Simple label for component, e.g. for use in filenames.

load(filename=None)

Re-instantiate class from pickled file.

If no filename was supplied then the filename is taken from the attribute filename.

Changed in version 0.7.1: Can read pickle files with either Python 2.7 or 3.x, regardless of the Python version that created the pickle.

logger_DeltaA0()

Print the free energy contributions (errors in parentheses).

plot(**kwargs)

Plot the TI data with error bars.

Run mdpow.fep.Gsolv.analyze() first.

All kwargs are passed on to pylab.errorbar().

Returns:The axes of the subplot.
qsub(script=None)

Submit a batch script locally.

If script == None then take the first script (works well if only one template was provided).

save(filename=None)

Save instance to a pickle file.

The default filename is the name of the file that was last loaded from or saved to. Also sets the attribute filename to the absolute path of the saved file.

setup(**kwargs)

Prepare the input files for all Gromacs runs.

Keywords:
qscript

(List of) template(s) for batch submission scripts; if not set then the templates are used that were supplied to the constructor.

kwargs

Most kwargs are passed on to gromacs.setup.MD() although some are set to values that are required for the FEP functionality.

mdrun_opts

list of options to mdrun; -dhdl is always added to this list as it is required for the thermodynamic integration calculations

includes

list of directories where Gromacs input files can be found

The following keywords cannot be changed: dirname, jobname, prefix, runtime, deffnm. The last two can be specified when constructing Gsolv.

Changed in version 0.6.0: Gromacs now uses option -dhdl instead of -dgdl.

summary()

Return a string that summarizes the energetics.

Each energy component is followed by its error.

Format:

.                 ------ kJ/mol -----
molecule solvent  total  coulomb  vdw
tasklabel(component, lmbda)

Batch submission script name for a single task job.

wdir(component, lmbda)

Return rooted path to the work directory for component and lmbda.

(Constructed from frombase() and wname().)

wname(component, lmbda)

Return name of the window directory itself.

Typically something like VDW/0000, VDW/0500, …, Coulomb/1000

write_DeltaA0(filename, mode='w')

Write free energy components to a file.

Arguments:
filename

name of the text file

mode

‘w’ for overwrite or ‘a’ for append [‘w’]

Format::
. —– kJ/mol —— molecule solvent total coulomb vdw
class mdpow.fep.Goct(molecule=None, top=None, struct=None, method='BAR', **kwargs)[source]

Sets up and analyses MD to obtain the solvation free energy of a solute in octanol.

The coulomb lambda schedule is enhanced compared to water as the initial part of the dV/dl curve is quite sensitive. By adding two additional points we hope to reduce the overall error on the dis-charging free energy.

Set up Gsolv from input files or a equilibrium simulation.

Arguments:
molecule

name of the molecule for which the hydration free energy is to be computed (as in the gromacs topology) [REQUIRED]

top

topology [REQUIRED]

struct

solvated and equilibrated input structure [REQUIRED]

ndx

index file

dirname

directory to work under [‘FEP/solvent’]

solvent

name of the solvent (only used for path names); will not affect the simulations because the input coordinates and topologies are determined by either simulation or molecule, top, and struct. If solvent is not provided then it is set to solvent_default.

lambda_coulomb

list of lambdas for discharging: q+vdw –> vdw

lambda_vdw

list of lambdas for decoupling: vdw –> none

runtime

simulation time per window in ps [5000]

temperature

temperature in Kelvin of the simulation [300.0]

qscript

template or list of templates for queuing system scripts (see gromacs.config.templates for details) [local.sh]

includes

include directories

simulation

Instead of providing the required arguments, obtain the input files from a mdpow.equil.Simulation instance.

method

“TI” for thermodynamic integration or “BAR” for Bennett acceptance ratio; using “BAR” in Gromacs also writes TI data so this is the default.

mdp

MDP file name (if no entry then the package defaults are used)

filename

Instead of providing the required arguments, load from pickle file. If either simulation or molecule, top, and struct are provided then simply set the attribute filename of the default checkpoint file to filename.

basedir

Prepend basedir to all filenames; None disables [.]

permissive

Set to True if you want to read past corrupt data in output xvg files (see gromacs.formats.XVG for details); note that permissive*=``True`` can lead to **wrong results*. Overrides the value set in a loaded pickle file. [False]

stride

collect every stride data line, see Gsolv.collect() [1]

start

Start frame of data analyzed in every fep window.

stop

Stop frame of data analyzed in every fep window.

SI

Set to True if you want to perform statistical inefficiency to preprocess the data.

kwargs

other undocumented arguments (see source for the moment)

Note

Only a subset of the FEP parameters can currently be set with keywords (lambdas); other parameters such as the soft cores are currently fixed. For advanced use: pass a dict with keys “coulomb” and “vdw” and values of FEPschedule in the keyword schedules. The lambdas will override these settings (which typically come from a cfg). This method allows setting all parameters.

analyze(force=False, stride=None, autosave=True, ncorrel=25000)

Extract dV/dl from output and calculate dG by TI.

Thermodynamic integration (TI) is performed on the individual component window calculation (typically the Coulomb and the VDW part, \(\Delta A_{\mathrm{coul}}\) and \(\Delta A_{\mathrm{vdW}}\)). \(\Delta A_{\mathrm{coul}}\) is the free energy component of discharging the molecule and \(\Delta A_{\mathrm{vdW}}\) of decoupling (switching off LJ interactions with the environment). The free energy components must be interpreted in this way because we defined lambda=0 as interaction switched on and lambda=1 as switched off.

\[\Delta A^{*} = -(\Delta A_{\mathrm{coul}} + \Delta A_{\mathrm{vdw}})\]

Data are stored in Gsolv.results.

The dV/dlambda graphs are integrated with the composite Simpson’s rule (and if the number of datapoints are even, the first interval is evaluated with the trapezoidal rule); see scipy.integrate.simps() for details). Note that this implementation of Simpson’s rule does not require equidistant spacing on the lambda axis.

For the Coulomb part using Simpson’s rule has been shown to produce more accurate results than the trapezoidal rule [Jorge2010].

Errors are estimated from the errors of the individual <dV/dlambda>:

  1. The error of the mean <dV/dlambda> is calculated via the decay time of the fluctuation around the mean. ncorrel is the max number of samples that is going to be used to calculate the autocorrelation time via a FFT. See numkit.timeseries.tcorrel().
  2. The error on the integral is calculated analytically via propagation of errors through Simpson’s rule (with the approximation that all spacings are assumed to be equal; taken as the average over all spacings as implemented in numkit.integration.simps_error()).

Note

For the Coulomb part, which typically only contains about 5 lambdas, it is recommended to have a odd number of lambda values to fully benefit from the higher accuracy of the integration scheme.

[Jorge2010]M. Jorge, N.M. Garrido, A.J. Queimada, I.G. Economou, and E.A. Macedo. Effect of the integration method on the accuracy and computational efficiency of free energy calculations using thermodynamic integration. Journal of Chemical Theory and Computation, 6 (4):1018–1027, 2010. 10.1021/ct900661c.
Keywords:
force

reload raw data even though it is already loaded

stride

read data every stride lines, None uses the class default

autosave

save to the pickle file when results have been computed

ncorrel

aim for <= 25,000 samples for t_correl

Notes

The error on the mean of the data \(\epsilon_y\), taking the correlation time into account, is calculated according to [FrenkelSmit2002] p526:

\[\epsilon_y = \sqrt{2 \tau_c \mathrm{acf}(0)/T}\]

where \(\mathrm{acf}()\) is the autocorrelation function of the fluctuations around the mean, \(y - \langle y \rangle\), \(\tau_c\) is the correlation time, and \(T\) the total length of the simulation

[FrenkelSmit2002]D. Frenkel and B. Smit, Understanding Molecular Simulation. Academic Press, San Diego 2002
analyze_alchemlyb(SI=True, start=0, stop=None, stride=None, force=False, autosave=True)

Compute the free energy from the simulation data with alchemlyb.

Unlike analyze(), the MBAR estimator is available (in addition to TI). Note that SI should be enabled for meaningful error estimates.

arraylabel(component)

Batch submission script name for a job array.

collect(stride=None, autosave=True, autocompress=True)

Collect dV/dl from output.

Keywords:
stride

read data every stride lines, None uses the class default

autosave

immediately save the class pickle fep file

autocompress

compress the xvg file with bzip2 (saves >70% of space)

collect_alchemlyb(SI=True, start=0, stop=None, stride=None, autosave=True, autocompress=True)

Collect the data files using alchemlyb.

Unlike collect(), this method can subsample with the statistical inefficiency (parameter SI).

compress_dgdl_xvg()

Compress all dgdl xvg files with bzip2.

Note

After running this method you might want to run collect() to ensure that the results in results.xvg point to the compressed files. Otherwise IOError might occur which fail to find a md.xvg file.

contains_corrupted_xvgs()

Check if any of the source datafiles had reported corrupted lines.

Returns:True if any of the xvg dgdl files were produced with the permissive=True flag and had skipped lines.

For debugging purposes, the number of corrupted lines are stored in Gsolv._corrupted as dicts of dicts with the component as primary and the lambda as secondary key.

convert_edr()

Convert EDR files to compressed XVG files.

dgdl_edr(*args)

Return filename of the dgdl EDR file.

Arguments:
args

joins the arguments into a path and adds the default filename for the dvdl file

Returns:

path to EDR

Raises:

IOError with error code ENOENT if no file could be found

dgdl_total_edr(*args, **kwargs)

Return filename of the combined dgdl EDR file.

Arguments:
args

joins the arguments into a path and adds the default filename for the dvdl file

Keywords:
total_edr_name

Name of the user defined total edr file.

Returns:

path to total EDR

dgdl_tpr(*args)

Return filename of the dgdl TPR file.

Arguments:
args

joins the arguments into a path and adds the default filename for the dvdl file

Returns:

path to TPR

Raises:

IOError with error code ENOENT if no file could be found

dgdl_xvg(*args)

Return filename of the dgdl XVG file.

Recognizes uncompressed, gzipped (gz), and bzip2ed (bz2) files.

Arguments:
args

joins the arguments into a path and adds the default filename for the dvdl file

Returns:

Checks if a compressed file exists and returns the appropriate filename.

Raises:

IOError with error code ENOENT if no file could be found

fep_dirs()

Generator for all simulation sub directories

frombase(*args)

Return path with Gsolv.basedir prefixed.

get_protocol(protocol)

Return method for protocol.

  • If protocol is a real method of the class then the method is returned.

  • If protocol is a registered protocol name but no method of the name exists (i.e. protocol is a “dummy protocol”) then a wrapper function is returned. The wrapper has the signature

    dummy_protocol(func, *args, **kwargs)

    Runs func with the arguments and keywords between calls to Journal.start() and Journal.completed(), with the stage set to protocol.

  • Raises a ValueError if the protocol is not registered (i.e. not found in Journalled.protocols).

has_dVdl()

Check if dV/dl data have already been collected.

Returns:True if the dV/dl data have bee aquired (Gsolv.collect()) for all FEP simulations.
label(component)

Simple label for component, e.g. for use in filenames.

load(filename=None)

Re-instantiate class from pickled file.

If no filename was supplied then the filename is taken from the attribute filename.

Changed in version 0.7.1: Can read pickle files with either Python 2.7 or 3.x, regardless of the Python version that created the pickle.

logger_DeltaA0()

Print the free energy contributions (errors in parentheses).

plot(**kwargs)

Plot the TI data with error bars.

Run mdpow.fep.Gsolv.analyze() first.

All kwargs are passed on to pylab.errorbar().

Returns:The axes of the subplot.
qsub(script=None)

Submit a batch script locally.

If script == None then take the first script (works well if only one template was provided).

save(filename=None)

Save instance to a pickle file.

The default filename is the name of the file that was last loaded from or saved to. Also sets the attribute filename to the absolute path of the saved file.

setup(**kwargs)

Prepare the input files for all Gromacs runs.

Keywords:
qscript

(List of) template(s) for batch submission scripts; if not set then the templates are used that were supplied to the constructor.

kwargs

Most kwargs are passed on to gromacs.setup.MD() although some are set to values that are required for the FEP functionality.

mdrun_opts

list of options to mdrun; -dhdl is always added to this list as it is required for the thermodynamic integration calculations

includes

list of directories where Gromacs input files can be found

The following keywords cannot be changed: dirname, jobname, prefix, runtime, deffnm. The last two can be specified when constructing Gsolv.

Changed in version 0.6.0: Gromacs now uses option -dhdl instead of -dgdl.

summary()

Return a string that summarizes the energetics.

Each energy component is followed by its error.

Format:

.                 ------ kJ/mol -----
molecule solvent  total  coulomb  vdw
tasklabel(component, lmbda)

Batch submission script name for a single task job.

wdir(component, lmbda)

Return rooted path to the work directory for component and lmbda.

(Constructed from frombase() and wname().)

wname(component, lmbda)

Return name of the window directory itself.

Typically something like VDW/0000, VDW/0500, …, Coulomb/1000

write_DeltaA0(filename, mode='w')

Write free energy components to a file.

Arguments:
filename

name of the text file

mode

‘w’ for overwrite or ‘a’ for append [‘w’]

Format::
. —– kJ/mol —— molecule solvent total coulomb vdw
class mdpow.fep.Gcyclo(molecule=None, top=None, struct=None, method='BAR', **kwargs)[source]

Set up Gsolv from input files or a equilibrium simulation.

Arguments:
molecule

name of the molecule for which the hydration free energy is to be computed (as in the gromacs topology) [REQUIRED]

top

topology [REQUIRED]

struct

solvated and equilibrated input structure [REQUIRED]

ndx

index file

dirname

directory to work under [‘FEP/solvent’]

solvent

name of the solvent (only used for path names); will not affect the simulations because the input coordinates and topologies are determined by either simulation or molecule, top, and struct. If solvent is not provided then it is set to solvent_default.

lambda_coulomb

list of lambdas for discharging: q+vdw –> vdw

lambda_vdw

list of lambdas for decoupling: vdw –> none

runtime

simulation time per window in ps [5000]

temperature

temperature in Kelvin of the simulation [300.0]

qscript

template or list of templates for queuing system scripts (see gromacs.config.templates for details) [local.sh]

includes

include directories

simulation

Instead of providing the required arguments, obtain the input files from a mdpow.equil.Simulation instance.

method

“TI” for thermodynamic integration or “BAR” for Bennett acceptance ratio; using “BAR” in Gromacs also writes TI data so this is the default.

mdp

MDP file name (if no entry then the package defaults are used)

filename

Instead of providing the required arguments, load from pickle file. If either simulation or molecule, top, and struct are provided then simply set the attribute filename of the default checkpoint file to filename.

basedir

Prepend basedir to all filenames; None disables [.]

permissive

Set to True if you want to read past corrupt data in output xvg files (see gromacs.formats.XVG for details); note that permissive*=``True`` can lead to **wrong results*. Overrides the value set in a loaded pickle file. [False]

stride

collect every stride data line, see Gsolv.collect() [1]

start

Start frame of data analyzed in every fep window.

stop

Stop frame of data analyzed in every fep window.

SI

Set to True if you want to perform statistical inefficiency to preprocess the data.

kwargs

other undocumented arguments (see source for the moment)

Note

Only a subset of the FEP parameters can currently be set with keywords (lambdas); other parameters such as the soft cores are currently fixed. For advanced use: pass a dict with keys “coulomb” and “vdw” and values of FEPschedule in the keyword schedules. The lambdas will override these settings (which typically come from a cfg). This method allows setting all parameters.

analyze(force=False, stride=None, autosave=True, ncorrel=25000)

Extract dV/dl from output and calculate dG by TI.

Thermodynamic integration (TI) is performed on the individual component window calculation (typically the Coulomb and the VDW part, \(\Delta A_{\mathrm{coul}}\) and \(\Delta A_{\mathrm{vdW}}\)). \(\Delta A_{\mathrm{coul}}\) is the free energy component of discharging the molecule and \(\Delta A_{\mathrm{vdW}}\) of decoupling (switching off LJ interactions with the environment). The free energy components must be interpreted in this way because we defined lambda=0 as interaction switched on and lambda=1 as switched off.

\[\Delta A^{*} = -(\Delta A_{\mathrm{coul}} + \Delta A_{\mathrm{vdw}})\]

Data are stored in Gsolv.results.

The dV/dlambda graphs are integrated with the composite Simpson’s rule (and if the number of datapoints are even, the first interval is evaluated with the trapezoidal rule); see scipy.integrate.simps() for details). Note that this implementation of Simpson’s rule does not require equidistant spacing on the lambda axis.

For the Coulomb part using Simpson’s rule has been shown to produce more accurate results than the trapezoidal rule [Jorge2010].

Errors are estimated from the errors of the individual <dV/dlambda>:

  1. The error of the mean <dV/dlambda> is calculated via the decay time of the fluctuation around the mean. ncorrel is the max number of samples that is going to be used to calculate the autocorrelation time via a FFT. See numkit.timeseries.tcorrel().
  2. The error on the integral is calculated analytically via propagation of errors through Simpson’s rule (with the approximation that all spacings are assumed to be equal; taken as the average over all spacings as implemented in numkit.integration.simps_error()).

Note

For the Coulomb part, which typically only contains about 5 lambdas, it is recommended to have a odd number of lambda values to fully benefit from the higher accuracy of the integration scheme.

[Jorge2010]M. Jorge, N.M. Garrido, A.J. Queimada, I.G. Economou, and E.A. Macedo. Effect of the integration method on the accuracy and computational efficiency of free energy calculations using thermodynamic integration. Journal of Chemical Theory and Computation, 6 (4):1018–1027, 2010. 10.1021/ct900661c.
Keywords:
force

reload raw data even though it is already loaded

stride

read data every stride lines, None uses the class default

autosave

save to the pickle file when results have been computed

ncorrel

aim for <= 25,000 samples for t_correl

Notes

The error on the mean of the data \(\epsilon_y\), taking the correlation time into account, is calculated according to [FrenkelSmit2002] p526:

\[\epsilon_y = \sqrt{2 \tau_c \mathrm{acf}(0)/T}\]

where \(\mathrm{acf}()\) is the autocorrelation function of the fluctuations around the mean, \(y - \langle y \rangle\), \(\tau_c\) is the correlation time, and \(T\) the total length of the simulation

[FrenkelSmit2002]D. Frenkel and B. Smit, Understanding Molecular Simulation. Academic Press, San Diego 2002
analyze_alchemlyb(SI=True, start=0, stop=None, stride=None, force=False, autosave=True)

Compute the free energy from the simulation data with alchemlyb.

Unlike analyze(), the MBAR estimator is available (in addition to TI). Note that SI should be enabled for meaningful error estimates.

arraylabel(component)

Batch submission script name for a job array.

collect(stride=None, autosave=True, autocompress=True)

Collect dV/dl from output.

Keywords:
stride

read data every stride lines, None uses the class default

autosave

immediately save the class pickle fep file

autocompress

compress the xvg file with bzip2 (saves >70% of space)

collect_alchemlyb(SI=True, start=0, stop=None, stride=None, autosave=True, autocompress=True)

Collect the data files using alchemlyb.

Unlike collect(), this method can subsample with the statistical inefficiency (parameter SI).

compress_dgdl_xvg()

Compress all dgdl xvg files with bzip2.

Note

After running this method you might want to run collect() to ensure that the results in results.xvg point to the compressed files. Otherwise IOError might occur which fail to find a md.xvg file.

contains_corrupted_xvgs()

Check if any of the source datafiles had reported corrupted lines.

Returns:True if any of the xvg dgdl files were produced with the permissive=True flag and had skipped lines.

For debugging purposes, the number of corrupted lines are stored in Gsolv._corrupted as dicts of dicts with the component as primary and the lambda as secondary key.

convert_edr()

Convert EDR files to compressed XVG files.

dgdl_edr(*args)

Return filename of the dgdl EDR file.

Arguments:
args

joins the arguments into a path and adds the default filename for the dvdl file

Returns:

path to EDR

Raises:

IOError with error code ENOENT if no file could be found

dgdl_total_edr(*args, **kwargs)

Return filename of the combined dgdl EDR file.

Arguments:
args

joins the arguments into a path and adds the default filename for the dvdl file

Keywords:
total_edr_name

Name of the user defined total edr file.

Returns:

path to total EDR

dgdl_tpr(*args)

Return filename of the dgdl TPR file.

Arguments:
args

joins the arguments into a path and adds the default filename for the dvdl file

Returns:

path to TPR

Raises:

IOError with error code ENOENT if no file could be found

dgdl_xvg(*args)

Return filename of the dgdl XVG file.

Recognizes uncompressed, gzipped (gz), and bzip2ed (bz2) files.

Arguments:
args

joins the arguments into a path and adds the default filename for the dvdl file

Returns:

Checks if a compressed file exists and returns the appropriate filename.

Raises:

IOError with error code ENOENT if no file could be found

fep_dirs()

Generator for all simulation sub directories

frombase(*args)

Return path with Gsolv.basedir prefixed.

get_protocol(protocol)

Return method for protocol.

  • If protocol is a real method of the class then the method is returned.

  • If protocol is a registered protocol name but no method of the name exists (i.e. protocol is a “dummy protocol”) then a wrapper function is returned. The wrapper has the signature

    dummy_protocol(func, *args, **kwargs)

    Runs func with the arguments and keywords between calls to Journal.start() and Journal.completed(), with the stage set to protocol.

  • Raises a ValueError if the protocol is not registered (i.e. not found in Journalled.protocols).

has_dVdl()

Check if dV/dl data have already been collected.

Returns:True if the dV/dl data have bee aquired (Gsolv.collect()) for all FEP simulations.
label(component)

Simple label for component, e.g. for use in filenames.

load(filename=None)

Re-instantiate class from pickled file.

If no filename was supplied then the filename is taken from the attribute filename.

Changed in version 0.7.1: Can read pickle files with either Python 2.7 or 3.x, regardless of the Python version that created the pickle.

logger_DeltaA0()

Print the free energy contributions (errors in parentheses).

plot(**kwargs)

Plot the TI data with error bars.

Run mdpow.fep.Gsolv.analyze() first.

All kwargs are passed on to pylab.errorbar().

Returns:The axes of the subplot.
qsub(script=None)

Submit a batch script locally.

If script == None then take the first script (works well if only one template was provided).

save(filename=None)

Save instance to a pickle file.

The default filename is the name of the file that was last loaded from or saved to. Also sets the attribute filename to the absolute path of the saved file.

setup(**kwargs)

Prepare the input files for all Gromacs runs.

Keywords:
qscript

(List of) template(s) for batch submission scripts; if not set then the templates are used that were supplied to the constructor.

kwargs

Most kwargs are passed on to gromacs.setup.MD() although some are set to values that are required for the FEP functionality.

mdrun_opts

list of options to mdrun; -dhdl is always added to this list as it is required for the thermodynamic integration calculations

includes

list of directories where Gromacs input files can be found

The following keywords cannot be changed: dirname, jobname, prefix, runtime, deffnm. The last two can be specified when constructing Gsolv.

Changed in version 0.6.0: Gromacs now uses option -dhdl instead of -dgdl.

summary()

Return a string that summarizes the energetics.

Each energy component is followed by its error.

Format:

.                 ------ kJ/mol -----
molecule solvent  total  coulomb  vdw
tasklabel(component, lmbda)

Batch submission script name for a single task job.

wdir(component, lmbda)

Return rooted path to the work directory for component and lmbda.

(Constructed from frombase() and wname().)

wname(component, lmbda)

Return name of the window directory itself.

Typically something like VDW/0000, VDW/0500, …, Coulomb/1000

write_DeltaA0(filename, mode='w')

Write free energy components to a file.

Arguments:
filename

name of the text file

mode

‘w’ for overwrite or ‘a’ for append [‘w’]

Format::
. —– kJ/mol —— molecule solvent total coulomb vdw
class mdpow.fep.Gtol(molecule=None, top=None, struct=None, method='BAR', **kwargs)[source]

Sets up and analyses MD to obtain the solvation free energy of a solute in toluene.

Set up Gsolv from input files or a equilibrium simulation.

Arguments:
molecule

name of the molecule for which the hydration free energy is to be computed (as in the gromacs topology) [REQUIRED]

top

topology [REQUIRED]

struct

solvated and equilibrated input structure [REQUIRED]

ndx

index file

dirname

directory to work under [‘FEP/solvent’]

solvent

name of the solvent (only used for path names); will not affect the simulations because the input coordinates and topologies are determined by either simulation or molecule, top, and struct. If solvent is not provided then it is set to solvent_default.

lambda_coulomb

list of lambdas for discharging: q+vdw –> vdw

lambda_vdw

list of lambdas for decoupling: vdw –> none

runtime

simulation time per window in ps [5000]

temperature

temperature in Kelvin of the simulation [300.0]

qscript

template or list of templates for queuing system scripts (see gromacs.config.templates for details) [local.sh]

includes

include directories

simulation

Instead of providing the required arguments, obtain the input files from a mdpow.equil.Simulation instance.

method

“TI” for thermodynamic integration or “BAR” for Bennett acceptance ratio; using “BAR” in Gromacs also writes TI data so this is the default.

mdp

MDP file name (if no entry then the package defaults are used)

filename

Instead of providing the required arguments, load from pickle file. If either simulation or molecule, top, and struct are provided then simply set the attribute filename of the default checkpoint file to filename.

basedir

Prepend basedir to all filenames; None disables [.]

permissive

Set to True if you want to read past corrupt data in output xvg files (see gromacs.formats.XVG for details); note that permissive*=``True`` can lead to **wrong results*. Overrides the value set in a loaded pickle file. [False]

stride

collect every stride data line, see Gsolv.collect() [1]

start

Start frame of data analyzed in every fep window.

stop

Stop frame of data analyzed in every fep window.

SI

Set to True if you want to perform statistical inefficiency to preprocess the data.

kwargs

other undocumented arguments (see source for the moment)

Note

Only a subset of the FEP parameters can currently be set with keywords (lambdas); other parameters such as the soft cores are currently fixed. For advanced use: pass a dict with keys “coulomb” and “vdw” and values of FEPschedule in the keyword schedules. The lambdas will override these settings (which typically come from a cfg). This method allows setting all parameters.

analyze(force=False, stride=None, autosave=True, ncorrel=25000)

Extract dV/dl from output and calculate dG by TI.

Thermodynamic integration (TI) is performed on the individual component window calculation (typically the Coulomb and the VDW part, \(\Delta A_{\mathrm{coul}}\) and \(\Delta A_{\mathrm{vdW}}\)). \(\Delta A_{\mathrm{coul}}\) is the free energy component of discharging the molecule and \(\Delta A_{\mathrm{vdW}}\) of decoupling (switching off LJ interactions with the environment). The free energy components must be interpreted in this way because we defined lambda=0 as interaction switched on and lambda=1 as switched off.

\[\Delta A^{*} = -(\Delta A_{\mathrm{coul}} + \Delta A_{\mathrm{vdw}})\]

Data are stored in Gsolv.results.

The dV/dlambda graphs are integrated with the composite Simpson’s rule (and if the number of datapoints are even, the first interval is evaluated with the trapezoidal rule); see scipy.integrate.simps() for details). Note that this implementation of Simpson’s rule does not require equidistant spacing on the lambda axis.

For the Coulomb part using Simpson’s rule has been shown to produce more accurate results than the trapezoidal rule [Jorge2010].

Errors are estimated from the errors of the individual <dV/dlambda>:

  1. The error of the mean <dV/dlambda> is calculated via the decay time of the fluctuation around the mean. ncorrel is the max number of samples that is going to be used to calculate the autocorrelation time via a FFT. See numkit.timeseries.tcorrel().
  2. The error on the integral is calculated analytically via propagation of errors through Simpson’s rule (with the approximation that all spacings are assumed to be equal; taken as the average over all spacings as implemented in numkit.integration.simps_error()).

Note

For the Coulomb part, which typically only contains about 5 lambdas, it is recommended to have a odd number of lambda values to fully benefit from the higher accuracy of the integration scheme.

[Jorge2010]M. Jorge, N.M. Garrido, A.J. Queimada, I.G. Economou, and E.A. Macedo. Effect of the integration method on the accuracy and computational efficiency of free energy calculations using thermodynamic integration. Journal of Chemical Theory and Computation, 6 (4):1018–1027, 2010. 10.1021/ct900661c.
Keywords:
force

reload raw data even though it is already loaded

stride

read data every stride lines, None uses the class default

autosave

save to the pickle file when results have been computed

ncorrel

aim for <= 25,000 samples for t_correl

Notes

The error on the mean of the data \(\epsilon_y\), taking the correlation time into account, is calculated according to [FrenkelSmit2002] p526:

\[\epsilon_y = \sqrt{2 \tau_c \mathrm{acf}(0)/T}\]

where \(\mathrm{acf}()\) is the autocorrelation function of the fluctuations around the mean, \(y - \langle y \rangle\), \(\tau_c\) is the correlation time, and \(T\) the total length of the simulation

[FrenkelSmit2002]D. Frenkel and B. Smit, Understanding Molecular Simulation. Academic Press, San Diego 2002
analyze_alchemlyb(SI=True, start=0, stop=None, stride=None, force=False, autosave=True)

Compute the free energy from the simulation data with alchemlyb.

Unlike analyze(), the MBAR estimator is available (in addition to TI). Note that SI should be enabled for meaningful error estimates.

arraylabel(component)

Batch submission script name for a job array.

collect(stride=None, autosave=True, autocompress=True)

Collect dV/dl from output.

Keywords:
stride

read data every stride lines, None uses the class default

autosave

immediately save the class pickle fep file

autocompress

compress the xvg file with bzip2 (saves >70% of space)

collect_alchemlyb(SI=True, start=0, stop=None, stride=None, autosave=True, autocompress=True)

Collect the data files using alchemlyb.

Unlike collect(), this method can subsample with the statistical inefficiency (parameter SI).

compress_dgdl_xvg()

Compress all dgdl xvg files with bzip2.

Note

After running this method you might want to run collect() to ensure that the results in results.xvg point to the compressed files. Otherwise IOError might occur which fail to find a md.xvg file.

contains_corrupted_xvgs()

Check if any of the source datafiles had reported corrupted lines.

Returns:True if any of the xvg dgdl files were produced with the permissive=True flag and had skipped lines.

For debugging purposes, the number of corrupted lines are stored in Gsolv._corrupted as dicts of dicts with the component as primary and the lambda as secondary key.

convert_edr()

Convert EDR files to compressed XVG files.

dgdl_edr(*args)

Return filename of the dgdl EDR file.

Arguments:
args

joins the arguments into a path and adds the default filename for the dvdl file

Returns:

path to EDR

Raises:

IOError with error code ENOENT if no file could be found

dgdl_total_edr(*args, **kwargs)

Return filename of the combined dgdl EDR file.

Arguments:
args

joins the arguments into a path and adds the default filename for the dvdl file

Keywords:
total_edr_name

Name of the user defined total edr file.

Returns:

path to total EDR

dgdl_tpr(*args)

Return filename of the dgdl TPR file.

Arguments:
args

joins the arguments into a path and adds the default filename for the dvdl file

Returns:

path to TPR

Raises:

IOError with error code ENOENT if no file could be found

dgdl_xvg(*args)

Return filename of the dgdl XVG file.

Recognizes uncompressed, gzipped (gz), and bzip2ed (bz2) files.

Arguments:
args

joins the arguments into a path and adds the default filename for the dvdl file

Returns:

Checks if a compressed file exists and returns the appropriate filename.

Raises:

IOError with error code ENOENT if no file could be found

fep_dirs()

Generator for all simulation sub directories

frombase(*args)

Return path with Gsolv.basedir prefixed.

get_protocol(protocol)

Return method for protocol.

  • If protocol is a real method of the class then the method is returned.

  • If protocol is a registered protocol name but no method of the name exists (i.e. protocol is a “dummy protocol”) then a wrapper function is returned. The wrapper has the signature

    dummy_protocol(func, *args, **kwargs)

    Runs func with the arguments and keywords between calls to Journal.start() and Journal.completed(), with the stage set to protocol.

  • Raises a ValueError if the protocol is not registered (i.e. not found in Journalled.protocols).

has_dVdl()

Check if dV/dl data have already been collected.

Returns:True if the dV/dl data have bee aquired (Gsolv.collect()) for all FEP simulations.
label(component)

Simple label for component, e.g. for use in filenames.

load(filename=None)

Re-instantiate class from pickled file.

If no filename was supplied then the filename is taken from the attribute filename.

Changed in version 0.7.1: Can read pickle files with either Python 2.7 or 3.x, regardless of the Python version that created the pickle.

logger_DeltaA0()

Print the free energy contributions (errors in parentheses).

plot(**kwargs)

Plot the TI data with error bars.

Run mdpow.fep.Gsolv.analyze() first.

All kwargs are passed on to pylab.errorbar().

Returns:The axes of the subplot.
qsub(script=None)

Submit a batch script locally.

If script == None then take the first script (works well if only one template was provided).

save(filename=None)

Save instance to a pickle file.

The default filename is the name of the file that was last loaded from or saved to. Also sets the attribute filename to the absolute path of the saved file.

setup(**kwargs)

Prepare the input files for all Gromacs runs.

Keywords:
qscript

(List of) template(s) for batch submission scripts; if not set then the templates are used that were supplied to the constructor.

kwargs

Most kwargs are passed on to gromacs.setup.MD() although some are set to values that are required for the FEP functionality.

mdrun_opts

list of options to mdrun; -dhdl is always added to this list as it is required for the thermodynamic integration calculations

includes

list of directories where Gromacs input files can be found

The following keywords cannot be changed: dirname, jobname, prefix, runtime, deffnm. The last two can be specified when constructing Gsolv.

Changed in version 0.6.0: Gromacs now uses option -dhdl instead of -dgdl.

summary()

Return a string that summarizes the energetics.

Each energy component is followed by its error.

Format:

.                 ------ kJ/mol -----
molecule solvent  total  coulomb  vdw
tasklabel(component, lmbda)

Batch submission script name for a single task job.

wdir(component, lmbda)

Return rooted path to the work directory for component and lmbda.

(Constructed from frombase() and wname().)

wname(component, lmbda)

Return name of the window directory itself.

Typically something like VDW/0000, VDW/0500, …, Coulomb/1000

write_DeltaA0(filename, mode='w')

Write free energy components to a file.

Arguments:
filename

name of the text file

mode

‘w’ for overwrite or ‘a’ for append [‘w’]

Format::
. —– kJ/mol —— molecule solvent total coulomb vdw
mdpow.fep.pOW(G1, G2, **kwargs)[source]

Compute water-octanol partition coefficient from two Gsolv objects.

transfer free energy from water into octanol:

DeltaDeltaG0 = DeltaG0_oct - DeltaG0_water

octanol/water partition coefficient:

log P_oct/wat =  log [X]_oct/[X]_wat
Arguments:
G1, G2

G1 and G2 should be a Ghyd and a Goct instance, but order does not matter

force

force rereading of data files even if some data were already stored [False]

stride

analyze every stride-th datapoint in the dV/dlambda files

start

Start frame of data analyzed in every fep window.

stop

Stop frame of data analyzed in every fep window.

SI

Set to True if you want to perform statistical inefficiency to preprocess the data.

estimator

Set to alchemlyb if you want to use alchemlyb estimators, or mdpow if you want the default TI method.

method

Use TI, BAR or MBAR method in alchemlyb, or TI in mdpow.

Returns:

(transfer free energy, log10 of the octanol/water partition coefficient = log Pow)

mdpow.fep.pCW(G1, G2, **kwargs)[source]

Compute water-cyclohexane partition coefficient from two Gsolv objects.

transfer free energy from water into cyclohexane:

DeltaDeltaG0 = DeltaG0_cyclohexane - DeltaG0_water

cyclohexane/water partition coefficient:

log P_CW =  log [X]_cyclohexane/[X]_water
Arguments:
G1, G2

G1 and G2 should be a Ghyd and a Gcyclo instance, but order does not matter

force

force rereading of data files even if some data were already stored [False]

stride

analyze every stride-th datapoint in the dV/dlambda files

start

Start frame of data analyzed in every fep window.

stop

Stop frame of data analyzed in every fep window.

SI

Set to True if you want to perform statistical inefficiency to preprocess the data.

estimator

Set to alchemlyb if you want to use alchemlyb estimators, or mdpow if you want the default TI method.

method

Use TI, BAR or MBAR method in alchemlyb, or TI in mdpow.

Returns:

(transfer free energy, log10 of the cyclohexane/water partition coefficient = log Pcw)

mdpow.fep.pTW(G1, G2, **kwargs)[source]

Compute water-toluene partition coefficient from two Gsolv objects.

transfer free energy from water into toluene:

DeltaDeltaG0 = DeltaG0_tol - DeltaG0_water

toluene/water partition coefficient:

log P_tol/wat =  log [X]_tol/[X]_wat
Arguments:
G1, G2

G1 and G2 should be a Ghyd and a Gtol instance, but order does not matter

force

force rereading of data files even if some data were already stored [False]

stride

analyze every stride-th datapoint in the dV/dlambda files

start

Start frame of data analyzed in every fep window.

stop

Stop frame of data analyzed in every fep window.

SI

Set to True if you want to perform statistical inefficiency to preprocess the data.

estimator

Set to alchemlyb if you want to use alchemlyb estimators, or mdpow if you want the default TI method.

method

Use TI, BAR or MBAR method in alchemlyb, or TI in mdpow.

Returns:

(transfer free energy, log10 of the toluene/water partition coefficient = log Ptw)

5.4. Developer notes

A user really only needs to access classes derived from mdpow.fep.Gsolv; all other classes and functions are auxiliary and only of interest to developers.

Additional objects that support mdpow.fep.Gsolv.

class mdpow.fep.FEPschedule[source]

Describe mdp parameter choices as key - value pairs.

The FEP schedule can be loaded from a configuration parser with the static method FEPschedule.load().

See the example runinput file for an example. It contains the sections:

[FEP_schedule_Coulomb]
name = Coulomb
description = dis-charging vdw+q --> vdw
label = Coul
couple_lambda0 = vdw-q
couple_lambda1 = vdw
# soft core alpha: linear scaling for coulomb
sc_alpha = 0
lambdas = 0, 0.25, 0.5, 0.75, 1.0


[FEP_schedule_VDW]
name = vdw
description = decoupling vdw --> none
label = VDW
couple_lambda0 = vdw
couple_lambda1 = none
# recommended values for soft cores (Mobley, Shirts et al)
sc_alpha = 0.5
sc_power = 1
sc_sigma = 0.3
lambdas = 0.0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1
static load(cfg, section)[source]

Initialize a FEPschedule from the section in the configuration cfg

mdp_dict

Dict of key-values that can be set in a mdp file.

mdpow.fep.molar_to_nm3(c)[source]

Convert a concentration in Molar to nm|^-3|.

mdpow.fep.kcal_to_kJ(x)[source]

Convert a energy in kcal to kJ.

mdpow.fep.kJ_to_kcal(x)[source]

Convert a energy in kJ to kcal.

mdpow.fep.N_AVOGADRO

Avogadro’s constant NA in mol-1 (NA NIST value).

mdpow.fep.kBOLTZ

Boltzmann’s constant kB in kJ mol-1 (kB NIST value).

5.4.1. TODO

  • run minimization, NVT-equil, NPT-equil prior to production (probably use preprocessed topology from grompp -pp for portability)

    See Free Energy Tutorial.