5. mdpow.fep
– Calculate free energy of solvation¶
Set up and run free energy perturbation (FEP) calculations to calculate the free energy of hydration of a solute in a solvent box. The protocol follows the works of D. Mobley (Free Energy Tutorial) and M. Shirts, and uses Gromacs 4.0.x.
Required Input:
- topology
- equilibrated structure of the solvated molecule
See the docs for Gsolv
for details on what is calculated.
5.1. Differences to published protocols¶
Some notes on how the approach encoded here differs from what others (notabley Mobley) did:
We use Gromacs 4.x and use the new decoupling feature
couple-intramol = no
which indicates that “intra-molecular interactions remain”. It should (as I understand it) allow us to only calculate the free energy changes in solution so that we do not have to do an extra calculation in vacuo. http://www.mail-archive.com/gmx-users@gromacs.org/msg18803.html
Mobley does an extra discharging calculation in vacuo and calculates
\[\Delta A = \Delta A_{\mathrm{coul}}(\mathrm{vac}) - (\Delta A_{\mathrm{coul}}(\mathrm{sol}) + \Delta A_{\mathrm{vdw}}(\mathrm{sol}))\](but also annihilates the interactions on the solute, corresponding to
couple-intramol = yes
) whereas we do\[\Delta A = - (\Delta A_{\mathrm{coul}}(\mathrm{sol}) + \Delta A_{\mathrm{vdw}}(\mathrm{sol}))\]simulations are run as NPT (but make sure to use Langevin dynamics for temperature control)
5.2. Example¶
see mdpow
5.3. User reference¶
Simulation setup and analysis of all FEP simulations is encapsulated
by a mdpow.fep.Gsolv
object. For the hydration free energy
there is a special class Ghyd
and for the
solvation free energy in octanol there is
Goct
. See the description of
Gsolv
for methods common to both.
-
class
mdpow.fep.
Gsolv
(molecule=None, top=None, struct=None, method='BAR', **kwargs)[source]¶ Simulations to calculate and analyze the solvation free energy.
\(\Delta A\) is computed from the decharging and the decoupling step. With our choice of
lambda=0
being the fully interacting andlambda=1
the non-interacting state, it is computed as\[\Delta A = -(\Delta A_{\mathrm{coul}} + \Delta A_{\mathrm{vdw}})\]With this protocol, the concentration in the liquid and in the gas phase is the same. (Under the assumption of ideal solution/ideal gas behaviour this directly relates to the Ben-Naim 1M/1M standard state.)
(We neglect the negligible correction \(-kT \ln V_x/V_{\mathrm{sim}} = -kT \ln(1 - v_s/V_{\mathrm{sim}})\) where \(V_x\) is the volume of the system without the solute but the same number of water molecules as in the fully interacting case [see Michael Shirts’ Thesis, p82].)
Typical work flow:
G = Gsolv(simulation='drug.simulation') # continue from :mod:`mdpow.equil` G.setup(qscript=['my_template.sge', 'local.sh']) # my_template.sge is user supplied G.qsub() # run SGE job arrays as generated from my_template.sge G.analyze() G.plot()
See
gromacs.qsub
for notes on how to write templates for queuing system scripts (in particular queuing system templates).Set up Gsolv from input files or a equilibrium simulation.
Arguments: - molecule
name of the molecule for which the hydration free energy is to be computed (as in the gromacs topology) [REQUIRED]
- top
topology [REQUIRED]
- struct
solvated and equilibrated input structure [REQUIRED]
- ndx
index file
- dirname
directory to work under [‘FEP/solvent’]
- solvent
name of the solvent (only used for path names); will not affect the simulations because the input coordinates and topologies are determined by either simulation or molecule, top, and struct. If solvent is not provided then it is set to
solvent_default
.- lambda_coulomb
list of lambdas for discharging: q+vdw –> vdw
- lambda_vdw
list of lambdas for decoupling: vdw –> none
- runtime
simulation time per window in ps [5000]
- temperature
temperature in Kelvin of the simulation [300.0]
- qscript
template or list of templates for queuing system scripts (see
gromacs.config.templates
for details) [local.sh]- includes
include directories
- simulation
Instead of providing the required arguments, obtain the input files from a
mdpow.equil.Simulation
instance.- method
“TI” for thermodynamic integration or “BAR” for Bennett acceptance ratio; using “BAR” in Gromacs also writes TI data so this is the default.
- mdp
MDP file name (if no entry then the package defaults are used)
- filename
Instead of providing the required arguments, load from pickle file. If either simulation or molecule, top, and struct are provided then simply set the attribute
filename
of the default checkpoint file to filename.- basedir
Prepend basedir to all filenames;
None
disables [.]- permissive
Set to
True
if you want to read past corrupt data in output xvg files (seegromacs.formats.XVG
for details); note that permissive*=``True`` can lead to **wrong results*. Overrides the value set in a loaded pickle file. [False
]- stride
collect every stride data line, see
Gsolv.collect()
[1]- start
Start frame of data analyzed in every fep window.
- stop
Stop frame of data analyzed in every fep window.
- SI
Set to
True
if you want to perform statistical inefficiency to preprocess the data.- kwargs
other undocumented arguments (see source for the moment)
Note
Only a subset of the FEP parameters can currently be set with keywords (lambdas); other parameters such as the soft cores are currently fixed. For advanced use: pass a dict with keys “coulomb” and “vdw” and values of
FEPschedule
in the keyword schedules. The lambdas will override these settings (which typically come from a cfg). This method allows setting all parameters.-
analyze
(force=False, stride=None, autosave=True, ncorrel=25000)[source]¶ Extract dV/dl from output and calculate dG by TI.
Thermodynamic integration (TI) is performed on the individual component window calculation (typically the Coulomb and the VDW part, \(\Delta A_{\mathrm{coul}}\) and \(\Delta A_{\mathrm{vdW}}\)). \(\Delta A_{\mathrm{coul}}\) is the free energy component of discharging the molecule and \(\Delta A_{\mathrm{vdW}}\) of decoupling (switching off LJ interactions with the environment). The free energy components must be interpreted in this way because we defined
lambda=0
as interaction switched on andlambda=1
as switched off.\[\Delta A^{*} = -(\Delta A_{\mathrm{coul}} + \Delta A_{\mathrm{vdw}})\]Data are stored in
Gsolv.results
.The dV/dlambda graphs are integrated with the composite Simpson’s rule (and if the number of datapoints are even, the first interval is evaluated with the trapezoidal rule); see
scipy.integrate.simps()
for details). Note that this implementation of Simpson’s rule does not require equidistant spacing on the lambda axis.For the Coulomb part using Simpson’s rule has been shown to produce more accurate results than the trapezoidal rule [Jorge2010].
Errors are estimated from the errors of the individual <dV/dlambda>:
- The error of the mean <dV/dlambda> is calculated via the decay time
of the fluctuation around the mean.
ncorrel
is the max number of samples that is going to be used to calculate the autocorrelation time via a FFT. Seenumkit.timeseries.tcorrel()
. - The error on the integral is calculated analytically via
propagation of errors through Simpson’s rule (with the
approximation that all spacings are assumed to be equal; taken as
the average over all spacings as implemented in
numkit.integration.simps_error()
).
Note
For the Coulomb part, which typically only contains about 5 lambdas, it is recommended to have a odd number of lambda values to fully benefit from the higher accuracy of the integration scheme.
[Jorge2010] M. Jorge, N.M. Garrido, A.J. Queimada, I.G. Economou, and E.A. Macedo. Effect of the integration method on the accuracy and computational efficiency of free energy calculations using thermodynamic integration. Journal of Chemical Theory and Computation, 6 (4):1018–1027, 2010. 10.1021/ct900661c. Keywords: - force
reload raw data even though it is already loaded
- stride
read data every stride lines,
None
uses the class default- autosave
save to the pickle file when results have been computed
- ncorrel
aim for <= 25,000 samples for t_correl
Notes
The error on the mean of the data \(\epsilon_y\), taking the correlation time into account, is calculated according to [FrenkelSmit2002] p526:
\[\epsilon_y = \sqrt{2 \tau_c \mathrm{acf}(0)/T}\]where \(\mathrm{acf}()\) is the autocorrelation function of the fluctuations around the mean, \(y - \langle y \rangle\), \(\tau_c\) is the correlation time, and \(T\) the total length of the simulation
[FrenkelSmit2002] D. Frenkel and B. Smit, Understanding Molecular Simulation. Academic Press, San Diego 2002 - The error of the mean <dV/dlambda> is calculated via the decay time
of the fluctuation around the mean.
-
analyze_alchemlyb
(SI=True, start=0, stop=None, stride=None, force=False, autosave=True)[source]¶ Compute the free energy from the simulation data with alchemlyb.
Unlike
analyze()
, the MBAR estimator is available (in addition to TI). Note that SI should be enabled for meaningful error estimates.
-
collect
(stride=None, autosave=True, autocompress=True)[source]¶ Collect dV/dl from output.
Keywords: - stride
read data every stride lines,
None
uses the class default- autosave
immediately save the class pickle fep file
- autocompress
compress the xvg file with bzip2 (saves >70% of space)
-
collect_alchemlyb
(SI=True, start=0, stop=None, stride=None, autosave=True, autocompress=True)[source]¶ Collect the data files using alchemlyb.
Unlike
collect()
, this method can subsample with the statistical inefficiency (parameter SI).
-
contains_corrupted_xvgs
()[source]¶ Check if any of the source datafiles had reported corrupted lines.
Returns: True
if any of the xvg dgdl files were produced with the permissive=True flag and had skipped lines.For debugging purposes, the number of corrupted lines are stored in
Gsolv._corrupted
as dicts of dicts with the component as primary and the lambda as secondary key.
-
dgdl_edr
(*args)[source]¶ Return filename of the dgdl EDR file.
Arguments: - args
joins the arguments into a path and adds the default filename for the dvdl file
Returns: path to EDR
Raises: IOError
with error code ENOENT if no file could be found
-
dgdl_total_edr
(*args, **kwargs)[source]¶ Return filename of the combined dgdl EDR file.
Arguments: - args
joins the arguments into a path and adds the default filename for the dvdl file
Keywords: - total_edr_name
Name of the user defined total edr file.
Returns: path to total EDR
-
dgdl_tpr
(*args)[source]¶ Return filename of the dgdl TPR file.
Arguments: - args
joins the arguments into a path and adds the default filename for the dvdl file
Returns: path to TPR
Raises: IOError
with error code ENOENT if no file could be found
-
dgdl_xvg
(*args)[source]¶ Return filename of the dgdl XVG file.
Recognizes uncompressed, gzipped (gz), and bzip2ed (bz2) files.
Arguments: - args
joins the arguments into a path and adds the default filename for the dvdl file
Returns: Checks if a compressed file exists and returns the appropriate filename.
Raises: IOError
with error code ENOENT if no file could be found
-
estimators
= {'BAR': {'estimator': <class 'alchemlyb.estimators.bar_.BAR'>, 'extract': <function extract_u_nk>}, 'MBAR': {'estimator': <class 'alchemlyb.estimators.mbar_.MBAR'>, 'extract': <function extract_u_nk>}, 'TI': {'estimator': <class 'alchemlyb.estimators.ti_.TI'>, 'extract': <function extract_dHdl>}}¶ Estimators in alchemlyb
-
get_protocol
(protocol)¶ Return method for protocol.
If protocol is a real method of the class then the method is returned.
If protocol is a registered protocol name but no method of the name exists (i.e. protocol is a “dummy protocol”) then a wrapper function is returned. The wrapper has the signature
-
dummy_protocol
(func, *args, **kwargs)¶ Runs func with the arguments and keywords between calls to
Journal.start()
andJournal.completed()
, with the stage set to protocol.
-
Raises a
ValueError
if the protocol is not registered (i.e. not found inJournalled.protocols
).
-
has_dVdl
()[source]¶ Check if dV/dl data have already been collected.
Returns: True
if the dV/dl data have bee aquired (Gsolv.collect()
) for all FEP simulations.
-
load
(filename=None)¶ Re-instantiate class from pickled file.
If no filename was supplied then the filename is taken from the attribute
filename
.Changed in version 0.7.1: Can read pickle files with either Python 2.7 or 3.x, regardless of the Python version that created the pickle.
-
mdp_default
= 'bar_opls.mdp'¶ Default Gromacs MDP run parameter file for FEP. (The file is part of the package and is found with
mdpow.config.get_template()
.)
-
plot
(**kwargs)[source]¶ Plot the TI data with error bars.
Run
mdpow.fep.Gsolv.analyze()
first.All kwargs are passed on to
pylab.errorbar()
.Returns: The axes of the subplot.
-
protocols
= ['setup', 'fep_run']¶ Check list of all methods that can be run as an independent protocol; see also
Simulation.get_protocol()
andrestart.Journal
-
qsub
(script=None)[source]¶ Submit a batch script locally.
If script ==
None
then take the first script (works well if only one template was provided).
-
results
= None¶ Results from the analysis
-
save
(filename=None)¶ Save instance to a pickle file.
The default filename is the name of the file that was last loaded from or saved to. Also sets the attribute
filename
to the absolute path of the saved file.
-
scripts
= None¶ Generated run scripts
-
setup
(**kwargs)[source]¶ Prepare the input files for all Gromacs runs.
Keywords: - qscript
(List of) template(s) for batch submission scripts; if not set then the templates are used that were supplied to the constructor.
- kwargs
Most kwargs are passed on to
gromacs.setup.MD()
although some are set to values that are required for the FEP functionality.- mdrun_opts
list of options to mdrun;
-dhdl
is always added to this list as it is required for the thermodynamic integration calculations- includes
list of directories where Gromacs input files can be found
The following keywords cannot be changed: dirname, jobname, prefix, runtime, deffnm. The last two can be specified when constructing
Gsolv
.
See also
gromacs.setup.MD()
andgromacs.qsub.generate_submit_scripts()
Changed in version 0.6.0: Gromacs now uses option
-dhdl
instead of-dgdl
.
-
summary
()[source]¶ Return a string that summarizes the energetics.
Each energy component is followed by its error.
Format:
. ------ kJ/mol ----- molecule solvent total coulomb vdw
-
wdir
(component, lmbda)[source]¶ Return rooted path to the work directory for component and lmbda.
(Constructed from
frombase()
andwname()
.)
-
class
mdpow.fep.
Ghyd
(molecule=None, top=None, struct=None, method='BAR', **kwargs)[source]¶ Sets up and analyses MD to obtain the hydration free energy of a solute.
Set up Gsolv from input files or a equilibrium simulation.
Arguments: - molecule
name of the molecule for which the hydration free energy is to be computed (as in the gromacs topology) [REQUIRED]
- top
topology [REQUIRED]
- struct
solvated and equilibrated input structure [REQUIRED]
- ndx
index file
- dirname
directory to work under [‘FEP/solvent’]
- solvent
name of the solvent (only used for path names); will not affect the simulations because the input coordinates and topologies are determined by either simulation or molecule, top, and struct. If solvent is not provided then it is set to
solvent_default
.- lambda_coulomb
list of lambdas for discharging: q+vdw –> vdw
- lambda_vdw
list of lambdas for decoupling: vdw –> none
- runtime
simulation time per window in ps [5000]
- temperature
temperature in Kelvin of the simulation [300.0]
- qscript
template or list of templates for queuing system scripts (see
gromacs.config.templates
for details) [local.sh]- includes
include directories
- simulation
Instead of providing the required arguments, obtain the input files from a
mdpow.equil.Simulation
instance.- method
“TI” for thermodynamic integration or “BAR” for Bennett acceptance ratio; using “BAR” in Gromacs also writes TI data so this is the default.
- mdp
MDP file name (if no entry then the package defaults are used)
- filename
Instead of providing the required arguments, load from pickle file. If either simulation or molecule, top, and struct are provided then simply set the attribute
filename
of the default checkpoint file to filename.- basedir
Prepend basedir to all filenames;
None
disables [.]- permissive
Set to
True
if you want to read past corrupt data in output xvg files (seegromacs.formats.XVG
for details); note that permissive*=``True`` can lead to **wrong results*. Overrides the value set in a loaded pickle file. [False
]- stride
collect every stride data line, see
Gsolv.collect()
[1]- start
Start frame of data analyzed in every fep window.
- stop
Stop frame of data analyzed in every fep window.
- SI
Set to
True
if you want to perform statistical inefficiency to preprocess the data.- kwargs
other undocumented arguments (see source for the moment)
Note
Only a subset of the FEP parameters can currently be set with keywords (lambdas); other parameters such as the soft cores are currently fixed. For advanced use: pass a dict with keys “coulomb” and “vdw” and values of
FEPschedule
in the keyword schedules. The lambdas will override these settings (which typically come from a cfg). This method allows setting all parameters.-
analyze
(force=False, stride=None, autosave=True, ncorrel=25000)¶ Extract dV/dl from output and calculate dG by TI.
Thermodynamic integration (TI) is performed on the individual component window calculation (typically the Coulomb and the VDW part, \(\Delta A_{\mathrm{coul}}\) and \(\Delta A_{\mathrm{vdW}}\)). \(\Delta A_{\mathrm{coul}}\) is the free energy component of discharging the molecule and \(\Delta A_{\mathrm{vdW}}\) of decoupling (switching off LJ interactions with the environment). The free energy components must be interpreted in this way because we defined
lambda=0
as interaction switched on andlambda=1
as switched off.\[\Delta A^{*} = -(\Delta A_{\mathrm{coul}} + \Delta A_{\mathrm{vdw}})\]Data are stored in
Gsolv.results
.The dV/dlambda graphs are integrated with the composite Simpson’s rule (and if the number of datapoints are even, the first interval is evaluated with the trapezoidal rule); see
scipy.integrate.simps()
for details). Note that this implementation of Simpson’s rule does not require equidistant spacing on the lambda axis.For the Coulomb part using Simpson’s rule has been shown to produce more accurate results than the trapezoidal rule [Jorge2010].
Errors are estimated from the errors of the individual <dV/dlambda>:
- The error of the mean <dV/dlambda> is calculated via the decay time
of the fluctuation around the mean.
ncorrel
is the max number of samples that is going to be used to calculate the autocorrelation time via a FFT. Seenumkit.timeseries.tcorrel()
. - The error on the integral is calculated analytically via
propagation of errors through Simpson’s rule (with the
approximation that all spacings are assumed to be equal; taken as
the average over all spacings as implemented in
numkit.integration.simps_error()
).
Note
For the Coulomb part, which typically only contains about 5 lambdas, it is recommended to have a odd number of lambda values to fully benefit from the higher accuracy of the integration scheme.
[Jorge2010] M. Jorge, N.M. Garrido, A.J. Queimada, I.G. Economou, and E.A. Macedo. Effect of the integration method on the accuracy and computational efficiency of free energy calculations using thermodynamic integration. Journal of Chemical Theory and Computation, 6 (4):1018–1027, 2010. 10.1021/ct900661c. Keywords: - force
reload raw data even though it is already loaded
- stride
read data every stride lines,
None
uses the class default- autosave
save to the pickle file when results have been computed
- ncorrel
aim for <= 25,000 samples for t_correl
Notes
The error on the mean of the data \(\epsilon_y\), taking the correlation time into account, is calculated according to [FrenkelSmit2002] p526:
\[\epsilon_y = \sqrt{2 \tau_c \mathrm{acf}(0)/T}\]where \(\mathrm{acf}()\) is the autocorrelation function of the fluctuations around the mean, \(y - \langle y \rangle\), \(\tau_c\) is the correlation time, and \(T\) the total length of the simulation
[FrenkelSmit2002] D. Frenkel and B. Smit, Understanding Molecular Simulation. Academic Press, San Diego 2002 - The error of the mean <dV/dlambda> is calculated via the decay time
of the fluctuation around the mean.
-
analyze_alchemlyb
(SI=True, start=0, stop=None, stride=None, force=False, autosave=True)¶ Compute the free energy from the simulation data with alchemlyb.
Unlike
analyze()
, the MBAR estimator is available (in addition to TI). Note that SI should be enabled for meaningful error estimates.
-
arraylabel
(component)¶ Batch submission script name for a job array.
-
collect
(stride=None, autosave=True, autocompress=True)¶ Collect dV/dl from output.
Keywords: - stride
read data every stride lines,
None
uses the class default- autosave
immediately save the class pickle fep file
- autocompress
compress the xvg file with bzip2 (saves >70% of space)
-
collect_alchemlyb
(SI=True, start=0, stop=None, stride=None, autosave=True, autocompress=True)¶ Collect the data files using alchemlyb.
Unlike
collect()
, this method can subsample with the statistical inefficiency (parameter SI).
-
compress_dgdl_xvg
()¶ Compress all dgdl xvg files with bzip2.
-
contains_corrupted_xvgs
()¶ Check if any of the source datafiles had reported corrupted lines.
Returns: True
if any of the xvg dgdl files were produced with the permissive=True flag and had skipped lines.For debugging purposes, the number of corrupted lines are stored in
Gsolv._corrupted
as dicts of dicts with the component as primary and the lambda as secondary key.
-
convert_edr
()¶ Convert EDR files to compressed XVG files.
-
dgdl_edr
(*args)¶ Return filename of the dgdl EDR file.
Arguments: - args
joins the arguments into a path and adds the default filename for the dvdl file
Returns: path to EDR
Raises: IOError
with error code ENOENT if no file could be found
-
dgdl_total_edr
(*args, **kwargs)¶ Return filename of the combined dgdl EDR file.
Arguments: - args
joins the arguments into a path and adds the default filename for the dvdl file
Keywords: - total_edr_name
Name of the user defined total edr file.
Returns: path to total EDR
-
dgdl_tpr
(*args)¶ Return filename of the dgdl TPR file.
Arguments: - args
joins the arguments into a path and adds the default filename for the dvdl file
Returns: path to TPR
Raises: IOError
with error code ENOENT if no file could be found
-
dgdl_xvg
(*args)¶ Return filename of the dgdl XVG file.
Recognizes uncompressed, gzipped (gz), and bzip2ed (bz2) files.
Arguments: - args
joins the arguments into a path and adds the default filename for the dvdl file
Returns: Checks if a compressed file exists and returns the appropriate filename.
Raises: IOError
with error code ENOENT if no file could be found
-
fep_dirs
()¶ Generator for all simulation sub directories
-
frombase
(*args)¶ Return path with
Gsolv.basedir
prefixed.
-
get_protocol
(protocol)¶ Return method for protocol.
If protocol is a real method of the class then the method is returned.
If protocol is a registered protocol name but no method of the name exists (i.e. protocol is a “dummy protocol”) then a wrapper function is returned. The wrapper has the signature
-
dummy_protocol
(func, *args, **kwargs)¶ Runs func with the arguments and keywords between calls to
Journal.start()
andJournal.completed()
, with the stage set to protocol.
-
Raises a
ValueError
if the protocol is not registered (i.e. not found inJournalled.protocols
).
-
has_dVdl
()¶ Check if dV/dl data have already been collected.
Returns: True
if the dV/dl data have bee aquired (Gsolv.collect()
) for all FEP simulations.
-
label
(component)¶ Simple label for component, e.g. for use in filenames.
-
load
(filename=None)¶ Re-instantiate class from pickled file.
If no filename was supplied then the filename is taken from the attribute
filename
.Changed in version 0.7.1: Can read pickle files with either Python 2.7 or 3.x, regardless of the Python version that created the pickle.
-
logger_DeltaA0
()¶ Print the free energy contributions (errors in parentheses).
-
plot
(**kwargs)¶ Plot the TI data with error bars.
Run
mdpow.fep.Gsolv.analyze()
first.All kwargs are passed on to
pylab.errorbar()
.Returns: The axes of the subplot.
-
qsub
(script=None)¶ Submit a batch script locally.
If script ==
None
then take the first script (works well if only one template was provided).
-
save
(filename=None)¶ Save instance to a pickle file.
The default filename is the name of the file that was last loaded from or saved to. Also sets the attribute
filename
to the absolute path of the saved file.
-
setup
(**kwargs)¶ Prepare the input files for all Gromacs runs.
Keywords: - qscript
(List of) template(s) for batch submission scripts; if not set then the templates are used that were supplied to the constructor.
- kwargs
Most kwargs are passed on to
gromacs.setup.MD()
although some are set to values that are required for the FEP functionality.- mdrun_opts
list of options to mdrun;
-dhdl
is always added to this list as it is required for the thermodynamic integration calculations- includes
list of directories where Gromacs input files can be found
The following keywords cannot be changed: dirname, jobname, prefix, runtime, deffnm. The last two can be specified when constructing
Gsolv
.
See also
gromacs.setup.MD()
andgromacs.qsub.generate_submit_scripts()
Changed in version 0.6.0: Gromacs now uses option
-dhdl
instead of-dgdl
.
-
summary
()¶ Return a string that summarizes the energetics.
Each energy component is followed by its error.
Format:
. ------ kJ/mol ----- molecule solvent total coulomb vdw
-
tasklabel
(component, lmbda)¶ Batch submission script name for a single task job.
-
wdir
(component, lmbda)¶ Return rooted path to the work directory for component and lmbda.
(Constructed from
frombase()
andwname()
.)
-
wname
(component, lmbda)¶ Return name of the window directory itself.
Typically something like
VDW/0000
,VDW/0500
, …,Coulomb/1000
-
write_DeltaA0
(filename, mode='w')¶ Write free energy components to a file.
Arguments: - filename
name of the text file
- mode
‘w’ for overwrite or ‘a’ for append [‘w’]
- Format::
- . —– kJ/mol —— molecule solvent total coulomb vdw
-
class
mdpow.fep.
Goct
(molecule=None, top=None, struct=None, method='BAR', **kwargs)[source]¶ Sets up and analyses MD to obtain the solvation free energy of a solute in octanol.
The coulomb lambda schedule is enhanced compared to water as the initial part of the dV/dl curve is quite sensitive. By adding two additional points we hope to reduce the overall error on the dis-charging free energy.
Set up Gsolv from input files or a equilibrium simulation.
Arguments: - molecule
name of the molecule for which the hydration free energy is to be computed (as in the gromacs topology) [REQUIRED]
- top
topology [REQUIRED]
- struct
solvated and equilibrated input structure [REQUIRED]
- ndx
index file
- dirname
directory to work under [‘FEP/solvent’]
- solvent
name of the solvent (only used for path names); will not affect the simulations because the input coordinates and topologies are determined by either simulation or molecule, top, and struct. If solvent is not provided then it is set to
solvent_default
.- lambda_coulomb
list of lambdas for discharging: q+vdw –> vdw
- lambda_vdw
list of lambdas for decoupling: vdw –> none
- runtime
simulation time per window in ps [5000]
- temperature
temperature in Kelvin of the simulation [300.0]
- qscript
template or list of templates for queuing system scripts (see
gromacs.config.templates
for details) [local.sh]- includes
include directories
- simulation
Instead of providing the required arguments, obtain the input files from a
mdpow.equil.Simulation
instance.- method
“TI” for thermodynamic integration or “BAR” for Bennett acceptance ratio; using “BAR” in Gromacs also writes TI data so this is the default.
- mdp
MDP file name (if no entry then the package defaults are used)
- filename
Instead of providing the required arguments, load from pickle file. If either simulation or molecule, top, and struct are provided then simply set the attribute
filename
of the default checkpoint file to filename.- basedir
Prepend basedir to all filenames;
None
disables [.]- permissive
Set to
True
if you want to read past corrupt data in output xvg files (seegromacs.formats.XVG
for details); note that permissive*=``True`` can lead to **wrong results*. Overrides the value set in a loaded pickle file. [False
]- stride
collect every stride data line, see
Gsolv.collect()
[1]- start
Start frame of data analyzed in every fep window.
- stop
Stop frame of data analyzed in every fep window.
- SI
Set to
True
if you want to perform statistical inefficiency to preprocess the data.- kwargs
other undocumented arguments (see source for the moment)
Note
Only a subset of the FEP parameters can currently be set with keywords (lambdas); other parameters such as the soft cores are currently fixed. For advanced use: pass a dict with keys “coulomb” and “vdw” and values of
FEPschedule
in the keyword schedules. The lambdas will override these settings (which typically come from a cfg). This method allows setting all parameters.-
analyze
(force=False, stride=None, autosave=True, ncorrel=25000)¶ Extract dV/dl from output and calculate dG by TI.
Thermodynamic integration (TI) is performed on the individual component window calculation (typically the Coulomb and the VDW part, \(\Delta A_{\mathrm{coul}}\) and \(\Delta A_{\mathrm{vdW}}\)). \(\Delta A_{\mathrm{coul}}\) is the free energy component of discharging the molecule and \(\Delta A_{\mathrm{vdW}}\) of decoupling (switching off LJ interactions with the environment). The free energy components must be interpreted in this way because we defined
lambda=0
as interaction switched on andlambda=1
as switched off.\[\Delta A^{*} = -(\Delta A_{\mathrm{coul}} + \Delta A_{\mathrm{vdw}})\]Data are stored in
Gsolv.results
.The dV/dlambda graphs are integrated with the composite Simpson’s rule (and if the number of datapoints are even, the first interval is evaluated with the trapezoidal rule); see
scipy.integrate.simps()
for details). Note that this implementation of Simpson’s rule does not require equidistant spacing on the lambda axis.For the Coulomb part using Simpson’s rule has been shown to produce more accurate results than the trapezoidal rule [Jorge2010].
Errors are estimated from the errors of the individual <dV/dlambda>:
- The error of the mean <dV/dlambda> is calculated via the decay time
of the fluctuation around the mean.
ncorrel
is the max number of samples that is going to be used to calculate the autocorrelation time via a FFT. Seenumkit.timeseries.tcorrel()
. - The error on the integral is calculated analytically via
propagation of errors through Simpson’s rule (with the
approximation that all spacings are assumed to be equal; taken as
the average over all spacings as implemented in
numkit.integration.simps_error()
).
Note
For the Coulomb part, which typically only contains about 5 lambdas, it is recommended to have a odd number of lambda values to fully benefit from the higher accuracy of the integration scheme.
[Jorge2010] M. Jorge, N.M. Garrido, A.J. Queimada, I.G. Economou, and E.A. Macedo. Effect of the integration method on the accuracy and computational efficiency of free energy calculations using thermodynamic integration. Journal of Chemical Theory and Computation, 6 (4):1018–1027, 2010. 10.1021/ct900661c. Keywords: - force
reload raw data even though it is already loaded
- stride
read data every stride lines,
None
uses the class default- autosave
save to the pickle file when results have been computed
- ncorrel
aim for <= 25,000 samples for t_correl
Notes
The error on the mean of the data \(\epsilon_y\), taking the correlation time into account, is calculated according to [FrenkelSmit2002] p526:
\[\epsilon_y = \sqrt{2 \tau_c \mathrm{acf}(0)/T}\]where \(\mathrm{acf}()\) is the autocorrelation function of the fluctuations around the mean, \(y - \langle y \rangle\), \(\tau_c\) is the correlation time, and \(T\) the total length of the simulation
[FrenkelSmit2002] D. Frenkel and B. Smit, Understanding Molecular Simulation. Academic Press, San Diego 2002 - The error of the mean <dV/dlambda> is calculated via the decay time
of the fluctuation around the mean.
-
analyze_alchemlyb
(SI=True, start=0, stop=None, stride=None, force=False, autosave=True)¶ Compute the free energy from the simulation data with alchemlyb.
Unlike
analyze()
, the MBAR estimator is available (in addition to TI). Note that SI should be enabled for meaningful error estimates.
-
arraylabel
(component)¶ Batch submission script name for a job array.
-
collect
(stride=None, autosave=True, autocompress=True)¶ Collect dV/dl from output.
Keywords: - stride
read data every stride lines,
None
uses the class default- autosave
immediately save the class pickle fep file
- autocompress
compress the xvg file with bzip2 (saves >70% of space)
-
collect_alchemlyb
(SI=True, start=0, stop=None, stride=None, autosave=True, autocompress=True)¶ Collect the data files using alchemlyb.
Unlike
collect()
, this method can subsample with the statistical inefficiency (parameter SI).
-
compress_dgdl_xvg
()¶ Compress all dgdl xvg files with bzip2.
-
contains_corrupted_xvgs
()¶ Check if any of the source datafiles had reported corrupted lines.
Returns: True
if any of the xvg dgdl files were produced with the permissive=True flag and had skipped lines.For debugging purposes, the number of corrupted lines are stored in
Gsolv._corrupted
as dicts of dicts with the component as primary and the lambda as secondary key.
-
convert_edr
()¶ Convert EDR files to compressed XVG files.
-
dgdl_edr
(*args)¶ Return filename of the dgdl EDR file.
Arguments: - args
joins the arguments into a path and adds the default filename for the dvdl file
Returns: path to EDR
Raises: IOError
with error code ENOENT if no file could be found
-
dgdl_total_edr
(*args, **kwargs)¶ Return filename of the combined dgdl EDR file.
Arguments: - args
joins the arguments into a path and adds the default filename for the dvdl file
Keywords: - total_edr_name
Name of the user defined total edr file.
Returns: path to total EDR
-
dgdl_tpr
(*args)¶ Return filename of the dgdl TPR file.
Arguments: - args
joins the arguments into a path and adds the default filename for the dvdl file
Returns: path to TPR
Raises: IOError
with error code ENOENT if no file could be found
-
dgdl_xvg
(*args)¶ Return filename of the dgdl XVG file.
Recognizes uncompressed, gzipped (gz), and bzip2ed (bz2) files.
Arguments: - args
joins the arguments into a path and adds the default filename for the dvdl file
Returns: Checks if a compressed file exists and returns the appropriate filename.
Raises: IOError
with error code ENOENT if no file could be found
-
fep_dirs
()¶ Generator for all simulation sub directories
-
frombase
(*args)¶ Return path with
Gsolv.basedir
prefixed.
-
get_protocol
(protocol)¶ Return method for protocol.
If protocol is a real method of the class then the method is returned.
If protocol is a registered protocol name but no method of the name exists (i.e. protocol is a “dummy protocol”) then a wrapper function is returned. The wrapper has the signature
-
dummy_protocol
(func, *args, **kwargs)¶ Runs func with the arguments and keywords between calls to
Journal.start()
andJournal.completed()
, with the stage set to protocol.
-
Raises a
ValueError
if the protocol is not registered (i.e. not found inJournalled.protocols
).
-
has_dVdl
()¶ Check if dV/dl data have already been collected.
Returns: True
if the dV/dl data have bee aquired (Gsolv.collect()
) for all FEP simulations.
-
label
(component)¶ Simple label for component, e.g. for use in filenames.
-
load
(filename=None)¶ Re-instantiate class from pickled file.
If no filename was supplied then the filename is taken from the attribute
filename
.Changed in version 0.7.1: Can read pickle files with either Python 2.7 or 3.x, regardless of the Python version that created the pickle.
-
logger_DeltaA0
()¶ Print the free energy contributions (errors in parentheses).
-
plot
(**kwargs)¶ Plot the TI data with error bars.
Run
mdpow.fep.Gsolv.analyze()
first.All kwargs are passed on to
pylab.errorbar()
.Returns: The axes of the subplot.
-
qsub
(script=None)¶ Submit a batch script locally.
If script ==
None
then take the first script (works well if only one template was provided).
-
save
(filename=None)¶ Save instance to a pickle file.
The default filename is the name of the file that was last loaded from or saved to. Also sets the attribute
filename
to the absolute path of the saved file.
-
setup
(**kwargs)¶ Prepare the input files for all Gromacs runs.
Keywords: - qscript
(List of) template(s) for batch submission scripts; if not set then the templates are used that were supplied to the constructor.
- kwargs
Most kwargs are passed on to
gromacs.setup.MD()
although some are set to values that are required for the FEP functionality.- mdrun_opts
list of options to mdrun;
-dhdl
is always added to this list as it is required for the thermodynamic integration calculations- includes
list of directories where Gromacs input files can be found
The following keywords cannot be changed: dirname, jobname, prefix, runtime, deffnm. The last two can be specified when constructing
Gsolv
.
See also
gromacs.setup.MD()
andgromacs.qsub.generate_submit_scripts()
Changed in version 0.6.0: Gromacs now uses option
-dhdl
instead of-dgdl
.
-
summary
()¶ Return a string that summarizes the energetics.
Each energy component is followed by its error.
Format:
. ------ kJ/mol ----- molecule solvent total coulomb vdw
-
tasklabel
(component, lmbda)¶ Batch submission script name for a single task job.
-
wdir
(component, lmbda)¶ Return rooted path to the work directory for component and lmbda.
(Constructed from
frombase()
andwname()
.)
-
wname
(component, lmbda)¶ Return name of the window directory itself.
Typically something like
VDW/0000
,VDW/0500
, …,Coulomb/1000
-
write_DeltaA0
(filename, mode='w')¶ Write free energy components to a file.
Arguments: - filename
name of the text file
- mode
‘w’ for overwrite or ‘a’ for append [‘w’]
- Format::
- . —– kJ/mol —— molecule solvent total coulomb vdw
-
class
mdpow.fep.
Gcyclo
(molecule=None, top=None, struct=None, method='BAR', **kwargs)[source]¶ Set up Gsolv from input files or a equilibrium simulation.
Arguments: - molecule
name of the molecule for which the hydration free energy is to be computed (as in the gromacs topology) [REQUIRED]
- top
topology [REQUIRED]
- struct
solvated and equilibrated input structure [REQUIRED]
- ndx
index file
- dirname
directory to work under [‘FEP/solvent’]
- solvent
name of the solvent (only used for path names); will not affect the simulations because the input coordinates and topologies are determined by either simulation or molecule, top, and struct. If solvent is not provided then it is set to
solvent_default
.- lambda_coulomb
list of lambdas for discharging: q+vdw –> vdw
- lambda_vdw
list of lambdas for decoupling: vdw –> none
- runtime
simulation time per window in ps [5000]
- temperature
temperature in Kelvin of the simulation [300.0]
- qscript
template or list of templates for queuing system scripts (see
gromacs.config.templates
for details) [local.sh]- includes
include directories
- simulation
Instead of providing the required arguments, obtain the input files from a
mdpow.equil.Simulation
instance.- method
“TI” for thermodynamic integration or “BAR” for Bennett acceptance ratio; using “BAR” in Gromacs also writes TI data so this is the default.
- mdp
MDP file name (if no entry then the package defaults are used)
- filename
Instead of providing the required arguments, load from pickle file. If either simulation or molecule, top, and struct are provided then simply set the attribute
filename
of the default checkpoint file to filename.- basedir
Prepend basedir to all filenames;
None
disables [.]- permissive
Set to
True
if you want to read past corrupt data in output xvg files (seegromacs.formats.XVG
for details); note that permissive*=``True`` can lead to **wrong results*. Overrides the value set in a loaded pickle file. [False
]- stride
collect every stride data line, see
Gsolv.collect()
[1]- start
Start frame of data analyzed in every fep window.
- stop
Stop frame of data analyzed in every fep window.
- SI
Set to
True
if you want to perform statistical inefficiency to preprocess the data.- kwargs
other undocumented arguments (see source for the moment)
Note
Only a subset of the FEP parameters can currently be set with keywords (lambdas); other parameters such as the soft cores are currently fixed. For advanced use: pass a dict with keys “coulomb” and “vdw” and values of
FEPschedule
in the keyword schedules. The lambdas will override these settings (which typically come from a cfg). This method allows setting all parameters.-
analyze
(force=False, stride=None, autosave=True, ncorrel=25000)¶ Extract dV/dl from output and calculate dG by TI.
Thermodynamic integration (TI) is performed on the individual component window calculation (typically the Coulomb and the VDW part, \(\Delta A_{\mathrm{coul}}\) and \(\Delta A_{\mathrm{vdW}}\)). \(\Delta A_{\mathrm{coul}}\) is the free energy component of discharging the molecule and \(\Delta A_{\mathrm{vdW}}\) of decoupling (switching off LJ interactions with the environment). The free energy components must be interpreted in this way because we defined
lambda=0
as interaction switched on andlambda=1
as switched off.\[\Delta A^{*} = -(\Delta A_{\mathrm{coul}} + \Delta A_{\mathrm{vdw}})\]Data are stored in
Gsolv.results
.The dV/dlambda graphs are integrated with the composite Simpson’s rule (and if the number of datapoints are even, the first interval is evaluated with the trapezoidal rule); see
scipy.integrate.simps()
for details). Note that this implementation of Simpson’s rule does not require equidistant spacing on the lambda axis.For the Coulomb part using Simpson’s rule has been shown to produce more accurate results than the trapezoidal rule [Jorge2010].
Errors are estimated from the errors of the individual <dV/dlambda>:
- The error of the mean <dV/dlambda> is calculated via the decay time
of the fluctuation around the mean.
ncorrel
is the max number of samples that is going to be used to calculate the autocorrelation time via a FFT. Seenumkit.timeseries.tcorrel()
. - The error on the integral is calculated analytically via
propagation of errors through Simpson’s rule (with the
approximation that all spacings are assumed to be equal; taken as
the average over all spacings as implemented in
numkit.integration.simps_error()
).
Note
For the Coulomb part, which typically only contains about 5 lambdas, it is recommended to have a odd number of lambda values to fully benefit from the higher accuracy of the integration scheme.
[Jorge2010] M. Jorge, N.M. Garrido, A.J. Queimada, I.G. Economou, and E.A. Macedo. Effect of the integration method on the accuracy and computational efficiency of free energy calculations using thermodynamic integration. Journal of Chemical Theory and Computation, 6 (4):1018–1027, 2010. 10.1021/ct900661c. Keywords: - force
reload raw data even though it is already loaded
- stride
read data every stride lines,
None
uses the class default- autosave
save to the pickle file when results have been computed
- ncorrel
aim for <= 25,000 samples for t_correl
Notes
The error on the mean of the data \(\epsilon_y\), taking the correlation time into account, is calculated according to [FrenkelSmit2002] p526:
\[\epsilon_y = \sqrt{2 \tau_c \mathrm{acf}(0)/T}\]where \(\mathrm{acf}()\) is the autocorrelation function of the fluctuations around the mean, \(y - \langle y \rangle\), \(\tau_c\) is the correlation time, and \(T\) the total length of the simulation
[FrenkelSmit2002] D. Frenkel and B. Smit, Understanding Molecular Simulation. Academic Press, San Diego 2002 - The error of the mean <dV/dlambda> is calculated via the decay time
of the fluctuation around the mean.
-
analyze_alchemlyb
(SI=True, start=0, stop=None, stride=None, force=False, autosave=True)¶ Compute the free energy from the simulation data with alchemlyb.
Unlike
analyze()
, the MBAR estimator is available (in addition to TI). Note that SI should be enabled for meaningful error estimates.
-
arraylabel
(component)¶ Batch submission script name for a job array.
-
collect
(stride=None, autosave=True, autocompress=True)¶ Collect dV/dl from output.
Keywords: - stride
read data every stride lines,
None
uses the class default- autosave
immediately save the class pickle fep file
- autocompress
compress the xvg file with bzip2 (saves >70% of space)
-
collect_alchemlyb
(SI=True, start=0, stop=None, stride=None, autosave=True, autocompress=True)¶ Collect the data files using alchemlyb.
Unlike
collect()
, this method can subsample with the statistical inefficiency (parameter SI).
-
compress_dgdl_xvg
()¶ Compress all dgdl xvg files with bzip2.
-
contains_corrupted_xvgs
()¶ Check if any of the source datafiles had reported corrupted lines.
Returns: True
if any of the xvg dgdl files were produced with the permissive=True flag and had skipped lines.For debugging purposes, the number of corrupted lines are stored in
Gsolv._corrupted
as dicts of dicts with the component as primary and the lambda as secondary key.
-
convert_edr
()¶ Convert EDR files to compressed XVG files.
-
dgdl_edr
(*args)¶ Return filename of the dgdl EDR file.
Arguments: - args
joins the arguments into a path and adds the default filename for the dvdl file
Returns: path to EDR
Raises: IOError
with error code ENOENT if no file could be found
-
dgdl_total_edr
(*args, **kwargs)¶ Return filename of the combined dgdl EDR file.
Arguments: - args
joins the arguments into a path and adds the default filename for the dvdl file
Keywords: - total_edr_name
Name of the user defined total edr file.
Returns: path to total EDR
-
dgdl_tpr
(*args)¶ Return filename of the dgdl TPR file.
Arguments: - args
joins the arguments into a path and adds the default filename for the dvdl file
Returns: path to TPR
Raises: IOError
with error code ENOENT if no file could be found
-
dgdl_xvg
(*args)¶ Return filename of the dgdl XVG file.
Recognizes uncompressed, gzipped (gz), and bzip2ed (bz2) files.
Arguments: - args
joins the arguments into a path and adds the default filename for the dvdl file
Returns: Checks if a compressed file exists and returns the appropriate filename.
Raises: IOError
with error code ENOENT if no file could be found
-
fep_dirs
()¶ Generator for all simulation sub directories
-
frombase
(*args)¶ Return path with
Gsolv.basedir
prefixed.
-
get_protocol
(protocol)¶ Return method for protocol.
If protocol is a real method of the class then the method is returned.
If protocol is a registered protocol name but no method of the name exists (i.e. protocol is a “dummy protocol”) then a wrapper function is returned. The wrapper has the signature
-
dummy_protocol
(func, *args, **kwargs)¶ Runs func with the arguments and keywords between calls to
Journal.start()
andJournal.completed()
, with the stage set to protocol.
-
Raises a
ValueError
if the protocol is not registered (i.e. not found inJournalled.protocols
).
-
has_dVdl
()¶ Check if dV/dl data have already been collected.
Returns: True
if the dV/dl data have bee aquired (Gsolv.collect()
) for all FEP simulations.
-
label
(component)¶ Simple label for component, e.g. for use in filenames.
-
load
(filename=None)¶ Re-instantiate class from pickled file.
If no filename was supplied then the filename is taken from the attribute
filename
.Changed in version 0.7.1: Can read pickle files with either Python 2.7 or 3.x, regardless of the Python version that created the pickle.
-
logger_DeltaA0
()¶ Print the free energy contributions (errors in parentheses).
-
plot
(**kwargs)¶ Plot the TI data with error bars.
Run
mdpow.fep.Gsolv.analyze()
first.All kwargs are passed on to
pylab.errorbar()
.Returns: The axes of the subplot.
-
qsub
(script=None)¶ Submit a batch script locally.
If script ==
None
then take the first script (works well if only one template was provided).
-
save
(filename=None)¶ Save instance to a pickle file.
The default filename is the name of the file that was last loaded from or saved to. Also sets the attribute
filename
to the absolute path of the saved file.
-
setup
(**kwargs)¶ Prepare the input files for all Gromacs runs.
Keywords: - qscript
(List of) template(s) for batch submission scripts; if not set then the templates are used that were supplied to the constructor.
- kwargs
Most kwargs are passed on to
gromacs.setup.MD()
although some are set to values that are required for the FEP functionality.- mdrun_opts
list of options to mdrun;
-dhdl
is always added to this list as it is required for the thermodynamic integration calculations- includes
list of directories where Gromacs input files can be found
The following keywords cannot be changed: dirname, jobname, prefix, runtime, deffnm. The last two can be specified when constructing
Gsolv
.
See also
gromacs.setup.MD()
andgromacs.qsub.generate_submit_scripts()
Changed in version 0.6.0: Gromacs now uses option
-dhdl
instead of-dgdl
.
-
summary
()¶ Return a string that summarizes the energetics.
Each energy component is followed by its error.
Format:
. ------ kJ/mol ----- molecule solvent total coulomb vdw
-
tasklabel
(component, lmbda)¶ Batch submission script name for a single task job.
-
wdir
(component, lmbda)¶ Return rooted path to the work directory for component and lmbda.
(Constructed from
frombase()
andwname()
.)
-
wname
(component, lmbda)¶ Return name of the window directory itself.
Typically something like
VDW/0000
,VDW/0500
, …,Coulomb/1000
-
write_DeltaA0
(filename, mode='w')¶ Write free energy components to a file.
Arguments: - filename
name of the text file
- mode
‘w’ for overwrite or ‘a’ for append [‘w’]
- Format::
- . —– kJ/mol —— molecule solvent total coulomb vdw
-
class
mdpow.fep.
Gtol
(molecule=None, top=None, struct=None, method='BAR', **kwargs)[source]¶ Sets up and analyses MD to obtain the solvation free energy of a solute in toluene.
Set up Gsolv from input files or a equilibrium simulation.
Arguments: - molecule
name of the molecule for which the hydration free energy is to be computed (as in the gromacs topology) [REQUIRED]
- top
topology [REQUIRED]
- struct
solvated and equilibrated input structure [REQUIRED]
- ndx
index file
- dirname
directory to work under [‘FEP/solvent’]
- solvent
name of the solvent (only used for path names); will not affect the simulations because the input coordinates and topologies are determined by either simulation or molecule, top, and struct. If solvent is not provided then it is set to
solvent_default
.- lambda_coulomb
list of lambdas for discharging: q+vdw –> vdw
- lambda_vdw
list of lambdas for decoupling: vdw –> none
- runtime
simulation time per window in ps [5000]
- temperature
temperature in Kelvin of the simulation [300.0]
- qscript
template or list of templates for queuing system scripts (see
gromacs.config.templates
for details) [local.sh]- includes
include directories
- simulation
Instead of providing the required arguments, obtain the input files from a
mdpow.equil.Simulation
instance.- method
“TI” for thermodynamic integration or “BAR” for Bennett acceptance ratio; using “BAR” in Gromacs also writes TI data so this is the default.
- mdp
MDP file name (if no entry then the package defaults are used)
- filename
Instead of providing the required arguments, load from pickle file. If either simulation or molecule, top, and struct are provided then simply set the attribute
filename
of the default checkpoint file to filename.- basedir
Prepend basedir to all filenames;
None
disables [.]- permissive
Set to
True
if you want to read past corrupt data in output xvg files (seegromacs.formats.XVG
for details); note that permissive*=``True`` can lead to **wrong results*. Overrides the value set in a loaded pickle file. [False
]- stride
collect every stride data line, see
Gsolv.collect()
[1]- start
Start frame of data analyzed in every fep window.
- stop
Stop frame of data analyzed in every fep window.
- SI
Set to
True
if you want to perform statistical inefficiency to preprocess the data.- kwargs
other undocumented arguments (see source for the moment)
Note
Only a subset of the FEP parameters can currently be set with keywords (lambdas); other parameters such as the soft cores are currently fixed. For advanced use: pass a dict with keys “coulomb” and “vdw” and values of
FEPschedule
in the keyword schedules. The lambdas will override these settings (which typically come from a cfg). This method allows setting all parameters.-
analyze
(force=False, stride=None, autosave=True, ncorrel=25000)¶ Extract dV/dl from output and calculate dG by TI.
Thermodynamic integration (TI) is performed on the individual component window calculation (typically the Coulomb and the VDW part, \(\Delta A_{\mathrm{coul}}\) and \(\Delta A_{\mathrm{vdW}}\)). \(\Delta A_{\mathrm{coul}}\) is the free energy component of discharging the molecule and \(\Delta A_{\mathrm{vdW}}\) of decoupling (switching off LJ interactions with the environment). The free energy components must be interpreted in this way because we defined
lambda=0
as interaction switched on andlambda=1
as switched off.\[\Delta A^{*} = -(\Delta A_{\mathrm{coul}} + \Delta A_{\mathrm{vdw}})\]Data are stored in
Gsolv.results
.The dV/dlambda graphs are integrated with the composite Simpson’s rule (and if the number of datapoints are even, the first interval is evaluated with the trapezoidal rule); see
scipy.integrate.simps()
for details). Note that this implementation of Simpson’s rule does not require equidistant spacing on the lambda axis.For the Coulomb part using Simpson’s rule has been shown to produce more accurate results than the trapezoidal rule [Jorge2010].
Errors are estimated from the errors of the individual <dV/dlambda>:
- The error of the mean <dV/dlambda> is calculated via the decay time
of the fluctuation around the mean.
ncorrel
is the max number of samples that is going to be used to calculate the autocorrelation time via a FFT. Seenumkit.timeseries.tcorrel()
. - The error on the integral is calculated analytically via
propagation of errors through Simpson’s rule (with the
approximation that all spacings are assumed to be equal; taken as
the average over all spacings as implemented in
numkit.integration.simps_error()
).
Note
For the Coulomb part, which typically only contains about 5 lambdas, it is recommended to have a odd number of lambda values to fully benefit from the higher accuracy of the integration scheme.
[Jorge2010] M. Jorge, N.M. Garrido, A.J. Queimada, I.G. Economou, and E.A. Macedo. Effect of the integration method on the accuracy and computational efficiency of free energy calculations using thermodynamic integration. Journal of Chemical Theory and Computation, 6 (4):1018–1027, 2010. 10.1021/ct900661c. Keywords: - force
reload raw data even though it is already loaded
- stride
read data every stride lines,
None
uses the class default- autosave
save to the pickle file when results have been computed
- ncorrel
aim for <= 25,000 samples for t_correl
Notes
The error on the mean of the data \(\epsilon_y\), taking the correlation time into account, is calculated according to [FrenkelSmit2002] p526:
\[\epsilon_y = \sqrt{2 \tau_c \mathrm{acf}(0)/T}\]where \(\mathrm{acf}()\) is the autocorrelation function of the fluctuations around the mean, \(y - \langle y \rangle\), \(\tau_c\) is the correlation time, and \(T\) the total length of the simulation
[FrenkelSmit2002] D. Frenkel and B. Smit, Understanding Molecular Simulation. Academic Press, San Diego 2002 - The error of the mean <dV/dlambda> is calculated via the decay time
of the fluctuation around the mean.
-
analyze_alchemlyb
(SI=True, start=0, stop=None, stride=None, force=False, autosave=True)¶ Compute the free energy from the simulation data with alchemlyb.
Unlike
analyze()
, the MBAR estimator is available (in addition to TI). Note that SI should be enabled for meaningful error estimates.
-
arraylabel
(component)¶ Batch submission script name for a job array.
-
collect
(stride=None, autosave=True, autocompress=True)¶ Collect dV/dl from output.
Keywords: - stride
read data every stride lines,
None
uses the class default- autosave
immediately save the class pickle fep file
- autocompress
compress the xvg file with bzip2 (saves >70% of space)
-
collect_alchemlyb
(SI=True, start=0, stop=None, stride=None, autosave=True, autocompress=True)¶ Collect the data files using alchemlyb.
Unlike
collect()
, this method can subsample with the statistical inefficiency (parameter SI).
-
compress_dgdl_xvg
()¶ Compress all dgdl xvg files with bzip2.
-
contains_corrupted_xvgs
()¶ Check if any of the source datafiles had reported corrupted lines.
Returns: True
if any of the xvg dgdl files were produced with the permissive=True flag and had skipped lines.For debugging purposes, the number of corrupted lines are stored in
Gsolv._corrupted
as dicts of dicts with the component as primary and the lambda as secondary key.
-
convert_edr
()¶ Convert EDR files to compressed XVG files.
-
dgdl_edr
(*args)¶ Return filename of the dgdl EDR file.
Arguments: - args
joins the arguments into a path and adds the default filename for the dvdl file
Returns: path to EDR
Raises: IOError
with error code ENOENT if no file could be found
-
dgdl_total_edr
(*args, **kwargs)¶ Return filename of the combined dgdl EDR file.
Arguments: - args
joins the arguments into a path and adds the default filename for the dvdl file
Keywords: - total_edr_name
Name of the user defined total edr file.
Returns: path to total EDR
-
dgdl_tpr
(*args)¶ Return filename of the dgdl TPR file.
Arguments: - args
joins the arguments into a path and adds the default filename for the dvdl file
Returns: path to TPR
Raises: IOError
with error code ENOENT if no file could be found
-
dgdl_xvg
(*args)¶ Return filename of the dgdl XVG file.
Recognizes uncompressed, gzipped (gz), and bzip2ed (bz2) files.
Arguments: - args
joins the arguments into a path and adds the default filename for the dvdl file
Returns: Checks if a compressed file exists and returns the appropriate filename.
Raises: IOError
with error code ENOENT if no file could be found
-
fep_dirs
()¶ Generator for all simulation sub directories
-
frombase
(*args)¶ Return path with
Gsolv.basedir
prefixed.
-
get_protocol
(protocol)¶ Return method for protocol.
If protocol is a real method of the class then the method is returned.
If protocol is a registered protocol name but no method of the name exists (i.e. protocol is a “dummy protocol”) then a wrapper function is returned. The wrapper has the signature
-
dummy_protocol
(func, *args, **kwargs)¶ Runs func with the arguments and keywords between calls to
Journal.start()
andJournal.completed()
, with the stage set to protocol.
-
Raises a
ValueError
if the protocol is not registered (i.e. not found inJournalled.protocols
).
-
has_dVdl
()¶ Check if dV/dl data have already been collected.
Returns: True
if the dV/dl data have bee aquired (Gsolv.collect()
) for all FEP simulations.
-
label
(component)¶ Simple label for component, e.g. for use in filenames.
-
load
(filename=None)¶ Re-instantiate class from pickled file.
If no filename was supplied then the filename is taken from the attribute
filename
.Changed in version 0.7.1: Can read pickle files with either Python 2.7 or 3.x, regardless of the Python version that created the pickle.
-
logger_DeltaA0
()¶ Print the free energy contributions (errors in parentheses).
-
plot
(**kwargs)¶ Plot the TI data with error bars.
Run
mdpow.fep.Gsolv.analyze()
first.All kwargs are passed on to
pylab.errorbar()
.Returns: The axes of the subplot.
-
qsub
(script=None)¶ Submit a batch script locally.
If script ==
None
then take the first script (works well if only one template was provided).
-
save
(filename=None)¶ Save instance to a pickle file.
The default filename is the name of the file that was last loaded from or saved to. Also sets the attribute
filename
to the absolute path of the saved file.
-
setup
(**kwargs)¶ Prepare the input files for all Gromacs runs.
Keywords: - qscript
(List of) template(s) for batch submission scripts; if not set then the templates are used that were supplied to the constructor.
- kwargs
Most kwargs are passed on to
gromacs.setup.MD()
although some are set to values that are required for the FEP functionality.- mdrun_opts
list of options to mdrun;
-dhdl
is always added to this list as it is required for the thermodynamic integration calculations- includes
list of directories where Gromacs input files can be found
The following keywords cannot be changed: dirname, jobname, prefix, runtime, deffnm. The last two can be specified when constructing
Gsolv
.
See also
gromacs.setup.MD()
andgromacs.qsub.generate_submit_scripts()
Changed in version 0.6.0: Gromacs now uses option
-dhdl
instead of-dgdl
.
-
summary
()¶ Return a string that summarizes the energetics.
Each energy component is followed by its error.
Format:
. ------ kJ/mol ----- molecule solvent total coulomb vdw
-
tasklabel
(component, lmbda)¶ Batch submission script name for a single task job.
-
wdir
(component, lmbda)¶ Return rooted path to the work directory for component and lmbda.
(Constructed from
frombase()
andwname()
.)
-
wname
(component, lmbda)¶ Return name of the window directory itself.
Typically something like
VDW/0000
,VDW/0500
, …,Coulomb/1000
-
write_DeltaA0
(filename, mode='w')¶ Write free energy components to a file.
Arguments: - filename
name of the text file
- mode
‘w’ for overwrite or ‘a’ for append [‘w’]
- Format::
- . —– kJ/mol —— molecule solvent total coulomb vdw
-
mdpow.fep.
pOW
(G1, G2, **kwargs)[source]¶ Compute water-octanol partition coefficient from two
Gsolv
objects.transfer free energy from water into octanol:
DeltaDeltaG0 = DeltaG0_oct - DeltaG0_water
octanol/water partition coefficient:
log P_oct/wat = log [X]_oct/[X]_wat
Arguments: - G1, G2
G1 and G2 should be a
Ghyd
and aGoct
instance, but order does not matter- force
force rereading of data files even if some data were already stored [False]
- stride
analyze every stride-th datapoint in the dV/dlambda files
- start
Start frame of data analyzed in every fep window.
- stop
Stop frame of data analyzed in every fep window.
- SI
Set to
True
if you want to perform statistical inefficiency to preprocess the data.- estimator
Set to
alchemlyb
if you want to use alchemlyb estimators, ormdpow
if you want the default TI method.- method
Use TI, BAR or MBAR method in alchemlyb, or TI in mdpow.
Returns: (transfer free energy, log10 of the octanol/water partition coefficient = log Pow)
-
mdpow.fep.
pCW
(G1, G2, **kwargs)[source]¶ Compute water-cyclohexane partition coefficient from two
Gsolv
objects.transfer free energy from water into cyclohexane:
DeltaDeltaG0 = DeltaG0_cyclohexane - DeltaG0_water
cyclohexane/water partition coefficient:
log P_CW = log [X]_cyclohexane/[X]_water
Arguments: - G1, G2
G1 and G2 should be a
Ghyd
and aGcyclo
instance, but order does not matter- force
force rereading of data files even if some data were already stored [False]
- stride
analyze every stride-th datapoint in the dV/dlambda files
- start
Start frame of data analyzed in every fep window.
- stop
Stop frame of data analyzed in every fep window.
- SI
Set to
True
if you want to perform statistical inefficiency to preprocess the data.- estimator
Set to
alchemlyb
if you want to use alchemlyb estimators, ormdpow
if you want the default TI method.- method
Use TI, BAR or MBAR method in alchemlyb, or TI in mdpow.
Returns: (transfer free energy, log10 of the cyclohexane/water partition coefficient = log Pcw)
-
mdpow.fep.
pTW
(G1, G2, **kwargs)[source]¶ Compute water-toluene partition coefficient from two
Gsolv
objects.transfer free energy from water into toluene:
DeltaDeltaG0 = DeltaG0_tol - DeltaG0_water
toluene/water partition coefficient:
log P_tol/wat = log [X]_tol/[X]_wat
Arguments: - G1, G2
G1 and G2 should be a
Ghyd
and aGtol
instance, but order does not matter- force
force rereading of data files even if some data were already stored [False]
- stride
analyze every stride-th datapoint in the dV/dlambda files
- start
Start frame of data analyzed in every fep window.
- stop
Stop frame of data analyzed in every fep window.
- SI
Set to
True
if you want to perform statistical inefficiency to preprocess the data.- estimator
Set to
alchemlyb
if you want to use alchemlyb estimators, ormdpow
if you want the default TI method.- method
Use TI, BAR or MBAR method in alchemlyb, or TI in mdpow.
Returns: (transfer free energy, log10 of the toluene/water partition coefficient = log Ptw)
5.4. Developer notes¶
A user really only needs to access classes derived from
mdpow.fep.Gsolv
; all other classes and functions are auxiliary and
only of interest to developers.
Additional objects that support mdpow.fep.Gsolv
.
-
class
mdpow.fep.
FEPschedule
[source]¶ Describe mdp parameter choices as key - value pairs.
The FEP schedule can be loaded from a configuration parser with the static method
FEPschedule.load()
.See the example runinput file for an example. It contains the sections:
[FEP_schedule_Coulomb] name = Coulomb description = dis-charging vdw+q --> vdw label = Coul couple_lambda0 = vdw-q couple_lambda1 = vdw # soft core alpha: linear scaling for coulomb sc_alpha = 0 lambdas = 0, 0.25, 0.5, 0.75, 1.0 [FEP_schedule_VDW] name = vdw description = decoupling vdw --> none label = VDW couple_lambda0 = vdw couple_lambda1 = none # recommended values for soft cores (Mobley, Shirts et al) sc_alpha = 0.5 sc_power = 1 sc_sigma = 0.3 lambdas = 0.0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1
-
static
load
(cfg, section)[source]¶ Initialize a
FEPschedule
from the section in the configuration cfg
-
mdp_dict
¶ Dict of key-values that can be set in a mdp file.
-
static
-
mdpow.fep.
N_AVOGADRO
¶ Avogadro’s constant NA in mol-1 (NA NIST value).
-
mdpow.fep.
kBOLTZ
¶ Boltzmann’s constant kB in kJ mol-1 (kB NIST value).
5.4.1. TODO¶
run minimization, NVT-equil, NPT-equil prior to production (probably use preprocessed topology from
grompp -pp
for portability)See Free Energy Tutorial.