mdpow.analysis
— Collection of analysis and plotting functions¶
Simple functions to quickly plot data. Typically, it works best if ran interactively from the top level of the POW directory!
Experimental values are loaded from the targets list (targets.csv
)
and computed values from the table in pow.txt
. See
plot_exp_vs_comp()
for details.
Prepare data¶
First copy the computed results , the pow.txt
and energies.txt
files that are produced by mdpow-pow, into the data
directories.
Then format them:
./lib/scripts/make_tables.sh data/*
The experimental data are taken from targets.numbers. In
numbers export the table as UTF-8 in CSV format to
experimental/targets.csv
. This is only necessary if the experimental data
were changed. We only plot entries for which
- a id number (first column no) is defined (should be unique)
- a logPow value exists
- a itp_name exists, which must correspond to the molecule name used in
mdpow.fep.Gsolv
Making graphs¶
For the following, import the module:
import mdpow.analysis
Octanol-water partition coefficients¶
Plot results and save to a pdf file with plot_exp_vs_comp()
:
mdpow.analysis.plot_exp_vs_comp(figname="figs/logPow.pdf")
By default we also include the SAMPL2 and reference (Ref) set results. In practice some manual adjustments are required, e.g.
mdpow.analysis.plot_exp_vs_comp()
# resize window so that (huge!) legend fits
ylim(-22,10)
savefig("figs/logPow.pdf")
Using a file named exclusions.txt
in the same directory as the
data file, one can exclude certain runs from appearing in the graph:
set the exclusions keyword to True
:
pylab.clf()
mdpow.analysis.plot_exp_vs_comp(exclusions=True, figname='figs/logPow_best.pdf')
In practice, manual fiddling is required such as resizing the graph:
mdpow.analysis.plot_exp_vs_comp(exclusions=True)
ylim(-14,10)
savefig('figs/logPow_best.pdf')
The exclusions.txt
files must contain a table such as
Table[exclusions]: These sims are ignored.
======== ===========================================
itp_name directory_regex
======== ===========================================
AXX .*
5FH benzylhyd$
======== ===========================================
then any simulation of a molecule equalling itp_name and which is recorded with a directory matching the regular expression directory_regex will be excluded from the analysis.
Solvation energies¶
Plots that compare experimental hydration and octanol-solvation free energies to computed values. DeltaG_hyd are only available for a few compounds so we only plot a subset of all the compounds that we have done.
Experimental octanol solvation free energies are computed from experimental logPow and DeltaGhyd from
logPow = -(DeltaGoct-DeltaGhyd)/kT * log10(e)
(see also gsolv2logpow()
) as
DeltaGoct = DeltaGhyd - kT*logPow / log10(e)
Warning
In principle the temperature T of the logPow measurement and the DeltaGhyd measurement must be the same. For the time being we just assume that both were done at T=300K. Also, the error on DeltaGoct is not calculated properly at the moment, either, because the error on logPow is hard to quantify (based on the logKow data). We are estimating the error on kT*logPow/log10(e) error as 0.5 kcal/mol and combine it with the known experimental error for DeltaGhyd.
Plotting uses the GsolvData.plot()
method from GsolvData
:
from pylab import *
clf()
G = mdpow.analysis.GsolvData()
G.plot('hyd')
# adjust things such as manually increasing window ...
ylim(-180,20)
savefig("figs/ghyd.pdf")
clf()
G.plot('oct')
ylim(-150,20)
savefig("figs/goct.pdf")
Right now, the plots are a bit messy but I opted to include the legend to make it easier for us to understand the data. I had to manually increase the plotting window to make things fit properly.
GsolvData
also honours the exlusions = True
keyword argument.
Functions¶
-
mdpow.analysis.
plot_exp_vs_comp
(**kwargs)¶ Plot computed logPow against experimental values from default files.
Experimental values are stored in the reST table referenced byt the experiments keyword. data contains a list of
pow.txt
tables for the calculated values.
-
class
mdpow.analysis.
GsolvData
(exp='experimental/targets.csv', **kwargs)¶ Solvation energies organized as a database.
Plot either “hyd” or “oct” with
GsolvData.plot()
.Load experimental and simulation data.
The defaults load all the data generated in the project so far. See
DEFAULTS_E
in the python code.Keywords: - exp
path to the experimental
targets.csv
file- data
list of simulation results (typically stored as reST
energies.txt
).- temperature
temperature at which experimental measurements of logPow were presumed to be taken; needed for the calculations of the experimental DeltaG_oct from logPow and experimental [300.0] DeltaG_hyd.
- exclusions
False
does nothing special.True
: look for exclusions.txt in same directory as each data file. If it contains a table such as:Table[exclusions]: These sims are ignored. ======== =========================================== itp_name directory_regex ======== =========================================== AXX .* 5FH benzylhyd$ ======== ===========================================
then any simulation of a molecule equalling itp_name and which is recorded with a directory matching the regular expression directory_regex will be excluded from the analysis. [
False
]
-
plot
(mode, **kwargs)¶ Plot individual data points with legend.
- mode
- “hyd” or “oct”
- compoundnames
False
puts the directory names in the legend,True
uses the chemical compound names; “auto” choosesTrue
if exclusions were applied [“auto”]- figname
- write figure to filename; suffix determines file type
- ymin, ymax
- limits of the plot in the y direction (=computational results)
-
class
mdpow.analysis.
ExpComp
(**kwargs)¶ Database combining experimental with computed values.
Keywords: - experiments
path to
targets.csv
- data
list of
pow.txt
paths; default are the files for Ref, run01, SAMPL2 (stored inDEFAULTS_POW
)- exclusions
False
Does nothing special.
True
Look for exclusions.txt in same directory as each data file. If it contains a table such as:
Table[exclusions]: These sims are ignored. ======== =========================================== itp_name directory_regex ======== =========================================== AXX .* 5FH benzylhyd$ ======== ===========================================
then any simulation of a molecule equalling itp_name and which is recorded with a directory matching the regular expression directory_regex will be excluded from the analysis. [
False
]
-
plot
(**kwargs)¶ Plot individual data points with legend.
By default, the following should work:
Run from the top
mdpow
directory.Prepare
data/run01/pow.txt
(must prepend header and append footer so that it is proper table). Seeload_data()
.Prepare
experimental/targets.csv
if it does not exist or if something changed. Seeload_exp()
for details.- compoundnames
False
puts the directory names in the legend,True
uses the chemical compound names; “auto” choosesTrue
if exclusions were applied [“auto”]- figname
write figure to filename; suffix determines file type
- ymin, ymax
limits of the plot in the y direction (=computational results)
-
class
mdpow.analysis.
ExpData
(filename='experimental/targets.csv', **kwargs)¶ Object that represents our experimental data.
Access the raw data via
ExpData.rawdata
and a table enriched with statistics asExpData.data
(which is arecsql.SQLarray
).Load experimental values table and return
recsql.SQLarray
.To obtain
targets.csv
exporttargets.numbers
in Numbers as UTF-8 (important!) in the **CSV* format (File->Export)
-
class
mdpow.analysis.
ComputedData
(filename='data/run01/pow.txt', **kwargs)¶ Object that represents computed data.
Access via
ComputedData.data
.Load computed POW table and return
recsql.SQLarray
.The data file is typically
pow.txt
. It must contain a proper reST table. Use the_header
and_footer
files if you only have the raw output from mdpow-pow.Furthermore, the column names are important because we use them here.
-
mdpow.analysis.
load_data
(filename='data/run01/pow.txt', **kwargs)¶ Load computed POW table and return
recsql.SQLarray
.The data file is typically
pow.txt
. It must contain a proper reST table. Use the_header
and_footer
files if you only have the raw output from mdpow-pow.Furthermore, the column names (defined in the header and footer files!) are important because we use them here.
-
mdpow.analysis.
load_exp
(filename='experimental/targets.csv', **kwargs)¶ Load experimental values table and return
recsql.SQLarray
.To obtain
targets.csv
exporttargets.numbers
in Numbers as UTF-8 (important!) in the **CSV* format (File->Export)
-
mdpow.analysis.
gsolv2logpow
(Gwat, Goct, unit='kcal/mol', temperature=300.0)¶ Calculate logPow from the solvation free energies.
logPow = -(Goct-Ghyd)/kT * log10(e)Note
Default unit is kcal/mol, unlike the rest of mdpow, which uses kJ/mol. The reason is that most solvation free energies in the literature are quoted in kcal/mol.
Arguments: - Gwat
hydration free energy
- Goct
octanol solvation free energy
- temperature
temperature in K [300]
- unit
unit of the energies, either “kcal/mol” or “kJ/mol”; [“kcal/mol”]