mdpow.analysis — Collection of analysis and plotting functions

Simple functions to quickly plot data. Typically, it works best if ran interactively from the top level of the POW directory!

Experimental values are loaded from the targets list (targets.csv) and computed values from the table in pow.txt. See plot_exp_vs_comp() for details.

Prepare data

First copy the computed results , the pow.txt and energies.txt files that are produced by mdpow-pow, into the data directories.

Then format them:

./lib/scripts/make_tables.sh data/*

The experimental data are taken from targets.numbers. In numbers export the table as UTF-8 in CSV format to experimental/targets.csv. This is only necessary if the experimental data were changed. We only plot entries for which

  • a id number (first column no) is defined (should be unique)
  • a logPow value exists
  • a itp_name exists, which must correspond to the molecule name used in mdpow.fep.Gsolv

Making graphs

For the following, import the module:

import mdpow.analysis

Octanol-water partition coefficients

Plot results and save to a pdf file with plot_exp_vs_comp():

mdpow.analysis.plot_exp_vs_comp(figname="figs/logPow.pdf")

By default we also include the SAMPL2 and reference (Ref) set results. In practice some manual adjustments are required, e.g.

mdpow.analysis.plot_exp_vs_comp()
# resize window so that (huge!) legend fits
ylim(-22,10)
savefig("figs/logPow.pdf")

Using a file named exclusions.txt in the same directory as the data file, one can exclude certain runs from appearing in the graph: set the exclusions keyword to True:

pylab.clf()
mdpow.analysis.plot_exp_vs_comp(exclusions=True, figname='figs/logPow_best.pdf')

In practice, manual fiddling is required such as resizing the graph:

mdpow.analysis.plot_exp_vs_comp(exclusions=True)
ylim(-14,10)
savefig('figs/logPow_best.pdf')

The exclusions.txt files must contain a table such as

Table[exclusions]: These sims are ignored.
======== ===========================================
itp_name directory_regex
======== ===========================================
AXX      .*
5FH      benzylhyd$
======== ===========================================

then any simulation of a molecule equalling itp_name and which is recorded with a directory matching the regular expression directory_regex will be excluded from the analysis.

Solvation energies

Plots that compare experimental hydration and octanol-solvation free energies to computed values. DeltaG_hyd are only available for a few compounds so we only plot a subset of all the compounds that we have done.

Experimental octanol solvation free energies are computed from experimental logPow and DeltaGhyd from

logPow = -(DeltaGoct-DeltaGhyd)/kT * log10(e)

(see also gsolv2logpow()) as

DeltaGoct = DeltaGhyd - kT*logPow / log10(e)

Warning

In principle the temperature T of the logPow measurement and the DeltaGhyd measurement must be the same. For the time being we just assume that both were done at T=300K. Also, the error on DeltaGoct is not calculated properly at the moment, either, because the error on logPow is hard to quantify (based on the logKow data). We are estimating the error on kT*logPow/log10(e) error as 0.5 kcal/mol and combine it with the known experimental error for DeltaGhyd.

Plotting uses the GsolvData.plot() method from GsolvData:

from pylab import *
clf()
G = mdpow.analysis.GsolvData()
G.plot('hyd')
# adjust things such as manually increasing window ...
ylim(-180,20)
savefig("figs/ghyd.pdf")

clf()
G.plot('oct')
ylim(-150,20)
savefig("figs/goct.pdf")

Right now, the plots are a bit messy but I opted to include the legend to make it easier for us to understand the data. I had to manually increase the plotting window to make things fit properly.

GsolvData also honours the exlusions = True keyword argument.

Functions

mdpow.analysis.plot_exp_vs_comp(**kwargs)

Plot computed logPow against experimental values from default files.

Experimental values are stored in the reST table referenced byt the experiments keyword. data contains a list of pow.txt tables for the calculated values.

class mdpow.analysis.GsolvData(exp='experimental/targets.csv', **kwargs)

Solvation energies organized as a database.

Plot either “hyd” or “oct” with GsolvData.plot().

Load experimental and simulation data.

The defaults load all the data generated in the project so far. See DEFAULTS_E in the python code.

Keywords:
exp

path to the experimental targets.csv file

data

list of simulation results (typically stored as reST energies.txt).

temperature

temperature at which experimental measurements of logPow were presumed to be taken; needed for the calculations of the experimental DeltaG_oct from logPow and experimental [300.0] DeltaG_hyd.

exclusions

False does nothing special. True: look for exclusions.txt in same directory as each data file. If it contains a table such as:

Table[exclusions]: These sims are ignored.
======== ===========================================
itp_name directory_regex
======== ===========================================
AXX      .*
5FH      benzylhyd$
======== ===========================================

then any simulation of a molecule equalling itp_name and which is recorded with a directory matching the regular expression directory_regex will be excluded from the analysis. [False]

plot(mode, **kwargs)

Plot individual data points with legend.

mode
“hyd” or “oct”
compoundnames
False puts the directory names in the legend, True uses the chemical compound names; “auto” chooses True if exclusions were applied [“auto”]
figname
write figure to filename; suffix determines file type
ymin, ymax
limits of the plot in the y direction (=computational results)
class mdpow.analysis.ExpComp(**kwargs)

Database combining experimental with computed values.

Keywords:
experiments

path to targets.csv

data

list of pow.txt paths; default are the files for Ref, run01, SAMPL2 (stored in DEFAULTS_POW)

exclusions
False

Does nothing special.

True

Look for exclusions.txt in same directory as each data file. If it contains a table such as:

Table[exclusions]: These sims are ignored.
======== ===========================================
itp_name directory_regex
======== ===========================================
AXX      .*
5FH      benzylhyd$
======== ===========================================

then any simulation of a molecule equalling itp_name and which is recorded with a directory matching the regular expression directory_regex will be excluded from the analysis. [False]

plot(**kwargs)

Plot individual data points with legend.

By default, the following should work:

  • Run from the top mdpow directory.

  • Prepare data/run01/pow.txt (must prepend header and append footer so that it is proper table). See load_data().

  • Prepare experimental/targets.csv if it does not exist or if something changed. See load_exp() for details.

    compoundnames

    False puts the directory names in the legend, True uses the chemical compound names; “auto” chooses True if exclusions were applied [“auto”]

    figname

    write figure to filename; suffix determines file type

    ymin, ymax

    limits of the plot in the y direction (=computational results)

class mdpow.analysis.ExpData(filename='experimental/targets.csv', **kwargs)

Object that represents our experimental data.

Access the raw data via ExpData.rawdata and a table enriched with statistics as ExpData.data (which is a recsql.SQLarray).

Load experimental values table and return recsql.SQLarray.

To obtain targets.csv export targets.numbers in Numbers as UTF-8 (important!) in the **CSV* format (File->Export)

class mdpow.analysis.ComputedData(filename='data/run01/pow.txt', **kwargs)

Object that represents computed data.

Access via ComputedData.data.

Load computed POW table and return recsql.SQLarray.

The data file is typically pow.txt. It must contain a proper reST table. Use the _header and _footer files if you only have the raw output from mdpow-pow.

Furthermore, the column names are important because we use them here.

mdpow.analysis.load_data(filename='data/run01/pow.txt', **kwargs)

Load computed POW table and return recsql.SQLarray.

The data file is typically pow.txt. It must contain a proper reST table. Use the _header and _footer files if you only have the raw output from mdpow-pow.

Furthermore, the column names (defined in the header and footer files!) are important because we use them here.

mdpow.analysis.load_exp(filename='experimental/targets.csv', **kwargs)

Load experimental values table and return recsql.SQLarray.

To obtain targets.csv export targets.numbers in Numbers as UTF-8 (important!) in the **CSV* format (File->Export)

mdpow.analysis.gsolv2logpow(Gwat, Goct, unit='kcal/mol', temperature=300.0)

Calculate logPow from the solvation free energies.

logPow = -(Goct-Ghyd)/kT * log10(e)

Note

Default unit is kcal/mol, unlike the rest of mdpow, which uses kJ/mol. The reason is that most solvation free energies in the literature are quoted in kcal/mol.

Arguments:
Gwat

hydration free energy

Goct

octanol solvation free energy

temperature

temperature in K [300]

unit

unit of the energies, either “kcal/mol” or “kJ/mol”; [“kcal/mol”]