5.3. Automated Dihedral Analysis

Added in version 0.9.0.

5.3.1. mdpow.workflows.dihedrals — Automation for DihedralAnalysis

dihedrals module provides functions for automated workflows that encompass DihedralAnalysis. See each function for requirements and examples.

Most functions can be used as standalone, individually, or in combination depending on the desired results. Details of the completely automated workflow are discussed under automated_dihedral_analysis().

Atom indices obtained by get_atom_indices() are 0-based, atom index labels on the molecule in the plots are 0-based, but atom names in plots and file names are 1-based.

mdpow.workflows.dihedrals.SOLVENTS_DEFAULT = ('water', 'octanol')

Default solvents are water and octanol:

  • must match solvents used in project directory

  • one or two solvents can be specified

  • current solvents supported,

See also

mdpow.forcefields

mdpow.workflows.dihedrals.INTERACTIONS_DEFAULT = ('Coulomb', 'VDW')

Default interactions set to Coulomb and VDW:

  • default values should not be changed

  • order should not be changed

mdpow.workflows.dihedrals.SMARTS_DEFAULT = [!#1]~[!$(*#*)&!D1]-!@[!$(*#*)&!D1]~[!#1]

Default SMARTS string to identify relevant dihedral atom groups:

  • [!#1] : any atom, not Hydrogen

  • ~ : any bond

  • [!$(*#*)&!D1] : any atom that is not part of linear triple bond and not atom with 1 explicit bond

  • -!@ : single bond that is not ring bond

  • [!$(*#*)&!D1]-!@[!$(*#*)&!D1] : the central portion selects two atoms that are not involved in a triple bond and are not terminal, that are connected by a single, non-ring bond

  • [!#1]~` or `~[!#1] : the first and last portion specify any bond, to any atom that is not hydrogen

mdpow.workflows.dihedrals.PLOT_WIDTH_DEFAULT = 190

Plot width (plot_pdf_width) should be provided in millimeters (mm), and is converted to pixels (px) for use with cairosvg.

conversion factor: 1 mm = 3.7795275591 px default value: 190 mm = 718.110236229 pixels

mdpow.workflows.dihedrals.automated_dihedral_analysis(dirname, resname, figdir=None, df_save_dir=None, molname=None, SMARTS='[!#1]~[!$(*#*)&!D1]-!@[!$(*#*)&!D1]~[!#1]', plot_pdf_width=190, dataframe=None, padding=45, width=0.9, solvents=('water', 'octanol'), interactions=('Coulomb', 'VDW'), start=None, stop=None, step=None)[source]

Runs DihedralAnalysis for a single MDPOW project and creates violin plots of dihedral angle frequencies for each relevant dihedral atom group.

For one MDPOW project, automatically determines all relevant dihedral atom groups in the molecule, runs DihedralAnalysis for each group, pads the dihedral angles to maintain periodicity, creates violin plots of dihedral angle frequencies (KDEs), and saves publication quality PDF figures for each group, separately.

Optionally saves all pre-padded DihedralAnalysis results as a single pandas.DataFrame in df_save_dir provided.

Keywords:

dirname

Molecule Simulation directory. Loads simulation files present in lambda directories into the new instance. With this method for generating an Ensemble the lambda directories are explored and _load_universe_from_dir() searches for .gro, .gro.bz2, .gro.gz, and .tpr files for topology, and .xtc files for trajectory. It will default to using the tpr file available.

figdir

path to the location to save figures (REQUIRED but marked as a kwarg for technical reasons; will be changed in #244)

resname

resname for the molecule as defined in the topology and trajectory

df_save_dir

optional, path to the location to save results pandas.DataFrame

molname

molecule name to be used for labelling plots, if different from resname

SMARTS

The default SMARTS string is described in detail under SMARTS_DEFAULT.

plot_pdf_width

The default value for width of plot output is described in detail under PLOT_WIDTH_DEFAULT.

dataframe

optional, if DihedralAnalysis was done prior, then results pandas.DataFrame can be input to utilize angle padding and violin plotting functionality

padding

value in degrees default: 45

width

width of the violin element (>1 overlaps) default: 0.9

solvents

The default solvents are documented under SOLVENTS_DEFAULT. Normally takes a two-tuple, but analysis is compatible with single solvent selections. Single solvent analyses will result in a figure with fully filled violins for the single solvent.

interactions

The default interactions are documented under INTERACTIONS_DEFAULT.

start, stop, step

arguments passed to run(), as parameters for iterating through the trajectories of the current ensemble

See also

EnsembleAnalysis

Example

Typical Workflow:

import dihedrals

dihedrals.automated_dihedral_analysis(dirname='/foo/bar/MDPOW_project_data',
                                figdir='/foo/bar/MDPOW_figure_directory',
                                resname='UNK', molname='benzene',
                                padding=45, width=0.9,
                                solvents=('water','octanol'),
                                interactions=('Coulomb','VDW'),
                                start=0, stop=100, step=10)
mdpow.workflows.dihedrals.build_universe(dirname, solvents=('water', 'octanol'))[source]

Builds Universe from the ./Coulomb/0000 topology and trajectory of the project for the first solvent specified.

Output used by rdkit_conversion() and get_atom_indices() to obtain atom indices for each dihedral atom group.

Keywords:

dirname

Molecule Simulation directory. Loads simulation files present in lambda directories into the new instance. With this method for generating an Ensemble the lambda directories are explored and _load_universe_from_dir() searches for .gro, .gro.bz2, .gro.gz, and .tpr files for topology, and .xtc files for trajectory. It will default to using the tpr file available.

solvents

The default solvents are documented under SOLVENTS_DEFAULT. Normally takes a two-tuple, but analysis is compatible with single solvent selections. Single solvent analyses will result in a figure with fully filled violins for the single solvent.

Returns:

u

Universe object

mdpow.workflows.dihedrals.rdkit_conversion(u, resname)[source]

Converts the solute, resname, of the Universe to rdkit.Chem.rdchem.Mol object for use with a SMARTS selection string to identify dihedral atom groups.

Accepts Universe object made with build_universe() and a resname as input. Uses resname to select the solute for conversion by RDKitConverter to rdkit.Chem.rdchem.Mol, and will add element attributes for Hydrogen if not listed in the topology, using MDAnalysis.topology.guessers.guess_atom_element().

Keywords:

u

Universe object

resname

resname for the molecule as defined in the topology and trajectory

Returns:

tuple(mol, solute)

function call returns tuple, see below

mol

rdkit.Chem.rdchem.Mol object converted from solute

solute

the MDAnalysis AtomGroup for the solute

mdpow.workflows.dihedrals.get_atom_indices(mol, SMARTS='[!#1]~[!$(*#*)&!D1]-!@[!$(*#*)&!D1]~[!#1]')[source]

Uses a SMARTS selection string to identify atom indices for relevant dihedral atom groups.

Requires a rdkit.Chem.rdchem.Mol object as input for the SMARTS_DEFAULT kwarg to match patterns to and identify relevant dihedral atom groups.

Keywords:

mol

rdkit.Chem.rdchem.Mol object converted from solute

SMARTS

The default SMARTS string is described in detail under SMARTS_DEFAULT.

Returns:

atom_indices

tuple of tuples of indices for each dihedral atom group

mdpow.workflows.dihedrals.get_bond_indices(mol, atom_indices)[source]

From the rdkit.Chem.rdchem.Mol object, uses atom_indices to determine the indices of the bonds between those atoms for each dihedral atom group.

Keywords:

mol

rdkit.Chem.rdchem.Mol object converted from solute

atom_indices

tuple of tuples of indices for each dihedral atom group

Returns:

bond_indices

tuple of tuples of indices for the bonds in each dihedral atom group

mdpow.workflows.dihedrals.get_dihedral_groups(solute, atom_indices)[source]

Uses the 0-based atom_indices of the relevant dihedral atom groups determined by get_atom_indices() and returns the 1-based index names for each atom in each group.

Requires the atom_indices from get_atom_indices() to index the solute specified by select_atoms() and return an array of the names of each atom within its respective dihedral atom group as identified by the SMARTS selection string.

Keywords:

solute

the MDAnalysis AtomGroup for the solute

atom_indices

tuple of tuples of indices for each dihedral atom group

Returns:

dihedral_groups

list of numpy.array() for atom names in each dihedral atom group

mdpow.workflows.dihedrals.get_paired_indices(atom_indices, bond_indices, dihedral_groups)[source]

Combines atom_indices and bond_indices in tuples to be paired with their respective dihedral atom groups.

A dictionary is created with key-value pairs as follows: atom_indices and bond_indices are joined in a tuple as the value, with the key being the respective member of dihedral_groups to facilitate highlighting the relevant dihedral atom group when generating violin plots. As an example, ‘C1-N2-O3-S4’: ((0, 1, 2, 3), (0, 1, 2)), would be one key-value pair in the dictionary.

Keywords:

atom_indices

tuple of tuples of indices for each dihedral atom group

bond_indices

tuple of tuples of indices for the bonds in each dihedral atom group

dihedral_groups

list of numpy.array() for atom names in each dihedral atom group

Returns:

name_index_pairs

dictionary with key-value pair for dihedral atom group, atom indices, and bond indices

mdpow.workflows.dihedrals.dihedral_groups_ensemble(dirname, atom_indices, solvents=('water', 'octanol'), interactions=('Coulomb', 'VDW'), start=None, stop=None, step=None)[source]

Creates one Ensemble for the MDPOW project and runs DihedralAnalysis for each dihedral atom group identified by the SMARTS selection string.

Keywords:

dirname

Molecule Simulation directory. Loads simulation files present in lambda directories into the new instance. With this method for generating an Ensemble the lambda directories are explored and _load_universe_from_dir() searches for .gro, .gro.bz2, .gro.gz, and .tpr files for topology, and .xtc files for trajectory. It will default to using the tpr file available.

atom_indices

tuples of atom indices for dihedral atom groups

solvents

The default solvents are documented under SOLVENTS_DEFAULT. Normally takes a two-tuple, but analysis is compatible with single solvent selections. Single solvent analyses will result in a figure with fully filled violins for the single solvent.

interactions

The default interactions are documented under INTERACTIONS_DEFAULT.

start, stop, step

arguments passed to run(), as parameters for iterating through the trajectories of the current ensemble

See also

EnsembleAnalysis

Returns:

df

pandas.DataFrame of DihedralAnalysis results, including all dihedral atom groups for molecule of current project

mdpow.workflows.dihedrals.save_df(df, df_save_dir, resname, molname=None)[source]

Takes a pandas.DataFrame of results from DihedralAnalysis as input before padding the angles to optionaly save the raw data.

Optionally saves results before padding the angles for periodicity and plotting dihedral angle frequencies as KDE violins with dihedral_violins(). Given a parent directory, creates subdirectory for molecule, saves fully sampled, unpadded results pandas.DataFrame as a compressed csv file, default: .csv.bz2.

Keywords:

df

pandas.DataFrame of DihedralAnalysis results, including all dihedral atom groups for molecule of current project

df_save_dir

optional, path to the location to save results pandas.DataFrame

resname

resname for the molecule as defined in the topology and trajectory

molname

molecule name to be used for labelling plots, if different from resname

mdpow.workflows.dihedrals.periodic_angle_padding(df, padding=45)[source]

Pads the angles from the results DataFrame to maintain periodicity in the violin plots.

Takes a pandas.DataFrame of results from DihedralAnalysis or dihedral_groups_ensemble() as input and pads the angles to maintain periodicity for properly plotting dihedral angle frequencies as KDE violins with dihedral_violins() and plot_dihedral_violins(). Creates two new pandas.DataFrame based on the padding value specified, pads the angle values, concatenates all three pandas.DataFrame, maintaining original data and adding padded values, and returns new augmented pandas.DataFrame.

Keywords:

df

pandas.DataFrame of DihedralAnalysis results, including all dihedral atom groups for molecule of current project

padding

value in degrees to specify angle augmentation threshold default: 45

Returns:

df_aug

augmented results pandas.DataFrame containing padded dihedral angles as specified by padding

mdpow.workflows.dihedrals.dihedral_violins(df, width=0.9, solvents=('water', 'octanol'), plot_title=None)[source]

Plots kernel density estimates (KDE) of dihedral angle frequencies for one dihedral atom group as violin plots, using as input the augmented pandas.DataFrame from periodic_angle_padding().

Output is converted to SVG by build_svg() and final output is saved as PDF by plot_dihedral_violins()

Keywords:

df

augmented results pandas.DataFrame from periodic_angle_padding()

width

width of the violin element (>1 overlaps); default: 0.9

solvents

The default solvents are documented under SOLVENTS_DEFAULT. Normally takes a two-tuple, but analysis is compatible with single solvent selections. Single solvent analyses will result in a figure with fully filled violins for the single solvent.

plot_title

generated by build_svg() using molname, dihedral_groups, atom_indices, and interactions in this order and format: f’{molname}, {name[0]} {a} | ‘’{col_name}’

Returns:

g

returns a seaborn.FacetGrid object containing a violin plot of the kernel density estimates (KDE) of the dihedral angle frequencies for each dihedral atom group identified by SMARTS_DEFAULT

mdpow.workflows.dihedrals.build_svg(mol, molname, name_index_pairs, atom_group_selection, solvents=('water', 'octanol'), width=0.9)[source]

Converts and combines figure components into an SVG object to be converted and saved as a publication quality PDF.

Keywords:

mol

rdkit.Chem.rdchem.Mol object converted from solute

molname

molecule name to be used for labelling plots, if different from resname (in this case, carried over from an upstream decision between the two)

name_index_pairs

dictionary with key-value pair for dihedral atom group, atom indices, and bond indices

atom_group_selection

name of each section in the groupby series of atom group selections

solvents

The default solvents are documented under SOLVENTS_DEFAULT. Normally takes a two-tuple, but analysis is compatible with single solvent selections. Single solvent analyses will result in a figure with fully filled violins for the single solvent.

width

width of the violin element (>1 overlaps); default: 0.9

Returns:

fig

svgutils SVG figure object

mdpow.workflows.dihedrals.plot_dihedral_violins(df, resname, mol, name_index_pairs, figdir=None, molname=None, width=0.9, plot_pdf_width=190, solvents=('water', 'octanol'))[source]

Coordinates plotting and saving figures for all dihedral atom groups.

Makes a subdirectory for the current project within the specified figdir using resname or molname as title and saves production quality PDFs for each dihedral atom group separately.

Keywords:

df

augmented results pandas.DataFrame from periodic_angle_padding()

resname

resname for the molecule as defined in the topology and trajectory

mol

rdkit.Chem.rdchem.Mol object converted from solute

name_index_pairs

dictionary with key-value pair for dihedral atom group, atom indices, and bond indices

figdir

path to the location to save figures (REQUIRED but marked as a kwarg for technical reasons; will be changed in #244)

molname

molecule name to be used for labelling plots, if different from resname

width

width of the violin element (>1 overlaps) default: 0.9

plot_pdf_width

The default value for width of plot output is described in detail under PLOT_WIDTH_DEFAULT.

solvents

The default solvents are documented under SOLVENTS_DEFAULT. Normally takes a two-tuple, but analysis is compatible with single solvent selections. Single solvent analyses will result in a figure with fully filled violins for the single solvent.