7.3. Automated Dihedral Analysis

New in version 0.9.0.

7.3.1. mdpow.workflows.dihedrals — Automation for DihedralAnalysis

dihedrals module provides functions for automated workflows that encompass DihedralAnalysis. See each function for requirements and examples.

Most functions can be used as standalone, individually, or in combination depending on the desired results. Details of the completely automated workflow are discussed under automated_dihedral_analysis().

Atom indices obtained by get_atom_indices() are 0-based, atom index labels on the molecule in the plots are 0-based, but atom names in plots and file names are 1-based.

mdpow.workflows.dihedrals.SOLVENTS_DEFAULT = ('water', 'octanol')

Default solvents are water and octanol:

  • must match solvents used in project directory
  • one or two solvents can be specified
  • current solvents supported,
mdpow.workflows.dihedrals.INTERACTIONS_DEFAULT = ('Coulomb', 'VDW')

Default interactions set to Coulomb and VDW:

  • default values should not be changed
  • order should not be changed
mdpow.workflows.dihedrals.SMARTS_DEFAULT = [!#1]~[!$(*#*)&!D1]-!@[!$(*#*)&!D1]~[!#1]

Default SMARTS string to identify relevant dihedral atom groups:

  • [!#1] : any atom, not Hydrogen
  • ~ : any bond
  • [!$(*#*)&!D1] : any atom that is not part of linear triple bond and not atom with 1 explicit bond
  • -!@ : single bond that is not ring bond
  • [!$(*#*)&!D1]-!@[!$(*#*)&!D1] : the central portion selects two atoms that are not involved in a triple bond and are not terminal, that are connected by a single, non-ring bond
  • [!#1]~` or `~[!#1] : the first and last portion specify any bond, to any atom that is not hydrogen
mdpow.workflows.dihedrals.PLOT_WIDTH_DEFAULT = 190

Plot width (plot_pdf_width) should be provided in millimeters (mm), and is converted to pixels (px) for use with cairosvg.

conversion factor: 1 mm = 3.7795275591 px default value: 190 mm = 718.110236229 pixels

mdpow.workflows.dihedrals.automated_dihedral_analysis(dirname, resname, figdir=None, df_save_dir=None, molname=None, SMARTS='[!#1]~[!$(*#*)&!D1]-!@[!$(*#*)&!D1]~[!#1]', plot_pdf_width=190, dataframe=None, padding=45, width=0.9, solvents=('water', 'octanol'), interactions=('Coulomb', 'VDW'), start=None, stop=None, step=None)[source]

Runs DihedralAnalysis for a single MDPOW project and creates violin plots of dihedral angle frequencies for each relevant dihedral atom group.

For one MDPOW project, automatically determines all relevant dihedral atom groups in the molecule, runs DihedralAnalysis for each group, pads the dihedral angles to maintain periodicity, creates violin plots of dihedral angle frequencies (KDEs), and saves publication quality PDF figures for each group, separately.

Optionally saves all pre-padded DihedralAnalysis results as a single pandas.DataFrame in df_save_dir provided.

Keywords:
dirname
Molecule Simulation directory. Loads simulation files present in lambda directories into the new instance. With this method for generating an Ensemble the lambda directories are explored and _load_universe_from_dir() searches for .gro, .gro.bz2, .gro.gz, and .tpr files for topology, and .xtc files for trajectory. It will default to using the tpr file available.
figdir
path to the location to save figures (REQUIRED but marked as a kwarg for technical reasons; will be changed in #244)
resname
resname for the molecule as defined in the topology and trajectory
df_save_dir
optional, path to the location to save results pandas.DataFrame
molname
molecule name to be used for labelling plots, if different from resname
SMARTS
The default SMARTS string is described in detail under SMARTS_DEFAULT.
plot_pdf_width
The default value for width of plot output is described in detail under PLOT_WIDTH_DEFAULT.
dataframe
optional, if DihedralAnalysis was done prior, then results pandas.DataFrame can be input to utilize angle padding and violin plotting functionality
padding

value in degrees default: 45

width

width of the violin element (>1 overlaps) default: 0.9

solvents
The default solvents are documented under SOLVENTS_DEFAULT. Normally takes a two-tuple, but analysis is compatible with single solvent selections. Single solvent analyses will result in a figure with fully filled violins for the single solvent.
interactions
The default interactions are documented under INTERACTIONS_DEFAULT.
start, stop, step

arguments passed to run(), as parameters for iterating through the trajectories of the current ensemble

See also

EnsembleAnalysis

Example

Typical Workflow:

import dihedrals

dihedrals.automated_dihedral_analysis(dirname='/foo/bar/MDPOW_project_data',
                                figdir='/foo/bar/MDPOW_figure_directory',
                                resname='UNK', molname='benzene',
                                padding=45, width=0.9,
                                solvents=('water','octanol'),
                                interactions=('Coulomb','VDW'),
                                start=0, stop=100, step=10)
mdpow.workflows.dihedrals.build_universe(dirname, solvents=('water', 'octanol'))[source]

Builds Universe from the ./Coulomb/0000 topology and trajectory of the project for the first solvent specified.

Output used by rdkit_conversion() and get_atom_indices() to obtain atom indices for each dihedral atom group.

Keywords:
dirname
Molecule Simulation directory. Loads simulation files present in lambda directories into the new instance. With this method for generating an Ensemble the lambda directories are explored and _load_universe_from_dir() searches for .gro, .gro.bz2, .gro.gz, and .tpr files for topology, and .xtc files for trajectory. It will default to using the tpr file available.
solvents
The default solvents are documented under SOLVENTS_DEFAULT. Normally takes a two-tuple, but analysis is compatible with single solvent selections. Single solvent analyses will result in a figure with fully filled violins for the single solvent.
Returns:
u
Universe object
mdpow.workflows.dihedrals.rdkit_conversion(u, resname)[source]

Converts the solute, resname, of the Universe to rdkit.Chem.rdchem.Mol object for use with a SMARTS selection string to identify dihedral atom groups.

Accepts Universe object made with build_universe() and a resname as input. Uses resname to select the solute for conversion by RDKitConverter to rdkit.Chem.rdchem.Mol, and will add element attributes for Hydrogen if not listed in the topology, using MDAnalysis.topology.guessers.guess_atom_element().

Keywords:
u
Universe object
resname
resname for the molecule as defined in the topology and trajectory
Returns:
tuple(mol, solute)
function call returns tuple, see below
mol
rdkit.Chem.rdchem.Mol object converted from solute
solute
the MDAnalysis AtomGroup for the solute
mdpow.workflows.dihedrals.get_atom_indices(mol, SMARTS='[!#1]~[!$(*#*)&!D1]-!@[!$(*#*)&!D1]~[!#1]')[source]

Uses a SMARTS selection string to identify atom indices for relevant dihedral atom groups.

Requires a rdkit.Chem.rdchem.Mol object as input for the SMARTS_DEFAULT kwarg to match patterns to and identify relevant dihedral atom groups.

Keywords:
mol
rdkit.Chem.rdchem.Mol object converted from solute
SMARTS
The default SMARTS string is described in detail under SMARTS_DEFAULT.
Returns:
atom_indices
tuple of tuples of indices for each dihedral atom group
mdpow.workflows.dihedrals.get_bond_indices(mol, atom_indices)[source]

From the rdkit.Chem.rdchem.Mol object, uses atom_indices to determine the indices of the bonds between those atoms for each dihedral atom group.

Keywords:
mol
rdkit.Chem.rdchem.Mol object converted from solute
atom_indices
tuple of tuples of indices for each dihedral atom group
Returns:
bond_indices
tuple of tuples of indices for the bonds in each dihedral atom group
mdpow.workflows.dihedrals.get_dihedral_groups(solute, atom_indices)[source]

Uses the 0-based atom_indices of the relevant dihedral atom groups determined by get_atom_indices() and returns the 1-based index names for each atom in each group.

Requires the atom_indices from get_atom_indices() to index the solute specified by select_atoms() and return an array of the names of each atom within its respective dihedral atom group as identified by the SMARTS selection string.

Keywords:
solute
the MDAnalysis AtomGroup for the solute
atom_indices
tuple of tuples of indices for each dihedral atom group
Returns:
dihedral_groups
list of numpy.array() for atom names in each dihedral atom group
mdpow.workflows.dihedrals.get_paired_indices(atom_indices, bond_indices, dihedral_groups)[source]

Combines atom_indices and bond_indices in tuples to be paired with their respective dihedral atom groups.

A dictionary is created with key-value pairs as follows: atom_indices and bond_indices are joined in a tuple as the value, with the key being the respective member of dihedral_groups to facilitate highlighting the relevant dihedral atom group when generating violin plots. As an example, ‘C1-N2-O3-S4’: ((0, 1, 2, 3), (0, 1, 2)), would be one key-value pair in the dictionary.

Keywords:
atom_indices
tuple of tuples of indices for each dihedral atom group
bond_indices
tuple of tuples of indices for the bonds in each dihedral atom group
dihedral_groups
list of numpy.array() for atom names in each dihedral atom group
Returns:
name_index_pairs
dictionary with key-value pair for dihedral atom group, atom indices, and bond indices
mdpow.workflows.dihedrals.dihedral_groups_ensemble(dirname, atom_indices, solvents=('water', 'octanol'), interactions=('Coulomb', 'VDW'), start=None, stop=None, step=None)[source]

Creates one Ensemble for the MDPOW project and runs DihedralAnalysis for each dihedral atom group identified by the SMARTS selection string.

Keywords:
dirname
Molecule Simulation directory. Loads simulation files present in lambda directories into the new instance. With this method for generating an Ensemble the lambda directories are explored and _load_universe_from_dir() searches for .gro, .gro.bz2, .gro.gz, and .tpr files for topology, and .xtc files for trajectory. It will default to using the tpr file available.
atom_indices

tuples of atom indices for dihedral atom groups

solvents
The default solvents are documented under SOLVENTS_DEFAULT. Normally takes a two-tuple, but analysis is compatible with single solvent selections. Single solvent analyses will result in a figure with fully filled violins for the single solvent.
interactions
The default interactions are documented under INTERACTIONS_DEFAULT.
start, stop, step

arguments passed to run(), as parameters for iterating through the trajectories of the current ensemble

See also

EnsembleAnalysis

Returns:
df
pandas.DataFrame of DihedralAnalysis results, including all dihedral atom groups for molecule of current project
mdpow.workflows.dihedrals.save_df(df, df_save_dir, resname, molname=None)[source]

Takes a pandas.DataFrame of results from DihedralAnalysis as input before padding the angles to optionaly save the raw data.

Optionally saves results before padding the angles for periodicity and plotting dihedral angle frequencies as KDE violins with dihedral_violins(). Given a parent directory, creates subdirectory for molecule, saves fully sampled, unpadded results pandas.DataFrame as a compressed csv file, default: .csv.bz2.

Keywords:
df
pandas.DataFrame of DihedralAnalysis results, including all dihedral atom groups for molecule of current project
df_save_dir
optional, path to the location to save results pandas.DataFrame
resname
resname for the molecule as defined in the topology and trajectory
molname
molecule name to be used for labelling plots, if different from resname
mdpow.workflows.dihedrals.periodic_angle_padding(df, padding=45)[source]

Pads the angles from the results DataFrame to maintain periodicity in the violin plots.

Takes a pandas.DataFrame of results from DihedralAnalysis or dihedral_groups_ensemble() as input and pads the angles to maintain periodicity for properly plotting dihedral angle frequencies as KDE violins with dihedral_violins() and plot_dihedral_violins(). Creates two new pandas.DataFrame based on the padding value specified, pads the angle values, concatenates all three pandas.DataFrame, maintaining original data and adding padded values, and returns new augmented pandas.DataFrame.

Keywords:
df
pandas.DataFrame of DihedralAnalysis results, including all dihedral atom groups for molecule of current project
padding
value in degrees to specify angle augmentation threshold default: 45
Returns:
df_aug
augmented results pandas.DataFrame containing padded dihedral angles as specified by padding
mdpow.workflows.dihedrals.dihedral_violins(df, width=0.9, solvents=('water', 'octanol'), plot_title=None)[source]

Plots kernel density estimates (KDE) of dihedral angle frequencies for one dihedral atom group as violin plots, using as input the augmented pandas.DataFrame from periodic_angle_padding().

Output is converted to SVG by build_svg() and final output is saved as PDF by plot_dihedral_violins()

Keywords:
df
augmented results pandas.DataFrame from periodic_angle_padding()
width
width of the violin element (>1 overlaps) default: 0.9
solvents
The default solvents are documented under SOLVENTS_DEFAULT. Normally takes a two-tuple, but analysis is compatible with single solvent selections. Single solvent analyses will result in a figure with fully filled violins for the single solvent.
plot_title
generated by build_svg() using molname, dihedral_groups, atom_indices, and interactions in this order and format: f’{molname}, {name[0]} {a} | ‘’{col_name}’
Returns:
g
returns a seaborn.FacetGrid object containing a violin plot of the kernel density estimates (KDE) of the dihedral angle frequencies for each dihedral atom group identified by SMARTS_DEFAULT
mdpow.workflows.dihedrals.build_svg(mol, molname, name_index_pairs, atom_group_selection, solvents=('water', 'octanol'), width=0.9)[source]

Converts and combines figure components into an SVG object to be converted and saved as a publication quality PDF.

Keywords:
mol
rdkit.Chem.rdchem.Mol object converted from solute
molname

molecule name to be used for labelling plots, if different from resname (in this case, carried over from an upstream

decision between the two)
name_index_pairs

dictionary with key-value pair for dihedral atom group, atom indices, and bond indices

atom_group_selection

name of each section in the groupby series of atom group selections

solvents
The default solvents are documented under SOLVENTS_DEFAULT. Normally takes a two-tuple, but analysis is compatible with single solvent selections. Single solvent analyses will result in a figure with fully filled violins for the single solvent.
width
width of the violin element (>1 overlaps) default: 0.9
Returns:
fig
svgutils SVG figure object
mdpow.workflows.dihedrals.plot_dihedral_violins(df, resname, mol, name_index_pairs, figdir=None, molname=None, width=0.9, plot_pdf_width=190, solvents=('water', 'octanol'))[source]

Coordinates plotting and saving figures for all dihedral atom groups.

Makes a subdirectory for the current project within the specified figdir using resname or molname as title and saves production quality PDFs for each dihedral atom group separately.

Keywords:
df
augmented results pandas.DataFrame from periodic_angle_padding()
resname
resname for the molecule as defined in the topology and trajectory
mol
rdkit.Chem.rdchem.Mol object converted from solute
name_index_pairs

dictionary with key-value pair for dihedral atom group, atom indices, and bond indices

figdir
path to the location to save figures (REQUIRED but marked as a kwarg for technical reasons; will be changed in #244)
molname
molecule name to be used for labelling plots, if different from resname
width

width of the violin element (>1 overlaps) default: 0.9

plot_pdf_width
The default value for width of plot output is described in detail under PLOT_WIDTH_DEFAULT.
solvents
The default solvents are documented under SOLVENTS_DEFAULT. Normally takes a two-tuple, but analysis is compatible with single solvent selections. Single solvent analyses will result in a figure with fully filled violins for the single solvent.