7.3. Automated Dihedral Analysis¶
New in version 0.9.0.
7.3.1. mdpow.workflows.dihedrals
— Automation for DihedralAnalysis
¶
dihedrals
module provides functions for automated
workflows that encompass DihedralAnalysis
.
See each function for requirements and examples.
Most functions can be used as standalone, individually, or in combination
depending on the desired results. Details of the completely automated workflow
are discussed under automated_dihedral_analysis()
.
Atom indices obtained by get_atom_indices()
are 0-based,
atom index labels on the molecule in the plots are 0-based,
but atom names in plots and file names are 1-based.
-
mdpow.workflows.dihedrals.
SOLVENTS_DEFAULT
= ('water', 'octanol')¶ Default solvents are water and octanol:
- must match solvents used in project directory
- one or two solvents can be specified
- current solvents supported,
See also
-
mdpow.workflows.dihedrals.
INTERACTIONS_DEFAULT
= ('Coulomb', 'VDW')¶ Default interactions set to Coulomb and VDW:
- default values should not be changed
- order should not be changed
-
mdpow.workflows.dihedrals.
SMARTS_DEFAULT
= [!#1]~[!$(*#*)&!D1]-!@[!$(*#*)&!D1]~[!#1]¶ Default SMARTS string to identify relevant dihedral atom groups:
[!#1]
: any atom, not Hydrogen~
: any bond[!$(*#*)&!D1]
: any atom that is not part of linear triple bond and not atom with 1 explicit bond-!@
: single bond that is not ring bond[!$(*#*)&!D1]-!@[!$(*#*)&!D1]
: the central portion selects two atoms that are not involved in a triple bond and are not terminal, that are connected by a single, non-ring bond[!#1]~` or `~[!#1]
: the first and last portion specify any bond, to any atom that is not hydrogen
-
mdpow.workflows.dihedrals.
PLOT_WIDTH_DEFAULT
= 190¶ Plot width (plot_pdf_width) should be provided in millimeters (mm), and is converted to pixels (px) for use with
cairosvg
.conversion factor: 1 mm = 3.7795275591 px default value: 190 mm = 718.110236229 pixels
-
mdpow.workflows.dihedrals.
automated_dihedral_analysis
(dirname, resname, figdir=None, df_save_dir=None, molname=None, SMARTS='[!#1]~[!$(*#*)&!D1]-!@[!$(*#*)&!D1]~[!#1]', plot_pdf_width=190, dataframe=None, padding=45, width=0.9, solvents=('water', 'octanol'), interactions=('Coulomb', 'VDW'), start=None, stop=None, step=None)[source]¶ Runs
DihedralAnalysis
for a single MDPOW project and creates violin plots of dihedral angle frequencies for each relevant dihedral atom group.For one MDPOW project, automatically determines all relevant dihedral atom groups in the molecule, runs
DihedralAnalysis
for each group, pads the dihedral angles to maintain periodicity, creates violin plots of dihedral angle frequencies (KDEs), and saves publication quality PDF figures for each group, separately.Optionally saves all pre-padded
DihedralAnalysis
results as a singlepandas.DataFrame
in df_save_dir provided.Keywords: - dirname
- Molecule Simulation directory. Loads simulation files present in
lambda directories into the new instance. With this method for
generating an
Ensemble
the lambda directories are explored and_load_universe_from_dir()
searches for .gro, .gro.bz2, .gro.gz, and .tpr files for topology, and .xtc files for trajectory. It will default to using the tpr file available. - figdir
- path to the location to save figures (REQUIRED but marked as a kwarg for technical reasons; will be changed in #244)
- resname
- resname for the molecule as defined in the topology and trajectory
- df_save_dir
- optional, path to the location to save results
pandas.DataFrame
- molname
- molecule name to be used for labelling plots, if different from resname
- SMARTS
- The default SMARTS string is described in detail under
SMARTS_DEFAULT
. - plot_pdf_width
- The default value for width of plot output is described in detail under
PLOT_WIDTH_DEFAULT
. - dataframe
- optional, if
DihedralAnalysis
was done prior, then resultspandas.DataFrame
can be input to utilize angle padding and violin plotting functionality - padding
value in degrees default: 45
See also
- width
width of the violin element (>1 overlaps) default: 0.9
See also
- solvents
- The default solvents are documented under
SOLVENTS_DEFAULT
. Normally takes a two-tuple, but analysis is compatible with single solvent selections. Single solvent analyses will result in a figure with fully filled violins for the single solvent. - interactions
- The default interactions are documented under
INTERACTIONS_DEFAULT
. - start, stop, step
arguments passed to
run()
, as parameters for iterating through the trajectories of the current ensembleSee also
Example
Typical Workflow:
import dihedrals dihedrals.automated_dihedral_analysis(dirname='/foo/bar/MDPOW_project_data', figdir='/foo/bar/MDPOW_figure_directory', resname='UNK', molname='benzene', padding=45, width=0.9, solvents=('water','octanol'), interactions=('Coulomb','VDW'), start=0, stop=100, step=10)
-
mdpow.workflows.dihedrals.
build_universe
(dirname, solvents=('water', 'octanol'))[source]¶ Builds
Universe
from the./Coulomb/0000
topology and trajectory of the project for the first solvent specified.Output used by
rdkit_conversion()
andget_atom_indices()
to obtain atom indices for each dihedral atom group.Keywords: - dirname
- Molecule Simulation directory. Loads simulation files present in lambda directories into the new instance. With this method for generating an
Ensemble
the lambda directories are explored and_load_universe_from_dir()
searches for .gro, .gro.bz2, .gro.gz, and .tpr files for topology, and .xtc files for trajectory. It will default to using the tpr file available.- solvents
- The default solvents are documented under
SOLVENTS_DEFAULT
. Normally takes a two-tuple, but analysis is compatible with single solvent selections. Single solvent analyses will result in a figure with fully filled violins for the single solvent.
Returns: - u
Universe
object
-
mdpow.workflows.dihedrals.
rdkit_conversion
(u, resname)[source]¶ Converts the solute, resname, of the
Universe
tordkit.Chem.rdchem.Mol
object for use with a SMARTS selection string to identify dihedral atom groups.Accepts
Universe
object made withbuild_universe()
and a resname as input. Uses resname to select the solute for conversion byRDKitConverter
tordkit.Chem.rdchem.Mol
, and will add element attributes for Hydrogen if not listed in the topology, usingMDAnalysis.topology.guessers.guess_atom_element()
.Keywords: - u
Universe
object- resname
- resname for the molecule as defined in the topology and trajectory
Returns: - tuple(mol, solute)
- function call returns tuple, see below
- mol
rdkit.Chem.rdchem.Mol
object converted from solute- solute
- the
MDAnalysis
AtomGroup for the solute
-
mdpow.workflows.dihedrals.
get_atom_indices
(mol, SMARTS='[!#1]~[!$(*#*)&!D1]-!@[!$(*#*)&!D1]~[!#1]')[source]¶ Uses a SMARTS selection string to identify atom indices for relevant dihedral atom groups.
Requires a
rdkit.Chem.rdchem.Mol
object as input for theSMARTS_DEFAULT
kwarg to match patterns to and identify relevant dihedral atom groups.Keywords: - mol
rdkit.Chem.rdchem.Mol
object converted from solute- SMARTS
- The default SMARTS string is described in detail under
SMARTS_DEFAULT
.
Returns: - atom_indices
- tuple of tuples of indices for each dihedral atom group
-
mdpow.workflows.dihedrals.
get_bond_indices
(mol, atom_indices)[source]¶ From the
rdkit.Chem.rdchem.Mol
object, uses atom_indices to determine the indices of the bonds between those atoms for each dihedral atom group.Keywords: - mol
rdkit.Chem.rdchem.Mol
object converted from solute- atom_indices
- tuple of tuples of indices for each dihedral atom group
Returns: - bond_indices
- tuple of tuples of indices for the bonds in each dihedral atom group
-
mdpow.workflows.dihedrals.
get_dihedral_groups
(solute, atom_indices)[source]¶ Uses the 0-based atom_indices of the relevant dihedral atom groups determined by
get_atom_indices()
and returns the 1-based index names for each atom in each group.Requires the atom_indices from
get_atom_indices()
to index the solute specified byselect_atoms()
and return an array of the names of each atom within its respective dihedral atom group as identified by the SMARTS selection string.Keywords: - solute
- the
MDAnalysis
AtomGroup for the solute - atom_indices
- tuple of tuples of indices for each dihedral atom group
Returns: - dihedral_groups
- list of
numpy.array()
for atom names in each dihedral atom group
-
mdpow.workflows.dihedrals.
get_paired_indices
(atom_indices, bond_indices, dihedral_groups)[source]¶ Combines atom_indices and bond_indices in tuples to be paired with their respective dihedral atom groups.
A dictionary is created with key-value pairs as follows: atom_indices and bond_indices are joined in a tuple as the value, with the key being the respective member of dihedral_groups to facilitate highlighting the relevant dihedral atom group when generating violin plots. As an example, ‘C1-N2-O3-S4’: ((0, 1, 2, 3), (0, 1, 2)), would be one key-value pair in the dictionary.
Keywords: - atom_indices
- tuple of tuples of indices for each dihedral atom group
- bond_indices
- tuple of tuples of indices for the bonds in each dihedral atom group
- dihedral_groups
- list of
numpy.array()
for atom names in each dihedral atom group
Returns: - name_index_pairs
- dictionary with key-value pair for dihedral atom group, atom indices, and bond indices
-
mdpow.workflows.dihedrals.
dihedral_groups_ensemble
(dirname, atom_indices, solvents=('water', 'octanol'), interactions=('Coulomb', 'VDW'), start=None, stop=None, step=None)[source]¶ Creates one
Ensemble
for the MDPOW project and runsDihedralAnalysis
for each dihedral atom group identified by the SMARTS selection string.Keywords: - dirname
- Molecule Simulation directory. Loads simulation files present in
lambda directories into the new instance. With this method for
generating an
Ensemble
the lambda directories are explored and_load_universe_from_dir()
searches for .gro, .gro.bz2, .gro.gz, and .tpr files for topology, and .xtc files for trajectory. It will default to using the tpr file available. - atom_indices
tuples of atom indices for dihedral atom groups
See also
- solvents
- The default solvents are documented under
SOLVENTS_DEFAULT
. Normally takes a two-tuple, but analysis is compatible with single solvent selections. Single solvent analyses will result in a figure with fully filled violins for the single solvent. - interactions
- The default interactions are documented under
INTERACTIONS_DEFAULT
. - start, stop, step
arguments passed to
run()
, as parameters for iterating through the trajectories of the current ensembleSee also
Returns: - df
pandas.DataFrame
ofDihedralAnalysis
results, including all dihedral atom groups for molecule of current project
-
mdpow.workflows.dihedrals.
save_df
(df, df_save_dir, resname, molname=None)[source]¶ Takes a
pandas.DataFrame
of results fromDihedralAnalysis
as input before padding the angles to optionaly save the raw data.Optionally saves results before padding the angles for periodicity and plotting dihedral angle frequencies as KDE violins with
dihedral_violins()
. Given a parent directory, creates subdirectory for molecule, saves fully sampled, unpadded resultspandas.DataFrame
as a compressed csv file, default: .csv.bz2.Keywords: - df
pandas.DataFrame
ofDihedralAnalysis
results, including all dihedral atom groups for molecule of current project- df_save_dir
- optional, path to the location to save results
pandas.DataFrame
- resname
- resname for the molecule as defined in the topology and trajectory
- molname
- molecule name to be used for labelling plots, if different from resname
-
mdpow.workflows.dihedrals.
periodic_angle_padding
(df, padding=45)[source]¶ Pads the angles from the results
DataFrame
to maintain periodicity in the violin plots.Takes a
pandas.DataFrame
of results fromDihedralAnalysis
ordihedral_groups_ensemble()
as input and pads the angles to maintain periodicity for properly plotting dihedral angle frequencies as KDE violins withdihedral_violins()
andplot_dihedral_violins()
. Creates two newpandas.DataFrame
based on the padding value specified, pads the angle values, concatenates all threepandas.DataFrame
, maintaining original data and adding padded values, and returns new augmentedpandas.DataFrame
.Keywords: - df
pandas.DataFrame
ofDihedralAnalysis
results, including all dihedral atom groups for molecule of current project- padding
- value in degrees to specify angle augmentation threshold default: 45
Returns: - df_aug
- augmented results
pandas.DataFrame
containing padded dihedral angles as specified by padding
-
mdpow.workflows.dihedrals.
dihedral_violins
(df, width=0.9, solvents=('water', 'octanol'), plot_title=None)[source]¶ Plots kernel density estimates (KDE) of dihedral angle frequencies for one dihedral atom group as violin plots, using as input the augmented
pandas.DataFrame
fromperiodic_angle_padding()
.Output is converted to SVG by
build_svg()
and final output is saved as PDF byplot_dihedral_violins()
Keywords: - df
- augmented results
pandas.DataFrame
fromperiodic_angle_padding()
- width
- width of the violin element (>1 overlaps) default: 0.9
- solvents
- The default solvents are documented under
SOLVENTS_DEFAULT
. Normally takes a two-tuple, but analysis is compatible with single solvent selections. Single solvent analyses will result in a figure with fully filled violins for the single solvent. - plot_title
- generated by
build_svg()
using molname, dihedral_groups, atom_indices, and interactions in this order and format: f’{molname}, {name[0]} {a} | ‘’{col_name}’
Returns: - g
- returns a
seaborn.FacetGrid
object containing a violin plot of the kernel density estimates (KDE) of the dihedral angle frequencies for each dihedral atom group identified bySMARTS_DEFAULT
-
mdpow.workflows.dihedrals.
build_svg
(mol, molname, name_index_pairs, atom_group_selection, solvents=('water', 'octanol'), width=0.9)[source]¶ Converts and combines figure components into an SVG object to be converted and saved as a publication quality PDF.
Keywords: - mol
rdkit.Chem.rdchem.Mol
object converted from solute- molname
molecule name to be used for labelling plots, if different from resname (in this case, carried over from an upstream
decision between the two)- name_index_pairs
dictionary with key-value pair for dihedral atom group, atom indices, and bond indices
See also
- atom_group_selection
name of each section in the groupby series of atom group selections
See also
- solvents
- The default solvents are documented under
SOLVENTS_DEFAULT
. Normally takes a two-tuple, but analysis is compatible with single solvent selections. Single solvent analyses will result in a figure with fully filled violins for the single solvent. - width
- width of the violin element (>1 overlaps) default: 0.9
Returns: - fig
svgutils
SVG figure object
-
mdpow.workflows.dihedrals.
plot_dihedral_violins
(df, resname, mol, name_index_pairs, figdir=None, molname=None, width=0.9, plot_pdf_width=190, solvents=('water', 'octanol'))[source]¶ Coordinates plotting and saving figures for all dihedral atom groups.
Makes a subdirectory for the current project within the specified figdir using resname or molname as title and saves production quality PDFs for each dihedral atom group separately.
Keywords: - df
- augmented results
pandas.DataFrame
fromperiodic_angle_padding()
- resname
- resname for the molecule as defined in the topology and trajectory
- mol
rdkit.Chem.rdchem.Mol
object converted from solute- name_index_pairs
dictionary with key-value pair for dihedral atom group, atom indices, and bond indices
See also
- figdir
- path to the location to save figures (REQUIRED but marked as a kwarg for technical reasons; will be changed in #244)
- molname
- molecule name to be used for labelling plots, if different from resname
- width
width of the violin element (>1 overlaps) default: 0.9
See also
- plot_pdf_width
- The default value for width of plot output is described in detail under
PLOT_WIDTH_DEFAULT
. - solvents
- The default solvents are documented under
SOLVENTS_DEFAULT
. Normally takes a two-tuple, but analysis is compatible with single solvent selections. Single solvent analyses will result in a figure with fully filled violins for the single solvent.