4.6.2. Updated nucleic acid analysis — MDAnalysis.analysis.nucleicacids
- Author:
Alia Lescoulie
- Year:
2022-2023
- copyright:
LGPLv2.1
The module provides classes for analyzing nucleic acids structures. This is an updated, higher performance version of previous nucleic acid tools. For applications see [1][2].
References
4.6.2.1. Distances
- class MDAnalysis.analysis.nucleicacids.NucPairDist(selection1: List[AtomGroup], selection2: List[AtomGroup], **kwargs)[source]
Atom pair distance calculation base class.
Takes two lists of
AtomGroupand computes the distances between them over a trajectory. Used as a superclass for the other nucleic acid distances classes. The distance will be measured between atoms sharing an index in the two lists ofAtomGroup.- Parameters:
- results.pair_distances
2D array of pair distances. First dimension is simulation time, second dimension contains the pair distances for each each entry pair in selection1 and selection2.
New in version 2.4.0.
Note
results.pair_distances is slated for deprecation in version 3.0.0, use results.distances instead.
Deprecated since version 2.7.0: results.pair_distances will be removed in version 3.0.0, use
results.distancesinstead.- Type:
- results.distances
stored in a 2d numpy array with first index selecting the Residue pair, and the second index selecting the frame number Distances are stored in a 2d numpy array with axis 0 (first index) indexing the trajectory frame and axis 1 (second index) selecting the Residue pair.
New in version 2.7.0.
- Type:
- times
Simulation times for analysis.
- Type:
- Raises:
ValueError – If the selections given are not the same length
ValueError – An
AtomGroupin one of the strands not a valid nucleic acidValueError – If a given residue pair from the provided strands returns an empty
AtomGroupwhen selecting the atom pairs used in the distance calculations
Version Info
Changed in version 2.5.0: The ability to access by passing selection indices to
resultsis now removed as of MDAnalysis version 2.5.0. Please useresults.pair_distancesinstead. Theresults.timeswas deprecated and is now removed as of MDAnalysis 2.5.0. Please use the class attributetimesinstead.Changed in version 2.7.0: Added static method
select_strand_atomsas a helper for selecting atom pairs for distance analysis.Changed in version 2.9.0: Enabled parallel execution with the
multiprocessinganddaskbackends; use the new methodget_supported_backends()to see all supported backends.- classmethod get_supported_backends()[source]
Tuple with backends supported by the core library for a given class. User can pass either one of these values as
backend=...torun()method, or a custom object that hasapplymethod (see documentation forrun()):‘serial’: no parallelization
‘multiprocessing’: parallelization using multiprocessing.Pool
‘dask’: parallelization using dask.delayed.compute(). Requires installation of mdanalysis[dask]
If you want to add your own backend to an existing class, pass a
backends.BackendBasesubclass (see its documentation to learn how to implement it properly), and specifyunsupported_backend=True.- Returns:
names of built-in backends that can be used in
run(backend=...)()- Return type:
New in version 2.8.0.
- property parallelizable
Boolean mark showing that a given class can be parallelizable with split-apply-combine procedure. Namely, if we can safely distribute
_single_frame()to multiple workers and then combine them with a proper_conclude()call. If set toFalse, no backends except forserialare supported.Note
If you want to check parallelizability of the whole class, without explicitly creating an instance of the class, see
_analysis_algorithm_is_parallelizable. Note that you setting it to other value will break things if the algorithm behind the analysis is not trivially parallelizable.- Returns:
if a given
AnalysisBasesubclass instance is parallelizable with split-apply-combine, or not- Return type:
New in version 2.8.0.
- run(start: int = None, stop: int = None, step: int = None, frames: Iterable = None, verbose: bool = None, n_workers: int = None, n_parts: int = None, backend: str | BackendBase = None, *, unsupported_backend: bool = False, progressbar_kwargs=None)
Perform the calculation
- Parameters:
start (int, optional) – start frame of analysis
stop (int, optional) – stop frame of analysis
step (int, optional) – number of frames to skip between each analysed frame
frames (array_like, optional) –
array of integers or booleans to slice trajectory;
framescan only be used instead ofstart,stop, andstep. Setting bothframesand at least one ofstart,stop,stepto a non-default value will raise aValueError.New in version 2.2.0.
verbose (bool, optional) – Turn on verbosity
progressbar_kwargs (dict, optional) – ProgressBar keywords with custom parameters regarding progress bar position, etc; see
MDAnalysis.lib.log.ProgressBarfor full list. Available only forbackend='serial'backend (Union[str, BackendBase], optional) –
By default, performs calculations in a serial fashion. Otherwise, user can choose a backend:
stris matched to a builtin backend (one ofserial,multiprocessinganddask), or aMDAnalysis.analysis.results.BackendBasesubclass.New in version 2.8.0.
n_workers (int) –
positive integer with number of workers (processes, in case of built-in backends) to split the work between
New in version 2.8.0.
n_parts (int, optional) –
number of parts to split computations across. Can be more than number of workers.
New in version 2.8.0.
unsupported_backend (bool, optional) –
if you want to run your custom backend on a parallelizable class that has not been tested by developers, by default False
New in version 2.8.0.
Changed in version 2.2.0: Added ability to analyze arbitrary frames by passing a list of frame indices in the frames keyword argument.
Changed in version 2.5.0: Add progressbar_kwargs parameter, allowing to modify description, position etc of tqdm progressbars
Changed in version 2.8.0: Introduced
backend,n_workers,n_partsandunsupported_backendkeywords, and refactored the method logic to support parallelizable execution.
- static select_strand_atoms(strand1: ResidueGroup, strand2: ResidueGroup, a1_name: str, a2_name: str, g_name: str = 'G', a_name: str = 'A', u_name: str = 'U', t_name: str = 'T', c_name: str = 'C') Tuple[List[AtomGroup], List[AtomGroup]][source]
A helper method for nucleic acid pair distance analyses. Used for selecting specific atoms from two strands of nucleic acids.
- Parameters:
strand1 (List[Residue]) – The first nucleic acid strand
strand2 (List[Residue]) – The second nucleic acid strand
a1_name (str) – The selection for the purine base of the strand pair
a2_name (str) – the selection for the pyrimidine base of the strand pair
g_name (str (optional)) – Name of Guanine in topology, by default assigned to G
a_name (str (optional)) – Name of Adenine in topology, by default assigned to A
u_name (str (optional)) – Name of Uracil in topology, by default assigned to U
t_name (str (optional)) – Name of Thymine in topology, by default assigned to T
c_name (str (optional)) – Name of Cytosine in topology, by default assigned to C
- Returns:
returns a tuple containing two lists of
AtomGroups corresponding to the provided selections from each strand.- Return type:
- Raises:
New in version 2.7.0.
- class MDAnalysis.analysis.nucleicacids.WatsonCrickDist(strand1: List[Residue] | ResidueGroup, strand2: List[Residue] | ResidueGroup, n1_name: str = 'N1', n3_name: str = 'N3', g_name: str = 'G', a_name: str = 'A', u_name: str = 'U', t_name: str = 'T', c_name: str = 'C', **kwargs)[source]
Watson-Crick base pair distance for selected residues over a trajectory.
Takes two
ResidueGroupobjects or two lists ofResidueand calculates the distance between the nitrogen atoms in the Watson-Crick hydrogen bond over the trajectory. Bases are matched either by their index in the twoResidueGroupprovided as arguments, or based on the indices of the provided lists ofResidueobjects depending on which is provided.Note
Support for
Residueis slated for deprecation and will raise a warning when used. It still works butResidueGroupis preferred.- Parameters:
strand1 (ResidueClass) –
First list of bases
Deprecated since version 2.7.0: Using a list of
Residuewill be removed in 3.0.0. Use aResidueGroup.strand2 (ResidueClass) –
Second list of bases
Deprecated since version 2.7.0: Using a list of
Residuewill be removed in 3.0.0. Use aResidueGroup.n1_name (str (optional)) – Name of Nitrogen 1 of nucleic acids, by default assigned to “N1”
n3_name (str (optional)) – Name of Nitrogen 3 of nucleic acids, by default assigned to “N3”
g_name (str (optional)) – Name of Guanine in topology, by default assigned to “G”
a_name (str (optional)) – Name of Adenine in topology, by default assigned to “A”
u_name (str (optional)) – Name of Uracil in topology, by default assigned to “U”
t_name (str (optional)) – Name of Thymine in topology, by default assigned to “T”
c_name (str (optional)) – Name of Cytosine in topology, by default assigned to C
**kwargs (dict) – Key word arguments for
AnalysisBase
- results.distances
Distances are stored in a 2d numpy array with axis 0 (first index) indexing the trajectory frame and axis 1 (second index) selecting the Residue pair.
New in version 2.7.0.
- Type:
- results.pair_distances
2D array of pair distances. First dimension is simulation time, second dimension contains the pair distances for each each entry pair in selection1 and selection2.
New in version 2.4.0.
Deprecated since version 2.7.0: results.pair_distances will be removed in version 3.0.0, use
results.distancesinstead.- Type:
- times
Simulation times for analysis.
- Type:
- Raises:
TypeError – If the provided list of
Residuecontains non-Residue elements .. deprecated:: 2.7.0 Starting with version 3.0.0, this exception will no longer be raised because onlyResidueGroupwill be allowed.ValueError – If strand1 and strand2 are not the same length
ValueError: – An
AtomGroupin one of the strands not a valid nucleic acidValueError – If a given residue pair from the provided strands returns an empty
AtomGroupwhen selecting the atom pairs used in the distance calculations
Version Info
Changed in version 2.5.0: Accessing results by passing strand indices to
resultswas deprecated and is now removed as of MDAnalysis version 2.5.0. Please useresults.pair_distancesinstead. Theresults.timeswas deprecated and is now removed as of MDAnalysis 2.5.0. Please use the class attributetimesinstead.Changed in version 2.7.0: strand1 and strand2 now also accept a
ResidueGroupas input. The previous input type,List[Residue]is still supported, but it is deprecated and will be removed in release 3.0.0.- classmethod get_supported_backends()
Tuple with backends supported by the core library for a given class. User can pass either one of these values as
backend=...torun()method, or a custom object that hasapplymethod (see documentation forrun()):‘serial’: no parallelization
‘multiprocessing’: parallelization using multiprocessing.Pool
‘dask’: parallelization using dask.delayed.compute(). Requires installation of mdanalysis[dask]
If you want to add your own backend to an existing class, pass a
backends.BackendBasesubclass (see its documentation to learn how to implement it properly), and specifyunsupported_backend=True.- Returns:
names of built-in backends that can be used in
run(backend=...)()- Return type:
New in version 2.8.0.
- property parallelizable
Boolean mark showing that a given class can be parallelizable with split-apply-combine procedure. Namely, if we can safely distribute
_single_frame()to multiple workers and then combine them with a proper_conclude()call. If set toFalse, no backends except forserialare supported.Note
If you want to check parallelizability of the whole class, without explicitly creating an instance of the class, see
_analysis_algorithm_is_parallelizable. Note that you setting it to other value will break things if the algorithm behind the analysis is not trivially parallelizable.- Returns:
if a given
AnalysisBasesubclass instance is parallelizable with split-apply-combine, or not- Return type:
New in version 2.8.0.
- run(start: int = None, stop: int = None, step: int = None, frames: Iterable = None, verbose: bool = None, n_workers: int = None, n_parts: int = None, backend: str | BackendBase = None, *, unsupported_backend: bool = False, progressbar_kwargs=None)
Perform the calculation
- Parameters:
start (int, optional) – start frame of analysis
stop (int, optional) – stop frame of analysis
step (int, optional) – number of frames to skip between each analysed frame
frames (array_like, optional) –
array of integers or booleans to slice trajectory;
framescan only be used instead ofstart,stop, andstep. Setting bothframesand at least one ofstart,stop,stepto a non-default value will raise aValueError.New in version 2.2.0.
verbose (bool, optional) – Turn on verbosity
progressbar_kwargs (dict, optional) – ProgressBar keywords with custom parameters regarding progress bar position, etc; see
MDAnalysis.lib.log.ProgressBarfor full list. Available only forbackend='serial'backend (Union[str, BackendBase], optional) –
By default, performs calculations in a serial fashion. Otherwise, user can choose a backend:
stris matched to a builtin backend (one ofserial,multiprocessinganddask), or aMDAnalysis.analysis.results.BackendBasesubclass.New in version 2.8.0.
n_workers (int) –
positive integer with number of workers (processes, in case of built-in backends) to split the work between
New in version 2.8.0.
n_parts (int, optional) –
number of parts to split computations across. Can be more than number of workers.
New in version 2.8.0.
unsupported_backend (bool, optional) –
if you want to run your custom backend on a parallelizable class that has not been tested by developers, by default False
New in version 2.8.0.
Changed in version 2.2.0: Added ability to analyze arbitrary frames by passing a list of frame indices in the frames keyword argument.
Changed in version 2.5.0: Add progressbar_kwargs parameter, allowing to modify description, position etc of tqdm progressbars
Changed in version 2.8.0: Introduced
backend,n_workers,n_partsandunsupported_backendkeywords, and refactored the method logic to support parallelizable execution.
- class MDAnalysis.analysis.nucleicacids.MinorPairDist(strand1: ResidueGroup, strand2: ResidueGroup, o2_name: str = 'O2', c2_name: str = 'C2', g_name: str = 'G', a_name: str = 'A', u_name: str = 'U', t_name: str = 'T', c_name: str = 'C', **kwargs)[source]
Minor-Pair basepair distance for selected residues over a trajectory.
Takes two
ResidueGroupobjects and calculates the Minor-groove hydrogen bond length between the nitrogen and oxygen atoms over the trajectory. Bases are matched by their index in the twoResidueGroupprovided as arguments.- Parameters:
strand1 (List[Residue]) – First list of bases
strand2 (List[Residue]) – Second list of bases
o2_name (str (optional)) – Name of Oxygen 2 of nucleic acids; by default assigned to “O2”;
c2_name (str (optional)) – Name of Carbon 2 of nucleic acids; by default assigned to “C2”;
g_name (str (optional)) – Name of Guanine in topology; by default assigned to “G”;
a_name (str (optional)) – Name of Adenine in topology by default assigned to “A”;
u_name (str (optional)) – Name of Uracil in topology; by default assigned to “U”;
t_name (str (optional)) – Name of Thymine in topology; by default assigned to “T”;
c_name (str (optional)) – Name of Cytosine in topology; by default assigned to “C”;
**kwargs – keyword arguments for
AnalysisBase
- results.distances
stored in a 2d numpy array with first index selecting the Residue pair, and the second index selecting the frame number
- Type:
- times
Simulation times for analysis.
- Type:
- Raises:
ValueError – If the selections given are not the same length A
Residuein one of the strands not a valid nucleic acidValueError – If a given residue pair from the provided strands returns an empty
AtomGroupwhen selecting the atom pairs used in the distance calculations
New in version 2.7.0.
- classmethod get_supported_backends()
Tuple with backends supported by the core library for a given class. User can pass either one of these values as
backend=...torun()method, or a custom object that hasapplymethod (see documentation forrun()):‘serial’: no parallelization
‘multiprocessing’: parallelization using multiprocessing.Pool
‘dask’: parallelization using dask.delayed.compute(). Requires installation of mdanalysis[dask]
If you want to add your own backend to an existing class, pass a
backends.BackendBasesubclass (see its documentation to learn how to implement it properly), and specifyunsupported_backend=True.- Returns:
names of built-in backends that can be used in
run(backend=...)()- Return type:
New in version 2.8.0.
- property parallelizable
Boolean mark showing that a given class can be parallelizable with split-apply-combine procedure. Namely, if we can safely distribute
_single_frame()to multiple workers and then combine them with a proper_conclude()call. If set toFalse, no backends except forserialare supported.Note
If you want to check parallelizability of the whole class, without explicitly creating an instance of the class, see
_analysis_algorithm_is_parallelizable. Note that you setting it to other value will break things if the algorithm behind the analysis is not trivially parallelizable.- Returns:
if a given
AnalysisBasesubclass instance is parallelizable with split-apply-combine, or not- Return type:
New in version 2.8.0.
- run(start: int = None, stop: int = None, step: int = None, frames: Iterable = None, verbose: bool = None, n_workers: int = None, n_parts: int = None, backend: str | BackendBase = None, *, unsupported_backend: bool = False, progressbar_kwargs=None)
Perform the calculation
- Parameters:
start (int, optional) – start frame of analysis
stop (int, optional) – stop frame of analysis
step (int, optional) – number of frames to skip between each analysed frame
frames (array_like, optional) –
array of integers or booleans to slice trajectory;
framescan only be used instead ofstart,stop, andstep. Setting bothframesand at least one ofstart,stop,stepto a non-default value will raise aValueError.New in version 2.2.0.
verbose (bool, optional) – Turn on verbosity
progressbar_kwargs (dict, optional) – ProgressBar keywords with custom parameters regarding progress bar position, etc; see
MDAnalysis.lib.log.ProgressBarfor full list. Available only forbackend='serial'backend (Union[str, BackendBase], optional) –
By default, performs calculations in a serial fashion. Otherwise, user can choose a backend:
stris matched to a builtin backend (one ofserial,multiprocessinganddask), or aMDAnalysis.analysis.results.BackendBasesubclass.New in version 2.8.0.
n_workers (int) –
positive integer with number of workers (processes, in case of built-in backends) to split the work between
New in version 2.8.0.
n_parts (int, optional) –
number of parts to split computations across. Can be more than number of workers.
New in version 2.8.0.
unsupported_backend (bool, optional) –
if you want to run your custom backend on a parallelizable class that has not been tested by developers, by default False
New in version 2.8.0.
Changed in version 2.2.0: Added ability to analyze arbitrary frames by passing a list of frame indices in the frames keyword argument.
Changed in version 2.5.0: Add progressbar_kwargs parameter, allowing to modify description, position etc of tqdm progressbars
Changed in version 2.8.0: Introduced
backend,n_workers,n_partsandunsupported_backendkeywords, and refactored the method logic to support parallelizable execution.
- class MDAnalysis.analysis.nucleicacids.MajorPairDist(strand1: ResidueGroup, strand2: ResidueGroup, n4_name: str = 'N4', o6_name: str = 'O6', g_name: str = 'G', a_name: str = 'A', u_name: str = 'U', t_name: str = 'T', c_name: str = 'C', **kwargs)[source]
Minor-Pair base pair distance for selected residues over a trajectory.
Takes two
ResidueGroupobjects and calculates the Major-groove hydrogen bond length between the nitrogen and oxygen atoms over the trajectory. Bases are matched by their index in the twoResidueGroupprovided as arguments.- Parameters:
strand1 (List[Residue]) – First list of bases
strand2 (List[Residue]) – Second list of bases
o6_name (str (optional)) – Name of Oxygen 6 of nucleic acids; by default assigned to “O6”
n4_name (str (optional)) – Name of Nitrogen 4 of nucleic acids; by default assigned to “N4”
g_name (str (optional)) – Name of Guanine in topology; by default assigned to “G”
a_name (str (optional)) – Name of Adenine in topology; by default assigned to “A”
u_name (str (optional)) – Name of Uracil in topology; by default assigned to “U”
t_name (str (optional)) – Name of Thymine in topology; by default assigned to “T”
c_name (str (optional)) – Name of Cytosine in topology; by default assigned to “C”
**kwargs – arguments for
AnalysisBase
- results.distances
Distances are stored in a 2d numpy array with axis 0 (first index) indexing the trajectory frame and axis 1 (second index) selecting the Residue pair.
- Type:
- times
Simulation times for analysis.
- Type:
- Raises:
ValueError – A
Residuein one of the strands not a valid nucleic acidValueError – If a given residue pair from the provided strands returns an empty
AtomGroupwhen selecting the atom pairs used in the distance calculationsValueError – if the selections given are not the same length
New in version 2.7.0.
- classmethod get_supported_backends()
Tuple with backends supported by the core library for a given class. User can pass either one of these values as
backend=...torun()method, or a custom object that hasapplymethod (see documentation forrun()):‘serial’: no parallelization
‘multiprocessing’: parallelization using multiprocessing.Pool
‘dask’: parallelization using dask.delayed.compute(). Requires installation of mdanalysis[dask]
If you want to add your own backend to an existing class, pass a
backends.BackendBasesubclass (see its documentation to learn how to implement it properly), and specifyunsupported_backend=True.- Returns:
names of built-in backends that can be used in
run(backend=...)()- Return type:
New in version 2.8.0.
- property parallelizable
Boolean mark showing that a given class can be parallelizable with split-apply-combine procedure. Namely, if we can safely distribute
_single_frame()to multiple workers and then combine them with a proper_conclude()call. If set toFalse, no backends except forserialare supported.Note
If you want to check parallelizability of the whole class, without explicitly creating an instance of the class, see
_analysis_algorithm_is_parallelizable. Note that you setting it to other value will break things if the algorithm behind the analysis is not trivially parallelizable.- Returns:
if a given
AnalysisBasesubclass instance is parallelizable with split-apply-combine, or not- Return type:
New in version 2.8.0.
- run(start: int = None, stop: int = None, step: int = None, frames: Iterable = None, verbose: bool = None, n_workers: int = None, n_parts: int = None, backend: str | BackendBase = None, *, unsupported_backend: bool = False, progressbar_kwargs=None)
Perform the calculation
- Parameters:
start (int, optional) – start frame of analysis
stop (int, optional) – stop frame of analysis
step (int, optional) – number of frames to skip between each analysed frame
frames (array_like, optional) –
array of integers or booleans to slice trajectory;
framescan only be used instead ofstart,stop, andstep. Setting bothframesand at least one ofstart,stop,stepto a non-default value will raise aValueError.New in version 2.2.0.
verbose (bool, optional) – Turn on verbosity
progressbar_kwargs (dict, optional) – ProgressBar keywords with custom parameters regarding progress bar position, etc; see
MDAnalysis.lib.log.ProgressBarfor full list. Available only forbackend='serial'backend (Union[str, BackendBase], optional) –
By default, performs calculations in a serial fashion. Otherwise, user can choose a backend:
stris matched to a builtin backend (one ofserial,multiprocessinganddask), or aMDAnalysis.analysis.results.BackendBasesubclass.New in version 2.8.0.
n_workers (int) –
positive integer with number of workers (processes, in case of built-in backends) to split the work between
New in version 2.8.0.
n_parts (int, optional) –
number of parts to split computations across. Can be more than number of workers.
New in version 2.8.0.
unsupported_backend (bool, optional) –
if you want to run your custom backend on a parallelizable class that has not been tested by developers, by default False
New in version 2.8.0.
Changed in version 2.2.0: Added ability to analyze arbitrary frames by passing a list of frame indices in the frames keyword argument.
Changed in version 2.5.0: Add progressbar_kwargs parameter, allowing to modify description, position etc of tqdm progressbars
Changed in version 2.8.0: Introduced
backend,n_workers,n_partsandunsupported_backendkeywords, and refactored the method logic to support parallelizable execution.