5.15. PDB Topology Parser
This topology parser uses a standard PDB file to build a minimum internal structure representation (list of atoms).
The topology reader reads a PDB file line by line and ignores atom
numbers but only reads residue numbers up to 9,999 correctly. If you
have systems containing at least 10,000 residues then you need to use
a different file format (e.g. the “extended” PDB, XPDB format, see
ExtendedPDBParser) that can handle residue
numbers up to 99,999.
Note
Atomtypes will be created from elements if they are present and valid. Otherwise, they will be guessed on Universe creation. By default, masses will also be guessed on Universe creation. This may change in release 3.0. See Guesser modules for more information.
Note
The parser processes atoms and their names. Partial charges are not set. Elements are parsed if they are valid. If partially missing or incorrect, empty records are assigned.
See also
5.15.1. Classes and Functions
- class MDAnalysis.topology.PDBParser.PDBParser(filename)[source]
Parser that obtains a list of atoms from a standard PDB file.
- Creates the following Attributes (if present):
names
chainids
tempfactors
occupancies
record_types (ATOM/HETATM)
resids
resnames
segids
elements
bonds
formalcharges
Note that PDBParser accepts an optional keyword argument
force_chainids_to_segids. If set toTrue, the chain IDs (even if empty values are in the chain ID column in the file) will forcibly be used instead of the segment IDs for creating segments.See also
New in version 0.8.
Changed in version 0.18.0: Added parsing of Record types
Changed in version 1.0.0: Added parsing of valid Elements
Changed in version 2.0.0: Bonds attribute is not added if no bonds are present in PDB file. If elements are invalid or partially missing, empty elements records are now assigned (Issue #2422). Aliased
bfactorstopologyattribute totempfactors.bfactorsis deprecated and will be removed in 3.0 (Issue #1901)Changed in version 2.3.0: Formal charges are now read from PDB files if present. No formalcharge attribute is created if no formal charges are present in the PDB file. Any formal charges not set are assumed to have a value of 0. Raise UserWarning instead RuntimeError when CONECT records are corrupt.
Changed in version 2.5.0: Formal charges will not be populated if an unknown entry is encountered, instead a UserWarning is emitted.
Changed in version 2.8.0: Removed type and mass guessing (attributes guessing takes place now through universe.guess_TopologyAttrs() API).
Changed in version 2.10.0: segID is read from 73-76 instead of 67-76 and added the force_chainids_to_segids keyword argument. Some infos in logger will be generated if the segids is not present or if the chainids are not completely equal to segids.
- close()
Close the trajectory file.
- convert_forces_from_native(force, inplace=True)
Conversion of forces array force from native to base units
- Parameters:
force (array_like) – Forces to transform
inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data
Note
By default, the input force is modified in place and also returned. In-place operations improve performance because allocating new arrays is avoided.
New in version 0.7.7.
- convert_forces_to_native(force, inplace=True)
Conversion of force array force from base to native units.
- Parameters:
force (array_like) – Forces to transform
inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data
Note
By default, the input force is modified in place and also returned. In-place operations improve performance because allocating new arrays is avoided.
New in version 0.7.7.
- convert_pos_from_native(x, inplace=True)
Conversion of coordinate array x from native units to base units.
- Parameters:
x (array_like) – Positions to transform
inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data
Note
By default, the input x is modified in place and also returned. In-place operations improve performance because allocating new arrays is avoided.
Changed in version 0.7.5: Keyword inplace can be set to
Falseso that a modified copy is returned unless no conversion takes place, in which case the reference to the unmodified x is returned.
- convert_pos_to_native(x, inplace=True)
Conversion of coordinate array x from base units to native units.
- Parameters:
x (array_like) – Positions to transform
inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data
Note
By default, the input x is modified in place and also returned. In-place operations improve performance because allocating new arrays is avoided.
Changed in version 0.7.5: Keyword inplace can be set to
Falseso that a modified copy is returned unless no conversion takes place, in which case the reference to the unmodified x is returned.
- convert_time_from_native(t, inplace=True)
Convert time t from native units to base units.
- Parameters:
t (array_like) – Time values to transform
inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data
Note
By default, the input t is modified in place and also returned (although note that scalar values t are passed by value in Python and hence an in-place modification has no effect on the caller.) In-place operations improve performance because allocating new arrays is avoided.
Changed in version 0.7.5: Keyword inplace can be set to
Falseso that a modified copy is returned unless no conversion takes place, in which case the reference to the unmodified x is returned.
- convert_time_to_native(t, inplace=True)
Convert time t from base units to native units.
- Parameters:
t (array_like) – Time values to transform
inplace (bool, optional) – Whether to modify the array inplace, overwriting previous data
Note
By default, the input t is modified in place and also returned. (Also note that scalar values t are passed by value in Python and hence an in-place modification has no effect on the caller.)
Changed in version 0.7.5: Keyword inplace can be set to
Falseso that a modified copy is returned unless no conversion takes place, in which case the reference to the unmodified x is returned.
- convert_velocities_from_native(v, inplace=True)
Conversion of velocities array v from native to base units
- Parameters:
v (array_like) – Velocities to transform
inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data
Note
By default, the input v is modified in place and also returned. In-place operations improve performance because allocating new arrays is avoided.
New in version 0.7.5.
- convert_velocities_to_native(v, inplace=True)
Conversion of coordinate array v from base to native units
- Parameters:
v (array_like) – Velocities to transform
inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data
Note
By default, the input v is modified in place and also returned. In-place operations improve performance because allocating new arrays is avoided.
New in version 0.7.5.
- parse(**kwargs)[source]
Parse atom information from PDB file
- Return type:
MDAnalysis Topology object
- units = {'length': None, 'time': None, 'velocity': None}
dict with units of of time and length (and velocity, force, … for formats that support it)
- MDAnalysis.topology.PDBParser.fetch_pdb(pdb_ids=None, cache_path=None, progressbar=False, file_format='pdb.gz')[source]
Download one or more PDB files from the RCSB Protein Data Bank and cache them locally.
Given one or multiple PDB IDs, downloads the corresponding structure files format and stores them in a local cache directory. If files are cached on disk, fetch_pdb will skip the download and use the cached version instead.
Returns the path(s) as a string to the downloaded file(s).
- Parameters:
pdb_ids (str or sequence of str) – A single PDB ID as a string, or a sequence of PDB IDs to fetch.
cache_path (str or pathlib.Path) – Directory where downloaded file(s) will be cached. The default
Noneargument uses thepoochdefault cache with project nameDEFAULT_CACHE_NAME_DOWNLOADER.file_format (str) – The file extension/format to download (e.g., “cif”, “pdb”). See the Notes section below for a list of all supported file formats.
progressbar (bool, optional) – If True, display a progress bar during file downloads. Default is False.
- Returns:
The path(s) to the downloaded file(s). Returns a single string if one PDB ID is given, or a list of strings if multiple PDB IDs are provided.
- Return type:
- Raises:
ValueError – For an invalid file format. Supported file formats are under Notes.
requests.exceptions.HTTPError – If an invalid PDB code is specified. Note that this is
requests, not the standard libraryurllib.request.
Notes
This function uses the RCSB File Download Services for directly downloading structure files via https.
The RCSB currently provides data in
'cif','cif.gz','bcif','bcif.gz','xml','xml.gz','pdb','pdb.gz','pdb1','pdb1.gz'file formats and can therefore be downloaded. Not all of these formats can be currently read with MDAnalysis.Caching, controlled by the cache_path parameter, is handled internally by
pooch. The default cache name is taken fromDEFAULT_CACHE_NAME_DOWNLOADER. To clear cache (and subsequently force re-fetching), it is required to delete the cache folder as specified by cache_path.Examples
Download a single PDB file:
>>> mda.fetch_pdb("1AKE", file_format="cif") './MDAnalysis_pdbs/1AKE.cif'
Download multiple PDB files with a progress bar:
>>> mda.fetch_pdb(["1AKE", "4BWZ"], progressbar=True) ['./MDAnalysis_pdbs/1AKE.pdb.gz', './MDAnalysis_pdbs/4BWZ.pdb.gz']
Download a single PDB file and convert it to a universe:
>>> mda.Universe(mda.fetch_pdb("1AKE"), file_format="pdb.gz") <Universe with 3816 atoms>
Download multiple PDB files and convert each of them into a universe:
>>> [mda.Universe(pdb) for pdb in mda.fetch_pdb(["1AKE", "4BWZ"], progressbar=True)] [<Universe with 3816 atoms>, <Universe with 2824 atoms>]
New in version 2.11.0.
- MDAnalysis.topology.PDBParser.DEFAULT_CACHE_NAME_DOWNLOADER = 'MDAnalysis_pdbs'
Name of the
poochcache directorypooch.os_cache(DEFAULT_CACHE_NAME_DOWNLOADER); seepooch.os_cache()for further details.New in version 2.11.0.