PDBTM: Protein Data Bank of Transmembrane Proteins
PDBTM version: 2017-01-26
Number of transmembrane proteins: 3084 (alpha: 2710, beta: 362)

PDBTM User Manual

Using the PDBTM Web Server

Finding a PDBTM entry

There are three possible ways to search for a PDBTM entry.

1 - By PDB ID. The user can search for a specific PDB structure either by entering the PDB ID in the input field under the 'Search' -> 'Basic Search' -> 'By Code' menu and clicking 'Submit', or by typing the PDB ID into the navigation bar at the top right corner and pressing Enter. The server then searches for the requested PDBTM entry. If the entry is found, it is shown in the PDBTM molecule viewer.

2 - By keyword search. Select 'Search' -> 'Basic Search' -> 'By Keyword' menu and type the keyword(s) into the input field. Special words AND and OR are used as logical operators. Multiple words without logical operators are concatenated by logical AND. Matched entries are returned in a list.

3 - By the type and number of transmembrane segments. This is done by selecting 'Search' -> 'Basic Search' -> 'By Type' menu and specifying the type (alpha helical or beta barrel) in the submenu and selecting the number of transmembrane segments. The user can perform fine-tuned search requests using the address line query.

Users can submit custom queries through the 'Search' -> 'Advanced search' menu. Each search value should be typed followed by the respective field name between [] brackets. AND and OR tokens can be placed into the query for more customized and specific search results. The following field names can be used for searches in the database.

+------------+----------------------------------+
| Field      | Description                      |
+------------+----------------------------------+
| pdb_id     | PDB code                         |
| ch_id      | chain ID                         |
| type       | alpha-helical, beta-barrel, ...  |
| title      | TITLE section of PDB file        |
| numtm      | number of transmembrane segments |
| seq        | sequence                         |
| n_ifh      | number of interfacial helices    |
| n_loop     | number of loops                  |
| source     | SOURCE section of PDB file       |
| class      | HEADER section of PDB file       |
| keyword    | keywords                         |
| creation   | date of creation                 |
| lmod_date  | date of last modification        |
| lmod_descr | description of last mod.         |
+------------+----------------------------------+

EXAMPLE:

Searching for alpha helical (type=0) transmembrane proteins that contain a six-helix membrane region (numtm=6) and have 2 interfacial helices (n_ifh=2):

0 [type] 6 [numtm] 2 [n_ifh]

An alternative, advanced way to search the database is to type the following URL into the address bar of your browser. This gives the same results as the previous technique.

http://pdbtm.enzim.hu/?_=/hitlist/Chain/type/=/0/AND/numtm/=/6/AND/n_ifh/=/2

The main difference between the last two methods is that the second one enables the definition of various relations for each parameter, while the first one enables only equalities.
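The two query forms described above can be generated programmatically. The following is a minimal sketch, not part of the PDBTM server or its downloads: the function names advanced_query and hitlist_url are hypothetical, and only the "=" relation and the AND connective are taken from the examples in this manual. Which other relation symbols the server accepts is not documented here, so anything other than "=" is an assumption.

```python
def advanced_query(conditions):
    """Format (field, value) pairs as 'value [field]' tokens
    for the 'Advanced search' input field."""
    return " ".join(f"{value} [{field}]" for field, value in conditions)


def hitlist_url(conditions, base="http://pdbtm.enzim.hu/"):
    """Build the address-bar form from (field, relation, value) triples.

    All conditions are joined with AND, as in the manual's example URL.
    """
    parts = [f"{field}/{relation}/{value}" for field, relation, value in conditions]
    return base + "?_=/hitlist/Chain/" + "/AND/".join(parts)


# Alpha-helical chains with six TM helices and two interfacial helices.
query = advanced_query([("type", 0), ("numtm", 6), ("n_ifh", 2)])
url = hitlist_url([("type", "=", 0), ("numtm", "=", 6), ("n_ifh", "=", 2)])
print(query)
print(url)
```

Both calls reproduce the example query string and the example URL given above.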
Downloading PDBTM entries

The PDBTM web server offers various files for download through the download menu. The first set of downloads offers various sets of raw PDBTM XML files. There are XML files for alpha-helical and beta-barrel transmembrane proteins separately (pdbtmalpha and pdbtmbeta, respectively), or both in one file (pdbtmall). The file "pdbtm" contains all entries that are in the PDB database, including both transmembrane and non-transmembrane proteins.

The next section of the download area contains only sequence information of the various sets defined above. The non-redundant sequence sets are produced by using the CD-HIT algorithm with parameters "-c 0.4 -n 2 -l 30" (word size: 2, percent identity: 40%, sequence length longer than 30).

The last part of downloads contains some C library functions for researchers developing programs using PDBTM xml files.
PDBTM update statistics

The PDBTM database is updated every Thursday (see PDBTM updates for details). The first part of the statistics page shows the monthly increase on a graph. It is followed by the weekly increase in tabular form. In both cases the statistics of alpha-helical and beta-barrel transmembrane proteins are also shown separately.
PDBTM updates

The PDBTM database is updated every Thursday, after the PDB database update. The update is done by a semi-automated method. First, the TMDET algorithm is run on every new PDB file. The results are then checked manually by inspecting the structure of all new transmembrane protein candidates using the OpenAstexViewer molecular visualization program.
PDBTM list view

Search results comprising multiple entries are shown in list view. For each entry, the header line includes the PDB identifier code, some icons to handle the item, and the creation and modification dates. The following lines contain a thumbnail of the molecule structure, the header information from PDB, and finally the chain list with the number of transmembrane helices in parentheses. The following icons are used in the header line:

Show the original PDB entry. Download the original PDB entry in gzip format.
Show the transformed PDB entry. Download the transformed PDB entry.
Show the PDBTM xml file of the entry. Download the PDBTM xml file of the entry.
Launch the PDBTM molecule viewer

The thumbnail of the molecule structure is generated using OpenAstexViewer.
PDBTM molecule viewer

The PDBTM molecule viewer uses three windows: one for displaying the structure of proteins, one for showing sequences together with the segmentation information generated by the TMDET algorithm and one window to assist the user in downloading related information.

The PDBTM molecule viewer can only be used with transmembrane proteins. When a non-transmembrane protein is requested, it only returns a "pdbId is not transmembrane proteins" message.

The viewer uses Java and JavaScript, therefore both of these must be installed and enabled in the user's browser.

NOTE
Newer Java versions block this applet, since our modified OpenAstexViewer is not digitally signed. To work around this issue, please lower the security level in your Java Control Panel.

Structure window

This window shows the cartoon representation of the given protein, colored according to the segment type. The image is generated by the OpenAstexViewer molecule viewer Java applet.

The molecule can be rotated by dragging using the left mouse button while zooming in and out can be done using the mouse wheel (or with SHIFT+drag). Right clicking brings up a menu for further manipulation. Please refer to the OpenAstexViewer manuals for further details about this menu.

Should you find any unusual behavior in the image generation, please reload the page - this should solve the problem.
Sequence window

In the sequence window the sequence regions are coloured according to the segment type determined by the TMDET algorithm. By clicking on a sequence region, the representation type of the corresponding structure part turns from cartoon to sphere. When the mouse is on a sequence segment, a small tooltip box pops up and displays the segment type and its position in the sequence.

A combo box lists the chain IDs of the given protein. This enables the selection of the desired chain if the structure contains multiple chains.
Download window

In this window the same icons are used as in PDBTM list view for downloading the PDB files and the raw PDBTM XML files.
Links window

In this window links related to the given protein are shown. There are two links, one to the corresponding PDB entry and one to the PDBsum entry.
PDBTM data format

Format overview

The PDBTM database uses the XML file format to store data generated by the TMDET algorithm. The PDBTM XML format is defined in the PDBTM XML Schema Definition.
Record details

COPYRIGHT record

The copyright notice. It should read:

All information, data and files are copyright. The PDBTM database is produced in the Institute of Enzymology, Budapest, Hungary. There are no restrictions on its usage by non-profit institutions as long as its content are in no way modified and this statement is not removed from the entries. Usage by and for commercial entities requires a license agreement (send an email to tusi at enzim dot hu).
CREATE_DATE record

The date when the entry was created.

<CREATE_DATE>2003-08-11</CREATE_DATE>
MODIFICATION record

Description of the modifications made on an existing entry. The record has two child records containing the date of the modification and the description of the modification.

<MODIFICATION>
  <DATE>2005-04-06</DATE>
  <DESCR>Format has been changed to pdbtm format v2.0</DESCR>
</MODIFICATION>
RAWRES record

This record contains some primary data determined by the TMDET algorithm. These are TMRES, TMTYPE, SPRES and PDBKWRES.

The TMRES record contains the Q-value, i.e. the numerical result of the TMDET algorithm (see our article).

TMTYPE is the type of the protein according to the membrane spanning region, determined by TMDET. For non-transmembrane proteins TMTYPE should be Soluble, No_Protein, Nucleotide, Virus, Pilus or Ca_Globular. For transmembrane proteins the following types are defined:

  1. Tm_Alpha : alpha-helical TM protein
  2. Tm_Beta : beta-barrel TM protein
  3. Tm_Coil : secondary structure of TM part cannot be determined
  4. Tm_Ca : low-resolution alpha-helical TM protein
  5. Tm_Part : globular fragment of a (probable) TM protein.

If TMTYPE is Tm_Part, then TMP should be "no".

SPRES is determined by using the corresponding SwissProt entry. The following types are defined:

  1. Soluble : in SwissProt marked as non transmembrane.
  2. Tm_Alpha : in SwissProt "FT TRANSMEM" line found and protein type is Tm_Alpha.
  3. Tm_Beta : in SwissProt "FT TRANSMEM" line found and protein type is Tm_Beta.
  4. Tm_Part : in SwissProt "FT TRANSMEM" line found but the region does not overlap with the pdb sequence.
  5. Unknown : no corresponding SwissProt file has been found.

PDBKWRES is set to "yes" if the transmembrane character is explicitly stated in the PDB header, otherwise it is set to "no".

<RAWRES>
  <TMRES>80.54</TMRES>
  <TMTYPE>Tm_Alpha</TMTYPE>
  <SPRES>Unknown</SPRES>
  <PDBKWRES>yes</PDBKWRES>
</RAWRES>
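As an illustration of how the RAWRES record might be read in a program, here is a minimal sketch using Python's standard xml.etree.ElementTree module. It parses the fragment exactly as printed above. The function name read_rawres and the keys of the returned dictionary are hypothetical. The sketch also assumes un-namespaced tag names: if a downloaded PDBTM file declares an XML namespace, the tag names would need to be qualified accordingly.

```python
import xml.etree.ElementTree as ET

RAWRES_XML = """
<RAWRES>
  <TMRES>80.54</TMRES>
  <TMTYPE>Tm_Alpha</TMTYPE>
  <SPRES>Unknown</SPRES>
  <PDBKWRES>yes</PDBKWRES>
</RAWRES>
"""

# TMTYPE values that this manual lists for transmembrane proteins.
TM_TYPES = {"Tm_Alpha", "Tm_Beta", "Tm_Coil", "Tm_Ca", "Tm_Part"}


def read_rawres(xml_text):
    """Return the four RAWRES child values as a plain dictionary."""
    node = ET.fromstring(xml_text.strip())
    tmtype = node.findtext("TMTYPE")
    return {
        "q_value": float(node.findtext("TMRES")),
        "tmtype": tmtype,
        "is_tm_type": tmtype in TM_TYPES,
        "spres": node.findtext("SPRES"),
        "pdb_keyword": node.findtext("PDBKWRES") == "yes",
    }


rawres = read_rawres(RAWRES_XML)
print(rawres)
```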
BIOMATRIX record

BIOMATRIX contains the matrix transformations used to generate the "Biomolecule", i.e. the oligomer structure of the protein that has been shown (or is believed) to be functional (see our article).

The transformation should be applied to the chains listed in the APPLY_TO_CHAIN records, where the original chain is given by the CHAINID attribute and the identifier of the newly generated chain is given by the NEW_CHAINID attribute.

The transformation is defined in the TMATRIX record. TMATRIX is a usual 3x4 transformation matrix (three rows, ROWX, ROWY and ROWZ, each with four values X, Y, Z and T). Using the notation XX for ROWX X, XY for ROWX Y, ..., ZT for ROWZ T, the coordinates of the atoms of chain NEW_CHAINID can be generated from the coordinates of chain CHAINID by the following formulas:

(chain NEW_CHAINID)x = (chain CHAINID)x*XX + (chain CHAINID)y*XY + (chain CHAINID)z*XZ + XT
(chain NEW_CHAINID)y = (chain CHAINID)x*YX + (chain CHAINID)y*YY + (chain CHAINID)z*YZ + YT
(chain NEW_CHAINID)z = (chain CHAINID)x*ZX + (chain CHAINID)y*ZY + (chain CHAINID)z*ZZ + ZT

<BIOMATRIX>
  <NOTE>
    The original biomatrix is described in the PDB entry 1e12. The same transformation matrices were applied to the chain ID="B", containing the lipid molecules.
  </NOTE>
  <MATRIX ID="1">
    <APPLY_TO_CHAIN CHAINID="A" NEW_CHAINID="C"/>
    <APPLY_TO_CHAIN CHAINID="B" NEW_CHAINID="D"/>
    <TMATRIX>
      <ROWX X="-0.50000000" Y="0.86602497" Z="0.00000000" T="-33.65000153"/>
      <ROWY X="-0.86602497" Y="-0.50000000" Z="0.00000000" T="58.28350830"/>
      <ROWZ X="0.00000000" Y="0.00000000" Z="1.00000000" T="0.00000000"/>
    </TMATRIX>
  </MATRIX>
  <MATRIX ID="2">
    <APPLY_TO_CHAIN CHAINID="A" NEW_CHAINID="E"/>
    <APPLY_TO_CHAIN CHAINID="B" NEW_CHAINID="F"/>
    <TMATRIX>
      <ROWX X="-0.50000000" Y="-0.86602497" Z="0.00000000" T="33.65000153"/>
      <ROWY X="0.86602497" Y="-0.50000000" Z="0.00000000" T="58.28350830"/>
      <ROWZ X="0.00000000" Y="0.00000000" Z="1.00000000" T="0.00000000"/>
    </TMATRIX>
  </MATRIX>
  <DELETE CHAINID="Z"/>
</BIOMATRIX>
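The formulas above translate directly into code. The following minimal sketch reads the first TMATRIX of the example and applies it to one coordinate triple. The function names read_tmatrix and transform and the sample point are hypothetical and chosen only for illustration. A real program would loop over all atoms of the chains named in APPLY_TO_CHAIN.

```python
import xml.etree.ElementTree as ET

TMATRIX_XML = """
<TMATRIX>
  <ROWX X="-0.50000000" Y="0.86602497" Z="0.00000000" T="-33.65000153"/>
  <ROWY X="-0.86602497" Y="-0.50000000" Z="0.00000000" T="58.28350830"/>
  <ROWZ X="0.00000000" Y="0.00000000" Z="1.00000000" T="0.00000000"/>
</TMATRIX>
"""


def read_tmatrix(node):
    """Return three rows of (X, Y, Z, T) floats from a TMATRIX element."""
    return [
        tuple(float(node.find(row).get(col)) for col in ("X", "Y", "Z", "T"))
        for row in ("ROWX", "ROWY", "ROWZ")
    ]


def transform(matrix, point):
    """Apply new = row.X*x + row.Y*y + row.Z*z + row.T for each row."""
    x, y, z = point
    return tuple(rx * x + ry * y + rz * z + t for rx, ry, rz, t in matrix)


matrix = read_tmatrix(ET.fromstring(TMATRIX_XML.strip()))
# Hypothetical atom position in the original chain.
new_point = transform(matrix, (10.0, 0.0, 5.0))
print(new_point)
```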
MEMBRANE record

The MEMBRANE record contains the information for the most likely localization of the membrane relative to the molecule. This is given by a transformation matrix, which transforms the molecule coordinates in such a way that the membrane planes are parallel with the XY plane, and the origin is in the middle of the membrane. The NORMAL record contains the data of the membrane plane's normal vector. Because of the matrix transformation, the X and Y components of the normal vector should be zero or close to zero. The Z component of the normal vector is half of the membrane width.

<MEMBRANE>
  <NORMAL X="-0.00000050" Y="-0.00000021" Z="15.50000000"/>
  <TMATRIX>
    <ROWX X="1.00000000" Y="0.00000000" Z="0.00000000" T="0.00000260"/>
    <ROWY X="0.00000000" Y="1.00000000" Z="0.00000000" T="-38.85568237"/>
    <ROWZ X="0.00000000" Y="0.00000000" Z="1.00000000" T="-81.04399872"/>
  </TMATRIX>
</MEMBRANE>
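Since the transformed membrane is centered on the origin and parallel with the XY plane, deciding whether a point lies within the membrane slab reduces to comparing its transformed Z coordinate with the Z component of NORMAL. The sketch below illustrates this under that reading of the record. The function names membrane_z and in_membrane and the test points are hypothetical, and taking the absolute value of NORMAL Z is a defensive assumption rather than something stated in this manual.

```python
import xml.etree.ElementTree as ET

MEMBRANE_XML = """
<MEMBRANE>
  <NORMAL X="-0.00000050" Y="-0.00000021" Z="15.50000000"/>
  <TMATRIX>
    <ROWX X="1.00000000" Y="0.00000000" Z="0.00000000" T="0.00000260"/>
    <ROWY X="0.00000000" Y="1.00000000" Z="0.00000000" T="-38.85568237"/>
    <ROWZ X="0.00000000" Y="0.00000000" Z="1.00000000" T="-81.04399872"/>
  </TMATRIX>
</MEMBRANE>
"""

membrane = ET.fromstring(MEMBRANE_XML.strip())
# Half of the membrane width, read from the Z component of NORMAL.
half_width = abs(float(membrane.find("NORMAL").get("Z")))
# Only ROWZ is needed to obtain the transformed Z coordinate.
rowz = membrane.find("TMATRIX/ROWZ")
zx, zy, zz, zt = (float(rowz.get(key)) for key in ("X", "Y", "Z", "T"))


def membrane_z(point):
    """Z coordinate of an original PDB point after the membrane transformation."""
    x, y, z = point
    return zx * x + zy * y + zz * z + zt


def in_membrane(point):
    """True if the transformed point lies between the two membrane planes."""
    return abs(membrane_z(point)) <= half_width


# Hypothetical points given in the original PDB coordinate frame.
print(in_membrane((0.0, 0.0, 90.0)), in_membrane((0.0, 0.0, 100.0)))
```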
CHAIN record

All protein chains generated using the biomatrix or listed in the pdb file have a CHAIN record. This record has three attributes:

  • CHAINID: the chain identifier given in the original pdb file or generated by the biomatrix transformation;
  • NUM_TM: the number of transmembrane segments;
  • TYPE: the type of transmembrane segments (alpha, beta or coil (i.e. non alpha and non beta)) or the type of the chain if it does not cross the membrane (non_tm) or if it is not a protein chain (lipid).

Each CHAIN record contains one or more REGION records, which locate the chain segments in space relative to the membrane. The type of REGION can be 1, 2, B, H, C, I, L, F and U for Side1, Side2, beta-strand, alpha-helix, coil, membrane-inside, membrane-loop, interfacial helix and unknown localizations, respectively. Side1 and Side2 refer to the two sides of the membrane (based solely on the information from the PDB file it is not possible to determine which side is outside or inside). Membrane-inside is the inside part of a beta barrel. Membrane-loop corresponds to a region of the polypeptide chain which does not cross the membrane but only dips into it (for example in aquaporins or potassium channels). Interfacial helices are alpha-helical regions longer than 4 consecutive residues that are close to the membrane surface with a tilt angle smaller than a pre-determined threshold.

The pdb_beg and pdb_end attributes contain the segment localization using the pdb numbering while the seq_beg and seq_end use the numbering in the sequence found in the SEQ record. The sequence in SEQ record is generated by the alignment

<CHAIN CHAINID="A" NUM_TM="7" TYPE="alpha">
  <SEQ>
    AVRENALLSS SLWVNVALAG IAILVFVYMG RTIRPGRPRL IWGATLMIPL
    VSISSYLGLL SGLTVGMIEM PAGHALAGEM VRSQWGRYLT WALSTPMILL
    ALGLLADVDL GSLFTVIAAD IGMCVTGLAA AMTTSALLFR WAFYAISCAF
    FVVVLSALVT DWAASASSAG TAEIFDTLRV LTVVLWLGYP IVWAVGVEGL
    ALVQSVGATS WAYSVLDVFA KYVFAFILLR WVANNERTVA VAGQTLGTMS
    SDD
  </SEQ>
  <REGION seq_beg="1" pdb_beg="22" seq_end="2" pdb_end="23" type="U"/>
  <REGION seq_beg="3" pdb_beg="24" seq_end="9" pdb_end="30" type="1"/>
  <REGION seq_beg="10" pdb_beg="31" seq_end="31" pdb_end="52" type="H"/>
  <REGION seq_beg="32" pdb_beg="53" seq_end="38" pdb_end="59" type="2"/>
  <REGION seq_beg="39" pdb_beg="60" seq_end="59" pdb_end="80" type="H"/>
  <REGION seq_beg="60" pdb_beg="81" seq_end="84" pdb_end="105" type="1"/>
  <REGION seq_beg="85" pdb_beg="106" seq_end="104" pdb_end="125" type="H"/>
  <REGION seq_beg="105" pdb_beg="126" seq_end="109" pdb_end="130" type="2"/>
  <REGION seq_beg="110" pdb_beg="131" seq_end="131" pdb_end="152" type="H"/>
  <REGION seq_beg="132" pdb_beg="153" seq_end="138" pdb_end="159" type="1"/>
  <REGION seq_beg="139" pdb_beg="160" seq_end="160" pdb_end="181" type="H"/>
  <REGION seq_beg="161" pdb_beg="182" seq_end="176" pdb_end="197" type="2"/>
  <REGION seq_beg="177" pdb_beg="198" seq_end="196" pdb_end="217" type="H"/>
  <REGION seq_beg="197" pdb_beg="218" seq_end="208" pdb_end="229" type="1"/>
  <REGION seq_beg="209" pdb_beg="230" seq_end="230" pdb_end="251" type="H"/>
  <REGION seq_beg="231" pdb_beg="252" seq_end="241" pdb_end="262" type="2"/>
  <REGION seq_beg="242" pdb_beg="263" seq_end="253" pdb_end="274" type="U"/>
</CHAIN>
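A CHAIN record like the one above can be turned into a list of membrane-spanning segments with their sequences. The following minimal sketch uses the seq_beg and seq_end attributes (1-based, inclusive) to slice the SEQ string. The function name tm_segments is hypothetical, and treating only REGION types H and B as the segments counted by NUM_TM is an assumption that happens to hold for this example.

```python
import xml.etree.ElementTree as ET

CHAIN_XML = """
<CHAIN CHAINID="A" NUM_TM="7" TYPE="alpha">
  <SEQ>
    AVRENALLSS SLWVNVALAG IAILVFVYMG RTIRPGRPRL IWGATLMIPL
    VSISSYLGLL SGLTVGMIEM PAGHALAGEM VRSQWGRYLT WALSTPMILL
    ALGLLADVDL GSLFTVIAAD IGMCVTGLAA AMTTSALLFR WAFYAISCAF
    FVVVLSALVT DWAASASSAG TAEIFDTLRV LTVVLWLGYP IVWAVGVEGL
    ALVQSVGATS WAYSVLDVFA KYVFAFILLR WVANNERTVA VAGQTLGTMS
    SDD
  </SEQ>
  <REGION seq_beg="1" pdb_beg="22" seq_end="2" pdb_end="23" type="U"/>
  <REGION seq_beg="3" pdb_beg="24" seq_end="9" pdb_end="30" type="1"/>
  <REGION seq_beg="10" pdb_beg="31" seq_end="31" pdb_end="52" type="H"/>
  <REGION seq_beg="32" pdb_beg="53" seq_end="38" pdb_end="59" type="2"/>
  <REGION seq_beg="39" pdb_beg="60" seq_end="59" pdb_end="80" type="H"/>
  <REGION seq_beg="60" pdb_beg="81" seq_end="84" pdb_end="105" type="1"/>
  <REGION seq_beg="85" pdb_beg="106" seq_end="104" pdb_end="125" type="H"/>
  <REGION seq_beg="105" pdb_beg="126" seq_end="109" pdb_end="130" type="2"/>
  <REGION seq_beg="110" pdb_beg="131" seq_end="131" pdb_end="152" type="H"/>
  <REGION seq_beg="132" pdb_beg="153" seq_end="138" pdb_end="159" type="1"/>
  <REGION seq_beg="139" pdb_beg="160" seq_end="160" pdb_end="181" type="H"/>
  <REGION seq_beg="161" pdb_beg="182" seq_end="176" pdb_end="197" type="2"/>
  <REGION seq_beg="177" pdb_beg="198" seq_end="196" pdb_end="217" type="H"/>
  <REGION seq_beg="197" pdb_beg="218" seq_end="208" pdb_end="229" type="1"/>
  <REGION seq_beg="209" pdb_beg="230" seq_end="230" pdb_end="251" type="H"/>
  <REGION seq_beg="231" pdb_beg="252" seq_end="241" pdb_end="262" type="2"/>
  <REGION seq_beg="242" pdb_beg="263" seq_end="253" pdb_end="274" type="U"/>
</CHAIN>
"""


def tm_segments(chain, tm_types=("H", "B")):
    """Return (seq_beg, seq_end, subsequence) for each membrane-spanning REGION."""
    # SEQ is printed in blocks of ten residues; remove all whitespace first.
    seq = "".join(chain.findtext("SEQ").split())
    segments = []
    for region in chain.findall("REGION"):
        if region.get("type") in tm_types:
            beg, end = int(region.get("seq_beg")), int(region.get("seq_end"))
            segments.append((beg, end, seq[beg - 1:end]))
    return segments


chain = ET.fromstring(CHAIN_XML.strip())
helices = tm_segments(chain)
for beg, end, subseq in helices:
    print(beg, end, subseq)
```

For this chain the number of segments found equals the NUM_TM attribute, which gives a simple consistency check when reading entries.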
PDBTM XML Schema Definition

Detailed definition can be found here: pdbtm.xsd