PDBTM User Manual
Using the PDBTM Web Server
Finding a PDBTM entry
There are three possible ways to search for a PDBTM entry.
1 - The user can search for a specific PDB structure by specifying the PDB ID in the input field under the
'Search' -> 'Basic Search' -> 'By Code' menu and clicking 'Submit' or
by typing the PDB ID into the navigation bar at the top right corner and hitting enter.
The server searches for the requested PDBTM entry. If the entry is found, it
is shown in the PDBTM molecule viewer.
2 - By keyword search. Select 'Search' -> 'Basic Search' -> 'By Keyword' menu
and type the keyword(s) into the input field.
The special words AND and OR
act as logical operators. Multiple words without logical operators
are combined with logical AND.
Matched entries are returned in a list.
3 - By the type and number of transmembrane segments. This is done by selecting
'Search' -> 'Basic Search' -> 'By Type' menu and specifying the
type (alpha helical or beta barrel) in the submenu and selecting the
number of transmembrane segments. More fine-tuned search
requests are possible using the address line query described below.
Users can submit custom queries through the 'Search' -> 'Advanced search' menu. Each
query term is a value followed by the respective field name in square brackets []. The
tokens AND and OR can be placed between terms for more specific search results.
The following field names can be used for searches in the database.
+------------+----------------------------------+
| Field | Description |
+------------+----------------------------------+
| pdb_id | PDB code |
| ch_id | chain ID |
| type | alpha-helical, beta-barrel, ... |
| title | TITLE section of PDB file |
| numtm | number of transmembrane segments |
| seq | sequence |
| n_ifh | number of interfacial helices |
| n_loop | number of loops |
| source | SOURCE section of PDB file |
| class | HEADER section of PDB file |
| keyword | keywords |
| creation | date of creation |
| lmod_date | date of last modification |
| lmod_descr | description of last mod. |
+------------+----------------------------------+
EXAMPLE:
Searching for alpha helical (type=0) transmembrane proteins that contain
a six-helix membrane region (numtm=6) and have 2 interfacial helices (n_ifh=2):
0 [type] 6 [numtm] 2 [n_ifh]
An alternative, more advanced way to search the database is to type the following
URL into the address bar of your browser. It gives the same results as the previous query.
http://pdbtm.enzim.hu/?_=/hitlist/Chain/type/=/0/AND/numtm/=/6/AND/n_ifh/=/2
The main difference between the last two methods is that the URL form
allows various relations to be defined for each parameter, while the query form allows only
equalities.
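The two query forms above can be generated from one list of conditions. The following is a minimal sketch, not part of the PDBTM software: the function names are hypothetical, and only the "=" relation is confirmed by the example above, so any other operator passed in is an assumption.

```python
# Sketch: build both query forms from (field, operator, value) triples.
# The URL prefix is copied from the example URL in this manual.
BASE_URL = "http://pdbtm.enzim.hu/?_=/hitlist/Chain"

def advanced_query(conditions):
    """Build the 'value [field]' string for the Advanced search form.

    The form supports equalities only, so the operator is ignored here.
    """
    return " ".join(f"{value} [{field}]" for field, _op, value in conditions)

def hitlist_url(conditions, joiner="AND"):
    """Build the address-bar URL: field/operator/value groups joined by AND or OR."""
    parts = [f"{field}/{op}/{value}" for field, op, value in conditions]
    return BASE_URL + "/" + f"/{joiner}/".join(parts)

conditions = [("type", "=", 0), ("numtm", "=", 6), ("n_ifh", "=", 2)]
print(advanced_query(conditions))
print(hitlist_url(conditions))
```

With the three conditions shown, the two printed lines reproduce the query string and the URL of the example.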
Downloading PDBTM entries
The PDBTM web server offers various files for download through
the download menu. The first set of downloads offers various sets
of raw PDBTM xml files. There are xml files
for alpha-helical or beta-barrel transmembrane proteins
separately (pdbtmalpha, pdbtmbeta respectively), or both in one
file (pdbtmall). The file "pdbtm" contains all entries that are in
the PDB database, both transmembrane and non-transmembrane
proteins.
The next section of the download area contains only sequence information
of the various sets defined above. The non-redundant sequence sets
are produced using the
cd-hit algorithm with parameters
"-c 0.4 -n 2 -l 30" (word size: 2, sequence identity: 40%, sequence
length longer than 30).
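For readers who want to reproduce a non-redundant set locally, the parameters above can be assembled into a cd-hit command line. This is only a sketch: the input and output file names are hypothetical, and the -i and -o options (cd-hit's input and output files) are not given in this manual. The command is printed rather than executed.

```python
import shlex

def cdhit_command(fasta_in, fasta_out):
    """Return the cd-hit argument list with the parameters quoted in the manual."""
    return [
        "cd-hit",
        "-i", fasta_in,    # input FASTA file (hypothetical name)
        "-o", fasta_out,   # output FASTA file (hypothetical name)
        "-c", "0.4",       # sequence identity threshold: 40%
        "-n", "2",         # word size: 2
        "-l", "30",        # length cutoff for short sequences: 30
    ]

print(shlex.join(cdhit_command("pdbtm_all.fasta", "pdbtm_all_nr.fasta")))
```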
The last part of downloads contains some C library functions for
researchers developing programs using PDBTM xml files.
PDBTM update statistics
The PDBTM database is updated every Thursday (see
PDBTM updates for
details). The first part of the statistics page shows the monthly increase as a graph,
followed by the weekly increase in
tabular form. In both cases the statistics for alpha-helical and
beta-barrel transmembrane proteins are also shown separately.
PDBTM updates
The PDBTM database is updated every Thursday after the PDB database
update. The update uses a semi-automated method. First, the
TMDET algorithm is
run on every new pdb file. The results are then checked manually by
inspecting the structure of all new transmembrane protein
candidates using the
OpenAstexViewer molecular visualization program.
PDBTM list view
Search results comprising multiple entries are shown in list view. For each entry,
the header line includes the PDB identifier code, some icons for handling the item,
and the creation and modification dates.
The following lines contain a thumbnail of the molecule structure,
the header information from PDB, and finally the chain list
with the number of transmembrane helices in parentheses.
The following icons are used in the header line:
- Show the original PDB entry.
- Download the original PDB entry in gzip format.
- Show the transformed PDB entry.
- Download the transformed PDB entry.
- Show the PDBTM xml file of the entry.
- Download the PDBTM xml file of the entry.
- Launch the PDBTM molecule viewer.
The thumbnail of the molecule structure is generated using
OpenAstexViewer.
PDBTM molecule viewer
The PDBTM molecule viewer uses three windows: one for displaying
the structure of proteins, one for showing sequences together
with the segmentation information generated by the TMDET algorithm
and one window to assist the user in downloading related information.
The PDBTM molecule viewer can only be used with transmembrane proteins.
When a non-transmembrane protein is requested, it only returns a
"pdbId is not transmembrane proteins" message.
The viewer uses Java and JavaScript, so both must be
available and enabled in the user's browser.
NOTE
Newer Java versions block this applet, since our modified OpenAstexViewer is not digitally signed.
To work around this issue, please lower the security level in your Java Control Panel.
Structure window
This window shows the cartoon representation of the given protein,
colored according to segment type. The image is generated by the OpenAstexViewer molecule viewer Java applet.
The molecule can be rotated by dragging using the left mouse button while
zooming in and out can be done using the mouse wheel (or with SHIFT+drag).
Right clicking brings up a menu for further manipulation. Please refer to the
OpenAstexViewer
manuals for further details about this menu.
Should you find any unusual behavior in the image generation, please reload the page -
this should solve the problem.
Sequence window
In the sequence window the sequence regions are coloured according to the
segment type determined by the TMDET algorithm. By clicking on a
sequence region, the representation type of the corresponding structure part
turns from cartoon to sphere. When the mouse is on a sequence segment,
a small tooltip box pops up and displays the segment type and its position
in the sequence.
A combo box lists the chain IDs of the given protein, so that the desired chain
can be selected if the structure contains multiple chains.
Download window
This window uses the same icons as the
PDBTM list view for downloading the pdb files and the raw PDBTM xml files.
Links window
In this window links related to the given protein are shown.
There are two links, one to the corresponding PDB entry and
one to the PDBsum entry.
PDBTM data format
Format overview
The PDBTM database uses the xml file format to store data generated
by the TMDET algorithm. The PDBTM xml format is defined in the
PDBTM XML Schema Definition.
Record details
COPYRIGHT record
The copyright notice. It should read:
All information, data and files are copyright. The PDBTM database is
produced in the Institute of Enzymology, Budapest, Hungary. There
are no restrictions on its usage by non-profit institutions as long
as its content are in no way modified and this statement is not
removed from the entries. Usage by and for commercial entities requires
a license agreement (send an email to tusi at enzim dot hu).
CREATE_DATE record
The date when the entry was created.
<CREATE_DATE>2003-08-11</CREATE_DATE>
MODIFICATION record
Description of the modifications made on an existing entry.
The record has two child records containing the date of the modification
and the description of the modification.
<MODIFICATION>
<DATE>2005-04-06</DATE>
<DESCR>Format has been changed to pdbtm format v2.0</DESCR>
</MODIFICATION>
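The date records above can be read with any XML parser. The sketch below uses Python's standard library and assumes the two records sit under a common parent element; the `<entry>` wrapper is hypothetical and is used only to make the fragment well-formed. If real PDBTM files declare an XML namespace, the tag lookups would need to be adjusted.

```python
import xml.etree.ElementTree as ET
from datetime import date

# The two records from the examples above, wrapped in a hypothetical parent.
FRAGMENT = """
<entry>
  <CREATE_DATE>2003-08-11</CREATE_DATE>
  <MODIFICATION>
    <DATE>2005-04-06</DATE>
    <DESCR>Format has been changed to pdbtm format v2.0</DESCR>
  </MODIFICATION>
</entry>
"""

def entry_history(root):
    """Return the creation date and a list of (date, description) modifications."""
    created = date.fromisoformat(root.findtext("CREATE_DATE"))
    modifications = [
        (date.fromisoformat(m.findtext("DATE")), m.findtext("DESCR"))
        for m in root.findall("MODIFICATION")
    ]
    return created, modifications

created, modifications = entry_history(ET.fromstring(FRAGMENT.strip()))
print(created, modifications)
```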
RAWRES record
This record contains some primary data determined by
the TMDET algorithm. These are TMRES,
TMTYPE, SPRES
and PDBKWRES.
The TMRES record contains the Q-value, i.e. the numerical
result of the TMDET algorithm (see
our article).
TMTYPE is the type of the protein according to the
membrane spanning region, determined by TMDET.
For non-transmembrane proteins TMTYPE should be
Soluble, No_Protein, Nucleotide, Virus, Pilus or
Ca_Globular. For transmembrane proteins the following
types are defined:
- Tm_Alpha : alpha-helical TM protein
- Tm_Beta : beta-barrel TM protein
- Tm_Coil : secondary structure of TM part cannot be determined
- Tm_Ca : low-resolution alpha-helical TM protein
- Tm_Part : globular fragment of a (probable) TM protein.
If TMTYPE is Tm_Part, then TMP should be "no".
SPRES is determined by using the corresponding
SwissProt entry. The following types are defined:
- Soluble : in SwissProt marked as non transmembrane.
- Tm_Alpha : in SwissProt "FT TRANSMEM" line found and
protein type is Tm_Alpha.
- Tm_Beta : in SwissProt "FT TRANSMEM" line found and
protein type is Tm_Beta.
- Tm_Part : in SwissProt "FT TRANSMEM" line found
but the region does not overlap with the pdb sequence.
- Unknown : no corresponding SwissProt file has been
found.
PDBKWRES is set to "yes" if the transmembrane character is
explicitly stated in the PDB header, otherwise it is set to "no".
<RAWRES>
<TMRES>80.54</TMRES>
<TMTYPE>Tm_Alpha</TMTYPE>
<SPRES>Unknown</SPRES>
<PDBKWRES>yes</PDBKWRES>
</RAWRES>
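A RAWRES record can be turned into a small dictionary as sketched below. The set of transmembrane TMTYPE values is taken from the list above; the function name and the dictionary keys are hypothetical, and the fragment is parsed without any XML namespace, which real files may require.

```python
import xml.etree.ElementTree as ET

# RAWRES example from this manual, copied verbatim.
RAWRES_XML = """
<RAWRES>
  <TMRES>80.54</TMRES>
  <TMTYPE>Tm_Alpha</TMTYPE>
  <SPRES>Unknown</SPRES>
  <PDBKWRES>yes</PDBKWRES>
</RAWRES>
"""

# TMTYPE values that the manual lists for transmembrane proteins.
TM_TYPES = {"Tm_Alpha", "Tm_Beta", "Tm_Coil", "Tm_Ca", "Tm_Part"}

def parse_rawres(elem):
    """Collect the four child records of RAWRES into a dictionary."""
    tmtype = elem.findtext("TMTYPE")
    return {
        "tmres": float(elem.findtext("TMRES")),          # Q-value from TMDET
        "tmtype": tmtype,
        "is_tm": tmtype in TM_TYPES,                     # transmembrane type?
        "spres": elem.findtext("SPRES"),
        "pdbkwres": elem.findtext("PDBKWRES") == "yes",  # stated in PDB header?
    }

rawres = parse_rawres(ET.fromstring(RAWRES_XML.strip()))
print(rawres)
```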
BIOMATRIX record
BIOMATRIX contains the matrix transformations used to generate the
"Biomolecule", i.e. the oligomer structure of the protein that has
been shown (or is believed) to be functional (see
our article).
The transformation is applied to the chains listed in the
APPLY_TO_CHAIN records, where CHAINID identifies the original chain and
NEW_CHAINID gives the identifier of the newly generated chain.
The transformation is defined in the TMATRIX record. TMATRIX is a
conventional transformation matrix with three rows (ROWX, ROWY, ROWZ) of four
values each: the linear (rotation) components X, Y, Z and the translation T.
Using the notation XX for ROWX X, XY for ROWX Y ... ZT for ROWZ T,
and writing ID for the original chain and NEW_ID for the new chain,
the atom coordinates of chain NEW_ID
can be generated from the coordinates of chain ID by the following formulas:
(chain NEW_ID)x=(chain ID)x*XX+(chain ID)y*XY+(chain ID)z*XZ+XT
(chain NEW_ID)y=(chain ID)x*YX+(chain ID)y*YY+(chain ID)z*YZ+YT
(chain NEW_ID)z=(chain ID)x*ZX+(chain ID)y*ZY+(chain ID)z*ZZ+ZT
<BIOMATRIX>
<NOTE>
The original biomatrix is described in the PDB entry 1e12.
The same transformation matrices were applied to the
chain ID="B", containing the lipid molecules.
</NOTE>
<MATRIX ID="1">
<APPLY_TO_CHAIN CHAINID="A" NEW_CHAINID="C"/>
<APPLY_TO_CHAIN CHAINID="B" NEW_CHAINID="D"/>
<TMATRIX>
<ROWX X="-0.50000000" Y="0.86602497" Z="0.00000000" T="-33.65000153"/>
<ROWY X="-0.86602497" Y="-0.50000000" Z="0.00000000" T="58.28350830"/>
<ROWZ X="0.00000000" Y="0.00000000" Z="1.00000000" T="0.00000000"/>
</TMATRIX>
</MATRIX>
<MATRIX ID="2">
<APPLY_TO_CHAIN CHAINID="A" NEW_CHAINID="E"/>
<APPLY_TO_CHAIN CHAINID="B" NEW_CHAINID="F"/>
<TMATRIX>
<ROWX X="-0.50000000" Y="-0.86602497" Z="0.00000000" T="33.65000153"/>
<ROWY X="0.86602497" Y="-0.50000000" Z="0.00000000" T="58.28350830"/>
<ROWZ X="0.00000000" Y="0.00000000" Z="1.00000000" T="0.00000000"/>
</TMATRIX>
</MATRIX>
<DELETE CHAINID="Z"/>
</BIOMATRIX>
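The coordinate formulas above translate directly into code. The following sketch parses the first MATRIX element of the example and applies it to a single coordinate; the helper names are hypothetical, and reading and writing of actual PDB atom records is left out.

```python
import xml.etree.ElementTree as ET

# First MATRIX element of the BIOMATRIX example above, copied verbatim.
MATRIX_XML = """
<MATRIX ID="1">
  <APPLY_TO_CHAIN CHAINID="A" NEW_CHAINID="C"/>
  <APPLY_TO_CHAIN CHAINID="B" NEW_CHAINID="D"/>
  <TMATRIX>
    <ROWX X="-0.50000000" Y="0.86602497" Z="0.00000000" T="-33.65000153"/>
    <ROWY X="-0.86602497" Y="-0.50000000" Z="0.00000000" T="58.28350830"/>
    <ROWZ X="0.00000000" Y="0.00000000" Z="1.00000000" T="0.00000000"/>
  </TMATRIX>
</MATRIX>
"""

def read_tmatrix(tmatrix):
    """Return ROWX, ROWY, ROWZ as (X, Y, Z, T) tuples of floats."""
    return [
        tuple(float(tmatrix.find(row).get(col)) for col in ("X", "Y", "Z", "T"))
        for row in ("ROWX", "ROWY", "ROWZ")
    ]

def transform(rows, point):
    """Apply new = x*X + y*Y + z*Z + T for each row to one (x, y, z) point."""
    x, y, z = point
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in rows)

matrix = ET.fromstring(MATRIX_XML.strip())
rows = read_tmatrix(matrix.find("TMATRIX"))
# Map each original chain to the chain generated from it.
chain_map = {a.get("CHAINID"): a.get("NEW_CHAINID")
             for a in matrix.findall("APPLY_TO_CHAIN")}

print(chain_map)
print(transform(rows, (1.0, 0.0, 0.0)))
```

Applying `transform` to every atom of chain A would give the coordinates of the generated chain C, and likewise chain B gives chain D.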
MEMBRANE record
The MEMBRANE record contains the information for the most likely
localization of the membrane relative to the molecule. This is
given by a transformation matrix, which transforms the molecule coordinates
in such a way that the membrane planes are parallel with the XY plane,
and the origin is in the middle of the membrane. The NORMAL record
contains the data of the membrane plane's normal vector. Because
of the matrix transformation, the X and Y components of the normal
vector should be zero or close to zero. The Z component of the
normal vector is half of the membrane width.
<MEMBRANE>
<NORMAL X="-0.00000050" Y="-0.00000021" Z="15.50000000"/>
<TMATRIX>
<ROWX X="1.00000000" Y="0.00000000" Z="0.00000000" T="0.00000260"/>
<ROWY X="0.00000000" Y="1.00000000" Z="0.00000000" T="-38.85568237"/>
<ROWZ X="0.00000000" Y="0.00000000" Z="1.00000000" T="-81.04399872"/>
</TMATRIX>
</MEMBRANE>
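Since the transformed membrane is centred on the origin with its planes parallel to the XY plane, a coordinate lies within the membrane slab when the absolute value of its transformed Z coordinate does not exceed the Z component of NORMAL. The sketch below illustrates this test on the example record; the helper names are hypothetical and the test points are made up for illustration.

```python
import xml.etree.ElementTree as ET

# MEMBRANE example from this manual, copied verbatim.
MEMBRANE_XML = """
<MEMBRANE>
  <NORMAL X="-0.00000050" Y="-0.00000021" Z="15.50000000"/>
  <TMATRIX>
    <ROWX X="1.00000000" Y="0.00000000" Z="0.00000000" T="0.00000260"/>
    <ROWY X="0.00000000" Y="1.00000000" Z="0.00000000" T="-38.85568237"/>
    <ROWZ X="0.00000000" Y="0.00000000" Z="1.00000000" T="-81.04399872"/>
  </TMATRIX>
</MEMBRANE>
"""

def read_tmatrix(tmatrix):
    """Return ROWX, ROWY, ROWZ as (X, Y, Z, T) tuples of floats."""
    return [
        tuple(float(tmatrix.find(row).get(col)) for col in ("X", "Y", "Z", "T"))
        for row in ("ROWX", "ROWY", "ROWZ")
    ]

def transform(rows, point):
    """Apply the transformation rows to one (x, y, z) point."""
    x, y, z = point
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in rows)

def in_membrane(membrane, point):
    """True if the transformed point lies between the two membrane planes."""
    rows = read_tmatrix(membrane.find("TMATRIX"))
    # Half of the membrane width; abs() guards against a flipped normal.
    half_width = abs(float(membrane.find("NORMAL").get("Z")))
    _, _, z = transform(rows, point)
    return abs(z) <= half_width

membrane = ET.fromstring(MEMBRANE_XML.strip())
print(in_membrane(membrane, (0.0, 38.85568237, 81.04399872)))  # maps to the membrane centre
print(in_membrane(membrane, (0.0, 0.0, 0.0)))                  # about 81 units below the centre
```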
CHAIN record
All protein chains generated using the biomatrix or listed in the
pdb file have a CHAIN record. This record has three attributes:
- CHAINID: the chain identifier given in the original pdb file or
generated by the biomatrix transformation;
- NUM_TM: the number of transmembrane segments;
- TYPE: the type of transmembrane segments (alpha, beta
or coil (i.e. non alpha and non beta)) or the type of the chain if
it does not cross the membrane (non_tm) or if it is not a protein chain (lipid).
Each CHAIN record contains one or more REGION records, which locate
each chain segment in space relative to the membrane. The type of a
REGION can be 1, 2, B, H, C, I, L, F or U for Side1, Side2, beta-strand,
alpha-helix, coil, membrane-inside, membrane-loop, interfacial helix and unknown
localizations, respectively. Side1 and Side2 refer to the two sides of the membrane (it is not possible
to determine which side is outside or inside based solely on the information in the PDB file).
Membrane-inside is the inside part of a beta barrel. Membrane-loop
corresponds to a region of the polypeptide chain that does not cross the membrane
but only dips into it (for example in aquaporins or potassium channels).
Interfacial helices are alpha-helical regions longer than 4 consecutive residues that are close to the membrane
surface with a tilt angle smaller than a pre-determined threshold.
The pdb_beg and pdb_end attributes give the segment position using the
pdb numbering, while seq_beg and seq_end use the numbering of the sequence
found in the SEQ record.
The sequence in SEQ record is generated by the alignment
<CHAIN CHAINID="A" NUM_TM="7" TYPE="alpha">
<SEQ>
AVRENALLSS SLWVNVALAG IAILVFVYMG RTIRPGRPRL IWGATLMIPL
VSISSYLGLL SGLTVGMIEM PAGHALAGEM VRSQWGRYLT WALSTPMILL
ALGLLADVDL GSLFTVIAAD IGMCVTGLAA AMTTSALLFR WAFYAISCAF
FVVVLSALVT DWAASASSAG TAEIFDTLRV LTVVLWLGYP IVWAVGVEGL
ALVQSVGATS WAYSVLDVFA KYVFAFILLR WVANNERTVA VAGQTLGTMS
SDD
</SEQ>
<REGION seq_beg="1" pdb_beg="22" seq_end="2" pdb_end="23" type="U"/>
<REGION seq_beg="3" pdb_beg="24" seq_end="9" pdb_end="30" type="1"/>
<REGION seq_beg="10" pdb_beg="31" seq_end="31" pdb_end="52" type="H"/>
<REGION seq_beg="32" pdb_beg="53" seq_end="38" pdb_end="59" type="2"/>
<REGION seq_beg="39" pdb_beg="60" seq_end="59" pdb_end="80" type="H"/>
<REGION seq_beg="60" pdb_beg="81" seq_end="84" pdb_end="105" type="1"/>
<REGION seq_beg="85" pdb_beg="106" seq_end="104" pdb_end="125" type="H"/>
<REGION seq_beg="105" pdb_beg="126" seq_end="109" pdb_end="130" type="2"/>
<REGION seq_beg="110" pdb_beg="131" seq_end="131" pdb_end="152" type="H"/>
<REGION seq_beg="132" pdb_beg="153" seq_end="138" pdb_end="159" type="1"/>
<REGION seq_beg="139" pdb_beg="160" seq_end="160" pdb_end="181" type="H"/>
<REGION seq_beg="161" pdb_beg="182" seq_end="176" pdb_end="197" type="2"/>
<REGION seq_beg="177" pdb_beg="198" seq_end="196" pdb_end="217" type="H"/>
<REGION seq_beg="197" pdb_beg="218" seq_end="208" pdb_end="229" type="1"/>
<REGION seq_beg="209" pdb_beg="230" seq_end="230" pdb_end="251" type="H"/>
<REGION seq_beg="231" pdb_beg="252" seq_end="241" pdb_end="262" type="2"/>
<REGION seq_beg="242" pdb_beg="263" seq_end="253" pdb_end="274" type="U"/>
</CHAIN>
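The CHAIN example above can be processed as sketched below to extract the sequence of each membrane-spanning helix. The sketch assumes that seq_beg and seq_end are 1-based and inclusive, which is consistent with the example (the last region ends at 253, the length of the sequence), and that the blanks in SEQ are layout only. The function and variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# CHAIN example from this manual; REGION attributes reordered for brevity.
SEQ_TEXT = """
AVRENALLSS SLWVNVALAG IAILVFVYMG RTIRPGRPRL IWGATLMIPL
VSISSYLGLL SGLTVGMIEM PAGHALAGEM VRSQWGRYLT WALSTPMILL
ALGLLADVDL GSLFTVIAAD IGMCVTGLAA AMTTSALLFR WAFYAISCAF
FVVVLSALVT DWAASASSAG TAEIFDTLRV LTVVLWLGYP IVWAVGVEGL
ALVQSVGATS WAYSVLDVFA KYVFAFILLR WVANNERTVA VAGQTLGTMS
SDD
"""
# (seq_beg, pdb_beg, seq_end, pdb_end, type) for each REGION in the example.
REGIONS = [
    (1, 22, 2, 23, "U"), (3, 24, 9, 30, "1"), (10, 31, 31, 52, "H"),
    (32, 53, 38, 59, "2"), (39, 60, 59, 80, "H"), (60, 81, 84, 105, "1"),
    (85, 106, 104, 125, "H"), (105, 126, 109, 130, "2"), (110, 131, 131, 152, "H"),
    (132, 153, 138, 159, "1"), (139, 160, 160, 181, "H"), (161, 182, 176, 197, "2"),
    (177, 198, 196, 217, "H"), (197, 218, 208, 229, "1"), (209, 230, 230, 251, "H"),
    (231, 252, 241, 262, "2"), (242, 263, 253, 274, "U"),
]
region_xml = "".join(
    f'<REGION seq_beg="{sb}" pdb_beg="{pb}" seq_end="{se}" pdb_end="{pe}" type="{t}"/>'
    for sb, pb, se, pe, t in REGIONS
)
CHAIN_XML = f'<CHAIN CHAINID="A" NUM_TM="7" TYPE="alpha"><SEQ>{SEQ_TEXT}</SEQ>{region_xml}</CHAIN>'

def chain_sequence(chain):
    """Return the SEQ text with all whitespace removed."""
    return "".join(chain.findtext("SEQ").split())

def segments(chain, region_type):
    """Return (pdb_beg, pdb_end, subsequence) for every REGION of the given type."""
    seq = chain_sequence(chain)
    result = []
    for region in chain.findall("REGION"):
        if region.get("type") == region_type:
            beg, end = int(region.get("seq_beg")), int(region.get("seq_end"))
            # 1-based inclusive positions become the Python slice [beg-1:end].
            result.append((int(region.get("pdb_beg")), int(region.get("pdb_end")),
                           seq[beg - 1:end]))
    return result

chain = ET.fromstring(CHAIN_XML)
helices = segments(chain, "H")
for pdb_beg, pdb_end, subseq in helices:
    print(pdb_beg, pdb_end, subseq)
```

In this example the number of type H regions equals the NUM_TM attribute of the chain.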
PDBTM XML Schema Definition
Detailed definition can be found here:
pdbtm.xsd