Analytical results of the GTOP database are stored as master files. A master file is a collection of one-line descriptions summarizing the analytical results for each protein; the entry for a protein begins with a GT:ID line and ends with the string "//". Each line begins with a 15-character header, which identifies the source or property of the information, followed by the description. When a protein has multiple pieces of information under one header, the header is repeated on multiple lines.
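The layout described above can be read with a short parser. The following is a minimal sketch, assuming that the header occupies exactly the first 15 characters of each line (padded with spaces) and that "//" stands alone on its own line; the function names and all sample values are hypothetical and are not taken from a real GTOP file.

```python
from collections import defaultdict

HEADER_WIDTH = 15  # header width stated in the format description


def parse_master(text):
    """Split master-file text into entries.

    Each entry is a dict mapping a header to the list of its values,
    because one header may occur on several lines of the same entry.
    """
    entries = []
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.strip() == "//":
            # "//" closes the current entry
            if current is not None:
                entries.append(dict(current))
            current = None
            continue
        header = line[:HEADER_WIDTH].strip()
        value = line[HEADER_WIDTH:].strip()
        if current is None:
            current = defaultdict(list)
        current[header].append(value)
    return entries


def make_line(header, value):
    """Build a line with the header padded to HEADER_WIDTH (assumed layout)."""
    return f"{header:<{HEADER_WIDTH}}{value}"


# Hypothetical sample with two entries; the values are made up.
sample = "\n".join([
    make_line("GT:ID", "hypo0001"),
    make_line("GT:GENE", "geneA"),
    make_line("SW:KW", "first keyword line"),
    make_line("SW:KW", "second keyword line"),
    "//",
    make_line("GT:ID", "hypo0002"),
    "//",
])
entries = parse_master(sample)
```

Keeping a list of values per header reflects the rule that a header may be repeated within one entry.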
GT:ID The GTOP ID
GT:GENE The gene name in GTOP
GT:PRODUCT The product description in GTOP
GT:DATABASE The accession number in the DDBJ/GenBank/EMBL database
GT:EXON Information on the exon-intron structure.
number of exons | first residue - last residue in the first exon:phase of intron | first residue - last residue in the second exon:phase of intron | ....
GT:ORG The organism abbreviation in GTOP
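The GT:EXON value described above can be split into its parts as follows. This is a minimal sketch, assuming that "|" separates the fields, "-" separates the first and last residue, and ":" precedes the intron phase, which may be absent for the last exon. The function name and the sample value are hypothetical, and the phase is kept as a string because its encoding is not specified here.

```python
import re

# first residue - last residue, optionally followed by ":" and a phase
EXON_RE = re.compile(r"^\s*(\d+)\s*-\s*(\d+)\s*(?::\s*(\S*))?\s*$")


def parse_exon(value):
    """Return a list of (first_residue, last_residue, phase) tuples."""
    fields = [f.strip() for f in value.split("|") if f.strip()]
    n_exons = int(fields[0])
    exons = []
    for field in fields[1:]:
        match = EXON_RE.match(field)
        if match is None:
            raise ValueError(f"unrecognized exon field: {field!r}")
        first, last, phase = match.groups()
        # phase is None when no ":" part is present or when it is empty
        exons.append((int(first), int(last), phase or None))
    if len(exons) != n_exons:
        raise ValueError("exon count does not match the number of exon fields")
    return exons


# Hypothetical value: two exons, with an intron of phase 0 after the first.
exons = parse_exon("2 | 1 - 50:0 | 51 - 120")
```

Checking the declared exon count against the parsed fields is a cheap way to notice lines that do not follow the assumed layout.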
GB:ACCESSION The accession number in the DDBJ/GenBank/EMBL database
GB:LOCATION The location of the gene in the genome
GB:FROM The start position of the gene
GB:TO The end position of the gene
GB:DIRECTION The orientation of the gene in the genome
GB:GENE The information in the "/gene" line in the "CDS" section
GB:PRODUCT The information in the "/product" line in the "CDS" section
GB:FUNCTION The information in the "/function" line in the "CDS" section
GB:NOTE The information in the "/note" line in the "CDS" section
GB:PROTEIN_ID The information in the "/protein_id" line in the "CDS" section
GB:GENE:GENE The information in the "/gene" line in the "gene" section
GB:GENE:NOTE The information in the "/note" line in the "gene" section
LENGTH The length of the entry
SQ:AASEQ The amino acid sequence
SQ:SECSTR Secondary structure information obtained by reverse PSI-BLAST search
Identification with Swiss-Prot entries is based on sequence similarity. A Swiss-Prot entry whose sequence is more than 90% identical to the query over more than half of the query length is regarded as identical to the query.
SW:ID The information in the "ID" line
SW:DE The information in the "DE" line
SW:GN The information in the "GN" line
SW:KW The information in the "KW" line(s)
SW:EXACT If the sequence is not exactly the same as that in the DDBJ/GenBank/EMBL database, an F is entered in this line.
SW:FUNC Sufficiency of functional information. See "How to assign function tag".
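The identification rule for Swiss-Prot entries can be written as a small predicate. This is a sketch only: it assumes that the covered length is counted in query residues and that both thresholds are strict ("more than"). The function and parameter names are hypothetical.

```python
def is_identical_to_query(percent_identity, aligned_length, query_length):
    """Apply the rule: more than 90% identity over more than half of the query.

    percent_identity -- identity of the aligned region, in percent
    aligned_length   -- number of query residues covered (assumed meaning)
    query_length     -- total length of the query sequence
    """
    covers_more_than_half = aligned_length > query_length / 2
    return percent_identity > 90.0 and covers_more_than_half


# Hypothetical numbers: 95% identity over 60 of 100 query residues.
result = is_identical_to_query(95.0, 60, 100)
```

A hit covering exactly half of the query, or with exactly 90% identity, does not qualify under this reading of the rule.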
PROS The region | ProSite ID | motif name | documentation ID of the motif |
SEG The region | amino acid sequence in the region |
In the master file, only representatives of each family of sequences
matching the query sequence are listed. When different domains of the query are hit by different families,
one representative of each family is chosen.
BL:SWS Information on results of BLAST search against Swiss-Prot
BL:SWS:NREP The number of the representative sequences
BL:SWS:REP Information on the representative sequences
region | Swiss-Prot ID | E value | % identity | (length of the similar region) / (length of the query) |
BL:PDB Results of BLAST search against PDB
BL:PDB:NREP The number of the representative sequences
BL:PDB:REP Information on the representative sequences
region | PDB ID | E value | % identity | (length of the similar region) / (length of the query) |
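The REP lines share a common pipe-separated layout, so one helper can read all of them. The sketch below assumes that the fields appear in the order given above, that the two lengths are written as "similar/query", and that a trailing "|" may be present. The region is kept as a raw string because its exact notation is not specified here. The class name, function name, and sample values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RepHit:
    region: str              # kept verbatim; notation not specified
    hit_id: str              # Swiss-Prot, PDB, PFAM, or SCOP ID
    e_value: float
    percent_identity: float
    similar_length: int      # length of the similar region
    query_length: int        # length of the query
    extra: List[str] = field(default_factory=list)  # any further fields


def parse_rep(value):
    """Parse one pipe-separated REP description into a RepHit."""
    fields = [f.strip() for f in value.split("|")]
    if fields and fields[-1] == "":
        fields.pop()  # drop the empty field left by a trailing "|"
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 fields, got {len(fields)}")
    region, hit_id, e_value, identity, lengths = fields[:5]
    # tolerate optional parentheses around the two lengths
    similar, query = lengths.replace("(", "").replace(")", "").split("/")
    return RepHit(
        region=region,
        hit_id=hit_id,
        e_value=float(e_value),
        percent_identity=float(identity.rstrip("%")),
        similar_length=int(similar),
        query_length=int(query),
        extra=fields[5:],
    )


# Hypothetical line; the values are made up for illustration.
hit = parse_rep("10-120 | 1abcA | 1e-30 | 45.2 | 111/300 |")
```

Collecting any fields beyond the fifth in `extra` lets the same helper read lines that carry additional trailing fields, such as a domain name.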
RP:PDB Results of reverse PSI-BLAST search against PDB
RP:PDB:NREP The number of the representative sequences
RP:PDB:REP Information on the representative sequences
region | PDB ID | E value | % identity | (length of the similar region) / (length of the query) |
RP:PFM Results of reverse PSI-BLAST search against PFAM
RP:PFM:NREP The number of the representative PFAM domains
RP:PFM:REP Information on the representative PFAM domains
region | PFAM ID | E value | % identity | (length of the similar region) / (length of the query) |
RP:SCP Results of reverse PSI-BLAST search against SCOP
RP:SCP:NREP The number of the representative sequences
RP:SCP:REP Information on the representative sequences
region | SCOP ID | E value | % identity | (length of the similar region) / (length of the query) |
HM:PFM:REP Information on the representative PFAM domains
region | PFAM ID | E value | % identity | (length of the similar region) / (length of the query) | PFAM name
HM:SCP:REP Information on the representative SCOP sequences
region | SCOP ID | E value | % identity | (length of the similar region) / (length of the query) | SCOP code | SCOP name
TM:NTM The number of the predicted transmembrane regions
TM:REGION The predicted transmembrane region
COIL:NAA The length of the total coiled-coil regions in amino acid residues
COIL:NSEG The number of the predicted coiled-coil regions
COIL:REGION The region
OP:NHOMO The number of the homologs found in the organisms in GTOP
OP:NHOMOORG The number of the organisms to which the homologs belong
OP:PATTERN A string summarizing homolog information. See also the "orgPattern" section in "What is GTOP?".
STR:NPRED The number of residues in the predicted secondary structure
STR:RPRED The coverage of the predicted region in the query
SQ:SECSTR The sequence representing secondary structure predicted by reverse PSI-BLAST (#, sites out of the hit regions)
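The relation between the SQ:SECSTR string and the two STR values can be illustrated with a short calculation. This is a sketch under stated assumptions: every character other than "#" is taken to be a residue with a predicted secondary structure, and the coverage is expressed as a fraction of the string length. Whether GTOP computes STR:NPRED and STR:RPRED in exactly this way is not stated in this description, and the function name and sample string are hypothetical.

```python
def secstr_coverage(secstr):
    """Return (predicted_residues, coverage) for a secondary-structure string.

    "#" marks sites outside the hit regions; all other characters are
    counted as predicted residues (assumed interpretation).
    """
    n_pred = sum(1 for ch in secstr if ch != "#")
    coverage = n_pred / len(secstr) if secstr else 0.0
    return n_pred, coverage


# Hypothetical string: 2 sites outside the hit regions, 6 predicted residues.
n_pred, coverage = secstr_coverage("##HHHHEE")
```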
A GO term is assigned when a query is similar to any Swiss-Prot protein or Pfam domain. The GO terms assigned to the hit Swiss-Prot sequences or Pfam domains are listed.
GO:SWS:NREP The number of GOs assigned by similarity to Swiss-Prot
GO:SWS Information on the assigned GOs
GO ID | description of the GO | Swiss-Prot keyword linked to the GO
GO:SWS:TREE Information on the hierarchical tree of the GO terms assigned by Swiss-Prot keywords.
The last word in the line is the assigned GO term, and the words preceding it
are those located in the lower level of the tree. For details of the tree structure, refer to the Gene Ontology™ Consortium.
GO:PFM:NREP The number of GOs assigned by similarity to Pfam domains
GO:PFM Information on the assigned GOs
GO ID | description of the GO | Pfam ID linked to the GO | InterPro ID linked to the GO
GO:PFM:TREE Information on the hierarchical tree of the GO terms assigned by Pfam searches.
The last word in the line is the assigned GO term, and the words preceding it
are those located in the lower level of the tree. For details of the tree structure, refer to the Gene Ontology™ Consortium.
DISOP:02AL Intrinsically disordered regions.
PSIPRED The sequence representing predicted secondary structure.