Query language for protein molecular structures

Dominika Wieczorek, Bożena Małysiak-Mrozek, Dariusz Mrozek

Abstract


The secondary structure representation of a protein provides important information about its overall construction and shape, and it is often used in protein similarity searching. Because existing commercial database management systems do not offer integrated methods for exploring biological data, for example at the level of the SQL language, structural similarity searching is usually performed by external tools. In this paper, we present our newly developed PSS-SQL language, which searches a database to identify proteins whose secondary structure is similar to the structure specified by the user in a PSS-SQL query. PSS-SQL thus provides a simple, declarative language for protein structure similarity searching.

Keywords


bioinformatics; proteins; secondary structure; similarity

Full Text:

PDF (Polish)





DOI: http://dx.doi.org/10.21936/si2010_v31.n2A.369