Object Oriented Representation and Processing of Genomic Data in Relational Database

Adam Pelikant

Abstract


The article presents a method of storing genomic data as objects in a relational database server. It describes how source data, held in loosely structured text files on FTP servers, is migrated into the database. It then describes the formal structure of the Common Language Runtime (CLR) class used to define the user-defined data type, together with implementations of its compulsory and optional methods. The paper also presents a set of implemented matching algorithms and shows how they are used to build an adherence matrix. Finally, it reports efficiency tests that demonstrate the advantages of the proposed algorithms.
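As an illustration of the kind of exact matching algorithm the abstract refers to, the following is a minimal sketch of Horspool's simplification of Boyer-Moore applied to a nucleotide string. The abstract does not say which algorithms the paper implements or how they are coded inside the CLR type, so the choice of algorithm, the function name `horspool_search`, and the sample sequence are all assumptions made for this example, and Python is used here only for brevity.

```python
def horspool_search(text: str, pattern: str) -> list[int]:
    """Return the start offsets of every occurrence of pattern in text.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # Bad-character table: for each character in the pattern (except the
    # last position), the distance from its rightmost occurrence to the
    # end of the pattern. Characters not in the table shift by m.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}

    hits = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        # Shift according to the text character aligned with the
        # last position of the pattern window.
        pos += shift.get(text[pos + m - 1], m)
    return hits


# Assumed sample data: a short made-up sequence containing the motif twice.
sequence = "GCTGGTGGAAGCTGGTGG"
motif = "GCTGGTGG"
print(horspool_search(sequence, motif))  # [0, 10]
```

Because the alphabet of a nucleotide sequence is very small, the shift table holds at most four entries, and the average shift is shorter than it would be for natural-language text. This is one reason efficiency comparisons between matching algorithms on genomic data are worth measuring rather than assuming.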


Keywords


genome; object oriented representation; relational database; searching methods; string matching






DOI: http://dx.doi.org/10.21936/si2017_v38.n4.825