Gene Ontology based gene analysis in graph database environment

Michał Kozielski, Łukasz Stypka


The article evaluates the application of the Neo4j graph database to the analysis of the Gene Ontology graph. Graph-based term similarity measures are calculated in order to assess the effectiveness of the system. Two types of common ancestor search are presented and evaluated, as is parallel execution of the analysis.


graph database; gene analysis; Gene Ontology term similarity; Gene Ontology

