Methods of normalizing the results of Gene Ontology term similarity

Łukasz Stypka, Michał Kozielski


The article addresses how to improve the quality of results when Gene Ontology (GO) term similarity is calculated. Several GO similarity measures produce results outside the range [0; 1]. However, comparing different similarity measures or applying further processing requires the results to be normalized to this range. The most popular and best-known normalization method is min-max normalization. The article introduces seven normalization functions with different characteristics that can improve the results of the analysis. The analysed methods are compared and evaluated on three different gene datasets.
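As a point of reference for the baseline mentioned above, min-max normalization can be sketched as follows. This is a minimal illustration only: the function name `min_max_normalize` and the sample scores are hypothetical, and the seven normalization functions introduced in the article are not reproduced here.

```python
from typing import List, Sequence


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Linearly rescale scores so that the smallest maps to 0 and the largest to 1."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Degenerate case: all scores are equal, so there is no range to rescale.
        # Returning zeros is a convention chosen for this sketch (an assumption).
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


# Hypothetical raw similarity values from a measure that is not bounded by 1.
raw_scores = [0.0, 2.5, 5.0, 10.0]
normalized = min_max_normalize(raw_scores)
```

Because the mapping is linear, it preserves the ordering and relative spacing of the scores, but the result depends on the observed minimum and maximum, so a single extreme value compresses all the others.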


Gene Ontology; Gene Ontology term similarity; normalization; normalization function

