Statistical methods for analysing proteomic data

Jolanta Kawulok, Joanna Polańska


This paper presents statistical tools for analysing mass spectra, intended to support the detection of cancer at an early stage. The main goal was to construct a classifier that best distinguishes people with cancer from a control group. The mass spectral signals are first pre-processed, then modelled with Gaussian mixtures, and finally classified. The results obtained confirm the effectiveness of the presented method.
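
As a rough illustration of the mixture-modelling step only, the sketch below fits a two-component Gaussian mixture to synthetic one-dimensional data using the expectation-maximisation (EM) algorithm. The synthetic data, the number of components, the quantile-based initialisation, and the names gaussian_pdf and fit_gmm_1d are assumptions made for this example. The abstract does not state how the authors fit their mixtures, nor which pre-processing steps or classifier they use.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of the normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_gmm_1d(samples, n_components=2, n_iter=200):
    """Fit a 1-D Gaussian mixture with plain EM (hypothetical helper).

    Returns (weights, means, stds), one entry per component.
    """
    ordered = sorted(samples)
    n = len(ordered)
    # Assumed initialisation: means at evenly spaced quantiles,
    # a shared spread equal to the overall standard deviation.
    means = [ordered[int((k + 0.5) * n / n_components)]
             for k in range(n_components)]
    overall_mean = sum(ordered) / n
    overall_std = math.sqrt(sum((x - overall_mean) ** 2 for x in ordered) / n)
    stds = [overall_std] * n_components
    weights = [1.0 / n_components] * n_components

    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in samples:
            dens = [w * gaussian_pdf(x, m, s)
                    for w, m, s in zip(weights, means, stds)]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])

        # M-step: re-estimate weight, mean and spread of each component.
        for k in range(n_components):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, samples)) / nk
            var = sum(r[k] * (x - means[k]) ** 2
                      for r, x in zip(resp, samples)) / nk
            stds[k] = max(math.sqrt(var), 1e-6)

    return weights, means, stds


# Synthetic stand-in for a pre-processed signal: values drawn from two
# known peaks (these numbers are invented, not taken from the paper).
random.seed(0)
samples = [random.gauss(1000.0, 20.0) for _ in range(600)]
samples += [random.gauss(1200.0, 30.0) for _ in range(400)]

weights, means, stds = fit_gmm_1d(samples)
for w, m, s in sorted(zip(weights, means, stds), key=lambda t: t[1]):
    print(f"weight={w:.2f}  mean={m:.1f}  std={s:.1f}")
```

In a pipeline of the kind the abstract outlines, the fitted weights, means and spreads could serve as a compact description of each spectrum and be passed to a classifier; that use is a plausible reading of the abstract rather than a confirmed detail of the method.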


mass spectrometry; Gaussian Mixture Models; classification


