Representations of text documents in the context of SPAM detection in Polish with English phrases

Piotr Andruszkiewicz


A representation of text documents should be as small as possible while still giving high classification accuracy. This paper presents representations of text documents and ways of reducing them for SPAM detection in Polish text containing English phrases.


text document representation; term weighting functions; TF-IDF; reduction of text document representation; classification; SPAM detection

