The mechanism of identification and classification of content

Artur Niewiarowski, Marek Stanuszek


This paper presents a mechanism for identifying and classifying content, based on term weighting with inverse document frequency analysis and the Levenshtein distance. The proposed mechanism is applied to the analysis of the topics and descriptions of selected diploma theses, to support the automatic selection of supervisors and reviewers.


keyword extraction; inverse document frequency; term frequency; Levenshtein distance; text mining; database mining
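
As a rough illustration of how the two techniques named in the abstract might be combined, the following minimal sketch weights terms by TF-IDF and then matches query terms against document terms with a Levenshtein tolerance. It is not the authors' implementation: the function names (`levenshtein`, `tf_idf`, `fuzzy_score`), the length-normalised term frequency, the natural-log IDF, the edit-distance threshold of 1, and the sample token lists are all assumptions made for illustration.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Per-document term weights: (count / doc length) * ln(N / document frequency)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return weights


def fuzzy_score(query_terms: list[str], weights: dict[str, float],
                max_distance: int = 1) -> float:
    """Sum the weights of document terms within max_distance edits of a query term."""
    return sum(w for t, w in weights.items()
               if any(levenshtein(t, q) <= max_distance for q in query_terms))


# Hypothetical data: tokenised descriptions of two areas of expertise.
docs = [["text", "mining", "keywords"], ["finite", "element", "mesh"]]
weights = tf_idf(docs)
# A misspelled thesis-topic term still matches the first description.
scores = [fuzzy_score(["minning"], w) for w in weights]
```

In this sketch the Levenshtein tolerance lets a misspelled or inflected term contribute its TF-IDF weight, so the description with the highest score would be the candidate match; the threshold would need tuning for real data.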

