Aggregation of textual data on example of press information system

Bartosz Dubel, Paweł Kasprowski

Abstract


The huge amount of textual information available on the Internet has become one of the most important problems, because such data is difficult to analyse automatically. Typical examples of such large text databases are web services that present press information. The same or very similar information is repeated across different services. This is why so-called “aggregators”, which collect and preprocess information from different services, are becoming more and more popular. This paper presents one such aggregator, which collects information from multiple services, parses and analyses it, and then attempts to classify it and gather various statistics.

Keywords


textual data; text aggregation; text parsing

Full Text:

PDF (Polish)




DOI: http://dx.doi.org/10.21936/si2011_v32.n2B.311