Representation and retrieval of text documents in database systems

Jakub Cieślewicz, Adam Pelikant

Abstract


The article addresses the problem of retrieving documents based on their actual content. It describes a model that represents continuous text as vectors, presents the mechanisms for assigning weights to individual document features, and describes a comparison algorithm based on the cosine measure between the vector representations of the documents and the query.
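The pipeline summarized above (vector representation, feature weighting, cosine comparison with a query) can be illustrated with a minimal sketch. The abstract does not name the weighting scheme, so TF-IDF is used here purely as an assumption; the whitespace tokenizer and all function names (`tokenize`, `tfidf_vectors`, `query_vector`, `cosine`, `rank`) are hypothetical and not taken from the article.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Build one sparse {term: weight} vector per document.

    Assumed weighting: relative term frequency times log(N / df).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors, df, n_docs


def query_vector(query, df, n_docs):
    """Weight the query with the collection statistics; unknown terms are skipped."""
    tokens = tokenize(query)
    tf = Counter(tokens)
    return {
        t: (c / len(tokens)) * math.log(n_docs / df[t])
        for t, c in tf.items()
        if t in df
    }


def cosine(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return (score, document index) pairs, best match first."""
    vectors, df, n_docs = tfidf_vectors(docs)
    q = query_vector(query, df, n_docs)
    return sorted(((cosine(q, v), i) for i, v in enumerate(vectors)), reverse=True)
```

Calling `rank("cosine vectors", docs)` on a small list of strings returns the documents ordered by decreasing cosine similarity to the query. A term that occurs in every document gets weight zero under this assumed scheme, which is why common words contribute nothing to the ranking.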

Keywords


business intelligence; text mining; data mining; Oracle

Full Text:

PDF (Polish)

DOI: http://dx.doi.org/10.21936/si2009_v30.n2A.489