Abstract
Centroid terms are single, descriptive words that characterise text documents semantically and topically. They can therefore serve as a very compact representation of those documents in automated text processing tasks that rely strongly on the semantic similarity of texts, and algorithms that classify and cluster documents make use of this information. This book introduces the novel, brain- and physics-inspired concept of centroid terms and discusses it in depth. It also covers their unique properties and their practical use in major natural language processing and text mining tasks, and presents a new graph-based method for calculating them quickly. In contrast to methods relying on the bag-of-words model, the derived centroid distance measure can uncover a topical relationship between texts even when their wording differs. As centroid terms can also represent short texts, the presented first fully integrated, P2P-based web search engine, called “WebEngine”, makes heavy use of...
Keywords
Centroids, Application, Text Processing, Text Centroid, Co-occurrence Graph, Spreading Activation, Text Categorisation, Librarian of the Web, P2P-system, Decentralised Search, WebEngine, Web Search Engine
Chapters
- Addendum, pp. 131–139
- Authors, pp. 140–144