Theory and Application of Text-representing Centroids
Abstract
Centroid terms are single, descriptive words that characterise text documents semantically and topically. They can therefore serve as very compact representations of those documents in automated text processing tasks that rely strongly on the semantic similarity of texts, and algorithms that classify and cluster documents make use of this information. This book introduces and discusses in depth the novel, brain- and physics-inspired concept of centroid terms. It also covers their unique properties and their practical use in major natural language processing and text mining tasks, and presents a new graph-based method for calculating them quickly. In contrast to methods relying on the bag-of-words model, the derived centroid distance measure can uncover a topical relationship between texts even when their wording differs. As centroid terms can also represent short texts, the first fully integrated, P2P-based web search engine presented here, called “WebEngine”, makes heavy use of...
Keywords
- I–VI
- 1–6 Is a 'Librarian of the Web' really needed?
- 7–26 Centroid Terms as Text Representatives
- 27–38 Spreading Activation: A Fast Calculation Method for Text Centroids
- 39–54 Empiric Experiments with Text-representing Centroids
- 55–78 Towards a Librarian of the Web
- 79–90 A Concept Supporting a Resilient, Fault-tolerant and Decentralised Search
- 91–106 An Associative Ring Memory to Support Decentralised Search
- 107–120 The WebEngine – A Fully Integrated, Decentralised Web Search Engine
- 121–130 On Evolving Text Centroids
- 131–139 Addendum
- 140–144 Authors