abstract |
A system for extracting and analyzing information from a collection of linked documents at a locality to enable categorization of documents and prediction of documents relevant to a focus document. The system obtains and analyzes topology, usage and path information from for a collection at a locality, e.g. a web locality on the world wide web. For categorization, document meta information is represented as document vectors. Predefined criteria is applied to the document vectors to create lists of "similar" types of documents. For relevance prediction, networks representing topology, usage path and text similarity amongst the documents in the collection are created. A spreading activation technique is applied to the networks starting at a focus document to predict the documents relevant to the focus document. Using category and relevance prediction information, tools can be built to enable a user to more efficiently traverse through the collection of linked documents. |