Selecting Information From Text

Exploitable results

In technical documentation, a common difficulty is pinpointing where the vital facts relating to a specific task are located. SIFT (Selecting Information From Text) helps users of instruction manuals for personal computer (PC) software. The project involves constructing a demonstration intelligent help system for online software manuals based on two key ideas: the Vector Space Model of information retrieval, and the use of distributed patterns to capture the meaning of textual information. The final prototype will accept a user's natural-language query about the software and return a list of pointers into the manual text indicating where passages answering the query might be found. The pointers will be arranged in descending order of relevance to the query, allowing the user to investigate the most promising parts of the text first.

The project will also serve to demonstrate the usefulness of distributed patterns in practical natural language processing (NLP) systems and their compatibility with existing work on lexical databases and robust lexicalist parsing. Two main prototypes are involved: SIFT-1 uses syntactic category information and semantic patterns to capture utterance meanings; SIFT-2 exploits semantic case relations obtained via robust parsing. A version of SIFT-1 with 735 entry points into the Lotus Ami Pro User's Guide is under evaluation. The main findings of the project so far are that the semantic distance measure works well, but that accurate sense disambiguation of the text is essential if concept-based retrieval is to outperform keyword methods.
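The ranked-retrieval idea described above can be illustrated with a minimal Vector Space Model sketch in Python. This is not the SIFT implementation: it is the plain keyword baseline that concept-based retrieval is compared against, using TF-IDF weights and cosine similarity in place of distributed semantic patterns. The function names (`tokenize`, `build_index`, `rank`), the smoothed IDF formula, and the passage pointers such as `"ch3.2"` are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(passages):
    """Build TF-IDF vectors for a dict mapping pointer -> passage text.

    Returns (idf, vectors); each vector is a sparse dict of term -> weight.
    """
    term_counts = {ptr: Counter(tokenize(text)) for ptr, text in passages.items()}
    n = len(passages)
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    # Smoothed inverse document frequency (an assumed weighting scheme).
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in doc_freq.items()}
    vectors = {
        ptr: {t: c * idf[t] for t, c in counts.items()}
        for ptr, counts in term_counts.items()
    }
    return idf, vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, idf, vectors):
    """Return (pointer, score) pairs in descending order of relevance."""
    counts = Counter(tokenize(query))
    # Query terms that never occur in the manual carry no weight.
    q = {t: c * idf[t] for t, c in counts.items() if t in idf}
    scored = [(ptr, cosine(q, vec)) for ptr, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# Hypothetical manual passages keyed by invented section pointers.
passages = {
    "ch3.2": "To print a document choose Print from the File menu",
    "ch5.1": "Create a table by choosing Tables from the Tools menu",
    "ch7.4": "Change the font and point size of selected text",
}
idf, vectors = build_index(passages)
for pointer, score in rank("How do I print a document?", idf, vectors):
    print(pointer, round(score, 3))
```

Because this baseline matches on surface words only, a query phrased with a synonym of "print" would score zero against the relevant passage. That is the gap a concept-based representation aims to close, and it is also why accurate sense disambiguation matters once words are mapped to meanings.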