
Method and System for Identifying Significant Topics of a Document

Summary
Lead Inventors: Faye (Nina) P. Wacholder, PhD

Problem or Unmet Need: Statistical techniques for identifying keywords in a document traditionally rely on calculating the frequency of individual stems and/or words. However, stems are ambiguous ("trad" might refer to "trader" or "tradition"), as are the meanings of isolated words ("state" might mean a political entity or a mode of being). As a result, the keyword lists these techniques produce often fail to represent the aboutness of a document accurately. Algorithms have since been developed to process proper nouns, as well as technical terms in scientific documents, but these algorithms are not domain-general. That is, they are not suited to identifying common noun phrases in an open-ended set of document types, particularly short articles.

The invention is a method for identifying significant topics of a document, comprising the following steps: (1) extracting from a document a list of simplex noun phrases (NPs) and their corresponding heads, (2) clustering the simplex NPs by head, and (3) ranking the clustered simplex NPs by head in accordance with a significance measure. Because a simplex NP may contain a determiner, an adjective, and a noun, but no further elements such as a preposition or participial verb, its head noun is always the last word of the phrase in English. Simplex NPs are semantically and syntactically coherent, so they represent content adequately even outside the context of the entire document, which makes the invention useful for various natural language processing applications.
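The three steps described above can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the input has already been part-of-speech tagged with a hypothetical coarse tag set (DET, ADJ, NOUN, and anything else), and because the listing does not specify the significance measure, it substitutes plain cluster size as a stand-in. The function names are illustrative only.

```python
from collections import defaultdict

# Hypothetical coarse tags allowed inside a simplex NP (an assumption of
# this sketch; a real system would take tags from a POS tagger).
NP_TAGS = {"DET", "ADJ", "NOUN"}


def extract_simplex_nps(tagged_tokens):
    """Step 1: return (phrase, head) pairs; the head is the last noun."""
    nps, current = [], []

    def flush():
        # Drop trailing determiners/adjectives so the phrase ends at a noun.
        while current and current[-1][1] != "NOUN":
            current.pop()
        if current:
            words = [w for w, _ in current]
            nps.append((" ".join(words), words[-1].lower()))
        current.clear()

    for word, tag in tagged_tokens:
        if tag not in NP_TAGS:
            flush()  # a preposition, verb, etc. ends the current phrase
            continue
        if tag != "NOUN" and current and current[-1][1] == "NOUN":
            flush()  # a determiner or adjective after a noun starts a new NP
        current.append((word, tag))
    flush()
    return nps


def cluster_by_head(nps):
    """Step 2: group simplex NPs under their shared head."""
    clusters = defaultdict(list)
    for phrase, head in nps:
        clusters[head].append(phrase)
    return dict(clusters)


def rank_clusters(clusters):
    """Step 3: rank heads. Assumed measure: cluster size, then alphabetical."""
    return sorted(clusters.items(), key=lambda kv: (-len(kv[1]), kv[0]))


tagged = [
    ("The", "DET"), ("federal", "ADJ"), ("court", "NOUN"), ("ruled", "VERB"),
    ("that", "OTHER"), ("the", "DET"), ("appeals", "NOUN"), ("court", "NOUN"),
    ("erred", "VERB"), ("in", "PREP"), ("a", "DET"), ("tax", "NOUN"),
    ("case", "NOUN"), (".", "PUNCT"),
]
ranked = rank_clusters(cluster_by_head(extract_simplex_nps(tagged)))
for head, phrases in ranked:
    print(head, phrases)
```

In this toy input, "The federal court" and "the appeals court" share the head "court", so that cluster outranks the single phrase headed by "case". Any other significance measure could replace the sort key in `rank_clusters` without changing the first two steps.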
Technology Benefits
Simplex noun phrases, in contrast with individual stems or words, are unambiguous
Simplex noun phrases are semantically and syntactically coherent
Domain-general method, which does not depend on domain-specific particulars of a document
Extensive applications in the current information-rich but knowledge-poor environment
Technology Application
Automatic indexing of print or electronic texts
Summarization of document content
Keyword content with which to filter documents relevant for a specific query
Advanced information extraction where information about a specific topic from different parts of a document can be merged
Detailed Technology Description
The invention is a method for identifying significant topics of a document, comprised of the following steps: (1) extracting from a document a list of simplex noun phrases (NPs) and their corresponding heads, (2) clustering the simplex NPs by head, a...
*Inquiry
Calvin Chu Columbia Technology Ventures Tel: (212) 854-8444 Email: TechTransfer@columbia.edu
*IR
MS98/05/05
*Web Links
"Method and system for indentifying significant topics of a document": USPTO issued patent
Country/Region
USA
