
Method and System for Identifying Significant Topics of a Document

Summary
Lead Inventors: Faye (Nina) P. Wacholder, PhD

Problem or Unmet Need: Statistical techniques for identifying keywords in a document have traditionally relied on the frequency of individual stems and/or words. However, stems are ambiguous ("trad" might refer to "trader" or "tradition"), and so are isolated words ("state" might mean a political entity or a mode of being). As a result, the keyword lists these techniques produce often do not accurately represent what a document is about. Algorithms have since been developed to process proper nouns and technical terms in scientific documents, but these algorithms are not domain-general: they are not suited to identifying common noun phrases across an open-ended set of document types, particularly short articles.

The invention is a method for identifying significant topics of a document, comprising the following steps: (1) extracting from a document a list of simplex noun phrases (NPs) and their corresponding heads, (2) clustering the simplex NPs by head, and (3) ranking the clustered simplex NPs by head according to a significance measure. Because a simplex NP may contain a determiner, an adjective, and a noun, but no further parts of speech such as a preposition or a participial verb, its head noun is always the last word of the phrase in English. Simplex NPs are semantically and syntactically coherent, so they represent content adequately even outside the context of the full document, which makes the invention useful for a variety of natural language processing applications.
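The three steps can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: it takes text that has already been part-of-speech tagged with Penn Treebank tags (tokenizing and tagging are out of scope), treats a simplex NP as an optional determiner followed by adjectives and nouns and ending in a noun, and, because the significance measure is not specified here, assumes it is simply the number of NP occurrences sharing a head. The function names and the sample sentences are hypothetical.

```python
from collections import defaultdict

# Assumption: input is already POS-tagged with Penn Treebank tags.
DETERMINERS = {"DT"}
ADJECTIVES = {"JJ", "JJR", "JJS"}
NOUNS = {"NN", "NNS", "NNP", "NNPS"}

# Hypothetical sample: pre-tagged tokens for two short sentences.
SAMPLE_TAGGED = [
    ("The", "DT"), ("state", "NN"), ("budget", "NN"), ("grew", "VBD"),
    ("while", "IN"), ("the", "DT"), ("federal", "JJ"), ("budget", "NN"),
    ("shrank", "VBD"), (".", "."),
    ("Budget", "NN"), ("talks", "NNS"), ("stalled", "VBD"), ("in", "IN"),
    ("the", "DT"), ("state", "NN"), ("legislature", "NN"), (".", "."),
]


def extract_simplex_nps(tagged):
    """Step 1: return (phrase, head) pairs for the simplex NPs in `tagged`.

    Any token that is not a determiner, adjective or noun (a preposition,
    a verb, punctuation, ...) ends the current phrase; the head is the
    phrase's last word, which is always a noun.
    """
    results = []
    current = []  # (word, tag) pairs of the phrase being built

    def flush():
        # Drop trailing non-nouns so the phrase ends in its head noun.
        while current and current[-1][1] not in NOUNS:
            current.pop()
        if current:
            words = [word for word, _ in current]
            results.append((" ".join(words), words[-1]))
        current.clear()

    for word, tag in tagged:
        if tag in DETERMINERS:
            flush()  # a determiner can only open a new phrase
            current.append((word, tag))
        elif tag in ADJECTIVES or tag in NOUNS:
            current.append((word, tag))
        else:
            flush()
    flush()
    return results


def cluster_by_head(nps):
    """Step 2: group simplex NPs under their lower-cased head noun."""
    clusters = defaultdict(list)
    for phrase, head in nps:
        clusters[head.lower()].append(phrase)
    return clusters


def rank_heads(clusters):
    """Step 3: rank head clusters by a significance measure.

    Assumption: significance = number of NP occurrences sharing the head;
    ties are broken alphabetically so the output is deterministic.
    """
    return sorted(clusters.items(), key=lambda item: (-len(item[1]), item[0]))


if __name__ == "__main__":
    ranked = rank_heads(cluster_by_head(extract_simplex_nps(SAMPLE_TAGGED)))
    for head, phrases in ranked:
        print(head, len(phrases), phrases)
    # budget 2 ['The state budget', 'the federal budget']
    # legislature 1 ['the state legislature']
    # talks 1 ['Budget talks']
```

In this sample, "state" appears only as a modifier inside larger phrases and never becomes a topic on its own. A fuller system would also normalize plural heads (for example "budget" and "budgets") before clustering, which this sketch does not do.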
Technical Advantages
Simplex noun phrases, in contrast with individual stems or words, are unambiguous
Simplex noun phrases are semantically and syntactically coherent
Domain-general method that does not depend on domain-specific particulars of a document
Broad applicability in today's information-rich but knowledge-poor environment
Applications
Automatic indexing of print or electronic texts
Summarization of document content
Keyword content for filtering documents relevant to a specific query
Advanced information extraction, in which information about a specific topic from different parts of a document can be merged
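For the query-filtering application, one possible use of ranked topic heads is to keep only documents in which a query term is among the most significant topics. The sketch below is an illustrative assumption, not part of the described method: it takes a hypothetical index mapping document ids to their head nouns in ranked order, and the cutoff `top_k` is an arbitrary, hypothetical parameter.

```python
def filter_documents(topic_index, query_terms, top_k=3):
    """Return ids of documents whose top_k ranked heads contain a query term.

    topic_index: hypothetical mapping of document id -> head nouns, most
    significant first. Results are ordered by the best rank of a match.
    """
    wanted = {term.lower() for term in query_terms}
    hits = []
    for doc_id, heads in topic_index.items():
        for rank, head in enumerate(heads[:top_k]):
            if head.lower() in wanted:
                hits.append((rank, doc_id))
                break  # only the best-ranked match counts
    return [doc_id for _, doc_id in sorted(hits)]


if __name__ == "__main__":
    # Hypothetical index; heads are listed in ranked order.
    index = {
        "doc-a": ["budget", "legislature", "talks"],
        "doc-b": ["trader", "market", "budget"],
        "doc-c": ["tradition", "festival"],
    }
    print(filter_documents(index, ["budget"]))           # ['doc-a', 'doc-b']
    print(filter_documents(index, ["budget"], top_k=2))  # ['doc-a']
    print(filter_documents(index, ["Trader"]))           # ['doc-b']
```

Because whole head nouns are matched, a query for "trader" does not retrieve a document whose topic is "tradition", a distinction that the stem "trad" alone could not make.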
Detailed Technical Description
The invention is a method for identifying significant topics of a document, comprising the following steps: (1) extracting from a document a list of simplex noun phrases (NPs) and their corresponding heads, (2) clustering the simplex NPs by head, a...
*Abstract
None
*Inquiry
Calvin Chu Columbia Technology Ventures Tel: (212) 854-8444 Email: TechTransfer@columbia.edu
*IR
MS98/05/05
*Principal Investigation
*Web Links
"Method and system for identifying significant topics of a document": USPTO issued patent
Country/Region
United States
