Method and System for Identifying Significant Topics of a Document

Summary
Lead Inventors: Faye (Nina) P. Wacholder, PhD

Problem or Unmet Need: Statistical techniques for identifying keywords in a document traditionally rely on calculating the frequency of individual stems and/or words. However, stems are ambiguous ("trad" might refer to "trader" or "tradition"), as are the meanings of isolated words ("state" might mean a political entity or a mode of being). Thus, the resulting keyword lists often do not accurately represent the aboutness of a document. Algorithms have since been developed to process proper nouns, as well as technical terms in scientific documents, but these algorithms are not domain-general. That is, they are not suited for identifying common noun phrases in an open-ended set of document types, particularly short articles.

The invention is a method for identifying significant topics of a document, comprising the following steps: (1) extracting from a document a list of simplex noun phrases (NPs) and their corresponding heads, (2) clustering the simplex NPs by head, and (3) ranking the clustered simplex NPs by head in accordance with a significance measure. Because a simplex NP may contain a determiner, an adjective, and a noun, but no further elements such as a preposition or participial verb, its head noun is always the last word of the simplex NP in English. Simplex NPs, which are semantically and syntactically coherent, provide adequate content representation out of the context of the entire document, thus making the invention useful for various natural language processing applications.
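
The three steps of the method can be sketched roughly as follows. This is only an illustration, not the patented implementation: the listing does not say which part-of-speech tagger or which significance measure is used, so the hand-tagged input, the toy tag set (DET, ADJ, NOUN), the function names, and the choice to rank a head by the number of simplex NPs that share it are all assumptions.

```python
from collections import defaultdict

# Assumed toy tag set: only these tags may appear inside a simplex NP.
NP_TAGS = {"DET", "ADJ", "NOUN"}

# Hypothetical hand-tagged input; a real system would run a POS tagger here.
TAGGED = [
    ("The", "DET"), ("bond", "NOUN"), ("trader", "NOUN"), ("met", "VERB"),
    ("a", "DET"), ("senior", "ADJ"), ("trader", "NOUN"), ("at", "PREP"),
    ("the", "DET"), ("state", "NOUN"), ("bank", "NOUN"), (".", "PUNCT"),
    ("The", "DET"), ("young", "ADJ"), ("trader", "NOUN"), ("left", "VERB"),
    ("the", "DET"), ("bank", "NOUN"), (".", "PUNCT"),
]


def extract_simplex_nps(tagged):
    """Step 1: return (phrase, head) pairs for runs of DET/ADJ/NOUN tokens."""
    nps, run = [], []

    def flush():
        # Drop trailing non-nouns so the last word of the run is the head noun.
        while run and run[-1][1] != "NOUN":
            run.pop()
        if run:
            words = [w.lower() for w, _ in run]
            nps.append((" ".join(words), words[-1]))
        run.clear()

    for word, tag in tagged:
        # Any other tag ends the phrase; a determiner starts a new one.
        if tag not in NP_TAGS or (tag == "DET" and run):
            flush()
        if tag in NP_TAGS:
            run.append((word, tag))
    flush()
    return nps


def cluster_by_head(nps):
    """Step 2: group simplex NPs under their head (the last word)."""
    clusters = defaultdict(list)
    for phrase, head in nps:
        clusters[head].append(phrase)
    return dict(clusters)


def rank_heads(clusters):
    """Step 3: rank heads by an ASSUMED measure, the number of NPs per head."""
    return sorted(clusters.items(), key=lambda kv: (-len(kv[1]), kv[0]))


ranked = rank_heads(cluster_by_head(extract_simplex_nps(TAGGED)))
for head, phrases in ranked:
    print(head, len(phrases), phrases)
```

On this toy input, the head "trader" groups three simplex NPs and "bank" groups two, so "trader" ranks first. Note how "state", ambiguous as an isolated word, only appears inside the coherent phrase "the state bank".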
Technical Advantages
Simplex noun phrases, in contrast with individual stems or words, are unambiguous
Simplex noun phrases are semantically and syntactically coherent
Domain-general method, which does not depend on domain-specific particulars of a document
Extensive applications in the current information-rich but knowledge-poor environment
Technical Applications
Automatic indexing of print or electronic texts
Summarization of document content
Keyword content with which to filter documents relevant to a specific query
Advanced information extraction, where information about a specific topic from different parts of a document can be merged
Detailed Technical Description
The invention is a method for identifying significant topics of a document, comprised of the following steps: (1) extracting from a document a list of simplex noun phrases (NPs) and their corresponding heads, (2) clustering the simplex NPs by head, a...
*Abstract
None
*Inquiry
Calvin Chu, Columbia Technology Ventures
Tel: (212) 854-8444
Email: TechTransfer@columbia.edu
*IR
MS98/05/05
*Principal Investigation
*Web Links
"Method and system for indentifying significant topics of a document": USPTO issued patent
Country/Region
United States
