
Computer-implemented method for data classification and hierarchical clustering

Summary
Lead Inventors: Hassan Haider Malik

Problem or Unmet Need: Classification is a fundamental machine learning problem with broad application to areas that depend on sub-grouping data that may be presented in a variety of forms (numeric, text, etc.). Applications such as internet search, customized advertising, email filtering and computational biology need an efficient method of categorizing large amounts of data. This technology presents a highly efficient classification algorithm that identifies features with a large information value, which is defined during training. It has also shown promise on highly sparse or unbalanced data, with unmatched accuracy on web page classification in comparison to other methods.

This technology takes a unique approach to the classification problem: it calculates a score for every pair of features in a training instance, considering global, local and class-based importance. A novel score adjustment scheme is applied, and test instances are classified using this metric. Data is split into training and testing sets for k-fold cross validation. Training instances are traversed and a score is calculated for each feature (item) pair, based on frequency and "global interestingness." The top-scoring item sets are selected and placed in a class-item set tree, and scores are adjusted using a scheme empirically identified to have the best performance.
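The pairwise scoring idea described above can be illustrated with a minimal Python sketch. It is not the inventors' algorithm: the listing does not give the actual scoring formula, the score adjustment scheme, or the layout of the class-item set tree, so all of those are left out or replaced by stand-ins. The function names `train` and `classify`, the `top_k` parameter, and the score itself (class support multiplied by confidence) are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def train(instances, labels, top_k=50):
    """Score feature pairs per class in a single pass over the training data.

    Assumed stand-in score (not from the source): class support of the pair
    multiplied by its confidence, i.e. the fraction of the pair's
    occurrences that fall in that class.
    """
    pair_class_count = defaultdict(lambda: defaultdict(int))
    pair_total = defaultdict(int)
    class_size = defaultdict(int)

    for features, label in zip(instances, labels):
        class_size[label] += 1
        # Each unordered pair of distinct features is counted once per instance.
        for pair in combinations(sorted(set(features)), 2):
            pair_class_count[pair][label] += 1
            pair_total[pair] += 1

    scores = defaultdict(dict)
    for pair, by_class in pair_class_count.items():
        for label, count in by_class.items():
            support = count / class_size[label]
            confidence = count / pair_total[pair]
            scores[label][pair] = support * confidence

    # Keep only the top_k highest-scoring pairs for each class.
    model = {}
    for label, pair_scores in scores.items():
        ranked = sorted(pair_scores.items(), key=lambda kv: (-kv[1], kv[0]))
        model[label] = dict(ranked[:top_k])
    return model


def classify(model, features):
    """Return the class whose retained pairs best match the test instance."""
    present = set(combinations(sorted(set(features)), 2))
    totals = {
        label: sum(s for pair, s in pair_scores.items() if pair in present)
        for label, pair_scores in model.items()
    }
    # Ties are broken alphabetically by class label.
    return max(sorted(totals), key=totals.get)


if __name__ == "__main__":
    docs = [
        ["cheap", "pills", "buy"],
        ["buy", "cheap", "now"],
        ["meeting", "agenda", "notes"],
        ["agenda", "notes", "team"],
    ]
    tags = ["spam", "spam", "ham", "ham"]
    model = train(docs, tags)
    print(classify(model, ["cheap", "buy", "today"]))
```

In practice, a sketch like this would be evaluated with the k-fold split mentioned above, training on k-1 folds and classifying the held-out fold in turn.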
Technical Advantages
Efficient: processes each feature only once and stores this knowledge in a compact form
Provides a pattern-based hierarchical clustering technique that can build a cluster hierarchy without requiring mining for globally significant patterns
Applications
Text categorization, particularly useful over the internet
Customized Internet advertising based on the above
Filtering spam from non-spam email
Distinguishing fraudulent credit card transactions from valid ones
Authentication based on face, speech or handwriting recognition
Computational biology: separating disease from non-disease patients
Detailed Technical Description
This technology presents a unique approach to the classification problem, which calculates a score for every pair of features in the training instance, considering global, local and class-based importance. A novel score adjustment scheme is app...
*Abstract
None
*Inquiry
Calvin Chu, Columbia Technology Ventures
Tel: (212) 854-8444
Email: TechTransfer@columbia.edu
*IR
M07-091
*Principal Investigation
*Web Links
WIPO: WO/2008/154029
Country/Region
United States
