Computer-implemented method for data classification and hierarchical clustering


Summary

Lead Inventors: Hassan Haider Malik

Problem or Unmet Need: Classification is a fundamental machine learning problem with broad application to areas that depend on sub-grouping data, which may be presented in a variety of forms (numeric, text, etc.). Applications such as internet search, customized advertising, email filtering and computational biology need an efficient method of categorizing large amounts of data.

This technology presents a highly efficient classification algorithm that identifies features with a large information value, as defined in the training mode. It has also shown promise on highly sparse or unbalanced data, and is reported to achieve unmatched accuracy on web page classification in comparison to other methods.

The approach calculates a score for every pair of features in a training instance, considering global, local and class-based importance. A novel score adjustment scheme is applied, and test instances are classified using this metric. Data is split into training and testing sets for k-fold cross validation. Training instances are traversed and a score is calculated for each feature (item) pair, based on frequency and "global interestingness." The top-scoring item sets are selected and placed in a class-item set tree, and scores are adjusted using a scheme empirically identified to have the best performance.
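The k-fold split mentioned above can be illustrated with a minimal sketch. The function name `k_fold_splits` is hypothetical, and the contiguous, unshuffled folds are an assumption made for clarity; the listing does not say how the inventors partition the data.

```python
def k_fold_splits(n_instances, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds.

    Assumption: folds are contiguous and unshuffled. A real evaluation
    would normally shuffle or stratify the instances first.
    """
    if not 2 <= k <= n_instances:
        raise ValueError("k must be between 2 and the number of instances")
    indices = list(range(n_instances))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_instances // k + (1 if i < n_instances % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


if __name__ == "__main__":
    for train, test in k_fold_splits(10, 3):
        print("train:", train, "test:", test)
```

Each instance appears in exactly one test fold, so every instance is used for testing once and for training k - 1 times.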


Technical Advantages

  • Efficient: processes each feature only once and stores this knowledge in a compact form
  • Provides a pattern-based hierarchical clustering technique that can build a cluster hierarchy without requiring mining for globally significant patterns


Applications

  • Text categorization, particularly useful over the internet
  • Customized internet advertising based on text categorization
  • Filtering spam from non-spam email
  • Distinguishing fraudulent credit card transactions from valid ones
  • Authentication based on face, speech or handwriting recognition
  • Computational biology: separating disease from non-disease patients


Detailed Technical Description

This technology presents a unique approach to the classification problem, which calculates a score for every pair of features in the training instance, considering global, local and class-based importance. A novel score adjustment scheme is app...
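As a rough illustration of pairwise feature scoring, the sketch below makes one pass over the training instances, scores each (feature pair, class) combination, keeps the top-scoring ones, and classifies a test instance by summing the scores of the pairs it contains. The scoring formula (class-conditional count multiplied by the share of the pair's occurrences that fall in that class) is a stand-in chosen for illustration. The listing does not give the actual formula, the score adjustment scheme, or the class-item set tree, and all function names here are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def score_pairs(instances, labels):
    """Score every (feature pair, class) seen in the training data."""
    pair_class_count = defaultdict(lambda: defaultdict(int))
    pair_total = defaultdict(int)
    for features, label in zip(instances, labels):
        # Sort so the same two features always form the same pair key.
        for pair in combinations(sorted(set(features)), 2):
            pair_class_count[pair][label] += 1
            pair_total[pair] += 1
    scores = {}
    for pair, per_class in pair_class_count.items():
        for label, count in per_class.items():
            # Stand-in score (assumption): frequency in the class times
            # how concentrated the pair is in that class.
            scores[(pair, label)] = count * (count / pair_total[pair])
    return scores


def top_pairs(scores, k):
    """Return the k highest-scoring (pair, class) entries; ties break by key."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


def classify(features, selected):
    """Pick the class whose selected pairs in the instance have the largest total score."""
    present = set(features)
    totals = defaultdict(float)
    for (pair, label), score in selected:
        if pair[0] in present and pair[1] in present:
            totals[label] += score
    return max(totals, key=totals.get) if totals else None


if __name__ == "__main__":
    train = [["cheap", "pills", "buy"], ["cheap", "pills", "now"],
             ["meeting", "agenda", "now"], ["meeting", "agenda", "notes"]]
    labels = ["spam", "spam", "ham", "ham"]
    selected = top_pairs(score_pairs(train, labels), 2)
    print(classify(["cheap", "pills", "today"], selected))
```

On this toy data the two retained pairs are ("cheap", "pills") for spam and ("agenda", "meeting") for ham, so an instance containing neither pair gets no prediction (`None`).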


Country/Region

United States
