Integrate Multiple Hypothesis Tests to Control False Discovery Rate (FDR)
- Technology Benefits
- Better performance in analyzing high-dimensional data and identifying subtle yet meaningful changes.
- Technology Application
- Applicable in fields including biotech, healthcare, pharmaceuticals, and finance.
- Detailed Technology Description
- False discovery rate (FDR) control is essential for identifying significant features when analyzing high-dimensional datasets (e.g., genome-wide datasets). Conventional FDR-controlling methods use a single statistical hypothesis test to call significant features. Each statistical test has its own advantages in picking up different aspects of the differential information between two populations. To fully utilize the advantages of different statistical tests, this invention discloses an integrated statistic, Composite-Index, that combines multiple statistical tests, and formulates FDR control as an optimization problem. This invention also develops an algorithm, named Composite-Cut, which implements a special case of the above concept.
- Countries
- United States
- Application No.
- 15/518,403
- *Abstract
-
Composite-Cut integrates multiple base statistics into a new statistic, Composite-Index, to maximize the utilization of differential information, which makes Composite-Cut substantially more powerful in detecting differential features, especially those with subtle signals. A series of comparisons on simulation datasets, DNA microarray datasets, and RNA-seq gene expression datasets has been conducted to demonstrate that Composite-Cut significantly outperforms existing approaches, such as the Benjamini-Hochberg approach, the Storey approach, Significance Analysis of Microarrays, voom, limma, DESeq/DESeq2, PoissonSeq, edgeR, NBPSeq, EBSeq, baySeq, ShrinkSeq, and so on. The results were supported by various supplementary analyses, such as literature search, gene ontology enrichment analysis, gene set enrichment analysis, survival analysis, dependency analysis, and classification analysis. Evidence from the literature suggests that the genes called significant only by Composite-Cut are indeed relevant to the underlying biology. Composite-Cut has the ability to dig deeper into data and is more sensitive to subtle yet statistically significant evidence while resisting the effects of noise. The experimental results also validated that Composite-Cut is capable of identifying relatively more subtle changes (e.g., features with small fold-changes). Such subtle changes were shown to be relevant to the underlying biology. In complex systems, detecting subtle changes can be extremely important because the systematic aggregation and propagation of subtle upstream changes can cause dramatic downstream effects, which are easier to detect. This invention brings such a capability, which ultimately will lead to more insight, discovery, and knowledge in practice.
The invention can be widely practiced to analyze other types of high-dimensional datasets that require multiple-comparisons correction, in many fields including biotech, healthcare, pharmaceutical, finance, and so on. The executable of Composite-Cut can be downloaded from http://www.cs.brandeis.edu/~hong/CC/.
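For context, the conventional single-test FDR control that the abstract uses as a baseline can be sketched as follows. This is a minimal illustration of the standard Benjamini-Hochberg step-up procedure applied to p-values from one statistical test; it is not the Composite-Cut algorithm or the Composite-Index statistic, whose details are not given in this listing. The function name `benjamini_hochberg` and the parameter `alpha` are assumptions chosen for illustration.

```python
import numpy as np


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean mask of features called significant at FDR level alpha.

    Illustrative baseline only: standard Benjamini-Hochberg step-up procedure
    on p-values from a single test (not Composite-Cut).
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    if m == 0:
        return np.zeros(0, dtype=bool)

    order = np.argsort(p)                          # indices of p-values, ascending
    thresholds = alpha * np.arange(1, m + 1) / m   # i * alpha / m for rank i
    passed = p[order] <= thresholds

    significant = np.zeros(m, dtype=bool)
    if passed.any():
        # Step-up rule: find the largest rank whose p-value passes,
        # then reject every hypothesis at or below that rank.
        k = np.nonzero(passed)[0].max()
        significant[order[: k + 1]] = True
    return significant


# Example with made-up p-values (not data from the invention).
mask = benjamini_hochberg([0.02, 0.03, 0.9], alpha=0.05)
print(mask)
```

Because the procedure is step-up, a feature whose own p-value misses its rank threshold can still be called significant when a higher-ranked p-value passes. Any approach that combines several base tests, as the listing describes, would replace the single list of p-values used here with its own integrated statistic.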
- *IP Issue Date
- None
- *IP Type
- Utility
- Country/Region
- USA
