

Automated Debugging for Data-Intensive Scalable Computing

Technology Benefits

BIGSIFT improves the accuracy of fault localizability by several orders of magnitude (a factor of 10^3 to 10^7) compared to Titian data provenance

Improves performance by up to 66x compared to Delta Debugging

Localizes fault-inducing data within 62% of the original job running time for each faulty output

Technology Application

Debugging for Data-Intensive Scalable Computing (DISC) systems

Detailed Technology Description

Researchers at UCLA have developed a new faulty data localization approach called BIGSIFT, which combines insights from automated fault isolation in software engineering and data provenance in database systems to find a minimum set of failure-inducing inputs. BIGSIFT redefines data provenance for the purpose of debugging using a test oracle function and implements several unique optimizations, specifically geared towards the iterative nature of automated debugging workloads.
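The combination described above can be illustrated with a small, single-machine sketch. This is not BIGSIFT's actual implementation, which targets DISC systems and includes optimizations not shown here. It assumes a toy job (maximum value per key), a hypothetical test oracle `is_faulty`, and hypothetical helper names `run_job`, `ddmin`, and `localize_fault`. The idea shown is only the two-step structure: first use provenance to keep the inputs that map to the faulty output key, then run a delta-debugging style search, guided by the oracle, on that smaller set.

```python
def run_job(records):
    """Toy data-flow job (assumed for illustration): max value per key."""
    out = {}
    for key, value in records:
        out[key] = max(value, out.get(key, value))
    return out


def is_faulty(records):
    """Hypothetical test oracle: any per-key maximum above 150 is a failure."""
    return any(v > 150 for v in run_job(records).values())


def ddmin(records, fails):
    """Delta-debugging style search for a 1-minimal failing sublist."""
    if not fails(records):
        raise ValueError("the full input must reproduce the failure")
    n = 2  # current number of partitions
    while len(records) >= 2:
        chunk = max(1, len(records) // n)
        subsets = [records[i:i + chunk] for i in range(0, len(records), chunk)]
        reduced = False
        for i, subset in enumerate(subsets):
            complement = [r for j, s in enumerate(subsets) if j != i for r in s]
            if fails(subset):          # failure reproduced by one partition
                records, n, reduced = subset, 2, True
                break
            if fails(complement):      # failure reproduced without this partition
                records, n, reduced = complement, max(n - 1, 2), True
                break
        if not reduced:
            if n >= len(records):      # finest granularity reached
                break
            n = min(len(records), n * 2)
    return records


def localize_fault(records, faulty_key, fails):
    """Prune by provenance (same key as the faulty output), then minimize."""
    suspects = [r for r in records if r[0] == faulty_key]
    return ddmin(suspects, fails)


records = [("LA", 70), ("LA", 72), ("NY", 65), ("LA", 9999),
           ("NY", 60), ("LA", 68), ("SF", 61)]
print(localize_fault(records, "LA", is_faulty))  # [('LA', 9999)]
```

In this toy run, provenance narrows seven input records to the four with key "LA", and the oracle-guided search then isolates the single record responsible for the faulty output.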

Others

State Of Development

BIGSIFT is ready to be used with DISC systems.

Background

Data-Intensive Scalable Computing (DISC) systems draw valuable insights from massive data sets to help make business decisions and scientific discoveries. As on other software development platforms, DISC developers often face program errors and incorrect inputs that require debugging. When errors (e.g., program crashes, outlier results) arise, developers often have to go through a lengthy and expensive manual trial-and-error process to identify a subset of the input data that reproduces the problem.

Current approaches such as Data Provenance (DP) and Delta Debugging (DD) are not suitable for debugging DISC workloads because: 1) DD does not consider the semantics of data-flow operators and thus cannot prune input records known to be irrelevant; 2) DD’s search strategy is iterative, which is prohibitively expensive for the large datasets that DISC systems process; 3) DP over-approximates the scope of failure-inducing inputs by assuming that all intermediate inputs mapping to the same key contribute to the erroneous output.
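The third limitation can be made concrete with a toy example. The sketch below is an assumption-laden illustration, not any provenance system's real API: `average_per_key` stands in for a keyed aggregation, and `trace_back` is a hypothetical helper that mimics key-based backward tracing by returning every input record sharing the key of the suspicious output.

```python
def average_per_key(records):
    """Toy keyed aggregation (assumed for illustration): mean value per key."""
    totals, counts = {}, {}
    for key, value in records:
        totals[key] = totals.get(key, 0.0) + value
        counts[key] = counts.get(key, 0) + 1
    return {key: totals[key] / counts[key] for key in totals}


def trace_back(records, key):
    """Hypothetical key-based provenance: all inputs that map to `key`."""
    return [r for r in records if r[0] == key]


# 999 plausible readings and one corrupt reading for sensor "s1",
# plus 500 plausible readings for sensor "s2".
readings = [("s1", 20.0)] * 999 + [("s1", 5000.0)] + [("s2", 21.0)] * 500

averages = average_per_key(readings)
suspects = trace_back(readings, "s1")              # what provenance reports
culprits = [r for r in suspects if r[1] > 100.0]   # what actually caused it

print(len(suspects), len(culprits))
```

Here provenance reports every reading for the sensor as contributing to the skewed average, although only one of them is responsible, which is the over-approximation described in point 3.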

For complex DISC systems, it is therefore crucial to equip developers with toolkits that can better pinpoint the root cause of an error.

Related Materials

Tech ID/UC Case

29154/2018-151-0

Related Cases

2018-151-0

Country/Region

USA
