Automated Debugging for Data-Intensive Scalable Computing
BIGSIFT improves the accuracy of fault localizability by several order-of-magnitude (103-107) compared to Titian data provenanceImproves performance by up to 66x compared to Delta DebuggingAble to localize fault-inducing data within 62% of the original job running time for each faulty output
Debugging for Data-Intensive Scalable Computing (DISC) systems
Researchers at UCLA have developed a new faulty data localization approach called BIGSIFT, which combines insights from automated fault isolation in software engineering and data provenance in database systems to find a minimum set of failure-inducing inputs. BIGSIFT redefines data provenance for the purpose of debugging using a test oracle function and implements several unique optimizations, specifically geared towards the iterative nature of automated debugging workloads.
State Of Development The BIGSIFT is ready to be used for DISC systems. Background Data-Intensive Scalable Computing (DISC) systems draw valuable insights from massive data sets to help make business decisions and scientific discoveries. Similar to other software development platforms, developers often deal with program errors and incorrect inputs that require error debugging. When errors (e.g. program crash, outlier results) arise, developers often have to go through a lengthy and expensive process of manual trial and error debugging by identifying a subset of the input data that is able to reproduce the problem. Current approaches such as Data Provenance (DP) and Delta Debugging (DD) are not suitable for debugging DISC workloads because 1) DD does not consider the semantics of data-flow operators and thus cannot prune input records known to be irrelevant; 2) DD’s search strategy is iterative, which is prohibitively expensive for large datasets such as DISC; 3) DP over-approximates the scope of failure-inducing inputs by considering that all intermediate inputs mapping to the same key contribute to the erroneous output. For complex DISC systems, it is therefore crucial to equip developers with toolkits that can better pinpoint the root cause of an error. Related Materials Tech ID/UC Case 29154/2018-151-0 Related Cases 2018-151-0
USA
