Artificial Intelligence Methods for Software Engineering

Chapter 9 - On the Application of Machine Learning in Software Testing

9.1 Introduction

9.2 Background

9.2.1 Software Testing

Limit: Testing shows the presence, not the absence of bugs.

Testing comprises 2 parts:

  • Test suite generation; methods include:
    • Model-based testing
    • Combinatorial testing
    • Random/fuzz testing
    • ...
  • Test suite execution
    • JUnit
    • ...

SUT: System under test

Categories for machine learning in software testing:

  • Software fault prediction
  • Test oracle automation
  • Test case generation
  • Test suite reduction, prioritization and evaluation
  • ...

9.2.2 Machine Learning

This chapter will focus on:

  • Detecting potential errors
  • Constructing test oracles
  • Generating test cases

9.3 Applications of Machine Learning in Software Testing

9.3.1 Machine Learning for Software Fault Prediction

Goal: track and reduce latent software defects as early as possible.

K-Means Clustering

Features used include Lines of Code, Cyclomatic Complexity, Unique Operators, ...
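A minimal sketch of how k-means might split modules into a fault-prone group and a clean group, using two of the features named above (Lines of Code and Cyclomatic Complexity). The module values, the choice of k = 2, the fixed initial centroids, and the rule that the cluster with the larger centroid is the fault-prone one are all illustrative assumptions, not taken from the chapter.

```python
import math

def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points]

def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's algorithm; initial centroids are passed in for determinism."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, l in zip(points, labels) if l == i]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(tuple(sum(d) / len(members) for d in zip(*members))
                           if members else old)
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)

# Hypothetical modules: (lines_of_code, cyclomatic_complexity).
# Features are left unscaled here, so lines of code dominates the distance;
# a real study would normally normalize them first.
modules = [(120, 3), (150, 4), (90, 2), (900, 25), (1100, 30), (130, 5)]
centroids, labels = kmeans(modules, centroids=[modules[0], modules[3]])

# Assumption: the cluster with the larger centroid is treated as fault-prone.
fault_prone = max(range(len(centroids)), key=lambda i: centroids[i])
flagged = [m for m, l in zip(modules, labels) if l == fault_prone]
print(flagged)
```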

GA (Genetic Algorithms)

  1. Determine the most important attributes for predicting faulty modules.
  2. Predict software errors.
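The first step above (finding the most important attributes) can be sketched as a genetic algorithm over feature bitmasks. This is only an illustration under stated assumptions: the data set, the four metric names, the nearest-centroid fitness function, and all GA parameters are hypothetical and are not the setup used in the chapter.

```python
import random

# Hypothetical module metrics: (lines_of_code, cyclomatic_complexity,
# unique_operators, comment_ratio) with a label (1 = faulty, 0 = clean).
DATA = [
    ((120, 3, 10, 0.30), 0), ((150, 4, 12, 0.10), 0),
    ((90, 2, 8, 0.25), 0), ((130, 5, 11, 0.05), 0),
    ((900, 25, 40, 0.20), 1), ((1100, 30, 45, 0.15), 1),
    ((800, 22, 38, 0.28), 1), ((950, 27, 42, 0.08), 1),
]
N_FEATURES = 4

def fitness(mask):
    """Nearest-centroid training accuracy on the selected features,
    minus a small penalty per selected feature."""
    selected = [i for i, bit in enumerate(mask) if bit]
    if not selected:
        return 0.0
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in DATA if y == label]
        centroids[label] = [sum(r[i] for r in rows) / len(rows) for i in selected]
    correct = 0
    for x, y in DATA:
        predicted = min((0, 1), key=lambda label: sum(
            (x[i] - c) ** 2 for i, c in zip(selected, centroids[label])))
        correct += predicted == y
    return correct / len(DATA) - 0.01 * len(selected)

def select_features(generations=20, pop_size=8, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    # Start from the all-features mask plus random masks.
    population = [[1] * N_FEATURES] + [
        [rng.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        children = [max(population, key=fitness)[:]]  # elitism: keep the best
        while len(children) < pop_size:
            # Tournament selection of two parents.
            mother = max(rng.sample(population, 2), key=fitness)
            father = max(rng.sample(population, 2), key=fitness)
            cut = rng.randint(1, N_FEATURES - 1)      # one-point crossover
            child = mother[:cut] + father[cut:]
            # Bit-flip mutation.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        population = children
    return max(population, key=fitness)

best = select_features()
print(best, fitness(best))
```

Because the best individual is carried over unchanged each generation, the returned mask is never worse than the all-features mask the population starts with.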

ANN (Artificial Neural Network)

DTR (Decision Tree Regression)

Predict the number of faults in given software modules under 2 different scenarios.

Intra-release model and inter-release model.

Ensemble Classification Models

Combining multiple models generally outperforms any individual model.
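One common way to combine models is majority voting. The sketch below assumes three deliberately simple threshold classifiers; the metric names (`loc`, `cyclomatic`, `recent_changes`) and thresholds are hypothetical, and the chapter may rely on other combination schemes.

```python
from collections import Counter

# Three hypothetical base classifiers, each returning True for "fault-prone".
def by_size(module):
    return module["loc"] > 500

def by_complexity(module):
    return module["cyclomatic"] > 10

def by_churn(module):
    return module["recent_changes"] > 5

def majority_vote(classifiers, module):
    """Return the prediction made by most of the base classifiers."""
    votes = Counter(clf(module) for clf in classifiers)
    return votes.most_common(1)[0][0]

ensemble = [by_size, by_complexity, by_churn]
large_and_churning = {"loc": 650, "cyclomatic": 8, "recent_changes": 9}
small_but_complex = {"loc": 100, "cyclomatic": 12, "recent_changes": 1}

print(majority_vote(ensemble, large_and_churning))  # two of three vote True
print(majority_vote(ensemble, small_but_complex))   # two of three vote False
```

An odd number of binary classifiers avoids ties in the vote.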

Feature Selection Technique

Naive Bayes + SVM

9.3.2 Machine Learning for Test Oracles Automation

Chapter 10 - Creating Test Oracles Using Machine Learning Techniques

10.1 Introduction

Two types of oracles: implicit and specified oracles. [p1-p2]

  • Implicit: system crashes, null pointer, unhandled exceptions
  • Specified: formal specification
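A minimal sketch of an implicit oracle, assuming the SUT is a Python callable. It needs no expected output: any unhandled exception counts as a failure. The helper name `implicit_oracle` and the use of the built-in `int` as a stand-in SUT are illustrative only.

```python
def implicit_oracle(sut, test_input):
    """Pass if the SUT returns normally; fail on any unhandled exception."""
    try:
        sut(test_input)
    except Exception as exc:
        return False, type(exc).__name__
    return True, None

# The built-in int() stands in for the system under test.
ok_verdict = implicit_oracle(int, "42")
crash_verdict = implicit_oracle(int, "abc")
print(ok_verdict, crash_verdict)
```

Note that this oracle cannot tell whether a returned value is correct; that requires a specified oracle.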

Some motivating research questions:

  • Anomaly detection
  • Data: what to choose and how to convert to features
  • Faults: which faults are detected and how they relate to the approaches

10.2 Background on Test Oracles

Two parts of a test oracle:

  • Oracle information that represents expected output
  • Oracle procedure that compares the oracle information with the actual output
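The two parts can be kept separate in code. In this sketch the oracle information is a table of expected outputs and the oracle procedure is a tolerance-based comparison; `sut_sqrt`, the table contents, and the tolerance are hypothetical stand-ins.

```python
import math

def sut_sqrt(x):
    """Stand-in for the system under test (hypothetical)."""
    return x ** 0.5

# Oracle information: expected output for each test input.
ORACLE_INFO = {0.0: 0.0, 4.0: 2.0, 9.0: 3.0, 2.0: 1.41421356}

def oracle_procedure(expected, actual, tolerance=1e-6):
    """Compare the oracle information with the actual output."""
    return math.isclose(expected, actual, abs_tol=tolerance)

def run_tests(sut, oracle_info):
    """Map each test input to a pass (True) or fail (False) verdict."""
    return {inp: oracle_procedure(exp, sut(inp)) for inp, exp in oracle_info.items()}

verdicts = run_tests(sut_sqrt, ORACLE_INFO)
print(verdicts)
```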

A perfect and complete automated test oracle:

  • Has a source of information that makes it possible to produce behavior equivalent to the SUT.
  • Accepts all inputs for the specified system and always produces the correct result.
  • Has the answer for the data used in the test.

10.2.1 Test Oracles Based on Individual Test Cases

10.2.2 Test Oracles Based on Formal Specifications

Can be used in regression testing and mutation testing.

10.3 Related Work

4 Topics:

  • Assertion
  • Specification
  • State-based performance
  • Log file analysis

3 kinds of oracles:

  • Specified oracles: accurate but hard to realize
  • Implicit oracles: detect system crash
  • Derived oracles: built from properties of the SUT

10.4 Test Oracles Based on Machine Learning Techniques

Main concept: detect unexpected patterns (faulty behaviors) in a large set of observations, events or items.

10.4.1 Test Oracles Based on Supervised Learning Techniques

The least generally applicable approach.

[Why do we need to simulate? Some papers propose comparing the SUT output with the ANN result in regression testing.]

  • Single-ANN oracle
  • Multi-ANN oracle: a separate ANN is defined for each item in the output. Training can be expensive.

Use SVM/ANN/DT to classify execution traces as pass/fail.

10.4.2 Test Oracles Based on Semi-Supervised Learning Techniques

Clustering is often performed as a preliminary step in the data mining process.

Refining classification of failures.

A common approach: cluster the data and use the labeled subset, then analyse inputs, outputs, and execution traces to predict pass/fail.

Roper proposed one workflow [p14 [44] May have a closer look]:

  1. Cluster the tests
  2. Prioritize the smaller clusters - they are more likely to contain failures
  3. Label them, and apply semi-supervised learning
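Steps 2 and 3 of the workflow above can be sketched as follows, assuming step 1 has already produced a cluster id for each test. The test names, cluster ids, and the cluster-then-label rule (an unlabeled test inherits the majority known label in its cluster) are illustrative assumptions; the cited work may use a different semi-supervised learner.

```python
from collections import Counter, defaultdict

def prioritize_by_cluster_size(cluster_of):
    """Order tests so that members of smaller clusters are inspected first."""
    clusters = defaultdict(list)
    for test, cluster in cluster_of.items():
        clusters[cluster].append(test)
    return [t for group in sorted(clusters.values(), key=len) for t in group]

def propagate_labels(cluster_of, known_labels):
    """Give each unlabeled test the majority known label of its cluster
    (None when the cluster has no labeled test)."""
    votes = defaultdict(Counter)
    for test, label in known_labels.items():
        votes[cluster_of[test]][label] += 1
    result = {}
    for test, cluster in cluster_of.items():
        if test in known_labels:
            result[test] = known_labels[test]
        elif votes[cluster]:
            result[test] = votes[cluster].most_common(1)[0][0]
        else:
            result[test] = None
    return result

# Hypothetical output of step 1: test name -> cluster id.
cluster_of = {"t1": 0, "t2": 0, "t3": 0, "t4": 1, "t5": 0, "t6": 2, "t7": 2}
order = prioritize_by_cluster_size(cluster_of)

# A tester labels the first test met in each cluster, following that order.
known = {"t4": "fail", "t6": "fail", "t1": "pass"}
labels = propagate_labels(cluster_of, known)
print(order)
print(labels)
```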

10.4.3 Test Oracles Based on Unsupervised Learning Techniques

Assumption: abnormal instances are relatively infrequent.
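Under this assumption, an unsupervised oracle can flag whatever lies far from the bulk of the observations. The sketch below uses a median-based outlier score on a single numeric observation per test run; the observed values and the threshold are hypothetical, and this simple rule stands in for the clustering techniques the chapter discusses.

```python
from statistics import median

def flag_anomalies(values, threshold=5.0):
    """Return indices of values whose distance from the median exceeds
    `threshold` times the median absolute deviation (MAD)."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        return []  # no spread to measure against
    return [i for i, v in enumerate(values) if abs(v - center) / mad > threshold]

# Hypothetical numeric outputs of seven test runs; one run behaves differently.
observed = [10.1, 9.8, 10.0, 10.3, 9.9, 25.0, 10.2]
anomalies = flag_anomalies(observed)
print(anomalies)
```

The median and MAD are used rather than the mean and standard deviation because a single extreme value distorts them far less.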

[Idea: can this be applied to fuzzing?]

The clustering approach outperformed the coverage-based approach in terms of fault detection rate.

10.4.4 Summary and Findings

10.5 Discussion

Scalability: ability to handle software of any size

Fault detection ability

False positive rate: the biggest issue with automated oracles
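For reference, the false positive rate can be computed as FP / (FP + TN), where a "positive" is a test the oracle reports as failing. The verdict lists below are made-up data for illustration.

```python
def false_positive_rate(reported_fail, actually_failed):
    """FP / (FP + TN): the share of truly passing tests the oracle reports as failing."""
    pairs = list(zip(reported_fail, actually_failed))
    fp = sum(r and not a for r, a in pairs)
    tn = sum(not r and not a for r, a in pairs)
    return fp / (fp + tn) if fp + tn else 0.0

# Hypothetical verdicts for six tests.
oracle_says_fail = [True, False, True, False, False, True]
really_failed = [True, False, False, False, False, True]
fpr = false_positive_rate(oracle_says_fail, really_failed)
print(fpr)
```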

Cost of effort and resources: supervised > semi-supervised > unsupervised

10.6 Further Research Direction

10.6.1 Improving the Accuracy

10.6.2 Improving the Scalability

Handle more complex data set formats.

10.7 Conclusion