Automatic Bug Localization

To build software, engineers spend a lot of time writing the code files that implement its individual components. They try to keep the code modular and manageable by splitting different tasks across different files. But what if there is a bug in the program? Where is the code that is causing the problem? 🙇


Testers and debuggers describe the issues they find in the program in the form of a bug report. It is then the developer's responsibility to find the source file that contains the bug, based on the description in the bug report. This task of going through a large number of source files to find the one with the bug is called bug localization.

What's the problem?

Bug localization can be very time-consuming!!! 😪

Automated Bug Localization

To save precious time and reduce the developer's effort, automating bug localization effectively can be very useful. There are two broad kinds of techniques for automating this task: dynamic and static. Dynamic techniques need to execute the program, whereas static techniques require only the source code, so the program does not need to be complete and they can be applied at any stage of development.
Many static techniques can be used to automate bug localization:

  1. IR Techniques: With Information Retrieval (IR), features can be generated from both the bug report and the source files, and the source files most likely to be buggy are found based on the similarity between them. IR techniques that can be applied include Latent Semantic Indexing (LSI), Latent Dirichlet Allocation (LDA), probabilistic ranking with execution scenarios, relevance feedback mechanisms, advanced Vector Space Models, etc. [1].
    The work presented in [2] found that an IR technique such as LDA gives sufficient accuracy.

    Image Source: [2]


  2. ML Techniques: Machine learning is another option for automating bug localization. Classifiers such as Naive Bayes [3] can be used to classify source files and assign them to the corresponding bug report. This approach is based on topic modeling. However, there is a lexical mismatch between the technical terms used in the report and the terms/keywords that appear in the source code. Terms from the documentation were also used, but the problem remained.
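
To make the IR idea concrete, here is a minimal sketch in Python of ranking source files by TF-IDF cosine similarity to a bug report. It shows the basic Vector Space Model idea only, not LSI, LDA, or the exact method of any cited work. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_files`), the smoothed IDF formula, and the toy file contents are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; camelCase and snake_case identifiers are split."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z]+", spaced.lower())


def tfidf_vectors(token_lists):
    """Return one sparse TF-IDF vector (term -> weight dict) per token list."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({
            # Log-scaled term frequency times a smoothed inverse document frequency.
            term: (1 + math.log(count))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_files(bug_report, source_files):
    """Rank files (name -> text) by similarity to the report, best first."""
    names = list(source_files)
    docs = [tokenize(source_files[name]) for name in names]
    docs.append(tokenize(bug_report))  # the report is treated as one more document
    *file_vectors, report_vector = tfidf_vectors(docs)
    scores = [(name, cosine(report_vector, vec))
              for name, vec in zip(names, file_vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy, made-up inputs for illustration only.
source_files = {
    "LoginController.java": "class LoginController { void authenticateUser(String password) { checkPassword(password); } }",
    "ReportPrinter.java": "class ReportPrinter { void printInvoice(Invoice invoice) { renderPage(invoice); } }",
    "CacheManager.java": "class CacheManager { void evictEntry(String key) { cache.remove(key); } }",
}
report = "Login fails: user cannot authenticate with a valid password"

for name, score in rank_files(report, source_files):
    print(f"{name}: {score:.3f}")
```

In this toy setup only the file that shares words with the report gets a non-zero score. A report that says "sign-in" instead of "login" would share no terms with any file and score zero everywhere, which is the lexical mismatch problem described above.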

What can be done better?     

Combine both techniques!!! 😼😼

The work in [1] found that when text-similarity features obtained with IR techniques are combined with Deep Neural Networks (DNNs) that learn features and extract the details, the combination outperforms either technique used individually.

Image Source: [1]
The DNN is expected to learn the abstract concepts that are present in both the bug report and the source code files, which addresses the lexical mismatch problem. An advanced IR technique, rVSM (revised Vector Space Model), is used to extract text-similarity features between the bug report and the source code. A DNN-based autoencoder is also used to keep the model scalable. The DNN model and rVSM were found to complement each other, and together they proved more accurate than either technique alone.
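
As a rough illustration of how the two signals can complement each other, the sketch below runs the forward pass of a very small feed-forward network over two features per candidate file: an IR text-similarity score and a relevance score assumed to come from a learned model. This is not the architecture from [1]. The function `combined_score`, the hand-picked weights, and the feature values are all hypothetical, and a real system would learn the weights from previously fixed bug reports.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    return max(0.0, x)


def combined_score(text_similarity, learned_relevance, weights):
    """Forward pass of a one-hidden-layer network over the two features."""
    features = [text_similarity, learned_relevance]
    hidden = [
        relu(sum(w * f for w, f in zip(row, features)) + bias)
        for row, bias in zip(weights["hidden_w"], weights["hidden_b"])
    ]
    output = sum(w * h for w, h in zip(weights["out_w"], hidden)) + weights["out_b"]
    return sigmoid(output)  # squashed to (0, 1) so files can be ranked


# Hypothetical, hand-picked weights for illustration; not learned from data.
ILLUSTRATIVE_WEIGHTS = {
    "hidden_w": [[2.0, 0.5], [0.5, 2.0]],
    "hidden_b": [0.0, 0.0],
    "out_w": [1.5, 1.5],
    "out_b": -2.0,
}

# Made-up (text_similarity, learned_relevance) pairs for three candidate files.
candidates = {
    "LoginController.java": (0.62, 0.80),
    "SessionStore.java": (0.05, 0.90),   # few shared words, but judged relevant
    "ReportPrinter.java": (0.10, 0.05),
}

ranking = sorted(
    candidates,
    key=lambda name: combined_score(*candidates[name], ILLUSTRATIVE_WEIGHTS),
    reverse=True,
)
print(ranking)
```

With these made-up numbers, ranking by text similarity alone would place `ReportPrinter.java` above `SessionStore.java`, while the combined score promotes `SessionStore.java`, the kind of file a purely lexical match tends to miss.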


Conclusion

The combined approach of a DNN and rVSM, an advanced IR technique, is the best found so far and, to my knowledge, the state of the art. Other combinations of ML models and other techniques can also be tried to make developers' lives even easier. Thanks to such automated bug localization systems, developers can save time and spend it on more productive work. 😃😃


References:

[2] https://www.sciencedirect.com/science/article/pii/S0950584910000650
[3] D. Kim, Y. Tao, S. Kim, and A. Zeller, “Where should we fix this bug? a two-phase recommendation model,” IEEE Transactions on Software Engineering, vol. 39, no. 11, pp. 1597–1610, 2013. 
