Use of Information Retrieval in Mitigating Security Vulnerabilities


Software breaches in the IT industry are nothing new. Most vulnerabilities originate in the source code that developers write, which makes this a relevant problem. Proper training for software developers can reduce the resulting losses to organizations. However, traditional training methods do not take the developer's own source code into account.

Hence the hypothesis is that if bugs are identified at an early stage, the complexity of the errors that accumulate over time can be reduced. Using information retrieval (IR) techniques, a new system can be proposed that uses a public repository as its knowledge base. A recommender system can then retrieve relevant information and facts from source code, reusable components, APIs, etc.

The Common Weakness Enumeration (CWE) repository is used as the knowledge base; it has more than 700 articles related to security vulnerabilities. The study also uses 4 vulnerability description files. The IR model measures the similarity between a vulnerability description file and the CWE articles in two ways: the Jaccard index (also called the Jaccard similarity coefficient) and the Vector Space Model (VSM) with tf-idf weights.

The Jaccard index gives the similarity score between the vulnerability description file v and the CWE article a, each treated as a set of terms:

$$J(v, a) = \frac{|v \cap a|}{|v \cup a|}$$
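As a minimal sketch, the Jaccard index of two documents can be computed over their term sets: the number of shared terms divided by the number of distinct terms in either document. The function name, the whitespace tokenization, and the two sample sentences are assumptions made for illustration; they are not taken from the paper.

```python
def jaccard_index(v_tokens, a_tokens):
    """Jaccard index of two token collections, each treated as a set."""
    v_set, a_set = set(v_tokens), set(a_tokens)
    union = v_set | a_set
    if not union:
        return 0.0  # convention chosen here for two empty documents
    return len(v_set & a_set) / len(union)


# Hypothetical vulnerability description (v) and article text (a).
v = "user input is inserted into an sql query".split()
a = "sql query built from unchecked user input".split()

score = jaccard_index(v, a)  # 4 shared terms out of 11 distinct terms
print(round(score, 3))
```

Every term counts equally here, however common or rare it is in the collection.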

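Cosine similarity is the cosine of the angle between the tf-idf vectors of v and a:

$$\cos(\vec{v}, \vec{a}) = \frac{\vec{v} \cdot \vec{a}}{\lVert \vec{v} \rVert \, \lVert \vec{a} \rVert}$$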
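The following sketch shows one way to build tf-idf vectors and compare them with cosine similarity. It assumes raw term counts for tf and log(N / df) for idf; the paper's exact weighting variant is not stated here, so treat this as one common choice rather than the authors' formula. The function names and the three toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors


def cosine_similarity(u, w):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * w.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_w = math.sqrt(sum(x * x for x in w.values()))
    if norm_u == 0 or norm_w == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_w)


docs = [
    "sql query built from user input".split(),      # description v
    "unchecked user input in sql query".split(),    # related article
    "password stored as constant in code".split(),  # unrelated article
]
vecs = tfidf_vectors(docs)
sim_related = cosine_similarity(vecs[0], vecs[1])
sim_unrelated = cosine_similarity(vecs[0], vecs[2])
print(sim_related > sim_unrelated)
```

Terms that appear in every document receive a weight of zero under this idf, so only distinguishing terms contribute to the score.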



Preprocessing of documents includes stop-word removal and stemming. Each term in a document is assigned a weight, its tf-idf score, and the similarity between documents is measured using cosine similarity. For evaluation, the recommender system was compared against a commercial tool.
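The preprocessing step can be sketched as follows. The stop-word list and the suffix-stripping stemmer below are deliberately tiny stand-ins chosen for illustration; a real system would more likely use a full stop-word list and an established stemmer such as Porter's, and the source does not say which ones the study used.

```python
import re

# Hypothetical, deliberately short stop-word list.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "into", "and", "by", "from"}

# Crude suffix stripping, NOT the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, tokenize, drop stop words, then stem what remains."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


tokens = preprocess("The queries are built from unchecked user inputs")
print(tokens)  # ['queri', 'built', 'uncheck', 'user', 'input']
```

The resulting token lists are what a similarity measure would then consume, so that "inputs" and "input" count as the same term.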


The vulnerabilities taken for the study were cross-site scripting (XSS), SQL injection, HTTP response splitting, and a hard-coded constant database password. Three criteria were used for evaluation: relevance, suitability for training, and the amount of information in the article.


Fig. 1. Overall result for all vulnerabilities. [1]


Figure 1 shows that the overall result with the tf-idf approach is better than with the Jaccard index approach, since the Jaccard index is not weighted. Compared with the commercial tool, the difference is negligible when searching for articles relevant to the flagged vulnerabilities. Thus the CWE repository, like the repository used by the commercial tool, is useful enough for training software developers. The result is a low-cost recommender system that an organization can use for training purposes.

References:

[1] Muhammad Nadeem, Byron J. Williams, Gary L. Bradshaw, Edward B. Allen, "Human Subject Evaluation of Computer–Security Training Recommender."
