Teaching Johnny the Semantic method for Query-Document matching!



Today we are going to look at how the Deep Structured Semantic Model helps with web search and why it is more accurate than other techniques for matching documents to a given query. Semantic modeling for <query, document> matching is a research topic that the IR community has worked on for a long time.
Throughout the blog, whenever I say semantic model, I will be referring to a latent semantic model (like LSA, Latent Semantic Analysis).


Why do we need Semantic Models?

During a web search, we issue a query and retrieve the relevant documents based on keyword matching between the query terms and those present in the documents. The first step in such a task is tokenization, which has its own drawbacks: there can be a vast difference in vocabulary and language style between the query and the documents, and that mismatch can lead to inaccurate results. This is where semantic models come into play. They address the language discrepancy between search queries and documents by clustering terms that occur in similar contexts, which is also called semantic clustering.


What is LSA?

LSA is one of the most famous linear projection models. The basic assumption in LSA is that words that are close in meaning tend to occur in the same documents. In LSA, we first compute a singular value decomposition of the document-term matrix and keep the top singular vectors as a projection matrix. Each query Q or document D, represented by its raw term vector, is then mapped to a low-dimensional vector:

Q̂ = AᵀQ,   D̂ = AᵀD

where A is the projection matrix obtained from the SVD and Q̂, D̂ are the low-dimensional vector representations of the query and the document.
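To make this concrete, here is a minimal LSA sketch on a toy corpus using scikit-learn; the corpus, the number of components, and the variable names are my own choices for illustration, not anything taken from the paper.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

documents = [
    "the queen addressed the royal court",
    "the king and the queen attended the royal ceremony",
    "the football match ended in a draw",
]

# Build the document-term matrix.
vectorizer = CountVectorizer()
doc_term = vectorizer.fit_transform(documents)

# Truncated SVD of the document-term matrix gives the projection
# into a low-dimensional latent (semantic) space.
svd = TruncatedSVD(n_components=2)
doc_vectors = svd.fit_transform(doc_term)                             # low-dimensional document vectors

query_vector = svd.transform(vectorizer.transform(["royal queen"]))   # low-dimensional query vector
print(doc_vectors.shape, query_vector.shape)                          # (3, 2) (1, 2)
```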
The relevance score of a query with a document is proportional to the cosine similarity between these low-dimensional vectors and is given by:

sim(Q, D) = cos(Q̂, D̂) = Q̂ᵀD̂ / (‖Q̂‖ ‖D̂‖)

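A quick sketch of that score with NumPy (the `relevance` helper name is mine, not from the paper):

```python
import numpy as np

def relevance(q_hat, d_hat):
    """Cosine similarity between the low-dimensional query and document vectors."""
    return float(np.dot(q_hat, d_hat) / (np.linalg.norm(q_hat) * np.linalg.norm(d_hat)))

# Toy vectors; in practice these come from the LSA projection above.
print(relevance(np.array([0.2, 0.9]), np.array([0.3, 0.8])))   # close to 1.0 -> highly relevant
```
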
How can we use Deep Learning for Semantic Modeling?

In the research paper listed in the references, the authors use a Deep Neural Network (DNN) architecture to map the raw text features of a document or query to features in a latent semantic space. This can be visualized using the diagram below, taken from the same paper. The input to the DNN is a high-dimensional term vector and its output is a low-dimensional vector in the semantic space.






[Figure: The DSSM architecture, from the paper]

Here Q denotes the query and D a document. We can see that the authors have also used word hashing; we will dive into that in a bit. The above model is called the Deep Structured Semantic Model, a.k.a. DSSM.
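To make the architecture concrete, here is a minimal PyTorch sketch of a single DSSM tower. The layer sizes (a 30,621-dimensional word-hashed input, two 300-unit hidden layers, and a 128-dimensional semantic output with tanh activations) follow the paper's description, but the class and code below are my own illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DSSMTower(nn.Module):
    """Maps a letter-trigram (word-hashed) term vector to a 128-dim semantic vector."""
    def __init__(self, trigram_dim=30_621, hidden_dim=300, semantic_dim=128):
        super().__init__()
        self.l1 = nn.Linear(trigram_dim, hidden_dim)
        self.l2 = nn.Linear(hidden_dim, hidden_dim)
        self.l3 = nn.Linear(hidden_dim, semantic_dim)

    def forward(self, x):
        x = torch.tanh(self.l1(x))
        x = torch.tanh(self.l2(x))
        return torch.tanh(self.l3(x))

# The same kind of tower embeds the query and the document;
# relevance is the cosine similarity of the two semantic vectors.
tower = DSSMTower()
q = torch.rand(1, 30_621)   # word-hashed query vector (illustrative)
d = torch.rand(1, 30_621)   # word-hashed document vector (illustrative)
score = F.cosine_similarity(tower(q), tower(d))
```

In the paper, these cosine scores are passed through a softmax over clicked and non-clicked documents, and the network is trained to maximize the likelihood of the clicked ones.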

What is Word Hashing and what are its advantages?

It is similar to ordinary hashing and is used to represent the entire vocabulary in a much smaller space.
Consider the word 'queen'. We start by adding a start and an end mark to the word, giving '#queen#'. Then we simply break the word into letter n-grams. For n = 3 we end up with '#qu', 'que', 'uee', 'een', 'en#', so the word is now represented by a letter-trigram vector.
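A tiny sketch of that letter n-gram split (the `word_hash` helper is hypothetical, not from the paper):

```python
def word_hash(word, n=3):
    """Add boundary marks and break the word into letter n-grams."""
    marked = f"#{word}#"
    return [marked[i:i + n] for i in range(len(marked) - n + 1)]

print(word_hash("queen"))  # ['#qu', 'que', 'uee', 'een', 'en#']
```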

This type of hashing greatly reduces the vocabulary size, since the same vocabulary can now be represented by a much lower-dimensional vector. Consider a vocabulary of 500K words. After word hashing with letter trigrams, the same vocabulary can be represented by a 30,621-dimensional vector, a roughly 16-fold reduction in dimensionality.

The only drawback of any hashing scheme is the collision problem, but in the above example the authors report a collision rate of 0.0044%, which is negligible.
Because the vocabulary for web search is very large, a raw one-hot term representation would make DSSM training unmanageable. Instead, word hashing is applied at the first layer of DSSM, which makes training the model far more tractable.
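As a rough sketch of that first layer, you can think of it as a fixed (untrained) lookup that maps each word of a query or document to counts over the letter trigrams collected from the vocabulary. The helpers below are hypothetical and only illustrate the idea on a toy vocabulary.

```python
from collections import Counter

def word_hash(word, n=3):
    marked = f"#{word}#"
    return [marked[i:i + n] for i in range(len(marked) - n + 1)]

def build_trigram_index(vocabulary):
    """Collect every letter trigram that occurs in the word vocabulary."""
    trigrams = sorted({t for word in vocabulary for t in word_hash(word)})
    return {t: i for i, t in enumerate(trigrams)}

def hash_text(text, index):
    """Map a query or document to a letter-trigram count vector."""
    vector = [0] * len(index)
    for word in text.lower().split():
        for trigram, count in Counter(word_hash(word)).items():
            if trigram in index:
                vector[index[trigram]] += count
    return vector

vocabulary = ["queen", "king", "royal", "court", "football"]
index = build_trigram_index(vocabulary)
print(len(index))                 # trigram dimension; grows far more slowly than the word vocabulary
print(hash_text("royal queen", index))
```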

Results




[Table: Ranking results comparing DSSM with baseline models, from the paper]

We can clearly see from the above table that the semantic model performed better than the other approaches; DSSM outperformed baseline models like TF-IDF, BM25, etc.

References

Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry Heck. "Learning Deep Structured Semantic Models for Web Search using Clickthrough Data." In Proceedings of CIKM 2013.