New Techniques for a Research Paper Recommendation System


Warning: Don’t Try This at Home. The content of this blog is highly experimental and is the result of excessive day-dreaming, with little to no actual implementation backing it.

Dataset Requirement: All the algorithms mentioned below require a near-complete dataset of research papers. If you are running the algorithm on… wait! I told you not to run these; these algorithms are strictly on a ‘fun, not run’ basis. For these algorithms to work on any domain, we need densely connected research papers, where a connection is a reference from one research paper to another. A sparsely connected dataset won’t give high accuracy.

Basic Tool Used: Neural networks, the thing you use when you don’t know anything about data science.


Input: The input to all the algorithms mentioned below is a research paper itself. We give a research paper as input and ask the algorithm to recommend new research papers based on it.

Technique 1:

This technique is inspired by Word2Vec. What Word2Vec does is model the meaning of a word as a vector, based on its neighbours: at each training step it asks the neural network to predict a word’s neighbours, then backpropagates the error against the actual neighbours. The result is that similar words cluster together in vector space. Pretty neat, eh?
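If you have never played with it, this is roughly what Word2Vec looks like in gensim. The toy corpus below is made up purely for illustration, and with this little data the neighbours it returns are essentially noise:

```python
from gensim.models import Word2Vec

# A toy corpus: each "sentence" is a list of tokens. Real training
# would need a large corpus; this is purely illustrative.
sentences = [
    ["neural", "network", "learns", "word", "representations"],
    ["word", "vectors", "capture", "meaning", "from", "context"],
    ["similar", "words", "end", "up", "with", "similar", "vectors"],
]

# Skip-gram (sg=1): predict a word's neighbours from the word itself.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

# Words used in similar contexts land close together in vector space.
print(model.wv.most_similar("word", topn=3))
```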

Initial Thought: If we could somehow, using methods similar to Word2Vec, cluster similar papers together, then we could recommend papers from those clusters based on a person’s interests. But isn’t that what Doc2Vec does? Yes, Doc2Vec converts documents to vectors using a Word2Vec-style model. But research papers are no ordinary documents: they contain references, and the referenced documents are actually similar to the given document, or, even if not similar, are good candidates for recommendation.

What are we going to do?
We are going to add another layer to Doc2Vec and make it our output layer, or maybe have multiple hidden layers between the Doc2Vec layer and the output. The number of nodes in the output layer will be the total number of unique references across all the papers combined. We’ll use the references in each paper as the expected output for that paper. A sketch of the network is given below:
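Here is a minimal PyTorch sketch of that architecture. The layer sizes and names are placeholder assumptions, and the Doc2Vec vectors are presumed to come from a separately trained Doc2Vec model:

```python
import torch
import torch.nn as nn

class ReferencePredictor(nn.Module):
    """Doc2Vec vector in, probability of citing each known paper out."""

    def __init__(self, doc_vector_dim=300, hidden_dim=512, num_unique_refs=5000):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(doc_vector_dim, hidden_dim),   # hidden layer(s) between Doc2Vec and output
            nn.ReLU(),
            nn.Linear(hidden_dim, num_unique_refs),  # one node per unique reference
        )

    def forward(self, doc_vector):
        # Sigmoid rather than softmax: a paper cites many papers at once,
        # so this is a multi-label prediction problem.
        return torch.sigmoid(self.net(doc_vector))
```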


Doc2Vec represents the meaning of a research paper in a fixed number of dimensions, and we connect that representation to the output layer via hidden layers. For each document, we forward propagate: the document vector is created, and the output nodes give the probability of each paper being referenced by the input paper. The model is trained using the actual references.
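Training would then be a standard multi-label setup, something like the sketch below. The data here is random stand-in data; in reality doc_vectors would come from Doc2Vec and reference_matrix would be built from the actual citations:

```python
import torch

# Stand-in data: real doc_vectors would come from Doc2Vec, and
# reference_matrix would have a 1 wherever a paper cites a reference.
num_papers, doc_dim, num_refs = 1000, 300, 5000
doc_vectors = torch.randn(num_papers, doc_dim)
reference_matrix = torch.zeros(num_papers, num_refs)

model = ReferencePredictor(doc_vector_dim=doc_dim, num_unique_refs=num_refs)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.BCELoss()  # binary cross-entropy per output node

for epoch in range(10):
    preds = model(doc_vectors)               # forward pass: citation probabilities
    loss = loss_fn(preds, reference_matrix)  # compare against the actual references
    optimizer.zero_grad()
    loss.backward()                          # backpropagate
    optimizer.step()
```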

Finally, when the input document arrives, the one on whose basis we need to make recommendations, we pass it through our neural network, find the probability of each paper being referenced by the given paper, remove the actual references, and choose the top k papers. Those top k papers are what we recommend to the user.
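In code, that last step is just a masked top-k over the output probabilities, under the same assumptions as above:

```python
import torch

def recommend(model, doc_vector, known_ref_indices, k=10):
    """Top-k papers the model predicts as citations, minus the actual ones."""
    with torch.no_grad():
        probs = model(doc_vector.unsqueeze(0)).squeeze(0)  # shape: (num_refs,)
    probs[known_ref_indices] = 0.0       # remove the actual references
    return torch.topk(probs, k).indices  # indices of the top k candidates

# e.g. recommend(model, doc_vectors[0], known_ref_indices=[12, 40], k=5)
```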

Technique 2:

This technique is inspired by Restricted Boltzmann Machines, and as I know very little about them right now, many concepts could be wrong. Restricted Boltzmann Machines are used for recommender systems. A simple version has 2 layers: an input layer with a node for every item available for recommendation, and a hidden layer which captures latent variables. For example, user1 comes along and buys 2 items, so the nodes corresponding to those 2 items are activated; let’s say he bought a Natural Language Processing book and an Information Retrieval book. User2 comes and buys the NLP book and an ML book; user3 comes and buys a Network Fundamentals book and an Internet Protocols book. As this goes on, the latent variable captured by the hidden layer is that the NLP, IR, and ML books belong to a similar category. The machine doesn’t know what the category is, only that these items go together, and that a user who bought one of them is more likely to buy another from that group than one of the networking books.
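To make that concrete, here is a toy binary RBM trained with one-step contrastive divergence (CD-1) on the book baskets from that example. Since I admitted I know very little about RBMs, treat this as a sketch; every number in it is made up:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """A tiny binary RBM trained with one-step contrastive divergence."""

    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = rng.normal(0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)  # visible biases (one per item)
        self.b_h = np.zeros(n_hidden)   # hidden biases (latent factors)
        self.lr = lr

    def train_step(self, v0):
        # Positive phase: which latent factors does this basket activate?
        h0_prob = sigmoid(v0 @ self.W + self.b_h)
        h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
        # Negative phase: reconstruct a basket from those factors.
        v1_prob = sigmoid(h0 @ self.W.T + self.b_v)
        h1_prob = sigmoid(v1_prob @ self.W + self.b_h)
        # CD-1 weight and bias updates.
        self.W += self.lr * (np.outer(v0, h0_prob) - np.outer(v1_prob, h1_prob))
        self.b_v += self.lr * (v0 - v1_prob)
        self.b_h += self.lr * (h0_prob - h1_prob)

    def score_items(self, v):
        """Reconstruction probabilities: how likely is each item, given the basket?"""
        h_prob = sigmoid(v @ self.W + self.b_h)
        return sigmoid(h_prob @ self.W.T + self.b_v)

# Toy baskets over 5 items: [NLP, IR, ML, Networks, Protocols].
baskets = np.array([
    [1, 1, 0, 0, 0],  # user1: NLP + IR
    [1, 0, 1, 0, 0],  # user2: NLP + ML
    [0, 0, 0, 1, 1],  # user3: Networks + Protocols
], dtype=float)

rbm = RBM(n_visible=5, n_hidden=2)
for _ in range(500):
    for basket in baskets:
        rbm.train_step(basket)

# A user holding only the IR book: with luck, the NLP/ML scores come
# out higher than the networking ones once the grouping is learned.
print(rbm.score_items(np.array([0, 1, 0, 0, 0], dtype=float)))
```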

Initial Thoughts: We could advance our previous method by trying to capture the latent semantics of the research papers. But to recommend papers the way we were recommending books, we need user data, which, according to our problem statement, we don’t have. What we do have are research papers and their references. When we write a paper, we do a literature review. Let’s say the literature review is shopping: there are many papers we search through, which is equivalent to browsing items in a shop, and in the end we reference some of those papers, which is equivalent to buying those items. So, using the paper and reference data, we can model user behaviour, as sketched below. Then, by the same logic as ‘people who bought these also bought these’, we get ‘papers that referenced these also referenced these’.
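Concretely, each paper’s ‘shopping basket’ is just a multi-hot vector over all known papers, with ones at its references. A small sketch of that mapping, using a hypothetical papers dict of paper id to referenced ids:

```python
import numpy as np

# Hypothetical citation data: paper id -> ids of the papers it references.
papers = {
    "paper_a": ["paper_b", "paper_c"],
    "paper_b": ["paper_c"],
    "paper_c": [],
    "paper_d": ["paper_a", "paper_c"],
}

# One index per unique paper; referencing a paper = "buying" that item.
index = {pid: i for i, pid in enumerate(sorted(papers))}

def basket(pid):
    """Multi-hot 'shopping basket': 1 wherever the paper cites a known paper."""
    v = np.zeros(len(index))
    for ref in papers[pid]:
        if ref in index:  # ignore references outside our dataset
            v[index[ref]] = 1.0
    return v

print(basket("paper_d"))  # [1. 0. 1. 0.] -- cites paper_a and paper_c
```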

What are we going to do?
We are going to add another layer to our model after the output layer and make it similar to an RBM. In the training phase, we feed in a document and its references: the document goes into the input layer and the references are used to train the model. Wait, where is the Boltzmann machine part? Oh, keen observer, eh? What we are doing is training two networks simultaneously using shared layers. Over time, our network becomes able to correctly relate a paper to its references, which is like learning the correct buying pattern; and based on that buying pattern, the Boltzmann machine part can predict the next thing to buy, that is, the next paper to read.
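Since the description above is admittedly fuzzy, here is one speculative reading of it in code: Technique 1’s network with an extra encode/decode pair bolted on after the output layer, standing in for the RBM’s visible-to-hidden passes. Every name and size here is an assumption, and this is not a faithful Boltzmann machine:

```python
import torch
import torch.nn as nn

class SharedRecommender(nn.Module):
    """Technique 1's network plus an RBM-flavoured latent head on top."""

    def __init__(self, doc_dim=300, hidden_dim=512, num_refs=5000, latent_dim=64):
        super().__init__()
        # Shared trunk: Doc2Vec vector -> reference probabilities (Technique 1).
        self.to_refs = nn.Sequential(
            nn.Linear(doc_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_refs), nn.Sigmoid(),
        )
        # RBM-like head: compress the reference pattern into latent factors
        # and reconstruct it, so co-referenced papers share latent units.
        self.encode = nn.Linear(num_refs, latent_dim)
        self.decode = nn.Linear(latent_dim, num_refs)

    def forward(self, doc_vector):
        refs = self.to_refs(doc_vector)                     # cited-paper probabilities
        latent = torch.sigmoid(self.encode(refs))           # latent "taste" factors
        reconstructed = torch.sigmoid(self.decode(latent))  # papers those tastes imply
        return refs, reconstructed

# Training both outputs against the paper's actual references trains the
# two "networks" simultaneously through the shared trunk.
```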





