Myths of Information Retrieval

Here I want to discuss some preconceptions I had before I took an Information Retrieval class. I discuss them because I think they are common assumptions that anyone new to the field of Information Retrieval might make.

  •     Information Extraction and Information Retrieval are the same.

In the simplest terms, information retrieval is about finding documents relevant to a search query (i.e., the user's information need). The retrieved documents are generally ranked so that a higher-ranked document is more relevant to that information need.
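To make the idea of ranked retrieval concrete, here is a minimal sketch in Python. It scores each document against the query with a simple smoothed TF-IDF weighting and sorts by score. The function names (`tokenize`, `rank`), the sample `DOCUMENTS`, and the exact weighting formula are my own assumptions for illustration; real IR systems use far more elaborate tokenization and ranking models.

```python
import math
from collections import Counter

# Hypothetical toy collection, used only for illustration.
DOCUMENTS = [
    "cheap flights to paris",
    "paris travel guide and hotels",
    "python programming tutorial",
]


def tokenize(text):
    # Lowercase and split on whitespace; a deliberate simplification.
    return text.lower().split()


def rank(query, documents):
    """Return (doc_index, score) pairs, most relevant first."""
    doc_tokens = [tokenize(doc) for doc in documents]
    n_docs = len(documents)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in doc_tokens for term in set(tokens))

    scored = []
    for index, tokens in enumerate(doc_tokens):
        tf = Counter(tokens)
        length = len(tokens) or 1  # avoid division by zero for empty docs
        score = 0.0
        for term in tokenize(query):
            # Smoothed inverse document frequency (one common variant).
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1
            score += (tf[term] / length) * idf
        scored.append((index, score))

    # Higher score means the document is judged more relevant.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for index, score in rank("paris flights", DOCUMENTS):
        print(f"{score:.3f}  {DOCUMENTS[index]}")
```

With this toy collection, the document that mentions both query terms comes out on top, the one that mentions only "paris" comes second, and the unrelated document gets a score of zero.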

Information Extraction, by contrast, leans toward NLP: you train a model to derive hidden information from raw text. In IE you start with unstructured or semi-structured data and convert it into structured data that is easily readable, maybe not by humans but by a computer program. At the end of Information Extraction, you have built some knowledge from the information you have and can now answer some questions.
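As a rough illustration of extraction, the sketch below turns raw sentences into structured records and then answers a question from them. Note that it swaps the trained model described above for a single handwritten regular expression, which is only a stand-in to keep the example short. The names `extract_founding_facts` and `TEXT`, and the sentence pattern itself, are assumptions made for this example.

```python
import re

# Hypothetical raw text, used only for illustration.
TEXT = "Google was founded in 1998. Cornell University was founded in 1865."

# Matches "<Capitalized name> was founded in <four-digit year>".
FOUNDED_PATTERN = re.compile(r"([A-Z][\w ]*?) was founded in (\d{4})")


def extract_founding_facts(text):
    """Convert raw text into a list of structured records."""
    return [
        {"organization": name, "year": int(year)}
        for name, year in FOUNDED_PATTERN.findall(text)
    ]


if __name__ == "__main__":
    facts = extract_founding_facts(TEXT)
    # The structured records can now answer a simple question.
    founded = {fact["organization"]: fact["year"] for fact in facts}
    print(founded["Google"])
```

The contrast with retrieval is that nothing is ranked here: the output is a small table of facts that a program can query directly.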

  •     Google is a pioneer in Information Retrieval.

It is true that Google has the most commercially successful Information Retrieval system, but the field of IR is considerably older than Google. Google was not even the first search engine: that is generally considered to be Archie, which appeared in 1990 (Google followed in 1998).

The idea of using computers to search for relevant pieces of information was popularized by Vannevar Bush's 1945 article "As We May Think". Automated information retrieval systems were first introduced in the 1950s. The first large information retrieval research group was formed by Gerard Salton at Cornell in the 1960s.


  •     Information Retrieval is used in Search Engines only.

A search engine is just one of many different types of IR systems. Information retrieval is a very broad term, while a search engine is a type of information retrieval system that works mainly on web documents. The search bar you see on any e-commerce website is another type of Information Retrieval system.

  •     The only challenge in Information Retrieval is ranking documents.

Search is an unsolved problem. We have a good 90 to 95% of the solution, but there is a lot to go in the remaining 10%.
-- Marissa Mayer (CEO, Yahoo)

She makes a valid point. I agree that we have made a great deal of progress in search, but some challenges remain, such as how to handle newer forms of content like video and maps.

It's true that ranking documents efficiently is a challenge, but it is not the only one. Many other challenges revolve around relevance, such as multimedia retrieval, filtering, distributed IR, and integrated solutions.



References:
  1. https://www.quora.com/What-is-the-difference-between-Information-Extraction-and-Information-Retrieval
  2. https://en.wikipedia.org/wiki/Information_retrieval#History
  3. https://www.quora.com/What-is-the-difference-between-search-engine-and-information-retrieval-system
  4. https://www.searchenginepeople.com/blog/5-common-information-retrieval-myths.html
