Information can be counter-intuitive!
An intuitive definition of intuition is "knowledge or reasoning ability that does not have to be learned". Unfortunately, intuition is not the purest form of reasoning under the scientific method. Over the last few centuries, it has frequently taken a back seat to "the burden of proof". The history of information retrieval (IR) methods makes this evident.
How, one may ask?
To answer this question, one must look at how IR systems developed. Intuition suggests that any user would prefer a ranked-retrieval system that accepts natural-language queries; it is natural to expect users to look for such convenience when searching for information. Yet most IR systems were long based on Boolean retrieval. It is only in the last decade or so that ranked-retrieval systems have started dominating. A Boolean-retrieval system suffers from the following limitations:
- No ranking of documents - The result is simply the list of all documents which satisfy the input query.
- Cognitive overload or lack of information - Depending on how the query is structured, it matches either too many documents or too few.
- Requirement of syntactic and domain expertise - A Boolean-retrieval system works well for domain experts who know the right keywords and can structure queries appropriately. General users, who lack this knowledge, are likely to suffer from cognitive overload or a lack of information.
- No scope for indecisiveness - Indecisive users looking for more knowledge are unlikely to find the right words for the query.
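The contrast behind these limitations can be shown with a minimal sketch. The toy corpus, the function names `boolean_and` and `ranked`, and the choice of a plain TF-IDF score are all assumptions made for illustration; they do not describe any particular IR system.

```python
import math
from collections import Counter

# Toy corpus, invented for illustration only.
DOCS = {
    "d1": "boolean retrieval returns every document matching the query",
    "d2": "ranked retrieval orders documents by relevance to a free text query",
    "d3": "database systems answer structured queries over structured data",
}


def tokenize(text):
    """Lowercase and split on whitespace (no stemming, no stop words)."""
    return text.lower().split()


def boolean_and(query, docs):
    """Return the unordered set of doc ids containing every query term."""
    terms = set(tokenize(query))
    return {doc_id for doc_id, text in docs.items()
            if terms <= set(tokenize(text))}


def ranked(query, docs):
    """Return doc ids with a non-zero TF-IDF score, best match first."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokens.values() for term in set(toks))
    scores = {}
    for doc_id, toks in tokens.items():
        tf = Counter(toks)
        # Query terms that appear nowhere in the corpus are simply ignored.
        score = sum(tf[t] * math.log(n / df[t])
                    for t in tokenize(query) if t in df)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)
```

With the free-text query "relevance of documents to a query", `boolean_and` matches nothing, because no document contains the word "of", while `ranked` still returns `d2` ahead of `d1`. A strict Boolean match is all-or-nothing; a scored match tolerates imprecise wording and orders whatever it finds.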
A natural-language-based ranked-retrieval system is intuitively ideal for the general user. A general user often wants to learn about a subject of which she has no prior knowledge, and a free-text query is preferable in such a scenario. Alternatively, a user might know the approximate content of a document but not its actual words. Even then, a free-text query is ideal. Moreover, document ranking helps prevent cognitive overload, since no end user has the time or the patience to sift through thousands of irrelevant results. Despite this intuitive preference for ranked-retrieval systems, the IT industry kept employing Boolean-retrieval systems for many years. The possible reasons for this are:
- Prevalence of Query-based Database Systems - Database Systems relying on structured data were commonly used. Industry generally prefers working with platforms for which domain experts are easily available.
- Human Nature - It is human nature to follow the trend.
- Perceived infeasibility of natural-language systems - Systems employing natural-language queries were perceived to be infeasible on the limited hardware of the time.
- Limited availability of computers - A personal computer was a prized commodity that very few people could afford. Naturally, systems were developed only for domain experts with access to industrial technology.
The above reasons sound reasonable. However, history has shown that when the need arises, people tend to come up with reasonable low-cost solutions to problems that are supposedly "infeasible". The insistence of the scientific method on "proof of feasibility" often kills originality. An analogy can be drawn to democracy: it is safe, but slow.
Decent ranked-retrieval systems could indeed have been developed, had someone tried sincerely. Too much information can indeed be counter-intuitive!