I Underestimated WIKIPEDIA…!!


To me, Wikipedia was just a source of information that I could turn to for an 
overview of almost any topic. I never realized how important its collection is 
until I came across some of the work built on it.

Let’s start with some basic info about Wikipedia and then dive into its 
technical aspects.

All of us know it as an ENCYCLOPEDIA (a collection of summary articles). It is freely 
available to all. It was launched by Jimmy Wales and Larry Sanger on 15 January 
2001.

CURRENT STATISTICS as of February 2014:

  • More than 4 million articles

  • 299 different languages

Now, with advances in NLP (Natural Language Processing), IR (Information 
Retrieval), and Data Mining, there is a growing need for lexical databases. 
Wikipedia is contributing to that need on a huge scale.

HOW…?

The whole approach was revolutionized by the idea of using 
WIKIPEDIA AS A KNOWLEDGE BASE.

Basically, it arranges concepts and articles in a structured way: related 
topics are connected by edges, so the articles together form a graph.
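A minimal sketch of such a graph, assuming we already have a list of article-to-article links. The link pairs and the `build_graph` helper are made up for illustration; they are not real Wikipedia data or part of any Wikipedia tool.

```python
from collections import defaultdict

# Hypothetical link data: (source article, target article) pairs.
# These pairs are invented for illustration, not taken from Wikipedia.
LINKS = [
    ("Cat", "Mammal"),
    ("Dog", "Mammal"),
    ("Cat", "Pet"),
    ("Dog", "Pet"),
    ("Mammal", "Animal"),
]

def build_graph(links):
    """Return an undirected adjacency map: article -> set of linked articles."""
    graph = defaultdict(set)
    for source, target in links:
        graph[source].add(target)
        graph[target].add(source)
    return dict(graph)

graph = build_graph(LINKS)
print(sorted(graph["Cat"]))  # the articles directly connected to "Cat"
```

Treating links as undirected is a simplification chosen here; real hyperlinks point one way, and a directed graph would keep that information.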

An ontology is established between different articles, which helps in measuring 
Semantic Relatedness, which in turn is useful in various applications.
There has already been some good work done in this direction. 
One research paper, for instance, represents the semantic relatedness between 
CAT and DOG.

This information can be used to solve problems like:

  • Building a search engine that returns results based on terms semantically 
    related to the query.

  • Suggesting other pages to users. For example, if a user has searched for 
    ‘Mother Teresa’, suggesting ‘Albert Schweitzer’, another 
    famous philanthropist.
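Both ideas above need some relatedness score between two articles. Here is a rough sketch that uses the Jaccard overlap of outgoing links as a stand-in measure. This is my own simplification, not the measure used in the research paper, and the `OUTLINKS` data is invented for illustration.

```python
def relatedness(links_a, links_b):
    """Jaccard overlap of two link sets: 0.0 (nothing shared) to 1.0 (identical)."""
    if not links_a and not links_b:
        return 0.0
    return len(links_a & links_b) / len(links_a | links_b)

def suggest(query, outlinks, top_n=3):
    """Rank every other article by its relatedness to `query`."""
    scores = [
        (relatedness(outlinks[query], links), title)
        for title, links in outlinks.items()
        if title != query
    ]
    # Highest score first; ties broken alphabetically by title.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for score, title in scores[:top_n] if score > 0]

# Hypothetical outgoing links per article, for illustration only.
OUTLINKS = {
    "Cat": {"Mammal", "Pet", "Carnivore", "Whiskers"},
    "Dog": {"Mammal", "Pet", "Carnivore", "Wolf"},
    "Goldfish": {"Pet", "Fish"},
    "Volcano": {"Magma", "Lava"},
}

print(relatedness(OUTLINKS["Cat"], OUTLINKS["Dog"]))
print(suggest("Cat", OUTLINKS))
```

With this toy data, "Dog" ranks above "Goldfish" for the query "Cat", and "Volcano" is dropped because it shares no links at all.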

There are different ways of providing further suggestions/recommendations. They can 
be based on feedback from users, for example by analyzing which pages a 
user visits and using that to compute further results.
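One simple way to use page visits, sketched below, is to count which pages are viewed together in the same session and recommend the most frequent partners. The session data and the function names are hypothetical; a real system would work from actual usage logs.

```python
from collections import Counter
from itertools import combinations

def co_visit_counts(sessions):
    """Count how often each pair of pages appears in the same session."""
    counts = Counter()
    for pages in sessions:
        # sorted(set(...)) gives each unordered pair one canonical form.
        for a, b in combinations(sorted(set(pages)), 2):
            counts[(a, b)] += 1
    return counts

def recommend(page, counts, top_n=2):
    """Pages most often viewed in the same session as `page`."""
    partners = Counter()
    for (a, b), n in counts.items():
        if a == page:
            partners[b] += n
        elif b == page:
            partners[a] += n
    ranked = sorted(partners.items(), key=lambda item: (-item[1], item[0]))
    return [title for title, _ in ranked[:top_n]]

# Invented browsing sessions, for illustration only.
SESSIONS = [
    ["Mother Teresa", "Albert Schweitzer", "Nobel Peace Prize"],
    ["Mother Teresa", "Albert Schweitzer"],
    ["Mother Teresa", "Kolkata"],
]

counts = co_visit_counts(SESSIONS)
print(recommend("Mother Teresa", counts))
```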

I came across a beautiful model, WIKI GALAXY. It shows the relationships between 
different articles pictorially.

In that galaxy, different articles are shown as small objects (stars) linked by edges. 
It also lists links to related articles, categorizing them as closely related, less 
related, distant, etc.
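I don't know how WIKI GALAXY actually decides these groups. One plausible way to get similar buckets, sketched here as an assumption, is to count how many links separate two articles and label them by that distance. The graph and the cut-off values are made up.

```python
from collections import deque

def hop_distances(graph, start):
    """Breadth-first search: number of links from `start` to each reachable article."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in graph.get(current, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[current] + 1
                queue.append(neighbour)
    return distances

def bucket(distance):
    """Map a hop count to a label; the cut-offs are arbitrary assumptions."""
    if distance <= 1:
        return "closely related"
    if distance == 2:
        return "less related"
    return "distant"

# Hypothetical link graph, for illustration only.
GRAPH = {
    "Cat": ["Pet", "Mammal"],
    "Pet": ["Cat", "Dog"],
    "Mammal": ["Cat", "Animal"],
    "Dog": ["Pet"],
    "Animal": ["Mammal", "Organism"],
    "Organism": ["Animal"],
}

labels = {
    title: bucket(d)
    for title, d in hop_distances(GRAPH, "Cat").items()
    if title != "Cat"
}
print(labels)
```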

BASIS OF THE STRUCTURE

There are 2 main concepts behind the structures formed:

1) Hyperlinks

The hyperlink structure of the pages is very extensive, and each link connects one article to a related one.

2) Categorization of terms

The terms used in the articles are categorized, and these categories
establish which articles are related.
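Both kinds of data can be requested from Wikipedia's MediaWiki API with `action=query` and `prop=links|categories`. The sketch below only builds the request URL and parses a response. The helper names are my own, and `SAMPLE` is a hand-written dictionary in the shape the API returns, not real output.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"

def build_query_url(title):
    """URL asking the MediaWiki API for a page's links and categories."""
    params = {
        "action": "query",
        "prop": "links|categories",
        "titles": title,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)

def extract_links_and_categories(response):
    """Pull link and category titles out of a parsed API response."""
    links, categories = set(), set()
    for page in response.get("query", {}).get("pages", {}).values():
        links.update(item["title"] for item in page.get("links", []))
        categories.update(item["title"] for item in page.get("categories", []))
    return links, categories

# Hand-written sample in the shape the API returns; not real output.
SAMPLE = {
    "query": {
        "pages": {
            "1": {
                "title": "Cat",
                "links": [{"ns": 0, "title": "Mammal"}, {"ns": 0, "title": "Pet"}],
                "categories": [{"ns": 14, "title": "Category:Cats"}],
            }
        }
    }
}

url = build_query_url("Cat")
links, categories = extract_links_and_categories(SAMPLE)
print(url)
print(links, categories)
```

The API returns results in batches, so a complete crawl would also need to follow the continuation values in each response; that part is left out here.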

Over the years, this will grow into a denser network, helping to solve more 
problems.



