To me, Wikipedia was just a source of information that I could consult to get an
overview of almost any topic. I never appreciated the importance of its collection
until I came across some of the work built on it.
Let’s start with some basic information about Wikipedia and then dive into its
technical aspects.
All of us know it as an ENCYCLOPEDIA (a collection of summaries). It is freely
available to all. It was launched by Jimmy Wales and Larry Sanger on 15 January
2001.
CURRENT STATISTICS as of February 2014:
- More than 4 million articles
- 299 different languages
Now, with advances in NLP (Natural Language Processing), IR (Information
Retrieval), and Data Mining, the need for lexical databases has grown
considerably. Wikipedia is contributing to that need on a huge scale.
HOW…?
The whole approach was revolutionized by the idea of using
WIKIPEDIA AS A KNOWLEDGE BASE.
Basically, it arranges concepts and articles in a structured way: related topics
are connected by edges, and so a graph is formed.
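To make the graph idea concrete, here is a minimal sketch. The article titles and links in `LINKS` are made up for illustration (they are not real Wikipedia data), and the helper names `build_undirected` and `shortest_path_length` are my own. It stores articles as nodes and links as edges, then counts how many hops separate two articles.

```python
from collections import deque

# Hypothetical toy link table: article title -> titles it links to.
# These links are invented for illustration only.
LINKS = {
    "Cat": {"Mammal", "Pet"},
    "Dog": {"Mammal", "Pet"},
    "Mammal": {"Animal"},
    "Pet": {"Animal"},
    "Animal": set(),
}


def build_undirected(links):
    """Turn the directed link table into an undirected adjacency map."""
    graph = {title: set() for title in links}
    for src, targets in links.items():
        for dst in targets:
            graph.setdefault(dst, set())
            graph[src].add(dst)
            graph[dst].add(src)
    return graph


def shortest_path_length(graph, start, goal):
    """Breadth-first search: number of edges between two articles, or None."""
    if start not in graph or goal not in graph:
        return None
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph[node]:
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


graph = build_undirected(LINKS)
print(shortest_path_length(graph, "Cat", "Dog"))  # 2 (via "Mammal" or "Pet")
```

In this toy graph, fewer hops between two articles can be read as a rough hint that they are closely related.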
As an example,
An ontology is established between different articles, which in turn helps to
measure semantic relatedness, which is useful in various applications.
There has already been some good work done in this direction.
This is an instance from the research paper.
Here the semantic relatedness between CAT and DOG is represented.
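One simple way to put a number on relatedness is to compare the sets of pages two articles link to. The sketch below uses Jaccard overlap for this. It is my own illustration and not necessarily the measure used in the research paper. The link sets in `OUTLINKS` are invented.

```python
def jaccard_relatedness(links_a, links_b):
    """Share of linked pages that two articles have in common (0.0 to 1.0)."""
    a, b = set(links_a), set(links_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical outgoing links per article, for illustration only.
OUTLINKS = {
    "Cat": {"Mammal", "Pet", "Felidae", "Carnivore"},
    "Dog": {"Mammal", "Pet", "Canidae", "Carnivore"},
    "Car": {"Vehicle", "Engine"},
}

print(jaccard_relatedness(OUTLINKS["Cat"], OUTLINKS["Dog"]))  # 0.6
print(jaccard_relatedness(OUTLINKS["Cat"], OUTLINKS["Car"]))  # 0.0
```

With these toy link sets, CAT and DOG share three of five distinct links, so they score far higher than CAT and CAR.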
This information can be used to solve problems such as:
- Building a search engine that returns results based on terms semantically
related to the query.
- Suggesting other pages to users. For example, if the user has searched for
‘Mother Teresa’, suggesting ‘Albert Schweitzer’, another famous
philanthropist.
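As a rough sketch of the search idea, the query can be expanded with related terms before matching. Everything here is an assumption for illustration: the `RELATED` table stands in for whatever relatedness data is available, the short descriptions in `DOCS` are written by me, and the scoring is plain word overlap.

```python
# Hypothetical table of semantically related terms (illustration only).
RELATED = {"philanthropist": ["humanitarian", "charity"]}

# Tiny made-up document collection: title -> short description.
DOCS = {
    "Mother Teresa": "catholic nun and humanitarian who founded a charity in kolkata",
    "Albert Schweitzer": "physician and philanthropist who ran a hospital in gabon",
    "Jimmy Wales": "internet entrepreneur and co-founder of wikipedia",
}


def expand_query(query, related):
    """Return the query terms plus any related terms from the table."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for extra in related.get(term, []):
            if extra not in expanded:
                expanded.append(extra)
    return expanded


def search(query, docs, related):
    """Rank titles by how many expanded query terms their text contains."""
    terms = set(expand_query(query, related))
    scored = []
    for title, text in docs.items():
        score = len(terms & set(text.lower().split()))
        if score:
            scored.append((score, title))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored]


print(search("philanthropist", DOCS, RELATED))
# ['Mother Teresa', 'Albert Schweitzer']
```

Without the expansion step, only the page that literally contains the word "philanthropist" would match.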
There are different ways of providing further suggestions/recommendations. They can
be based on feedback from users, by analyzing which pages a user visits and using
that to compute further results.
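A minimal sketch of the feedback-based approach: count how often two pages are visited in the same session, then recommend the pages most often seen together with the current one. The `SESSIONS` data and the function names are hypothetical, and real systems would use far more signals.

```python
from collections import Counter
from itertools import combinations

# Hypothetical browsing sessions: each list is the pages one user visited.
SESSIONS = [
    ["Mother Teresa", "Albert Schweitzer", "Nobel Peace Prize"],
    ["Mother Teresa", "Nobel Peace Prize"],
    ["Albert Schweitzer", "Nobel Peace Prize"],
    ["Mother Teresa", "Kolkata"],
]


def co_visit_counts(sessions):
    """Count how many sessions contain each unordered pair of pages."""
    counts = Counter()
    for session in sessions:
        for a, b in combinations(sorted(set(session)), 2):
            counts[(a, b)] += 1
    return counts


def recommend(page, sessions, top_n=3):
    """Pages most often visited together with `page`, ties broken by title."""
    scores = Counter()
    for (a, b), n in co_visit_counts(sessions).items():
        if a == page:
            scores[b] += n
        elif b == page:
            scores[a] += n
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [title for title, _ in ranked[:top_n]]


print(recommend("Mother Teresa", SESSIONS))
# ['Nobel Peace Prize', 'Albert Schweitzer', 'Kolkata']
```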
I came across a beautiful model, WIKI GALAXY. It shows the relationships between
different articles pictorially.
An instance from that galaxy is
Here different articles are shown as small objects/stars and are linked by edges.
It also lists links to related articles, categorizing them as closely related, less
related, distant, etc.
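I do not know how WIKI GALAXY actually computes these groups, but the grouping itself can be sketched by bucketing a relatedness score with thresholds. The cut-off values and the scores below are arbitrary assumptions, chosen only to show the idea.

```python
def relatedness_tier(score, close=0.5, less=0.2):
    """Map a 0..1 relatedness score to a tier (thresholds are arbitrary)."""
    if score >= close:
        return "closely related"
    if score >= less:
        return "less related"
    return "distant"


def group_by_tier(scores):
    """Group article titles by tier, highest score first within each tier."""
    groups = {"closely related": [], "less related": [], "distant": []}
    for title, score in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
        groups[relatedness_tier(score)].append(title)
    return groups


# Hypothetical relatedness scores of other articles to "Cat".
SCORES_FOR_CAT = {"Dog": 0.6, "Lion": 0.7, "Mammal": 0.3, "Car": 0.05}

print(group_by_tier(SCORES_FOR_CAT))
# {'closely related': ['Lion', 'Dog'], 'less related': ['Mammal'], 'distant': ['Car']}
```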
BASIS OF THE STRUCTURE
There are 2 main concepts behind the structures formed:
1) Hyperlinks
The hyperlink structure of the pages is extensive and very helpful.
2) Categorization of terms
The terms used in the articles are categorized, and relationships are developed
on that basis to determine which articles are related.
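Both kinds of information sit in an article's wiki markup: links are written as `[[Target]]` or `[[Target|label]]`, and categories as `[[Category:Name]]`. The sketch below pulls the two apart with a regular expression. The `SAMPLE` text is made up, and the parsing is deliberately simplified (it treats other namespaces, such as files, as ordinary links).

```python
import re

# Matches [[Target]] and [[Target|label]]; group 1 is the target.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def extract_links_and_categories(wikitext):
    """Split wiki markup links into article links and category names."""
    links, categories = [], []
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip()
        if target.startswith("Category:"):
            categories.append(target[len("Category:"):].strip())
        else:
            links.append(target)
    return links, categories


# Made-up snippet of wiki markup, for illustration only.
SAMPLE = (
    "The [[cat]] is a small [[Carnivora|carnivorous]] [[mammal]].\n"
    "[[Category:Felines]]\n"
    "[[Category:Pets|Cat]]"
)

links, categories = extract_links_and_categories(SAMPLE)
print(links)       # ['cat', 'Carnivora', 'mammal']
print(categories)  # ['Felines', 'Pets']
```

The extracted links can feed a link graph, while shared categories give a second, independent signal that two articles belong together.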
Over the years, this will grow into a denser network, helping to solve more
problems.
References:
- https://en.wikipedia.org/wiki/Main_Page