Geographic Information Retrieval (GIR) combines techniques from Information
Retrieval and Geographic Information Systems to answer textual queries that
have a geographic dimension. A GIR system interprets the geographical
knowledge contained in user queries and web documents and uses it to return
relevant results. For example, GIR can answer a query such as
"restaurants in Delhi". GIR can also be used to extract location references
from unstructured text and resolve what they mean. After correctly identifying
the location references, a GIR system indexes the information so that it can be
searched and retrieved in response to a query.
Figure: GIR System
Issues in GIR :
· Detecting Geographic References : The first issue in building a GIR system
is detecting genuine geographic references, because a place name can refer
either to a geographical location or to something else, such as an
organization. For example, in "talks with Washington" the name does not refer
to the place itself. To solve this, the text must be analyzed to distinguish
geographical places from other entities.
· Disambiguating place names : Once it has been confirmed that a name is used
in a geographical sense, the next challenge is to determine uniquely which
place it refers to. Names such as Newport and Springfield are shared by many
different places. The ambiguity is resolved using contextual clues within the
document.
· Vague Geographic Terminology : Users sometimes refer to regions that have
no precise boundaries, and it is difficult for a GIR system to produce results
for such imprecise queries, e.g. the South of France or the Midwest in the USA.
To resolve such queries, GIR makes use of a gazetteer, described in a later
section of this blog.
· Geographical relevance ranking : Once the system has found a set of
relevant results, ranking them in order of relevance to the user's query is
very important. Relevance can be computed as a score that takes into account
how frequently the query terms occur in the retrieved documents. A spatial
score can also be used to measure the geometric match between the query
footprint and the document footprint.
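The ranking idea in the last point can be sketched in a few lines of Python. This is only an illustrative sketch, not how any particular GIR system scores results: footprints are assumed to be simple bounding boxes, the spatial score is assumed to be the overlap ratio (intersection area over union area), and the names `BBox`, `text_score`, `spatial_score`, `combined_score` and the weight `alpha` are all made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box in degrees, a crude stand-in for a footprint."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def area(self) -> float:
        # Clamp each side at zero so an "inverted" box has no area.
        width = max(0.0, self.max_lon - self.min_lon)
        height = max(0.0, self.max_lat - self.min_lat)
        return width * height

    def intersection(self, other: "BBox") -> "BBox":
        # For disjoint boxes the result is inverted and its area is 0.
        return BBox(
            max(self.min_lon, other.min_lon),
            max(self.min_lat, other.min_lat),
            min(self.max_lon, other.max_lon),
            min(self.max_lat, other.max_lat),
        )


def text_score(query_terms, doc_text):
    """Fraction of the document's words that are query terms."""
    words = doc_text.lower().split()
    if not words:
        return 0.0
    hits = sum(words.count(term.lower()) for term in query_terms)
    return hits / len(words)


def spatial_score(query_fp, doc_fp):
    """Overlap ratio of the two footprints (intersection over union)."""
    inter = query_fp.intersection(doc_fp).area()
    union = query_fp.area() + doc_fp.area() - inter
    return inter / union if union > 0 else 0.0


def combined_score(query_terms, query_fp, doc_text, doc_fp, alpha=0.5):
    """Weighted mix of the two scores; alpha is an assumed tuning weight."""
    return (alpha * text_score(query_terms, doc_text)
            + (1 - alpha) * spatial_score(query_fp, doc_fp))


def rank(query_terms, query_fp, docs, alpha=0.5):
    """docs is a list of (doc_id, text, footprint); best match comes first."""
    return sorted(
        docs,
        key=lambda d: combined_score(query_terms, query_fp, d[1], d[2], alpha),
        reverse=True,
    )
```

With equal text scores, a document whose footprint overlaps the query footprint is ranked above one whose footprint lies elsewhere. Real systems typically use richer geometries and better text weighting than raw term frequency.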
Components of GIR :
1) Semantic Similarity : Semantic similarity is a technique for calculating
the similarity (or distance) between documents or terms on the basis of their
meaning or semantic content. It can be computed from topological similarity or
with tools such as the WNetSS API, a Java API based on the WordNet semantic
resource. In geoinformatics, the SIM-DL similarity server is used to calculate
the similarity between concepts stored in an ontology.
2) Word-sense disambiguation : Word-sense disambiguation (WSD) is the process
of correctly identifying the sense in which a word with multiple meanings is
used in a sentence. There are two approaches. Deep approaches try to analyze
and understand the complete text. They are rarely used in practice, because we
generally don't have access to the complete body of knowledge they require.
Shallow approaches don't analyze the complete text. They just look at the
surrounding words and tag the word according to its sense. For example, if
"bass" has nearby words like "fishing" or "river", then it is most likely used
in the fish sense.
Figure: An example of Word Sense Ambiguity
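The shallow approach can be illustrated with a small Python sketch that counts how many clue words for each sense appear near the ambiguous word. It is a simplified overlap heuristic under stated assumptions, not a production WSD method: the clue lists in `SENSE_CLUES`, the window size and the function name `disambiguate` are all hypothetical and chosen only for this example.

```python
import re

# Hypothetical, hand-written clue words for each sense of an ambiguous word.
SENSE_CLUES = {
    "bass": {
        "fish": {"fishing", "river", "lake", "caught", "angler"},
        "music": {"guitar", "band", "play", "sound", "drum"},
    },
}


def disambiguate(word, sentence, window=5, clues=SENSE_CLUES):
    """Pick the sense whose clue words overlap most with the nearby words.

    Returns None when the word is unknown, absent from the sentence,
    or when no clue word appears within the window.
    """
    tokens = re.findall(r"[a-z]+", sentence.lower())
    if word not in clues or word not in tokens:
        return None
    idx = tokens.index(word)
    # Words up to `window` positions before and after the target word.
    context = set(tokens[max(0, idx - window):idx]
                  + tokens[idx + 1:idx + 1 + window])
    scores = {sense: len(context & cue_words)
              for sense, cue_words in clues[word].items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(disambiguate("bass", "He went fishing for bass in the river"))
print(disambiguate("bass", "She plays bass guitar in a band"))
```

The first call picks the fish sense because "fishing" and "river" are nearby, and the second picks the music sense because of "guitar" and "band". Nothing about the sentence is understood beyond these neighbouring words, which is exactly what makes the approach shallow.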
Gazetteer :
A gazetteer is a geographical directory used alongside an atlas. GIR often
relies on a gazetteer to obtain information about the social statistics and
physical features of a country, city or region. The content ranges from the
dimensions of peaks and waterways to population, GDP and literacy data.
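To make this concrete, here is a rough Python sketch of a tiny in-memory gazetteer and of how contextual clues might be used to choose between places that share a name. The four entries, their approximate coordinates and the names `Place`, `GAZETTEER`, `candidates` and `resolve` are illustrative assumptions. A real gazetteer holds far more entries and attributes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    name: str
    region: str
    country: str
    lat: float  # approximate latitude in degrees
    lon: float  # approximate longitude in degrees


# A toy gazetteer with a few ambiguous names (coordinates are approximate).
GAZETTEER = [
    Place("Springfield", "Illinois", "United States", 39.80, -89.64),
    Place("Springfield", "Massachusetts", "United States", 42.10, -72.59),
    Place("Newport", "Wales", "United Kingdom", 51.58, -3.00),
    Place("Newport", "Rhode Island", "United States", 41.49, -71.31),
]


def candidates(name, gazetteer=GAZETTEER):
    """All gazetteer entries that share the given place name."""
    return [p for p in gazetteer if p.name.lower() == name.lower()]


def resolve(name, context, gazetteer=GAZETTEER):
    """Choose the candidate whose region or country is mentioned in the context.

    Returns None when there is no candidate, no clue, or a tie.
    """
    text = context.lower()
    scored = [
        (sum(clue.lower() in text for clue in (p.region, p.country)), p)
        for p in candidates(name, gazetteer)
    ]
    best = max((score for score, _ in scored), default=0)
    winners = [p for score, p in scored if score == best]
    return winners[0] if best > 0 and len(winners) == 1 else None


place = resolve("Newport", "The ferry left Newport, Rhode Island at noon")
print(place)
```

Here the mention of "Rhode Island" in the surrounding text is the clue that selects one Newport over the other. When the context offers no clue, the sketch returns None instead of guessing.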
Research Areas of GIR :
· Automatic generation of natural language photo captions.
· Exploitation of 3D city models to acquire knowledge about what camera
images show.
· Building spatial search engines.
· Developing web mining techniques and creating web questionnaires to
acquire knowledge about vague place names.
References :
· Geographic Information Retrieval by Christopher B. Jones & Ross S. Purves.
· http://www.cs.unibo.it/~montesi/CBD/Labs/GIR_UNIBO.pdf
· https://en.wikipedia.org/wiki/Geographic_information_retrieval