Cross-language Information Retrieval

Cross-language information retrieval (CLIR) is the task of retrieving information written in a language different from the language of the query. Finding information across languages is becoming a necessity due to the globalization of the economy. CLIR is mostly based on translation. Non-translation methods also exist, such as cognate matching, latent semantic indexing, and relevance models. The translation can be applied to the document, the query, or both: the query is translated into the target language (the document language), or the document is translated into the query language (the source language).

1. Query Translation

Query translation is widely used because of its simplicity and fast translation speed. The main problem with query translation is deciding whether to translate phrases as units or word by word. Disambiguation is done by translating word by word and then using co-occurrence information to choose among the candidate translations. This method works well with patent information but is not very efficient. To get better results, syntactic dependency information can be used.
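The word-by-word translation with co-occurrence disambiguation described above can be sketched roughly as follows. This is only a minimal illustration: the toy dictionary, the co-occurrence counts, and the function names are all hypothetical, and a real system would estimate the counts from a large target-language corpus.

```python
from itertools import combinations, product

# Hypothetical toy bilingual dictionary (Spanish -> English candidates).
DICTIONARY = {
    "banco": ["bank", "bench"],
    "dinero": ["money", "cash"],
}

# Hypothetical target-language co-occurrence counts for unordered word pairs.
COOCCURRENCE = {
    frozenset(["bank", "money"]): 120,
    frozenset(["bank", "cash"]): 80,
    frozenset(["bench", "money"]): 2,
    frozenset(["bench", "cash"]): 1,
}


def cooccurrence_score(words):
    """Sum the co-occurrence counts over all pairs of chosen translations."""
    return sum(COOCCURRENCE.get(frozenset(pair), 0) for pair in combinations(words, 2))


def translate_query(query_terms):
    """Pick one translation per term so that the chosen words co-occur most often."""
    # Terms missing from the dictionary are kept untranslated (an assumption of this sketch).
    candidates = [DICTIONARY.get(term, [term]) for term in query_terms]
    return list(max(product(*candidates), key=cooccurrence_score))


print(translate_query(["banco", "dinero"]))  # ['bank', 'money']
```

Enumerating every combination of candidates grows exponentially with query length, which is one way to see why the approach is described as not very efficient.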

Query expansion is also used to improve the retrieved results. CLIR uses two types of query expansion: pre-translation and post-translation. Used together, pre-translation and post-translation expansion give better results.
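As a rough sketch of how the two kinds of expansion fit around the translation step, the example below expands the query in the source language, translates it, and then expands it again in the target language. The related-term tables and the dictionary are hypothetical stand-ins; in practice the expansion terms would typically come from feedback over a source-language and a target-language collection.

```python
# Hypothetical related-term tables and dictionary, for illustration only.
SOURCE_RELATED = {"coche": ["auto"]}
TARGET_RELATED = {"car": ["vehicle"], "automobile": ["motor"]}
DICTIONARY = {"coche": "car", "auto": "automobile"}


def expand(terms, related, k=1):
    """Append up to k related terms per query term, skipping duplicates."""
    expanded = list(terms)
    for term in terms:
        for extra in related.get(term, [])[:k]:
            if extra not in expanded:
                expanded.append(extra)
    return expanded


def translate(terms):
    """Translate term by term; unknown terms are passed through unchanged."""
    return [DICTIONARY.get(term, term) for term in terms]


def clir_query(terms):
    """Pre-translation expansion, then translation, then post-translation expansion."""
    pre_expanded = expand(terms, SOURCE_RELATED)
    translated = translate(pre_expanded)
    return expand(translated, TARGET_RELATED)


print(clir_query(["coche"]))  # ['car', 'automobile', 'vehicle', 'motor']
```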

2. Document Translation

Document translation is done using machine translation systems. It helps translate words more accurately, or into a synonym of the term used in the query. Document translation can perform better than query translation. However, in most cases it is not feasible to translate the documents because of the high computational cost. Another problem is that machine translation systems are not available for many languages.

3. Query and Document Translation

In this method, both the query and the document are translated. This approach is expensive but gives better results, as discussed in [1]. The reason is that the translation is bi-directional: first from the source language to the target language, and then from the target language to the source language.

4. Translation Methods

The different translation methods are parallel corpora, dictionary-based translation, and machine translation systems. The method is chosen according to the need. For example, corpus-based or dictionary-based methods are used for query translation, while machine translation systems are used for document translation.

Dictionary-based

In the dictionary-based approach, we use a bilingual dictionary. A bilingual dictionary lists words in one language and their translations in another language. This method is used more than the other two methods due to its cost-effectiveness. The dictionary sometimes also contains translation probabilities, which are used to weight the translated words.
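To illustrate how translation probabilities might be used as weights, here is a minimal sketch. The probabilistic dictionary, its values, and the simple weighted term-frequency score are assumptions made for illustration, not a specific published ranking model.

```python
from collections import Counter

# Hypothetical bilingual dictionary with translation probabilities.
PROB_DICTIONARY = {
    "banco": {"bank": 0.8, "bench": 0.2},
    "rio": {"river": 1.0},
}


def translate_weighted(query_terms):
    """Map each source term to its translations, accumulating probabilities as weights."""
    weights = Counter()
    for term in query_terms:
        for translation, prob in PROB_DICTIONARY.get(term, {}).items():
            weights[translation] += prob
    return dict(weights)


def score(document_tokens, weighted_query):
    """Score a document as the weighted sum of term frequencies of the translated terms."""
    term_freq = Counter(document_tokens)
    return sum(weight * term_freq[word] for word, weight in weighted_query.items())


weighted = translate_weighted(["banco", "rio"])
print(weighted)
print(score("the river bank".split(), weighted))
print(score("a park bench".split(), weighted))
```

With these toy values, a document mentioning "bank" contributes more to the score than one mentioning "bench", reflecting the higher translation probability.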

Machine Translation

This method uses a machine translation system to translate the query or the document. It is computationally expensive and, because of this high cost, is not very useful for web documents.

Parallel corpora

This method gives better results than dictionary-based approaches. However, it is a complicated and time-consuming process, and for some languages it is difficult to find parallel corpora, especially ones large enough to be useful. This method is more useful for query translation.
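One simple way to see how translations can be derived from a parallel corpus is to score word pairs by how often they appear in aligned sentences. The sketch below uses the Dice coefficient over a tiny, hypothetical sentence-aligned corpus; this is a deliberately crude association measure chosen for brevity, not a full statistical alignment model.

```python
from collections import Counter

# Hypothetical sentence-aligned parallel corpus (Spanish, English).
PARALLEL = [
    ("la casa", "the house"),
    ("la puerta", "the door"),
    ("casa grande", "big house"),
]


def dice_table(pairs):
    """Compute the Dice coefficient for every source/target word pair seen together."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src_sentence, tgt_sentence in pairs:
        src_words = set(src_sentence.split())
        tgt_words = set(tgt_sentence.split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        joint.update((s, t) for s in src_words for t in tgt_words)
    return {
        (s, t): 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in joint.items()
    }


def best_translation(word, table):
    """Return the target word most strongly associated with the source word, if any."""
    candidates = {t: value for (s, t), value in table.items() if s == word}
    return max(candidates, key=candidates.get) if candidates else None


table = dice_table(PARALLEL)
print(best_translation("casa", table))    # house
print(best_translation("puerta", table))  # door
```

Even in this toy setting the estimates depend entirely on how many aligned sentences contain each word, which hints at why a large enough corpus is needed.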

5. Reference

[1] McCarley, J. S., Should we translate the documents or the queries in cross-language information retrieval?, in: Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics (1999), pp. 208–214.

IIITD IR MELANAGE
