In an increasingly connected world, reducing (if not removing) the language barrier between people from different parts of the world is an important problem to solve. Language translation means taking an idea, piece of knowledge, or message encoded in one language, L1, and re-encoding it into another language of choice, L2. A language (not to be confused with a formal language from language theory) is not just a collection of words connected according to rules of grammar and syntax; the interplay between sentences across long paragraphs and documents accounts for a major part of communication. Below, I discuss various algorithms proposed over the past decades, followed by the state of the art.
Dictionary-Based Translation
The first public demonstration of this method (in 1954) showed translation of 250 words between Russian and English. It worked by matching words in the source language to words in the target language according to their meanings, as pre-entered into the respective dictionaries.
This algorithm blatantly ignored the mutual arrangement and placement of the constituent words and hence failed to capture the underlying syntactic structure. Even the semantics of the language could be captured only partially by word meanings alone.
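As an illustration, here is a minimal sketch of this word-for-word lookup in Python; the tiny dictionary and the example sentence are invented for demonstration, not taken from the 1954 system.

```python
# A minimal sketch of dictionary-based translation: each source word is
# looked up independently, with no regard for word order or syntax.
# The dictionary entries below are illustrative, not a real lexicon.

ru_to_en = {
    "мир": "peace",
    "я": "i",
    "люблю": "love",
}

def translate_word_by_word(sentence, dictionary):
    # Unknown words are passed through unchanged.
    return " ".join(dictionary.get(word, word) for word in sentence.split())

print(translate_word_by_word("я люблю мир", ru_to_en))  # -> "i love peace"
```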
Interlingual Translation
The next generation of systems used a specialised intermediate language built on a set of rules: the source language was first decoded into an “Interlingua” before the message was encoded into the target language.
This model of machine translation was more effective than the dictionary-based method because the Interlingua did capture syntactic structure. Semantic coverage, however, remained similar to that of the dictionary-based model.
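To make the pipeline concrete, below is a toy sketch (my own construction, not any historical system) of decoding a sentence into a language-neutral structure and re-encoding it into a target language; the rules and lexicon are placeholders.

```python
# A toy sketch of interlingual translation: decode the source sentence into a
# language-neutral structure, then generate the target sentence from it.
# Both rule sets below are illustrative placeholders.

def decode_to_interlingua(sentence_en):
    # Assume a trivially simple "subject verb object" sentence.
    subject, verb, obj = sentence_en.split()
    return {"agent": subject, "action": verb, "patient": obj}

def encode_from_interlingua(meaning, lexicon, word_order):
    words = {role: lexicon[word] for role, word in meaning.items()}
    return " ".join(words[role] for role in word_order)

en_to_es = {"i": "yo", "love": "amo", "peace": "paz"}
meaning = decode_to_interlingua("i love peace")
# The toy target language keeps subject-verb-object order.
print(encode_from_interlingua(meaning, en_to_es, ["agent", "action", "patient"]))
```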
Statistical Translation
This class of methods is based on aligned bilingual corpora: pieces of the source text are matched against excerpts of the source language in the corpora, and the corresponding excerpts of the target language are used as the translation.
These systems perform reasonably well given gargantuan amounts of human-translated documents; IBM and Google used this method for years (a minimal sketch of the underlying phrase-table idea follows the list below). Their major drawbacks are:
- their inability to translate well into languages where syntax affects the semantics (i.e. the meaning of a sentence depends on the placement or form of its words), and
- their inability to learn about words or phrases that occur infrequently.
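Here is the minimal sketch of the phrase-table idea referred to above; the phrase pairs and probabilities are invented for illustration, and real systems additionally combine a language model and handle reordering.

```python
# A minimal sketch of phrase-based statistical translation: candidate phrase
# pairs and their probabilities would normally be estimated from an aligned
# bilingual corpus; the numbers below are invented for illustration.

phrase_table = {
    "good morning": [("buenos dias", 0.8), ("buen dia", 0.2)],
    "my friend":    [("mi amigo", 0.9), ("mi amiga", 0.1)],
}

def translate_phrases(sentence, table):
    # Greedy decoding: pick the most probable target phrase for each known
    # source phrase; unseen phrases are left untranslated.
    output = []
    for phrase in sentence.split(", "):
        candidates = table.get(phrase)
        if candidates:
            best, _ = max(candidates, key=lambda c: c[1])
            output.append(best)
        else:
            output.append(phrase)
    return ", ".join(output)

print(translate_phrases("good morning, my friend", phrase_table))
```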
Neural Translation Models
These include the approaches based on Deep Learning.
Encoder-Decoder Based Architectures
These work on the principle that data lies in a lower-dimensional space and can therefore be represented in a much more compact form. The encoded low-dimensional representation is a nonlinear function of the input, and the output is a nonlinear function of the encoded representation.
These encoder and decoder functions can be learnt using standard neural-network training procedures such as stochastic gradient descent (SGD).
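Below is a hedged sketch of such an encoder-decoder pair, assuming PyTorch as the framework; the vocabulary sizes and dimensions are arbitrary placeholders and only the forward pass is shown.

```python
# A sketch of a sequence-to-sequence encoder-decoder, assuming PyTorch.
# Dimensions and vocabulary sizes are arbitrary placeholders.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab=1000, tgt_vocab=1000, emb=64, hidden=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.decoder = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode the whole source sentence into a single fixed-size state.
        _, state = self.encoder(self.src_emb(src_ids))
        # Decode the target sentence conditioned on that state alone.
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)
        return self.out(dec_out)  # scores over the target vocabulary

model = Seq2Seq()
src = torch.randint(0, 1000, (2, 7))   # batch of 2 source sentences, length 7
tgt = torch.randint(0, 1000, (2, 5))   # corresponding target prefixes, length 5
logits = model(src, tgt)
print(logits.shape)  # torch.Size([2, 5, 1000])

# Training would minimise cross-entropy with an optimiser such as SGD:
# optimiser = torch.optim.SGD(model.parameters(), lr=0.1)
```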
A major drawback of these architectures is the requirement to squeeze a variable-length sequence of words into a fixed-size representation (the output of the encoder): no matter how long or short the input sentence is, all of its information must pass through this single fixed-size vector. For long sentences this bottleneck discards information and degrades translation quality.
Soft-Attention Based Architectures
These models use recurrent units called LSTMs (Long Short-Term Memory) arranged as bi-directional Recurrent Neural Networks (bi-RNNs). Each unit takes a word (or character) as input and receives another “memory” input from the previous unit, carrying context from the source words seen so far. On top of this, an attention mechanism decides how much “attention” each source position should receive when generating each output word, so the decoder can consult the relevant parts of the source instead of relying on a single fixed vector.
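As a concrete illustration, here is a hedged sketch of additive soft attention over a bi-directional LSTM encoder, again assuming PyTorch; the layer sizes are placeholders and only the attention-weight and context-vector computation is shown.

```python
# A sketch of soft attention over a bi-directional LSTM encoder (PyTorch
# assumed). At each decoding step, attention weights say how much each
# source position should contribute to the next output word.
import torch
import torch.nn as nn

emb, hidden = 64, 128
src_vocab = 1000

embed = nn.Embedding(src_vocab, emb)
encoder = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)

# Additive attention parameters.
W_enc = nn.Linear(2 * hidden, hidden, bias=False)  # projects encoder states
W_dec = nn.Linear(hidden, hidden, bias=False)      # projects decoder state
v = nn.Linear(hidden, 1, bias=False)               # scores each position

src = torch.randint(0, src_vocab, (1, 9))          # one sentence, 9 tokens
enc_states, _ = encoder(embed(src))                # (1, 9, 2*hidden)
dec_state = torch.zeros(1, hidden)                 # current decoder state

# Score every source position against the current decoder state.
scores = v(torch.tanh(W_enc(enc_states) + W_dec(dec_state).unsqueeze(1)))
weights = torch.softmax(scores, dim=1)             # (1, 9, 1), sums to 1
context = (weights * enc_states).sum(dim=1)        # weighted source summary

print(weights.squeeze(-1))  # attention over the 9 source words
print(context.shape)        # torch.Size([1, 256])
```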
These models work exceptionally well, and stacking multiple layers of these bi-RNNs approaches the state of the art.