
An attentional mechanism has lately been used to improve neural machine translation (NMT) by selectively focusing on parts of the source sentence during translation. However, there has been little work exploring useful architectures for attention-based NMT. This paper examines two simple and effective classes of attentional mechanism: a global approach which always attends to all source words and a local one that only looks at a subset of source words at a time. We demonstrate the effectiveness of both approaches on the WMT translation tasks between English and German in both directions. With local attention, we achieve a significant gain of 5.0 BLEU points over non-attentional systems that already incorporate known techniques such as dropout. Our ensemble model using different attention architectures yields a new state-of-the-art result in the WMT'15 English-to-German translation task with 25.9 BLEU points, an improvement of 1.0 BLEU points over the existing best system backed by NMT and an n-gram reranker.
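To make the two attention variants concrete, here is a minimal NumPy sketch. This is not the paper's code: dot-product scoring, the window radius D, and the Gaussian weighting of the local-p variant are simplifications, and all names are illustrative. Global attention takes a weighted sum over every source state, while local attention restricts the sum to a window around a predicted position p_t.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def global_attention(h_t, src_states):
    """Attend over ALL source states (dot-product scoring)."""
    scores = src_states @ h_t            # (S,) alignment scores
    a = softmax(scores)                  # weights over every source word
    return a @ src_states                # context: weighted sum of source states

def local_attention(h_t, src_states, p_t, D=2):
    """Attend over a window [p_t - D, p_t + D] around an aligned position p_t,
    with a Gaussian favouring positions near p_t (as in the local-p variant)."""
    S = len(src_states)
    lo, hi = max(0, int(p_t) - D), min(S, int(p_t) + D + 1)
    window = src_states[lo:hi]
    a = softmax(window @ h_t)
    positions = np.arange(lo, hi)
    a *= np.exp(-((positions - p_t) ** 2) / (2 * (D / 2) ** 2))  # Gaussian truncation
    a /= a.sum()
    return a @ window

# Toy usage: 6 source words, hidden size 4.
rng = np.random.default_rng(0)
src = rng.normal(size=(6, 4))
h_t = rng.normal(size=4)
print(global_attention(h_t, src))
print(local_attention(h_t, src, p_t=3.2, D=2))
```

The practical difference is cost and focus: global attention scores all S source positions at every decoding step, while local attention only scores 2D+1 of them, which matters for long source sentences.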

This thesis concerns the problem of unknown or out-of-vocabulary (OOV) words in continuous speech recognition. We propose a novel approach for handling OOV words within a single-stage recognition framework. To achieve this goal, an explicit and detailed model of OOV words is constructed and then used to augment the closed-vocabulary search space of a standard speech recognizer. This OOV model achieves open-vocabulary recognition through the use of more flexible subword units that can be concatenated during recognition to form new phone sequences corresponding to potential new words. Examples of such subword units are phones, syllables, or some automatically-learned multi-phone sequences. Subword units have the attractive property of being a closed set, and thus are able to cover any new words, and can conceivably cover most utterances with partially spoken words as well. The main challenge with such an approach is ensuring that the OOV model does not absorb portions of the speech signal corresponding to in-vocabulary (IV) words. In dealing with this challenge, we explore several research issues related to designing the subword lexicon, language model, and topology of the OOV model. We present a dictionary-based approach for estimating subword language models. Such language models are utilized within the subword search space to help recognize the underlying phonetic transcription of OOV words. We also propose a data-driven iterative bottom-up procedure for automatically creating a multi-phone subword inventory. Starting with individual phones, this procedure uses the maximum mutual information principle to successively merge phones to obtain longer subword units. The thesis also extends this OOV approach to modelling multiple classes of OOV words. In addition, the thesis examines an approach for combining OOV modelling with recognition confidence scoring.
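The bottom-up inventory construction can be illustrated with a short sketch. This is not the thesis's implementation: the frequency-weighted pointwise mutual information criterion, the greedy left-to-right merging, and the toy phone corpus are assumptions chosen only to make the merge loop concrete.

```python
from collections import Counter
import math

def merge_step(corpus):
    """One bottom-up merge: find the adjacent unit pair with the highest
    (frequency-weighted) mutual information and join it into a single unit."""
    unigrams, bigrams, total = Counter(), Counter(), 0
    for utt in corpus:
        unigrams.update(utt)
        bigrams.update(zip(utt, utt[1:]))
        total += len(utt)
    def mi(pair):
        p_uv = bigrams[pair] / total
        p_u, p_v = unigrams[pair[0]] / total, unigrams[pair[1]] / total
        return p_uv * math.log(p_uv / (p_u * p_v))
    best = max(bigrams, key=mi)
    merged = "_".join(best)               # new multi-phone unit, e.g. "s_t"
    new_corpus = []
    for utt in corpus:
        out, i = [], 0
        while i < len(utt):
            if i + 1 < len(utt) and (utt[i], utt[i + 1]) == best:
                out.append(merged); i += 2
            else:
                out.append(utt[i]); i += 1
        new_corpus.append(out)
    return new_corpus, merged

# Toy usage: phone strings for a few words; run 3 merges.
corpus = [["t", "eh", "s", "t", "ih", "ng"],
          ["r", "eh", "s", "t", "ih", "ng"],
          ["t", "eh", "s", "t", "s"]]
for _ in range(3):
    corpus, unit = merge_step(corpus)
    print("new unit:", unit)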

Neural machine translation (NMT) models typically operate with a fixed vocabulary, but translation is an open-vocabulary problem. Previous work addresses the translation of out-of-vocabulary words by backing off to a dictionary. In this paper, we introduce a simpler and more effective approach, making the NMT model capable of open-vocabulary translation by encoding rare and unknown words as sequences of subword units. This is based on the intuition that various word classes are translatable via smaller units than words, for instance names (via character copying or transliteration), compounds (via compositional translation), and cognates and loanwords (via phonological and morphological transformations). We discuss the suitability of different word segmentation techniques, including simple character n-gram models and a segmentation based on the byte pair encoding (BPE) compression algorithm, and empirically show that subword models improve over a back-off dictionary baseline for the WMT 15 translation tasks English→German and English→Russian by up to 1.1 and 1.3 BLEU, respectively.
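The BPE-based segmentation learns merge operations greedily from word frequencies. The sketch below follows that description (the toy vocabulary, the `</w>` end-of-word marker, and the helper names are illustrative, not the paper's released code): each iteration merges the most frequent adjacent symbol pair into a single new symbol.

```python
import re
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every occurrence of the pair with its concatenation."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}

# Toy usage: words as space-separated characters with an end-of-word marker.
vocab = {"l o w </w>": 5, "l o w e r </w>": 2,
         "n e w e s t </w>": 6, "w i d e s t </w>": 3}
for _ in range(4):
    best = get_pair_counts(vocab).most_common(1)[0][0]
    vocab = merge_pair(best, vocab)
    print("merged:", best)
```

At test time the same learned merges are replayed on new words, so any rare or unseen word decomposes into known subword units rather than an out-of-vocabulary token.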
