Microsoft is making upgrades to Translator and other Azure AI services powered by a new family of artificial intelligence models its researchers have developed called Z-code, which offer the kind of performance and quality benefits that other large-scale language models have but can be run much more efficiently.
“Our goal is to help everyone and every organization on the planet to communicate better, and to achieve that goal there are really two important dimensions — we want the quality of translations to be as good as possible and we want to support as many languages as possible,” said Xuedong Huang, Microsoft technical fellow and Azure AI chief technology officer.
Z-code takes advantage of shared linguistic elements across multiple languages via transfer learning —which applies knowledge from one task to another related task — to improve quality for machine translation and other language understanding tasks. It also helps extend those capabilities beyond the most common languages across the globe to underrepresented languages that have less available training data.
“With Z-code we are really making amazing progress because we are leveraging both transfer learning and multitask learning from monolingual and multilingual data to create a state-of-the-art language model that we believe has the best combination of quality, performance and efficiency that we can provide to our customers,” Huang said.
These models use a sparse “Mixture of Experts” approach that is more efficient to run because it only needs to engage a portion of the model to complete a task, as opposed to other architectures that have to activate an entire AI model to run every request. This architecture allows massive scale in the number of model parameters while keeping the amount of compute constant.
To put these models in production, Microsoft is using NVIDIA GPUs and Triton Inference Server to deploy and scale them efficiently for high-performance inference.
Microsoft has recently deployed Z-code models to improve common language understanding tasks such as name entity recognition, text summarization, custom text classification and key phrase extraction across its Azure AI services. But this is the first time a company has publicly demonstrated that it can use this new class of Mixture of Experts models to power machine translation products.
The new Z-code-based translation model is now available, by invitation initially, to customers using document translation in Translator, a Microsoft Azure Cognitive Service which is a part of Azure AI.
Microsoft’s Z-code models consistently improved translation quality over current production models, according to common industry metrics. In contrast with typical multilingual transfer learning approaches, which typically show AI quality gains in languages that have fewer direct translation examples available for training, the Z-code Mixture of Experts models show consistent gains even in the largest languages.
Human evaluators in a blind test commissioned by Microsoft found that the Z-code Mixture of Experts models improved translations across languages, with an average gain of 4%. For instance, the models improved English to French translations by 3.2 %, English to Turkish by 5.8 %, Japanese to English by 7.6%, English to Arabic by 9.3% and English to Slovenian by 15%.
Creating more powerful and integrative AI systems
Z-code is part of Microsoft’s larger XYZ-code initiative that seeks to combine models for text, vision, audio and multiple languages to create more powerful and integrative AI systems that can speak, hear, see and understand people better.
Over the past five years, Microsoft has developed models that have matched human performance in conversational speech recognition, machine translation, image captioning, SuperGLUE natural language understanding and commonsense question answering. These breakthroughs provide the foundation to realize more ambitious AI systems that can achieve multisensory and multilingual learning that is closer to how people learn and understand, Huang said.
“Those are the pieces, the building blocks that we are using to build a truly differentiated intelligence…and to form production systems that are cost efficient,” Huang said.
Z-code models were developed as part of Microsoft’s AI at Scale and Turing initiatives, which seek to develop large models that are pretrained on vast amounts of textual data to understand nuances of language — which can be integrated in multiple Microsoft products and also made available to customers for their own uses.
The same underlying model can be fine-tuned to perform different language understanding tasks such as translating between languages, summarizing a speech, offering ways to complete a sentence or generating suggested tweets, instead of having to develop separate models for each of those narrow purposes.