AI that understands speech by looking as well as hearing
Meta AI is working on new conversational AI systems that can recognize nuanced correlations between what they see and what they hear in conversation. People use AI for a wide range of speech recognition and understanding tasks. But oftentimes these speech understanding systems don’t work well in everyday situations when we need them most. Meta AI’s AV-HuBERT is the first system to jointly model speech and lip movements from unlabeled data.
Aligning Language Models to Follow Instructions
InstructGPT can respond to tasks defined implicitly via a prompt, without an explicit instruction. It can give wrong or misleading outputs when the instruction assumes a premise that is not true. When given a sensitive prompt or instruction, it is less likely than GPT-3 to produce biased or toxic outputs. Because InstructGPT is trained to follow instructions, however, it can be susceptible to misuse.
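To make the distinction between implicit and explicit tasks concrete, here is a minimal sketch of the two prompting styles using the raw completions endpoint; the model name is a placeholder and error handling is omitted.

```python
# A minimal sketch of implicit vs. explicit task prompts; the model name is a
# placeholder -- substitute an instruction-following model from the OpenAI docs.
import os
import requests

# Implicit task: the prompt itself implies "continue this poem."
implicit_prompt = "Here is a poem about frogs:\nFrogs sit on logs,"

# Explicit instruction: the task is stated directly.
explicit_prompt = "Write a short poem about frogs."

for prompt in (implicit_prompt, explicit_prompt):
    response = requests.post(
        "https://api.openai.com/v1/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={"model": "MODEL_NAME_HERE", "prompt": prompt, "max_tokens": 64},
    )
    print(response.json()["choices"][0]["text"])
```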
Harmful content can evolve quickly. Our new AI system adapts to tackle it.
Few-Shot Learner (FSL) is a new AI technology that can adapt to take action on new or evolving types of harmful content within weeks instead of months. It works in more than 100 languages, and it also learns from different kinds of data, such as images and text. Unlike previous systems that relied on pattern-matching with labeled data, FSL is pretrained on general language, as well as policy-violating and borderline content.
XLS-R: Self-supervised speech processing for 128 languages
XLS-R is based on wav2vec 2.0, our approach to self-supervised learning of speech representations, and is trained on 436,000 hours of publicly available speech recordings. It covers nearly two and a half times more languages than its predecessor, XLSR-53, which was released last year. We found that our largest model, containing over 2 billion parameters, performs much better than smaller models, suggesting that model capacity is critical to performance.
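As a rough illustration of how such a pretrained wav2vec 2.0 model is used, here is a sketch that extracts speech representations with Hugging Face Transformers; the checkpoint name is an assumption, so substitute whichever XLS-R size you intend to use.

```python
# A minimal sketch of extracting self-supervised speech representations from an
# XLS-R (wav2vec 2.0) checkpoint via Hugging Face Transformers.
import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

checkpoint = "facebook/wav2vec2-xls-r-300m"   # assumed checkpoint name; pick the size you need
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(checkpoint)
model = Wav2Vec2Model.from_pretrained(checkpoint)

waveform = np.zeros(16000, dtype=np.float32)  # one second of 16 kHz audio as a stand-in
inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    hidden_states = model(**inputs).last_hidden_state  # (batch, frames, hidden_dim)
print(hidden_states.shape)
```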
Data2vec: The first high-performance self-supervised algorithm that works for speech, vision, and text
Meta AI announces data2vec, the first high-performance self-supervised algorithm that works for multiple modalities. Data2vec outperformed the previous best single-purpose algorithms for computer vision and speech. It also does not rely on contrastive learning or reconstructing the input example. It will enable us to develop more adaptable AI, which we believe will be able to perform tasks beyond what today’s systems can do.
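To show what "no contrastive learning, no input reconstruction" can look like in practice, here is a schematic sketch of a data2vec-style teacher-student objective, where a student regresses latent targets produced by an EMA teacher on the unmasked input; the encoder and hyperparameters are placeholders, not the released implementation.

```python
# A schematic data2vec-style objective: the student sees a masked view and
# predicts the teacher's latent representations of the unmasked view.
import copy
import torch
import torch.nn.functional as F

encoder = torch.nn.TransformerEncoder(
    torch.nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=4,
)
teacher = copy.deepcopy(encoder)          # updated as an EMA of the student, not by gradients
for p in teacher.parameters():
    p.requires_grad = False
optimizer = torch.optim.AdamW(encoder.parameters(), lr=1e-4)

def training_step(x, mask, ema_decay=0.999):
    """x: (batch, time, 64) inputs; mask: (batch, time) boolean mask of hidden positions."""
    with torch.no_grad():
        targets = teacher(x)                                  # latent targets from the unmasked view
    preds = encoder(x.masked_fill(mask.unsqueeze(-1), 0.0))   # student sees the masked view
    loss = F.smooth_l1_loss(preds[mask], targets[mask])       # regress latents at masked positions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    with torch.no_grad():                                     # EMA update of the teacher
        for pt, ps in zip(teacher.parameters(), encoder.parameters()):
            pt.mul_(ema_decay).add_(ps, alpha=1 - ema_decay)
    return loss.item()
```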
Introducing Pathways: A next-generation AI architecture
In 2001, some colleagues at Google realized they could use an obscure technique called machine learning to help correct misspelled Search queries. Today, AI augments many of the things that we do, whether that’s helping you capture a nice selfie, or providing more useful search results. 20 years of advances in research have helped elevate AI from a promising idea to an indispensable aid in billions of daily lives.
A decade in deep learning, and what’s next
In 2001 we used a simpler version of machine learning, statistical ML, to detect spam and suggest better spellings for people’s web searches. But it would be another decade before we had enough computing power to revive a more computationally intensive machine learning approach called deep learning. Deep learning uses neural networks with multiple layers (thus the “deep”), so it can learn subtler patterns of patterns.
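The "multiple layers" idea fits in a few lines; this is a generic toy network, not any particular production model.

```python
# A toy illustration of "multiple layers": each layer learns patterns over the
# previous layer's output, so deeper stacks can capture patterns of patterns.
import torch.nn as nn

deep_net = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),   # layer 1: simple features of the input
    nn.Linear(256, 128), nn.ReLU(),   # layer 2: patterns over layer-1 features
    nn.Linear(128, 10),               # output layer: e.g., 10 classes
)
```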
Summarizing Books with Human Feedback
To safely deploy powerful, general-purpose artificial intelligence in the future, we need to ensure that machine learning models act in accordance with human intentions. This challenge has become known as the alignment problem. Our best model is fine-tuned from GPT-3 and generates sensible summaries of entire books. It achieves a 6/7 rating (similar to the average human-written summary) from humans who have read the book.
NeuralProphet: The neural evolution of Meta’s Prophet
Meta AI is releasing NeuralProphet, a scalable and easy-to-use framework for hybrid forecasting models. It builds on the legacy of Facebook Prophet, the open source forecasting library that we released in 2017. Because it is built entirely in PyTorch, it is highly scalable and extensible. The growing scale of data demands new approaches to time series forecasting.
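A minimal usage sketch, following the library's Prophet-style interface (a dataframe with "ds" timestamp and "y" value columns); the call signatures reflect the documented API at release time, so verify against the current docs before relying on them.

```python
# A minimal NeuralProphet sketch: fit on a daily series, forecast 30 days ahead.
import pandas as pd
from neuralprophet import NeuralProphet

df = pd.DataFrame({
    "ds": pd.date_range("2021-01-01", periods=100, freq="D"),
    "y": [float(i) for i in range(100)],   # placeholder series; use your own measurements
})

m = NeuralProphet()
metrics = m.fit(df, freq="D")                      # train the hybrid model
future = m.make_future_dataframe(df, periods=30)   # extend 30 days into the future
forecast = m.predict(future)                       # forecast columns per horizon
print(forecast.tail())
```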
Introducing Text and Code Embeddings in the OpenAI API
The new endpoint uses neural network models, which are descendants of GPT-3, to map text and code to a vector representation. Embeddings are useful for working with natural language and code because they can be readily consumed and compared by machine learning models and algorithms like clustering or search. The new /embeddings endpoint in the OpenAI API provides text and code embeddings with a few lines of code.
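Here is a minimal sketch of calling the /embeddings endpoint over raw HTTP; the model name below is a placeholder, so substitute one of the text or code embedding models listed in the OpenAI documentation.

```python
# A minimal sketch of the /embeddings endpoint: send text, get back a vector.
import os
import requests

response = requests.post(
    "https://api.openai.com/v1/embeddings",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={"model": "EMBEDDING_MODEL_NAME", "input": "The food was delicious."},
)
embedding = response.json()["data"][0]["embedding"]  # a list of floats (the vector)
print(len(embedding))
```

Cosine similarity between two such vectors then gives a simple relevance score for search or clustering.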
2021 Year in Review: Google Quantum AI
Google’s Quantum AI team has had a productive 2021. Despite ongoing global challenges, we’ve made significant progress in our effort to build a fully error-corrected quantum computer. We are now able to reset our qubits with high fidelity, allowing us to reuse qubits in quantum computations. At the same time, we have continued our commitment to realizing the potential of quantum computers in various applications.
Building a conversational parser for on-device voice assistants
Meta AI researchers have pushed the future of conversational voice assistants forward with two new works that significantly reduce latency and provide a framework for on-device processing. Such assistants rely on semantic parsing to convert a user’s request into a structured form, consisting of intents and slots to allow for downstream execution. The request usually needs to go off-device in order to access larger models running on the cloud.
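As a hypothetical illustration of the structured form a semantic parser produces, here is a toy intent-and-slots schema; it is invented for this example and is not Meta's actual representation.

```python
# A toy intent/slot structure that downstream execution could act on.
from dataclasses import dataclass, field

@dataclass
class Slot:
    name: str
    value: str

@dataclass
class Intent:
    name: str
    slots: list[Slot] = field(default_factory=list)

# "Set an alarm for 8 am tomorrow" might parse to:
parse = Intent(
    name="CREATE_ALARM",
    slots=[Slot("DATE_TIME", "8 am tomorrow")],
)
print(parse)
```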
Solving Math Word Problems
We’ve trained a system that solves grade school math problems with nearly twice the accuracy of a fine-tuned GPT-3 model. It solves about 90% as many problems as real kids: a small sample of 9-12 year olds scored 60% on a test from our dataset. This is important because today’s AI is still quite weak at commonsense multistep reasoning, which is easy for grade school kids.
Q&A with machine translation pioneer: The future of MT is multilingual
Philipp Koehn, a Meta AI research scientist, is one of the inventors of the modern method of phrase-based MT. He talks about the latest advances in machine translation (MT) and promising directions on the path toward universal translation. Multilingual models translate multiple language pairs in a single model and are a key evolution because they generalize knowledge across many language pairs, particularly helpful for low-resource languages.
Multimodal Neurons in Artificial Neural Networks
OpenAI announced CLIP, a general-purpose vision system that matches the performance of a ResNet-50 but outperforms existing vision systems on some of the most challenging datasets. Within CLIP, we have discovered multimodal neurons that respond to the same concept across different presentations. One such neuron, for example, is a “Spider-Man” neuron that responds to an image of a spider, an image of the text “spider,” and the comic book character “Spider-Man” either in costume or illustrated.
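A brief sketch of probing CLIP's shared image-text space with Hugging Face Transformers; treat the checkpoint name as an assumption and swap in whichever released CLIP variant you use.

```python
# Score how strongly one image matches several captions in CLIP's joint space.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

checkpoint = "openai/clip-vit-base-patch32"   # assumed checkpoint name
model = CLIPModel.from_pretrained(checkpoint)
processor = CLIPProcessor.from_pretrained(checkpoint)

image = Image.open("spider_man.jpg")          # any local image
texts = ["a photo of a spider", "the word 'spider'", "Spider-Man in costume"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)   # image-to-text match probabilities
print(dict(zip(texts, probs[0].tolist())))
```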
Qubit the dog on the big questions in quantum computing
This week we sat down for an interview with Google Quantum AI’s Qubit the dog. Julian Kelly: Where do you think we are in the “hype cycle” of quantum computing? We’ve seen a number of exciting firsts — the first demonstration of a beyond-classical computation of any kind on a quantum computer in 2019, the most impressive simulation on a quantum computer.
The first-ever multilingual model to win WMT, beating out bilingual models
The machine translation (MT) field needs to solve fundamental limitations in order to make that future a reality. Most MT systems today use groups of bilingual models, which typically require extensive labeled examples for each language pair and task. This approach fails languages with scarce training data (e.g., Icelandic and Hausa). Its high complexity also makes it impractical to scale to applications on Facebook, where billions of people post in hundreds of languages every day.
WebGPT: Improving the factual accuracy of language models through web browsing
We’ve fine-tuned GPT-3 to more accurately answer open-ended questions using a text-based web browser. Our best-performing model produces answers that are preferred 56% of the time to answers written by our human demonstrators. We’re excited about developing more truthful AI, but challenges remain, such as coping with unfamiliar types of questions. The system is trained to answer questions from ELI5, a dataset from the “Explain Like I’m Five” subreddit.
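Here is a hypothetical sketch of a text-based browsing loop of the kind described above; the command vocabulary, helper objects, and model call are invented for illustration and are not WebGPT's actual interface.

```python
# A hypothetical browse-then-answer loop: the model issues text commands, the
# browser returns text observations, and quoted passages become references.
def answer_with_browsing(question, model, browser, max_steps=10):
    """Let a language model gather references before answering (illustrative only)."""
    observation = f"Question: {question}"
    references = []
    for _ in range(max_steps):
        command = model.next_command(observation)         # e.g. "SEARCH frog lifespan"
        if command.startswith("SEARCH "):
            observation = browser.search(command[len("SEARCH "):])
        elif command.startswith("CLICK "):
            observation = browser.open(command[len("CLICK "):])
        elif command.startswith("QUOTE "):
            references.append(command[len("QUOTE "):])    # collect supporting passages
            observation = f"{len(references)} quotes collected."
        elif command.startswith("ANSWER "):
            return command[len("ANSWER "):], references
    return model.answer(question, references), references
```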