Chatbots were all the rage last decade, and they are now a common sight on support lines. Today’s conversational agents are still limited, and Google is working towards a human-like chatbot “that can chat about anything.”
The Google Brain research team today detailed “Meena.” This end-to-end trained neural conversational model tries to correct the “critical flaw” of current highly specialized chatbots:
They sometimes say things that are inconsistent with what has been said so far, or lack common sense and basic knowledge about the world. Moreover, chatbots often give responses that are not specific to the current context.
Google’s Meena model focuses on understanding the context of a conversation to provide a sensible reply. The goal is to create something that can “chat about virtually anything a user wants.” For example, two conversations show users asking Meena for show recommendations, while another sees it reply with jokes.
It’s trained on 341GB of text from public domain social media conversations, which is 8.5x more data than existing state-of-the-art generative models.
In working towards a realistic model, Google created a new quality benchmark for chatbots. The Sensibleness and Specificity Average (SSA) “captures basic, but important attributes for natural conversations.” Human evaluators are asked to judge if a response is “reasonable in context.”
If anything seems off — confusing, illogical, out of context, or factually wrong — then it should be rated as, “does not make sense”. If the response makes sense, the utterance is then assessed to determine if it is specific to the given context.
For example, if A says, “I love tennis,” and B responds, “That’s nice,” then the utterance should be marked, “not specific”. That reply could be used in dozens of different contexts. But if B responds, “Me too, I can’t get enough of Roger Federer!” then it is marked as “specific”, since it relates closely to what is being discussed.
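As a rough illustration of how such labels could be rolled up into a single score, here is a minimal Python sketch. It assumes, going by the metric's name, that SSA is the plain average of two fractions: the share of responses rated sensible and the share rated specific, with any response that "does not make sense" never counted as specific. The `Label` class, the `ssa` function, and the sample labels are hypothetical and are not taken from Google's code or data.

```python
from dataclasses import dataclass


@dataclass
class Label:
    """One human judgment of a single chatbot response (hypothetical schema)."""
    sensible: bool
    specific: bool  # only meaningful when sensible is True


def ssa(labels):
    """Return (sensibleness, specificity, SSA) for a list of labels.

    Assumption: SSA is the simple mean of the two fractions.
    """
    if not labels:
        raise ValueError("need at least one labeled response")
    n = len(labels)
    sensibleness = sum(l.sensible for l in labels) / n
    # A response that does not make sense is never counted as specific.
    specificity = sum(l.sensible and l.specific for l in labels) / n
    return sensibleness, specificity, (sensibleness + specificity) / 2


# Made-up labels for four responses, echoing the tennis example above.
labels = [
    Label(sensible=True, specific=False),   # e.g. "That's nice"
    Label(sensible=True, specific=True),    # e.g. a reply about Roger Federer
    Label(sensible=False, specific=False),  # off-topic or illogical reply
    Label(sensible=True, specific=True),
]
sensibleness, specificity, score = ssa(labels)
print(f"sensibleness={sensibleness:.2f} specificity={specificity:.2f} SSA={score:.3f}")
```

Under these assumptions, a bot that always answers “That’s nice” could score well on sensibleness but would be held back by a specificity of zero, which is the behavior the two-part rating is meant to penalize.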
On this Google-created benchmark, Meena does better than existing models, and is “closing the gap with human performance.” Practical applications for human-like chatbots include humanizing how people use computers and making relatable interactive movie or game characters.
Moving forward, Google wants to look beyond sensibleness and specificity and tackle personality and factuality in its human-like chatbot. Safety and bias are another important area, and as a result the company is not releasing a research demo today.
We are evaluating the risks and benefits associated with externalizing the model checkpoint, however, and may choose to make it available in the coming months to help advance research in this area.