The Illusion of Understanding: MIT Unmasks the Myth of AI’s Formal Specifications

Advertisement
AI Human Interpretation Art Concept

A study by MIT Lincoln Laboratory suggests that formal specifications, despite their mathematical precision, are not necessarily interpretable to humans. Participants struggled to validate AI behaviors using these specifications, indicating a discrepancy between theoretical claims and practical understanding. The findings highlight the need for more realistic assessments of AI interpretability.

Some researchers see formal specifications as a way for autonomous systems to “explain themselves” to humans. But a new study finds that we aren’t understanding.

As autonomous systems and artificial intelligence become increasingly common in daily life, new methods are emerging to help humans check that these systems are behaving as expected. One method, called formal specifications, uses mathematical formulas that can be translated into natural-language expressions. Some researchers claim that this method can be used to spell out decisions an AI will make in a way that is interpretable to humans.

Research Findings on Interpretability

<span class="glossaryLink" aria-describedby="tt" data-cmtooltip="

MIT
MIT is an acronym for the Massachusetts Institute of Technology. It is a prestigious private research university in Cambridge, Massachusetts that was founded in 1861. It is organized into five Schools: architecture and planning; engineering; humanities, arts, and social sciences; management; and science. MIT's impact includes many scientific breakthroughs and technological advances. Their stated goal is to make a better world through education, research, and innovation.

” data-gt-translate-attributes=”[{"attribute":"data-cmtooltip", "format":"html"}]”>MIT Lincoln Laboratory researchers wanted to check such claims of interpretability. Their findings point to the opposite: Formal specifications do not seem to be interpretable by humans. In the team’s study, participants were asked to check whether an AI agent’s plan would succeed in a virtual game. Presented with the formal specification of the plan, the participants were correct less than half of the time.

Humans Struggle To Understand Outputs of Formal Specifications
Advertisement

A study finds humans struggle to understand the outputs of formal specifications, a method that some researchers claim can be used to make AI decision-making interpretable to humans. Credit: Bryan Mastergeorge

“The results are bad news for researchers who have been claiming that formal methods lent interpretability to systems. It might be true in some restricted and abstract sense, but not for anything close to practical system validation,” says Hosea Siu, a researcher in the laboratory’s AI Technology Group. The group’s paper was accepted to the 2023 International Conference on Intelligent Robots and Systems held earlier this month.

The Importance of Interpretability

Interpretability is important because it allows humans to place trust in a machine when used in the real world. If a robot or AI can explain its actions, then humans can decide whether it needs adjustments or can be trusted to make fair decisions. An interpretable system also enables the users of technology — not just the developers — to understand and trust its capabilities. However, interpretability has long been a challenge in the field of AI and autonomy. The <span class="glossaryLink" aria-describedby="tt" data-cmtooltip="

machine learning
Machine learning is a subset of artificial intelligence (AI) that deals with the development of algorithms and statistical models that enable computers to learn from data and make predictions or decisions without being explicitly programmed to do so. Machine learning is used to identify patterns in data, classify data into different categories, or make predictions about future events. It can be categorized into three main types of learning: supervised, unsupervised and reinforcement learning.

” data-gt-translate-attributes=”[{"attribute":"data-cmtooltip", "format":"html"}]”>machine learning process happens in a “black box,” so model developers often can’t explain why or how a system came to a certain decision.

“When researchers say ‘our machine learning system is accurate,’ we ask ‘how accurate?’ and ‘using what data?’ and if that information isn’t provided, we reject the claim. We haven’t been doing that much when researchers say ‘our machine learning system is interpretable,’ and we need to start holding those claims up to more scrutiny,” Siu says.

The Challenge of Translating Specifications

For their experiment, the researchers sought to determine whether formal specifications made the behavior of a system more interpretable. They focused on people’s ability to use such specifications to validate a system — that is, to understand whether the system always met the user’s goals.

Applying formal specifications for this purpose is essentially a by-product of its original use. Formal specifications are part of a broader set of formal methods that use logical expressions as a mathematical framework to describe the behavior of a model. Because the model is built on a logical flow, engineers can use “model checkers” to mathematically prove facts about the system, including when it is or isn’t possible for the system to complete a task. Now, researchers are trying to use this same framework as a translational tool for humans.

“Researchers confuse the fact that formal specifications have precise semantics with them being interpretable to humans. These are not the same thing,” Siu says. “We realized that next-to-nobody was checking to see if people actually understood the outputs.”

In the team’s experiment, participants were asked to validate a fairly simple set of behaviors with a robot playing a game of capture the flag, basically answering the question “If the robot follows these rules exactly, does it always win?”

Participants included both experts and nonexperts in formal methods. They received the formal specifications in three ways — a “raw” logical formula, the formula translated into words closer to natural language, and a decision-tree format. Decision trees in particular are often considered in the AI world to be a human-interpretable way to show AI or robot decision-making.

The results: “Validation performance on the whole was quite terrible, with around 45 percent <span class="glossaryLink" aria-describedby="tt" data-cmtooltip="

accuracy
How close the measured value conforms to the correct value.

” data-gt-translate-attributes=”[{"attribute":"data-cmtooltip", "format":"html"}]”>accuracy, regardless of the presentation type,” Siu says.

Overconfidence and Misinterpretation

Those previously trained in formal specifications only did slightly better than novices. However, the experts reported far more confidence in their answers, regardless of whether they were correct or not. Across the board, people tended to over-trust the correctness of specifications put in front of them, meaning that they ignored rule sets allowing for game losses. This confirmation bias is particularly concerning for system validation, the researchers say, because people are more likely to overlook failure modes. 

“We don’t think that this result means we should abandon formal specifications as a way to explain system behaviors to people. But we do think that a lot more work needs to go into the design of how they are presented to people and into the workflow in which people use them,” Siu adds.

When considering why the results were so poor, Siu recognizes that even people who work on formal methods aren’t quite trained to check specifications as the experiment asked them to. And, thinking through all the possible outcomes of a set of rules is difficult. Even so, the rule sets shown to participants were short, equivalent to no more than a paragraph of text, “much shorter than anything you’d encounter in any real system,” Siu says.

The team isn’t attempting to tie their results directly to the performance of humans in real-world robot validation. Instead, they aim to use the results as a starting point to consider what the formal logic community may be missing when claiming interpretability, and how such claims may play out in the real world.

Future Implications and Research

This research was conducted as part of a larger project Siu and teammates are working on to improve the relationship between robots and human operators, especially those in the military. The process of programming robotics can often leave operators out of the loop. With a similar goal of improving interpretability and trust, the project is trying to allow operators to teach tasks to robots directly, in ways that are similar to training humans. Such a process could improve both the operator’s confidence in the robot and the robot’s adaptability.

Ultimately, they hope the results of this study and their ongoing research can better the application of autonomy, as it becomes more embedded in human life and decision-making.

“Our results push for the need to do human evaluations of certain systems and concepts of autonomy and AI before too many claims are made about their utility with humans,” Siu adds.

Reference: “STL: Surprisingly Tricky Logic (for System Validation)” by Ho Chit Siu, Kevin Leahy and Makai Mann, 26 May 2023, Computer Science > Artificial Intelligence.
arXiv:2305.17258

Advertisement