Getting AI to work in a fleshy, messy world is harder than you think

At the warehouses of British online grocery company Ocado Technology, robots, guided by AI, whizz around on rails at speeds of up to four metres per second, picking a 50-item order in minutes. The journeys then taken by Ocado’s delivery trucks are optimised by a neural network that makes more than 14 million last-mile routing calculations per second, and adjusts delivery routes each time a customer places a new order or adds extra items to their shopping lists.

But Ocado’s most ambitious automation efforts involve packing robots. At the time of writing the company has five robotic picking arms powered by computer vision and other machine-learning systems that can identify the products that need to be packed and use suction power to grab them. Further advances, undertaken in conjunction with two European academic-led projects, are in the pipeline.

Picking and packing aren’t easy if you’re a robot. “From a human’s perspective, it is a fairly simple task to pick and pack, and it doesn’t require an awful lot of training,” says Alex Harvey, chief of advanced technology at Ocado Technology. “For a computer and for a robot, the dexterous manipulation involved is far beyond the state of the art today to be able to pick and pack the full range of items that we do.”

Ocado’s Robotic Suction Pick (RSP) machine is a vacuum cup, powered by an air compressor, that sits at the end of an articulated arm. It uses computer vision and built-in sensors to select items gathered by a bot and place them into a shopping bag. Ocado.com sells a vast range of products, running into the tens of thousands. In terms of outward physical appearance, some are much the same: a tin of chopped tomatoes, for example, is not that different from a tin of lentils. But a tin of chopped tomatoes is very different indeed from a pack of yoghurts, which in turn are more sturdy than, say, a bunch of grapes. And, of course, even grapes aren’t all the same – they vary according to their variety and state of ripeness. Get the pressure of the vacuum cup wrong and the RSP will either drop or crush the item it is attempting to manipulate. Get the sequence wrong and there’s a danger that the tin of tomatoes will squash the grapes.

At present, the company is expanding the number of items its robotic suction system can pick. “There’s no point having for 60,000 different items, 60,000 different control pieces of code,” Harvey says. “What we want at Ocado is generalised control strategies.” But challenges remain. “We need to fit a robot into the same square footage that the person sits in or operates in, and we need the robot system to achieve the same throughput.”

Until a robot can pick and pack as many items in an hour as a human being – Harvey says this is around 600–700 items – it is unlikely to be widely adopted: the impact on productivity would damage service and profits. It also has to be affordable, which means, on the one hand, scaling the technology to the point where it becomes economically worthwhile, and, on the other hand, not over-speccing it (for example, by using a camera with an unnecessarily high resolution). “When we’re deploying stuff in the real world, we want it to be economical in the way that we’re deploying it,” Harvey says. “I don’t want to deploy a supercomputer next to every robot picker.”
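As a rough back-of-envelope illustration of that throughput target, the hourly rate Harvey quotes can be turned into a time budget per item. Only the 600–700 items per hour figure comes from Harvey; the function name is hypothetical and the sketch ignores breaks, travel time and failed picks.

```python
def seconds_per_item(items_per_hour: float) -> float:
    """Return the time budget per pick, in seconds, for an hourly throughput."""
    if items_per_hour <= 0:
        raise ValueError("items_per_hour must be positive")
    return 3600.0 / items_per_hour


# The human benchmark quoted in the article: roughly 600 to 700 items an hour.
for rate in (600, 700):
    print(f"{rate} items/hour -> {seconds_per_item(rate):.1f} s per item")
```

On these numbers, a robot has only five or six seconds to identify, grip, move and place each item if it is to keep pace with a person.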

Whether or not such AI will ultimately replace humans is the billion-dollar question. Many now believe AI will work alongside humans. Obviously, AI offers the promise of greater efficiency, but so far, at least, this tends to hold good only where the environment is relatively controlled and predictable. Production lines and warehouses may well become fully automated. But where processes interact with the outside world – with all its randomness – it’s harder to envisage a wholly AI future. Delivery drivers, for instance, have to take into account such factors as the weather and the erratic behaviour of some pedestrians. It’s possible that there will be a future without them. It’s also possible, though, that AI will control lorries and delivery trucks on major highways where conditions are relatively predictable, and that human drivers will take over the driving as they reach the outskirts of the villages, towns and cities set as their destinations.

Arguably, it’s the medical sphere where AI’s potential and its limitations have been most apparent – and where its possible future relationship with humans has been most clearly demonstrated. Take one of Google’s computer-vision systems, for example. It’s capable of spotting diabetic retinopathy, a complication of diabetes that can cause sight loss. In lab conditions, it can achieve an accuracy rate of 90 per cent and provide results in ten minutes.

When put to a real-world test in Thailand, however, the deep-learning set-up often struggled. The 11 clinics that operated it used different technology. Only two had a dedicated eye-screen room that could be made dark enough for patients’ pupils to enlarge to the point where high-quality fundus photos could be taken. Most had to make use of nurses’ offices or general-purpose rooms.

Emma Beede, the lead Google Health researcher evaluating the technology, described one room where, because dental checks and patient consultation were going on, it was essential to leave the light on. “That makes sense for them,” she says. “That makes sense when you’re under-resourced.”

Poor internet connection caused problems, too – at one clinic it dropped out altogether for two hours. One way and another, during the first six months of the trial, 21 per cent (393) of the 1,838 images put through the system proved to be of an insufficiently high quality. Nurses and patients felt frustrated. “I’ll do two tries,” one nurse said. “The patients can’t take more than that.”

And then there was the human factor. At one clinic half the patients earmarked for the study declined to be involved after finding out that although the results would be virtually instantaneous (previously some results had taken ten weeks to arrive), a positive diagnosis would see them being referred to a hospital an hour’s drive away. One member of medical staff told the Google researchers that patients “are not concerned with accuracy, but how the experience will be – will it waste my time if I have to go to the hospital?”

There’s little doubt that the technology works: tests in the trial proved accurate when conditions were right. And when things were going well, Beede says, senior nurses felt they had more time to spend with patients and speak to them about their health and lifestyles. But the unpredictability of the everyday world and the needs and concerns of humans cannot be ignored.

“I think the main takeaway from this research is that we need to be designing AI tools with people at the centre,” says Beede. “We need to be considering our success beyond the accuracy of the model. We need to understand how it’s actually going to work for real people in context.” She argues that as more AI is implemented in the real world, more pilot studies will be required to ensure that it works for everyone involved. “That implementation is of equal importance to the accuracy of the algorithm itself, and cannot always be controlled through careful planning,” the Google report concludes.

The need for a proper and carefully considered partnership between humans and AI has been well demonstrated by Finale Doshi-Velez, a Harvard University computer science professor who leads its Data to Actionable Knowledge Lab. In one experiment, she and her colleagues recruited 220 psychiatrists to study case notes for hypothetical patients supposedly suffering from major depressive disorder. Each set of case notes described that particular patient’s condition and was then followed by either an independently verified correct AI-generated recommendation for treatment, an incorrect recommendation or no recommendation at all. Where an AI recommendation was given, it was accompanied by an explanation for that recommendation, which varied in length, quality and detail.

“What we found was that when the recommendation was correct, overall, everything improved,” Doshi-Velez says. Doctors and AI decisions proved to be a formidable team. “Humans may have already had an idea, the recommendation reinforced it or caused them to change their mind to that idea.” However, when the recommendation presented to the volunteer doctors was incorrect, it tended to lead to poorer forms of decision-making. The psychiatrists were influenced by incorrect recommendations from the AI, leading to lower levels of treatment selection accuracy. This is scarcely a new phenomenon.

It’s long been known that if we put machines in charge of simple tasks, humans will, without continuous training, forget how to do them. At an everyday level, this is why digital contact books in phones mean we no longer remember phone numbers. At an extreme level, it is why on June 1, 2009 the pilots of Air France Flight 447, who had come to rely heavily on the plane’s autopilot features, could not cope when the systems failed, and so presided over a crash in which 228 people perished. With AI this “paradox of automation” is only going to become more pronounced.

The way in which information is presented is also an important factor in determining the success or otherwise of the final outcome. “Certain forms of explanation are more effective at preventing a wrong decision,” Doshi-Velez says. It’s something the Google Health researchers have noticed, too. In their field trial in Thailand, says Google product manager Lily Peng, a system notification that an image was ungradable and that there should be a referral could very easily be misinterpreted. “For some people it means being referred to retinal specialists, which is at the higher end of care, versus refer for human review, which is what ended up being part of the protocol,” she explains.

“It’s not just about the machine-learning part, it’s about figuring out how to present the information as well,” Doshi-Velez argues, stressing the point that conclusions reached by an AI system need to make humans think. Such conclusions can’t be too easy to accept or they risk becoming relied upon without that crucial element of critical thinking. By the same token, they can’t be too hard to digest or humans will simply ignore or gloss over them. “Models need to be transparent in their limitations, highlighting situations in which the AI prediction may not be accurate or valid,” the Harvard study concludes.

But when humans and AI are in sync, the potential benefits are huge. The more positive outcomes of the Harvard project, for example, suggest AI could eventually help with diagnosis for a mental health condition that is often missed and for which treatments vary wildly. They start to offer hope for the more than 264 million people around the world who battle with depression.

What, then, does the future hold for AI and the world of work in the medium term? It’s likely to be a mixed bag of results. There have been and will continue to be disappointments and failures as use of the technology expands.

In 2018 IBM had to ditch a multi-million-dollar project designed to help the treatment of cancer patients after it was found to be giving clinicians bad advice. During the coronavirus pandemic Walmart abandoned the use of robots to scan shelves and assess levels of stock when it realised humans were just as effective. An October 2020 study conducted by the MIT Sloan Management Review and the Boston Consulting Group, which surveyed more than 3,000 business leaders running companies with annual revenues above $100 million, discovered that only in ten per cent of cases did people feel that the investment they made in AI produced a “significant” return.

AI will cause disruption, too. “Better-educated, better-paid workers will be the most affected by the new technology, with some exceptions,” research from the Brookings Institution found in November 2019. Those whose jobs currently involve a close focus on data will be particularly vulnerable: market researchers, sales managers, computer programmers and personal finance advisers among them. Those whose jobs involve a lot of interpersonal skills, such as those in education and social care, will probably be less affected: AI is very unlikely to replace human compassion and empathy. That said, in Japan care robots are already used to help the country’s ageing population. And in any case, it’s dangerous to make sweeping generalisations. The fact is that AI adoption will vary around the world according to local culture and social attitudes. Automation in finance in Singapore is likely to be very different from automation in finance in Pakistan.

If Google’s work with computer vision or Harvard’s study with psychiatrists is anything to go by, though, it seems likely that the general trend will be for AI not to replace existing jobs but to transform them – and to create new ones, too. Already, thousands of roles exist that would have been unimaginable at the turn of the century. Scores of people now work on AI labelling, helping to compile the datasets that train machine-learning systems. Thousands of individuals have been taken on at companies such as Facebook and YouTube to moderate content that might be breaking their platforms’ rules, which has in many cases been initially flagged by an AI.

Researchers at MIT Sloan Management Review and the Boston Consulting Group argue that the companies poised to benefit most from AI are those that use it to augment and shape traditional processes rather than replace them. In other words, they create an environment in which humans learn from AI and AI learns from humans. The toolmaker Stanley Black & Decker is one example. It has started using computer vision to check the quality of the tape measures it manufactures. The system flags defects in real time, spotting problems early in the production cycle and so reducing wastage. But humans are still on hand to inspect and make judgement calls on the worst faults.

Experts are key to creating trustworthy AI systems, says Ken Chatfield, the vice-president of research at Tractable, an AI firm that uses computer vision to help make decisions about insurance claims after car crashes – its AI is being used in the real world by some of the biggest insurance companies. The company initially trained its AI on thousands of images of vehicles that had been in accidents – involving damaged door panels, broken windscreens and more.

But it saw the biggest improvements in the system’s performance when the damage highlighted in images had been labelled by specialists with years of experience in assessing crash reports. And it is human insurance agents who take over once the AI has reviewed images and suggested what the next steps should be. “The data in itself is not enough, and also our knowledge as researchers is not enough – we really need to draw on the knowledge of experts in order to be able to train models,” Chatfield explains. “Involving the expert is also what we need to build up trust.”

The London-based lawyer Richard Robinson, the CEO of Robin AI, has struck a not dissimilar balance in the legal field. He quit his job at a large law firm when he became convinced that many of the repetitive tasks that went into contract work could be automated. “A lot of what I would spend my time doing as a lawyer felt like it didn’t need much brain power,” he explains. His view was that machine learning could be utilised for reviewing some types of contract, such as those concerned with employment conditions. The tasks involved seemed simple enough.

It didn’t, however, turn out that way. “The truth is it was much more difficult than we anticipated,” Robinson says. “There are so many random things that could be in that document, that you can’t be confident that the AI will always identify them.” What he therefore did was to create a system in which AI works with human lawyers rather than instead of them. The company’s system has been trained on historical contracts – both those in the public domain and documents provided by clients – and taught to look for particular elements. It’s therefore able to detect whether a non-compete clause has been sneakily added into a business contract, or whether an employment contract stipulates non-standard working hours.

If it finds anomalies, the system alerts a human lawyer via email and they then check the document. The same thing happens if the system is unable to interpret a particular clause or contract. A recent assignment the company took on was checking contracts between big fast-food retailers and their suppliers during the early months of the coronavirus pandemic, to find out what each party’s obligations were in the event of a crisis.
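The escalation pattern described here can be sketched in a few lines. This is a minimal illustration, not Robin AI’s actual system: the `ClauseResult` type, the `needs_human_review` function and the 0.9 confidence threshold are all hypothetical, and a real system would take its findings from a trained model rather than hand-written values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClauseResult:
    text: str
    label: str         # hypothetical labels, e.g. "non_compete", "unknown"
    confidence: float  # model confidence in the label, from 0.0 to 1.0
    anomalous: bool    # True if the clause departs from what is expected


def needs_human_review(results: List[ClauseResult],
                       min_confidence: float = 0.9) -> bool:
    """Escalate if any clause is anomalous or could not be interpreted."""
    return any(
        r.anomalous or r.label == "unknown" or r.confidence < min_confidence
        for r in results
    )


# Hand-written stand-ins for what a model might return for one contract.
contract = [
    ClauseResult("Employee works 9am to 5pm.", "working_hours", 0.97, False),
    ClauseResult("Employee shall not join a competitor for five years.",
                 "non_compete", 0.95, True),
]

if needs_human_review(contract):
    print("Flag for lawyer review")
else:
    print("No issues found")
```

The point of the design is that the software never has the final word: anything unusual, and anything it cannot read with confidence, goes to a person.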

Robinson’s view is that lawyers find checking contracts tedious. At the same time, it’s dangerous to rely wholly on AI, because even if it’s getting things right 96 per cent of the time, that’s not good enough when companies’ and individuals’ lives and livelihoods are at stake. “We want to use AI to make the first attempt at everything in situations where it’s really easy for a person to check and see if it’s wrong or right,” he says.

However organisations end up using AI, there’s no doubt that as it spreads it will become easier to access and operate. At present most AI deployments involve handcrafted technology. In the future, a company’s AI requirements may be handled by a third party, using software that seems as straightforward as that inside word processors or slideshow builders.

A company that wants to use AI to analyse specific datasets or images will be able to set this up from a template. The algorithm it picks may not have been created by the third-party service it is buying the template from, but by another company further up the chain of businesses developing and industrialising AI. The technology will become plug-and-play. By that point we may hardly notice its interaction with our daily lives.

Once this happens the world really will change significantly. People’s workplaces will face automation at a greater scale than at any point so far this millennium. For many the entire nature of work may change. How we interact with businesses and government services will also be transformed.

Societies that deploy AI will need to learn how people react to the technology and what their expectations of it are. At the same time, individuals will only follow the directions given by an AI if the system works efficiently, is understandable – and can be trusted.

Matt Burgess is WIRED’s deputy digital editor. He tweets from @mattburgess1
