Lots of companies are working to develop self-driving cars. And almost all of them use lidar, a type of sensor that uses lasers to build a three-dimensional map of the world around the car.
But Tesla CEO Elon Musk argues that these companies are making a big mistake.
“They’re all going to dump lidar,” Elon Musk said at an April event showcasing Tesla’s self-driving technology. “Anyone relying on lidar is doomed.”
“Lidar is really a shortcut,” added Tesla AI guru Andrej Karpathy. “It sidesteps the fundamental problems of visual recognition that is necessary for autonomy. It gives a false sense of progress, and is ultimately a crutch.”
In recent weeks I asked a number of experts about these claims. And I encountered a lot of skepticism.
“In a sense all of these sensors are crutches,” argued Greg McGuire, a researcher at MCity, the University of Michigan’s testing ground for autonomous vehicles. “That’s what we build, as engineers, as a society—we build crutches.”
Self-driving cars are going to need to be extremely safe and reliable to be accepted by society, McGuire said. And a key principle for high reliability is redundancy. Any single sensor will fail eventually. Using several different types of sensors makes it less likely that a single sensor’s failure will lead to disaster.
“Once you get out into the real world, and get beyond ideal conditions, there’s so much variability,” argues industry analyst (and former automotive engineer) Sam Abuelsamid. “It’s theoretically possible that you can do it with cameras alone, but to really have the confidence that the system is seeing what it thinks it’s seeing, it’s better to have other orthogonal sensing modes,” such as lidar.

Camera-only algorithms can work surprisingly well
On April 22, the same day Tesla held its autonomy event, a trio of Cornell researchers published a research paper that offered some support for Musk’s claims about lidar. Using nothing but stereo cameras, the computer scientists achieved breakthrough results on KITTI, a popular image recognition benchmark for self-driving systems. Their new technique produced results far superior to previously published camera-only results—and not far behind results that combined camera and lidar data.
Unfortunately, media coverage of the Cornell paper created confusion about what the researchers had actually found. Gizmodo’s writeup, for example, suggested the paper was about where cameras are mounted on a vehicle—a topic that wasn’t even mentioned in the paper. (Gizmodo re-wrote the article after researchers contacted them.)
To understand what the paper actually showed, we need a bit of background about how software converts raw camera images into a labeled three-dimensional model of a car’s surroundings. In the KITTI benchmark, an algorithm is considered a success if it can accurately place a three-dimensional bounding box around each object in a scene.
Software typically tackles this problem in two steps. First, the images are run through an algorithm that assigns a distance estimate to each pixel. This can be done using a pair of cameras and the parallax effect. Researchers have also developed techniques to estimate pixel distances using a single camera. In either case, a second algorithm uses depth estimates to group pixels together into discrete objects, like cars, pedestrians, or cyclists.
The Cornell computer scientists focused on this second step. Most other researchers working on camera-only approaches have represented the pixel data as a two-dimensional image, with distance as an additional value for each pixel alongside red, green, and blue. Researchers would then typically run these two-dimensional images through a convolutional neural network (covered in our in-depth explainer) that has been trained for the task.
But the Cornell team realized that using a two-dimensional representation was counterproductive because pixels that are close together in a two-dimensional image might be far apart in three-dimensional space. A vehicle in the foreground, for example, might appear directly in front of a tree that’s dozens of meters away.
So the Cornell researchers converted the pixels from each stereo image pair into the type of three-dimensional point cloud that is generated natively by lidar sensors. The researchers then fed this “pseudo-lidar” data into existing object recognition algorithms that are designed to take a lidar point cloud as an input.

“You could close the gap significantly”
“Our approach achieves impressive improvements over the existing state-of-the-art in image-based performance,” they wrote. In one version of the KITTI benchmark (“hard” 3-D detection with an IoU of 0.5), for example, the previous best result for camera-only data was an accuracy of 30%. The Cornell team managed to boost this to 66%.
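For readers unfamiliar with the metric: IoU stands for intersection over union, the volume two boxes share divided by the volume they cover together. The sketch below is a simplified illustration, not KITTI's evaluation code. It assumes axis-aligned boxes, whereas the benchmark's boxes can be rotated and its scores are aggregated across many objects, and the names `Box3D`, `iou_3d`, and `is_detection` are placeholders chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box3D:
    """Axis-aligned box described by its min and max corners, in meters.

    Simplifying assumption: real benchmark boxes can also be rotated.
    """
    x_min: float
    y_min: float
    z_min: float
    x_max: float
    y_max: float
    z_max: float

    def volume(self) -> float:
        return (max(0.0, self.x_max - self.x_min)
                * max(0.0, self.y_max - self.y_min)
                * max(0.0, self.z_max - self.z_min))


def overlap_1d(a_min: float, a_max: float, b_min: float, b_max: float) -> float:
    """Length of the overlap between two intervals (0 if they are disjoint)."""
    return max(0.0, min(a_max, b_max) - max(a_min, b_min))


def iou_3d(a: Box3D, b: Box3D) -> float:
    """Shared volume divided by combined volume, between 0 and 1."""
    intersection = (overlap_1d(a.x_min, a.x_max, b.x_min, b.x_max)
                    * overlap_1d(a.y_min, a.y_max, b.y_min, b.y_max)
                    * overlap_1d(a.z_min, a.z_max, b.z_min, b.z_max))
    union = a.volume() + b.volume() - intersection
    return intersection / union if union > 0 else 0.0


def is_detection(predicted: Box3D, truth: Box3D, threshold: float = 0.5) -> bool:
    """Count a predicted box as correct if it overlaps the true box enough."""
    return iou_3d(predicted, truth) >= threshold


# Hypothetical boxes: the prediction is shifted one meter along x.
truth = Box3D(0.0, 0.0, 0.0, 2.0, 2.0, 2.0)
predicted = Box3D(1.0, 0.0, 0.0, 3.0, 2.0, 2.0)
print(iou_3d(predicted, truth), is_detection(predicted, truth))
```

In this made-up case the two boxes share a third of their combined volume, so the prediction would not count as a hit at the 0.5 threshold.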
In other words, one reason that cameras plus lidar performed better than cameras alone had nothing to do with the superior accuracy of lidar’s distance measurements. Rather, it was because the “native” data format produced by lidar happened to be easier for machine-learning algorithms to work with.
“What we showed in our paper is you could close the gap significantly” by converting camera-based data into a lidar-style point cloud, said Kilian Weinberger, a co-author of the Cornell paper, in a phone interview.
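The conversion Weinberger describes can be sketched in a few lines. This is an illustration under a simple pinhole-camera assumption, not the Cornell team's code: the function name and the calibration parameters `fx`, `fy`, `cx`, and `cy` (focal lengths and image center, in pixels) are placeholders, and the depth values are made up.

```python
import math


def depth_map_to_point_cloud(depth, fx, fy, cx, cy):
    """Turn a per-pixel depth map into a list of (x, y, z) points in meters.

    depth is a list of rows; depth[v][u] is the estimated distance along the
    camera's forward axis for the pixel in column u, row v. Pixels with no
    valid estimate (None or non-positive) are skipped.
    Assumes an ideal pinhole camera with no lens distortion.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z is None or z <= 0:
                continue
            x = (u - cx) * z / fx  # sideways offset from the optical axis
            y = (v - cy) * z / fy  # vertical offset from the optical axis
            points.append((x, y, z))
    return points


# Hypothetical example: two neighboring pixels, one on a car 5 m away and
# one on a tree 40 m away.
cloud = depth_map_to_point_cloud([[5.0, 40.0]], fx=700.0, fy=700.0, cx=0.0, cy=0.0)
print(cloud)
print(math.dist(cloud[0], cloud[1]))  # separation in 3-D space, in meters
```

Two pixels that sit side by side in the image end up roughly 35 meters apart once they are placed in three-dimensional space. That is the foreground-car-and-distant-tree problem that a flat image representation hides.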
Still, Weinberger acknowledged, “there’s still a fair margin between lidar and non-lidar.” We mentioned before that the Cornell team achieved 66% accuracy on one version of the KITTI benchmark. Using the same algorithm on actual lidar point cloud data produced an accuracy of 86%.

“The depth might be completely off”
Comparing average accuracy rates between camera-only and lidar-based systems may understate the advantages of having actual lidar sensors on a self-driving vehicle.
Lidar measures distances by sending out a laser beam and measuring how long it takes to bounce back. The straightforwardness of this approach means it’s likely to work in a wide range of situations. That’s not necessarily true of camera-based techniques.
For example, one of the distance estimation algorithms used in the Cornell paper, developed by two researchers at Taiwan’s National Chiao Tung University, relied on a pair of cameras and the parallax effect. It compared two images taken from different angles and observed how objects’ positions differ between the images: the larger the shift, the closer an object is.
This technique only works if the software correctly matches a pixel in one image with the corresponding pixel in the other image. If the software gets this wrong, then distance estimates can be wildly off.
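The textbook relationship behind stereo depth makes the risk concrete: distance equals focal length times the spacing between the cameras, divided by the pixel shift (the "disparity"). The sketch below uses that standard formula with made-up camera numbers. It is not the National Chiao Tung University algorithm, which uses a neural network to find the matches, and the function and constant names are placeholders.

```python
def depth_from_disparity(focal_length_px: float, baseline_m: float,
                         disparity_px: float) -> float:
    """Distance in meters to a point seen by two side-by-side cameras.

    focal_length_px: focal length expressed in pixels
    baseline_m:      distance between the two cameras, in meters
    disparity_px:    how far the point shifts between the two images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical rig: round numbers chosen for illustration only.
FOCAL_LENGTH_PX = 700.0
BASELINE_M = 0.5

# A nearby object shifts a lot; a two-pixel matching error barely matters.
near_correct = depth_from_disparity(FOCAL_LENGTH_PX, BASELINE_M, 70.0)
near_mismatched = depth_from_disparity(FOCAL_LENGTH_PX, BASELINE_M, 68.0)

# A distant object shifts very little; the same error is a large mistake.
far_correct = depth_from_disparity(FOCAL_LENGTH_PX, BASELINE_M, 7.0)
far_mismatched = depth_from_disparity(FOCAL_LENGTH_PX, BASELINE_M, 5.0)

print(near_correct, near_mismatched)
print(far_correct, far_mismatched)
```

With these made-up numbers, a two-pixel matching error moves a nearby object from 5 meters to about 5.1 meters, but it moves a distant one from 50 meters to 70.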
It’s not a trivial problem. In their 2018 paper, the National Chiao Tung University researchers explained that it is “difficult to find accurate corresponding points in inherently ill-posed regions such as occlusion areas, repeated patterns, textureless regions, and reflective surfaces.”
Their solution uses a convolutional network trained on example images where actual distances are known. This approach works well much of the time. But Bharath Hariharan, another co-author of the Cornell paper, acknowledged to Ars that it can be brittle.
“If the system is trained in one kind of environment, if it goes into another kind of environment, the matching might be wrong, and the depth might be completely off,” he said. If the software fails to match up pixels correctly, it could dramatically mis-estimate the distance to certain objects, or even fail to recognize an object at all. In other words, a network that works flawlessly in a familiar setting might fail catastrophically in an unfamiliar one.
This isn’t such a concern for lidar. If there’s a large object 20 meters in front of a car, lidar will pick it up. Downstream software may or may not be able to determine what the object is. But it will know that something is in the car’s path.
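The arithmetic behind a lidar range reading is short enough to show in full. This is a sketch of the basic time-of-flight relationship only, with function names chosen for illustration; a real sensor also has to deal with timing precision, noise, and weak returns.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def distance_from_round_trip(round_trip_seconds: float) -> float:
    """Distance to the reflecting object, in meters.

    The pulse travels out and back, so the one-way distance is half of
    what light covers in the measured time.
    """
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0


def round_trip_for_distance(distance_m: float) -> float:
    """How long a pulse takes to reach an object and return, in seconds."""
    return 2.0 * distance_m / SPEED_OF_LIGHT_M_PER_S


# The 20-meter object from the example above.
echo_time = round_trip_for_distance(20.0)
print(echo_time)
print(distance_from_round_trip(echo_time))
```

For the object 20 meters ahead, the pulse returns after roughly 133 nanoseconds. No pixel matching or trained network is involved in turning that delay into a distance.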
“Lidar is a great sensor because it gives you a range of things directly,” McGuire said. “You don’t have to do any processing to get that data.”

Cameras and lidar have complementary abilities
At the same time, Hariharan emphasized that lidar isn’t perfect, either. Lidar point clouds can have phantom points due to reflections, for example. Both camera-only systems and systems with lidar rely on machine-learning techniques to help recognize and filter out one-off mistakes in the underlying systems.
“If you train the system in the streets of Arizona and then you come to Ithaca and see all the snow, then errors might happen that are hard to predict,” Hariharan said. He argues that both systems with lidar and those without it are prone to this kind of mistake, though “the error modes might be different.”
But a key thing to note is that the failure modes of lidar- and camera-based techniques are different. Many situations that trip up camera-based techniques probably won’t fool lidar. So even if camera-based techniques continue to improve, lidar will continue to add value for quite a while.
“In general, combining multiple sensors is probably beneficial,” Hariharan told Ars. “The more information you have and the more different ways you have of estimating depth, the better.”

Data from the fleet is a big advantage for Tesla
It seems clear that if you have an unlimited budget, it’s better to have a self-driving car with lidar than without it. In the real world, of course, companies don’t have unlimited budgets. And top-of-the-line lidar sensors currently cost tens of thousands of dollars. At those prices, Tesla can’t afford to put lidar on its cars, so Tesla has a powerful incentive to try to make full self-driving work without lidar.
At the same time, Tesla enjoys an important advantage over other car companies: access to massive amounts of data. Tesla has convinced customers to allow the company to run experimental software and collect data from their cars as they drive around. There are now hundreds of thousands of Tesla vehicles on the road, which gives Tesla access to vast amounts of real-world driving data—likely more than all of its competitors put together.
And that’s significant because large amounts of data are valuable for training neural networks. Over the last decade, researchers have demonstrated better and better performance using deeper neural networks. But these deeper neural networks depend on ever-larger amounts of training data.
And this is particularly significant for self-driving cars because a major concern in the industry is about “edge cases”: unusual situations that haven’t been encountered before and that cause the car to malfunction. Tesla doesn’t just have access to vast numbers of raw miles, it also has the ability to query its vehicles for unusual or interesting situations. Data from those situations can be sent back to Tesla headquarters and incorporated into the company’s training sets.

Tesla backed itself into a corner by over-promising
This points to one possible defense of Musk’s claims about lidar. Skipping lidar allowed Tesla to get Autopilot technology in the hands of customers much more quickly than would have been possible if they’d waited until they could afford to include a lidar sensor in every car. Shipping a lidar-less version of Autopilot has given Tesla access to a massive amount of data, which could prove to be a big advantage in the race to full self-driving.
This is true as far as it goes, but it doesn’t follow that (as Musk put it) “anyone relying on lidar is doomed.”
Nobody forced Tesla to promise that the cars it released starting in October 2016 would eventually be capable of full autonomy. The company could have simply told customers that the new cameras were there to enable the current version of Autopilot, Tesla’s driver-assistance technology.
In that scenario, Tesla would have still been able to collect data to improve its self-driving algorithms. It still would have had the option to offer a full self-driving package if its software improved enough to enable it. But it would have also left itself the option to decide that those vehicles’ hardware wasn’t sufficient for full autonomy without angering customers.
Instead, Tesla didn’t just promise customers that its cars were capable of full autonomy, it started letting customers pay for a “full self-driving” package. And having made that promise, Tesla is under a lot of pressure to deliver on it.
But if Tesla ultimately succeeds, it won’t be because it’s easier to achieve full autonomy without lidar than with it. It will simply be because Tesla began large-scale data collection from cameras long before other carmakers.
In short, the fact that Tesla backed itself into a corner by promising customers full autonomy without lidar doesn’t prove that other companies won’t find lidar helpful to their own self-driving efforts.

Lidar won’t always be expensive
It’s also important to remember that the high cost of lidar is likely to be temporary. New technologies almost always start out expensive, but as manufacturing scales up, prices tend to drop. The current high cost of lidar largely reflects the fact that the sensors are being sold one unit at a time. As a result, high research and development costs need to be spread across a small number of units.
But in the next few years, we can expect to see carmakers placing orders in the thousands and eventually millions of units. And we can expect that to lead to a big decline in the cost of lidar. Most experts I’ve talked to expect that in the long run, lidar sensors won’t cost much more than $100—and possibly even less than that.
“I can’t see how it won’t be a commodity sensor like DVD players,” Michigan’s McGuire told me. DVD players, he noted, “used to cost $1,000.” Now you can get one for less than $50. You can make similar points about a number of formerly high-tech automotive components, from anti-lock brakes to the radar used in adaptive cruise control systems today. The electronics industry “has a really strong track record of taking a complex mechanical system like that and turning it into solid-state electronics,” McGuire said.
Of course, in the long run we might also see machine-learning techniques improve to the point where camera-only self-driving systems are viable. After all, human beings drive pretty effectively with the pair of cameras we have in our heads. But Angus Pacala, the CEO of lidar startup Ouster, draws a parallel here to the airline industry, which has steadily improved its safety record over the decades.
Pacala expects the same thing to happen for self-driving cars. Even after autonomous vehicles demonstrate a better safety record than the average human driver, people will still want to push the crash rate down.
“Safety will never stop being pushed,” Pacala said. So “even if lidar isn’t necessary to get to human levels of driving safety,” companies will still want to include it to provide an extra margin of safety. “There’s a wealth of evidence that lidar will be a way of making a significant improvement above a camera.” So as lidar gets cheaper over time, why wouldn’t carmakers use it?