Robotaxi: guard against fire, theft, and "construction sites". Waymo has stumbled.
U.S. regulators have just opened an investigation into Waymo's driverless vehicles after receiving a series of related accident reports.
Some vehicles hit cars parked at the roadside, some hit stationary obstacles, some blocked traffic … and one setting came up unusually often: construction sites.
In about three months, Waymo accumulated and reported 22 accidents, which drew the attention of the National Highway Traffic Safety Administration (NHTSA).
According to NHTSA documents, these accidents include collisions between Waymo driverless vehicles and stationary or semi-stationary objects (such as gates), collisions with parked vehicles, and violations of traffic safety control devices.
Among these, "violations of traffic safety control devices", which the agency singled out for explanation, is a key line of investigation. A typical case is the ability of the automated driving system to detect and recognize traffic cones.
This is unusual, because many of the 22 reported accidents involve the same setting: construction sites.
For example, last month a convoy of six Waymo Robotaxis was heading back to the parking lot after finishing service. The vehicles ran into temporary traffic control at a construction site and got stuck in a temporary traffic area enclosed by traffic cones, causing about half an hour of congestion.
Netizens familiar with the area quickly recognized where the Waymo vehicles were stuck: the Highway 101 on-ramp at Potrero Avenue in San Francisco, which happens to be a highway entrance.
In the end, a driver on the road got out of his car and moved the traffic cones himself, and the vehicles queued behind him drove around the "paralyzed" driverless cars one by one.
Waymo issued a brief statement saying, in essence, that staff were sent to the scene and moved the cars within 30 minutes, that there were no injuries or property damage, and that it would cooperate with the investigation.
But at a construction site in Phoenix, things did not go so well.
There, another Waymo driverless vehicle ignored a construction area marked off by traffic cones and drove straight into the site.
Fortunately, the vehicle was not going fast and did not hit anyone, but both the vehicle and the construction site suffered some degree of damage.
There are many accidents like this, and every time a driverless vehicle "charges into a construction site", the footage goes viral online.
One netizen summed it up vividly: the traffic cone is Robotaxi's kryptonite. These days, the seemingly miraculous autonomous driving is done for as soon as it meets traffic cones on a closed road.
Huh? That does not seem to match the videos Waymo has officially shown.
Waymo has specifically presented the ability of its fifth-generation autonomous driving system to navigate around construction areas as a technical highlight.
In the official demo, the scene facing the driverless vehicle is even more complicated: besides traffic cones, there are irregularly shaped areas and workers walking back and forth.
Naturally, the Waymo driverless vehicle completed a series of avoidance and detour maneuvers with ease and passed through the construction area successfully.
What is striking is that the Waymo vehicle appears to understand the gestures of a person directing traffic, stopping when told to stop and going when told to go, rather than relying on road conditions alone.
How is this done? Maya Kabkab, the Waymo engineer in charge of prediction algorithms, briefly explained that in the fifth generation, Waymo strengthened the system's ability to understand different objects and targets and to identify drivable areas, which lets the system plan a drivable route better.
The core change is replacing CNNs with a new model, VectorNet, to extract information from sensors and high-definition (HD) maps.
Put simply, HD maps and sensor inputs can be represented as points, polygons, or curves, and VectorNet represents all road features and the trajectories of other objects as corresponding vectors. Working from this simplified view, VectorNet extracts information from each vector and learns the relationships among different vectors.
The advantage is that VectorNet uses fewer computing resources than a CNN, produces results faster, and in theory extracts key scene information more clearly.
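The vector representation described above can be illustrated with a minimal sketch. It only shows the data layout (start point, end point, a type attribute, and the ID of the polyline the vector belongs to); the function name, the type constants, and the coordinates are hypothetical, and this is not Waymo's implementation.

```python
from typing import List, Tuple

Point = Tuple[float, float]
# (start_x, start_y, end_x, end_y, type_id, polyline_id)
Vector = Tuple[float, float, float, float, int, int]

# Hypothetical type attributes, chosen only for this illustration.
LANE_BOUNDARY = 0
CONE_ROW = 1


def polyline_to_vectors(points: List[Point], type_id: int, polyline_id: int) -> List[Vector]:
    """Split an ordered polyline into consecutive start-to-end vectors."""
    vectors = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        vectors.append((x0, y0, x1, y1, type_id, polyline_id))
    return vectors


# A lane boundary from the map and a row of cones seen by the sensors
# end up in the same vector format, so one model can relate them.
lane = polyline_to_vectors([(0.0, 0.0), (10.0, 0.0), (20.0, 1.0)], LANE_BOUNDARY, 0)
cones = polyline_to_vectors([(12.0, -1.0), (14.0, 0.5)], CONE_ROW, 1)
scene = lane + cones
```

Each map feature or trajectory becomes a short list of such vectors, and a network of the VectorNet kind then learns relationships among the vectors rather than processing a rendered image.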
But VectorNet still does not solve the core of the "construction site" problem:
A "construction site" is by nature an exception to the HD map. The map cannot be updated in sync with it, so the site can only be perceived by sensors in real time.
However, sensor data is passed through different sub-models in sequence, so it is hard to avoid information loss entirely.
The direct reason Robotaxis get stuck at construction sites so often is the misdetection of traffic cones and irregularly shaped objects.
The underlying reason is that the traditional autonomous driving technology paradigm has a capability ceiling, which makes it hard to cover every corner case on the road.
As a result, whether a construction site is handled smoothly becomes a matter of probability: the official demo has been carefully and repeatedly tested, so it works fine; on the open road, it comes down to luck.
"When in doubt, quantum mechanics" is a popular joke.
In autonomous driving, however, "when in doubt, go end-to-end" may well hold true.
"End-to-end" is defined in contrast to the traditional technology paradigm, in which perception, decision-making, and planning and control are independent modules. Data collected by the sensors must pass through this chain of different algorithm modules before it finally becomes a driving command.
Information is handed from one independent module to the next. Some loss and error is inevitable along the way: errors in an upstream module affect the downstream one, and errors across multiple modules keep accumulating, which degrades the overall performance of the autonomous driving system.
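The accumulation of errors across modules can be made concrete with a small back-of-the-envelope sketch. It assumes, purely for illustration, that each module is correct independently with some probability; the three stage names and the 0.95 figures are hypothetical and do not describe any real system.

```python
from math import prod


def pipeline_reliability(stage_reliabilities):
    """Probability that every stage is correct, assuming independent stages."""
    return prod(stage_reliabilities)


# Hypothetical per-module reliabilities: perception, prediction, planning.
stages = [0.95, 0.95, 0.95]
overall = pipeline_reliability(stages)
print(f"{overall:.3f}")  # prints 0.857
```

Under this simplified assumption the chain as a whole is less reliable than any single module in it, which is the effect the paragraph above describes.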
Whether the system uses vision-only perception or sensor-fusion perception, this is the root of false detections and missed detections.
There is, of course, a corresponding remedy: patching with hand-written rules to make perception and recognition more reliable. For example, suppose the system can recognize cars and people but not "a person standing in front of a car". That is easy enough: build a separate dataset for such targets and use it to train the model.
This is the so-called perception "whitelist" mechanism.
The problem is that traffic targets and scenes are hard to enumerate. The "person in front of a car" case is solved this time, but what if the passenger car becomes a large truck? Or the person becomes an adult with a child in tow?
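The limitation of a whitelist can be sketched in a few lines. The class names below are hypothetical labels invented for this illustration, and the lookup stands in for a trained recognizer; it is not how any particular perception stack is implemented.

```python
# Hypothetical whitelist of target classes the system has been trained on.
WHITELIST = {"car", "pedestrian", "pedestrian_in_front_of_car"}


def recognize(observed_class: str) -> str:
    """Return the class if it is whitelisted, otherwise fall back to 'unknown'."""
    return observed_class if observed_class in WHITELIST else "unknown"


observations = [
    "pedestrian_in_front_of_car",    # the case that was patched
    "pedestrian_in_front_of_truck",  # a close variant that was not
    "adult_with_child",
]
results = {name: recognize(name) for name in observations}
```

Here only the patched case is recognized; each new variant needs its own entry and its own training data, so the list never catches up with the road.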
The same goes for Robotaxi's construction site problem. A construction site can appear temporarily at any place and any time, and the work and the barriers differ from one site to the next …
What is needed, therefore, is a new algorithmic paradigm, the end-to-end model, which aims to pass information on without loss from perception onward so that the system truly understands its environment.
The two "ends" are the data input end and the command output end; what lies between is no longer split into several independent modules.
Through a fully data-driven approach, an end-to-end model can transfer and generalize what it has learned to other scenarios, solve new long-tail problems in driving and parking scenarios independently and efficiently, iterate faster, and effectively reduce the cost of launching in a new city.
In layman's terms, it lets an AI driver learn mature human driving behavior: see a scene and respond accordingly. In a sense, "end-to-end" is already touching the threshold of AGI.
NVIDIA first proposed an end-to-end driving model in 2016, but real mass-production practice only began in the last two years. So far the notable examples are limited to Tesla's FSD and UniAD, the CVPR 2023 best paper from Chinese AI players.
Smart Car Reference also asked two leading Chinese autonomous driving players for their views on the construction site problems Waymo has run into.
Horizon, speaking from the perspective of engineering practice, says:
The construction site problem in autonomous driving is not tied to the end-to-end paradigm. In theory, it can also be solved if perception is strong enough and the perception whitelist is rich enough.
But clearly, the autonomous learning ability and human-like reasoning of end-to-end will solve this problem at larger scale and more efficiently.
Shang Tang (SenseTime) argues more from "first principles". Technical experts at Jueying, its intelligent driving business, say:
We do not comment on specific cases. However, in traditional rule-based driving solutions, perception still relies on manually defined elements and abstracts the perceived information, which causes information to be lost or omitted in transmission and makes it hard for the decision-making module to decide correctly. End-to-end is a single neural network that can take in and pass on information about the external environment without loss, understand the traffic environment more accurately and completely, and then plan and decide.
A rule-based solution can fix a scenario by adding rules and patches. But there is not just one such scenario; there are infinitely many. With enough data to learn from, an end-to-end solution can think and drive like a human and solve more corner cases of the same kind on its own.
In short, Horizon and Shang Tang phrase it differently but agree on the core point: end-to-end is the most effective way to solve Robotaxi's construction site problem, and also the most efficient way to solve the many long-tail problems of autonomous driving.
By the way, UniAD, the CVPR 2023 best paper mentioned above, was written by scholars from Horizon and Shang Tang.
The end-to-end overhaul of the traditional technology paradigm gives every player new opportunities: a better intelligent driving experience, lower maintenance and generalization costs, and a more cost-competitive autonomous driving solution.
The price, however, is that the modular, rule-driven technology stack of the past must be torn down and rebuilt.
Waymo, once the undisputed leader in autonomous driving, is now mired in the "construction site" predicament, which further shows that in the autonomous driving race, "water has no constant shape and war has no constant form":
Veteran stars may see their advantages reset to zero, and "latecomers" may take the lead.