In most AI domains, such as image search, “off-the-shelf” supervised learning methods are successful at training machine learning models. However, these methods fall short when training robots such as self-driving cars. A self-driving car, the “learner,” relies on a continuous stream of time-series data to make decisions and take actions in the physical world. Those actions change the learner’s future inputs by changing not only the vehicle’s own location and velocity, but also how other vehicles in the scene respond.
Through this lens, off-the-shelf supervised learning methods aren’t enough for learning decision-making, because “mistakes” often have unintended consequences that feed back into further “mistaken” actions. For example, if a self-driving vehicle mistakenly decides to initiate a lane change, it will start to move laterally. The vehicle’s new position, partway into the other lane, can then reinforce the decision to continue the lane change, even though the maneuver was undesirable in the first place. This problem, sometimes called the “feedback effect,” has been well documented in the AI and robotics community.
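To make the lane-change example concrete, here is a minimal toy sketch (not Aurora’s code and not ALICE). It assumes a one-dimensional lateral `offset` (0.0 is the center of the current lane, 1.0 the center of the adjacent lane) and a hypothetical `learned_policy` fit to expert data in which the car was only ever far from lane center while deliberately changing lanes. The threshold of 0.3, the step sizes, and the function names are all illustrative assumptions. The point is that the policy’s own action changes its next input, so a single mistaken nudge can either die out or lock in.

```python
def learned_policy(offset: float) -> float:
    """Hypothetical imitation-learned rule (an assumption for illustration).

    In the assumed expert data, a large offset only occurred during intended
    lane changes, so the learned rule is: if the offset is large, keep moving
    toward the other lane; otherwise steer back toward the current lane center.
    """
    return 0.2 if offset > 0.3 else -0.5 * offset


def rollout(initial_mistake: float, steps: int = 20) -> float:
    """Run the policy closed-loop and return the final lateral offset.

    `initial_mistake` is one erroneous lateral nudge added at the first step.
    Every later input is produced by the policy's own previous actions, which
    is the feedback loop described above.
    """
    offset = 0.0
    for t in range(steps):
        action = learned_policy(offset)
        if t == 0:
            action += initial_mistake  # a single mistaken lateral move
        offset = min(offset + action, 1.0)  # 1.0 = centered in the adjacent lane
    return offset


if __name__ == "__main__":
    for mistake in (0.0, 0.2, 0.35):
        print(f"initial mistake {mistake:.2f} -> final offset {rollout(mistake):.3f}")
```

With no mistake the car stays centered, and a small nudge of 0.2 decays back toward the lane center. A nudge of 0.35 moves the car into a region the policy only ever saw during intentional lane changes, and from then on the policy keeps drifting until it completes the unwanted lane change. Note that the policy is “correct” on the expert’s own data; the failure comes from the shift in inputs that its own actions create.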
Recent literature in the field (de Haan et al., 2019) proposes that the feedback effect stems from “causal confusion” via a “causal confound” (Pearl et al., 2016). The implication is that the causal structure is obscured by correlated inputs to the learning algorithm, i.e., the learner is biased by over-indexing on correlated features. However, in this talk, lead engineers from Aurora’s Motion Planning team, Arun Venkatraman and Sanjiban Choudhury, observe that the cited examples do not exhibit causal confounding in the statistical sense. Instead, they posit that the observed issues are fundamentally the result of a distributional shift in the features caused by feedback.
Arun and Sanjiban introduce ALICE, an algorithmic framework that leverages a simulation engine to measure and counter this feedback effect. They go on to show preliminary results of ALICE on a prototypical controls problem, and discuss the spectrum of feedback problems and the difficulty of solving them across a variety of practical setups.