About the Talk: Risk-averse MDPs have optimal policies that achieve high returns with low variability, but these MDPs are often difficult to solve. Only a few risk-averse objectives admit a dynamic programming (DP) formulation, which is the mainstay of most MDP and RL algorithms. We derive a new DP formulation for discounted risk-averse MDPs with Entropic Risk Measure (ERM) and Entropic Value at Risk (EVaR) objectives. Our DP formulation for ERM, made possible by a novel definition of the value function with time-dependent risk levels, can approximate optimal policies in time polynomial in the approximation error. We then use the ERM algorithm to optimize the EVaR objective in polynomial time using an optimized discretization scheme. Our numerical results show the viability of our formulations and algorithms in discounted MDPs. Finally, we use our results to propose a new framework that jointly models the risk arising from randomness in the dynamics (aleatory) and from model uncertainty (epistemic) in MDPs.
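For readers unfamiliar with the two objectives named in the abstract, the following is a minimal sketch of how ERM and EVaR can be estimated from sampled returns. It assumes the standard reward-maximizing definitions, ERM_β[X] = -(1/β) log E[exp(-βX)] and EVaR_α[X] = sup over β > 0 of (ERM_β[X] + log(α)/β). The function names `erm` and `evar` and the plain geometric grid over β are illustrative assumptions; this is not the speaker's algorithm or the optimized discretization scheme mentioned above.

```python
import numpy as np

def erm(x, beta):
    """Entropic risk measure of reward samples x at risk level beta > 0."""
    x = np.asarray(x, dtype=float)
    # ERM_beta[X] = -(1/beta) * log E[exp(-beta * X)];
    # shift by the maximum exponent before exponentiating for numerical stability.
    z = -beta * x
    m = z.max()
    return -(m + np.log(np.mean(np.exp(z - m)))) / beta

def evar(x, alpha, betas):
    """EVaR at level alpha in (0, 1], approximated on a finite grid of beta values."""
    # Assumption: a naive grid search stands in for the supremum over beta > 0.
    return max(erm(x, b) + np.log(alpha) / b for b in betas)

# Hypothetical sampled returns and a hypothetical beta grid.
returns = np.array([0.0, 2.0, 1.0, 3.0])
betas = np.geomspace(1e-3, 1e2, 200)
print(erm(returns, 1.0), evar(returns, 0.5, betas))
```

One property visible in this sketch is that ERM is not positively homogeneous: scaling the return by a discount factor γ gives ERM_β[γX] = γ · ERM_{βγ}[X], so the effective risk level changes with each discounting step. This is one way to see why a value function with time-dependent risk levels is natural in the discounted setting.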
About the Speaker: Mohammad Ghavamzadeh received his Ph.D. from UMass Amherst in 2005. He was a postdoctoral fellow at UAlberta from 2005 to 2008 and a permanent researcher at INRIA, France, from 2008 to 2013. He received the “INRIA award for scientific excellence” in 2011 and obtained his Habilitation in 2014. Since 2013, he has been a research scientist at Adobe, FAIR, Google, and now Amazon. He has published over 120 refereed papers in major machine learning, AI, and control journals and conferences, and has co-chaired more than 10 workshops and tutorials at NeurIPS, ICML, and AAAI. His research has focused mainly on reinforcement learning, bandit algorithms, and recommendation systems. Over the last two years, he has also been working on the problem of alignment in generative AI models.