About the Talk “Learning Rich and Generalizable Representation for Visual Motor Control”: Visual representation learning has achieved tremendous success in semantic and geometric understanding. But how can this knowledge help embodied agents and robots interact with the physical world and operate in the wild? In this talk, I will introduce our studies on learning rich and generalizable visual representations and applying them to visual motor control. On the vision side, I will discuss our work on learning open-vocabulary semantic representations with only text supervision, and 3D scene representations with self-supervision. On the robotics side, I will introduce a novel algorithm that combines model-based and model-free RL and uses these rich representations to improve sim2real generalization. Beyond RL, I will also talk about imitation learning from human demonstrations, enabled by 3D understanding of human-object interactions. We apply our methods to a range of real-world robotics tasks, including robot arm manipulation, dexterous hand manipulation, and visual locomotion control with legged robots.
About the Speaker: Xiaolong Wang is an Assistant Professor in the ECE department at the University of California, San Diego. He is affiliated with the CSE department, the Center for Visual Computing, the Contextual Robotics Institute, and the TILOS NSF AI Institute. He received his Ph.D. in Robotics from Carnegie Mellon University and completed his postdoctoral training at the University of California, Berkeley. His research focuses on the intersection of computer vision and robotics. He is particularly interested in learning visual representations from videos in a self-supervised manner and using these representations to guide robot learning. Xiaolong is an Area Chair of CVPR, ICCV, and ECCV, and he has co-organized multiple workshops and tutorials at CVPR, ICCV, ECCV, and ICLR. He is a recipient of the Facebook Fellowship, the Nvidia Fellowship, and the Sony Research Award.
