CS422: Interactive and Embodied Learning

Readings

Required readings are in bold; the others are supplementary. The supplementary lists can get quite long, but this is not meant to be imposing! We simply think that each list is a handy constellation of papers for those interested in getting better acquainted with its subarea. We will provide a roadmap as we go, with overviews in class. Feel free to ask for more context at any time!

Framing and basic computational considerations

Class 1: Conceptual framing, reading and project organization

Class 2: Early attempts: world models, intrinsic motivations, behavioral & performance metrics

Friston, Karl. "The free-energy principle: a unified brain theory?" Nature reviews neuroscience 11, no. 2 (2010): 127-138.
Gottlieb, Jacqueline, Pierre-Yves Oudeyer, Manuel Lopes, and Adrien Baranes. "Information-seeking, curiosity, and attention: computational and neural mechanisms." Trends in cognitive sciences 17, no. 11 (2013): 585-593.
Oudeyer, Pierre-Yves, Frédéric Kaplan, and Verena V. Hafner. "Intrinsic motivation systems for autonomous mental development." IEEE transactions on evolutionary computation 11, no. 2 (2007): 265-286.
Schmidhuber, Jürgen. "Formal theory of creativity, fun, and intrinsic motivation (1990–2010)." IEEE Transactions on Autonomous Mental Development 2, no. 3 (2010): 230-247.
Smith, Linda B., and Lauren K. Slone. "A developmental approach to machine learning?" Frontiers in psychology 8 (2017): 2124.
Dweck, Carol S. "From needs to goals and representations: Foundations for a unified theory of motivation, personality, and development." Psychological review 124, no. 6 (2017): 689.
Spelke, Elizabeth S., and Katherine D. Kinzler. "Core knowledge." Developmental science 10, no. 1 (2007): 89-96.

Learning about environment and embodiment through interaction (3 weeks)

Class 3: First deep RL self-supervised intrinsic motivation methods, failure modes of intrinsic motivation methods (e.g. “white noise problem”), evaluating benchmarks

Achiam, Joshua, and Shankar Sastry. "Surprise-based intrinsic motivation for deep reinforcement learning." arXiv preprint arXiv:1703.01732 (2017).
Burda, Yuri, Harri Edwards, Deepak Pathak, Amos Storkey, Trevor Darrell, and Alexei A. Efros. "Large-scale study of curiosity-driven learning." arXiv preprint arXiv:1808.04355 (2018).
Eysenbach, Benjamin, Abhishek Gupta, Julian Ibarz, and Sergey Levine. "Diversity is all you need: Learning skills without a reward function." arXiv preprint arXiv:1802.06070 (2018).
Houthooft, Rein, Xi Chen, Yan Duan, John Schulman, Filip De Turck, and Pieter Abbeel. "VIME: Variational information maximizing exploration." In Advances in Neural Information Processing Systems, pp. 1109-1117. 2016.
Pathak, Deepak, Pulkit Agrawal, Alexei A. Efros, and Trevor Darrell. "Curiosity-driven exploration by self-supervised prediction." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 16-17. 2017.

Class 4: Successive deep RL intrinsic motivation methods, MuJoCo & real robotics benchmarks, continued analysis of failure modes

Berseth, Glen, Daniel Geng, Coline Devin, Chelsea Finn, Dinesh Jayaraman, and Sergey Levine. "SMiRL: Surprise Minimizing RL in Dynamic Environments." arXiv preprint arXiv:1912.05510 (2019).
Burda, Yuri, Harrison Edwards, Amos Storkey, and Oleg Klimov. "Exploration by random network distillation." arXiv preprint arXiv:1810.12894 (2018).
Pathak, Deepak, Dhiraj Gandhi, and Abhinav Gupta. "Self-supervised exploration via disagreement." arXiv preprint arXiv:1906.04161 (2019).
Raileanu, Roberta, and Tim Rocktäschel. "RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments." arXiv preprint arXiv:2002.12292 (2020).

Class 5: Learning world models, planning with world models, and pitfalls of model-based approaches

Gelada, Carles, Saurabh Kumar, Jacob Buckman, Ofir Nachum, and Marc G. Bellemare. "DeepMDP: Learning continuous latent space models for representation learning." arXiv preprint arXiv:1906.02736 (2019).
Ha, David, and Jürgen Schmidhuber. "World models." arXiv preprint arXiv:1803.10122 (2018).
Hafner, Danijar, Timothy Lillicrap, Jimmy Ba, and Mohammad Norouzi. "Dream to control: Learning behaviors by latent imagination." arXiv preprint arXiv:1912.01603 (2019).
Hafner, Danijar, Timothy Lillicrap, Mohammad Norouzi, and Jimmy Ba. "Mastering Atari with discrete world models." arXiv preprint arXiv:2010.02193 (2020).
Kaiser, Lukasz, Mohammad Babaeizadeh, Piotr Milos, Blazej Osinski, Roy H. Campbell, Konrad Czechowski, Dumitru Erhan et al. "Model-based reinforcement learning for Atari." arXiv preprint arXiv:1903.00374 (2019).
Srinivas, Aravind, Michael Laskin, and Pieter Abbeel. "CURL: Contrastive unsupervised representations for reinforcement learning." arXiv preprint arXiv:2004.04136 (2020).

Class 6: Intrinsic motivation, exploration, and model-based RL, DeepMind Control Suite Benchmark

Sekar, Ramanan, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, and Deepak Pathak. "Planning to Explore via Self-Supervised World Models." arXiv preprint arXiv:2005.05960 (2020).
Shyam, Pranav, Wojciech Jaśkowski, and Faustino Gomez. "Model-based active exploration." arXiv preprint arXiv:1810.12162 (2018).

Class 7: Goal-based intrinsic motivation and hierarchical RL, procedurally-generated maze environments

Campero, Andres, Roberta Raileanu, Heinrich Küttler, Joshua B. Tenenbaum, Tim Rocktäschel, and Edward Grefenstette. "Learning with AMIGo: Adversarially motivated intrinsic goals." arXiv preprint arXiv:2006.12122 (2020).
Florensa, Carlos, David Held, Xinyang Geng, and Pieter Abbeel. "Automatic goal generation for reinforcement learning agents." In International conference on machine learning, pp. 1515-1528. PMLR, 2018.
Florensa, Carlos, David Held, Markus Wulfmeier, Michael Zhang, and Pieter Abbeel. "Reverse curriculum generation for reinforcement learning." arXiv preprint arXiv:1707.05300 (2017).
Nair, Ashvin V., Vitchyr Pong, Murtaza Dalal, Shikhar Bahl, Steven Lin, and Sergey Levine. "Visual reinforcement learning with imagined goals." Advances in Neural Information Processing Systems 31 (2018): 9191-9200.
Pong, Vitchyr H., Murtaza Dalal, Steven Lin, Ashvin Nair, Shikhar Bahl, and Sergey Levine. "Skew-fit: State-covering self-supervised reinforcement learning." arXiv preprint arXiv:1903.03698 (2019).

Class 8: (Change of Plans!) Goals and task curricula, part 2.

Fang, Kuan, Yuke Zhu, Silvio Savarese, and Li Fei-Fei. "Adaptive Procedural Task Generation for Hard-Exploration Problems." arXiv preprint arXiv:2007.00350 (2020).

Learning about other agents through interaction

Class 9: Self-supervised prediction of other agents, theory of mind, and multi-agent RL with cooperation and competition

Baker, Bowen, Ingmar Kanitscheider, Todor Markov, Yi Wu, Glenn Powell, Bob McGrew, and Igor Mordatch. "Emergent tool use from multi-agent autocurricula." arXiv preprint arXiv:1909.07528 (2019).
Hu, Hengyuan, Adam Lerer, Alex Peysakhovich, and Jakob Foerster. "'Other-Play' for Zero-Shot Coordination." arXiv preprint arXiv:2003.02979 (2020).
Foerster, Jakob N., Richard Y. Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, and Igor Mordatch. "Learning with opponent-learning awareness." arXiv preprint arXiv:1709.04326 (2017).
Rabinowitz, Neil C., Frank Perbet, H. Francis Song, Chiyuan Zhang, S. M. Eslami, and Matthew Botvinick. "Machine theory of mind." arXiv preprint arXiv:1802.07740 (2018).
Tacchetti, Andrea, H. Francis Song, Pedro AM Mediano, Vinicius Zambaldi, Neil C. Rabinowitz, Thore Graepel, Matthew Botvinick, and Peter W. Battaglia. "Relational forward models for multi-agent learning." arXiv preprint arXiv:1809.11044 (2018).
Xie, Annie, Dylan P. Losey, Ryan Tolsma, Chelsea Finn, and Dorsa Sadigh. "Learning Latent Representations to Influence Multi-Agent Interaction." arXiv preprint arXiv:2011.06619 (2020).

Class 10: Intrinsic motivation and multi-agent reinforcement learning

Jaques, Natasha, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro A. Ortega, D. J. Strouse, Joel Z. Leibo, and Nando de Freitas. "Intrinsic social motivation via causal influence in multi-agent RL." (2018).
Kim, Kun Ho. "Active world model learning in agent-rich environments with progress curiosity." Proceedings of ICML 2020 (2020).
Ndousse, Kamal, Douglas Eck, Sergey Levine, and Natasha Jaques. "Learning social learning." arXiv preprint arXiv:2010.00581 (2020).

Complementary methods

Class 11: Compositionality and planning, nonstationarity

Barreto, André, Shaobo Hou, Diana Borsa, David Silver, and Doina Precup. "Fast reinforcement learning with generalized policy updates." Proceedings of the National Academy of Sciences 117, no. 48 (2020): 30079-30087.
Bellemare, Marc G., Will Dabney, and Rémi Munos. "A distributional perspective on reinforcement learning." In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 449-458. JMLR.org, 2017.
Nair, Suraj, and Chelsea Finn. "Hierarchical foresight: Self-supervised learning of long-horizon tasks via visual subgoal generation." arXiv preprint arXiv:1909.05829 (2019).
Padakandla, Sindhu, and Shalabh Bhatnagar. "Reinforcement learning in non-stationary environments." arXiv preprint arXiv:1905.03970 (2019).
Xu, Danfei, Ajay Mandlekar, Roberto Martín-Martín, Yuke Zhu, Silvio Savarese, and Li Fei-Fei. "Deep Affordance Foresight: Planning Through What Can Be Done in the Future." arXiv preprint arXiv:2011.08424 (2020).

Class 12: Active inference

Okada, Masashi, and Tadahiro Taniguchi. "Variational inference MPC for Bayesian model-based reinforcement learning." In Conference on Robot Learning, pp. 258-272. PMLR, 2020.
Sajid, Noor, Philip J. Ball, and Karl J. Friston. "Active inference: demystified and compared." arXiv (2019): arXiv-1909.
Tschantz, Alexander, Beren Millidge, Anil K. Seth, and Christopher L. Buckley. "Reinforcement Learning through Active Inference." arXiv preprint arXiv:2002.12636 (2020).

Class 13: Agent57, catastrophic forgetting, hindsight experience replay, environment design

Andrychowicz, Marcin, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, and Wojciech Zaremba. "Hindsight experience replay." In Advances in neural information processing systems, pp. 5048-5058. 2017.
Badia, Adrià Puigdomènech, Bilal Piot, Steven Kapturowski, Pablo Sprechmann, Alex Vitvitskyi, Daniel Guo, and Charles Blundell. "Agent57: Outperforming the Atari human benchmark." arXiv preprint arXiv:2003.13350 (2020).
Dennis, Michael, Natasha Jaques, Eugene Vinitsky, Alexandre Bayen, Stuart Russell, Andrew Critch, and Sergey Levine. "Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design." Advances in Neural Information Processing Systems 33 (2020).
Kirkpatrick, James, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan et al. "Overcoming catastrophic forgetting in neural networks." Proceedings of the National Academy of Sciences 114, no. 13 (2017): 3521-3526.

Learning sensory representations through interaction

Class 14: A brief introduction to learning visual representations.

Bear, Daniel, Chaofei Fan, Damian Mrowca, Yunzhu Li, Seth Alter, Aran Nayebi, Jeremy Schwartz et al. "Learning physical graph representations from visual scenes." Advances in Neural Information Processing Systems 33 (2020).
Chen, Ting, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. "A simple framework for contrastive learning of visual representations." arXiv preprint arXiv:2002.05709 (2020).
Geirhos, Robert, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix A. Wichmann, and Wieland Brendel. "ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness." arXiv preprint arXiv:1811.12231 (2018).
Gidaris, Spyros, Praveer Singh, and Nikos Komodakis. "Unsupervised representation learning by predicting image rotations." arXiv preprint arXiv:1803.07728 (2018).
Haber, Nick, Damian Mrowca, Stephanie Wang, Li Fei-Fei, and Daniel L. Yamins. "Learning to play with intrinsically-motivated, self-aware agents." Advances in Neural Information Processing Systems 31 (2018): 8388-8399.
He, Kaiming, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. "Momentum contrast for unsupervised visual representation learning." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729-9738. 2020.
Noroozi, Mehdi, and Paolo Favaro. "Unsupervised learning of visual representations by solving jigsaw puzzles." In European Conference on Computer Vision, pp. 69-84. Springer, Cham, 2016.
Sitzmann, Vincent, Michael Zollhöfer, and Gordon Wetzstein. "Scene representation networks: Continuous 3D-structure-aware neural scene representations." In Advances in Neural Information Processing Systems, pp. 1121-1132. 2019.
Zamir, Amir R., Alexander Sax, William Shen, Leonidas J. Guibas, Jitendra Malik, and Silvio Savarese. "Taskonomy: Disentangling task transfer learning." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3712-3722. 2018.
Zhuang, Chengxu, Alex Lin Zhai, and Daniel Yamins. "Local aggregation for unsupervised learning of visual embeddings." In Proceedings of the IEEE International Conference on Computer Vision, pp. 6002-6012. 2019.
Zou, Yuliang, Zelun Luo, and Jia-Bin Huang. "DF-Net: Unsupervised joint learning of depth and flow using cross-task consistency." In Proceedings of the European conference on computer vision (ECCV), pp. 36-53. 2018.

Learning about objects and physics through interaction

Class 15, part 1: Self-supervised models of real-world objects and physics, challenges in pixel space, towards end-to-end training

Armeni, Iro, Zhi-Yang He, JunYoung Gwak, Amir R. Zamir, Martin Fischer, Jitendra Malik, and Silvio Savarese. "3D scene graph: A structure for unified semantics, 3D space, and camera." In Proceedings of the IEEE International Conference on Computer Vision, pp. 5664-5673. 2019.
Battaglia, Peter, Razvan Pascanu, Matthew Lai, and Danilo Jimenez Rezende. "Interaction networks for learning about objects, relations and physics." Advances in neural information processing systems 29 (2016): 4502-4510.
Janner, Michael, Jiajun Wu, Tejas D. Kulkarni, Ilker Yildirim, and Josh Tenenbaum. "Self-supervised intrinsic image decomposition." Advances in Neural Information Processing Systems 30 (2017): 5936-5946.
Mrowca, Damian, Chengxu Zhuang, Elias Wang, Nick Haber, Li Fei-Fei, Josh Tenenbaum, and Daniel L. Yamins. "Flexible neural representation for physics prediction." In Advances in neural information processing systems, pp. 8799-8810. 2018.
Sanchez-Gonzalez, Alvaro, Jonathan Godwin, Tobias Pfaff, Rex Ying, Jure Leskovec, and Peter W. Battaglia. "Learning to simulate complex physics with graph networks." arXiv preprint arXiv:2002.09405 (2020).
Villegas, Ruben, Arkanath Pathak, Harini Kannan, Dumitru Erhan, Quoc V. Le, and Honglak Lee. "High fidelity video prediction with large stochastic recurrent neural networks." arXiv preprint arXiv:1911.01655 (2019).
Wu, Jiajun, Chengkai Zhang, Tianfan Xue, Bill Freeman, and Josh Tenenbaum. "Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling." Advances in neural information processing systems 29 (2016): 82-90.
Yan, Mengyuan, Yilin Zhu, Ning Jin, and Jeannette Bohg. "Self-Supervised Learning of State Estimation for Manipulating Deformable Linear Objects." IEEE Robotics and Automation Letters 5, no. 2 (2020): 2372-2379.

Class 15, part 2: Learning self-supervised models of real-world objects and physics through interaction, benchmarking interactive learning

Allen, Kelsey R., Kevin A. Smith, and Joshua B. Tenenbaum. "The tools challenge: Rapid trial-and-error learning in physical problem solving." arXiv preprint arXiv:1907.09620 (2019).
Batra, Dhruv, Angel X. Chang, Sonia Chernova, Andrew J. Davison, Jia Deng, Vladlen Koltun, Sergey Levine et al. "Rearrangement: A Challenge for Embodied AI." arXiv preprint arXiv:2011.01975 (2020).
Shen, Bokui, Fei Xia, Chengshu Li, Roberto Martín-Martín, Linxi Fan, Guanzhi Wang, Shyamal Buch et al. "iGibson, a Simulation Environment for Interactive Tasks in Large Realistic Scenes." arXiv preprint arXiv:2012.02924 (2020).
Eppner, Clemens, Roberto Martín-Martín, and Oliver Brock. "Physics-based selection of actions that maximize motion for interactive perception." RSS WS: Revisiting Contact-Turning a problem into a solution (2017).
Mandikal, Priyanka, and Kristen Grauman. "Dexterous Robotic Grasping with Object-Centric Visual Affordances." arXiv preprint arXiv:2009.01439 (2020).
Novkovic, Tonci, Remi Pautrat, Fadri Furrer, Michel Breyer, Roland Siegwart, and Juan Nieto. "Object finding in cluttered scenes using interactive perception." In 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 8338-8344. IEEE, 2020.
Shao, Lin, Toki Migimatsu, and Jeannette Bohg. "Learning to Scaffold the Development of Robotic Manipulation Skills." arXiv preprint arXiv:1911.00969 (2019).
Ota, Kei, Devesh K. Jha, Diego Romeres, Jeroen van Baar, Kevin A. Smith, Takayuki Semitsu, Tomoaki Oiki, Alan Sullivan, Daniel Nikovski, and Joshua B. Tenenbaum. "Towards Human-Level Learning of Complex Physical Puzzles." arXiv preprint arXiv:2011.07193 (2020).
Patten, Timothy, Michael Zillich, and Markus Vincze. "Action selection for interactive object segmentation in clutter." In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 6297-6304. IEEE, 2018.
Xu, Danfei, Ajay Mandlekar, Roberto Martín-Martín, Yuke Zhu, Silvio Savarese, and Li Fei-Fei. "Deep Affordance Foresight: Planning Through What Can Be Done in the Future." arXiv preprint arXiv:2011.08424 (2020).

Learning language through interaction

Class 16: Grounded language learning

Bisk, Yonatan, Ari Holtzman, Jesse Thomason, Jacob Andreas, Yoshua Bengio, Joyce Chai, Mirella Lapata et al. "Experience grounds language." arXiv preprint arXiv:2004.10151 (2020).
Chen, Kevin, Christopher B. Choy, Manolis Savva, Angel X. Chang, Thomas Funkhouser, and Silvio Savarese. "Text2shape: Generating shapes from natural language by learning joint embeddings." In Asian Conference on Computer Vision, pp. 100-116. Springer, Cham, 2018.
Hill, Felix, Olivier Tieleman, Tamara von Glehn, Nathaniel Wong, Hamza Merzic, and Stephen Clark. "Grounded Language Learning Fast and Slow." arXiv preprint arXiv:2009.01719 (2020).
Lair, Nicolas, Cédric Colas, Rémy Portelas, Jean-Michel Dussoux, Peter Ford Dominey, and Pierre-Yves Oudeyer. "Language Grounding through Social Interactions and Curiosity-Driven Multi-Goal Learning." arXiv preprint arXiv:1911.03219 (2019).
Ross, Candace, Andrei Barbu, Yevgeni Berzak, Battushig Myanganbayar, and Boris Katz. "Grounding language acquisition by training semantic parsers using captioned videos." In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 2647-2656. 2018.
Suhr, Alane, Mike Lewis, James Yeh, and Yoav Artzi. "Evaluating Visual Reasoning Through Grounded Language Understanding." AI Magazine 39, no. 2 (2018): 45-52.
Williams, Edward C., Nakul Gopalan, Mine Rhee, and Stefanie Tellex. "Learning to parse natural language to grounded reward functions with weak supervision." In 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1-7. IEEE, 2018.

Class 17: Language as a cognitive tool, emergent communication, programming language to natural communication

Bullard, Kalesha, Franziska Meier, Douwe Kiela, Joelle Pineau, and Jakob Foerster. "Exploring Zero-Shot Emergent Communication in Embodied Multi-Agent Populations." arXiv preprint arXiv:2010.15896 (2020).
Colas, Cédric, Tristan Karch, Nicolas Lair, Jean-Michel Dussoux, Clément Moulin-Frier, Peter Ford Dominey, and Pierre-Yves Oudeyer. "Language as a Cognitive Tool to Imagine Goals in Curiosity-Driven Exploration." arXiv preprint arXiv:2002.09253 (2020).
Lowe, Ryan, Abhinav Gupta, Jakob Foerster, Douwe Kiela, and Joelle Pineau. "On the interaction between supervision and self-play in emergent communication." arXiv preprint arXiv:2002.01093 (2020).
Pu, Yewen, Kevin Ellis, Marta Kryven, Josh Tenenbaum, and Armando Solar-Lezama. "Program Synthesis with Pragmatic Communication." arXiv preprint arXiv:2007.05060 (2020).
Wang, Sida I., Samuel Ginn, Percy Liang, and Christopher D. Manning. "Naturalizing a programming language via interactive learning." arXiv preprint arXiv:1704.06956 (2017).

Higher-level inquiry through interaction

Class 18, part 1: Optimal experimental design, active learning, and their relation to intrinsically-motivated RL

Chen, Annie S., HyunJi Nam, Suraj Nair, and Chelsea Finn. "Batch Exploration with Examples for Scalable Robotic Reinforcement Learning." arXiv preprint arXiv:2010.11917 (2020).
Cox, David Roxbee, and Nancy Reid. The theory of the design of experiments. CRC Press, 2000.
Frazier, Peter I. "A tutorial on Bayesian optimization." arXiv preprint arXiv:1807.02811 (2018).
Settles, Burr. Active learning.

Class 18, part 2: Human curiosity and internet search

Coenen, Anna, Jonathan D. Nelson, and Todd M. Gureckis. "Asking the right questions about the psychology of human inquiry: Nine open challenges." Psychonomic Bulletin & Review 26, no. 5 (2019): 1548-1587.
Lydon-Staley, David M., Dale Zhou, Ann Sizemore Blevins, Perry Zurn, and Danielle S. Bassett. "Hunters, busybodies and the knowledge network building associated with deprivation curiosity." Nature human behaviour (2020): 1-10.
Zhou, Dale, David M. Lydon-Staley, Perry Zurn, and Danielle S. Bassett. "The growth and form of knowledge networks by kinesthetic curiosity." arXiv preprint arXiv:2006.02949 (2020).