Thomas Degris' Publications

Selected Publications:

Off-Policy Actor-Critic. T. Degris, M. White, R.S. Sutton (2012). RLPark demo.
In Proceedings of the 29th International Conference on Machine Learning.
Abstract:
This paper presents the first actor-critic algorithm for off-policy reinforcement learning. Our algorithm is online and incremental, and its per-time-step complexity scales linearly with the number of learned weights. Previous work on actor-critic algorithms is limited to the on-policy setting and does not take advantage of the recent advances in off-policy gradient temporal-difference learning. Off-policy techniques, such as Greedy-GQ, enable a target policy to be learned while following and obtaining data from another (behavior) policy. For many problems, however, actor-critic methods are more practical than action value methods (like Greedy-GQ) because they explicitly represent the policy; consequently, the policy can be stochastic and utilize a large action space. In this paper, we illustrate how to practically combine the generality and learning potential of off-policy learning with the flexibility in action selection given by actor-critic methods. We derive an incremental, linear time and space complexity algorithm that includes eligibility traces, prove convergence under assumptions similar to previous off-policy algorithms, and empirically show better or comparable performance to existing algorithms on standard reinforcement-learning benchmark problems.
Scaling-up Knowledge for a Cognizant Robot. T. Degris, J. Modayil (2012).
In notes of the AAAI Spring Symposium on Designing Intelligent Robots: Reintegrating AI.
Abstract:
This paper takes a new approach to the old adage that knowledge is the key for artificial intelligence. A cognizant robot is a robot with a deep and immediately accessible understanding of its interaction with the environment. An understanding the robot can use to flexibly adapt to novel situations. Such a robot will need a vast amount of situated, revisable, and expressive knowledge to display flexible intelligent behaviors. Instead of relying on human-provided knowledge, we propose that an arbitrary robot can autonomously acquire pertinent knowledge directly from everyday interaction with the environment. We show how existing ideas in reinforcement learning can enable a robot to maintain and improve its knowledge. The robot performs a continual learning process that scales-up knowledge acquisition to cover a large number of facts, skills and predictions. This knowledge has semantics that are grounded in sensorimotor experience. We see the approach of developing more cognizant robots as a necessary key step towards broadly competent robots.
Learning the Structure of Factored Markov Decision Process in Reinforcement Learning Problems T. Degris, O. Sigaud, P.-H. Wuillemin (2006). Demo with the Counter Strike video game.
In Proceedings of the 23rd International Conference on Machine Learning.
Abstract:
Recent decision-theoric planning algorithms are able to find optimal solutions in large problems, using Factored Markov Decision Processes (fmdps). However, these algorithms need a perfect knowledge of the structure of the problem. In this paper, we propose SDyna, a general framework for addressing large reinforcement learning problems by trial-and-error and with no initial knowledge of their structure. SDyna integrates incremental planning algorithms based on fmdps with supervised learning techniques building structured representations of the problem. We describe spiti, an instantiation of sdyna, that uses incremental decision tree induction to learn the structure of a problem combined with an incremental version of the Structured Value Iteration algorithm. We show that spiti can build a factored representation of a reinforcement learning problem and may improve the policy faster than tabular reinforcement learning algorithms by exploiting the generalization property of decision tree induction algorithms.

Publications and Communications:

Prediction and Anticipation for Adaptive Artificial Limbs P.M. Pilarski, M.R. Dawson, T. Degris, J.P. Carey, K.M. Chan, J.S. Hebert, and R.S. Sutton (2013). In IEEE Robotics and Automation Magazine, Special Issue on Assistive Robotics, March 2013, in press.
Towards Prediction-Based Prosthetic Control P.M. Pilarski, T. Degris, M.R. Dawson, J.P. Carey, K.M. Chan, J.S. Hebert, and R.S. Sutton (2012). In Proceedings of the 17th International Functional Electrical Stimulation Society Conference (IFESS), Banff, Canada, pp. 26–29, 2012.
Off-Policy Actor-Critic T. Degris, M. White, R.S. Sutton (2012). In Proceedings of the 29th International Conference on Machine Learning. RLPark demo.
Dynamic Switching and Real-time Machine Learning for Improved Human Control of Assistive Biomedical Robots. P.M. Pilarski, M.R. Dawson, T. Degris, J.P. Carey, and R.S. Sutton (2012). In Proceedings of the 4th IEEE International Conference on Biomedical Robotics and Biomechatronics (BioRob)
Scaling-up Knowledge for a Cognizant Robot T. Degris, J. Modayil (2012). In notes of the AAAI Spring Symposium on Designing Intelligent Robots: Reintegrating AI.
Model-Free Reinforcement Learning with Continuous Action in Practice. T. Degris, P. M. Pilarski, R. S. Sutton (2012). In Proceedings of the American Control Conference. RLPark demo.
Tuning-free step-size adaptation. A. R. Mahmood, R. S. Sutton, T. Degris, P. M. Pilarski (2012). In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Kyoto, Japan, in press.
Online Human Training of a Myoelectric Prosthesis Controller via Actor-Critic Reinforcement Learning. P. M. Pilarski, M. R. Dawson, T. Degris, F. Fahimi, J. P. Carey, R. S. Sutton (2011). Proceedings of the 2011 IEEE International Conference on Rehabilitation Robotics, Zurich, Switzerland.
Horde: A scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction.R. S. Sutton, J. Modayil, M. Delp, T. Degris, P. M. Pilarski, A. White, D. Precup (2011). In Proceedings of the Tenth International Conference on Autonomous Agents and Multiagent Systems, Taipei, Taiwan.
Real-time Machine Learning in Rehabilitation Robotics for Adaptable Artificial Limbs. P.M. Pilarski, M.R. Dawson, T. Degris, F. Fahimi, J. Carey, R.S. Sutton (2011). Glenrose Rehabilitation Hospital Research Symposium (Edmonton, Alberta, Canada, abstract and poster presentation).
Real-time Machine Learning in Rehabilitation Robotics for Adaptable Artificial Limbs. P.M. Pilarski, M.R. Dawson, T. Degris, F. Fahimi, J. Carey, R.S. Sutton (2011). Tech Futures Summit 2011: Deploying Innovation (Banff, Canada, August 28-30, 2011, abstract and poster presentation).
An Encouraging Mobile Robot in the Glenrose Rehabilitation Hospital. T. Degris, P.M. Pilarski, J. Modayil, R.S. Sutton, M.J. Cimolini, and J. Raso. (2011). 7th International Congress on Industrial and Applied Mathematics, ICIAM 2011 (Vancouver, Canada, July 18?22, 2011, poster presentation).
Off-policy knowledge maintenance for robots. J. Modayil, P. M. Pilarski, A. White, T. Degris, R. S. Sutton (2010). Robotics Science and Systems Workshop (Towards Closing the Loop: Active Learning for Robotics). Extended abstract and poster.
Small-Timescale Reinforcement Learning for Power Management on a Mobile Robot. P.M. Pilarski, T. Degris, and R.S. Sutton (2010). 2010 Alberta Power Industry Consortium Power and Energy Innovation Forum (Edmonton, Canada, Nov. 4, 2010, invited poster presentation).
Continuous Actor-Critic Methods for Adaptive Prosthetics. P.M. Pilarski, M.R. Dawson, T. Degris, F. Fahimi, J. Carey, and R.S. Sutton (2010). In proceedings of the 2010 MITACS / CORS Joint Annual Conference (Edmonton, Alberta, May 25-28, 2010; abstract and oral presentation).
Robust Step-size Adaptation for Online Optimization. A. Mahmood, T. Degris, P.M. Pilarski, and R.S. Sutton (2010). In proceedings of the 2010 MITACS/CORS Joint Annual Conference (Edmonton, May 25-28, 2010, abstract, poster, and oral presentation).
The Critterbot: a Subjective Robotic Project. M. Bellemare, M. Bowling, T. Degris, A. Koop, C. Rayner, M. Sokolsky, R. Sutton, A. White, E. Wiewiora (2009). Multidisciplinary Symposium on Reinforcement Learning
Exploiting additive structure in factored mdps for reinforcement learning. T. Degris, O. Sigaud, O., and P.-H. Wuillemin (2008). In the European Workshop Reinforcement Learning
Learning the Structure of Factored Markov Decision Process in Reinforcement Learning Problems. T. Degris, O. Sigaud, P.-H. Wuillemin (2006). In the Proceedings of the 23rd International Conference on Machine Learning
Chi-square Tests Driven Method for Learning the Structure of Factored MDPs. T. Degris, O. Sigaud, P.-H. Wuillemin (2006). In the Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence
Rapid Response of Head Direction Cells to Reorienting Visual Cues: a Computational Model.T. Degris, O. Sigaud, S. I. Wiener, and A. Arleo (2004). In Neurocomputing, vol 58-60C
A Spiking Neuron Model of Head-direction Cells for Robot Orientation. T. Degris, L. Lachèze, C. Boucheny, and A. Arleo (2004). In the Proceedings of the Eighth International Conference on Simulation of Adaptive Behavior

Publications en Français:

Apprentissage par Renforcement sans Modèle et avec Action Continue T. Degris, P. M. Pilarski, R. S. Sutton (2012). Actes des Journées Francophones sur la Planification, la Décision et l'Apprentissage pour la conduite des systèmes
Apprentissage par Renforcement Factorisé pour le Comportement de Personnages non Joueurs T. Degris, O. Sigaud, P.-H.Wuillemin (2009). Numéro spécial sur les Jeux Vidéo. Revue d'Intelligence Artificielle.
Représentations Factorisées. Chapitre du livre: Processus décisionnels de Markov en intelligence artificielle (volume 2)
Apprentissage par Renforcement exploitant la Structure Additive des MDP Factorisés.T. Degris, O. Sigaud, P.-H. Wuillemin (2007). Actes des Journées Francophones sur la Planification, la Décision et l'Apprentissage pour la conduite des systèmes
Apprentissage de la structure des processus de décision markoviens factorisés pour l'apprentissage par renforcement. T. Degris, O. Sigaud, P.-H. Wuillemin (2006). Actes des Journées Francophones sur la Planification, la Décision et l'Apprentissage pour la conduite des systèmes

Thèse de Doctorat: Apprentissage par Renforcement dans les Processus de Décision Markoviens Factorisés. (2007).