Thomas Degris' Publications
Selected Publications:
- Off-Policy Actor-Critic.
T. Degris, M. White, R.S. Sutton (2012).
In Proceedings of the 29th International Conference on Machine Learning.
RLPark demo.
Abstract:
This paper presents the first actor-critic algorithm for off-policy
reinforcement learning. Our algorithm is online and incremental, and
its per-time-step complexity scales linearly with the number of learned weights.
Previous work on actor-critic algorithms is limited to the on-policy setting
and does not take advantage of the recent advances in off-policy gradient
temporal-difference learning. Off-policy techniques, such as Greedy-GQ,
enable a target policy to be learned while following and obtaining data from
another (behavior) policy. For many problems, however, actor-critic methods are
more practical than action value methods (like Greedy-GQ) because they explicitly
represent the policy; consequently, the policy can be stochastic and utilize a
large action space. In this paper, we illustrate how to practically combine
the generality and learning potential of off-policy learning with the flexibility
in action selection given by actor-critic methods. We derive an incremental,
linear time and space complexity algorithm that includes eligibility traces, prove
convergence under assumptions similar to previous off-policy algorithms, and
empirically show better or comparable performance to existing algorithms on
standard reinforcement-learning benchmark problems.
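A minimal sketch in Python of the kind of update the paper derives: one online, incremental actor-critic step driven by the importance ratio between the target and behavior policies, with eligibility traces and per-step cost linear in the number of weights. This is illustrative only: the paper's critic is GTD(lambda), whereas the sketch below substitutes a simpler importance-weighted TD(lambda) critic, and all names and step-size values are assumptions rather than the paper's.

    import numpy as np

    def off_policy_actor_critic_step(x, x_next, reward, rho, grad_log_pi,
                                     v, theta, e_v, e_theta,
                                     gamma=0.99, lam=0.9,
                                     alpha_v=0.01, alpha_theta=0.001):
        # x, x_next: feature vectors of the current and next state.
        # rho: importance ratio pi(a|x) / b(a|x) of the action taken.
        # grad_log_pi: gradient of log pi(a|x) with respect to theta.
        # v, theta: critic and actor weight vectors, updated in place.
        # e_v, e_theta: eligibility traces for the critic and the actor.
        delta = reward + gamma * v.dot(x_next) - v.dot(x)  # TD error
        # Critic: importance-weighted accumulating trace, then TD update.
        e_v[:] = rho * (gamma * lam * e_v + x)
        v += alpha_v * delta * e_v
        # Actor: trace of the policy-gradient direction, weighted by rho.
        e_theta[:] = rho * (gamma * lam * e_theta + grad_log_pi)
        theta += alpha_theta * delta * e_theta
        return delta

Every operation touches each weight a constant number of times, matching the linear per-time-step complexity claimed in the abstract.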
- Scaling-up Knowledge for a
Cognizant Robot. T. Degris, J. Modayil (2012).
In notes of the AAAI Spring Symposium on Designing Intelligent Robots: Reintegrating AI.
Abstract:
This paper takes a new approach to the old adage that
knowledge is the key to artificial intelligence. A cognizant robot is a
robot with a deep and immediately accessible understanding of its interaction
with the environment, an understanding the robot can use to flexibly adapt
to novel situations. Such a robot will need a vast amount of situated, revisable,
and expressive knowledge to display flexible intelligent behaviors. Instead of
relying on human-provided knowledge, we propose that an arbitrary robot can
autonomously acquire pertinent knowledge directly from everyday interaction
with the environment. We show how existing ideas in reinforcement learning
can enable a robot to maintain and improve its knowledge. The robot performs
a continual learning process that scales up knowledge acquisition to cover a
large number of facts, skills, and predictions. This knowledge has semantics
that are grounded in sensorimotor experience. We see the development of
more cognizant robots as a key step towards broadly competent robots.
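The knowledge units this approach builds on are predictions learned directly from the robot's sensorimotor stream. A minimal sketch, assuming a Horde-style setup with linear features and TD(0); the class name and parameters are illustrative assumptions, not an API from the paper:

    import numpy as np

    class SensorPredictor:
        # One knowledge unit: predicts the discounted sum of a single
        # sensor's future values from the current feature vector.
        def __init__(self, n_features, gamma=0.9, alpha=0.1):
            self.w = np.zeros(n_features)
            self.gamma = gamma
            self.alpha = alpha

        def update(self, x, sensor_next, x_next):
            # TD(0): move the prediction toward the bootstrapped target.
            target = sensor_next + self.gamma * self.w.dot(x_next)
            self.w += self.alpha * (target - self.w.dot(x)) * x

        def predict(self, x):
            return self.w.dot(x)

Scaling up then amounts to running many such predictors in parallel on the same feature vector, one per fact or prediction the robot maintains.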
- Learning the Structure of Factored Markov
Decision Processes in Reinforcement Learning Problems.
T. Degris, O. Sigaud, P.-H. Wuillemin (2006).
In Proceedings of the 23rd International Conference on Machine Learning.
Demo with the Counter-Strike video game.
Abstract:
Recent decision-theoretic planning algorithms are able to find optimal
solutions in large problems, using Factored Markov Decision Processes
(FMDPs). However, these algorithms require perfect knowledge of the
structure of the problem. In this paper, we propose SDyna, a general
framework for addressing large reinforcement learning problems by
trial-and-error and with no initial knowledge of their structure.
SDyna integrates incremental planning algorithms based on FMDPs with
supervised learning techniques building structured representations of
the problem. We describe SPITI, an instantiation of SDyna, that uses
incremental decision tree induction to learn the structure of a
problem combined with an incremental version of the Structured Value
Iteration algorithm. We show that SPITI can build a factored
representation of a reinforcement learning problem and may improve the
policy faster than tabular reinforcement learning algorithms by
exploiting the generalization property of decision tree induction
algorithms.
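A central operation in SPITI's structure learning is deciding, from observed transitions, whether one state variable influences another's next value. Below is a minimal standalone sketch using a chi-square test of independence; the real algorithm embeds such tests inside incremental decision tree induction, and the function name and significance threshold here are assumptions:

    import numpy as np
    from scipy.stats import chi2_contingency

    def influences(parent_values, child_next_values, significance=0.05):
        # Contingency table of (parent value, child next value) counts
        # over a batch of observed transitions.
        parents = np.unique(parent_values)
        children = np.unique(child_next_values)
        table = np.zeros((len(parents), len(children)))
        for p, c in zip(parent_values, child_next_values):
            table[np.searchsorted(parents, p), np.searchsorted(children, c)] += 1
        # Small p-value: reject independence, so keep the parent as a
        # candidate split variable in the transition tree.
        _, p_value, _, _ = chi2_contingency(table)
        return p_value < significance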
Publications and Communications:
- Prediction and Anticipation for Adaptive Artificial Limbs
P.M. Pilarski, M.R. Dawson, T. Degris, J.P. Carey, K.M. Chan, J.S. Hebert, and R.S. Sutton (2013).
In IEEE Robotics and Automation Magazine, Special Issue on Assistive Robotics, March 2013, in press.
- Towards Prediction-Based Prosthetic Control
P.M. Pilarski, T. Degris, M.R. Dawson, J.P. Carey, K.M. Chan, J.S. Hebert, and R.S. Sutton (2012).
In Proceedings of the 17th International Functional Electrical Stimulation Society Conference (IFESS), Banff, Canada, pp. 26–29, 2012.
- Off-Policy Actor-Critic
T. Degris, M. White, R.S. Sutton (2012). In Proceedings of the 29th International Conference on Machine Learning.
RLPark demo.
- Dynamic Switching and Real-time Machine Learning for Improved Human Control of Assistive Biomedical Robots.
P.M. Pilarski, M.R. Dawson, T. Degris, J.P. Carey, and R.S. Sutton (2012).
In Proceedings of the 4th IEEE International Conference on Biomedical Robotics and Biomechatronics (BioRob).
- Scaling-up Knowledge for a Cognizant Robot
T. Degris, J. Modayil (2012). In notes of the AAAI Spring Symposium on Designing Intelligent Robots: Reintegrating AI.
- Model-Free Reinforcement Learning with Continuous Action in Practice.
T. Degris, P. M. Pilarski, R. S. Sutton (2012). In Proceedings of the American Control Conference.
RLPark demo.
- Tuning-free step-size adaptation.
A. R. Mahmood, R. S. Sutton, T. Degris, P. M. Pilarski (2012). In Proceedings of the
IEEE International Conference on Acoustics, Speech, and Signal Processing, Kyoto, Japan, in press.
- Online Human Training of a Myoelectric Prosthesis Controller via Actor-Critic Reinforcement Learning.
P. M. Pilarski, M. R. Dawson, T. Degris, F. Fahimi, J. P. Carey, R. S. Sutton (2011).
Proceedings of the 2011 IEEE International Conference on Rehabilitation Robotics, Zurich, Switzerland.
- Horde: A scalable real-time architecture for learning knowledge from
unsupervised sensorimotor interaction. R. S. Sutton, J. Modayil, M. Delp, T. Degris,
P. M. Pilarski, A. White, D. Precup (2011). In Proceedings of the Tenth International
Conference on Autonomous Agents and Multiagent Systems, Taipei, Taiwan.
- Real-time Machine Learning in Rehabilitation Robotics for Adaptable Artificial Limbs.
P.M. Pilarski, M.R. Dawson, T. Degris, F. Fahimi, J. Carey, R.S. Sutton (2011). Glenrose
Rehabilitation Hospital Research Symposium (Edmonton, Alberta, Canada, abstract and poster
presentation).
- Real-time Machine Learning in Rehabilitation Robotics for Adaptable Artificial Limbs.
P.M. Pilarski, M.R. Dawson, T. Degris, F. Fahimi, J. Carey, R.S. Sutton (2011). Tech Futures
Summit 2011: Deploying Innovation (Banff, Canada, August 28-30, 2011, abstract
and poster presentation).
- An Encouraging Mobile Robot in the Glenrose Rehabilitation Hospital.
T. Degris, P.M. Pilarski, J. Modayil, R.S. Sutton, M.J. Cimolini, and J. Raso. (2011).
7th International Congress on Industrial and Applied Mathematics, ICIAM 2011
(Vancouver, Canada, July 18–22, 2011, poster presentation).
- Off-policy knowledge maintenance for robots. J. Modayil, P. M. Pilarski,
A. White, T. Degris, R. S. Sutton (2010). Robotics Science and Systems Workshop (Towards
Closing the Loop: Active Learning for Robotics). Extended abstract and poster.
- Small-Timescale Reinforcement Learning for Power Management on a Mobile Robot.
P.M. Pilarski, T. Degris, and R.S. Sutton (2010). 2010 Alberta Power Industry Consortium Power
and Energy Innovation Forum (Edmonton, Canada, Nov. 4, 2010, invited poster presentation).
- Continuous Actor-Critic Methods for Adaptive Prosthetics. P.M. Pilarski, M.R. Dawson,
T. Degris, F. Fahimi, J. Carey, and R.S. Sutton (2010). In Proceedings of the 2010 MITACS/CORS
Joint Annual Conference (Edmonton, Alberta, May 25-28, 2010; abstract and oral presentation).
- Robust Step-size Adaptation for Online Optimization. A. Mahmood, T. Degris, P.M. Pilarski,
and R.S. Sutton (2010). In Proceedings of the 2010 MITACS/CORS Joint Annual Conference
(Edmonton, May 25-28, 2010, abstract, poster, and oral presentation).
- The Critterbot: a Subjective Robotic Project.
M. Bellemare, M. Bowling, T. Degris, A. Koop, C. Rayner, M. Sokolsky, R. Sutton, A. White,
E. Wiewiora (2009). Multidisciplinary Symposium on Reinforcement Learning.
- Exploiting additive structure in factored MDPs
for reinforcement learning. T. Degris, O. Sigaud, and P.-H. Wuillemin (2008). In the
European Workshop on Reinforcement Learning.
- Learning the Structure of Factored Markov Decision Processes
in Reinforcement Learning Problems. T. Degris, O. Sigaud, P.-H. Wuillemin (2006). In
Proceedings of the 23rd International Conference on Machine Learning.
- Chi-square Tests Driven Method for Learning the Structure
of Factored MDPs. T. Degris, O. Sigaud, P.-H. Wuillemin (2006). In Proceedings of the 22nd
Conference on Uncertainty in Artificial Intelligence.
- Rapid Response of Head Direction Cells to Reorienting
Visual Cues: a Computational Model. T. Degris, O. Sigaud, S. I. Wiener, and A. Arleo (2004).
In Neurocomputing, vol. 58-60C.
- A Spiking Neuron Model of Head-direction Cells for Robot
Orientation. T. Degris, L. Lachèze, C. Boucheny, and A. Arleo (2004). In Proceedings of the
Eighth International Conference on Simulation of Adaptive Behavior.
Publications in French:
- Apprentissage par Renforcement sans Modèle et avec Action Continue
T. Degris, P. M. Pilarski, R. S. Sutton (2012). Actes des Journées Francophones sur la
Planification, la Décision et l'Apprentissage pour la conduite des systèmes
- Apprentissage par Renforcement Factorisé pour le Comportement de Personnages non Joueurs
T. Degris, O. Sigaud, P.-H. Wuillemin (2009). Special issue on video games, Revue d'Intelligence
Artificielle.
- Représentations Factorisées. Book chapter in:
Processus décisionnels de Markov en intelligence artificielle (volume 2).
- Apprentissage par Renforcement exploitant la Structure Additive
des MDP Factorisés. T. Degris, O. Sigaud, P.-H. Wuillemin (2007). Actes des Journées Francophones sur la
Planification, la Décision et l'Apprentissage pour la conduite des systèmes
- Apprentissage de la structure des processus de décision markoviens
factorisés pour l'apprentissage par renforcement. T. Degris, O. Sigaud, P.-H. Wuillemin (2006). Actes des
Journées Francophones sur la Planification, la Décision et l'Apprentissage pour la conduite des systèmes
Doctoral Thesis:
Apprentissage par Renforcement dans les Processus de Décision
Markoviens Factorisés. T. Degris (2007).