Thomas Degris' Publications

Selected Publications:

  • Off-Policy Actor-Critic. T. Degris, M. White, R.S. Sutton (2012). RLPark demo.
    In Proceedings of the 29th International Conference on Machine Learning.
    This paper presents the first actor-critic algorithm for off-policy reinforcement learning. Our algorithm is online and incremental, and its per-time-step complexity scales linearly with the number of learned weights. Previous work on actor-critic algorithms is limited to the on-policy setting and does not take advantage of the recent advances in off-policy gradient temporal-difference learning. Off-policy techniques, such as Greedy-GQ, enable a target policy to be learned while following and obtaining data from another (behavior) policy. For many problems, however, actor-critic methods are more practical than action-value methods (like Greedy-GQ) because they explicitly represent the policy; consequently, the policy can be stochastic and utilize a large action space. In this paper, we illustrate how to practically combine the generality and learning potential of off-policy learning with the flexibility in action selection given by actor-critic methods. We derive an incremental, linear time and space complexity algorithm that includes eligibility traces, prove convergence under assumptions similar to previous off-policy algorithms, and empirically show better or comparable performance to existing algorithms on standard reinforcement-learning benchmark problems.
  • Scaling-up Knowledge for a Cognizant Robot. T. Degris, J. Modayil (2012).
    In notes of the AAAI Spring Symposium on Designing Intelligent Robots: Reintegrating AI.
    This paper takes a new approach to the old adage that knowledge is the key to artificial intelligence. A cognizant robot is a robot with a deep and immediately accessible understanding of its interaction with the environment, an understanding the robot can use to flexibly adapt to novel situations. Such a robot will need a vast amount of situated, revisable, and expressive knowledge to display flexible intelligent behaviors. Instead of relying on human-provided knowledge, we propose that an arbitrary robot can autonomously acquire pertinent knowledge directly from everyday interaction with the environment. We show how existing ideas in reinforcement learning can enable a robot to maintain and improve its knowledge. The robot performs a continual learning process that scales up knowledge acquisition to cover a large number of facts, skills, and predictions. This knowledge has semantics that are grounded in sensorimotor experience. We see the approach of developing more cognizant robots as a necessary step towards broadly competent robots.
  • Learning the Structure of Factored Markov Decision Processes in Reinforcement Learning Problems. T. Degris, O. Sigaud, P.-H. Wuillemin (2006). Demo with the Counter-Strike video game.
    In Proceedings of the 23rd International Conference on Machine Learning.
    Recent decision-theoretic planning algorithms are able to find optimal solutions in large problems, using Factored Markov Decision Processes (FMDPs). However, these algorithms need a perfect knowledge of the structure of the problem. In this paper, we propose SDyna, a general framework for addressing large reinforcement learning problems by trial-and-error and with no initial knowledge of their structure. SDyna integrates incremental planning algorithms based on FMDPs with supervised learning techniques building structured representations of the problem. We describe SPITI, an instantiation of SDyna that uses incremental decision tree induction to learn the structure of a problem, combined with an incremental version of the Structured Value Iteration algorithm. We show that SPITI can build a factored representation of a reinforcement learning problem and may improve the policy faster than tabular reinforcement learning algorithms by exploiting the generalization property of decision tree induction algorithms.
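The incremental off-policy actor-critic update described in the first abstract above can be sketched with linear features and eligibility traces. This is a minimal illustrative sketch, not the paper's exact Off-PAC algorithm: the feature and action dimensions, step sizes, softmax policy, and the simplified critic (a plain off-policy TD(lambda) update in place of a gradient-TD critic) are all assumptions of the sketch.

```python
import numpy as np

# Illustrative off-policy actor-critic update with linear features and
# eligibility traces. The behavior policy generates the data; the target
# (softmax) policy is what the actor learns. All sizes, step sizes, and
# the simplified critic are assumptions of this sketch, not the paper's.

n_features, n_actions = 8, 3
rng = np.random.default_rng(0)

v = np.zeros(n_features)               # critic weights (state-value function)
u = np.zeros((n_actions, n_features))  # actor weights (policy preferences)
e_v = np.zeros_like(v)                 # critic eligibility trace
e_u = np.zeros_like(u)                 # actor eligibility trace
alpha_v, alpha_u, gamma, lam = 0.1, 0.01, 0.99, 0.9

def softmax_policy(x):
    """Target policy: softmax over linear action preferences."""
    prefs = u @ x
    p = np.exp(prefs - prefs.max())
    return p / p.sum()

def step(x, a, r, x_next, b_prob):
    """One incremental update from a transition generated by the behavior
    policy; b_prob is the behavior policy's probability of action a."""
    global v, u, e_v, e_u
    pi = softmax_policy(x)
    rho = pi[a] / b_prob                        # importance-sampling ratio
    delta = r + gamma * (v @ x_next) - v @ x    # TD error
    e_v = rho * (gamma * lam * e_v + x)         # critic trace
    v = v + alpha_v * delta * e_v
    grad_log = -np.outer(pi, x)                 # gradient of log softmax
    grad_log[a] += x
    e_u = rho * (gamma * lam * e_u + grad_log)  # actor trace
    u = u + alpha_u * delta * e_u

# One update from arbitrary data, with a uniform-random behavior policy.
x, x_next = rng.random(n_features), rng.random(n_features)
step(x, a=1, r=1.0, x_next=x_next, b_prob=1.0 / n_actions)
```

Each call to step touches every weight a constant number of times, so the per-time-step cost is linear in the number of learned weights, matching the complexity claim in the abstract.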
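The SDyna control loop from the last abstract above, which interleaves acting, model learning, and incremental planning, can be sketched at a high level. This is a Dyna-style skeleton under strong simplifying assumptions: the tabular model and full planning sweep stand in for the decision-tree induction and Structured Value Iteration components, and the toy chain environment is purely illustrative.

```python
import random
from collections import defaultdict

# Dyna-style skeleton of the SDyna loop: act, learn a model of the problem
# from experience, and plan from that model. The tabular model and sweep
# below are illustrative stand-ins for the decision-tree induction and
# Structured Value Iteration used by SPITI.

random.seed(0)
n_states, n_actions, gamma = 5, 2, 0.95
Q = defaultdict(float)
model = {}  # (state, action) -> (reward, next_state), learned from experience

def env_step(s, a):
    """Deterministic toy chain: action 1 moves right, action 0 moves left;
    reaching the rightmost state yields reward 1."""
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return (1.0 if s2 == n_states - 1 else 0.0), s2

s = 0
for _ in range(500):
    a = random.randrange(n_actions)             # 1. act (pure exploration here)
    r, s2 = env_step(s, a)
    model[(s, a)] = (r, s2)                     # 2. update the learned model
    for (ms, ma), (mr, ms2) in model.items():   # 3. plan with the model
        Q[(ms, ma)] = mr + gamma * max(Q[(ms2, b)] for b in range(n_actions))
    s = 0 if s2 == n_states - 1 else s2         # reset on reaching the goal

greedy = lambda state: max(range(n_actions), key=lambda a: Q[(state, a)])
```

After the loop, the greedy policy derived from Q moves right toward the rewarding state from anywhere in the chain, even though the agent never planned with the true environment directly, only with its learned model.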

Publications and Communications:

Publications in French:

Doctoral Thesis: Reinforcement Learning in Factored Markov Decision Processes (2007). Original title: Apprentissage par Renforcement dans les Processus de Décision Markoviens Factorisés.