Department of Computing

Michael Fairbank

PhD Student

Mr Michael Fairbank
Department of Computing
School of Informatics
City University
London EC1V 0HB

abdy934@soi.city.ac.uk
tel: +44 20 7040
fax: +44 20 7040 8887

I am a part-time PhD student, researching neural networks and reinforcement learning.


My PhD supervisor is Eduardo Alonso.


Reinforcement Learning by Value Gradients


In Reinforcement Learning (RL), the objective is for an agent (e.g. a robot) to learn to navigate through an environment so as to maximise a given reward function. RL often makes use of a value-function, which is a scalar field over the state space that the agent moves within. The value-function assigns a score to each point of state space, rating how "good" that position is in terms of the reward the agent can expect to collect from there.
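As a rough illustration of what a value-function looks like, the sketch below computes one for a toy problem by value iteration. Everything in it is an assumption made for the example and is not taken from the research described on this page: a five-cell corridor, a reward of 1 for stepping into the rightmost (goal) cell, a discount factor of 0.9, and the hypothetical function name `corridor_values`.

```python
def corridor_values(n_states=5, gamma=0.9, sweeps=100):
    """Value iteration on a 1-D corridor whose rightmost cell is the goal."""
    values = [0.0] * n_states
    goal = n_states - 1
    for _ in range(sweeps):
        for s in range(goal):  # the goal is terminal, so its value stays 0
            best = float("-inf")
            for step in (-1, 1):  # move one cell left or right
                nxt = min(max(s + step, 0), goal)  # walls clamp the move
                reward = 1.0 if nxt == goal else 0.0
                best = max(best, reward + gamma * values[nxt])
            values[s] = best
    return values


values = corridor_values()
print([round(v, 3) for v in values])  # [0.729, 0.81, 0.9, 1.0, 0.0]
```

The scores rise towards the goal, so an agent that always steps to the neighbouring cell with the best reward-plus-value reaches the goal by the shortest route. The goal cell itself scores 0 only because the episode ends there and no further reward can be collected.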

The idea of my research is to explicitly learn the gradient of this value-function with respect to the state vector. This has the advantage of obviating the need for local exploration, hence increasing efficiency. It has also yielded a successful convergence proof for value-function learning with a general smooth function approximator for control problems. This convergence proof was achieved by proving equivalence, under certain conditions, to gradient ascent on the total reward function (a process known as policy gradient learning), hence also providing a theoretical connection between these two rival paradigms of RL.


My research is most closely related to Dual Heuristic Programming (DHP) and Globalized Dual Heuristic Programming (GDHP) by Paul Werbos. These are two extremely powerful methods used by the Adaptive Dynamic Programming (ADP) community, but strangely neglected by the RL community.
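To make the idea of learning a value gradient concrete, here is a minimal DHP-style sketch on a toy problem. It is an illustration under stated assumptions, not the algorithm from this research: the one-dimensional model x' = x + a, the reward r = -x^2 - a^2, the linear critic G(x) = w * x, the learning rate, and the function names are all chosen for the example. For this toy problem the optimal value-function is known to be V(x) = -p * x^2 with p = (1 + sqrt(5)) / 2, so the learned w should approach -2p = -1 - sqrt(5).

```python
import random


def greedy_action(x, w):
    # The greedy action maximises r(x, a) + V(x + a). Setting the derivative
    # with respect to a to zero gives -2a + G(x + a) = 0, which for the
    # linear critic G(x) = w * x has the closed-form solution below.
    # Only the gradient G is needed, never V itself.
    return w * x / (2.0 - w)


def train_value_gradient(steps=2000, lr=0.5, seed=0):
    rng = random.Random(seed)
    w = 0.0  # critic parameter: G(x) = w * x approximates dV/dx
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)  # sample a state
        a = greedy_action(x, w)
        x_next = x + a  # assumed model f(x, a) = x + a, so df/dx = 1
        # Target gradient: dr/dx + (df/dx) * G(x_next). The term involving
        # the policy's derivative vanishes because the action is greedy.
        target = -2.0 * x + 1.0 * (w * x_next)
        w += lr * (target - w * x) * x  # regress G(x) towards the target
    return w


w = train_value_gradient()
print(round(w, 3))  # -3.236, i.e. -1 - sqrt(5)
```

Note that the action is chosen by solving a first-order condition on the gradient rather than by trying out nearby actions, which is the sense in which no local exploration is needed in this sketch.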


Gentle introductions:


In greater depth:


Applications:


Recurrent Neural Networks applied to Control problems

These applications do not use a value-function at all. Instead, a recurrent neural network controls the agent directly. Its internal state acts as a memory of past observations, which allows it to tackle non-Markovian environments.
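The role of the recurrent state can be sketched with a single hand-wired recurrent unit. This is only an illustration: the weights are picked by hand rather than learned, and the task (a cue shown on the first time step only, followed by blank observations) and the name `run_recurrent_controller` are assumptions made for the example.

```python
import math


def run_recurrent_controller(observations, w_in=2.0, w_rec=2.0):
    """One tanh unit with a self-connection; the action is the sign of its state."""
    h = 0.0  # hidden state carried from one time step to the next
    actions = []
    for obs in observations:
        h = math.tanh(w_in * obs + w_rec * h)
        actions.append(1 if h > 0 else -1)
    return actions


# The cue (+1 or -1) is visible only at the first step; afterwards the
# observation is 0, so the current observation alone does not reveal it.
print(run_recurrent_controller([1.0, 0.0, 0.0, 0.0]))   # [1, 1, 1, 1]
print(run_recurrent_controller([-1.0, 0.0, 0.0, 0.0]))  # [-1, -1, -1, -1]
```

A controller that maps only the current observation to an action would have to behave identically in both runs once the cue has disappeared, whereas the self-connection keeps the unit latched to the sign of the cue.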

Applications:

An essay on how close these two applications come to intelligence.