I’m working on the topic of an agent learning a non-Markovian reward model from its interactions with its environment, modelled as a Markov decision process (MDP). It is taking me longer than I hoped to get results.
I am also the daily advisor of three Master’s students. Their topics are:
- Extending and developing a better understanding of my work on “probabilistic belief revision via similarity of worlds modulo evidence”.
- Investigating probabilistic belief update, with an analysis of partially observable MDP (POMDP) and dynamic Bayesian network (DBN) methods for doing so.
- Simplifying, analysing and implementing my work on maximising the expected impact of an agent in a network of potentially hostile agents.
I am also involved in setting up one of the topics for this semester’s Capita Selecta course and in evaluating its students. The topic I am leading is Safe AI and Reinforcement Learning.