Last week, I attended the 13th International Conference on Agents and Artificial Intelligence (ICAART). On 4 February I presented my work on Online Learning of Non-Markovian Reward Models. On 5 February I chaired two technical sessions. On the last day (Saturday, 6 February), I could simply attend talks.
There were several interesting papers (for me) and a few interesting keynote talks. The keynote by Gerhard Widmer titled Con Espressione! AI, Machine Learning, and Musical Expressivity was especially interesting: I was not aware how much research has been done in the field of music. Another keynote I found interesting, especially related to my work, was by Guy Van den Broeck about Probabilistic Circuits and Probabilistic Programs. Guy argues that many of the currently used probabilistic graphical models are ‘underpowered’ or incorrectly applied, and that Probabilistic Circuits and Probabilistic Programs are newer, richer and often tractable. There is a three-hour tutorial here (independent of ICAART).
Since the middle of last year, I’ve been advising or co-advising five Masters students at KU Leuven. I’ll list the topics they are working on below.
- The goal is to define how an MDP can be modelled by a first-order ProbLog program and to make a program that can solve such an MDP by using the techniques that were applied in DT-ProbLog. ProbLog is a probabilistic logic programming language developed by Luc De Raedt et al. at KU Leuven. DT-ProbLog is a Decision-Theoretic approach implemented in ProbLog.
- Mealy Machines have been used to represent non-Markovian reward models in MDP settings. In this topic, we want to define a Mealy Reward Machine (MRM) with transitions depending on the satisfaction of propositional logic formulae. And we want to learn these logical MRMs. Logical MRMs could be exponentially smaller than regular/flat MRMs, more human readable, and possibly more efficient to use.
- I previously developed the Hybrid POMDP-BDI (HPB) agent architecture. An HPB agent maintains a probability distribution over possible states it can be in, and updates this distribution every time it makes a new observation (due to an action it executes). In this topic, we want to extend the HPB architecture to deal with streams of observations. We are looking at complex event processing for inspiration.
- Angluin’s L* algorithm is a method for learning Mealy (Reward) Machines in an active-learning setting. MRMs represent non-Markovian (temporally dependent) reward models. There is a relationship between non-Markovian reward models and partially observable MDPs (POMDPs). So one might think that one could learn a reward model using L* in a POMDP. In recent work, I could not get it right. So the student is investigating this issue.
- One student is investigating general game playing (GGP) and how to extend the associated game description language (GDL) to deal with stochastic actions. GDL for imperfect information (II) already exists, which can deal with situations where not all players know what other players know, and where actions or moves have uniformly probabilistic effects (like a fair die). The student is extending GDL II to deal with non-uniform move effects. He is also implementing a user-friendly interface for his version of GGP.
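The belief maintenance mentioned in the HPB topic above (a probability distribution over states, updated after each action and observation) can be illustrated with the standard POMDP Bayes-filter update. This is only a minimal sketch of the textbook update, not the HPB architecture itself; the function name, the dictionary layout and the two-state "listen" example are all assumptions made for illustration.

```python
def update_belief(belief, action, observation, transition, obs_model):
    """Textbook POMDP belief update (illustrative, not the HPB implementation).

    belief:     dict state -> probability
    transition: dict (state, action) -> dict next_state -> probability
    obs_model:  dict (next_state, action) -> dict observation -> probability
    """
    states = list(belief)
    unnormalised = {}
    for s_next in states:
        # Prediction step: probability of reaching s_next under the action.
        predicted = sum(
            transition[(s, action)].get(s_next, 0.0) * belief[s] for s in states
        )
        # Correction step: weight by the likelihood of the observation.
        likelihood = obs_model[(s_next, action)].get(observation, 0.0)
        unnormalised[s_next] = likelihood * predicted
    total = sum(unnormalised.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return {s: p / total for s, p in unnormalised.items()}


# Hypothetical example: listening does not change the state,
# and the observation is correct 85% of the time.
transition = {
    ("left", "listen"): {"left": 1.0},
    ("right", "listen"): {"right": 1.0},
}
obs_model = {
    ("left", "listen"): {"hear_left": 0.85, "hear_right": 0.15},
    ("right", "listen"): {"hear_left": 0.15, "hear_right": 0.85},
}
prior = {"left": 0.5, "right": 0.5}
posterior = update_belief(prior, "listen", "hear_left", transition, obs_model)
print(posterior)
```

Dealing with streams of observations, as in the student's topic, would need more than this single-observation update; the sketch only shows the starting point.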
This is the start of my third and last year at KU Leuven as a postdoc.
Eventually, we published a paper about learning non-Markovian reward machines in an active learning setting. So, in an MDP, if the agent’s rewards have temporal dependencies, we learn a finite state machine that models this temporal reward behaviour.
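To make the idea of a finite state machine that models temporal reward behaviour concrete, here is a minimal sketch of a Mealy reward machine. It is not the learning algorithm from the paper, only a hand-written machine being run on a trace; the class name, the node names and the "key then goal" task are assumptions chosen for illustration.

```python
class MealyRewardMachine:
    """A Mealy machine whose outputs are rewards (illustrative sketch)."""

    def __init__(self, initial, transitions):
        # transitions maps (node, observation) -> (next_node, reward)
        self.initial = initial
        self.transitions = transitions
        self.node = initial

    def reset(self):
        self.node = self.initial

    def step(self, observation):
        # The reward depends on the current node, i.e. on the history so far.
        self.node, reward = self.transitions[(self.node, observation)]
        return reward

    def rewards_for(self, trace):
        self.reset()
        return [self.step(observation) for observation in trace]


# Hypothetical task: reaching the goal is rewarded only after picking up a key,
# so the reward is not a function of the current observation alone.
transitions = {
    ("no_key", "key"): ("has_key", 0),
    ("no_key", "goal"): ("no_key", 0),
    ("no_key", "other"): ("no_key", 0),
    ("has_key", "key"): ("has_key", 0),
    ("has_key", "goal"): ("no_key", 10),
    ("has_key", "other"): ("has_key", 0),
}
machine = MealyRewardMachine("no_key", transitions)
rewards = machine.rewards_for(["goal", "key", "other", "goal", "goal"])
print(rewards)
```

The same observation ("goal") yields different rewards at different points in the trace, which is exactly the temporal dependency that a Markovian reward function cannot express.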
At the moment, we are finishing up a project about learning safety properties given a set of MDP states labeled as either safe or dangerous. What makes our work novel is that we are in a relational MDP setting, and the safety properties are expressed as probabilistic computation tree logic (pCTL) formulae.
I’m working on the topic of an agent learning a non-Markovian reward model, given the agent’s interactions with its environment, a Markov decision process (MDP). It is taking me longer than I hoped to get results.
I am also the daily advisor of three Masters students. The topics are:
- Extending and developing a better understanding of my work on “probabilistic belief revision via similarity of worlds modulo evidence”.
- Investigating probabilistic belief update and an analysis of partially observable MDP (POMDP) and dynamic Bayesian Networks (DBN) methods to do so.
- A simplification, analysis and implementation of my work on maximising expected impact of an agent in a network of potentially hostile agents.
I am also involved in setting up one of the topics and evaluating students in this semester’s Capita Selecta course. The topic I am leading is Safe AI and Reinforcement Learning.
I arrived in Leuven, Belgium (with my wife and two cats) on the first day of 2019.
I’ll be working in the VeriLearn project, that is, Verifying Learning in Artificial Intelligence.
I’m in Luc De Raedt’s group, which is part of the DTAI group in the Computer Science department.
One full paper was accepted, and a shorter paper was also accepted that will be presented as a poster at the German conference on AI. The former is about investigations into generalizing approaches to probabilistic belief revision, going from Bayesian conditioning to Lewis imaging. The latter is about how to formally describe how agents should act given that their impact on each other could be positive or negative to some degree. A notion of reputation is used.
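For readers unfamiliar with the two ends of that spectrum, the following sketch contrasts Bayesian conditioning with imaging on a tiny set of possible worlds. It is not the method from the paper. The Hamming distance between worlds and the equal splitting of mass among tied closest worlds (Lewis's original definition assumes a unique closest world) are assumptions made for illustration, as are all names.

```python
def condition(prior, evidence):
    """Bayesian conditioning: drop non-evidence worlds and renormalise."""
    total = sum(p for world, p in prior.items() if world in evidence)
    if total == 0.0:
        raise ValueError("cannot condition on evidence with prior probability zero")
    return {w: (p / total if w in evidence else 0.0) for w, p in prior.items()}


def image(prior, evidence, distance):
    """Imaging: each world moves its probability to its closest evidence world(s)."""
    if not evidence:
        raise ValueError("evidence must contain at least one world")
    posterior = {world: 0.0 for world in prior}
    for world, p in prior.items():
        best = min(distance(world, e) for e in evidence)
        closest = [e for e in evidence if distance(world, e) == best]
        for e in closest:
            # Ties are split equally (an assumption; see the lead-in).
            posterior[e] += p / len(closest)
    return posterior


def hamming(w1, w2):
    """Hypothetical similarity measure: number of differing atoms."""
    return sum(a != b for a, b in zip(w1, w2))


# Worlds are (rain, wet) truth-value pairs; the evidence is "wet".
prior = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
wet = {(0, 1), (1, 1)}
conditioned = condition(prior, wet)
imaged = image(prior, wet, hamming)
print(conditioned)
print(imaged)
```

Conditioning keeps the relative weights of the worlds consistent with the evidence, whereas imaging shifts each excluded world's mass to its most similar surviving world. Imaging is also defined when the evidence has prior probability zero, where conditioning is not.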
The third paper – about probabilistic belief update – was accepted at a co-located workshop called Formal and Cognitive Reasoning.
I’ll present the papers and poster in Berlin in September.
I got the opportunity to share my knowledge with postgrad students and staff of the Faculty of Computer Science at the University of Ljubljana, Slovenia. Over two weeks, I taught two two-hour lessons on Probabilistic Belief Change, and one three-hour crash-course on Partially Observable Markov Decision Processes.
The nature in Slovenia is wonderful. I walked a lot in the forests. The Shangri La Hotel spoiled me with their breakfasts. I can recommend the Shangri La.
I was there for the last two weeks of May 2018.
I was invited to Macquarie University in Sydney, Australia to collaborate with Abhaya Nayak on probabilistic belief revision, and trust between agents. The visit was for six weeks, ending early in December 2017.
My wife accompanied me. We stayed at the student hostel across the road. Sydney is nice. One can travel for a flat rate the whole day on Sundays.
I was registered for my Doctorate with the University of KwaZulu-Natal (UKZN), South Africa. My main supervisor was Thomas (Tommie) Meyer and joint-supervisor Gerhard Lakemeyer.
I spent a year at the RWTH Aachen University in Germany with Prof. Lakemeyer. The rest of the time, I was living in Pretoria, South Africa, while working on the PhD.
The thesis culminated in the Stochastic Decision Logic (SDL), a formal logic for specifying and reasoning about POMDPs.
After completing my Doctorate, I received a two-year post-doctorate fellowship with UKZN, under the supervision of Dr. Deshendran Moodley. However, I still worked remotely from Pretoria. My two main publications in this time were about probabilistic belief revision (European Conference on Artificial Intelligence, 2016) and proposing an agent architecture which combines the POMDP framework and the belief-desire-intention (BDI) architecture (Journal of Cognitive Systems Research).
Currently I am in the middle of a two-year post-doctorate fellowship, sponsored by the Claude Leon Foundation, and situated at the University of Cape Town. My wife and I moved down to Cape Town for this one; we’re actually just two kilometers from the campus.
I’m working on four papers: (1) an extension to POMDPs involving trust between multiple agents, (2) general probabilistic belief update, (3) probabilistic belief change based on similarity weighting and (4) combining description logics with probabilistic belief revision. All this is in collaboration with various researchers.
In August 2017, I presented A Stochastic Belief Management Framework for Agent Control at the first workshop on Architectures for Generality and Autonomy.
Before the Ph.D.
I’ve been interested in Artificial Intelligence since about 2000, when I started my studies in Computer Science. In fact, I decided to study CS because of my interest in AI.
At first, I wanted to study Natural Language Processing (NLP) or Artificial Neural Networks (ANNs). The university where I was studying (University of South Africa; UNISA) had a strong Formal Logic (FL) track and I found that I enjoyed the subject quite a lot. Nonetheless, I did take an Honours (4th year) course in each of NLP and ANNs. My Honours project was on Constraint Logic Programming.
In my Masters degree I studied the Situation Calculus and the programming language based on it, called Golog. I also learnt DTGolog, an extension of Golog which is based on decision theory. And I got interested in partially observable Markov decision processes (POMDPs). My dissertation was thus to extend DTGolog to deal with partial observability. (DTGolog can be thought of as a programming language for MDPs.) My resulting programming language was called PODTGolog.
In a next post, I’ll talk about my PhD and beyond.