Since the middle of last year, I’ve been advising or co-advising five Master’s students at KU Leuven. Their topics are listed below.
- One student is defining how a Markov decision process (MDP) can be modelled by a first-order ProbLog program, and is building a program that solves such MDPs using the techniques applied in DT-ProbLog. ProbLog is a probabilistic logic programming language developed by Luc De Raedt et al. at KU Leuven. DT-ProbLog is a decision-theoretic approach implemented in ProbLog.
- Mealy machines have been used to represent non-Markovian reward models in MDP settings. In this topic, we want to define a Mealy Reward Machine (MRM) whose transitions depend on the satisfaction of propositional logic formulae, and then learn such logical MRMs. Because a single formula can stand in for many explicitly enumerated inputs, logical MRMs could be exponentially smaller than regular (flat) MRMs, more human-readable, and possibly more efficient to use.
- I previously developed the Hybrid POMDP-BDI (HPB) agent architecture. An HPB agent maintains a probability distribution over the states it could be in, and updates this distribution whenever it receives a new observation (as a result of an action it executes). In this topic, we want to extend the HPB architecture to handle streams of observations, and we are looking to complex event processing for inspiration.
- Angluin’s L* algorithm, originally designed for learning deterministic finite automata, can be adapted to learn Mealy (reward) machines in an active-learning setting. MRMs represent non-Markovian (temporally dependent) reward models, and there is a relationship between non-Markovian reward models and partially observable MDPs (POMDPs). One might therefore expect to be able to learn a reward model with L* in a POMDP. In recent work, I could not get this to work, so the student is investigating why.
- One student is investigating general game playing (GGP) and how to extend the associated game description language (GDL) to deal with stochastic actions. GDL with imperfect information (GDL-II) already exists; it can deal with situations where not all players know what other players know, and where moves have uniformly random effects (like rolling a fair die). The student is extending GDL-II to deal with non-uniform move effects. He is also implementing a user-friendly interface for his version of GGP.
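To make the logical MRM idea from the second topic a little more concrete, here is a minimal Python sketch. It assumes that an input label is the set of atomic propositions true at a time step, and that each transition guard is a propositional formula encoded as a Python predicate over that set. The class `LogicalMRM`, the helper `flat_edge_count`, and the `key`/`door`/`alarm` propositions are all hypothetical illustrations, not the student’s implementation.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

# A label is the set of atomic propositions that hold at one time step.
Label = FrozenSet[str]
# A guard is a propositional formula, encoded here as a predicate over labels.
Guard = Callable[[Label], bool]
# An edge is (guard, next state, reward output when the edge is taken).
Edge = Tuple[Guard, str, float]


@dataclass
class LogicalMRM:
    """Hypothetical Mealy reward machine whose edges are guarded by formulae."""
    initial: str
    edges: Dict[str, List[Edge]]

    def step(self, state: str, label: Label) -> Tuple[str, float]:
        # Take the first edge whose formula the current label satisfies.
        for guard, nxt, reward in self.edges.get(state, []):
            if guard(label):
                return nxt, reward
        # Assumption: if no guard fires, stay in the same state with reward 0.
        return state, 0.0

    def run(self, trace: Sequence[Label]) -> List[float]:
        # Feed a trace of labels through the machine, collecting the rewards.
        state, rewards = self.initial, []
        for label in trace:
            state, reward = self.step(state, label)
            rewards.append(reward)
        return rewards


def flat_edge_count(guard: Guard, props: Sequence[str]) -> int:
    # Count the complete truth assignments over `props` that satisfy the guard,
    # i.e. how many edges a flat MRM would need to express this one edge.
    count = 0
    for bits in product([False, True], repeat=len(props)):
        label = frozenset(p for p, b in zip(props, bits) if b)
        if guard(label):
            count += 1
    return count


# Example: reward 1 for reaching the door (with no alarm) after picking up the key.
mrm = LogicalMRM(
    initial="u0",
    edges={
        "u0": [(lambda lab: "key" in lab, "u1", 0.0)],
        "u1": [(lambda lab: "door" in lab and "alarm" not in lab, "u2", 1.0)],
        "u2": [],
    },
)

trace = [frozenset(), frozenset({"key"}), frozenset({"door"})]
print(mrm.run(trace))  # [0.0, 0.0, 1.0]

door_no_alarm = mrm.edges["u1"][0][0]
print(flat_edge_count(door_no_alarm, ["key", "door", "alarm"]))  # 2
print(flat_edge_count(door_no_alarm, ["key", "door", "alarm", "p", "q"]))  # 8
```

The `flat_edge_count` calls hint at where the size advantage could come from: a flat MRM whose edges are each labelled with one complete truth assignment needs 2^(n-2) edges to express "door and not alarm" over n propositions, while the logical MRM needs a single edge.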
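The belief maintenance mentioned in the HPB topic can be illustrated with the standard POMDP (Bayes-filter) belief update, b'(s') ∝ O(o | s', a) · Σ_s T(s' | s, a) · b(s). The sketch below is a generic version of that update under assumed models, not the HPB code. The names `belief_update`, `trans` and `obs_model`, and the two-state "listen" example, are hypothetical. Folding the update over observations one at a time, as in the usage at the end, is simply the most direct way to consume a stream; it is not the extension being developed in this topic.

```python
from typing import Callable, Dict, Hashable

State = Hashable
Belief = Dict[State, float]
# trans(s, a) -> {s2: P(s2 | s, a)};  obs_model(s2, a) -> {o: P(o | s2, a)}
TransFn = Callable[[State, str], Dict[State, float]]
ObsFn = Callable[[State, str], Dict[str, float]]


def belief_update(belief: Belief, action: str, obs: str,
                  trans: TransFn, obs_model: ObsFn) -> Belief:
    """Return b' with b'(s2) proportional to O(obs|s2,a) * sum_s T(s2|s,a) b(s)."""
    predicted: Belief = {}
    # Prediction step: push the current belief through the transition model.
    for s, p_s in belief.items():
        for s2, p_t in trans(s, action).items():
            predicted[s2] = predicted.get(s2, 0.0) + p_t * p_s
    # Correction step: weight each successor by the observation likelihood.
    weighted = {s2: p * obs_model(s2, action).get(obs, 0.0)
                for s2, p in predicted.items()}
    total = sum(weighted.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    # Normalise so the new belief sums to 1.
    return {s2: w / total for s2, w in weighted.items()}


# Assumed toy model: listening does not change the state, and the agent
# hears the correct side with probability 0.85.
def trans(s: State, a: str) -> Dict[State, float]:
    return {s: 1.0}


def obs_model(s2: State, a: str) -> Dict[str, float]:
    if s2 == "left":
        return {"hear_left": 0.85, "hear_right": 0.15}
    return {"hear_left": 0.15, "hear_right": 0.85}


b = {"left": 0.5, "right": 0.5}
for o in ["hear_left", "hear_left"]:  # a (very short) stream of observations
    b = belief_update(b, "listen", o, trans, obs_model)
print(b)  # left is about 0.970, right about 0.030
```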