I attended ICAART in Porto, Portugal, in February.
I presented a paper based on the work of my Master’s student, Ebert Theeuwes, who was co-supervised by Gabriele Venturato. It is about merging belief nodes in MCTS to make the search process more efficient. Because we are searching in ‘POMDP-space’, the nodes to be (potentially) merged represent belief states.
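To give a flavour of the idea, here is a toy sketch in Python. It is my own simplification, not the algorithm from the paper: beliefs are explicit distributions over states, similarity is just an L1 threshold, and ‘merging’ pools the nodes’ visit counts and value estimates so that two branches reaching practically the same belief are not explored twice.

```python
from dataclasses import dataclass

@dataclass
class BeliefNode:
    belief: dict[str, float]   # state -> probability
    visits: int = 0
    total_value: float = 0.0

def l1_distance(b1: dict, b2: dict) -> float:
    """L1 distance between two belief distributions."""
    states = set(b1) | set(b2)
    return sum(abs(b1.get(s, 0.0) - b2.get(s, 0.0)) for s in states)

def merge_or_register(node: BeliefNode, existing: list[BeliefNode],
                      threshold: float = 0.05) -> BeliefNode:
    """If some existing node's belief is within `threshold` of `node`'s,
    pool their search statistics and reuse the existing node; otherwise
    keep `node` as a new, distinct belief node."""
    for other in existing:
        if l1_distance(node.belief, other.belief) < threshold:
            other.visits += node.visits
            other.total_value += node.total_value
            return other
    existing.append(node)
    return node
```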
I also presented a position paper on my idea for a Reinforcement Learning framework for sparse reward problems. It is based on MCTS, goal-conditioned policies and hierarchical planning. There is still a lot to be done to work out the details of this framework. In the paper, I suggest a high-level algorithm.
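Roughly, the pieces are meant to fit together as in the skeleton below. Everything here is hypothetical (the interfaces `mcts_planner`, `goal_policy` and `subgoal.reached` are placeholders, not from the paper): MCTS plans over subgoals rather than primitive actions, and a goal-conditioned policy handles the low-level steps, so the agent gets a learning signal even when the task reward is sparse.

```python
def hierarchical_episode(env, mcts_planner, goal_policy, max_subgoals=10):
    """Skeleton of one episode: plan subgoals with MCTS at the top level,
    then let a goal-conditioned policy pi(a | s, g) chase each subgoal."""
    state = env.reset()
    for _ in range(max_subgoals):
        # High level: search over candidate subgoals instead of primitive
        # actions, which shortens the effective planning horizon.
        subgoal = mcts_planner.search(state)
        # Low level: reaching the subgoal supplies a dense learning signal
        # even when the task's own reward is sparse.
        done = False
        while not done and not subgoal.reached(state):
            action = goal_policy.act(state, subgoal)
            state, reward, done, _ = env.step(action)
        if done:
            break
    return state
```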
AJ Westley presented his paper at the conference too. I supervised AJ’s Honours degree project, and the paper he presented is a product of his project report. His work is based on an idea I had a few years ago for a novel measure of safety. We developed an approach to pre-compute a safe policy space and then do RL in that space. So his work falls under the area of Safe RL.
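The general pattern, stripped of the details of AJ’s actual method, looks something like the sketch below. The safety check `is_safe` and the tabular setting are assumptions for illustration only; the key point is that the safe action sets are computed before learning, and the RL agent never steps outside them (which presumes every reachable state has at least one safe action).

```python
import random
from collections import defaultdict

def precompute_safe_actions(states, actions, is_safe):
    """Safe policy space as a map state -> permitted actions; is_safe(s, a)
    stands in for whatever safety measure the method actually uses."""
    return {s: [a for a in actions if is_safe(s, a)] for s in states}

def safe_q_learning(env, safe_actions, episodes=500,
                    alpha=0.1, gamma=0.99, epsilon=0.1):
    """Epsilon-greedy tabular Q-learning that only ever selects from the
    precomputed safe action sets."""
    Q = defaultdict(float)
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            allowed = safe_actions[state]
            if random.random() < epsilon:
                action = random.choice(allowed)   # exploration stays safe too
            else:
                action = max(allowed, key=lambda a: Q[(state, a)])
            next_state, reward, done, _ = env.step(action)
            target = reward if done else reward + gamma * max(
                Q[(next_state, a)] for a in safe_actions[next_state])
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
    return Q
```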