I recently finished converting two agent control algorithms from Python code i used for research into C# code for Unity (the real-time 3D engine).
The first algorithm is based on my work with Reward Machines: instead of rewarding an agent for a given action in a given state, a reward machine allows one to specify rewards for sequences of observations. Every observation is mapped from an action-state pair. For instance, if you want to make your agent kick the ball twice in a row, then give it a reward only after seeing that it has kicked the ball twice in a row. A regular reward function would only be able to give the same reward for the first and second kick.
The second algorithm is based on my earlier work on a Hybrid POMDP-BDI agent architecture. In this architecture, an agent can pursue several goals at the same time. The goals that the agent is currently pursuing are called its intentions. The agent programmer must specify what it means for a goal/intention to be satisfied. When an intention is satisfied ‘enough’ or cannot be satisfied at the moment, then it is removed from the set of intentions. The ‘desire-level’ of a goal which has not been satisfied for a while increases. When a goal’s desire-level reaches a user-defined threshold, the goal becomes an intention. With this algorithm, all goals are satisfied periodically. There are several parameters that the programmer can set to achieve a desired agent behavior.
These two algoriths still need some poslishing (including comments and descriptions) and i need to implement each one with more example problems. I’ll be making them available on GitHub in a few weeks.