In this project, we design strategies for a group of robots (agents) to catch an adversary within a known map. The adversary's position is unknown to the agents, so each agent maintains a belief over the position of the adversary as well as over the positions of all other agents.
The agents can choose among several simple policies to catch the adversary, e.g. hallway patrol (take a random action), belief gobbler (move to the neighbouring state with the maximum belief), and seeker (move towards the state with the maximum belief/distance ratio). The overall strategy, Decentralized Multi-policy Decision Making, is that each agent forward-simulates each of the simple policies and selects the best-performing one. The selected policy then determines the agent's actions for the next k time-steps.
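
The policy-selection loop described above can be sketched roughly as follows. This is a minimal single-agent illustration and not the project's actual implementation. Several things are assumptions made only for the example: the map is an undirected, connected graph stored as an adjacency dict, the belief is a dict from node to probability, the adversary is treated as static during a rollout, and a rollout is scored by the belief mass the agent sweeps up. All function names and the `n_samples` parameter are hypothetical.

```python
import random
from collections import deque


def bfs_distances(graph, start):
    """Hop distance from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def hallway_patrol(graph, pos, belief, rng):
    # Random action: step to a uniformly chosen neighbour.
    return rng.choice(graph[pos])


def belief_gobbler(graph, pos, belief, rng):
    # Step to the neighbour holding the most belief.
    return max(graph[pos], key=lambda n: belief.get(n, 0.0))


def seeker(graph, pos, belief, rng):
    # Pick the node maximising belief / distance, then step towards it.
    dist = bfs_distances(graph, pos)
    candidates = [n for n in dist if n != pos]
    target = max(candidates, key=lambda n: belief.get(n, 0.0) / dist[n])
    to_target = bfs_distances(graph, target)
    return min(graph[pos], key=lambda n: to_target[n])


POLICIES = {
    "hallway_patrol": hallway_patrol,
    "belief_gobbler": belief_gobbler,
    "seeker": seeker,
}


def rollout(graph, pos, belief, policy, k, rng):
    """Forward-simulate one policy for k steps (assumed scoring rule:
    total belief mass at the nodes the agent visits)."""
    belief = dict(belief)  # work on a copy; the real belief is untouched
    collected = 0.0
    for _ in range(k):
        pos = policy(graph, pos, belief, rng)
        collected += belief.get(pos, 0.0)
        belief[pos] = 0.0  # a visited node is cleared
    return collected


def select_policy(graph, pos, belief, k, n_samples=20, seed=0):
    """Score every policy by averaged rollouts and return the best one."""
    rng = random.Random(seed)
    scores = {}
    for name, policy in POLICIES.items():
        total = sum(rollout(graph, pos, belief, policy, k, rng)
                    for _ in range(n_samples))
        scores[name] = total / n_samples
    best = max(scores, key=scores.get)
    return best, scores


# Example: a five-node corridor with the agent in the middle.
corridor = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
belief = {0: 0.2, 4: 0.8}
best, scores = select_policy(corridor, pos=2, belief=belief, k=3)
```

In this sketch the agent would then follow `POLICIES[best]` for the next k time-steps before re-running the selection. A fuller version would also simulate the adversary's motion and the other agents' policies using the maintained beliefs, which is omitted here.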