Reconnaissance Blind Chess

2020 Leaderboard Challenge

Interview with Winner Gregory Clark

This podcast features a discussion between Greg Clark, a Google employee and the winner of our Reconnaissance Blind Chess 2020 leaderboard challenge, and staff members from the Johns Hopkins University Applied Physics Laboratory (JHU/APL) about how Clark designed the bot that won the challenge. Speakers include Ashley Llorens, Chief of APL’s Intelligent Systems Center (ISC); Ryan Gardner, a researcher in APL’s Asymmetric Operations Sector who works with the ISC; Jared Markowitz, an ISC reinforcement learning specialist; and Casey Richardson, one of the original inventors of the Reconnaissance Blind Chess (RBC) game and an AI specialist in APL’s Asymmetric Operations Sector.

RBC was invented by APL staff members Andy Newman, Casey Richardson, and others in 2016. APL shared the game with the global AI community, wanting to bring people together to address the research challenges it embodies. The lab hosted a competition as part of the 2019 Conference on Neural Information Processing Systems (NeurIPS), and the leaderboard challenge kept the research going after NeurIPS ended. Clark won the first leaderboard contest in 2020.

Clark observes that RBC is particularly challenging because classic search algorithms are not directly applicable and “it’s not clear how to adapt the algorithms to deal with the uncertainty.” The sheer number of possible paths through the game also puts a wrench in existing algorithms that are designed to handle uncertainty. Clark points out that with RBC, it is difficult even to create a bot that consistently wins against a random opponent. Without good sensing, a bot that tries to keep track of every possible placement of the opponent’s pieces will quickly run out of memory.
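
To make the memory problem concrete, here is a minimal sketch of naive belief tracking, assuming the python-chess package (the official RBC tooling, reconchess, builds on it). It expands the set of positions consistent with any legal move and prints how quickly that set grows:

```python
import chess

def expand_beliefs(fens):
    """One ply of naive belief expansion: from every position we think is
    possible, generate every position one legal move could lead to. In RBC
    the opponent's moves are unobserved, so without good sensing to filter
    this set, it grows multiplicatively each turn."""
    children = set()
    for fen in fens:
        board = chess.Board(fen)
        for move in list(board.legal_moves):  # materialize before mutating
            board.push(move)
            children.add(board.fen())
            board.pop()
    return children

# Illustrative only: expanding every move by both sides from the start.
beliefs = {chess.Board().fen()}
for ply in range(1, 5):
    beliefs = expand_beliefs(beliefs)
    print(f"after ply {ply}: {len(beliefs)} possible positions")
```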

The interviewers suggest that the challenges in RBC arise in real-world competitive settings where the information available to each side is obscured and different, such as a complex business negotiation. They note that while these kinds of challenges are common in the real world, they rarely appear in games because modeling them requires an inconvenient physical setup.

Clark’s bot makes decisions by conducting playouts from sampled situations. It uses sampling to approximate the current placement of the opponent’s pieces and to cope with the explosion of possibilities. It keeps information on all possible piece placements throughout the game and filters them as it learns new information. The approach could be described as a block sequential Monte Carlo algorithm that uses rejection sampling to draw multiple placements of the opponent’s pieces at each turn.
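
As a rough illustration of that rejection-sampling step (a sketch under assumptions, not Clark’s actual code), the function below draws candidate boards from the tracked set of possibilities and keeps only those that agree with the latest 3x3 sense result. The observation format, a dict from sensed square to observed piece symbol or None, is a hypothetical simplification of what the RBC API provides.

```python
import random
import chess

def matches_sense(board, sense_result):
    """Return True if a candidate board agrees with a 3x3 sense report.
    `sense_result` maps each sensed square to the observed piece symbol,
    or None for an empty square (an illustrative observation format)."""
    for square, observed in sense_result.items():
        piece = board.piece_at(square)
        if (piece.symbol() if piece else None) != observed:
            return False
    return True

def sample_consistent_boards(possible_fens, sense_result,
                             num_samples=32, max_tries=10_000):
    """Rejection sampling: propose placements of the opponent's pieces
    from the tracked possibilities (a list of FEN strings) and reject any
    that contradict the new observation. Playouts can then be run from
    the accepted samples to evaluate candidate moves."""
    samples = []
    for _ in range(max_tries):
        if len(samples) >= num_samples:
            break
        board = chess.Board(random.choice(possible_fens))
        if matches_sense(board, sense_result):
            samples.append(board)
    return samples
```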

Clark used a neural network to guide the samples. He trained the network purely with imitation learning: supervised learning from data on past experience, usually from experts; in this case, he used games played by other bots. The model could have used reinforcement learning, in which the bot improves through practice games or self-play based on results, but Clark did not go that route. He does currently have the model playing games against other bots (not self-play) to run a grid search over hyperparameters.
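
A minimal PyTorch sketch of the imitation-learning setup, assuming a fixed move vocabulary and plane-encoded board observations; the layer sizes and the flat network below are placeholders, not Clark’s architecture:

```python
import torch
import torch.nn as nn

# Illustrative dimensions (assumptions): feature planes encoding the
# board/belief state, and a fixed index for every possible move.
NUM_PLANES, NUM_MOVES = 20, 4672

policy = nn.Sequential(
    nn.Flatten(),
    nn.Linear(NUM_PLANES * 8 * 8, 512),
    nn.ReLU(),
    nn.Linear(512, NUM_MOVES),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def imitation_step(observations, expert_moves):
    """One supervised step: push the policy toward the move a stronger
    bot chose in the same situation, using cross-entropy loss."""
    logits = policy(observations)           # (batch, NUM_MOVES)
    loss = loss_fn(logits, expert_moves)    # expert_moves: (batch,) indices
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy batch standing in for positions and moves mined from bot-vs-bot games.
obs = torch.randn(64, NUM_PLANES, 8, 8)
moves = torch.randint(0, NUM_MOVES, (64,))
print(imitation_step(obs, moves))
```

Reinforcement learning would replace the expert move labels with a reward signal derived from game outcomes; imitation learning instead needs only a corpus of recorded games.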

The architecture of Clark’s neural network is similar to the one used by AlphaZero, including a residual network with ten blocks in the main tower. On top of the tower, he placed a policy head and a value head, a pair he repeated for each opponent about whom he had enough data.
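
For concreteness, here is a hedged PyTorch sketch of that AlphaZero-style layout: a ten-block residual tower feeding a policy head and a value head. Input planes, channel width, and move-vocabulary size are illustrative assumptions, and the per-opponent variant would repeat the head pair on the shared tower.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with a skip connection, as in AlphaZero."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        return F.relu(self.bn2(self.conv2(out)) + x)

class PolicyValueNet(nn.Module):
    """Ten-block residual tower topped with a policy head (move logits)
    and a value head (expected outcome in [-1, 1]). All sizes here are
    assumed for illustration."""
    def __init__(self, in_planes=20, channels=128, num_moves=4672):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_planes, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU())
        self.tower = nn.Sequential(*[ResidualBlock(channels) for _ in range(10)])
        self.policy_head = nn.Sequential(
            nn.Conv2d(channels, 2, 1), nn.Flatten(),
            nn.Linear(2 * 8 * 8, num_moves))
        self.value_head = nn.Sequential(
            nn.Conv2d(channels, 1, 1), nn.Flatten(),
            nn.Linear(8 * 8, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Tanh())

    def forward(self, x):
        features = self.tower(self.stem(x))
        return self.policy_head(features), self.value_head(features)

policy_logits, value = PolicyValueNet()(torch.randn(2, 20, 8, 8))
```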

Clark suggested that future versions of RBC could be made more challenging if the information from the 3x3 sensing grid were unreliable (a noisy sensor). He further suggested that the reconnaissance-blind concept might be applied to other games, such as Go or Arimaa.
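
As a sketch of what a noisy sensor might look like (the podcast does not specify a noise model, so the corruption scheme below is purely an assumption), each square of the 3x3 report could be replaced with a random reading with some probability:

```python
import random

PIECE_SYMBOLS = list("PNBRQKpnbrqk") + [None]  # None = empty square

def noisy_sense(true_contents, flip_prob=0.1):
    """Corrupt a 3x3 sense report: with probability `flip_prob`, replace
    a square's true contents with a random (possibly wrong) reading.
    `true_contents` maps each of the nine sensed squares to a piece
    symbol or None. This noise model is an illustrative assumption."""
    return {
        square: random.choice(PIECE_SYMBOLS)
        if random.random() < flip_prob else contents
        for square, contents in true_contents.items()
    }
```

With such a sensor, exact filtering of possible placements would no longer be sound; a bot would have to weight its beliefs probabilistically instead of discarding inconsistent ones outright.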