Facebook AI attains human-level performance in Diplomacy

Article:

CICERO: An AI agent that negotiates, persuades, and cooperates with people


Games have long been a proving ground for new AI advancements — from Deep Blue's victory over chess grandmaster Garry Kasparov, to AlphaGo's mastery of Go, to Pluribus out-bluffing the best humans in poker. But truly useful, versatile agents will need to go beyond just moving pieces on a board. Can we build more effective and flexible agents that can use language to negotiate, persuade, and work with people to achieve strategic goals similar to the way humans do?

Today, we're announcing a breakthrough toward building AI that has mastered these skills. We've built an agent – CICERO – that is the first AI to achieve human-level performance in the popular strategy game Diplomacy*. CICERO demonstrated this by playing on webDiplomacy.net, an online version of the game, where CICERO achieved more than double the average score of the human players and ranked in the top 10 percent of participants who played more than one game.

Personally, I would have advised against training an AI to deceive and manipulate people to achieve hostile goals. Seems like a bad idea.
 
Not sure how I feel about evil megacorporation Facebook naming it after a guy that famously got martyred losing a republic to a tyranny, either. Seems bad
 
Yes, this move will totally help Facebook and definitely not screw them over when a major bug shows up...
 
This is really incredible: Meta had earlier this year produced an AI that performed better than top humans at a no-communication variant of Diplomacy, but I'm shocked that they were able to integrate language processing so quickly afterwards.

If you want to see a live example of the bot's diplomacy, the video below is commentary by a strong Diplomacy player playing against six instances of the bot. (I know the video is an hour and a half long, but you can just skip to anywhere in the middle; all of the players he is chatting with are bots.)

View: https://www.youtube.com/watch?v=u5192bvUS7k&ab_channel=DiploStrats
Since I did watch the whole thing, I'll add a couple of observations:
  • The bots occasionally say things that don't make a lot of sense*, but that's pretty rare; I think it happens ~4 times total
  • There are a couple of funny moments when you can see the humans the NN was trained on shine through: one bot excuses not listening to a request by saying 'Sorry, I didn't see that in time' - I might buy that from a human, but I'm skeptical about the bot. Later in the game, France (a bot) calls Italy (another bot) a weak player, despite Italy's position at that point being much better than France's
  • The bots favor activity and aggressive play far more than human players, probably correctly. I expect that those innovations might be taken up by top-level human players
  • The bot's communication is significantly weaker than that of a top human player; it makes up for that with strategic vision, tactical excellence, and multitasking ability that far outstrips a human's**
*specifically they refer to events that didn't happen, or claim that one player is attacking someone they clearly aren't, etc. The sentences still parse.

**I should also note that 'standard' live Diplomacy uses 15-minute turns, and the tournament the bot won was a special Blitz tournament with 5-minute turns, which handicaps the humans immensely, as 5 minutes is not enough time to analyze the position and communicate fully with the other players in the game. I suspect that the bot would perform significantly worse under a longer time control.

That said, at the rate that Meta is improving their Diplomacy AI, it won't be long before it passes that hurdle as well.
 
I'd proffer some points I first saw from Zvi's blog as counterpoints to HalaNisu's observations:
  • The bot is aggressive compared to human standard play because in this variant the game ends after eight years. Humans trained on other formats probably don't adjust to this enough.
  • Humans playing Blitz games don't have time for a full discussion with every opponent. If they have a full discussion with you, that means they think you are more important to negotiate with this turn than some other players. But bots can have a full discussion with all their opponents every turn! That means the amount-of-discussion tell makes them look friendlier than humans on average, often falsely, in a way human players can't consistently manage.
  • The bots' big advantage is that anonymity lets them piggyback on human-like revenge threats without ever following through.
These points seem sound to me, with the caveat that now that the paper's published people can test for and exploit the bots.
 
While I do agree with the general thesis of Zvi's blog post (this is not a sign that AI is about to take over the world, and this is incremental rather than revolutionary) I think he rather underestimates the bot's ability.
  • It's pretty common for games to have end dates, since people are more willing to commit to games if you promise them that it won't go past 2 am or whenever. The 1908 end date is on the early side, but has been used in some major tournaments. While it's true that some players won't play optimally under a fixed end date, the differences in strategy are known to the community - I don't see this as a unique advantage
  • I fully agree that the bot's ability to have a full discussion with every opponent is one of its largest advantages.
  • I also agree that its opponents not knowing there was a bot playing did help the bot's performance.
However, I disagree that the above two points mean that the bot was unimpressive tactically/strategically. This speed press AI is an iteration on an earlier AI that was designed for no-communication (gunboat) games. That AI scored better than top humans, despite not having a communication edge, and despite the players knowing the bot would be in the game (albeit not which country the bot would play*). Some of the players were even deliberately looking for the bot in the hopes of screwing the bot over. It still won.

*anonymized country distribution is pretty common for non-face-to-face games among human players, this was not a thing just for the bot

Also, while the bot's unwillingness to follow through on its threats is somewhat exploitable, I'll note that this is a thing human players do too: some human players will actually throw all of their forces at whoever betrays them with no concern for their own survival, but far more will threaten to do so in the hopes their opponents won't take that risk.
 
The bot's unwillingness to follow through on threats isn't the biggest vulnerability of an identified bot. It doesn't appear to keep track of the trustworthiness of individual players' statements, so it's vulnerable to repeated deceptions in a way that a human player almost certainly wouldn't be. ("Sure, I'll support your fourth attempt on MUN again. Shame it didn't work out the first three times.")

I don't see this as a unique advantage
I put this point forward as an explanation for the aggressive play. It's not necessarily an advantage for the bot over the other players. But without that principle, it's easy for an observer who only plays games without a time limit to take away that the bots are superhumanly aggressive when the moves are not much more aggressive than the baseline for that particular format.
 