Thursday, 20 March 2014

Playing Stochastic/Markov games in class

In class on Monday my students and I looked at Stochastic games (Chapter 14 of my course notes). I decided to play a tournament similar to the iterated games that we played previously (corresponding to Chapters 9 and 10). I've blogged about them here and here.

In short, a stochastic game is a collection of states that themselves correspond to games. After players select strategies at a given state/game the next game they play (or state they are in) is given by a probability distribution.

This is a bit like playing in a casino where the table you play at depends on what happened at the previous table you played.

*The game we played in class corresponds to a casino with two tables:*

- At the first table we play a simple prisoner's dilemma (players aiming to maximise) their utility:

$$\begin{pmatrix}
(2,2)&(0,3)\\
(3,0)&(1,1)
\end{pmatrix}$$

The probability distribution at this table (indicating what game we played next) is given by:

$$\begin{pmatrix}
(.75,.25)&(0,1)\\
(0,1)&(.5,.5)
\end{pmatrix}$$

    - If both players cooperate (getting a utility of $2$ each) then there is a 75% chance that they play at this same table again.
    - If only one player cooperates and the other defects (the defector getting a utility of $3$ and the cooperator a utility of $0$) then the players move to the other game with 100% probability.
    - If both players defect (each getting a utility of $1$) then they will play the same game with 50% chance.

- The second table was a dummy game that basically ended the game:

$$\begin{pmatrix}
(0&0)
\end{pmatrix}$$

The probability distribution for this game is $(0,1)$.

The assumption made in this class is that the games are played infinitely and as such a discounting factor $\delta$ is used. When playing this in class we interpreted the discounting factor as the probability of the game continuing. So the game ended when the probabilities indicated that we were in the second game or when the discounting factor said so (we used $\delta=1/2$).

The class formed 4 teams and we played a round robin.

The first round had Tom's Team go up against Team Roy and Barmy go up against 4:



We see that Tom's team and Team Roy cooperated both times before the game ended whilst 4 defected from the start against Barmy.

The next round had Barmy go up against Tom's Team and Team Roy go up against 4:



We see that Barmy and Team Roy both cooperated before the game ended and Team Roy and 4 both defected before the game ended.

The last game had Tom's Team go up against 4 and Team Roy go up against Barmy. Below you can also see the final cumulative scores:


As you can see Tom's Team won (the name is after +Paul Harper son Tom joined the class during which we played the iterated prisonder's dilemma).

To decide who came second (they could choose the second prize to either be apples or party rings) a best of three game of Rock, Paper, Scissors, Lizard, Spock was played with Joe winning the party rings for Team Roy ('Scissors decapitates lizard').




I think this was good fun and hopefully enabled the students to have a sound grasp of what is meant by a state, a transition, etc..

Interestingly all teams happened to play Markov Strategies (i.e. in any given game they all did the same thing in each state) although this might have been different if the probabilities allowed the game to last longer. In the course we are restricting ourselves to Markov strategies.

After playing this we went through obtaining 2 equilibria for this game (both players defecting and both players cooperating).

Here are some photos that a student grabbed during the class and also a re-share of the gif of +Paul Harper's son Tom using a Nerf gun on those who defected against his team during the Prisoner's dilemma tournament: