Tuesday, 28 January 2014

Matching pennies in class

Yesterday I took the first class of my new module on Game Theory. I've been very excited about this as I think Game Theory is a really fun subject to learn (and teach).

In class we covered the first two chapters of my notes (Chapter 1: Introduction to Game Theory, Chapter 2: Normal Form Games). Whilst talking about Normal Form Games I showed the students the game called Matching Pennies.

Two players each show a coin with either 'Heads' or 'Tails' showing. If both coins match then the 1st player (the row player) wins, otherwise the 2nd player (the column player) wins.

This can be represented using a 'bi-matrix':

$$\begin{pmatrix}
(1,-1)&(-1,1)\\
(-1,1)&(1,-1)
\end{pmatrix}$$

Each tuple of that matrix corresponds to a pair of strategies from the set $\{H, T\}$, so if the row player chose $H$ and the column player chose $T$ then they would read the outcome in the first row and second column: $(-1,1)$. The convention used here is that outcomes show the utility to the first player and then to the second player. So in this instance the row/first player would get -1 and the column/second player would get 1 (i.e. the column player wins because the coins were different).
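
As a rough illustration of how the bi-matrix is read, here is a minimal Python sketch. The names `PAYOFFS` and `outcome` are my own (not from the course materials), and the $(T,T)$ entry is taken to be $(1,-1)$ so that the row player wins whenever the coins match, as in the description of the game.

```python
# Bi-matrix for basic Matching Pennies, indexed as PAYOFFS[row_choice][col_choice].
# Each entry is (utility to the row player, utility to the column player).
# Assumption: the (T, T) entry is (1, -1), i.e. the row player wins on a match.
PAYOFFS = {
    "H": {"H": (1, -1), "T": (-1, 1)},
    "T": {"H": (-1, 1), "T": (1, -1)},
}


def outcome(row_choice, col_choice):
    """Return the (row, column) utility pair for a single round."""
    return PAYOFFS[row_choice][col_choice]


# Row player shows H, column player shows T: first row, second column.
row_utility, col_utility = outcome("H", "T")
print(row_utility, col_utility)  # -1 1
```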

I asked the students to get into pairs and record five rounds of the game on some paper (forms and all other content for the course are available at this GitHub repo).

After that, I modified the game to give this:

$$\begin{pmatrix}
(2,-2)&(-2,2)\\
(-1,1)&(1,-1)
\end{pmatrix}$$

The row player still wins when the coins match but there is just more to win/lose when $H$ is picked by the row player.

I got the students to once again record the results.

Last night I got home and instead of speaking to my wife I went through and entered all the data.


Here are some of the results.

First of all 'basic matching pennies'. Here are the moving averages of all the games played:


I'm graphing the probability with which players played $H$. As you can see 'we' got to equilibrium pretty quickly and 'on average' players were randomly swapping between $H$ and $T$.

Here is a plot of the equivalent mean score to both players:


First of all we see that the plots are reflections of each other in the horizontal axis (the line where the score is 0). This is because the game we are considering is a Zero Sum Game: every pair of utilities sums to 0. Secondly we see that the mean score is settling down around 0.
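
To see why the two score plots mirror each other, here is a small simulation sketch. It does not use the data from class: it assumes both players pick $H$ or $T$ uniformly at random, and `simulate` and its arguments are names made up for this illustration.

```python
import random

# Basic Matching Pennies: (row utility, column utility) for each pair of choices.
PAYOFFS = {
    ("H", "H"): (1, -1),
    ("H", "T"): (-1, 1),
    ("T", "H"): (-1, 1),
    ("T", "T"): (1, -1),
}


def simulate(rounds, seed=0):
    """Simulate uniformly random play (an assumption, not the class data).

    Returns three lists with one entry per round: the running proportion of
    H played by the row player and the running mean score of each player.
    """
    rng = random.Random(seed)
    heads = total_row = total_col = 0
    prob_heads, mean_row, mean_col = [], [], []
    for n in range(1, rounds + 1):
        row, col = rng.choice("HT"), rng.choice("HT")
        u_row, u_col = PAYOFFS[(row, col)]
        heads += row == "H"
        total_row += u_row
        total_col += u_col
        prob_heads.append(heads / n)
        mean_row.append(total_row / n)
        mean_col.append(total_col / n)
    return prob_heads, mean_row, mean_col


prob_heads, mean_row, mean_col = simulate(1000)
# Zero sum: at every round the two running means are exact negatives.
print(all(r == -c for r, c in zip(mean_row, mean_col)))  # True
```

Because every utility pair sums to 0, the running totals of the two players always cancel, whatever the sequence of plays.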

All of the above is great and more or less exactly what you would expect.

While playing the second game I overheard a couple of students say something like 'Oh this is a bit more complicated: we need to think'. They were completely right!

Here are the results. First of all the strategies:


It seems like students are once again playing with equal probabilities of picking $H$ or $T$. The outcome for the score is again very similar:



Is this what we expect?

Not quite.

Let us assume row players are playing a 'mixed strategy' $\sigma_1=(x,1-x)$ (i.e. they choose $H$ with probability $x$) and column players are playing $\sigma_2=(y,1-y)$ (i.e. they choose $H$ with probability $y$).

Let us see what the expected utility to the row player is when $\sigma_2=(.5,.5)$ as a function of $x$ (the probability of playing $H$):

$$u_1((x,1-x),(.5,.5))=.5(2x-(1-x))+.5(-2x+1-x)=0$$

So in fact what the row player does is irrelevant (with regards to his/her utility) as long as the column player plays $\sigma_2=(.5,.5)$.
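
This can be checked numerically. The sketch below is only an illustration: `ROW_PAYOFFS` and `expected_row_utility` are names I have made up, and exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction

# Row player's utilities in the modified game, indexed [row_choice][col_choice]
# with 0 = H and 1 = T (these are the values used in the calculation above).
ROW_PAYOFFS = [[2, -2], [-1, 1]]


def expected_row_utility(x, y):
    """Expected utility to the row player for sigma_1=(x,1-x), sigma_2=(y,1-y)."""
    sigma_1 = (x, 1 - x)
    sigma_2 = (y, 1 - y)
    return sum(
        sigma_1[i] * sigma_2[j] * ROW_PAYOFFS[i][j]
        for i in range(2)
        for j in range(2)
    )


half = Fraction(1, 2)
for x in (Fraction(0), Fraction(1, 4), half, Fraction(1)):
    # The second value printed is 0 for every x.
    print(x, expected_row_utility(x, half))
```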

What about the column player?

Writing down the utility to the column player when $\sigma_1=(.5,.5)$ as a function of $y$ (the probability of playing $H$):

$$u_2((.5,.5),(y,1-y))=\frac{1}{2}(-2y+y+2-2y+y-1)=\frac{1}{2}-y$$
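
For anyone who wants the intermediate steps, here is one way to expand that calculation. It conditions on the row player's choice (each with probability $1/2$) and takes the column player's utilities to be $-2$, $2$, $1$ and $-1$ for the outcomes $(H,H)$, $(H,T)$, $(T,H)$ and $(T,T)$, which are the values the formula above uses:

$$\begin{aligned}
u_2((.5,.5),(y,1-y)) &= \tfrac{1}{2}\left(-2y+2(1-y)\right)+\tfrac{1}{2}\left(y-(1-y)\right)\\
&= \tfrac{1}{2}(2-4y)+\tfrac{1}{2}(2y-1)\\
&= \tfrac{1}{2}(1-2y)=\tfrac{1}{2}-y
\end{aligned}$$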

So NOW if $\sigma_1=(.5,.5)$ it looks like the column player has SOME control over his/her utility.

Here is a plot of that $u_2$:



So our plot is that of a decreasing function. Remember $y$ is something that the column player can control. As the column player wants to increase $u_2$, the best response they should adopt to the row player playing $\sigma_1=(.5,.5)$ is in fact $y^*=0$, because at $y=0$ the utility is at its highest!
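
Here is a minimal sketch of finding that best response by brute force. The names `COL_PAYOFFS` and `expected_col_utility` and the grid of 101 values of $y$ are my own choices for illustration; since $u_2$ is linear in $y$, checking just $y=0$ and $y=1$ would be enough.

```python
from fractions import Fraction

# Column player's utilities in the modified game, indexed [row_choice][col_choice]
# with 0 = H and 1 = T. Assumption: the (T, T) entry is -1, as in the formula for u_2.
COL_PAYOFFS = [[-2, 2], [1, -1]]


def expected_col_utility(x, y):
    """Expected utility to the column player for sigma_1=(x,1-x), sigma_2=(y,1-y)."""
    sigma_1 = (x, 1 - x)
    sigma_2 = (y, 1 - y)
    return sum(
        sigma_1[i] * sigma_2[j] * COL_PAYOFFS[i][j]
        for i in range(2)
        for j in range(2)
    )


half = Fraction(1, 2)
grid = [Fraction(k, 100) for k in range(101)]  # candidate values of y in [0, 1]
best_y = max(grid, key=lambda y: expected_col_utility(half, y))
print(best_y, expected_col_utility(half, best_y))  # 0 1/2
```

So against a row player mixing evenly, always playing $T$ gives the column player an expected utility of $1/2$ per round.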

What this implies (for the second game) is that whilst the students were all winning and losing in equal measure (the mean score was around 0), the column player could in fact improve their strategy and take advantage of the fact that the row player was playing $\sigma_1=(.5,.5)$.

The row player can't actually do this (we did the math above and we saw that he/she couldn't really have any effect on his/her utility). What my students and I will see in Chapter 6 of my class is that in fact there is a way to make both players 'unable to improve their outcomes'. When we get there it will also shed light on the dashed lines in some of the plots of this blog post.