Thursday, 27 March 2014

Scheduling group presentations using Graph Theory and Sage.

Yesterday (2014-03-26) I spoke at the Embedded Enterprise Exchange about my new first year module. I used a Hangout on air during the talk and you can see it here:

That's not what I'm going to talk about here though. The course I talked baout ends with all the 'companies' (groups of 4 students) giving a $\approx 25$ minute talk.

I need to schedule 40ish talks and this needs to fit around the student availability as well as my own. In this post I'll describe how I did this using a combination of Doodle, +Sage Mathematical Software System and Graph Theory.

The beginning of this post will be some of the boring details but towards the end I start talking about the mathematics (so feel free to skip to there...).

First of all I needed to know my students availability.

For this I simply used Doodle: I kind of use Doodle for every meeting I have to schedule (they also offer a cool tool that lets you show your availability so students/colleagues have an indication of when I might be able to meet with them).

Here's a screenshot of the responses:

You can't really see anything there as I had to zoom out a lot to grab the whole picture. Doodle allows you to download the information for any given poll in .xls format so I could relatively easily obtain the biadjacency matrix $M$ for my problem. Where $M_{ij}$ is 1 if group $i$ is available for schedule slot $j$ and 0 otherwise.

The mathematics and code needed.

Once I've got a .csv file (by tweaking the .xls file) of the biadjacency matrix I import that in to +Sage Mathematical Software System  and convert it to an instance of the `Matrix` class using the following:

import csv 
data = [[int(j) for j in row] for row in csv.reader(f)] 
M = Matrix(data)

I then need to remove any particular scheduled slots that are not picked by any company:

M = matrix([row for row in M.transpose() if max(row) != 0]).transpose()

Once I've done this I can define the bi-partite graph (bi-partite simply means that the vertices can be separated in to 2 non adjacent collections):

g = BipartiteGraph(M)

We can then get a get a picture of this, I do this using a 'partition' (a graph colouring) that will colour the groups (red) and the schedule slots (blue):

g = BipartiteGraph(M)
p = g.coloring()'circular',dist=1,vertex_size=250, graph_border=True, figsize=[15,15],partition=p)

The various options I pass to the `show` command are simply to get the circular arrangement (and other minor things):

The above looks quite messy and what I essentially want is get as many pairwise matchings between the blue vertices (slots) and red vertices (companies) so that each schedule slot is attributed at most 1 company and every company has at least 1 schedule slot.

On any given graph $G=(V,E)$ this problem is known as looking for a maximal matching and can be written down mathematically:

Max:  $\sum_{e \in E(G)} m_e$
Such that:  $\forall v$ $\sum_{e \in E(G) \atop v \sim e} m_e \leq 1$

We are in essence finding a subset of edges of our original graph in such a way as to maximise the number of edges such that no vertex has more than 1 edge.

This is all explained extremely well at the +Sage Mathematical Software System documentation pages here.

Furthermore at the documentation the code needed to solve the problem is also given:

p = MixedIntegerLinearProgram()  
matching = p.new_variable(binary=True)
p.set_objective(sum(matching[e] for e in g.edges(labels=False)))
for v in g:
          for e in g.edges_incident(v, labels=False)) <= 1)

When I run the above, `p` is now a solved Mixed Integer Linear Program (corresponding to the matching problem described). To obtain the solution:

matching = p.get_values(matching)
schedule = [e for e,b in matching.iteritems() if b == 1]

Calling `schedule` gives a set of edges (denoted by the corresponding vertex numbers):

[(5, 57), (0, 45), (23, 50), (4, 42), (38, 60), (26, 56), (34, 62), (16,
68), (1, 43), (7, 40), (9, 44), (36, 58), (12, 49), (35, 71), (28, 66),
(25, 47), (24, 53), (6, 46), (3, 64), (39, 67), (17, 69), (22, 55), (13,
48), (33, 41), (10, 63), (21, 61), (30, 52), (29, 65), (37, 70), (15,
54), (19, 51), (11, 59)]

It is then really easy to draw another graph:

p = B.coloring()'circular',dist=1,vertex_size=250, graph_border=True, figsize=[15,15],partition=p)

Which gives:

You can see that the obtained graph has all the required properties and most importantly is a lot less congested.

Some details.

I'm leaving out some details, for example I kept track of the names of the companies and also the slots so that the final output of all this looked like this:

4InARow: Fri1100
Abacus: Tue0930
Alpha1: Thu1130
Alpha2: Mon0930
AusfallLtd: Fri1230
AxiomEnterprise: Thu1000
Batduck: Thu1430
CharliesAngles: Thu1500
CwmniRhifau: Mon1330
EasyasPi: Fri1130
Evolve: Thu1300
HSPLLtd.: Mon1030
JECT: Tue1200
JJSL: Thu1030
JNTL: Mon1400
JennyCash: Tue1630
KADE: Fri1330
MIAS: Fri1300
MIPE: Thu1100
MLC: Tue1600
Nineties: Mon0900
Promis: Fri1400
R.A.C.H: Tue1530
RYLR: Tue1230
SBTP: Fri1030
Serendipity: Mon1230
UniMath: Tue1300
VectorEnterprises: Tue1330
Venus: Thu1400
codeX: Mon1300
dydx: Wed1630
eduMath: Thu0930

(BatDuck is my favourite company name by far...)

Why did I do this this way?

There are 3 reasons:

1. I shared the schedule with my students through a published Sage sheet on our server. That way they can see a direct applied piece of mathematics and can also understand some of the code if they wanted to.
2. "Point and click doesn't scale" - I'm sure I could have solved this one instance of my problem with pen and paper and some common sense faster than it took me to write the code to solve the problem. The thing is next year when I need to schedule these talks again it will at most take me 3 minutes as the code is all here and ready to go. (Most readers of this blog won't need that explained but if any of my students find their way here: that's an important message for you).
3. It was fun.