Sunday 16 June 2013

Why and how: open education resources.

This is the fourth post in a series of posts reflecting on the teaching and learning in a recent course I've taught on SAS and R. This post will be quite different in nature from the previous posts, which looked at students' choices between SAS and/or R in their assessment.
Here I would like to talk about how I deliver teaching materials (notes, exercises, videos etc) to my students.

All of the teaching materials for the course can be found here: drvinceknight.github.io/MAT013/.

How they got there, and why I think it's a great place for them to be, is what I hope to discuss...

A Virtual Learning Environment that was not as good as alternatives.

At +Cardiff University we have a VLE (Virtual Learning Environment) that is automatically available to all our students and that all lecturers are encouraged to use. So when I started teaching I diligently used the service, but various aspects of it did not fit well with my workflow (having to upload files on every change, a clunky interface that actually seemed optimised for IE, and various other things). It was also awkward (at the time; I believe this has now been addressed) for students to use the environment on smartphones etc.

As an alternative, I set up a very simple website using Google Sites and used +Dropbox's public links to link to PDFs and other resources for my students. An example of such a delivery is these basic Game Theoretical materials. This gave me great control: I no longer had to mess around with uploading versions of files, every change I made was immediately online, and as the site was pretty simple (links and PDFs) it was easily accessible to students on all platforms (I could also include some YouTube videos).

An immediate consequence of this approach is that my materials are all publicly available online.

To anyone, our students or not. The first thing I did was check with +Paul Harper (the director of the MSc course, which at the time was the only course I was teaching on) that this was OK. We chatted about it a bit and were both happy to carry on. My main train of thought was that there are far better resources already available online, so mine might as well be online too. (I've since checked with our School's director of learning and teaching and there are no internal regulations against it, which is nice to know about +Cardiff University.)

There is a huge amount of talk about open access in research (I won't go into that here) but, to some extent, less so in teaching. I did find this interesting newspaper article that asks "Why don't more academics use open educational resources?". It offers a good general discussion about open education resources.

I would feel very, very humbled if anyone chose to actually use my resources. I'm at an early stage of my career and am still learning, so I don't think that will happen anytime soon, but there is another, more important benefit to having my teaching stuff online for everyone.

I always post about any new courses I'm working on to G+, and am grateful to get a fair bit of feedback from other academics around the world. This in itself gives me a certain level of confidence in front of my students, who know that what I'm teaching them is verifiable by anyone in the world. I've often changed a couple of things based on feedback from other academics and I think that's brilliant.

To some extent my teaching resources are reviewed not just by a couple of peers at my university but also by anyone online who might be interested in them.

(It would be great if research worked this way too.)

Through G+ (I've posted about how awesome G+ is as a tool for academic personal development) I learnt about git and github. If you don't know about git, this video by +Zoë Blade is very helpful.


After a while I jumped in and started using it. A little while later I found out that you can use github to host a website (via github pages).


Using this, it is really easy to put together a very basic website that has all the teaching materials. An added benefit is that the materials are now all in a github repo, which opens them up even more: with Dropbox, in general only the PDF files were visible, whereas now everything is (Markdown and TeX source files etc.), and in theory anyone who wanted to could pull from it, fork it, etc.

I'm certainly not the first person to put teaching stuff up on github (watching people like +Dana Ernst, +Theron Hitchman and various others do it is what made me jump in).

The github repo for my R and SAS course can be found here, and here are some other teaching things I have up on github (with the corresponding webpage if I've gotten around to setting it up).
To finish off, here are the various reasons I have for putting my teaching stuff up on github:
  • Openness:
    • my students know that this is viewable by everyone which hopefully gives the resources a level of confidence;
    • people on G+ and elsewhere are able to point out improvements and fixes (if and when they have time);
  • Access: the sites are really simple (basic html with links) so they can be viewed on more or less anything;
  • Ease of use: I don't have to struggle with whatever system is being used. If I have the option, I kind of refuse to use stuff that makes me less efficient (an example of this is our email system: I use gmail instead). At the moment the system I like is github + git.
I wrote a blog post (which is, I think, the most read thing I've ever written, online or offline) showing how to combine various things like TikZ, makefiles, +Sage Mathematical Software System etc. to automate the process of creating a course site, so I'll put a link to that here.

Sunday 9 June 2013

Comparing Recursive and Iterative Algorithms: Binary Search and Factorial

I'm in the middle of putting together a new course for our undergraduates at Cardiff University. The course is called 'Computing for Mathematics' and will introduce our first-year students to programming in general (using python) as well as to how a mathematics package can help them during their degree (we'll be using +Sage Mathematical Software System, which is a natural extension of python and is also super awesome).

I was prepping some stuff on recursion (which I'm really looking forward to teaching to our mathematics students given the connection to induction) and came across a bunch of posts stating the lack of speed generally associated with recursion.
I thought I'd write some (python) code to see how much slower recursion was. All the code (and data) is in this github repo.


Binary search


The first algorithm I thought I'd take a look at was binary search. I tried to write each algorithm in as basic a way as possible so as to allow for the best possible comparison.

Iterative

Here's the algorithm written iteratively:

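def iterativebinarysearch(target):
    """
    Code that carries out an iterative binary search of the sorted
    (global) list `data`: returns the index of target, or False if absent
    """
    first = 0
    last = len(data) - 1  # last valid index (len(data) could run off the end)
    found = False
    while first <= last and not found:
        index = (first + last) // 2  # integer midpoint
        if target == data[index]:
            found = True
        elif target < data[index]:
            last = index - 1  # target can only be in the left half
        else:
            first = index + 1  # target can only be in the right half
    return index if found else False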

Recursive

And here's the algorithm written recursively:

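def recursivebinarysearch(target, first, last):
    """
    Code that carries out a recursive binary search of the sorted (global)
    list `data` between indices first and last; call it with
    first=0, last=len(data) - 1. Returns the index of target, or False
    """
    if first > last:  # empty range: target is not there
        return False
    index = (first + last) // 2  # integer midpoint
    if target == data[index]:
        return index
    if target < data[index]:
        return recursivebinarysearch(target, first, index - 1)
    else:
        return recursivebinarysearch(target, index + 1, last)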

The experiment

I timed 10 runs of each of these algorithms on data sets of varying size, for each size choosing a random 1000 points to search. The data is all available in this github repo.
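A minimal sketch of this kind of timing experiment, using the standard-library `timeit` and `random` modules. The data sizes, the number of targets and the helper name `time_searches` are illustrative assumptions rather than the exact settings or code in the repo, and the two search functions are compact versions of the ones above that take the list as an argument instead of reading a global `data`.

```python
import random
import timeit


def iterativebinarysearch(data, target):
    """Iterative binary search of a sorted list: index of target, or False."""
    first, last = 0, len(data) - 1
    while first <= last:
        index = (first + last) // 2
        if target == data[index]:
            return index
        elif target < data[index]:
            last = index - 1
        else:
            first = index + 1
    return False


def recursivebinarysearch(data, target, first, last):
    """Recursive binary search of data[first..last]: index of target, or False."""
    if first > last:
        return False
    index = (first + last) // 2
    if target == data[index]:
        return index
    if target < data[index]:
        return recursivebinarysearch(data, target, first, index - 1)
    return recursivebinarysearch(data, target, index + 1, last)


def time_searches(size, n_targets=1000, seed=0):
    """Hypothetical helper: mean time (seconds) per search for each approach,
    on a sorted list of `size` integers, searching `n_targets` random points."""
    rng = random.Random(seed)
    data = list(range(size))
    targets = [rng.randrange(size) for _ in range(n_targets)]  # all present
    iterative = timeit.timeit(
        lambda: [iterativebinarysearch(data, t) for t in targets], number=1)
    recursive = timeit.timeit(
        lambda: [recursivebinarysearch(data, t, 0, size - 1) for t in targets],
        number=1)
    return iterative / n_targets, recursive / n_targets


for size in (10**3, 10**4, 10**5):
    it, rec = time_searches(size)
    print(f"{size:>7}: iterative {it:.2e}s  recursive {rec:.2e}s")
```

Seeding the random generator makes the chosen targets reproducible between runs, so differences in the timings come from the algorithms and the machine rather than from which points happened to be searched for.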

Here's a scatter plot (with fitted lines) for all the data points:



Apart from the fact that binary search seems very good indeed, there's not much going on here, except perhaps a slight tendency for the iterative approach to be a bit slower.

I decided to take a look at the mean time (over the 1000 searches done for each data set):



This seems to show that the iterative approach is slower, but again it's not very clear. This is mainly because I haven't done any clever analysis. The data sets are pretty big (10,001,000 data points plotted in the 1st graph and 10,001 in the 2nd), so to do anything really useful I'd have to look at the data a bit more carefully (the two csv files, 'recursivebinarysearch.csv' and 'iterativebinarysearch.csv', are both on github).

I thought I'd try a 'simpler' algorithm, as there are perhaps a number of things going on with the binary search (size of data set, randomness of points chosen etc.).

Computing Factorial


The other algorithm I decided to look at was the very simple calculation of $n!$.
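The recursive version of this calculation mirrors the usual inductive definition of $n!$, which is one way of seeing the connection between recursion and induction: a base case plus a rule that reduces $n$ to $n-1$.

```latex
0! = 1, \qquad n! = n \times (n-1)! \quad \text{for } n \geq 1,
\qquad \text{so, e.g., } 4! = 4 \times 3! = 4 \times 3 \times 2 \times 1 \times 0! = 24.
```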

Iteration

Here's the simple algorithm written iteratively:

def iterativefactorial(n):
    r = 1
    i = 1
    while i <= n:
        r *= i
        i += 1
    return r

Recursion

Here's the algorithm written recursively:

def recursivefactorial(n):
    if n <= 1:  # base case (also covers 0! = 1, which n == 1 alone would miss)
        return 1
    return n * recursivefactorial(n - 1)
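To compare the two versions, one option is to time each with the standard-library `timeit` module and take the ratio of the timings. A minimal sketch: the helper name `timing_ratio`, the number of repeats and the values of $n$ are illustrative assumptions, and the ratio is taken as iterative time over recursive time (also an assumption, since which way round it goes is a choice), so a value below 1 means the iterative version was faster.

```python
import timeit


def iterativefactorial(n):
    """n! computed with a while loop."""
    r = 1
    i = 1
    while i <= n:
        r *= i
        i += 1
    return r


def recursivefactorial(n):
    """n! computed recursively (base case n <= 1)."""
    if n <= 1:
        return 1
    return n * recursivefactorial(n - 1)


def timing_ratio(n, repeats=100):
    """Hypothetical helper: (iterative time) / (recursive time) for computing n!.
    A ratio below 1 means the iterative version was faster."""
    iterative = timeit.timeit(lambda: iterativefactorial(n), number=repeats)
    recursive = timeit.timeit(lambda: recursivefactorial(n), number=repeats)
    return iterative / recursive


for n in (10, 100, 500):  # kept well below CPython's default recursion limit
    print(n, round(timing_ratio(n), 3))
```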

The experiment

This was a much easier experiment to analyse; however, as the timings increased, I thought it would also be interesting to look at the ratio of the timings:



We see that, first of all, the iterative algorithm seems to perform better, but as $n$ increases this improvement becomes less noticeable. My computer maxed out its stack limit, so I won't be checking anything further, but I wonder if the ratio would ever get bigger than 1... (This data set, 'factorial.csv', is also on github.)
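In Python, the limit reached here is likely to be CPython's own recursion limit rather than the operating system's stack (an assumption about the setup used for these timings). A small sketch of inspecting and raising it with `sys.getrecursionlimit` and `sys.setrecursionlimit`; setting it too high can crash the interpreter, so the values 3000 and 5000 below are only illustrative.

```python
import sys


def recursivefactorial(n):
    """n! computed recursively (base case n <= 1)."""
    if n <= 1:
        return 1
    return n * recursivefactorial(n - 1)


print(sys.getrecursionlimit())  # CPython's default is usually 1000

try:
    recursivefactorial(3000)
except RecursionError:
    print("hit the recursion limit")

# Raise the limit (illustrative value): each call to recursivefactorial uses
# one Python frame, so n = 3000 now fits with room to spare.
sys.setrecursionlimit(5000)
print(recursivefactorial(3000).bit_length())  # size of 3000! in bits
```

The result is printed via `bit_length()` rather than `str()` because recent Python versions refuse to convert integers with more than a few thousand digits to strings by default.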

I'm sure that there's nothing interesting in all this from a computer scientists point of view but I found it a fun little exercise :)

Saturday 8 June 2013

Student choices between SAS and R in Individual Coursework

This is the third post in a small series of posts reflecting on my teaching in a new class (all teaching materials can be found here) introducing students to SAS and R on our MSc course. The two previous posts have so far just been a reflection on students' attitudes towards each piece of software.
In the first of those posts I said that I was slightly surprised that students had chosen to use SAS for one particular question where they had a choice of language. In my opinion it was a problem much easier to tackle with R (all the coursework, class tests etc. can be found on this site). I also mentioned in that post that I asked students which language they preferred. Almost all students answered that it depends on the task (which is a great answer), but after pushing them for a particular decision a strong majority seemed to prefer R. I posted on G+ recently about a particular interaction I've subsequently had with a student, which seemed to confirm this attitude of needing to find the right tool for the job.

In the second post I described how students, in their group presentations (in which I asked them to teach me something I had not taught them), mostly evaluated SAS vs R for certain tasks. It was great to see them identify strengths and weaknesses for each language.

This post is about their choices in their individual coursework.

As in their class test (which I discussed in the first post of this series), there was a question in this individual coursework component of the class (which can be found here) that allowed students to choose a language.

This question made use of quite a big data set (generated by some research I'm currently doing), and in my opinion it would probably be simpler to approach in SAS. About 52% of the class agreed with me, whilst 48% still seemed to prefer R. Firstly, I could be wrong and R could indeed be better suited to this question; secondly, it might also be a reflection of the personal preferences the students indicated when I asked: most students seemed to prefer R. If the latter is the case, then I suppose it's nice to see that students realise not just that there's a better tool for a given job but also that there's a better tool for a particular person doing a given job. I'll be keeping track of this over the years to see how (if) it changes.

In my next post in this series I'll start to reflect on some of the teaching methodologies I used.

Monday 3 June 2013

An online list of mathematics books

A couple of days ago +James Noble posted a link on G+ to a publicly editable spreadsheet, inviting people to contribute mathematics books.


The link to the sheet is here, and so far it has 65 books on it (there is also a meta list of lists that has 8 lists of books, to which I've added two of my previous posts: one about mathed and one about game theory). A website I've put together that uses the sheet as an underlying database (which I'll talk about below) can be found here.

I think this is a great idea!

It's completely open and anyone can add to it and also benefit from it. I've used various keywords to search and have found 1 or 2 books that I didn't know about.

Some initial analysis of the list

I decided I'd throw some python at this spreadsheet and first of all took a look at the contributors, in particular how many books individuals are contributing (a link to all the python code is available here).



As you can see about 50% of people contribute more than 1 book (including yours truly).
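A minimal sketch of this kind of count using `collections.Counter` (the linked code may well do it differently). It assumes the contributor column has already been read into a list with one name per book; the function names and the example names are made up for illustration. The same approach works for counting books per author.

```python
from collections import Counter


def contribution_distribution(contributors):
    """Given one contributor name per book (hypothetical input format),
    return {number of books: number of people who contributed that many}."""
    books_per_person = Counter(contributors)
    return Counter(books_per_person.values())


def share_contributing_more_than_one(contributors):
    """Fraction of contributors who added more than one book."""
    books_per_person = Counter(contributors)
    many = sum(1 for count in books_per_person.values() if count > 1)
    return many / len(books_per_person)


# Made-up example, not the real spreadsheet data.
example = ["Ann", "Bob", "Ann", "Cat", "Bob", "Ann", "Dan"]
print(contribution_distribution(example))
print(share_contributing_more_than_one(example))
```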

I was also curious as to how many authors were listed:



Here, however, it seems rare for an author to have more than one book in the list. The most prolific author is J. SCOTT CARTER with 5 books.

Some very basic natural language analysis

Two fields in the data set that I have not yet mentioned are the 'Overview' field and the 'Target' field. These two allow for some free text describing what the book is about and who it is aimed at. I used the nltk python library (python really is crazy: anything and everything has a library) to take a look at the frequency of "uncommon" words in these two fields.
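The post used nltk for this. As a dependency-free stand-in, here is a sketch of the same idea with `re` and `collections.Counter`; the short stop-word set is an illustrative assumption (nltk ships a much fuller stop-word list in its corpus data), and the example overviews are made up.

```python
import re
from collections import Counter

# Illustrative, deliberately short list of common words to ignore.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is",
              "on", "with", "this", "book", "it", "as", "by", "or", "at"}


def uncommon_word_frequencies(texts, top=10):
    """Count words across free-text fields, ignoring case and stop words;
    return the `top` most common (word, count) pairs."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top)


# Made-up 'Overview' entries, not taken from the real list.
overviews = [
    "A history of mathematics and its philosophy.",
    "Teaching geometry: a history of the subject.",
    "An introduction to proof for teaching undergraduates.",
]
print(uncommon_word_frequencies(overviews, top=3))
```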

First of all, the descriptions of the books:



Nothing too surprising here (I should perhaps remove certain words from this frequency analysis) but it's cool to see that "History" is pretty high up, as well as "Teaching" and "Philosophy".

The final piece of analysis is the frequency of the words in the description of who the book is for:



An initial glance at this seems to indicate that the books on the list are mainly aimed at students and/or teachers. It would be nice, I suppose, to see more books aimed at research...

A very basic static website

The other thing that the code I've put together does is write a website, hosted using github pages, that lists the various books (as well as giving an up-to-date version of the analysis above). The website can be found here.
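A minimal sketch of generating such a static page with nothing but the standard library. The `(title, author)` input format, the function name `books_page` and the page layout are assumptions for illustration, not the repo's actual code; `html.escape` keeps characters like `&` in titles from breaking the markup.

```python
import html


def books_page(books):
    """Build a very basic static HTML page listing books.
    `books` is a list of (title, author) pairs (hypothetical input format)."""
    rows = "\n".join(
        f"<tr><td>{html.escape(title)}</td><td>{html.escape(author)}</td></tr>"
        for title, author in books
    )
    return (
        "<!DOCTYPE html>\n<html><head><meta charset='utf-8'>"
        "<title>Mathematics books</title></head><body>\n"
        f"<p>Number of books: {len(books)}</p>\n"
        "<table>\n<tr><th>Title</th><th>Author</th></tr>\n"
        f"{rows}\n</table>\n</body></html>\n"
    )


# Made-up entry; the output could be saved as index.html in a github pages repo.
page = books_page([("Sets & Logic (example)", "A. Author")])
print(page)
```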


The website is very, very basic and entirely static, but I've scripted everything (including the download of the spreadsheet, although I haven't put that on github yet), so if it's helpful and anyone wants me to update the site just let me know (give me a nudge on G+; I even check Twitter sometimes).

The github repo for all this is here. If you have any ideas for further stuff that could be done with this data set, then please improve my analysis or just let me know :) It would be nice to do something slightly smarter with nltk and also to do some analysis of the links between the books (although that would probably need a bunch more books to be insightful)...

A really great job by +James Noble 

I think stuff like this is really great and it's been nice chatting to James about it.


It was cool to watch the list grow (I saw James's post pretty early, so got to watch people make a bunch of contributions) and it'd be nice to see it grow even more.