Saturday 28 September 2013

Counting words in all my LaTeX files with Python

So today I found out about texcount, which gives a nice detailed word count for LaTeX files. It's really easy to use and apparently comes with most LaTeX distributions (it was included with my TeX Live distribution under Mac OS and Linux).

Running it is as simple as:

$ texcount file.tex

It will output a nice detailed count of words (broken down by section etc.). I don't pretend to be an expert in anything, but I'm genuinely surprised that I had never come across this before. I was about to write a (most probably terrible) +Python script to count words in a given file, and just before starting I thought "WAIT A MINUTE: someone must have done this"...

Anyway, I've got to the point where I can't watch TV without a laptop at my fingertips doing some type of work, so whilst keeping an eye on South Africa playing ridiculously well in their win over Australia in the Rugby Championship, I thought I'd see if I could have a bit of fun with texcount.

Here's a very simple Python script that will recursively search through all directories in a given directory and count the words in all the LaTeX files:

#!/usr/bin/env python
from __future__ import print_function

import argparse
import fnmatch
import os
import pickle
import subprocess


def trim(t, p=0.01):
    """Trims the largest and smallest elements of t.

    Args:
    t: sequence of numbers
    p: fraction of values to trim off each end

    Returns:
    sorted list of the remaining values
    """
    t = sorted(t)
    n = int(p * len(t))
    # Slice to len(t) - n rather than -n: t[0:-0] would be an empty list
    return t[n:len(t) - n]


parser = argparse.ArgumentParser(description="A simple script to find word counts of all tex files in all subdirectories of a target directory.")
parser.add_argument("directory", help="the directory you would like to search")
parser.add_argument("-t", "--trim", type=float, default=0,
                    help="fraction of the data to trim off each end before plotting")
args = parser.parse_args()
directory = args.directory
p = args.trim

# Recursively collect every .tex file below the target directory
matches = []
for root, dirnames, filenames in os.walk(directory):
    for filename in fnmatch.filter(filenames, '*.tex'):
        matches.append(os.path.join(root, filename))

wordcounts = {}
fails = {}
for f in matches:
    print("-" * 30)
    print(f)
    # '-1' asks texcount for a single line of output
    process = subprocess.Popen(['texcount', '-1', f], stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE, universal_newlines=True)
    out, err = process.communicate()
    try:
        # The first field looks like "text+headers+captions", so add the parts up
        wordcounts[f] = sum(int(part) for part in out.split()[0].split('+'))
        print("\t has %s words." % wordcounts[f])
    except (IndexError, ValueError):
        print("\t Couldn't count...")
        fails[f] = err

# Save the {file name: word count} dictionary for later use
with open('latexwordcountin%s.pickle' % directory.replace("/", "-"), "wb") as outfile:
    pickle.dump(wordcounts, outfile)

try:
    # Imported here so the counts are still saved when matplotlib is missing
    import matplotlib.pyplot as plt
except ImportError:
    print("Graph not produced, perhaps you don't have matplotlib installed...")
else:
    data = list(wordcounts.values())
    if p != 0:
        data = trim(data, p)
    if data:
        plt.figure()
        plt.hist(data, bins=20)
        plt.xlabel("Words")
        plt.ylabel("Frequency")
        plt.title("Distribution of word counts in all my LaTeX documents\n ($N=%s$, mean=$%s$, max=$%s$)" % (len(data), sum(data) // len(data), max(data)))
        plt.savefig('latexwordcountin%s.svg' % directory.replace("/", "-"))

(Please forgive the lack of comments throughout the code...)

Here it is in a GitHub repo as well, in case anyone cares enough to want to improve it.

Here is how I call it on my entire Dropbox folder:

$ ./searchfiles.py ~/Dropbox

This will run through my entire Dropbox and count the words in all *.tex files (it threw up errors on some of my files, so there is some error handling in there). It writes a dictionary of file name/word count pairs to a pickle file (so you can do whatever you want with it), and if you have matplotlib installed it should also produce the following histogram:

 


As you can see, it looks like I've got some files that are quite a lot bigger than the others (I'm guessing texcount counts individual chapters as well as the entire thesis.tex files that include them...). So I've added an option to trim the data set before plotting:

$ ./searchfiles.py ~/Dropbox -t .05

This takes 5% of the data off each side of our data set and gives:

 


Looking at that, I have a lot of very short LaTeX files (which include some standalone images I've drawn to do stuff like this). If I had time I'd see how well a negative exponential distribution fits the data, as it does indeed look fairly random. I'd love to see what other people's word count distributions look like...
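A quick way to try that idea: for an exponential distribution the maximum likelihood estimate of the rate is simply one over the mean, and the largest gap between the empirical and fitted cumulative distribution functions (the Kolmogorov-Smirnov statistic) gives a rough measure of how well it fits. Here's a minimal plain-Python sketch, assuming the word counts are a list of numbers (for example the values of the dictionary the script pickles); the function names are hypothetical:

```python
import math


def fit_exponential(counts):
    """Maximum likelihood estimate of the rate of an exponential distribution."""
    return len(counts) / float(sum(counts))


def ks_distance(counts, rate):
    """Largest gap between the empirical CDF of counts and the exponential CDF."""
    xs = sorted(counts)
    n = float(len(xs))
    worst = 0.0
    for i, x in enumerate(xs):
        fitted = 1 - math.exp(-rate * x)
        # The empirical CDF jumps from i/n to (i+1)/n at x, so check both sides
        worst = max(worst, abs(fitted - i / n), abs(fitted - (i + 1) / n))
    return worst
```

For example, `ks_distance(counts, fit_exponential(counts))` close to 0 means the fitted curve tracks the data closely. Turning this into a formal test is a bit more work: because the rate is estimated from the same data, the standard Kolmogorov-Smirnov critical values don't apply directly.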

Now, I can say that if I ever produce more than 600 words then I'm doing above average work...
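If you want to do something with the pickle file the script leaves behind, here's a minimal sketch for loading the dictionary back in and listing the longest documents. It assumes the pickle holds the plain {file name: word count} dictionary; `load_wordcounts` and `largest_files` are hypothetical helper names:

```python
import pickle


def load_wordcounts(path):
    """Load the {file name: word count} dictionary saved by the script."""
    with open(path, "rb") as f:
        return pickle.load(f)


def largest_files(wordcounts, k=5):
    """Return the k (file name, word count) pairs with the most words."""
    return sorted(wordcounts.items(), key=lambda item: item[1], reverse=True)[:k]
```

Usage would be something like `largest_files(load_wordcounts(path))`, where `path` is the `.pickle` file the script wrote (its name is built from the directory you searched).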

Friday 20 September 2013

Revisiting examples of computer assisted mathematics


I'm in the middle of finishing off some last things for a brand new course we're teaching at +Cardiff University starting in October. I plan on using my first lecture to explain to our new students how important computing/programming/coding is for modern mathematicians.

I will certainly be talking about the 4 colour theorem, which states that the regions of any map can be coloured using at most 4 colours so that no two neighbouring regions share a colour. I'll probably demo a bit of what +Sage Mathematical Software System can do. Here's a Sage cell that demos some of this (click evaluate and you should see the output; feel free to then play around with the code).


There I've written some very basic code to show a colouring of a graph on 9 vertices.

I expect students might find that interesting, in particular if I show colourings of non-planar graphs (where more than 4 colours can be needed). For example, here's another cell showing the procedure on the complete graph on 9 vertices:


Those are just a couple of 'quirky' things that don't come anywhere near the complexity of the proof of the 4 colour theorem.
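For anyone who wants to play with colourings outside a Sage cell, here is a minimal plain-Python sketch (not the code from the cells above) that finds a proper colouring with as few colours as possible by backtracking. A graph is given as a vertex count plus a list of edges, and the function names are my own hypothetical ones; Sage's graph objects come with their own colouring methods that are far better suited to real use.

```python
from itertools import combinations


def colour_graph(n_vertices, edges, n_colours):
    """Colour vertices 0..n_vertices-1 with n_colours colours, or return None."""
    neighbours = [set() for _ in range(n_vertices)]
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    colours = [None] * n_vertices

    def assign(v):
        if v == n_vertices:
            return True
        # Relabelling colours by first use means vertex v never needs a colour above v
        for c in range(min(n_colours, v + 1)):
            if all(colours[w] != c for w in neighbours[v]):
                colours[v] = c
                if assign(v + 1):
                    return True
        colours[v] = None  # undo before backtracking
        return False

    return list(colours) if assign(0) else None


def chromatic_number(n_vertices, edges):
    """Smallest number of colours for which a proper colouring exists."""
    for k in range(1, n_vertices + 1):
        if colour_graph(n_vertices, edges, k) is not None:
            return k
    return 0  # only reached for a graph with no vertices


complete_9 = list(combinations(range(9), 2))  # every pair of 9 vertices
cycle_9 = [(i, (i + 1) % 9) for i in range(9)]  # a ring of 9 vertices
print(chromatic_number(9, complete_9), chromatic_number(9, cycle_9))  # 9 colours, then 3
```

The complete graph on 9 vertices (non-planar) needs all 9 colours, whereas a ring of 9 vertices (planar) only needs 3. Backtracking takes exponential time in the worst case, which is fine for toy graphs like these.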

I took to social media to ask for more examples of cool things that mathematicians use computers for. Here's a link to the blog post, but by far the most responses came in on this G+ post.

You can see all the responses on that post but I thought I'd try to compile a list and quick description of some of the suggestions that caught my eye:
  • +Kevin Clift pointed out this wiki, which contains a large number of great animations (like the one below) made by 'Keiff':
'This is the curse of computing: giving up understanding for an easy verification.'
  • +Joerg Fliege mentioned chess endgame tablebases which I think would be a cool thing to point out to students.
  • +David Ketcheson, +Dima Pasechnik and others pointed out that computer algebra systems are more or less an everyday tool for mathematicians nowadays. When I'm finding my way through a new project I almost always have a Sage terminal open to try out various algebraic relationships as I go...
There are a couple of other things that I'm not listing above (mainly because I don't know enough about them to comment), but interestingly enough +Timothy Gowers posted a link the other day to a paper he has co-authored entitled 'A fully automatic problem solver with human-style output'. That paper describes a new program that is able to produce human-style proofs of theorems. A blog post that +Timothy Gowers put up a while back was actually an experiment for this paper.

I'm obviously missing a large amount of other stuff so please do let me know :)

Sunday 15 September 2013

A playlist of short intro to LaTeX videos using the cloud based writeLaTeX.com

I've just finished screencasting, editing and uploading 21 videos to a playlist on +YouTube.

Quite frankly I'm exhausted (I started this morning), it's been a long day and I'll be happy to not have to drag another 'Ken Burns' effect boundary box around for quite a while...

Anyway, the videos are for a new course I'm teaching our +Cardiff University first year undergraduates. LaTeX has never actually been on the curriculum before and I'm throwing it in at the end of a long term (where students will mostly be learning Python and +Sage Mathematical Software System).

There are a bunch of really, really great LaTeX tutorials all over the web, but I thought I'd put together really short videos (the longest is around the 4 minute mark, but all the others are under 2 minutes) that just show syntax. Watching these videos won't make anyone a LaTeX expert (that's where the other tutorials come in).

The other particularity of these videos is that I've made them entirely using +writeLaTeX (https://www.writelatex.com/), which is a great cloud based LaTeX environment. If you don't know about writeLaTeX, take a look at this great video by +John Hammersley (one of the guys behind +writeLaTeX):


The great advantage of writeLaTeX is that I can include 2 links in the description of every video I've put together:

1. A link that creates a blank document so that anyone watching my video can right then and there try out the code.
2. A 'read only' link to the actual document I used during the video. Once this document is being viewed anyone can choose to make a copy of it so that they can play with the actual finished article.

For example: here's the video showing how to include inline mathematics:


1. Here's the link to a blank document.
2. Here's the link to the code created in the video. On the top right of that document you should see this link (which allows you to create your own copy that you can edit):


I'm hoping this will all be quite helpful to my students and in particular encourage the use of LaTeX by removing certain difficulties that some have with the installation process.

I'm also making these videos public in case they might be helpful to others (a large chunk of the videos I've put together for the programming aspects of this course are 'Unlisted' as I don't think they'd interest anyone).

Anyway the whole playlist can be found here.

There are a couple of other cloud based LaTeX environments out there (sorry for mentioning them here +John Hammersley!), including the one that is part of the ridiculously awesome https://cloud.sagemath.com/. At the moment though, the user friendliness and amazing responsiveness to users (on G+, +writeLaTeX are extremely helpful) make this an ideal tool for learning LaTeX.

Saturday 14 September 2013

For no reason whatsoever: animated gifs of random matrices

Here are some animated gifs of random matrices done in Sage:




Here's a smaller 10 by 10 matrix (the above are 500 by 500) which probably needs to come with a health warning:



The code to do this using Sage is pretty easy (it makes use of the plot method on matrices):


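import os

size = 500
nbrofmatrices = 100

if not os.path.exists('./plots'):
    os.makedirs('./plots')  # Make sure the folder for the frames exists

for i in range(nbrofmatrices):
    print("Plotting matrix: %i of %i" % (i + 1, nbrofmatrices))
    A = random_matrix(RR, size)  # size x size matrix of random reals
    p = A.plot(cmap='hsv')  # Colour each entry using the 'hsv' colour map
    p.save('./plots/%.3d.png' % i)  # Zero-padded names keep the frames in order

print("Converting plots to gif")
os.system("convert -loop 0 ./plots/*.png %sanimatedmatricesofsize%s.gif" % (nbrofmatrices, size))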

Each of the three gifs above was made using a different colour map (i.e. by changing the cmap option).

This creates 100 random 500 by 500 matrices and uses ImageMagick (that's a link to a blog post I wrote about creating these) to create an animated gif.


Monday 9 September 2013

Handling data files in a Sage notebook (and some linear regression in Sage)

One of my most viewed videos on +YouTube is the following (briefly demonstrating how to import csv files into +Python):



I'm in the middle of preparing various teaching materials for an upcoming class on coding for mathematicians. I'm teaching this class as a flipped classroom (if you don't know what that is, circle +Robert Talbert on G+, who posts a lot of great stuff about it) and as a result I've been screencasting a lot recently. Some of these clips are intended solely for my students, as I don't believe they'd be of interest to anyone else ('to do this exercise try and think about what your base case would be for the recursion to terminate'). I'm just starting to screencast the +Sage Mathematical Software System part of the course and needed to put together a little something on how to import data into a Sage notebook. As the above video seemed quite helpful to people, I thought I'd put together another one that might be too.



The data file used can be found here.

Here are the lines of code used in the notebook:

import csv

with open(DATA + 'reg', 'r') as f:  # Open the data file using the special DATA variable
    data = [row for row in csv.reader(f)]  # Read data using csv library

# Keep only the two columns of interest (converted to numbers) and drop the header row
data = [[float(row[2]), float(row[1])] for row in data[1:]]

a, b = var('a, b')  # Declare symbolic variables
model(t) = a * t + b  # Define model

fit = find_fit(data, model, solution_dict=True)  # Find fit of model to data

model.subs(fit)  # View the model

p = plot(model.subs(fit), 5, 11, color='red')  # Plot fit
p += list_plot(data)  # Plot data
p
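`find_fit` minimises the sum of squared residuals numerically. For a straight-line model like this one, the least-squares answer also has a closed form (slope = covariance of t and y divided by the variance of t), which can be handy as a sanity check. A minimal plain-Python sketch, assuming `data` is a list of `[t, y]` pairs of numbers as above; the function name is hypothetical:

```python
def least_squares_line(points):
    """Fit y = a * t + b to a list of (t, y) pairs by ordinary least squares."""
    n = float(len(points))
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    s_ty = sum((t - mean_t) * (y - mean_y) for t, y in points)
    s_tt = sum((t - mean_t) ** 2 for t, _ in points)
    a = s_ty / s_tt  # slope
    b = mean_y - a * mean_t  # the fitted line passes through the mean point
    return a, b
```

Calling `least_squares_line(data)` should agree with the `a` and `b` that `find_fit` returns, up to numerical tolerance.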

Examples of computer assisted mathematics needed.

This blog post is a very slight expansion of this G+ post where I've asked people for examples of computer assisted mathematics. I'm posting here in the hope of maximum exposure.

I'm in the middle of preparing a new course that will teach +Python and +Sage Mathematical Software System to our first year undergraduates at Cardiff University. As such I'd appreciate as many great examples as I can get, so please do let me know of anything :)

I use the term mathematics in a loose sense to include 'things' that are not necessarily proofs:

- identifying conjectures;
- visualisations;
- etc

I'm familiar with the 'well known' results that can be found on this +Wikipedia page: http://goo.gl/EjyKel

I can also recommend the book 'A=B' by Petkovsek, Wilf and Zeilberger, which you can download here: http://www.math.upenn.edu/~wilf/AeqB.

The search for prime numbers is also a pretty cool thing that I can talk about.
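One concrete thing students can run here is the Lucas-Lehmer test, the classic primality test for Mersenne numbers 2^p - 1 and the kind of test behind record-breaking prime searches. For an odd prime p, 2^p - 1 is prime exactly when the sequence s_0 = 4, s_(k+1) = s_k^2 - 2 (mod 2^p - 1) reaches 0 after p - 2 steps. A minimal plain-Python sketch (the function names are my own):

```python
def is_prime(n):
    """Trial division; fine for the small exponents used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def lucas_lehmer(p):
    """Return True if the Mersenne number 2**p - 1 is prime (p must be prime)."""
    if p == 2:
        return True  # 2**2 - 1 = 3 is prime; the test itself needs an odd p
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


print([p for p in range(2, 32) if is_prime(p) and lucas_lehmer(p)])  # [2, 3, 5, 7, 13, 17, 19, 31]
```

Note that p = 11 is prime but 2^11 - 1 = 2047 = 23 x 89 is not, which is exactly what the test picks up.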

Even examples of things with an obvious need for computers would be great to hear about. For example, I'm quite familiar with discrete event simulation techniques, which I'll be mentioning to the students. Here's a video I put together a while back showing how to use a computer to simulate a queue:
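As a rough illustration of the same idea (not the code from the video), here is a minimal sketch of simulating a single-server, first-come-first-served queue in plain Python. It assumes exponentially distributed inter-arrival and service times (an M/M/1 queue) and uses Lindley's recursion instead of a full event list; the function name is hypothetical:

```python
import random


def simulate_queue(arrival_rate, service_rate, n_customers, seed=0):
    """Average time customers spend waiting in a single-server FIFO queue."""
    rng = random.Random(seed)
    wait = 0.0  # waiting time of the current customer (the first one never waits)
    total_wait = 0.0
    for _ in range(n_customers):
        total_wait += wait
        service = rng.expovariate(service_rate)
        gap = rng.expovariate(arrival_rate)  # time until the next customer arrives
        # Lindley's recursion: the next customer waits for whatever work is left
        wait = max(0.0, wait + service - gap)
    return total_wait / n_customers
```

With an arrival rate of 1 and a service rate of 2, queueing theory gives an average wait of rho / (mu - lambda) = 0.5 / 1 = 0.5, so `simulate_queue(1.0, 2.0, 200000)` should come out close to 0.5. Comparing simulation with theory like this is a nice way to convince students the code is right.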


Thursday 5 September 2013

Creating a slideshow video with YouTube in 4 + 1 ridiculously easy steps.

I've been taking photos on my way to work quite regularly of the construction of a new building by Cardiff University: http://goo.gl/k9qYN. I've been posting them on G+ which has been nice as you can click through and see all the photos and click through to see the progression of the building: http://goo.gl/CnFGd1.

However, now that I've stopped taking the pictures (the building is open) I thought I'd try and put together a video slideshow. 

My immediate thought was to see if the YouTube video editor would allow me to do this: it does and it's really easy!

1. Click on 'Upload' on YouTube which brings up the screen below. From there click on 'Photo slideshow':

2. Either upload photos (I guess) or go straight to your G+ albums. You can see in the screenshot that my 'Cardiff University Maindy Park Construction' album is right there:


3. The photos come up right there and you can change the order if need be:


4. You can then mess with some settings and include some music:


5. That's pretty much it but you can click on 'Advanced editor' which brings up the usual video editor that you can use to add annotations and stuff to the video.


Here's the video I put together:



I really like how all these tools talk to each other. As the photos I take go straight to my G+ account anyway through instant backup, I might well do a couple more of these. If I don't think they're of public interest I can always make them 'unlisted', which still makes them easy to share with friends and family.