Monday, 8 April 2013

Using sed to make sure hyper links refer to the correct format of file when using pandoc

In a previous post (which is by far the most popular thing on this blog which is mainly due I believe to a kind retweet by +John Cook and SciPyTip) I describe the makefiles I use to write large documents containing Tikz diagrams and +Sage Mathematical Software System plots in multiple formats (pdf, html and docx) using markdown and pandoc.

This is a short follow up post to fix one minor thing. In the previous post if you created hyper links in your markdown file (say pointing to another markdown document, which for my purposes was multiple chapters in some teaching notes) then all the formats (pdf, html and docx) would all point to the same format.

Here's a slight modification of the script (from the previous post) which will use sed to replace the .md to the relevant format in all links.

#!/usr/bin/env python
from sys import argv
from os import system

e = argv[1][:-3]
print e

system("sed 's/.md/.html/g' %s > tmp" % (e + ".md"))
system("pandoc -s tmp -N -o " + e + ".html --mathjax")
system("sed 's/.md/.docx/g' %s > tmp" % (e + ".md"))
system("pandoc tmp -o " + e + ".docx")
system("sed 's/.md/.pdf/g' %s > tmp" % (e + ".md"))
system("pandoc tmp -N -o " + e + ".pdf --latex-engine=xelatex")
system("rm tmp")

This script creates a temporary file (tmp- this is actually important to make sure that the makefile doesn't think that you're recurrently changing the md files) that sed makes sure has the correct format for each link and pandoc can use to get the required files. For example the sed 's/.md/.html/g' command will replace all instances of ".md" with ".html". If you're not familiar with sed it's a *nix program that allows you to do find and replace in files "easily" (it's still a bit of voodoo to me). The script can be used by simply typing:


(Which assumes that has all links sending to other md files.)

I use this script in the makefile as I did in the previous post:

md = $(wildcard *.md)
htmls = $(

all: $(htmls)

    ./ $<

The final line of that file is not relevant to this blog post but it just runs a python script that will generate the website for the course (which is still a work in progress!). That python script doesn't do anything fancy but it goes in to each Chapter document for example and reads the Chapter title which it uses to write the index.html file. All this allows me to modify a given md file and simply run make to update the website file.

If any of this is of interest the github repo is here.

Note that in this particular case the links being "format loyal" is mainly useful for the `html` files to the general audience. The loyal links in pdf and docx format are mainly useful to me as all the paths are relative to my setup. I'm posting this again so that I don't forget it but also in case it's useful to others (you can obviously use sed to replace other things).