As I have previously discussed, Transport for London has a terribly inconsistent and difficult-to-comprehend policy on releasing data. Data releases are made occasionally, but most of the useful data comes from Freedom of Information requests or their performance page. I have discussed how poor their performance statistics are: in summary, putting the data into a usable format requires a screen scraper or manual copy-and-pasting.
TfL have a habit of responding to Freedom of Information requests with a table inside a PDF file. I am as big a fan of the PDF format as the next man, but certainly not for statistics. It is very annoying to wait 22 days for a response and then receive one that is useless. Such tables cannot be copied cleanly into Excel or Google Docs, so manual entry or one of the numerous online converters (I find this one works best) is needed to do anything useful.
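For anyone who would rather not depend on an online converter, here is a minimal Python sketch of the tidying step. It assumes the table text can at least be selected and pasted out of the PDF as plain text, and that columns end up separated by a tab or by two or more spaces; neither is guaranteed for any particular TfL response. The function names and the sample rows are made up for illustration and are not real closure data.

```python
import csv
import io
import re


def rows_from_pdf_text(text):
    """Split plain text pasted from a PDF table into rows of cells.

    Assumption: columns are separated by a tab or by 2+ spaces.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines left over from the paste
        cells = [cell.strip() for cell in re.split(r"\t|\s{2,}", line)]
        rows.append(cells)
    return rows


def suspicious_rows(rows):
    """Return indexes of rows whose cell count differs from the header row."""
    width = len(rows[0])
    return [i for i, row in enumerate(rows) if len(row) != width]


def to_csv(rows):
    """Serialise the rows as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Invented placeholder rows, NOT real TfL data. The last row has lost
# the wide gap between its second and third columns.
SAMPLE = """
Station          Date         Reason
Example Road     01/01/2009   Staff shortage
Sample Street    02/01/2009 Signal failure
"""

if __name__ == "__main__":
    rows = rows_from_pdf_text(SAMPLE)
    print(to_csv(rows))
    print("Rows to check by hand:", suspicious_rows(rows))
```

The check in `suspicious_rows` only catches rows with the wrong number of cells, so the result still needs a read-through by eye before anyone relies on it.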
Is this a lack of technical knowledge from their FOI desk, or simply an attempt to make the job of the data journalist harder? The data is probably already in a spreadsheet before they publicly release it, so decide for yourself.
It would be nicer if they had a more journalist-friendly policy, but this is unlikely to change. As a present to Data Day readers, I am making available two stacks of Tube data for public consumption. The first set is copy-and-pasted performance stats for the 2010-2011 period, including comparisons to the year before.
The second is a spreadsheet version of all 2009 station closures, originally in a PDF table from an FOI request on WhatDoTheyKnow. I used the converter above and tidied up the data as it wasn’t 100% accurate. Enjoy!