Monday, 4 October 2010

Damned PDFs

Public authorities have up to now been able to get their own back on those of us who make extensive use of Freedom of Information requests by supplying the information in a format that makes copying, pasting or analysing it nigh impossible or very time consuming.

My favourite public authority, the North Wales Police, are experts at such procedures: they publish responses in read-only PDF format, so if I wish to include a copy of a published response in a blog post I must either copy it word by word or scan the whole document.

It was therefore a pleasure to read the announcement made yesterday by Francis Maude MP that all FOI responses are to be made machine readable and more accessible:
Cabinet Office minister Francis Maude told the Conservative party conference in Birmingham that the Freedom of Information Act will be amended so that all data released through FoI must be in a reusable and machine readable format.

The change in the law will mean that FoI data is "available to everyone and able to be exploited for social and commercial purposes", he said on 3 October 2010.

This means using formats such as .csv, an open spreadsheet format, or .xls, which is used by Microsoft's Excel – but not the portable document format. PDFs can only be opened as visual files, with software such as spreadsheets unable to extract the actual information.

..."We want to go much further," he told delegates. "Thousands of commercial and social entrepreneurs have been frustrated by their inability to obtain and reuse datasets. I'm sorry to say that some councils spend time and money deliberately making data unusable to anyone else."
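For anyone wondering what "machine readable" buys the requester in practice, here is a minimal sketch in Python. The figures, the column names and the refusal_rates function are all invented for illustration and do not come from any real FOI response. The point is only that a .csv file can be handed straight to a program, with no retyping or scanning.

```python
import csv
import io

# Hypothetical FOI response data, invented purely for illustration.
FOI_RESPONSE_CSV = """year,requests_received,requests_refused
2008,410,35
2009,520,48
2010,610,72
"""


def refusal_rates(csv_text):
    """Return {year: percentage of requests refused} from CSV text."""
    rates = {}
    # DictReader uses the first row as column names, so each row is a dict.
    for row in csv.DictReader(io.StringIO(csv_text)):
        received = int(row["requests_received"])
        refused = int(row["requests_refused"])
        rates[row["year"]] = round(100 * refused / received, 1)
    return rates


if __name__ == "__main__":
    for year, rate in refusal_rates(FOI_RESPONSE_CSV).items():
        print(f"{year}: {rate}% refused")
```

Doing the same with a scanned or locked PDF would mean copying every figure by hand first.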

HT to Heather Brooke @newsbrooke


Cibwr said...

Tying things to a closed format such as anything from Microsoft is a backwards step. PDFs are actually an open format and there are plenty of programs that can extract the data from them (though I agree that with most you can't copy and paste easily). Open formats are essential if documents are to remain accessible in the future, as well as enabling people to extract the data from them.

Peter D Cox said...

Oh dear, politicians and technology!

The problem is not the pdf format at all, but how local authorities use (misuse) it. It is one of the most open and easily accessible (works on screen readers well if properly done) file formats that can be quickly indexed and - again if properly done - can be very small files.
Most IT people are simply ignorant of:
a) the need to save the file in an accessible format (these can be easily cut and pasted from the free Acrobat reader, or using the standard viewer on a mac computer). Setting security so this cannot be done is often what people do intentionally or through ignorance.
b) Even Adobe's own software is not great at compressing very large documents, e.g. planning applications and reports to council. For this you need a pdf shrinking programme; they cost very little and can often reduce the file size (and hence Internet traffic, expensive storage, backup space etc) by up to 70%. Even WAG doesn't do this, and they are otherwise pretty well organised!
c) Microsoft Word is the worst possible format in which to exchange documents and should be banned, not encouraged. Its multiple versions make accurate rendering of the original document impossible and you need to have bought the expensive Word programme. File sizes can be horrendous.
d) Portable Document Format - which has freely available readers for all computers was invented for such purposes and it works. What doesn't is ignorant or simply obstructive users.