Propping Open the Document Trapdoor
Google Tech Talk
November 5, 2009
Presented by Steven R. Bagley & David F. Brailsford, School of Computer Science, University of Nottingham, NOTTINGHAM NG8 1BB, UK
Computer document processing often starts with an abstract, structural representation, which then enters a processing pipeline that creates the desired layout and appearance. Unfortunately, the whole system resembles a series of steps in a one-way chemical reaction, or the successive irreversible stages of creating assembler code using a compiler.
This 'one-way function' behaviour is most obvious with PDF, which is tied to a completely fixed appearance once a document passes through a one-way 'trapdoor' like Adobe Distiller. Some formats, such as XHTML, allow a little more wriggle room, but even this breaks down if the appearance changes dramatically (such as displaying a Web page on a large monitor). In essence, any attempt to reflow a document, or view it at some other size, is either frustrating or simply impossible without regenerating the document from a more abstract, higher-level representation.
This limitation has not had much effect over the past 25 years, but it is now hitting us hard. In a world of iPhones, eBook Readers, 10" netbooks, laptops, 30" Cinema Displays -- and not forgetting the humble printed page -- it is no longer safe to assume that a document will be viewed in one fixed presentation. 'Repurposing' (without the need for total re-processing) needs to be the watchword for a modern document format. However, this leads us to the heart of the problem: current formats don't lend themselves to having their presentational properties partially unpicked and re-engineered.
In this talk, we outline the current state of the art in document formats and their limitations when it comes to repurposing. We describe our attempts at making PDF a more repurposable format, and we outline some necessary features of, and open questions for, future document formats.