DocEng 2011: Probabilistic Document Model for Automated Document Composition
The 11th ACM Symposium on Document Engineering
Mountain View, California, USA
September 19-22, 2011
Probabilistic Document Model for Automated Document Composition
Niranjan Damera-Venkata, Jose Bento, Eamonn O'Brien-Strain
We present a new paradigm for automated document composition based on a generative, unified probabilistic document model (PDM). The model formally incorporates key design variables such as content pagination, the possible relative arrangements of page elements, and possible page edits. These design choices are modeled jointly as coupled random variables (a Bayesian network), with uncertainty captured by their probability distributions. The joint probability distribution of the network assigns higher probability to good design choices. Given this model, we show that the general document layout problem can be reduced to probabilistic inference over the Bayesian network. We show that the inference task can be accomplished efficiently, scaling linearly with the amount of content in the best case. We provide a useful specialization of the general model and use it to illustrate the advantages of soft probabilistic encodings over hard one-way constraints in specifying design aesthetics.
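To make the idea of layout as inference concrete, the following is a minimal toy sketch and not the PDM described in the paper. It assumes a single design variable (pagination), a hypothetical soft preference `page_log_prob` expressed as an unnormalized Gaussian on how full each page is, and a hypothetical limit `max_blocks_per_page`. Under those assumptions, the most probable pagination can be found by dynamic programming in time proportional to the number of content blocks.

```python
import math


def page_log_prob(heights, page_height=100, target_fill=0.9, sigma=0.1):
    """Toy soft preference (an assumption, not the paper's model):
    unnormalized Gaussian log-probability of the page fill ratio."""
    fill = sum(heights) / page_height
    if fill > 1.0:
        return -math.inf  # content that overflows the page has zero probability
    return -((fill - target_fill) ** 2) / (2 * sigma ** 2)


def map_pagination(heights, max_blocks_per_page=4):
    """Return the most probable split of blocks into consecutive pages.

    best[i] holds the highest total log-probability of paginating the
    first i blocks; back[i] records where the last page starts.
    The cost is O(n * max_blocks_per_page), i.e. linear in the content.
    """
    n = len(heights)
    best = [-math.inf] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(0, i - max_blocks_per_page), i):
            score = best[j] + page_log_prob(heights[j:i])
            if score > best[i]:
                best[i] = score
                back[i] = j
    if best[n] == -math.inf:
        raise ValueError("no feasible pagination for this content")

    # Walk the back-pointers to recover the pages in reading order.
    pages = []
    i = n
    while i > 0:
        j = back[i]
        pages.append(heights[j:i])
        i = j
    pages.reverse()
    return pages, best[n]


if __name__ == "__main__":
    # Block heights in arbitrary units on a page of height 100.
    pages, log_prob = map_pagination([50, 40, 30, 30, 30])
    print(pages, log_prob)
```

In this sketch the fill preference is soft: a page that is slightly under-filled is only penalized, not rejected, whereas overflow is treated as a hard limit. The model in the paper couples further variables (arrangement and page edits), which this toy example leaves out.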