# On Conditional Branches in Optimal Decision Trees

Google Tech Talks

February 8, 2007

Abstract

Decision trees are one of the most fundamental programming abstractions, and a commonly used type of decision tree is the alphabetic binary tree, which uses (without loss of generality) "less than"/"greater than or equal to" tests in order to determine an outcome event. The process of finding an optimal alphabetic binary tree for a known probabily distribution on outcome events usually has the underlying assumption that the cost per decision is uniform and thus independent of the outcome of the decision. Algorithms without this assumption generally use one cost if the decision result is "less than" and another cost if it is "greater than." In practice, neither assumption is accurate for software to be optimized for a given microprocessor. The operation of the microprocessor generally means that the cost for the more likely decision outcome will be less --- often far less --- than the less likely decision outcome. This problem and generalizations thereof are applicable to hard coding static decision tree instances in software, e.g., for compiling switch statements or for fine-tuning program bottlenecks. An $O(n^3)$-time $O(n^2)$-space dynamic programming algorithm can solve this optimal binary decision tree problem, and this approach has many generalizations that optimize for the behavior of processors with predictive branch capabilities, both static and dynamic. Solutions to this formulation are often faster in practice than "optimal" decision trees as formulated in the literature. Different decision paradigms can sometimes yield even better performance.

February 8, 2007

Abstract

Decision trees are one of the most fundamental programming abstractions, and a commonly used type of decision tree is the alphabetic binary tree, which uses (without loss of generality) "less than"/"greater than or equal to" tests in order to determine an outcome event. The process of finding an optimal alphabetic binary tree for a known probabily distribution on outcome events usually has the underlying assumption that the cost per decision is uniform and thus independent of the outcome of the decision. Algorithms without this assumption generally use one cost if the decision result is "less than" and another cost if it is "greater than." In practice, neither assumption is accurate for software to be optimized for a given microprocessor. The operation of the microprocessor generally means that the cost for the more likely decision outcome will be less --- often far less --- than the less likely decision outcome. This problem and generalizations thereof are applicable to hard coding static decision tree instances in software, e.g., for compiling switch statements or for fine-tuning program bottlenecks. An $O(n^3)$-time $O(n^2)$-space dynamic programming algorithm can solve this optimal binary decision tree problem, and this approach has many generalizations that optimize for the behavior of processors with predictive branch capabilities, both static and dynamic. Solutions to this formulation are often faster in practice than "optimal" decision trees as formulated in the literature. Different decision paradigms can sometimes yield even better performance.