Classifiers That Improve With Use
January 29, 2007
Training on imperfectly representative data inevitably leads to classification errors. Retraining an OCR engine with post-edited data, or even with the imperfect labels assigned by the classifier, reduces both bias and variance. Although the theoretical foundations of decision-directed adaptation are meager, it has proved successful in diverse experiments. When the operational data can be partitioned into isogenous subsets, style-constrained classification is appropriate. Patterns should be recognized in groups rather than in isolation. Shape and language context are complementary. Operator interaction should be rationalized. Only dynamic classifiers can hope to rival human performance on imperfectly printed, written, copied, or scanned documents.
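As an illustration of decision-directed adaptation, the following minimal sketch retrains a nearest-mean classifier on the labels it assigns itself to unlabeled operational data. The nearest-mean rule, the function names (`nearest_mean`, `adapt`), and the toy one-dimensional-in-effect data are assumptions made for this example; they are not taken from the paper, and a real OCR engine would use far richer features and classifiers.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def nearest_mean(x: Point, means: Dict[str, Point]) -> str:
    """Return the label whose class mean is closest to x (squared Euclidean distance)."""
    return min(
        means,
        key=lambda c: (x[0] - means[c][0]) ** 2 + (x[1] - means[c][1]) ** 2,
    )


def adapt(
    means: Dict[str, Point], data: Sequence[Point], rounds: int = 5
) -> Dict[str, Point]:
    """Decision-directed adaptation (hypothetical sketch).

    Label the unlabeled data with the current means, re-estimate each mean
    from the samples assigned to it, and repeat until the means stop changing
    or the round limit is reached.
    """
    means = dict(means)  # do not modify the caller's trained parameters
    for _ in range(rounds):
        groups: Dict[str, List[Point]] = {c: [] for c in means}
        for x in data:
            groups[nearest_mean(x, means)].append(x)
        new_means: Dict[str, Point] = {}
        for c, pts in groups.items():
            if pts:
                new_means[c] = (
                    sum(p[0] for p in pts) / len(pts),
                    sum(p[1] for p in pts) / len(pts),
                )
            else:
                new_means[c] = means[c]  # keep the old mean for an empty class
        if new_means == means:
            break
        means = new_means
    return means


# Means learned on (imperfectly representative) training data.
trained = {"a": (0.0, 0.0), "b": (4.0, 0.0)}

# Unlabeled operational data whose two clusters are shifted relative to training.
operational = [(1.5, 0.0), (2.5, 0.0), (3.0, 0.0), (6.0, 0.0), (6.5, 0.0), (7.0, 0.0)]

adapted = adapt(trained, operational)
before = nearest_mean((2.5, 0.0), trained)
after = nearest_mean((2.5, 0.0), adapted)
```

In this toy case the sample at 2.5 is assigned to class "b" by the trained means but to class "a" after adaptation, because the re-estimated means follow the shifted clusters. The sketch offers no guarantee in general: if the initial labels are too poor, the means can drift toward the wrong clusters.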