This course covers the principal topics involved in building a working text categorization system. It focuses on the components of such a system and the processes required to create it, drawing on practical experience from the Scamseek project. Machine learning is at the center of the discussion, but the surrounding tasks of language modeling, computational linguistics, and software engineering are also discussed to varying degrees. Discussion of some aspects of the Scamseek project is restricted under secrecy agreements with ASIC.
Author: Jon David Patrick, University of Sydney