Large Scale Ranking Problem: some theoretical and algorithmic issues
The talk is divided into two parts. The first part focuses on web-search ranking, for which I discuss training relevance models by optimizing DCG (discounted cumulated gain). Under this metric, the quality of the system output is determined mainly by performance near the top of its ranked list. I will focus on theoretical issues for this learning problem. The second part discusses related algorithmic issues in the context of optimizing the scoring function of a statistical machine translation system according to the BLEU metric (a standard measure of translation quality). Our approach treats the machine translation system as a black box and can optimize millions of system parameters automatically. This has not been attempted before in this context. I will present our method and some initial results.
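To illustrate why DCG emphasizes the top of the ranked list, here is a minimal sketch. It assumes the common formulation in which the gain at rank i (counting from 1) is divided by log2(i + 1); the abstract does not specify the discount, so the exact form used in the talk may differ. The function name `dcg` and the relevance grades are hypothetical, chosen only for this example.

```python
import math


def dcg(gains, k=None):
    """Discounted cumulated gain of a ranked list of relevance grades.

    Assumption: the discount at rank i (1-based) is log2(i + 1), a common
    choice. The talk may use a different discount function.
    """
    if k is not None:
        gains = gains[:k]  # evaluate only the top-k positions
    # enumerate is 0-based, so rank i corresponds to index i - 1
    return sum(g / math.log2(idx + 2) for idx, g in enumerate(gains))


# The same single relevant document (grade 3) placed first versus third.
relevant_first = dcg([3, 0, 0])
relevant_third = dcg([0, 0, 3])
print(relevant_first, relevant_third)
```

Under this assumed discount, moving the only relevant document from rank 1 to rank 3 halves the score, so errors near the top of the list cost far more than errors further down.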