QuickRank: a C++ Suite of Learning to Rank Algorithms
Accepted at the 6th Italian Information Retrieval Workshop, Cagliary, Italy, 25–26 May 2015 .
Abstract. Ranking is a central task of many Information Retrieval (IR) problems, particularly challenging in the case of large-scale Web collections where it involves effectiveness requirements and efficiency constraints that are not common to other ranking-based applications. This paper describes QuickRank, a C++ suite of efficient and effective Learning to Rank (LtR) algorithms that allows high-quality ranking functions to be devised from possibly huge training datasets. QuickRank is a project with a double goal: i) answering Tiscali industrial need for a flexible and scalable LtR solution for learning ranking models from huge training datsets; ii) providing the IR research community with a flexible, extensible and efficient LtR framework to design LtR solutions and fairly compare the performance of different algorithms and ranking models. This paper presents QuickRank design choices and report about some preliminary use experiences.
Learning Algorithms. QuickRank provides open-source C++ multi-threaded implementations of GBRT, λ-MART, and O-λ-MART (i.e., oblivious λ-MART). It is worth mentioning that no implementation of the O-λ-MART algorithm was previously made publicly available. For all these algorithms, QuickRank accepts training sets in the SVM-light format and produces a XML file storing the learned ranking model.
Document Scoring Functions. QuickRank also provides a framework for the cost evaluation at scoring time of the learned models. The evaluation is implemented as a two-step process. During the first step, a tree-based model is converted in a C++ plugin function implementing the scoring of a given document. QuickRank implements three state-of-the-art code generation strategies. The first strategy is named CodeGen. Each decision tree is translated into a sequence of C++ if-then-else blocks. The second strategy is VPred algorithm. A decision tree of maximum depth d is converted into a d-step traversal of an array storing the tree’s nodes. The third strategy only applies to O-λ-MART. In this case, the d tests are used to set a bitmask of d bits. The integer value of the final bitmask is then exploited to look up the value from a vector of 2d outcomes, i.e., to select the proper leaf of the oblivious tree. During the second step, the generated source code is compiled and linked with the QuickRank framework. The compiled function is invoked by the framework on each document of the given test set, and the total scoring time is measured. Evaluating the cost of scoring time of a ranking model is important in several application scenarios, and it is receiving more attention in the IR community. We believe that QuickRank can both provide implementation of state-of-the-art algorithms and serve as a fair evaluation framework.
To the best of our knowledge, QuickRank is the first framework addressing both the learning and the scoring processes. QuickRank source code is publicly available for non commercial uses under a Reciprocal Public License 1.5 at the QuickRank website.