Research and Code



Motto

“Data! Data! Data!” he cried impatiently.
“I can’t make bricks without clay.”
— Sherlock Holmes.

 
 
 

Learning to Rank

Most of the work on Learning to Rank is described here: http://learningtorank.isti.cnr.it/.


RankEval

RankEval [3] is an open-source tool for the analysis and evaluation of Learning-to-Rank models based on ensembles of regression trees. The success of such ensembles has fostered the development of several open-source libraries that target the efficiency of the learning phase and the effectiveness of the resulting models. However, these libraries offer only limited support for tuning and evaluating the trained models. RankEval aims to provide a common ground for several Learning-to-Rank libraries by offering interoperable tools for the comprehensive comparison and in-depth analysis of ranking models.
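As a rough illustration of what evaluating a ranking model involves, the sketch below computes NDCG@k, a standard ranking metric, for two rankings of the same documents. It does not use the RankEval API; the function names and the relevance labels are hypothetical and chosen only for this example.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k relevance labels."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical relevance labels of four documents, listed in the
# order in which each model ranked them.
model_a = [3, 2, 0, 1]
model_b = [0, 1, 2, 3]

print("model A NDCG@4:", ndcg_at_k(model_a, 4))
print("model B NDCG@4:", ndcg_at_k(model_b, 4))
```

Model A places the most relevant documents first and therefore scores higher than model B, which ranks them in reverse order.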

RankEval is available on GitHub: https://github.com/hpclab/rankeval.


QuickRank

QuickRank [2] is a Learning-to-Rank toolkit providing multithreaded C++ implementations of several algorithms. It was designed and developed with efficiency in mind.

QuickRank is available on GitHub: https://github.com/hpclab/quickrank.

Pattern Mining

PaNDa+

This is the result of our work on mining patterns, in particular dense “rectangles”, in noisy binary datasets. The implementation of the TKDE’14 paper [7] is available here.

Direct Local Pattern Sampling

Published at KDD’11 [1], this is the result of a collaboration with Mario Boley, Sandy Moens, Daniel Paurat, and Thomas Gärtner from the University of Bonn. website.

DCI-Closed

This is the result of our work on pattern mining algorithms for the discovery of closed frequent itemsets. It implements three of our papers, from ICDM’07 [6], TKDE’06 [4], and SDM’06 [5], in one comprehensive software package specialized for dense datasets, and it supports both out-of-core and multi-core mining. download.
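To make the notion of a closed frequent itemset concrete: an itemset is closed when no proper superset has the same support, or equivalently when it equals the intersection of all transactions that contain it. The brute-force sketch below only illustrates this definition on a toy dataset. It is not the DCI-Closed algorithm, and all names and data are hypothetical.

```python
from itertools import combinations

def support(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

def closure(itemset, transactions):
    """Intersection of all transactions that contain itemset."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        return frozenset(itemset)
    return frozenset.intersection(*covering)

def closed_frequent_itemsets(transactions, min_support):
    """Enumerate every candidate itemset (exponential, toy data only)
    and keep those that are frequent and equal to their own closure."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            candidate = frozenset(candidate)
            supp = support(candidate, transactions)
            if supp >= min_support and closure(candidate, transactions) == candidate:
                result[candidate] = supp
    return result

# Hypothetical toy dataset of four transactions.
transactions = [frozenset(t) for t in ([1, 2, 3], [1, 2], [1, 2, 3], [2, 3])]
closed = closed_frequent_itemsets(transactions, min_support=2)
for itemset, supp in closed.items():
    print(sorted(itemset), supp)
```

In this toy dataset the itemset {1} is frequent but not closed, because every transaction containing item 1 also contains item 2, so {1, 2} has the same support.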

Find-Rules

This software generates association rules with a given minimum confidence from a collection of frequent itemsets. Unlike other tools, it does not need a downward-closed collection: it extracts all the possible association rules without assuming that the supports of a frequent itemset’s subsets are known. The input is the usual ASCII format, e.g. 1 2 3 (95). download.
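The idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Find-Rules implementation: it assumes that the number in parentheses is the absolute support of the itemset, and it simply skips any rule whose antecedent support is missing from the collection. All function names and the sample input are hypothetical.

```python
from itertools import combinations

def parse_itemsets(lines):
    """Parse lines such as '1 2 3 (95)' into {frozenset(items): support}.
    Assumption: the parenthesized number is the absolute support."""
    supports = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        items_part, _, support_part = line.partition("(")
        items = frozenset(int(tok) for tok in items_part.split())
        supports[items] = int(support_part.rstrip(")"))
    return supports

def generate_rules(supports, min_confidence):
    """Return (antecedent, consequent, confidence) triples.
    The collection need not be downward closed: an antecedent whose
    support is not listed is skipped, since the rule cannot be scored."""
    rules = []
    for itemset, supp in supports.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                antecedent = frozenset(antecedent)
                antecedent_supp = supports.get(antecedent)
                if antecedent_supp is None:
                    continue
                confidence = supp / antecedent_supp
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules

# Hypothetical input; note that {2}, {3}, {1 3} and {2 3} are absent.
lines = ["1 (100)", "1 2 (80)", "1 2 3 (40)"]
rules = generate_rules(parse_itemsets(lines), min_confidence=0.5)
for antecedent, consequent, confidence in rules:
    print(sorted(antecedent), "->", sorted(consequent), confidence)
```

The confidence of a rule X -> Y is computed as support(X ∪ Y) / support(X), so only the supports of the full itemset and of the antecedent are needed.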

Logistic PCA

This is a Python port of Andrew I. Schein’s MATLAB code from the paper “A Generalized Linear Model for Principal Component Analysis of Binary Data”. download.
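For orientation, the model behind logistic PCA can be summarized as follows. The notation (N observations, D binary features, L latent dimensions, a per-feature bias) is chosen here for illustration and may differ from both the paper and the code.

```latex
P(X \mid \Theta) = \prod_{n=1}^{N} \prod_{d=1}^{D}
    \sigma(\theta_{nd})^{x_{nd}} \, \sigma(-\theta_{nd})^{1 - x_{nd}},
\qquad
\sigma(\theta) = \frac{1}{1 + e^{-\theta}},
\qquad
\theta_{nd} = \sum_{l=1}^{L} u_{nl} v_{ld} + \Delta_d
```

Each binary entry x_nd is modeled as a Bernoulli variable whose log-odds θ_nd come from a low-rank factorization, in the same way that standard PCA models real-valued entries with a low-rank Gaussian mean.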

References

[1]   Mario Boley, Claudio Lucchese, Daniel Paurat, and Thomas Gärtner. Direct local pattern sampling by efficient two-step random procedures. In KDD ’11: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 582–590, August 21-24 2011.

[2]   Gabriele Capannini, Claudio Lucchese, Franco Maria Nardini, Salvatore Orlando, Raffaele Perego, and Nicola Tonellotto. Quality versus efficiency in document scoring with learning-to-rank models. Information Processing & Management, 2016.

[3]   Claudio Lucchese, Cristina Ioana Muntean, Franco Maria Nardini, Raffaele Perego, and Salvatore Trani. RankEval: An evaluation and analysis framework for learning-to-rank solutions. In SIGIR ’17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017.

[4]   Claudio Lucchese, Salvatore Orlando, and Raffaele Perego. Fast and memory efficient mining of frequent closed itemsets. IEEE Transactions on Knowledge and Data Engineering, 18(1):21–36, 2006.

[5]   Claudio Lucchese, Salvatore Orlando, and Raffaele Perego. Mining frequent closed itemsets out of core. In SDM ’06: Proceedings of the third SIAM International Conference on Data Mining, April 2006.

[6]   Claudio Lucchese, Salvatore Orlando, and Raffaele Perego. Parallel mining of frequent closed patterns: Harnessing modern computer architectures. In ICDM ’07: Proceedings of the Seventh IEEE International Conference on Data Mining, pages 242–251, November 2007.

[7]   Claudio Lucchese, Salvatore Orlando, and Raffaele Perego. A unifying framework for mining approximate top-k binary patterns. IEEE Transactions on Knowledge and Data Engineering, 26(12):2900–2913, 2014.
