
Claudio Lucchese


I’m an Associate Professor at Ca’ Foscari University of Venice. Prior to joining UniVe, I was a researcher with the I.S.T.I. “A. Faedo” in Pisa, working with the High Performance Computing Lab. Here you may find contact info, résumé, and other details. Below are some news and notes on my research activities in the areas of data mining and information retrieval.

Apr. 12 2017

On Including the User Dynamic in Learning to Rank

Accepted at SIGIR 17: ACM Conference on Research and Development in Information Retrieval [1].

Abstract. Ranking query results effectively by considering users' past behaviour and preferences is a primary concern for IR researchers both in academia and industry. In this context, Learning to Rank (LtR) is widely believed to be the most effective solution for designing ranking models that account for user-interaction features, which have proved to have a remarkable impact on IR effectiveness. In this paper, we explore the possibility of integrating the user dynamic directly into the LtR algorithms. Specifically, we use Markov chains to model the behaviour of users scanning a ranked result list, and we modify Lambda-Mart, a state-of-the-art LtR algorithm, to exploit a new discount loss function calibrated on the proposed Markovian model of user dynamic. We evaluate the performance of the proposed approach on publicly available LtR datasets, finding that the improvements measured over the standard algorithm are statistically significant.

References

[1]   Nicola Ferro, Claudio Lucchese, Maria Maistro, and Raffaele Perego. On including the user dynamic in learning to rank. In SIGIR ’17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017.


Apr. 12 2017

X-DART: Blending Dropouts and Pruning for Efficient Learning To Rank

Accepted at SIGIR 17: ACM Conference on Research and Development in Information Retrieval [1].

Abstract. In this paper we propose X-Dart, a new learning-to-rank algorithm focusing on the training of robust and compact ranking models. Motivated by the observation that the last trees of Mart models impact the prediction of only a few instances of the training set, we borrow from the Dart algorithm the dropout strategy, which consists in temporarily dropping some of the trees from the ensemble while new weak learners are trained. However, unlike Dart, we drop these trees permanently on the basis of smart choices driven by accuracy measured on the validation set. Experiments conducted on publicly available datasets show that X-Dart outperforms Dart in training models that provide the same effectiveness while employing up to 40% fewer trees.

References

[1]   Claudio Lucchese, Franco Maria Nardini, Salvatore Orlando, Raffaele Perego, and Salvatore Trani. X-DART: Blending dropouts and pruning for efficient learning to rank. In SIGIR ’17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017.


Apr. 12 2017

RankEval: An Evaluation and Analysis Framework for Learning-to-Rank Solutions

Accepted at SIGIR 17: ACM Conference on Research and Development in Information Retrieval [1].

Abstract. In this demo paper we propose RankEval, an open-source tool for the analysis and evaluation of Learning-to-Rank (LtR) models based on ensembles of regression trees. Gradient Boosted Regression Trees (GBRT) is a flexible statistical learning technique for classification and regression at the state of the art for training effective LtR solutions. Indeed, the success of GBRT fostered the development of several open-source LtR libraries targeting efficiency of the learning phase and effectiveness of the resulting models. However, these libraries offer only very limited help for the tuning and evaluation of the trained models. In addition, the implementations provided for even the most traditional IR evaluation metrics differ from library to library, thus making the objective evaluation and comparison between trained models a difficult task. RankEval addresses these issues by providing a common ground for LtR libraries that offers useful and interoperable tools for a comprehensive comparison and in-depth analysis of ranking models.

References

[1]   Claudio Lucchese, Cristina Ioana Muntean, Franco Maria Nardini, Raffaele Perego, and Salvatore Trani. RankEval: An evaluation and analysis framework for learning-to-rank solutions. In SIGIR ’17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017.


Mar. 11 2017

Perception of Social Phenomena through the Multidimensional Analysis of Online Social Networks

Accepted at Online Social Networks and Media Journal by Elsevier [1].

Abstract. We propose an analytical framework aimed at investigating different views of the discussions regarding polarized topics which occur in Online Social Networks (OSNs).

The framework supports the analysis along multiple dimensions, i.e., time, space and sentiment of the opposite views about a controversial topic emerging in an OSN.

To assess its usefulness in mining insights about social phenomena, we apply it to two different Twitter case studies: the discussions about the refugee crisis and the United Kingdom European Union membership referendum. These complex and contended topics are very important issues for EU citizens and stimulated a multitude of Twitter users to take sides and actively participate in the discussions. Our framework makes it possible to monitor the raw stream of relevant tweets in a scalable way and to automatically enrich them with location information (user and mentioned locations) and sentiment polarity (positive vs. negative). The analyses we conducted show how the framework captures the differences in positive and negative user sentiment over time and space. The resulting knowledge can support the understanding of complex dynamics by identifying variations in the perception of specific events and locations.

References

[1]   Mauro Coletto, Andrea Esuli, Claudio Lucchese, Cristina Ioana Muntean, Franco Maria Nardini, Raffaele Perego, and Chiara Renso. Perception of social phenomena through the multidimensional analysis of online social networks. Elsevier Online Social Networks and Media, 1:14–32, 2017.


Mar. 01 2017

A Motif-based Approach for Identifying Controversy

Accepted at ICWSM 2017: AAAI Conference on Web and Social Media [1].

Abstract. Among the topics discussed in Social Media, some lead to controversy. A number of recent studies have focused on the problem of identifying controversy in social media, mostly by analysing textual content or by relying on global network structure. Such approaches have strong limitations due to the difficulty of understanding natural language and of investigating the global network structure. In this work we show that it is possible to detect controversy in social media by exploiting network motifs, i.e., local patterns of user interaction. The proposed approach allows for a language-independent, fine-grained, and efficient-to-compute analysis of user discussions and their evolution over time. The supervised model exploiting motif patterns can achieve 85% accuracy, with an improvement of 7% compared to baseline structural, propagation-based and temporal network features.

References

[1]   Mauro Coletto, Kiran Garimella, Aristides Gionis, and Claudio Lucchese. A motif-based approach for identifying controversy. In ICWSM ’17: International AAAI Conference on Web and Social Media, 2017.


Dec. 10 2016

SELEcTor: Discovering Similar Entities on LinkEd DaTa by Ranking their Features

Accepted at ICSC ’17: IEEE International Conference on Semantic Computing [1].

Abstract. Several approaches have been used in recent years to compute similarity between entities. In this paper, we present a novel approach to compute similarity between entities using their features available as Linked Data. The key idea of the proposed framework, called SELEcTor, is to exploit ranked lists of features extracted from Linked Data sources as a representation of the entities we want to compare. The similarity between two entities is thus mapped to the problem of comparing two ranked lists. Our experiments, conducted with museum data from DBpedia, demonstrate that SELEcTor achieves better accuracy than state-of-the-art methods.

References

[1]   Livia Ruback, Marco Antonio Casanova, Claudio Lucchese, and Chiara Renso. SELEcTor: Discovering similar entities on linked data by ranking their features. In ICSC ’17: IEEE International Conference on Semantic Computing, 2017.


Sep. 20 2016

Best Student Paper Award at Document Engineering 2016

Our paper Ceccarelli, D., Lucchese, C., Orlando, S., Perego, R., and Trani, S., SEL: a unified algorithm for entity linking and saliency detection, accepted at DocEng ’16: Proceedings of the 2016 ACM Symposium on Document Engineering, was awarded Best Student Paper [1].

References

[1]   Diego Ceccarelli, Claudio Lucchese, Salvatore Orlando, Raffaele Perego, and Salvatore Trani. SEL: a unified algorithm for entity linking and saliency detection. In DocEng ’16: Proceedings of the 2016 ACM Symposium on Document Engineering, 2016.


Aug. 11 2016

Fast ranking with additive ensembles of oblivious and non-oblivious regression trees

Journal paper accepted at ACM Transactions on Information Systems [1].

Abstract. Learning-to-Rank models based on additive ensembles of regression trees have been proven to be very effective for scoring query results returned by large-scale Web search engines. Unfortunately, the computational cost of scoring thousands of candidate documents by traversing large ensembles of trees is high. Thus, several works have investigated solutions aimed at improving the efficiency of document scoring by exploiting advanced features of modern CPUs and memory hierarchies. In this paper, we present QuickScorer, a new algorithm that adopts a novel cache-efficient representation of a given tree ensemble, performs an interleaved traversal by means of fast bitwise operations, and also supports ensembles of oblivious trees. An extensive and detailed experimental assessment is conducted on two standard Learning-to-Rank datasets and on a novel very large dataset we made publicly available for conducting significant efficiency tests. The experiments show unprecedented speedups over the best state-of-the-art baselines, ranging from 1.9x to 6.6x. The analysis of low-level profiling traces shows that QuickScorer's efficiency is due to its cache-aware approach, both in terms of data layout and access patterns, and to a control flow that entails very low branch misprediction rates.
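For readers curious about what an interleaved traversal with bitwise operations can look like, here is a minimal Python sketch of the general idea. It is only an illustration under stated assumptions: leaves are numbered left to right, every internal node tests `x[feature] <= threshold`, and each node carries a bitmask that clears the leaves of its left subtree. The data layout and all names (`TREES`, `build_index`, `score`) are made up for this example; the cache-efficient representation mentioned in the abstract is not reproduced here.

```python
from collections import defaultdict

# Two toy trees (hypothetical data). Each internal node is
# (feature, threshold, leaves_of_left_subtree); leaves are numbered left to right.
TREES = [
    {"nodes": [(0, 0.5, [0]), (1, 2.0, [1])], "leaves": [1.0, 2.0, 3.0]},
    {"nodes": [(1, 1.0, [0, 1]), (0, 0.2, [0])], "leaves": [10.0, 20.0, 30.0]},
]


def build_index(trees):
    """Group the internal nodes of all trees by feature, sorted by threshold."""
    by_feature = defaultdict(list)
    for tree_id, tree in enumerate(trees):
        for feature, threshold, left_leaves in tree["nodes"]:
            # Mask with 0s on the leaves of the left subtree: they become
            # unreachable when the test x[feature] <= threshold is false.
            mask = ~sum(1 << leaf for leaf in left_leaves)
            by_feature[feature].append((threshold, tree_id, mask))
    for nodes in by_feature.values():
        nodes.sort(key=lambda node: node[0])
    return by_feature


def score(x, trees, by_feature):
    """Score feature vector x by visiting nodes feature by feature, not tree by tree."""
    # One bitvector per tree; bit i set means leaf i is still a candidate exit leaf.
    candidates = [(1 << len(tree["leaves"])) - 1 for tree in trees]
    for feature, nodes in by_feature.items():
        for threshold, tree_id, mask in nodes:
            if x[feature] <= threshold:
                break  # thresholds are sorted, so all remaining tests are true
            candidates[tree_id] &= mask
    total = 0.0
    for tree, bits in zip(trees, candidates):
        # The exit leaf is the leftmost surviving leaf, i.e. the lowest set bit.
        exit_leaf = (bits & -bits).bit_length() - 1
        total += tree["leaves"][exit_leaf]
    return total


INDEX = build_index(TREES)
print(score([0.3, 1.5], TREES, INDEX))
```

With the toy trees above, the document `[0.3, 1.5]` exits the first tree at its leftmost leaf and the second tree at its rightmost leaf, the same leaves a classic root-to-leaf traversal would reach. Sorting nodes by threshold is what lets the inner loop stop at the first true test.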

References

[1]   Domenico Dato, Claudio Lucchese, Franco Maria Nardini, Salvatore Orlando, Raffaele Perego, Nicola Tonellotto, and Rossano Venturini. Fast ranking with additive ensembles of oblivious and non-oblivious regression trees. ACM Transactions on Information Systems, 35(2):15:1–15:31, 2016.


Aug. 01 2016

Computing Reviews’ Notable Books and Articles 2015

Our paper Lucchese C., Nardini F.M., Orlando S., Perego R., Tonellotto N., Venturini R., QuickScorer: a Fast Algorithm to Rank Documents with Additive Ensembles of Regression Trees [1], best paper at ACM SIGIR 2015, was selected as an ACM Notable Article in Computing 2015.


This looks like a good reason for reading our paper [1] and its recent improvement [2].

References

[1]   Claudio Lucchese, Franco Maria Nardini, Salvatore Orlando, Raffaele Perego, Nicola Tonellotto, and Rossano Venturini. QuickScorer: a fast algorithm to rank documents with additive ensembles of regression trees. In SIGIR ’15: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2015. (best paper) (ACM Notable Article).

[2]   Claudio Lucchese, Franco Maria Nardini, Salvatore Orlando, Raffaele Perego, Nicola Tonellotto, and Rossano Venturini. Exploiting CPU SIMD extensions to speed-up document scoring with tree ensembles. In SIGIR ’16: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2016.


July 12 2016

Sentiment-enhanced Multidimensional Analysis of Online Social Networks: Perception of the Mediterranean Refugees Crisis

Accepted at SNAST ’16: Workshop on Social Network Analysis Surveillance Technologies in conjunction with ASONAM ’16: IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining.

Abstract. We propose an analytical framework able to investigate discussions about polarized topics in online social networks from many different angles. The framework supports the analysis of social networks along several dimensions: time, space and sentiment. We show that the proposed analytical framework and the methodology can be used to mine knowledge about the perception of complex social phenomena.

We selected the refugee crisis discussions over Twitter as a case study. This difficult and controversial topic is an increasingly important issue for the EU. The raw stream of tweets is enriched with space information (user and mentioned locations) and sentiment (positive vs. negative) w.r.t. refugees. Our study shows differences in positive and negative sentiment across EU countries, in particular in the UK, and, by matching events, locations and perception, it highlights opinion dynamics and common prejudices regarding the refugees.

