Recent Posts


HPC Lab CIKM Workshop, Nov. 18th 2013

By admin | November 17, 2013

On Monday, November 18, at 14:30 in the Faedo room at ISTI, there will be a lab workshop open to anyone interested. The program is a repeat of the 6 talks presented by HPC Lab at CIKM 2013 two weeks ago (4 at the conference and 2 at satellite workshops). To stimulate discussion, the speakers will give a bit more room to possible future developments of the research.


14:30 Load-Sensitive Selective Pruning for Distributed Search. Nicola Tonellotto, Fabrizio Silvestri, Raffaele Perego, Daniele Broccolo, Salvatore Orlando, Craig Macdonald, Iadh Ounis.

15:00 Learning relatedness measures for entity linking. Diego Ceccarelli, Claudio Lucchese, Salvatore Orlando, Raffaele Perego, Salvatore Trani.

15:20 Dexter: an Open Source Framework for Entity Linking. Diego Ceccarelli, Claudio Lucchese, Salvatore Orlando, Raffaele Perego, Salvatore Trani.

15:40 Coffee break

16:00 Twitter anticipates bursts of requests for Wikipedia articles. Gabriele Tolomei, Salvatore Orlando, Diego Ceccarelli, Claudio Lucchese.

16:20 LearNext: Learning to Predict Tourists Movements. Ranieri Baraglia, Cristina Muntean, Franco Maria Nardini, Fabrizio Silvestri.

16:40 Where Shall We Go Today? Planning Touristic Tours with TripBuilder. Chiara Renso, Franco Maria Nardini, Igo Brilhante, Jose de Macedo, Raffaele Perego.

Topics: Uncategorized | No Comments »

Freire’s seminar

By stefania | April 17, 2013

Speaker: Ana Freire

Date: April 24, 2013
Time: 11:00 a.m.
Duration: 60′
Room: C-29 (Faedo room)
Title: Scheduling queries across distributed and replicated IR systems.


Search engines use replication and distribution of large indices across many query servers to achieve efficient retrieval. Under high query load, queries can be scheduled to the replicas that are expected to become idle soonest. In this seminar, we will show the conditions under which query scheduling can reduce the waiting time experienced by a query. In particular, we have deployed a well-studied simulation framework, which implements different scheduling methods for real Web search queries with known arrival times. Our experiments using different numbers of shards and replicas show that the use of predicted query response times for scheduling can markedly reduce the waiting time experienced by a query under high query volume.
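
As a rough illustration of the idea in the abstract (not the simulation framework used in the talk), the sketch below sends each query to the replica predicted to become idle soonest. The `Replica` class, the `schedule` function, and the predicted service times are hypothetical names and values chosen for this example only.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """A query server replica (hypothetical model for illustration)."""
    name: str
    busy_until: float = 0.0  # predicted time at which the replica becomes idle


def schedule(queries, replicas):
    """Assign each query to the replica predicted to be idle soonest.

    queries: list of (arrival_time, predicted_service_time), sorted by arrival.
    Returns a list of (replica_name, waiting_time), one entry per query.
    """
    assignments = []
    for arrival, predicted in queries:
        # Pick the replica whose predicted backlog clears first.
        target = min(replicas, key=lambda r: r.busy_until)
        start = max(arrival, target.busy_until)
        assignments.append((target.name, start - arrival))
        # Update the prediction using the query's predicted response time.
        target.busy_until = start + predicted
    return assignments


# One long query followed by two short ones, on two replicas.
queries = [(0.0, 5.0), (1.0, 1.0), (2.0, 1.0)]
result = schedule(queries, [Replica("A"), Replica("B")])
```

In this toy run the third query goes to replica B rather than waiting behind the long query on A, which is the kind of saving in waiting time the abstract refers to. A round-robin policy would have placed it on A instead.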

Topics: Uncategorized | No Comments »

HPC Lab is a technical partner of istella

By stefania | April 2, 2013

Topics: Uncategorized | No Comments »

HPC is a technical partner of istella

By stefania | March 22, 2013

Topics: Uncategorized | No Comments »

Ottaviano’s seminar

By stefania | January 23, 2013

Speaker: Giuseppe Ottaviano
Date: January 31, 2013
Time: 03:00 p.m.
Duration: 60′
Room: C-29 (Faedo room)
Title: Space-efficient data structures for Information Retrieval


Search engines and social networks need to cope with very large collections of data, mostly strings, such as queries, URLs, entities, and n-grams. In these scenarios it is very important to design fast and space-efficient data structures to store and index the collections, in order to increase the amount of data that can be kept in main memory, thus avoiding slow disk accesses.

Succinct data structures are especially fit for this purpose, as they have near-optimal theoretical guarantees and good practical performance. I will describe some succinct data structures for the storage of collections of strings, and an application of these data structures to the problem of query suggestion in search engines.

Topics: Uncategorized | No Comments »

Anastasi’s seminar

By stefania | July 16, 2012

Speaker: Gaetano Anastasi
Date: July 20, 2012
Time: 03:00 p.m.
Duration: 60′
Room: C-29 (Faedo room)
Title: Quality of Service Management in Service Oriented Architectures

The concept of Service Oriented Architectures (SOAs) has gained momentum in the Information and Communication Technology (ICT) application area in recent years, introducing an innovative approach to analysis, design, and development. Many applications provided as services are time-critical and have soft real-time requirements that must be taken into account to provide a certain Quality of Service (QoS) to service consumers. To provide strong guarantees, SOAs must be enhanced with advanced execution management that takes into consideration the underlying resources used for service provisioning. Management in the SOA context is not only about managing the services, but also about managing the network, the computing units, and various other resources, which may also be virtualized in cost-effective large-scale systems.

In this seminar, this problem will be referred to generically as QoS management and will be addressed with a particular focus on service-oriented real-time applications. Using resource management techniques borrowed from real-time systems theory, it will be shown that the service provider can guarantee the QoS negotiated with consumers in the context of QoS-enabled SOAs. In particular, a generic service-oriented QoS architecture has been designed and developed for negotiating and providing services with soft real-time guarantees. In addition, many realistic experiments have been conducted on Linux testbeds to show the effectiveness of the proposed approach.

Topics: Uncategorized | No Comments »

The 1st HPCLab Workshop

By admin | June 15, 2012

In this workshop, the members of our group will present their research directions and recent results. The presentations will focus on describing results at high level, without entering into technical details. This will be an occasion to disseminate and give a global vision of the research done in our lab.
The ultimate objective of this workshop is to establish or reinforce collaborations with participants, and we will encourage discussions to identify aspects in common with the research of other groups. We hope that this workshop will be a great opportunity for improving collaborations, sharing ideas, and discussing new research directions.

The workshop will be on Friday, June 15th at ISTI-CNR in Pisa, in the rooms A27 (workshop) and A29 (breaks). In the evening, we will enjoy a social dinner with all invited participants at La Locanda Sant’Agata.


10:00 | Raffaele Perego | Lab Overview
10:30 | Patrizio Dazzi | A potpourri of distributed computing researches
11:00 | Claudio Lucchese | On Frequent Chatters Mining
11:30 | Coffee Break | Room A29
12:00 | Gabriele Tolomei | You’re not Asking Me to Find Something, You’re Asking Me to Get Things Done!
12:30 | Franco Maria Nardini | On Enhancing the User Experience in Search Engines
13:00 | Massimo Coppola | Issues and Opportunities of Cloud Federations
13:30 | Lunch | Room A29
15:00 | Rossano Venturini | Data Structures + Data Compression: A love story since 1989
15:30 | Ranieri Baraglia | Building of P2P Overlay Networks via Voronoi and Gossip
16:00 | Coffee Break | Room A29
16:30 | Nicola Tonellotto | Dynamic Pruning in Web Search
17:00 | Fabrizio Silvestri | The Impact of Social Groups on News Recommendations
17:30 | Open discussions
20:30 | Dinner | La Locanda Sant’Agata

Topics: Uncategorized | No Comments »

Barbara Guidi’s seminar

By stefania | October 4, 2011

Speaker: Barbara Guidi
Date: October 6, 2011
Time: 11:00 a.m.
Duration: 60′
Room: C-40
Title: P2P Networks

Of particular interest are geographic P2P networks, in which a node is characterized by spatial coordinates defined in a d-dimensional space.
These overlay networks need to be structured according to a topology that makes it possible to exploit the spatial information associated with the nodes. A structure widely used to model such solutions is the Delaunay triangulation. Delaunay triangulations are well known in computational geometry and have properties that prove useful when applied to geographic networks.
The seminar will present a novel algorithm for building an overlay network based on the Delaunay triangulation, exploiting the information obtained from an epidemic search within the overlay network. The use of the gossip protocol makes it possible to learn about the nodes present in the network within a limited number of cycles. This information is analyzed and processed by the proposed algorithm to create and manage the overlay based on the Delaunay triangulation. The use of the gossip technique guarantees the convergence of an overlay network toward a Delaunay triangulation. The distinctive feature of the proposed protocol is its ability to converge toward a Delaunay triangulation using only the information about the network's nodes contained in each node's view.
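
To give a feel for the gossip step mentioned in the abstract, here is a minimal sketch of a push-pull view exchange. It is not the algorithm presented in the seminar and it does not build a Delaunay triangulation. It only shows how nodes that initially know a single neighbour can learn about every other node after a number of gossip cycles. The function name `gossip_cycles`, the ring-shaped starting topology, and the unbounded views are assumptions made for the example.

```python
import random


def gossip_cycles(n, seed=0, max_cycles=1000):
    """Run push-pull gossip on n >= 2 nodes until every view is complete.

    Each node starts knowing only its successor on a ring (an assumed
    starting topology). In every cycle each node picks a random peer from
    its view and the two merge their views.
    Returns (cycles_needed, views); cycles_needed is None if max_cycles
    is reached before all views are complete.
    """
    rng = random.Random(seed)
    views = {i: {(i + 1) % n} for i in range(n)}
    everyone = set(range(n))
    for cycle in range(1, max_cycles + 1):
        for node in range(n):
            peer = rng.choice(sorted(views[node]))
            # Push-pull exchange: both ends learn everything the other knows.
            merged = views[node] | views[peer] | {node, peer}
            views[node] = merged - {node}
            views[peer] = merged - {peer}
        if all(views[i] | {i} == everyone for i in range(n)):
            return cycle, views
    return None, views


cycles, views = gossip_cycles(32)
```

In a protocol like the one described, each node would then use the coordinates of the peers in its view to choose its neighbours in the triangulation. That selection step is omitted here.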

Topics: Uncategorized | No Comments »

Latest Papers

By admin | March 8, 2011

HPC Lab Upcoming Papers

  1. D. Broccolo, L. Marcon, F. M. Nardini, F. Silvestri, R. Perego. Generating Suggestions for Queries in the Long Tail with an Inverted Index. IP&M. To Appear, 2011.
  2. C. Macdonald, I. Ounis, N. Tonellotto. Upper Bound Approximations for Dynamic Pruning. ACM Transactions on Information Systems (ACM TOIS), to appear, 2011.
  3. G. De Francisci Morales, A. Gionis, M. Sozio. Social Content Matching in MapReduce. 37th International Conference on Very Large Data Bases (VLDB 2011), Aug 29-Sep 3, Seattle, US.
  4. G. Capannini, F.M. Nardini, R. Perego, F. Silvestri. Efficient Diversification of Web Search Results, 37th International Conference on Very Large Data Bases (VLDB 2011), Aug 29-Sep 3, Seattle, US.
  5. A. Orlandi, R. Venturini. Space-efficient Substring Occurrence Estimation. 30th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), 2011.
  6. M. Boley, C. Lucchese, D. Paurat and T. Gärtner. Direct Local Pattern Sampling by Efficient Two-Step Random Procedures. 17th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2011).
  7. Gabriele Capannini, Fabrizio Silvestri, Ranieri Baraglia. Sorting on GPUs for large scale datasets: A thorough comparison. Information Processing & Management, In Press, Available online from 8 January 2011.
  8. C. Lucchese, S. Orlando, R. Perego, F. Rabitti. Similarity Caching in Large-Scale Image Retrieval. Information Processing & Management, In Press, Available online from 8 January 2011.
  9. R. Baraglia, G. Capannini, D. Laforenza, M. Pasquali, L. Ricci. A multi-level scheduler for batch jobs on grids. The Journal of Supercomputing, In Press, Available online from 22 February 2011.
  10. G. Capannini, F.M. Nardini, R. Perego, F. Silvestri. Efficient Diversification of Search Results using Query Logs (poster), WWW’11, 28th March – 1st April, Hyderabad, India.
  11. F. Bonchi, R. Perego, F. Silvestri, H. Vahabi, R. Venturini. Recommendations for the Long Tail by Term-Query Graph (poster), WWW’11, 28th March – 1st April, Hyderabad, India.
  12. C. Lucchese, S. Orlando, R. Perego, F. Silvestri, G. Tolomei. Identifying Task-based Sessions in Search Engine Query Logs (best paper runner up). ACM WSDM, Hong Kong, February 9-12, 2011.
  13. D. Ceccarelli, C. Lucchese, S. Orlando, R. Perego, F. Silvestri. Caching query-biased snippets for efficient retrieval (full paper). EDBT 2011, Uppsala, Sweden, March 22-24, 2011.
  14. R. Perego, F. Silvestri, N. Tonellotto. Representing Document Lengths with Identifiers (poster). ECIR, Dublin, Ireland, April 18-21, 2011.

Topics: Uncategorized | No Comments »

Montresor’s seminar

By stefania | February 8, 2011

Speaker: Prof. Alberto Montresor

Date: February 21, 2011

Time: 12:00 p.m.

Duration: 40/45′

Room: Aula Faedo (C-29)

Title: Cloudy Weather for P2P, with a Chance of Gossip

Peer-to-peer (P2P) and cloud computing, two of the Internet trends of the last decade, hold similar promises: the (virtually) infinite availability of computing and storage resources. But there are important differences: the cloud provides highly available resources, but at a cost; P2P resources are free, but their availability is shaky. Several academic and commercial projects have explored the possibility of mixing the two, creating a large number of peer-assisted applications, particularly in the field of content distribution, where the cloud provides a highly available and persistent service, while P2P resources are exploited for free whenever possible to reduce the economic cost. While executing active servers on elastic computing facilities like Amazon EC2 and pairing them with user-provided peers is definitely one way to go, this talk proposes a novel approach that further reduces the economic cost. Here, a passive storage service like Amazon S3 is exploited not only to distribute content to clients, but also to build and manage the P2P network linking them. An effort is made to guarantee that the read/write load imposed on the storage remains constant, regardless of the number of peers/clients. These two choices allow us to keep the monetary cost of the cloud always under control, whether there is just one peer or a million of them. We show the feasibility of our approach by discussing two case studies for content distribution.

Topics: Uncategorized | No Comments »
