Dexter is a framework for implementing and evaluating entity linking algorithms. The entity linking task aims to identify all the small text fragments that refer to entities contained in a knowledge base, e.g., Wikipedia. Many entity linking algorithms have been proposed, but unfortunately only a few authors have released source code or APIs. As a result, it is difficult today to evaluate the performance of a method on a single subtask or to compare different techniques. Dexter is open source, since we believe that a shared framework is fundamental to performing fair comparisons and improving the state of the art. Visit the Dexter site!
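To make the task concrete, here is a minimal sketch of the spotting step of entity linking: finding text fragments that match known surface forms of entities. It is not Dexter's API or algorithm. The dictionary `SURFACE_FORMS`, the function `spot`, and the entity identifiers are hypothetical names chosen for illustration, and a real system would derive its dictionary from the knowledge base and add a disambiguation step.

```python
from typing import Dict, List, Tuple

# Hypothetical surface-form dictionary (assumption for illustration only);
# a real system would build it from a knowledge base such as Wikipedia.
SURFACE_FORMS: Dict[str, str] = {
    "pisa": "Pisa",
    "leaning tower": "Leaning_Tower_of_Pisa",
    "italy": "Italy",
}
MAX_SPAN = 3  # longest fragment, in tokens, that is looked up


def spot(text: str) -> List[Tuple[str, str]]:
    """Return (fragment, entity) pairs, preferring the longest match."""
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    found: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate fragment first, then shorter ones.
        for n in range(min(MAX_SPAN, len(tokens) - i), 0, -1):
            fragment = " ".join(tokens[i:i + n])
            if fragment in SURFACE_FORMS:
                found.append((fragment, SURFACE_FORMS[fragment]))
                i += n
                break
        else:
            i += 1  # no fragment starting here matched
    return found


mentions = spot("The Leaning Tower is in Pisa, Italy.")
```

On the sample sentence, `mentions` holds three pairs, one each for "leaning tower", "pisa" and "italy". Trying the longest fragment first keeps a multi-word mention from being split into shorter, possibly unrelated ones.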
TripBuilder is a user-friendly, interactive system for planning a time-budgeted sightseeing tour of a city on the basis of the points of interest and the tourist movement patterns mined from user-contributed data. The knowledge needed to build the recommendation model is extracted entirely in an unsupervised way from two popular collaborative platforms: Wikipedia and Flickr. TripBuilder interacts with the user by means of a friendly Web interface that allows her to easily specify her personal interests and time budget. The proposed sightseeing tour can then be explored and modified. The TripBuilder demo won the Best Demo Award at ECIR 2014. Plan your visit to Rome, Florence, Pisa and Amsterdam by clicking here.
SearchShortcuts is an efficient and effective solution to the problem of choosing which queries to suggest to web search engine users to help them satisfy their information needs rapidly. SearchShortcuts is less affected by the data-sparsity problem than most state-of-the-art proposals. Thus, it is particularly effective in generating suggestions for rare queries occurring in the long tail of the query popularity distribution.
The discovery of patterns in binary datasets has many applications, e.g., in electronic commerce, TCP/IP networking, Web usage logging, and bioinformatics. Still, this is a very challenging task in many respects: overlapping vs. non-overlapping patterns, the presence of noise, and the extraction of only the most important patterns. PaNDa is a greedy algorithm for the discovery of Patterns in Noisy Datasets. By exploiting the Minimum Description Length principle, the proposed algorithm extracts succinct pattern sets that approximately describe the input data. [source]
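The idea of a succinct pattern set that approximately describes noisy binary data can be illustrated with a small description-length cost. The sketch below is an assumption made for illustration, not the PaNDa implementation. It charges each pattern the number of rows plus columns it spans, and adds one unit for every cell where the patterns' cover disagrees with the data. The names `ones`, `covered` and `description_cost` are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Cell = Tuple[int, int]                           # (row index, column index)
Pattern = Tuple[FrozenSet[int], FrozenSet[int]]  # (rows, columns)


def ones(matrix: List[List[int]]) -> Set[Cell]:
    """Return the set of cells that hold a 1."""
    return {(r, c) for r, row in enumerate(matrix)
            for c, value in enumerate(row) if value}


def covered(patterns: List[Pattern]) -> Set[Cell]:
    """Cells the patterns claim are 1: the union of their row x column blocks."""
    return {(r, c) for rows, cols in patterns for r in rows for c in cols}


def description_cost(matrix: List[List[int]], patterns: List[Pattern]) -> int:
    """Assumed cost: size of the patterns plus the number of noisy cells."""
    pattern_cost = sum(len(rows) + len(cols) for rows, cols in patterns)
    # Noise: cells where the cover and the data disagree (symmetric difference).
    noise = len(ones(matrix) ^ covered(patterns))
    return pattern_cost + noise


data = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 0, 1, 0],  # one missing 1 inside the block: noise
    [0, 0, 0, 1],
]
block = (frozenset({0, 1, 2}), frozenset({0, 1, 2}))

cost_without = description_cost(data, [])       # every 1 counted as noise
cost_with = description_cost(data, [block])     # block plus two noisy cells
```

Here the single 3x3 pattern lowers the cost from 9 to 8 even though it covers one cell that is actually 0. A greedy search in this spirit would keep adding or extending patterns only while such a cost decreases.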
The CoPhIR (Content-based Photo Image Retrieval) Test Collection is the largest multimedia metadata collection ever made available to the scientific community. It contains five MPEG-7 visual descriptors of 100 million photographic images downloaded from Flickr®, as well as other interesting metadata such as tags, comments, and GPS coordinates. The collection is still growing toward its target size of 100 million images. This activity, run jointly with the NMIS Lab, is supported by the SAPIR project. Information on accessing the collection is available at the CoPhIR website.