The discovery of patterns in binary datasets has many applications, e.g. in electronic commerce, TCP/IP networking, Web usage logging, and bioinformatics. Still, it is a challenging task in several respects: handling overlapping vs. non-overlapping patterns, coping with noise, and extracting only the most important patterns. PaNDa is a greedy algorithm for the discovery of Patterns in Noisy Datasets. By exploiting the Minimum Description Length principle, it extracts succinct pattern sets that approximately describe the input data. [source]
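As a rough illustration of the Minimum Description Length idea mentioned above, the sketch below scores a candidate pattern set on a small binary matrix: the cost is the size of the patterns plus the number of cells they fail to explain. This is not PaNDa's actual implementation; the function name `description_cost`, the exact cost terms, and the toy data are assumptions made for illustration only.

```python
def description_cost(data, patterns):
    """Toy MDL-style cost: pattern sizes plus residual noise cells.

    data     -- list of rows, each a list of 0/1 values
    patterns -- list of (row_indices, column_indices) pairs; each pair
                claims that every listed row contains every listed column
    """
    covered = set()
    pattern_cost = 0
    for rows, cols in patterns:
        # Describing a pattern costs one unit per row id and per column id.
        pattern_cost += len(rows) + len(cols)
        covered.update((r, c) for r in rows for c in cols)

    # Cells that are actually set to 1 in the data.
    ones = {(r, c) for r, row in enumerate(data) for c, v in enumerate(row) if v}

    # Noise: cells where the patterns and the data disagree (either way).
    noise = ones ^ covered
    return pattern_cost + len(noise)


data = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]

no_patterns_cost = description_cost(data, [])
one_pattern_cost = description_cost(data, [([0, 1, 2], [0, 1, 2])])
print(no_patterns_cost, one_pattern_cost)
```

Under these assumptions, a greedy search would keep adding or extending patterns only while the total cost decreases, which is how a noisy 3x3 block can be preferred over listing every cell individually.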
The CoPhIR (Content-based Photo Image Retrieval) Test Collection is the largest multimedia metadata collection ever made available to the scientific community. It contains five MPEG-7 visual descriptors of 100 million photographic images downloaded from Flickr®, as well as other metadata such as tags, comments, and GPS coordinates. The collection is currently growing toward a size of 100 million images. This activity, run jointly with the NMIS Lab, is supported by the SAPIR project. Information on accessing the collection is available at the CoPhIR website.