Looking for the world in digital libraries

Digital libraries and archives around the world are making all sorts of multimedia content available. Now an EU project has developed a solution to the main stumbling block to using them effectively: finding and sorting one's way through the wealth of material on offer.

Digital Economy

A problem for users of digital libraries searching for specific documents or multimedia assets is that different archives tend to classify and describe their collections in different ways, often using proprietary formats. The MIND project has addressed this problem and produced two solutions that make it easier to search for and find the desired resources.

The first technique automatically generates a standard descriptor for images by extracting identifying features and storing them as metadata in a standard format. In other words, a standardised description is generated to stand alongside the archive itself, which may be in any one of a number of different formats. Because each library then describes its images with the same standard resource descriptor, including their visual properties and features, a reliable search can be carried out across several archives at once.

Secondly, for cases where a search in several libraries returns a number of documents, images or other kinds of multimedia content, the project has developed a system for the "normalisation" of the relevance scores given to each returned resource. This is needed because each library scores relevance in its own way, so raw scores from different archives cannot be compared directly. The two-stage process first analyses the search results for each library separately: the system "learns" by what factor it should adjust the score a library gives to documents to indicate their match with the search query. The second stage then adjusts the matching scores by the correct factor for each archive.

This allows "data fusion", the merging of the search results that several libraries return for a single query. The various documents and images are then displayed in the correct order of relevance, no matter which archive they came from, and the task of sorting through the search results is made that much easier for the user.
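The two-stage score normalisation and data fusion described above can be illustrated with a minimal Python sketch. It is not the MIND project's implementation: the article does not say how the adjustment factor is learned, so the sketch simply assumes the factor is the reciprocal of the highest score seen in a sample of each archive's results. The function names `learn_factors` and `fuse`, the archive names and all the scores are hypothetical.

```python
from typing import Dict, List, Tuple

Result = Tuple[str, float]  # (document id, raw relevance score)


def learn_factors(samples: Dict[str, List[Result]]) -> Dict[str, float]:
    """Stage 1: estimate one adjustment factor per archive.

    Assumption (not from the article): the factor is 1 / highest raw score
    in the sample, so each archive's best sample match maps to 1.0.
    """
    factors = {}
    for archive, results in samples.items():
        top = max((score for _, score in results), default=0.0)
        factors[archive] = 1.0 / top if top > 0 else 1.0
    return factors


def fuse(
    results: Dict[str, List[Result]], factors: Dict[str, float]
) -> List[Tuple[str, str, float]]:
    """Stage 2: rescale each archive's scores, then merge into one ranking."""
    merged = []
    for archive, hits in results.items():
        factor = factors.get(archive, 1.0)  # unknown archive: leave scores as-is
        for doc_id, score in hits:
            merged.append((archive, doc_id, score * factor))
    # Highest adjusted score first, regardless of the source archive.
    merged.sort(key=lambda hit: hit[2], reverse=True)
    return merged


# Hypothetical archives: one scores on a 0-1000 scale, the other on 0-1.
samples = {
    "photo_archive": [("s1", 1000.0), ("s2", 10.0)],
    "text_library": [("s3", 1.0), ("s4", 0.1)],
}
query_results = {
    "photo_archive": [("a1", 800.0), ("a2", 200.0)],
    "text_library": [("b1", 0.5), ("b2", 0.25)],
}

factors = learn_factors(samples)
for archive, doc_id, score in fuse(query_results, factors):
    print(f"{doc_id} ({archive}): {score:.2f}")
```

Without the adjustment, both hits from the archive that scores on the larger scale would outrank everything from the other one. After rescaling, the merged list interleaves the two archives by relevance (a1, b1, b2, a2 in this example).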