Harmonised Semantic Meta-Search in Distributed Heterogeneous Databases

Final Report Summary - HARMOSEARCH (Harmonised Semantic Meta-Search in Distributed Heterogeneous Databases)

Executive Summary:
HarmoSearch - the future of information services

Electronic data exchange has been an important task in travel and tourism for decades, since it is a highly information-dependent industry. What was a very technical job for a few large players in the past has become a challenging task for thousands of small companies today, owing to the lack of a single international data standard.

HarmoSearch continued ongoing research in overcoming this interoperability problem by introducing tools to easily connect to an open data network and to find relevant information sources. An intelligent mapping tool, automatic translation of data queries and semantic registries about data providers make it very easy - especially for SMEs - to exchange data and participate in the global online business in a cost-efficient way.

The objective of the HarmoSearch project was therefore to develop two components that enable meta-searches in highly networked environments using semantic technologies. These components are:

1. A semantic meta-search component that identifies data sources based on a semantic understanding of the search intention and knowledge about existing partners in the network. This knowledge comes from the interpretation and translation of queries based on semantic annotation of data and sources.
2. A semantic mapping tool that provides expert knowledge about the data schemas by integrating the semantic concepts into the tool. This allows mappings to be generated automatically as far as possible, requiring little effort and very little technical knowledge from the user.

HarmoSearch builds on past projects and activities such as Harmonise, Harmo-TEN, a CEN Workshop Agreement, the activities of the non-profit HarmoNET association, the euromuse.net project and portals such as VisitEurope.com.

Project Context and Objectives:
Harmonise is a technology that helps to overcome interoperability problems when exchanging data between heterogeneous databases. Those databases might contain the same kind of information (e.g. about a hotel), but they are usually structured differently. Industry standards could solve this problem, but the tourism industry, for example, lacks an international standard for describing tourism objects in a database. There are several reasons for this; three of the most prominent are:
1. the long history of ICT use with a broad range of systems and data models in place,
2. the peculiarities of the industry and its players and
3. the need for some flexibility to reflect the diversity in this huge market.
This was the starting point for Harmonise, an ontology-based technology that allows easy exchange of data between different data schemas in a network of several players. It was developed in an FP5 RTD project called “Harmonise”. In a later eTEN project, a business plan was developed for the tourism sector and a non-profit organisation called HarmoNET was established. This organisation brings together users, software development companies (typically SMEs) and research institutions. The business plan, and the investments already made on its basis, are essential parts of the motivation for this project proposal.
The Harmonise system was originally a decentralised solution installed on each partner’s system. This led to problems with updating and maintaining the system, and acceptance fell short of expectations. For this reason, some HarmoNET members have already invested private resources in a portal-based online service that allows partners to use the system without installing anything locally. This has improved the situation considerably: anyone who wants to participate can do so without having to care about the software at all. All it requires is access to the online system and a one-time mapping to define the required translation rules.

As an example, consider that partner A has data, e.g. describing a hotel, in a specific format (data schema) that differs from the format used by partner B, although both cover the same type of content: a hotel description. Partner A sends the data to partner B via the Harmonise portal. The data is first transformed to a reference model, the Harmonise Ontology, and then transformed again from the reference model to the recipient’s data format.
The effort would not be worthwhile for just two partners, because they could create a one-time mapping and exchange data directly. The benefit comes in networked industries such as tourism, where many partners want to exchange the same kind of information. Instead of maintaining translation rules (mappings) with every partner involved, participants of Harmonise only need to map their data schema once to the Harmonise reference model and can then exchange data easily with any other Harmonise partner.
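
The two-step translation through a reference model can be illustrated with a minimal sketch. It is not the Harmonise implementation: the partner schemas, the field names, the reference concepts and the helper functions are all hypothetical and chosen only for illustration. The last function compares the number of mappings needed when every pair of partners maps directly with the number needed when each partner maps once to the reference model.

```python
# Hypothetical field mappings: partner field name -> reference-model concept.
# None of these names are taken from the actual Harmonise Ontology.
MAPPING_A = {"hotelName": "name", "plz": "postal_code", "ort": "city"}
MAPPING_B = {"title": "name", "zip": "postal_code", "town": "city"}


def to_reference(record, mapping):
    """Translate a partner record into reference-model concepts."""
    return {mapping[f]: v for f, v in record.items() if f in mapping}


def from_reference(record, mapping):
    """Translate a reference-model record into a partner's own field names."""
    inverse = {concept: f for f, concept in mapping.items()}
    return {inverse[c]: v for c, v in record.items() if c in inverse}


def exchange(record, sender_mapping, recipient_mapping):
    """Two-step translation: sender schema -> reference model -> recipient schema."""
    return from_reference(to_reference(record, sender_mapping), recipient_mapping)


def mappings_needed(partners):
    """Return (direct pairwise mappings, mappings via the reference model)."""
    return partners * (partners - 1) // 2, partners


hotel_a = {"hotelName": "Hotel Post", "plz": "6020", "ort": "Innsbruck"}
hotel_b = exchange(hotel_a, MAPPING_A, MAPPING_B)
print(hotel_b)               # the same hotel in partner B's field names
print(mappings_needed(2))    # (1, 2): direct mapping is cheaper for two partners
print(mappings_needed(100))  # (4950, 100): the reference model wins in a network
```

Under these assumptions, the reference model costs slightly more for two partners but far less once the network grows, which matches the argument above.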

Harmonise thus drastically reduces the effort of implementing systems for the exchange of information. The solution is in principle independent of any business sector, but the ontology, as the reference data model for transforming data, has to be tailored to a specific domain. Harmonise arose from a need identified in the tourism sector, so the ontology covers data descriptions of objects from the tourism domain.

However, two basic challenges remain for participants who want to join the Harmonise services:
1) To create translation rules, the user of Harmonise has to define a mapping between the partner’s own data schema and the Harmonise Ontology. This requires some technical skills and knowledge about data schemas and mappings.
2) A participant can send data to other participants, but cannot search other participants’ databases. Sending data is only one business case; even more often there is a need to search for data without knowing exactly who has it or who has the best offer.

Thus it is easy to exchange data, but only once the mapping has been done. This is no problem for an organisation with an IT department, but most tourism providers in Europe are SMEs that lack a dedicated or sufficiently skilled IT department and often have only a simple CMS or website. Existing mapping tools on the market are made to support developers, not users without technical knowledge of database design. It is therefore not easy for such SMEs to join the Harmonise system. The problem is becoming more pressing as the tourism market depends ever more heavily on online offers and deals.

The second issue, the possibility of searching several partners’ databases at the same time, is of interest to any partner in tourism, regardless of size. Currently the system allows sending data to one partner, such as a central data store, which can then be searched, as is the case in euromuse.net. euromuse.net is a museum portal for exhibitions in Europe that uses Harmonise: participating museums send their data via Harmonise to the central database, where interested users can search for cultural offers in Europe. In many other cases, however, it is difficult or even impossible to maintain a central database. The problem is even worse when the data always has to be up to date (e.g. availability of event tickets), because partners would need to send large amounts of data constantly.

These two problems can be solved by making use of semantic technologies, and this was the focus of the project. The first aim was to develop a semi-automatic mapping tool that already has some “knowledge” about the data schema through the underlying semantic annotation of data schemas. It supports the user in defining mappings by making suggestions and recommendations, like an intelligent assistant. This is only possible if the mapping tool has some knowledge of either the nature or the structure of the data on a meta-level. An example is post codes, which can be detected because they are usually integers with 4-6 digits located close to city names.
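
The post code example can be turned into a small heuristic. The sketch below is only an illustration of the idea and does not reproduce the project's mapping tool: the regular expressions, the notion of "close" as a directly neighbouring field, and the sample data are assumptions.

```python
import re

# Assumed patterns: a post code is 4-6 digits; a city-like value is made of
# letters, optionally separated by single spaces or hyphens.
POSTCODE = re.compile(r"\d{4,6}")
CITY_LIKE = re.compile(r"[^\W\d_]+(?:[ \-][^\W\d_]+)*")


def looks_like(pattern, values):
    """True if there is at least one sample and every sample matches fully."""
    return bool(values) and all(pattern.fullmatch(v.strip()) for v in values)


def suggest_postcode_fields(fields):
    """fields: ordered list of (field_name, sample_values) from a partner schema.

    A field is suggested as a post code if all samples are 4-6 digit numbers
    and a directly neighbouring field holds city-like text.
    """
    suggestions = []
    for i, (name, values) in enumerate(fields):
        if not looks_like(POSTCODE, values):
            continue
        neighbours = fields[max(i - 1, 0):i] + fields[i + 1:i + 2]
        if any(looks_like(CITY_LIKE, vals) for _, vals in neighbours):
            suggestions.append(name)
    return suggestions


# Hypothetical sample data with uninformative field names.
samples = [
    ("f1", ["Hotel Post", "Pension Anna"]),
    ("f2", ["6020", "1070"]),
    ("f3", ["Innsbruck", "Wien"]),
    ("f4", ["25", "40"]),
    ("f5", ["2012", "2013"]),
]
print(suggest_postcode_fields(samples))
```

Here `f5` also contains four-digit integers, but it is not suggested because no neighbouring field looks like a city name. A real tool would combine several such hints and leave the final decision to the user.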
Meta-search, too, cannot easily be done in a network of hundreds or thousands of participants. This is the second issue, where semantic concepts are needed to provide a semantic index with knowledge about the participants and the data they can provide. The index is intended to deliver fast and reliable results even when thousands of partners can be searched. It is a kind of intelligent registry of information providers. To enable search on other systems, a search query has to be run on each of those systems, and different systems typically have different query languages. As a consequence, not only the data but also the queries need to be transformed on the fly to run a search in this network. A mapping of search queries is therefore a prerequisite for meta-search in such a large environment. The semantic mapping tool is needed not only to map data with heterogeneous schemas, but also to map query languages.

The mapping tool and the semantic meta-search index are independent stand-alone components, which are used in the Harmonise scenario but can also be used in other systems.
The objective of this project was therefore to develop two components that enable meta-searches in highly networked environments using semantic technologies. These components are:
1) A semantic meta-search component that identifies data sources based on a semantic understanding of the search intention and knowledge about existing partners in the network. This knowledge comes from the interpretation and translation of queries based on semantic annotation of data and sources.
2) A semantic mapping tool that provides expert knowledge about the data schemas by integrating the semantic concepts into the tool. This allows mappings to be generated automatically as far as possible, requiring little effort and very little technical knowledge from the user.

Project Results:
The project contributed to the development of new knowledge and technology in the form of new tools, and also brought new technologies and services to organizations in the tourism domain. These tools are stand-alone components rather than further developments of existing solutions, and they can also be transferred to other industry domains and topics.

In fact, the project delivered a new service that lets tourism players search and exchange semantically enriched content: each participant can, on the one hand, expose its content to all the other tourism organizations in the network and, on the other, query the content of other players. This also enables small organizations to be part of a network, overcoming one of the major issues of the tourism domain, its granularity. The tourism industry is composed of small organizations, which often cannot afford the costs of developing new in-house interfaces to existing standards or of connecting with other players.

Thus the first contribution of the project is to foster collaboration, exchange of knowledge and dissemination of information among tourism players. Tourism organizations can, at affordable cost, plug their systems into a global network, deliver their information to other organizations and query the data of other organizations. For example, a tourism organization promoting events could publish on its portal data about available accommodation in the same area by querying the HarmoSearch service and receiving relevant data from a local destination management organization, while the destination management organization could publish on its portal the events organized by the event provider.

Technically, the project delivered new knowledge and tools in the following areas:

Semantic Registry for meta-search
A new technology was developed for building, managing and updating the descriptions of the services provided by integrated tourism organizations.
This registry was built by exploiting several knowledge sources:
- Data explicitly entered by tourism organizations when registering the service. This is basic information that is available through existing standards (for example, described with the Web Services Description Language, WSDL).
- Information acquired from the mapping that each organization defines. When a mapping is defined, it is possible to capture information about the instances that the organization can provide. For example, if the accommodation facilities concept is mapped, it is known that this provider can answer queries about accommodations offering specific facilities.
This allows a specific profile to be built for each service, making it possible to determine, for a given query, which services are suited to provide content for it.
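
A registry of this kind can be sketched as a set of provider profiles that combine registration data with the concepts each provider has mapped. The sketch below is a simplified illustration under stated assumptions: the class and method names, the concept identifiers, the provider names and the example.org endpoints are hypothetical, and the selection rule (a provider qualifies if it has mapped every concept the query uses) is one plausible strategy, not necessarily the one implemented in HarmoSearch.

```python
from dataclasses import dataclass, field


@dataclass
class ProviderProfile:
    name: str
    endpoint: str  # entered explicitly at registration
    mapped_concepts: set = field(default_factory=set)  # derived from the mapping


class Registry:
    def __init__(self):
        self._profiles = {}

    def register(self, name, endpoint):
        """Store the data a provider enters when registering its service."""
        self._profiles[name] = ProviderProfile(name, endpoint)

    def record_mapping(self, name, concepts):
        """Add the reference-model concepts covered by the provider's mapping."""
        self._profiles[name].mapped_concepts.update(concepts)

    def candidates(self, query_concepts):
        """Return providers whose profile covers every concept in the query."""
        needed = set(query_concepts)
        return sorted(
            p.name for p in self._profiles.values() if needed <= p.mapped_concepts
        )


registry = Registry()
registry.register("dmo", "https://example.org/dmo")
registry.register("museum", "https://example.org/museum")
registry.record_mapping("dmo", {"accommodation", "accommodation.facilities"})
registry.record_mapping("museum", {"event"})

print(registry.candidates(["accommodation", "accommodation.facilities"]))
print(registry.candidates(["event"]))
```

Only the providers returned by `candidates` would then receive the translated query, which keeps the number of remote calls small even in a large network.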

Mapping of search queries
A dedicated model was developed to represent queries so that they can be translated into the query language of each data provider. With this model, queries expressed in the model of the querying organization can be translated into the form required by the organizations providing data. The model is powerful enough to represent the desired query constructs, but simple enough to support the variety of query capabilities exposed by the data providers. Some providers expose powerful constructs, others very limited ones, so a compromise was identified that covers most cases without losing flexibility.
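
To make the idea of a neutral query model concrete, the following sketch represents a query as a list of simple conditions and translates it for two providers with different capabilities. Everything in it is an assumption made for illustration: the condition format, the operator names, the provider descriptions and the decision to report unsupported conditions separately are not taken from the HarmoSearch specification.

```python
# Neutral query: a list of (concept, operator, value) conditions.
QUERY = [("city", "eq", "Wien"), ("stars", "ge", 4)]

# Hypothetical provider descriptions: local field names and supported operators.
PROVIDER_A = {
    "fields": {"city": "ort", "stars": "sterne"},
    "operators": {"eq": "=", "ge": ">="},
}
PROVIDER_B = {
    "fields": {"city": "town", "stars": "rating"},
    "operators": {"eq": "="},  # this provider supports equality only
}


def translate(query, provider):
    """Split a neutral query into translated and unsupported conditions."""
    supported, unsupported = [], []
    for concept, op, value in query:
        if concept in provider["fields"] and op in provider["operators"]:
            supported.append(
                (provider["fields"][concept], provider["operators"][op], value)
            )
        else:
            unsupported.append((concept, op, value))
    return supported, unsupported


def to_where(conditions):
    """Render translated conditions as a parameterised WHERE clause."""
    clause = " AND ".join(f"{name} {op} ?" for name, op, _ in conditions)
    params = [value for _, _, value in conditions]
    return clause, params


supported_a, unsupported_a = translate(QUERY, PROVIDER_A)
supported_b, unsupported_b = translate(QUERY, PROVIDER_B)
print(to_where(supported_a))
print(to_where(supported_b), unsupported_b)
```

In this sketch, conditions a provider cannot evaluate are returned to the caller, which could for example apply them as a filter on the results it receives. Keeping the neutral model to simple conditions is what allows providers with very limited query capabilities to participate at all.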
The research also focused on the representation and mapping of metadata such as enumeration values. For example, to query a certain feature, it is necessary to know which enumerated values describe that feature and how these values are mapped to the Harmonise ontology.
The model and its metadata should be programmatically accessible to organizations that want to consume the HarmoSearch service. For example, if a system provides a query form for searching accommodation with certain facilities, the consumer should be able to retrieve the whole set of facilities that can be queried (for example, to populate a drop-down list). Thus, HarmoSearch should make available a dedicated service that describes the model and its metadata.

In addition, two different needs should be considered:
First, a common model has to be shared among all the partners. A common data model allows a single query to be mapped to the models of several partners. If we want to query accommodations that have wellness facilities, the concept “wellness” should be shared among all the participants. This model should not be too complex, otherwise defining the mapping becomes too complex as well.
Second, there is often also a need to manage specific metadata that is private to subgroups of participants in the network. Continuing the previous example, some organizations with a strong focus on “Wellness” will be interested in allowing queries on more specific types of wellness (for example “Turkish bath”) in order to provide more specific information about their products. This level of detail is not of interest to most participants of the network, and the Harmonise ontology should not be overloaded with concepts that are too specific to matter to the majority.
Thus, the query model supports both concepts at a global level, known to all the participants of the network, and private concepts shared only among a subset of the participants.
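
The distinction between global and private concepts can be sketched as a vocabulary with a shared core and group-specific refinements. The group names, participants and values below are hypothetical, and the fallback from a private concept to its global parent is an assumption about how a query could be generalised for participants outside the subgroup, not documented HarmoSearch behaviour.

```python
# Global concepts known to every participant (hypothetical values).
GLOBAL_FACILITIES = {"wellness", "parking", "restaurant"}

# Private concepts per subgroup, each refining a global parent concept.
PRIVATE_FACILITIES = {
    "wellness-group": {"turkish_bath": "wellness", "sauna": "wellness"},
}

# Which participants belong to which subgroups.
MEMBERSHIP = {"spa-hotel": {"wellness-group"}, "city-hotel": set()}


def visible_values(participant):
    """Values a participant can offer in a query form, e.g. a drop-down list."""
    values = set(GLOBAL_FACILITIES)
    for group in MEMBERSHIP.get(participant, set()):
        values.update(PRIVATE_FACILITIES.get(group, {}))
    return sorted(values)


def resolve_for(participant, value):
    """Return the value as the participant understands it.

    Private concepts unknown to the participant fall back to their global parent.
    """
    if value in visible_values(participant):
        return value
    for concepts in PRIVATE_FACILITIES.values():
        if value in concepts:
            return concepts[value]
    raise KeyError(value)


print(visible_values("spa-hotel"))
print(visible_values("city-hotel"))
print(resolve_for("city-hotel", "turkish_bath"))
```

A metadata service exposing something like `visible_values` would let a consuming portal populate its query form without hard-coding the enumerations.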

Semi-automatic mapping tool
One of the most complex tasks for organizations wishing to join the Harmonise network is to define the mappings from and to the Harmonise ontology. Even though this task has to be performed only once, it takes several person-days of work, and this effort can be an obstacle for organizations willing to join the network. It was therefore a project goal to support the user visually in creating the necessary mapping files from the organization’s data model to the Harmonise model and vice versa.
Input sources are the Harmonise model and the model of the tourism organization that wishes to join the network. They can be expressed as XML Schema.
The designer can interactively define the mapping, aided by automatic functions like:
- Suggestions on which concept of the source model can be mapped into the concept of the destination model
- Suggestions on mapping of metadata values
- Test and validation of the mapping, with sample data
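
The first of these functions, suggesting which source concept maps to which destination concept, can be approximated with plain name similarity. The sketch below uses Python's standard `difflib.SequenceMatcher` as a stand-in for the semantic matching described in this report, so it illustrates the interaction pattern rather than the project's actual technique. The field names, the target concepts and the 0.5 threshold are assumptions.

```python
from difflib import SequenceMatcher


def normalise(name):
    """Lower-case a name and drop separators so 'Postal_Code' ~ 'postcode'."""
    return name.replace("_", "").replace("-", "").lower()


def suggest(source_fields, target_concepts, threshold=0.5):
    """Suggest the most similar target concept for each source field.

    Fields without a sufficiently similar concept map to None, so the user
    can see where a manual decision is needed.
    """
    suggestions = {}
    for source in source_fields:
        scored = [
            (SequenceMatcher(None, normalise(source), normalise(c)).ratio(), c)
            for c in target_concepts
        ]
        score, best = max(scored)
        suggestions[source] = best if score >= threshold else None
    return suggestions


# Hypothetical partner fields and reference-model concepts.
result = suggest(["hotel_name", "PostalCode", "xyz"], ["Name", "PostCode", "City"])
for source, target in result.items():
    print(source, "->", target)
```

Suggestions of this kind are meant to be confirmed or corrected by the designer and then checked against sample data, as listed above.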


Potential Impact:
The result of this project benefits not only the project participants but the entire tourism industry in Europe. The market is already dominated by online business, where the position of large suppliers has become much stronger compared to SMEs, since they can afford investments in portals, in establishing standards and in online systems. At the same time, the overwhelming majority of industry players are SMEs, which have difficulty joining the online business and keeping pace with current developments. They are becoming more and more dependent on larger partners. The Harmonise solution, with its components for creating mappings and for easily searching partner systems, offers completely new business opportunities for them. It takes comparatively little effort to join the network, whether as a supplier, booking platform, search portal or destination marketing organisation. The same is true for the cultural heritage sector, for example, where the problem is clearly visible in the euromuse.net project: museums have a strong interest in joining the portal and bringing their offer to the online market. Harmonise provides them with the necessary infrastructure.

Not only the market as such benefits, but also the SMEs participating in this project. Each of them either offers interoperability tools and services to the market, provides meta-search solutions or runs portals in need of meta-data functionality. Data interoperability is considered one of the biggest challenges in e-tourism, with market volumes still growing. A Harmonise solution that offers meta-search in a highly distributed network with possibly thousands of participants, and easier access through a semi-automatic mapping tool, can significantly increase the use of Harmonise, and each of the participating SMEs would benefit from increased turnover.

eCTRL offers e-tourism consulting and services and has developed its own recommender and travel planning tool. This tool is currently integrated into a client’s system to recommend items from the client’s database based on user and tourism offer profiles. With multiple databases searchable via semantic meta-search, eCTRL can now offer a solution in which the semantic profiles of tourism products are added to all the products in the network, resulting in a new product with a broader scope (e.g. recommending all offers in a given destination without integrating them into one data hub).

EC3Networks offers meta-search solutions in tourism, which have been implemented, for example, in the official Finland tourism portal VisitFinland.com. It also offers meta-search as a rented service to destinations, so that all relevant booking partners are integrated into one search. Integrating each booking partner normally takes several weeks, whereas integrating them with the semi-automatic mapping tool and Harmonise meta-search reduces this effort to one or two days, significantly cutting costs while offering a stronger solution.

Museumsmedien supports cultural heritage institutions in promoting their services with expertise in multimedia and web technology, and maintains their portals and data hosts, including the meta-portal euromuse.net, which is brought to the market with co-funding from the European Commission. Museums joining euromuse.net or any other online information service in the cultural heritage sector are in a situation similar to that of the tourism players, but an even worse one with respect to resources for activities on the electronic market. While tourism has a long history of using information and communication technologies, this is not true for cultural heritage. These institutions have an even greater need for easy and cost-effective ways to join the online business. This project puts Museumsmedien in a position to offer its clients tools and support to connect with the online world and with the tourism sector. The company had already received a number of expressions of interest in this solution before the end of the project.

Afidium is a French company specialized in ICT consulting for the tourism sector and has been involved in the development of XFT, a framework for a standardized data schema language. This is another approach to overcoming data interoperability problems. However, first, it is only applicable to partners who have followed this initiative, and they still need to connect with others who do not use XFT; second, XFT cannot solve all problems in data interoperability. Afidium also delivers connectors and transcoders to provide XFT connectivity to non-XFT providers. Some gaps remain, which can now be closed by using an enriched Harmonise service. With this solution in place, Afidium can add a completely new dimension to XFT and benefit from increased consulting services based on the new service.

Each of the partners mentioned has specific market offers in place and also benefits significantly from the new service, either by saving costs in current activities or through increased sales from the new services. However, a deeper analysis of the business potential has shown that a central online data mediation service can only be offered by all, or at least some, of the partners together. There are several reasons for this:
1) Each of the SME partners is a small company and does not have the necessary resources to offer and maintain an online service on its own. This is less a technical issue than a matter of market competence and power. An online service has to be promoted and marketed regularly and intensively to create awareness and attract customers. This can only be done effectively by combining the forces of a group of partners.
2) The different use cases for data mediation services are rather limited. We have evaluated different cases, and the most promising one is the exchange of event data in cultural heritage and tourism. There are other opportunities as well, such as a hotel meta-search, but there is more competition and the market is thus more difficult to enter.
3) There is little reason to limit such a service to national markets, since tourism and cultural heritage are strongly present at the international level. But if each partner started such a service individually, the partners would soon be competing with each other internationally. It is therefore preferable to start in a concerted and directed cooperation from the beginning and to launch a service jointly.
4) The cooperation in the project has proven to be very good and effective. From the exchange and interaction of ideas to the different competencies in coordination, development, design, testing, etc., the consortium has proven to work very well together. This is something we want to continue after the end of the project.
For these reasons, the idea of launching an event publishing and search service was evaluated in more detail and a draft project plan was elaborated. The idea is to use the HarmoSearch service, extend it with external services that enrich existing information (e.g. by categorization), and build an event information hub out of it. The main objective is to provide advanced harmonisation services to collect, enrich and distribute data about events and attractions for commercial use. This enrichment adds value by categorizing and classifying the events and attractions based on their context. The vision is thus to deliver to recipients a selection of events and attractions that is automatically filtered according to their requirements. This is a project idea that the partners want to continue working on.

List of Websites:
Project Co-ordinator

Company name: ec3 NetWorks GmbH
Name of representative: Manfred Hackl

Address: Voigtländergasse 15, AT-1070 Wien

Phone number: +43-676-842 755 100
Fax number: +43-676-842 755 799

manfred.hackl(at)ec3networks.at

www.harmosearch.com