Semantic Web Technologies - a New Action Line in the European Commission's IST Programme
Hans-Georg Stork and Franco Mastroddi
European Commission, Directorate-General Information Society and Media, D5
Abstract: We describe the background and rationale of the 'Semantic Web Technologies' action line of the European Commission's IST Work Programme 2001, offered under Key Action III (Multimedia Content and Tools) within its area 'Information access, filtering, analysis and handling (IAF)'. We also report on a workshop held in November 2000, in Luxembourg, where this action line was presented and discussed.
In less than ten years the World Wide Web, based largely on HTTP and HTML, has evolved into a vast information, communication and transaction space. Needless to say its features differ greatly from those of traditional media. Projects and related activities supported under the R&D programmes of the European Commission have made significant contributions to developing these features and to putting them to use in many applications [1]. Yet this note is not about past or ongoing European projects. We believe, as many others do, that today's Web gives us just an inkling of the full potential globally distributed systems may achieve in terms of information access and use. This note is an invitation to the relevant European R&D communities to participate jointly in realizing this potential.
It is, more specifically, an invitation to take the opportunities offered under the European Commission's IST (Information Society Technologies) programme [2], the largest specific programme of the 5th Framework Programme (FP5) that started in 1998 and will run until 2002.
They relate to ideas that have been looming for a number of years and whose objective is currently being referred to as the 'Semantic Web'. In the past, some of these ideas have been implemented and tested to a greater or lesser extent in open experimental or proprietary commercial systems. Yet it is probably safe to say that they have received their greatest push from the World Wide Web Consortium (W3C) [3]. An informal paper published on the Web in September 1998, by Tim Berners-Lee, entitled "Semantic Web Road Map" [4], and a more formal note on "Web Architecture: Describing and Exchanging Data" [5] (June 1999) may be considered the seminal documents.
'Making content machine-understandable' is a popular paraphrase of the fundamental prerequisite for the Semantic Web. In spite of its potential philosophical ramifications this phrase must be taken very pragmatically: content (of whatever type of media) is 'machine-understandable' if it is bound (attached, pointing, etc.) to some formal description of itself (often referred to as metadata*).
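To make this pragmatic reading concrete, the following minimal sketch binds two content items to formal descriptions expressed as subject-predicate-object statements, the basic shape that RDF also uses. It is an illustration only: the example.org addresses and all values are invented, the property names are borrowed from the Dublin Core element set, and the two helper functions are hypothetical rather than part of any standard.

```python
# Hypothetical illustration: content bound to a formal description of itself.
# The example.org resources and all values are invented for this sketch.
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # the content item being described
    predicate: str  # the property (here: Dublin Core style names)
    obj: str        # the value of that property


METADATA = [
    Triple("http://example.org/report.pdf", "dc:title", "Annual Report"),
    Triple("http://example.org/report.pdf", "dc:creator", "A. Author"),
    Triple("http://example.org/report.pdf", "dc:format", "application/pdf"),
    Triple("http://example.org/clip.mpg", "dc:title", "Workshop Video"),
    Triple("http://example.org/clip.mpg", "dc:format", "video/mpeg"),
]


def describe(resource, triples):
    """Return the formal description attached to one resource as a dict."""
    return {t.predicate: t.obj for t in triples if t.subject == resource}


def find(predicate, value, triples):
    """Return, sorted, all resources whose description has the given property value."""
    return sorted(
        t.subject for t in triples if t.predicate == predicate and t.obj == value
    )


if __name__ == "__main__":
    print(describe("http://example.org/clip.mpg", METADATA))
    print(find("dc:format", "application/pdf", METADATA))
```

A program can answer the second query without inspecting the content itself, which is all that 'machine-understandable' is meant to convey here.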
Ideally, adding 'semantics to content' in this sense should be achieved through algorithmic content analysis and/or algorithmic learning processes.
'Machine understanding' is not an end in itself. Rather, it should lead to automating a range of tasks within the context of distributed systems (such as the Web): from (chains of) business transactions to searching and filtering relevant and trustable information on whatever subject a user may be interested in. The kind of software performing such tasks is commonly known as 'agents', decorated with varying attributes and qualifications, such as information, intelligent, autonomous, co-operative, adaptive, rational, mobile, etc.
Lastly, human users should be able to interact with their agents (or directly with content) in an intuitively appealing fashion. Visual and/or virtual reality metaphors are perhaps the most likely candidates for representing the semantics of Web content at the man-machine interface (to make, in a manner of speaking, machine-understandable content understandable to humans) and for providing new ways of navigation and search.
The above considerations underlie the structure of the 'Semantic Web Technologies' action line, as defined in the IST Work Programme 2001 [6]. This action line is being offered by the IST Key Action III (Multimedia Content and Tools) [7] within its area 'Information access, filtering, analysis and handling (IAF)' [8]. It consists of four strands that may be characterized roughly as: formalizing (using XML, RDF and advanced techniques for semantic interoperability and reasoning such as ontology languages), grounding (formalisms, for instance through content analysis and/or machine learning), acting (to support resource and knowledge management, resource and knowledge discovery, transactions, intelligent filtering and profiling, collaborative filtering, knowledge sharing, etc.) and interacting (e.g. through intuitive visual interfaces).
Work to be proposed under this action line will draw not only from various Computer Science subdisciplines such as formal modelling, formal logics and formal languages, information retrieval, (multimedia) databases, knowledge engineering, image analysis, etc., but also from neighbouring disciplines such as Cognitive Science. Projects will interweave the above-mentioned strands, in order to create generic tools and demonstrate them through innovative applications. Where interoperability (e.g. among information agents) is an issue there is a strong case for agreeing on and/or adopting Web-wide standards.
With its focus on Web technologies the action line takes account of the fact that the rapid growth of the World Wide Web stimulates and motivates R&D opening up new ways of processing and managing all kinds of digital content (including images, video and audio but of course also text and plain data), and its delivery via stationary and (increasingly) mobile platforms. It is designed with a view to giving these developments more momentum by creating synergies between hitherto relatively separate R&D (and standards) communities, both in industry and academic/public research. Our use of the word 'Web' does not necessarily imply 'World Wide': projects proposed under this action line may well be limited to local or corporate 'Webs'. However, the technologies they develop should be scalable to larger dimensions and sustainable at a larger scale.
The IST Key Action III initiative in this field is certainly not isolated within the IST programme or unique in the world. Other IST Key Actions, in particular II (New Methods of Work and Electronic Commerce [9]) and IV (Essential Technologies and Infrastructures [10]), are contributors (and are in fact already hosting relevant projects such as IBROW [11] or ONTOKNOWLEDGE [12]). Yet our initiative may differ from other, similar ones (outside Europe or in individual EU Member States) in that it is slightly broader and more open to knowledge management, multimedia and interface issues.
A workshop designed not only to test and discuss the above outlined agenda but also to be a forum for exchanging ideas and making contact was held in November 2000, in Luxembourg [13]. Some 110 participants attended this event and 63 presentations of projects and project ideas were proposed. Due to time constraints, only 24 of these could actually be given. The mix of different interests represented was remarkable. It not only reflected the various aspects of our 'Semantic Web Technologies' action line, but also its potential for further collaborative research (across different R&D communities) and for viable commercial applications of current and future developments.
Invited talks covered the role of ontologies, special requirements for mobile platforms, specific multimedia issues, agent technologies and, last but not least, business opportunities. We briefly summarize these talks, the short contributions and the general discussion.
Formal ontologies seem indeed instrumental in achieving the 'Semantic Web'. Rooted in a long tradition not only of formal logics and artificial intelligence but also of more mundane endeavours such as the setting up of classification schemes, thesauri and controlled vocabularies, they are currently the most promising candidates for providing semantically sound machine-processable descriptions of digital content. Making ontologies operational within the context of large distributed systems (such as the Web) requires a considerable research and development effort to be directed towards methods and tools for constructing and maintaining domain-specific ontologies in a continuously changing world. Key problems include: ontology learning, ontology-based annotation of legacy content, and the management of ontology repositories. Only 'heavy-weight' ontologies, endowed with axioms or rules, can support inferencing engines that would implement an important part of the 'Semantic Web' vision, the 'Web of Trust'. Also required are agreements on ontology language standards, necessary conditions for creating a sustainable 'Semantic Web'. Pertinent activities have already been launched at the W3C, with the participation of European, US and Japanese groups.
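What it means for an ontology to be 'endowed with axioms or rules' can be hinted at with a toy example. The sketch below holds a few invented subclass statements and applies a single rule (if A is a subclass of B and B is a subclass of C, then A is a subclass of C) until no new statement can be derived. The class names, the rule and the function names are assumptions made for illustration; real ontology languages and inferencing engines are far richer.

```python
# Hypothetical toy ontology: class names and the single rule are illustrative only.
SUBCLASS_OF = {
    ("NewsVideo", "Video"),
    ("Video", "MultimediaObject"),
    ("MultimediaObject", "DigitalContent"),
}


def infer_subclass_closure(pairs):
    """Apply the transitivity rule repeatedly until nothing new is derived."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                # Rule: (a subclass of b) and (b subclass of d) => (a subclass of d)
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def is_a(cls, ancestor, pairs):
    """True if cls equals ancestor or is a stated or derived subclass of it."""
    return cls == ancestor or (cls, ancestor) in infer_subclass_closure(pairs)


if __name__ == "__main__":
    print(is_a("NewsVideo", "DigitalContent", SUBCLASS_OF))
    print(is_a("Video", "NewsVideo", SUBCLASS_OF))
```

The statement that a NewsVideo is DigitalContent is never written down; it follows from the rule. A plain vocabulary without rules could not yield it.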
Mobile platforms, or the 'wireless Web', pose particular demands in terms of content semantics, largely due to the perceived usage patterns and a multitude of different capabilities of mobile devices. They require the matching of a variety of profiles, not only related to the user herself, but also her current situation/location, the services she needs, the terminal equipment at hand, the proxy server and whatever policy may apply in these circumstances. Ontologies for expressing these profiles are badly needed.
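The matching of profiles described above might look, in a deliberately simplified form, like the following sketch. All class names, fields and sample values are hypothetical, and the matching criteria (content format, screen width, language, region) are merely examples of the kind of conditions such profiles could express; proxy and policy profiles are left out for brevity.

```python
# Hypothetical sketch of profile matching; all names and values are invented.
from dataclasses import dataclass, field


@dataclass
class DeviceProfile:
    screen_width: int        # in pixels
    supported_formats: set   # media types the terminal can render


@dataclass
class UserProfile:
    language: str
    location: str            # the user's current region


@dataclass
class ServiceOffer:
    name: str
    content_format: str
    min_screen_width: int
    languages: set
    regions: set = field(default_factory=set)  # empty set: offered everywhere


def matches(offer, user, device):
    """True if the offer fits both the user's situation and the terminal."""
    return (
        offer.content_format in device.supported_formats
        and device.screen_width >= offer.min_screen_width
        and user.language in offer.languages
        and (not offer.regions or user.location in offer.regions)
    )


def select_services(offers, user, device):
    """Return the names of all offers matching the given profiles, in order."""
    return [o.name for o in offers if matches(o, user, device)]


OFFERS = [
    ServiceOffer("text-news", "text/plain", 96, {"en", "fr"}),
    ServiceOffer("video-news", "video/mpeg", 320, {"en"}),
    ServiceOffer("local-guide", "text/plain", 96, {"en"}, {"Luxembourg"}),
]

if __name__ == "__main__":
    phone = DeviceProfile(screen_width=128, supported_formats={"text/plain"})
    visitor = UserProfile(language="en", location="Luxembourg")
    print(select_services(OFFERS, visitor, phone))
```

In this sketch the field names are agreed on implicitly by a single program. Across independent providers, devices and services that agreement has to be made explicit, which is where shared ontologies for profiles come in.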
The next big challenge then is multimedia on the 'Semantic Web'. The current Web, in spite of the many attempts at popularizing audio and video streaming, is not really hospitable yet to these forms of content. In fact, the current physical infrastructure of the Internet does not yet support high-bandwidth applications that can compete with traditional audiovisual media. However, this is very likely to change over the next five to ten years. Multimedia producers and distributors will then be faced with a formidable management task (a 'multimedia engineering crisis' in analogy to the 'software crisis' of the sixties and seventies) while users (or consumers) of content will have to choose from an ever increasing number of information, education, training or entertainment products. Both problems can only be solved through the systematic use of metadata. This is yet another case in point for advancing research on appropriate ontologies. It is also a case in point for encouraging even closer co-operation between the RDF/ontologies world(s), the SMIL world(s) and the MPEG world(s). And thirdly, given the sheer size and dynamics of the contents involved, it is a case in point for automating to the largest extent possible the production of metadata through algorithmic content analysis.
So the future 'Web' will be, as one of the speakers put it, a multimedia web and it will be used by mobile users; it will be adaptive and open, collaborative and automated. To that effect it has to be 'semantic' of course! It will be pervasive and browsers will not play the role they play today. Agents, i.e. communicating distributed processes, acting on behalf of human users, will play a far greater role. Agents have been around - at least in the literature - for more than five years, and they come in basically two varieties: as personal information assistants and as members of multiagent systems. Again, for agents in a distributed system to be effective, ontologies and ontology-based metadata are indispensable. Agents enter the 'Semantic Web' at different levels. As much as they make use of the 'semantic infrastructure' they can also contribute to the creation and maintenance of that infrastructure. Agent based computing appears to be the appropriate paradigm to work in a complex world with multiple ontologies, fragments and multiple inferencing engines. It is interesting to note here that the agent and metadata aspects establish a strong link between the 'Semantic Web' and another important initiative in the field of distributed computing and research networking, known as the 'Computational Grid' (cf. the DataGrid project [14]).
Business opportunities related to the notion of a 'Semantic Web' seem to abound, first of all in the traditional areas of selling (B2C = Business to Consumers) or trading (B2B = Business to Business) goods over the Internet. While 'traditional' B2C and B2B are still very much (product-)data and text (and of course image) oriented, this will change as the Web becomes more and more multimedia enabled, making already complex content management tasks even more complex and requiring solutions based on 'Semantic Web' technologies. XML alone is not a panacea. Unlike today, with most content still being available for free, content itself will be a commodity in a future Web, subject to both B2C selling and B2B trading. In order to be on the winners' side all parties involved in these games will have to rethink their approaches and strategies. Content providers for instance will have to understand the benefits obtained from the systematic generation of metadata; service providers will have to accept metadata as the basis on which to build new services; and the producers of software tools for end-users will redirect their imagination towards more appropriate integration of application software with Web content, taking advantage of metadata.
The short contributions to the workshop were, as expected, about projects at all levels (European, national, institutional) and about project ideas, covering most areas addressed by the IST 'Semantic Web Technologies' action line. They also covered some of the business aspects, with clear ideas on exploitation. While most contributors paid tribute to 'mainstream ontology' research there were some who presented alternative or complementary approaches to capturing content semantics, such as Topic Maps, Notion Systems or Content Dictionaries (developed by the OpenMath project [15] and enabling the interoperability of mathematics software). Although the subject matter of most presentations was rather generic, addressing a fair range of diverse potential applications, there was a visible cluster of projects (or project ideas) centred around multimedia teachware.
The final discussion picked up on several of the issues raised by the invited speakers and in the short contributions. One of the most salient points concerned the construction and management of distributed ontologies: who builds them and who maintains them? Are there sufficiently powerful tools? These are important questions, for instance, for commercial users who need to be sure of the ultimate benefits of work that does not seem to yield profits in the short term. Such tools should be easy to use in spite of the complexity and sophistication of the structures they operate on. A participant briefly presented an interesting example of a tool based on a neural network paradigm. It extracts metadata from patent abstracts with hardly any human intervention. The need for applying 'Semantic Web' technologies to the human-computer interface was also emphasised. Finally, a participant posed the seemingly open problem of measuring the quality of the knowledge encoded in various 'Semantic Web' formalisms and made available as resources. Certainly a problem worthy of attack.
We repeat: by and large the presentations and discussions at this workshop were not focused on one particular application or application area. This is well in line with the remit of the IAF part of the IST programme which has been defined in the overall description of the programme as follows [16]:
"... advanced technologies for the management of information content to empower the user to select, receive and manipulate (in a manner that respects the user's right to privacy) only the information required when faced with an ever increasing range of heterogeneous sources. Improvements in the key functionalities of large-scale multimedia asset management systems (including the evolution of the World Wide Web) will support the cost effective delivery of information services and their usage."
Yet it is obvious and also becomes apparent from this 'definition' that technologies must not be developed for the sake of developing technologies. They should respond to real needs and they will be successful (commercially and otherwise) only if they do so. We believe that 'Semantic Web' technologies meet this requirement fully, without being committed to any single application domain. Hence we invite proposers intending to submit under this action line to make sure the project they propose does not benefit a limited constituency only, or solve just one isolated problem. Rather, projects submitted under a generic action line such as 'Semantic Web Technologies' should, in a final analysis, produce solutions that are widely applicable in the context of Web-like distributed systems.
It should be noted that apart from supporting R&D projects the IST Programme also foresees so-called Accompanying Measures which can take various forms such as Trial and Take-up projects or Networks of Excellence, the latter being designed to foster collaboration and the exchange of know-how among experts working in a particular field. We are confident that such a Network can be set up relatively shortly for the European 'Semantic Web' communities.
It should also be noted that the initiative described in this article is just a beginning. Discussions have now started on the next European RTD framework programme. We believe that the content technologies addressed in this article will become crucial for the development and exploitation of digital content in networks whose physical infrastructure is becoming increasingly powerful in terms of bandwidth and processing speed. They will provide the supporting content infrastructure. Our workshop has certainly confirmed that there should be a prominent place for them in any future IST programme.
(*) Classical examples of collections of metadata are library catalogues consisting, for example, of MARC records describing books and other items belonging to the 'Gutenberg Galaxy'. By contrast, the metadata we have in mind when talking about the Semantic Web pertain to all kinds of digitally representable objects.
References
1. To get an impression it suffices to search for "web technology" in /search/index.cfm?fuseaction=proj.showadvform
2. /ist
3. http://www.w3.org
4. http://www.w3.org/DesignIssues/Semantic.html
5. http://www.w3.org/1999/04/WebData
6. /ist/workprogramme.htm, see also: Excerpt from the draft IST Work Programme 2001
7. /ist/ka3
8. /ist/ka3/iaf
9. /ist/ka2/welcome.html
10. /ist/ka4
11. http://www.swi.psy.uva.nl/projects/ibrow/home.html
12. http://www.ontoknowledge.org
13. Semantic Web Technologies
14. http://grid.web.cern.ch/grid
15. http://www.nag.co.uk/projects/OpenMath.html
16. /ist/b-oj-en5.htm