WonderWeb: Ontology Infrastructure for the Semantic Web

Deliverables

DOLCE is a foundational ontology developed as part of the WonderWeb Foundational Ontologies Library (WFOL). The development of this library has been guided by the need for a reliable set of foundational ontologies that can serve as:
- a starting point for building other ontologies;
- a reference point for easy and rigorous comparisons among different ontological approaches;
- a rigorous basis for analyzing, harmonizing and integrating existing ontologies and metadata standards (by manually mapping them into some general module(s) in the library).
In addition, the WFOL is meant to be minimal (including only the most reusable and widely applicable upper-level categories), rigorous (the ontologies are characterized by means of rich axiomatizations, and their formal consequences are explored in some detail), and extensively researched (each module in the library undergoes careful evaluation by experts and consultation with canonical works). DOLCE (Descriptive Ontology for Linguistic and Cognitive Engineering) is the first module of the WFOL, and it is not a candidate for a universal standard ontology. Rather, it is intended as a starting point for comparing and elucidating the relationships with the other modules of the library, and also for clarifying the hidden assumptions underlying existing ontologies or linguistic resources such as WordNet. As reflected by its acronym, DOLCE has a clear cognitive bias, in the sense that it aims at capturing the ontological categories underlying natural language and human commonsense. DOLCE is an ontology of particulars, in the sense that its domain of discourse is restricted to them. The fundamental ontological distinction between universals and particulars can be informally understood by taking the relation of instantiation as a primitive: particulars are entities that have no instances; universals are entities that can have instances.
Properties and relations (corresponding to predicates in a logical language) are usually considered universals and thus are not classified by this ontology (although they occur insofar as they are needed to classify particulars). A basic choice adopted by DOLCE is the so-called multiplicative approach: different entities can be co-located in the same space-time. This assumption allows us to do justice to incompatible essential properties. A classical example is the distinction between a vase and its amount of clay: the vase does not survive a radical change in shape or topology, while the amount of clay does. DOLCE assumes that the vase and the corresponding amount of clay are two distinct, yet co-located, things, so that we can talk of the shape of the vase (but not of the clay) or the mass of the clay (inherited by the vase) without fear of contradictory claims. Another fundamental feature of DOLCE is the distinction between enduring and perduring entities, i.e. between what philosophers usually call continuants and occurrents. For instance, my copy of the newspaper I bought today is wholly present whenever it is present (it is an endurant), while my reading of the newspaper is not, since some of its temporal parts are not present (it is a perdurant). The main relation between endurants and perdurants is participation: an endurant lives in time by participating in some perdurant(s). DOLCE characterizes several other important notions and relations: among the notions, Qualities, Physical Objects, Social Objects, Events, Processes, Temporal Regions and Spatial Regions; among the relations, Participation, Parthood and Constitution.
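As an illustration only, the multiplicative choice and the participation relation can be mimicked with a few Python classes. The class names, the `location` label and the `participates_in` helper are hypothetical and are not taken from any DOLCE release (KIF or OWL). The point is that the vase and its clay are distinct objects sharing a location, with shape asserted only of the vase and mass inherited through constitution.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical stand-ins for DOLCE categories; not part of any DOLCE release.

@dataclass(eq=False)  # eq=False: two entities are equal only if identical
class Endurant:
    name: str
    location: str  # hypothetical label for the spatial region occupied

@dataclass(eq=False)
class AmountOfMatter(Endurant):
    mass_kg: float = 0.0

@dataclass(eq=False)
class PhysicalObject(Endurant):
    shape: str = ""
    constituted_by: Optional[AmountOfMatter] = None

    @property
    def mass_kg(self) -> float:
        # Mass is not stored on the object: it is inherited from the
        # amount of matter that constitutes it.
        return self.constituted_by.mass_kg if self.constituted_by else 0.0

@dataclass(eq=False)
class Perdurant:
    name: str
    participants: List[Endurant] = field(default_factory=list)

def participates_in(e: Endurant, p: Perdurant) -> bool:
    """An endurant 'lives in time' by participating in perdurants."""
    return any(x is e for x in p.participants)

clay = AmountOfMatter("clay#1", location="shelf", mass_kg=1.2)
vase = PhysicalObject("vase#1", location="shelf", shape="amphora",
                      constituted_by=clay)
newspaper = PhysicalObject("newspaper#1", location="desk", shape="folded")
reading = Perdurant("my reading of the newspaper", participants=[newspaper])

print(vase.location == clay.location, vase is clay)  # True False
print(vase.mass_kg)                                  # 1.2
print(hasattr(clay, "shape"))                        # False
print(participates_in(newspaper, reading))           # True
```

Identity, not location, distinguishes the two co-located entities, which is what lets the shape claim and the mass claim coexist without contradiction.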
DOLCE has quickly become a standard in formal ontology and, thanks to its availability in several formats (such as KIF and OWL), with modules specialized for specific subdomains and connections to natural-language resources (such as WordNet), it is used by several researchers around the world (see www.loa-cnr.it/DOLCE.html for more information and a partial list of users). Applications using DOLCE as a formal tool for the semantic integration of data span several areas, such as computational linguistics, agriculture, medicine, cultural resources, banking and insurance organization, legal document management, software engineering, knowledge engineering, and mobile robotics.

FaCT++ is an implementation of an OWL-Lite reasoner. It is a new generation of the well-known FaCT reasoner: it uses the established FaCT algorithms, but with a different internal architecture. Additionally, C++ was chosen as the implementation language in order to create a more efficient software tool and to maximise portability. During the implementation process, new optimisations were also introduced and some new features were added. FaCT++ is released under the GNU Public License, so it is available for download both as a binary and as source code (see http://owl.man.ac.uk/factplusplus/).

The promise of the emerging Semantic Web Services field is that machine-understandable semantics augmenting web services will facilitate their discovery and integration. Several projects have used semantic web service descriptions in very different application domains (e.g. bioinformatics grids, Problem Solving Methods). A common characteristic of these descriptions is that they rely on a generic description language, such as OWL-S, to specify the main elements of the service (e.g. inputs, outputs), and on an ontology containing knowledge in the domain of the service, such as the type of offered functionality (e.g. TicketBooking, CarRental) or the types of service parameters (e.g. Ticket, Car). The quality of the domain ontologies used influences the complexity of the reasoning tasks that can be performed with the semantic descriptions. For many tasks (e.g. matchmaking), it is preferable that web services are described according to the same domain ontology. This implies that the domain ontology should be generic enough to be used in many web service descriptions. Domain ontologies also formally depict the complex relationships that exist between domain concepts, and such rich descriptions allow complex reasoning tasks such as flexible matchmaking. We conclude that building quality (i.e. generic and rich) domain ontologies is at least as important as designing a generic web service description language such as OWL-S. The acquisition of semantic web service descriptions is a time-consuming and complex task whose automation is desirable, as many researchers in this field have noted. A pioneering work in this area, reported by Andreas Hess, aims to learn web service descriptions from existing WSDL files (WSDL, the Web Services Description Language, is the industry standard for syntactic web service descriptions) using machine learning techniques: it classifies these WSDL files into manually built task hierarchies.
Complementarily, we address the problem of building such hierarchies, i.e. domain ontologies of web service functionalities (e.g. TicketBooking). This task is a real challenge, since in many domains only a few web services are available, and these are not sufficient for building generic and rich ontologies. Our approach to building quality domain ontologies is motivated by the observation that, since web services simply expose existing software to web access, there is a large overlap (often a one-to-one correspondence) between the functionality offered by a web service and that of the underlying implementation. We therefore propose to build domain ontologies by analysing application programming interfaces (APIs). We investigate two research questions:
- Is it possible and useful to build a domain ontology from software APIs?
- Can we (semi-)automatically derive (part of) a domain ontology from APIs?
We verified our hypothesis in two different domains. First, we worked in the domain of RDF-based ontology stores. Tools for storing ontologies are of major importance for any semantic web application. While many tools offer ontology storage (a major ontology tool survey reported 14 such tools), very few are available as web services (two, according to the same survey). Therefore, in this domain it is problematic to build a good domain ontology by analysing only the available web services. Nevertheless, a good domain ontology is clearly a must, since we expect that many of these tools will soon become web services. We attempted to build a domain ontology by analysing the APIs of three tools (Sesame, Jena, KAON RDF API). The second domain was that of bioinformatics services: we experimented with extracting domain ontologies from the descriptions of a set of bioinformatics services employed by the myGRID project.
In both cases the results were very encouraging, as we were able to extract a significant amount of ontological knowledge. We therefore believe that our method will be a crucial innovation in the field of Semantic Web Services, supporting domain engineers in building adequate, high-quality domain ontologies and thereby boosting a web of semantically described services.
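The extraction procedure is not detailed here, so the following is only a simplified heuristic assumed for illustration: split method identifiers into words, treat the leading verb as a general functionality concept and the full name as a more specific one, and keep only concepts exposed by at least `min_support` of the analysed tools (a crude proxy for genericity). The method names and tool labels below are invented; they are not the actual Sesame, Jena or KAON APIs.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

def split_identifier(name: str) -> List[str]:
    """Split camelCase, acronym runs and snake_case into lowercase words."""
    s = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1 \2", name)  # "RDFModel" -> "RDF Model"
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", s)        # "addStatement" -> "add Statement"
    return [t.lower() for t in re.split(r"[\s_]+", s) if t]

def functionality_concepts(methods: List[str]) -> Dict[str, Set[str]]:
    """Map a general concept (the verb) to specific concepts (verb + object)."""
    hierarchy: Dict[str, Set[str]] = defaultdict(set)
    for method in methods:
        words = split_identifier(method)
        if len(words) < 2:
            continue  # no object noun: too vague to become a concept
        verb = words[0].capitalize()
        hierarchy[verb].add("".join(w.capitalize() for w in words))
    return hierarchy

def merge_apis(apis: Dict[str, List[str]], min_support: int = 2) -> Dict[str, Set[str]]:
    """Keep only concepts that at least `min_support` tools expose."""
    support: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for tool, methods in apis.items():
        for verb, concepts in functionality_concepts(methods).items():
            for concept in concepts:
                support[(verb, concept)].add(tool)
    merged: Dict[str, Set[str]] = defaultdict(set)
    for (verb, concept), tools in support.items():
        if len(tools) >= min_support:
            merged[verb].add(concept)
    return dict(merged)

# Invented method names for three hypothetical RDF stores.
apis = {
    "storeA": ["addStatement", "removeStatement", "getRDFModel"],
    "storeB": ["addStatement", "removeStatement", "createModel"],
    "storeC": ["add_statement", "executeQuery"],
}
print(merge_apis(apis))  # {'Add': {'AddStatement'}, 'Remove': {'RemoveStatement'}}
```

Raising `min_support` trades coverage for genericity; concepts seen in only one tool (here `GetRdfModel`, `CreateModel`, `ExecuteQuery`) would need human review rather than automatic inclusion.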

Application servers provide many functionalities commonly needed in the development of a complex distributed application. So far, these functionalities have mostly been developed and managed with the help of administration tools and corresponding configuration files, recently in XML. Though this constitutes a very flexible way of developing and administering a distributed application, e.g. an application server with its components, the disadvantage is that the conceptual model underlying the different configurations is only implicit. Hence, its bits and pieces are difficult to retrieve, survey, check for validity and maintain. To remedy such problems, we present here an ontology-based approach to support the development and administration of software components in an application server. The ontology captures the properties of, relationships between, and behaviors of the components that are required for development and administration purposes. The ontology is an explicit conceptual model with formal logic-based semantics. Therefore, its descriptions of components may be queried, may be used to anticipate required actions (e.g. preloading of indirectly required components), or may be checked to avoid inconsistent system configurations, during development as well as at run time. Thus, the ontology-based approach retains the original flexibility in configuring and running the application server, while adding new capabilities for the developer and user of the system. The proposed scheme resulted in an infrastructure called Application Server for the Semantic Web, which additionally facilitates plug'n'play engineering of ontology-based modules and thus the development and maintenance of comprehensive Semantic Web applications. Ontologies serve various needs in the Semantic Web, such as storage or exchange of data corresponding to an ontology, ontology-based reasoning or ontology-based navigation.
When building a complex Semantic Web application, one cannot rely on a single software module to deliver all these different services. The developer of such a system would rather want to easily combine different, preferably existing, software modules. So far, however, such integration of ontology-based modules has had to be done ad hoc, resulting in one-off efforts with little possibility for re-use and future extensibility of individual modules or of the overall system. The infrastructure is implemented in a system called KAON SERVER, which is part of the KArlsruhe ONtology and Semantic Web Toolsuite (KAON), http://kaon.semanticweb.org.
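To make the "preloading of indirectly required components" and the consistency check concrete, here is a plain-Python stand-in for two queries a component ontology could answer. The component names, the `REQUIRES` and `CONFLICTS` tables and the helper functions are hypothetical and do not reflect KAON SERVER's actual interfaces; in the described system, such knowledge lives in the ontology and is queried through its logic-based semantics.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical component descriptions: component -> directly required components.
REQUIRES: Dict[str, List[str]] = {
    "Reasoner": ["RDFServer"],
    "RDFServer": ["Storage", "Parser"],
    "Parser": ["XMLLibrary"],
    "Storage": ["DBConnector"],
    "XMLLibrary": [],
    "DBConnector": [],
}
# Hypothetical pairs of components that must not be deployed together.
CONFLICTS: Set[FrozenSet[str]] = {frozenset({"DBConnector", "InMemoryStore"})}

def required_closure(component: str, requires: Dict[str, List[str]]) -> Set[str]:
    """All components needed directly or indirectly (transitive closure)."""
    seen: Set[str] = set()
    stack = [component]
    while stack:
        for dep in requires.get(stack.pop(), []):
            if dep not in seen:  # also guards against dependency cycles
                seen.add(dep)
                stack.append(dep)
    return seen

def find_conflicts(deployment: Set[str],
                   conflicts: Set[FrozenSet[str]]) -> List[Tuple[str, ...]]:
    """Report every forbidden pair that is fully contained in a deployment."""
    return sorted(tuple(sorted(pair)) for pair in conflicts if pair <= deployment)

preload = required_closure("Reasoner", REQUIRES)
print(sorted(preload))
# ['DBConnector', 'Parser', 'RDFServer', 'Storage', 'XMLLibrary']
print(find_conflicts({"Reasoner", "InMemoryStore"} | preload, CONFLICTS))
# [('DBConnector', 'InMemoryStore')]
```

Running both checks before deployment mirrors the two uses named above: anticipating what must be loaded, and rejecting a configuration that would be inconsistent.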

A huge effort has been invested in the development of schema structures for existing information systems, such as XML DTDs, XML Schemas, relational database schemata or UML specifications of object-oriented software systems. The LiFT tool semi-automatically extracts light ontologies from such legacy resources. We restrict our attention to the most important ones, namely the W3C schema languages for XML (Document Type Definitions (DTDs) and XML Schema) and relational database schemata. We also provide a preliminary translation from UML-based software specifications to ontologies.
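The lifting rules used by LiFT are not given here. A common and much simplified set of rules, assumed purely for illustration, maps each table to a class, each non-key column to a datatype property and each foreign key to an object property whose range is the referenced table's class. The `Table` structure, the type mapping and the example schema are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Table:
    name: str
    columns: Dict[str, str]   # column name -> SQL type
    primary_key: str
    foreign_keys: Dict[str, str] = field(default_factory=dict)  # column -> table

# Assumed mapping from SQL base types to XML Schema datatypes.
SQL_TO_XSD = {"INTEGER": "xsd:integer", "VARCHAR": "xsd:string", "DATE": "xsd:date"}

def class_name(table: str) -> str:
    return table.capitalize()

def lift(tables: List[Table]):
    """Return (classes, datatype properties, object properties)."""
    classes = [class_name(t.name) for t in tables]
    datatype_props: List[Tuple[str, str, str]] = []  # (property, domain, range)
    object_props: List[Tuple[str, str, str]] = []
    for t in tables:
        domain = class_name(t.name)
        for column, sql_type in t.columns.items():
            if column == t.primary_key:
                continue  # keys identify individuals rather than describe them
            if column in t.foreign_keys:
                object_props.append((column, domain, class_name(t.foreign_keys[column])))
            else:
                base = sql_type.split("(")[0].upper()  # "VARCHAR(40)" -> "VARCHAR"
                datatype_props.append((column, domain, SQL_TO_XSD.get(base, "xsd:string")))
    return classes, datatype_props, object_props

schema = [
    Table("employee", {"id": "INTEGER", "name": "VARCHAR(40)", "dept": "INTEGER"},
          primary_key="id", foreign_keys={"dept": "department"}),
    Table("department", {"id": "INTEGER", "title": "VARCHAR(80)"}, primary_key="id"),
]
classes, dprops, oprops = lift(schema)
print(classes)  # ['Employee', 'Department']
print(dprops)   # [('name', 'Employee', 'xsd:string'), ('title', 'Department', 'xsd:string')]
print(oprops)   # [('dept', 'Employee', 'Department')]
```

The result is deliberately "light": it recovers classes and typed properties, but any richer semantics (e.g. disjointness or cardinalities) would still need the semi-automatic, human-assisted step.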

The Semantic Web is a vision for the future of the Web in which information is given explicit meaning, making it easier for machines to automatically process and integrate information available on the Web. The Semantic Web will build on XML's ability to define customized tagging schemes and RDF's flexible approach to representing data. The first level above RDF required for the Semantic Web is an ontology language that can formally describe the meaning of terminology used in Web documents. If machines are expected to perform useful reasoning tasks on these documents, the language must go beyond the basic semantics of RDF Schema. In February 2004, the W3C released the Web Ontology Language OWL as a Recommendation. OWL is used to publish and share ontologies, supporting advanced Web search, software agents and knowledge management. OWL is intended to be used when the information contained in documents needs to be processed by applications, as opposed to situations where the content only needs to be presented to humans. OWL can be used to explicitly represent the meaning of terms in vocabularies and the relationships between those terms. This representation of terms and their interrelationships is called an ontology. OWL has more facilities for expressing meaning and semantics than XML, RDF, and RDF-S, and thus goes beyond these languages in its ability to represent machine-interpretable content on the Web. OWL is a revision of the DAML+OIL web ontology language, incorporating lessons learned from the design and application of DAML+OIL. The definition of OWL was motivated by a number of use cases, detailed in the OWL Use Cases and Requirements Document, which also provides more details on ontologies and formulates design goals, requirements and objectives for OWL. OWL has been designed to meet the need for a Web Ontology Language and is part of the growing stack of W3C Recommendations related to the Semantic Web.
- XML provides a surface syntax for structured documents, but imposes no semantic constraints on the meaning of these documents.
- XML Schema is a language for restricting the structure of XML documents; it also extends XML with datatypes.
- RDF is a data model for objects ("resources") and relations between them; it provides a simple semantics for this data model, and these data models can be represented in an XML syntax.
- RDF Schema is a vocabulary for describing properties and classes of RDF resources, with a semantics for generalization hierarchies of such properties and classes.
- OWL adds more vocabulary for describing properties and classes: among others, relations between classes (e.g. disjointness), cardinality (e.g. "exactly one"), equality, richer typing of properties, characteristics of properties (e.g. symmetry), and enumerated classes.
WonderWeb provided significant input to the standardisation process, with a number of project members serving on the working group. Implementations developed during the project (such as the OWL API and FaCT++) were also key inputs to the standardisation process as implementation experience: the W3C requires that implementations be shown feasible before documents reach Recommendation status. Key theoretical work underpinning the standardisation process and possible extensions (as reported in the publications) was also contributed by WonderWeb members.
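The difference between the RDF Schema and OWL layers can be made concrete with a toy forward-chaining reasoner over triples. It applies two RDFS entailment rules (transitivity of `rdfs:subClassOf` and propagation of `rdf:type` along it, rules rdfs11 and rdfs9 of the RDF Semantics specification), then uses one piece of OWL vocabulary, `owl:disjointWith`, to detect a contradiction that RDFS alone cannot express. This is a sketch under simplifying assumptions (CURIE strings instead of IRIs, no other rules), not a conformant reasoner; the `ex:` names are made up.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
SUBCLASS, TYPE, DISJOINT = "rdfs:subClassOf", "rdf:type", "owl:disjointWith"

def rdfs_closure(triples: Set[Triple]) -> Set[Triple]:
    """Apply rdfs11 and rdfs9 until no new triples are derived."""
    graph = set(triples)
    while True:
        sub = [(s, o) for s, p, o in graph if p == SUBCLASS]
        # rdfs11: A subClassOf B, B subClassOf C  =>  A subClassOf C
        new = {(a, SUBCLASS, d) for a, b in sub for c, d in sub if b == c}
        # rdfs9: x type A, A subClassOf B  =>  x type B
        new |= {(x, TYPE, sup) for x, p, cls in graph if p == TYPE
                for sub_cls, sup in sub if cls == sub_cls}
        if new <= graph:
            return graph
        graph |= new

def disjointness_violations(graph: Set[Triple]) -> Set[str]:
    """Individuals typed with two classes declared disjoint (OWL-level check)."""
    return {x for a, p, b in graph if p == DISJOINT
            for x, q, cls in graph if q == TYPE and cls == a and (x, TYPE, b) in graph}

data = {
    ("ex:Vase", SUBCLASS, "ex:Artifact"),
    ("ex:Artifact", SUBCLASS, "ex:PhysicalObject"),
    ("ex:Artifact", DISJOINT, "ex:AmountOfMatter"),
    ("ex:myVase", TYPE, "ex:Vase"),
    ("ex:myVase", TYPE, "ex:AmountOfMatter"),  # deliberately contradictory
}
g = rdfs_closure(data)
print(("ex:myVase", TYPE, "ex:PhysicalObject") in g)  # True
print(disjointness_violations(g))                      # {'ex:myVase'}
```

Under RDFS alone, the two type assertions on `ex:myVase` simply coexist; only the OWL disjointness axiom turns them into a detectable inconsistency.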

The need for reliable ontologies in the Semantic Web (SW) has raised two complementary issues that require the development of strong methodologies for ontology development. On the one hand, researchers are constructing new, complex foundational ontologies to answer the demand for knowledge structures matching specific views of the world. On the other hand, the management of data and documents already available on the web requires a large number of domain ontologies, which should be clear and easily understood by users. Regarding the first issue, we have recognized two main distinctions in developing foundational ontologies: the descriptive vs. revisionary attitude, and the multiplicative vs. reductionist attitude. A descriptive ontology aims at describing the ontological assumptions behind language and cognition by taking seriously the surface structure of natural language and commonsense. A revisionary ontology, on the other hand, gives less importance to linguistic and cognitive aspects, and does not hesitate to suggest paraphrases of linguistic expressions or re-interpretations of cognitive phenomena in order to avoid ontological assumptions considered debatable on scientific grounds. Regarding the second distinction, we observed and clarified that a reductionist ontology aims at describing a great number of ontological differences with the smallest number of concepts, while in a multiplicative ontology expressivity is more relevant: the aim is to give a reliable account of reality despite the need for a larger number of basic concepts. Once these distinctions are recognized, we have isolated the major notions that mark the ontological character of a foundational ontology. These notions can be collected in four major groups, namely:
- universals, particulars and individual properties;
- abstract and concrete entities;
- endurants and perdurants;
- co-localized entities.
A strong methodology for the construction of foundational ontologies must include the crucial distinctions listed above, thereby making the developer aware of their consequences. As noted above, the other issue central to methodology is the development of domain ontologies for the SW. Several strategies have been exploited: machine learning, NLP techniques, semantic services, lifting existing metadata, etc. These strategies have different advantages according to the type of documents or domains: machine learning and NLP techniques try to extract useful recurrent patterns out of existing documents; semantic services try to generate semantically indexed, structured documents, e.g. out of transactions; and existing metadata can be considered proto-ontologies that can be "lifted" from legacy indexing tools and indexed documents. In other words, metadata lifting ultimately tries to reengineer existing document management systems into dedicated semantic webs. Legacy information systems often use metadata contained in Knowledge Organization Systems (KOSes), such as vocabularies, taxonomies and directories, in order to manage and organize information. KOSes support document tagging (thesaurus-based indexing) and information retrieval (thesaurus-based search), but their semantic informality and heterogeneity usually prevent a satisfactory integration of the supported documentary repositories and databases. As a matter of fact, traditional techniques mainly consist of time-consuming manual mappings, made by experts following idiosyncratic procedures each time a new source or a modification enters the lifecycle. Informality and heterogeneity make KOSes particularly ill-suited to the SW. In this case, our work concentrated on demonstrating KOS reengineering issues from the viewpoint of formal ontology; the main threads were therefore presented in the context of a concrete case study description rather than as explicitly addressed topics.
In this work, we described the methodology used for the creation, integration and utilization of ontologies for information integration and semantic interoperability in a specific domain. We expect that the clarifications and the examples provided on this topic will increase the reliability of the new foundational and domain ontologies. Furthermore, the overall discussion should increase the understanding of what ontologies are and how they may be classified according to their overall ontological structure.

To realize the vision of the Semantic Web, the Web Ontology Working Group was chartered to develop a standard language for expressing semantics on the web. The Web Ontology Language (OWL) comprises a standardized syntax for exchanging ontologies and specifies the semantics of the language, i.e. how the syntactic structures are to be interpreted. However, it is unclear precisely how to slice the pie between syntax and semantics in applications. Support for OWL in applications involves understanding how syntax and semantics interact (i.e. their interface). A number of issues relating to this split recur continually in the design of Semantic Web applications, e.g. in the development of OntoEdit, OilEd and KAON. The provision of APIs allows developers to work at a higher level of abstraction and to isolate themselves from some of the problematic issues related to the serialization and parsing of data structures. Our experience has shown that application developers can interpret language specifications such as DAML+OIL in subtly different ways, and confusion reigns as to the particular namespaces and schema versions used. The direct use of higher-level constructs can also help to alleviate problems with round tripping that occur when using concrete transport syntaxes based on RDF. (Round tripping refers to the process in which a data structure, e.g. an ontology, is serialized to and deserialized from some concrete syntax without loss of information.) The OWL API attempts to provide a highly reusable component for the construction of different applications such as editors, annotation tools and query agents. Besides allowing them to "talk the same language", it ensures that they share underlying assumptions about the way information is presented and represented. Thus a cornerstone of the successful implementation and delivery of the Semantic Web, namely the interoperability of applications, is achieved.
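The following Python sketch illustrates the idea of working with higher-level constructs and checking round tripping. It is not the OWL API (a Java library) and uses none of its classes: the axiom types, the made-up line-based syntax and the `serialize`/`parse` functions are hypothetical, and class names are assumed to contain no spaces or parentheses. Applications manipulate immutable axiom objects, and a round trip is lossless when parsing the serialized form yields an equal set of axioms.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union

@dataclass(frozen=True)
class SubClassOf:
    sub: str
    sup: str

@dataclass(frozen=True)
class DisjointClasses:
    first: str
    second: str

Axiom = Union[SubClassOf, DisjointClasses]

def serialize(axioms: FrozenSet[Axiom]) -> str:
    """Write axioms in a made-up, functional-style line syntax (sorted for stability)."""
    lines = []
    for ax in sorted(axioms, key=repr):
        if isinstance(ax, SubClassOf):
            lines.append(f"SubClassOf({ax.sub} {ax.sup})")
        elif isinstance(ax, DisjointClasses):
            lines.append(f"DisjointClasses({ax.first} {ax.second})")
        else:
            raise TypeError(f"unsupported axiom: {ax!r}")
    return "\n".join(lines)

def parse(text: str) -> FrozenSet[Axiom]:
    """Rebuild axiom objects from the line syntax produced by serialize()."""
    kinds = {"SubClassOf": SubClassOf, "DisjointClasses": DisjointClasses}
    axioms = set()
    for line in text.splitlines():
        head, _, rest = line.partition("(")
        axioms.add(kinds[head](*rest.rstrip(")").split()))
    return frozenset(axioms)

onto = frozenset({SubClassOf("Vase", "Artifact"),
                  DisjointClasses("Vase", "AmountOfMatter")})
print(serialize(onto))
# DisjointClasses(Vase AmountOfMatter)
# SubClassOf(Vase Artifact)
assert parse(serialize(onto)) == onto  # lossless round trip
```

Because applications compare axiom objects rather than serialized strings, two tools agree on an ontology even if their concrete syntaxes order or format it differently.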

In principle, a foundational ontology should cover (provide notions for, and allow extensions to) all possible subjects that exist according to the philosophical stance it takes. Because of this, foundational ontologies can be of considerable size and need to include several primitive notions and derived (defined) terms. The overall ontology, comprising hundreds of axioms and definitions, is usually quite complex, and it is hard to ensure its quality when maintenance and other developments require changing or expanding the ontology. To overcome these issues, it has been suggested to divide foundational ontologies into (sub-)modules. These can be seen as boxes that deal with independent primitive notions or even specialized domains. There are several advantages in dividing the ontology into modules: for instance, a module can be analyzed by the developer to check the characterization of a primitive in isolation from the rest of the system, or a module can be used to capture (and isolate) knowledge that is specific to some application domain only. Also, the relationships among modules give important information on the interdependence among notions defined in the ontology, thereby providing a very informative hierarchy of adopted/covered notions. The user can take advantage of the division into modules by focusing on those most relevant to her work. In this way, she can gain a better grasp of the notions she needs to use without being overwhelmed by the overall structure of the ontology. Furthermore, the structure of the modules facilitates the population of the ontology at lower levels of the hierarchy, since it makes it easier to identify the correct point where data should be added. The DOLCE ontology has a built-in preliminary division into modules, following the motivations listed above.
The modules have been isolated taking into account the basic relationships as well as the categories adopted, and the division is carried out on both the definitional part and the axiomatic part of the formal system. For instance, we have isolated the module of definitions on mereology, the module of definitions based on the notion of perdurant, and the one on dependence. The axiomatization is also divided into modules: for instance, we have a module that comprises axioms for parthood, another for constitution, and a third for participation.

The increasing awareness of the benefits of ontologies for information processing in open and weakly structured environments has led to the creation of a number of such ontologies for real-world domains. In complex domains such as medicine, these ontologies can contain thousands of concepts. Examples of such large ontologies are the NCI cancer ontology, with about 27,500 concepts, and the Gene Ontology, with about 22,000 concepts. Other examples can be found in the area of e-commerce, where product classifications such as the UNSPSC or the NAICS contain thousands of product categories. While useful for many applications, the size and monolithic nature of these ontologies cause new problems that affect different steps of the ontology life cycle.
Maintenance: Ontologies that contain thousands of concepts cannot be created and maintained by a single person. The broad coverage of such large ontologies normally requires a team of experts. In many cases, these experts will be located in different organizations and will work on the same ontology in parallel. An example of such a situation is the Gene Ontology, which is maintained by a consortium of experts.
Publication: Large ontologies are mostly created to provide a standard model of a domain to be used by developers of individual solutions within that domain. While existing large ontologies often try to cover a complete domain, the providers of individual solutions are often only interested in a specific part of the overall domain. The UNSPSC classification, for example, contains categories for all kinds of products and services, while the developers of an online computer shop will only be interested in the categories related to computer hardware and software.
Validation: The nature of ontologies as reference models for a domain requires a high degree of quality of the respective model. Since an ontology represents a consensus model, it is also important to have proposed models validated by different experts.
In the case of large ontologies, it is often difficult if not impossible to understand the model as a whole, due to cognitive limits. What is missing is an abstracted view of the overall model and its structure, as well as the possibility of focusing the inspection on a specific aspect.
Processing: On a technical level, very large ontologies cause serious scalability problems. The complexity of reasoning about ontologies is well known to be critical even for smaller ontologies. In the presence of ontologies like the NCI cancer ontology, not only reasoning engines but also modelling and visualization tools reach their limits. Currently, there is no OWL-based modelling tool that can provide convenient modelling support for ontologies of the size of the NCI ontology.
All these problems result from the fact that the ontology as a whole is too large to handle. Most problems would disappear if the overall model consisted of a set of coherent modules, each about a certain subtopic, that can be used independently of the other modules while still containing information about their relations to one another.
- In distributed development, experts could be responsible for a single module and maintain it independently of other modules, thus reducing revision problems.
- Users of an ontology could use a subset of the overall ontology by selecting a set of relevant modules. While only having to deal with this relevant part, they would still have the relations to other parts of the model available through the global structure.
- Validation of large ontologies could be done based on single modules, which are easier to understand. Since each module relates to a certain subtopic, it will be easier to judge the completeness and consistency of the model. Validated modules could be published early while other parts of the ontology are still under development.
- The existence of modules will enable the use of software tools that are not able to handle the complete ontology.
In the case of modelling and visualization tools, the different modules could be loaded one by one and processed individually. For reasoning tasks, we could make use of parallel architectures in which reasoners work on single modules and exchange partial results. In other areas, e.g. software engineering, these problems are tackled by partitioning monolithic entities into sets of meaningful and mostly self-contained modules. We have developed a similar approach for ontologies: a method for automatically partitioning large ontologies into smaller modules based on the structure of the class hierarchy. We have shown that this structure-based method performs surprisingly well on real-world ontologies, a claim supported by experiments carried out on real-world ontologies including SUMO and the NCI cancer ontology. The results of these experiments are available online at http://swserver.cs.vu.nl/partitioning/
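The partitioning method itself is only summarized here, so the following is a deliberately simplified stand-in, not the method evaluated on SUMO and the NCI ontology. Assuming a single-inheritance class tree, it keeps any subtree with at most `max_size` classes as a module and places the classes above those subtrees in a separate top-level module. The class names and the `max_size` parameter are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Set

def partition(parent_of: Dict[str, str], root: str, max_size: int) -> List[Set[str]]:
    """Split a class tree into subtree modules of at most max_size classes."""
    children: Dict[str, List[str]] = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)

    def subtree(node: str) -> Set[str]:
        nodes = {node}
        for child in children[node]:
            nodes |= subtree(child)
        return nodes

    modules: List[Set[str]] = []
    top: Set[str] = set()  # classes too general to fit in any small module

    def split(node: str) -> None:
        nodes = subtree(node)
        if len(nodes) <= max_size:
            modules.append(nodes)
        else:
            top.add(node)
            for child in children[node]:
                split(child)

    split(root)
    return ([top] if top else []) + modules

hierarchy = {  # child -> parent
    "Disease": "Thing", "Drug": "Thing",
    "Cancer": "Disease", "Infection": "Disease",
    "Lymphoma": "Cancer", "Carcinoma": "Cancer",
    "Antibiotic": "Drug", "Analgesic": "Drug",
}
for module in partition(hierarchy, "Thing", max_size=3):
    print(sorted(module))
# ['Disease', 'Thing']
# ['Cancer', 'Carcinoma', 'Lymphoma']
# ['Infection']
# ['Analgesic', 'Antibiotic', 'Drug']
```

Tiny modules such as `['Infection']` show a weakness of this naive size rule; a practical method would also weigh how strongly classes are connected, and would have to handle multiple inheritance.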

When ontologies are used as a means for describing knowledge about information on the web, not only does the information on the web change continuously, but so does the knowledge used to interpret it. Changes in the ontologies may affect the validity of tasks performed with them. We propose a framework for coping with change in distributed ontologies. The framework consists of two major elements. The first element is a language for representing ontology change. For this, we defined a taxonomy of change operations. Because it is influenced by the expressivity of the ontology language considered, the set of operations is to some extent language-specific. We derived the set by iterating over all the elements in the meta-model of the ontology language, creating "add", "delete" and, when appropriate, "modify" operations for all elements. In this way, we abstracted from representational issues and had a guarantee that we covered all possible modifications. To decide on which language to base our change representation, we compared two well-known knowledge representation formalisms: the OKBC knowledge model and the OWL (Full) ontology language. Comparing their respective knowledge models, we concluded that, strictly speaking, neither is a subset of the other. However, the constructs that are not present in OWL appear to be quite rare in practice. We therefore decided to use OWL as the basis for our change operations. In addition to the operations that are directly derived from the knowledge model of the ontology language, we also introduce complex operations. These operations can be used to group together several basic operations and/or to encode additional characteristics of the change operations. Operations that cluster other operations can be used when the constituent operations form a logical unit (e.g. removing something and adding it somewhere else), and when the composite effect of the operations differs from the effect of the operations on their own. Operations that encode additional knowledge can be used to define specialized variants of other operations, e.g. an operation specifying that the range of a property is restricted rather than just modified. Complex operations are useful both for visualizing and understanding changes and for determining their effect. The possibility to define complex changes forms an extension mechanism that allows for task- or domain-specific representations of change. Besides a representation for changes, the framework also comprises an abstract process model for ontology change management. Basically, this model describes the following steps:
- change information should be created from the sources that are available;
- heuristics, algorithms or human input should be used to enrich this information (e.g. resulting in a set of change operations); and
- ontology evolution related tasks can be performed with the help of the enriched change information.
We jointly developed two tools that can be used to create change information. We also specify several processes for deriving new information from existing change information. In addition, we describe how to perform four ontology evolution related tasks. First, we explain how an ontology can be used to access or interpret instance data of another version of the ontology. Second, we describe a procedure that heuristically determines the validity of mappings between ontology modules. This procedure predicts whether subsumption reasoning within one module is still valid if changes have occurred in an ontology from which concepts or relations are imported. Third, we adapt a methodology for the synchronization of related but independently evolving ontologies for use within our framework.
Finally, we show a tool that visualizes changes at an abstract level to help people understand them.
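The derivation of basic operations from a meta-model, and the two kinds of complex operations, can be sketched in a few lines. The meta-model below is a hypothetical, heavily abridged stand-in (the actual set is derived from OWL's knowledge model), and operation names such as `MoveClass`, `RestrictPropertyRange` and `WidenPropertyRange` are illustrative, not the framework's actual vocabulary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical, heavily abridged meta-model: element -> modifiable in place?
META_MODEL: Dict[str, bool] = {
    "Class": False,
    "Property": False,
    "SubClassOf": False,
    "PropertyDomain": True,
    "PropertyRange": True,
}

def basic_operations(meta_model: Dict[str, bool]) -> List[str]:
    """Derive add/delete (and, where appropriate, modify) operations per element."""
    ops: List[str] = []
    for element, modifiable in meta_model.items():
        ops += [f"Add{element}", f"Delete{element}"]
        if modifiable:
            ops.append(f"Modify{element}")
    return ops

@dataclass(frozen=True)
class Change:
    op: str
    args: Tuple[str, ...]

@dataclass(frozen=True)
class ComplexChange:
    name: str
    parts: Tuple[Change, ...]

def move_class(cls: str, old_parent: str, new_parent: str) -> ComplexChange:
    """Cluster two basic operations that form one logical unit."""
    return ComplexChange("MoveClass", (
        Change("DeleteSubClassOf", (cls, old_parent)),
        Change("AddSubClassOf", (cls, new_parent)),
    ))

def specialize(change: Change, is_subclass: Callable[[str, str], bool]) -> Change:
    """Encode extra knowledge: was a property range restricted or widened?"""
    if change.op == "ModifyPropertyRange":
        _, old_range, new_range = change.args  # (property, old range, new range)
        if is_subclass(new_range, old_range):
            return Change("RestrictPropertyRange", change.args)
        if is_subclass(old_range, new_range):
            return Change("WidenPropertyRange", change.args)
    return change

subclass_pairs = {("Car", "Vehicle")}
is_sub = lambda a, b: (a, b) in subclass_pairs
change = Change("ModifyPropertyRange", ("drives", "Vehicle", "Car"))
print(len(basic_operations(META_MODEL)))  # 12
print(specialize(change, is_sub).op)      # RestrictPropertyRange
```

Keeping specialized variants as separate operation names lets a downstream task (e.g. checking whether existing instance data stays valid) react differently to a restriction than to a widening.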
