Semantic Web and Peer-to-Peer

The Bibster system is an instance of the SWAP platform and demonstrates the scalability of knowledge sharing with Semantic Web and Peer-to-Peer technology. The system is aimed at researchers who share bibliographic metadata. Bibster builds on the fact that many researchers in computer science keep lists of bibliographic data that they must laboriously maintain by hand, that offer no easy overview, and that vary greatly in data quality. At the same time, many researchers are very willing to share these resources if they do not have to invest work in doing so. On the one hand, Bibster provides a scalability case study for the SWAP platform and gives valuable insights into issues such as technological stability and scalability and, in particular, into research questions. On the other hand, Bibster is a powerful tool for the research community that enables researchers to manage and handle their bibliographic metadata effectively. The application has reached an executable status and has been released as open-source software. More information on the SWAP project can be found at: http://swap.semanticweb.org

Peer-to-Peer systems have proven to be an effective way of sharing data. Modern protocols are able to efficiently route a message to a given peer. However, determining the destination peer in the first place is not always trivial. We developed a model in which peers advertise their expertise in the Peer-to-Peer network. The knowledge about the expertise of other peers forms a semantic topology. Based on the semantic similarity between the subject of a query and the expertise of other peers, a peer can select appropriate peers to forward queries to, instead of broadcasting the query or sending it to a random set of peers. To calculate the semantic similarity measure, we make the simplifying assumption that the peers share the same ontology. We evaluated the model in a bibliographic scenario in which peers share bibliographic descriptions of publications with each other. In simulation experiments we show how expertise-based peer selection improves the performance of a Peer-to-Peer system with respect to precision, recall and the number of messages. Furthermore, the model is evaluated in the Bibster case study.
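The selection step described above can be illustrated with a minimal sketch. It assumes a shared topic hierarchy and a simple path-based similarity. The hierarchy, the peer names, the similarity formula and the function names are all hypothetical and are not taken from the SWAP implementation.

```python
from typing import Dict, List, Optional

# Hypothetical shared topic hierarchy: child -> parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "Computer Science": None,
    "Information Systems": "Computer Science",
    "Databases": "Information Systems",
    "Information Retrieval": "Information Systems",
    "Software": "Computer Science",
    "Programming Languages": "Software",
}


def ancestors(topic: str) -> List[str]:
    """Return the path from a topic up to the root, starting with the topic."""
    path: List[str] = []
    current: Optional[str] = topic
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path


def similarity(a: str, b: str) -> float:
    """Illustrative measure: 1 / (1 + number of edges between a and b)."""
    path_a, path_b = ancestors(a), ancestors(b)
    depth_in_b = {topic: i for i, topic in enumerate(path_b)}
    for i, topic in enumerate(path_a):
        if topic in depth_in_b:  # first common ancestor
            return 1.0 / (1.0 + i + depth_in_b[topic])
    return 0.0


def select_peers(query_topic: str, expertise: Dict[str, List[str]], k: int = 2) -> List[str]:
    """Rank peers by their best-matching advertised topic and return the top k."""
    scored = sorted(
        ((max(similarity(query_topic, t) for t in topics), peer)
         for peer, topics in expertise.items()),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [peer for _, peer in scored[:k]]


# Hypothetical expertise advertisements received from other peers.
ADVERTISED = {
    "peer_a": ["Databases"],
    "peer_b": ["Programming Languages"],
    "peer_c": ["Information Retrieval"],
}

print(select_peers("Databases", ADVERTISED, k=2))
```

In this sketch a query about "Databases" is forwarded to the peer advertising that topic and to the peer advertising a sibling topic, rather than to all peers or a random subset.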

In Peer-to-Peer systems, the shared data is typically distributed redundantly, possibly with inconsistent representations. Consequently, one has to cope with query results that contain multiple instances that actually represent identical entities. To make query results useful, duplicates need to be detected. We developed an ontology-based model for duplicate detection using semantic similarity functions. As an example, we instantiated the model for Bibster, a bibliographic Peer-to-Peer system in which researchers share bibliographic metadata about publications.
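A minimal sketch of similarity-based duplicate detection for bibliographic records follows. It assumes that per-attribute similarities are combined as a weighted sum and compared against a threshold. The attributes, weights, threshold and individual similarity functions are illustrative assumptions, not the functions used in Bibster.

```python
from difflib import SequenceMatcher
from typing import Dict, List

# Hypothetical weights and threshold, chosen only for illustration.
WEIGHTS = {"title": 0.6, "authors": 0.3, "year": 0.1}
THRESHOLD = 0.8


def title_similarity(a: str, b: str) -> float:
    """String similarity of two titles, ignoring case and surrounding spaces."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def author_similarity(a: List[str], b: List[str]) -> float:
    """Jaccard overlap of the two author sets, ignoring case."""
    set_a = {name.lower() for name in a}
    set_b = {name.lower() for name in b}
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def year_similarity(a: int, b: int) -> float:
    """Exact match on the publication year."""
    return 1.0 if a == b else 0.0


def record_similarity(r1: Dict, r2: Dict) -> float:
    """Weighted sum of the per-attribute similarities."""
    return (
        WEIGHTS["title"] * title_similarity(r1["title"], r2["title"])
        + WEIGHTS["authors"] * author_similarity(r1["authors"], r2["authors"])
        + WEIGHTS["year"] * year_similarity(r1["year"], r2["year"])
    )


def is_duplicate(r1: Dict, r2: Dict) -> bool:
    """Treat two records as the same publication above the threshold."""
    return record_similarity(r1, r2) >= THRESHOLD


# Hypothetical records as they might arrive from two different peers.
first = {"title": "A Sample Title", "authors": ["A. Author", "B. Writer"], "year": 2004}
second = {"title": "a sample title ", "authors": ["a. author", "b. writer"], "year": 2004}
other = {"title": "Something Else Entirely", "authors": ["C. Person"], "year": 2002}

print(is_duplicate(first, second), is_duplicate(first, other))
```

In practice the weights and the threshold would have to be tuned against data for which the true duplicates are known.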

Huge amounts of bibliographic metadata are stored in BibTeX files. Many researchers have accumulated extensive collections of BibTeX files for their bibliographic references. However, these files are semi-structured, so individual attributes may be missing or may not be interpreted correctly. Another problem is that there are no well-defined interfaces for the exchange of standard BibTeX files. To interchange bibliographic data in a semantics-based Peer-to-Peer network, the data has to be represented in a structured and formal way. The use of standardized representations is decisive for sharing knowledge with other peers. This result is the implementation of BibToOnto, a component of Bibster for extracting explicit knowledge from bibliographic items. Plain BibTeX is transformed into an ontology-based knowledge representation. This transformation is used to give meaning to the information structures that are to be exchanged between peers.
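The kind of transformation described above can be sketched as follows. This is not BibToOnto itself: the namespace, the property names and the function name are hypothetical, and the parser handles only a single entry with simple `field = {value}` or `field = "value"` pairs without nested braces.

```python
import re
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespace for the generated resources and properties.
NS = "http://example.org/bib#"

ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,(.*)\}\s*$", re.DOTALL)
FIELD_RE = re.compile(r'(\w+)\s*=\s*(?:\{([^{}]*)\}|"([^"]*)")')


def bibtex_to_triples(entry: str) -> List[Triple]:
    """Turn one simple BibTeX entry into subject-predicate-object triples."""
    match = ENTRY_RE.match(entry.strip())
    if match is None:
        raise ValueError("not a single well-formed BibTeX entry")
    entry_type, key, body = match.groups()
    subject = NS + key
    triples: List[Triple] = [(subject, "rdf:type", NS + entry_type.capitalize())]
    for name, braced, quoted in FIELD_RE.findall(body):
        value = braced or quoted
        if name.lower() == "author":
            # BibTeX separates several authors with the keyword "and".
            for author in value.split(" and "):
                triples.append((subject, NS + "author", author.strip()))
        else:
            triples.append((subject, NS + name.lower(), value.strip()))
    return triples


SAMPLE = """
@article{doe2004,
  author = {Jane Doe and John Roe},
  title = {A Sample Title},
  year = {2004}
}
"""

for triple in bibtex_to_triples(SAMPLE):
    print(triple)
```

Splitting the author field into one statement per author is an example of making explicit what plain BibTeX leaves implicit in a string.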

A core assumption within SWAP is that there are multiple peers. All these peers act individually and thus have their own ontologies representing their knowledge. To allow easy interoperability, equal entities have to be identified, even if they originate from different peers. This functionality is provided by the alignment plug-in for the OntoEdit environment. OntoEdit is generally used for ontology editing in the Xarop application. When the user clicks the alignment command, the local repository is checked for entities that can be aligned. The result is presented to the user, who can then decide whether the mappings should be explicitly added to the local repository. Various ontology features (based on RDF(S)) and similarity measures are used to obtain the best possible alignment proposals.

In peer-to-peer networks, finding the appropriate answer to an information request, such as a query for RDF(S) data, depends on selecting the right peer in the network. We have investigated how social metaphors can be exploited effectively and efficiently to solve this task. To this end, we have defined a method for query routing, REMINDIN, in which a peer observes which queries are successfully answered by other peers, memorizes these observations, and subsequently uses this information to select the peers to forward requests to. REMINDIN has been implemented for the SWAP peer-to-peer platform as well as for a simulation environment.
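The observe, memorize and select cycle can be sketched in a few lines. This is a strong simplification under stated assumptions: queries are reduced to a single topic string, success is counted by the number of results, and the fallback for unknown topics is simply the first known peers. The class and method names are hypothetical and do not reflect the actual REMINDIN implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


class RoutingMemory:
    """Remembers which peers answered which topics and routes accordingly."""

    def __init__(self, known_peers: Iterable[str]) -> None:
        self.known_peers: List[str] = sorted(known_peers)
        # topic -> peer -> number of results observed so far
        self.answers: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))

    def observe(self, topic: str, peer: str, num_results: int) -> None:
        """Memorize a successful answer; empty answers are ignored."""
        if num_results > 0:
            self.answers[topic][peer] += num_results

    def select(self, topic: str, k: int = 2) -> List[str]:
        """Prefer peers that answered this topic before, then fall back."""
        remembered = self.answers.get(topic, {})
        chosen = sorted(remembered, key=lambda p: (-remembered[p], p))[:k]
        for peer in self.known_peers:
            if len(chosen) >= k:
                break
            if peer not in chosen:
                chosen.append(peer)
        return chosen


memory = RoutingMemory(["p1", "p2", "p3"])
memory.observe("rdf", "p3", 5)
memory.observe("rdf", "p2", 1)
memory.observe("rdf", "p1", 0)  # no results, so nothing is memorized

print(memory.select("rdf"))   # peers remembered for this topic come first
print(memory.select("p2p"))   # nothing remembered, fallback to known peers
```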

SeRQL is an RDF query language that has been developed on the basis of experiences with implementing and using different state-of-the-art query languages such as RQL and RDQL. The language is currently being implemented in the Sesame system, which is used to store the knowledge of peers in SWAP. The main feature of SeRQL that goes beyond the abilities of existing languages is the ability to define structured output in terms of an RDF graph that does not necessarily coincide with the model that has been queried. Some of SeRQL's most important features are graph transformation, RDF Schema support, XML Schema datatype support, an expressive path expression syntax, and optional path matching. SeRQL is now the main query language supported by the Sesame RDF storage and retrieval system. Sesame is one of the most widely used RDF stores, with a large user and developer community. Sesame is the backbone of product development and project work at the Dutch company Aduna. Typical projects include the creation of information portals for large organizations.

Knowledge management solutions relying on central repositories sometimes do not meet expectations, since users often create ad-hoc knowledge using their individual vocabulary and their own decentralized IT infrastructure (e.g., their laptop). To improve knowledge management for such decentralized and individualized knowledge work, it is necessary, first, to provide a corresponding IT infrastructure and, second, to deal with the harmonization of different vocabularies and ontologies. This result focuses on the harmonization of the participating ontologies in a distributed environment. The objective of this harmonization is to avoid the worst incongruencies by having users share a core ontology that they can extend for local use according to their individual needs. The task that then needs to be solved is one of distributed, loosely controlled and evolving engineering of ontologies.

As SWAP (Semantic Web and Peer-to-Peer) is a system without any central repository, every participant has to be enabled to easily provide their own knowledge. Ontoscrape extracts structures from the participant's personal computer; combining these structures with other background knowledge allows the creation of ontologies, viz. Emergent Semantics. Currently Ontoscrape can extract folders and files from MS Windows systems, emails and their structures from MS Outlook, addresses from the address book, and bookmark structures from MS Internet Explorer. Through an intuitive user interface, users can choose the structures they would like to share. The knowledge sources are then integrated automatically into the knowledge repository of the local peer. Even though its use seems very simple, various processes are working in the background. Ontoscrape connects to the different systems, such as MS Outlook and Internet Explorer. Files are extracted together with attributes such as author, creation date and format; for emails, topics and email addresses are additionally extracted; address book entries are saved with their most important attributes, such as name, email and address; and favourite links are saved with URL and title.

The integrated platform is a software framework that enables the development of various solutions for exchanging semantically rich data between loosely connected bodies in a Peer-to-Peer fashion. The modular architecture allows easy deployment of new applications built on top of the platform. The software provided manages peer-to-peer communication and the exchange of messages. The messages received from the network and generated from the user interface are routed in the form of RDF triples through a set of processing modules. The integrated platform provides simple implementations of all of these modules, but new modules can easily be built and integrated into the framework. The platform is written in Java. The peer-to-peer network layer is built on top of Sun's JXTA technology. The platform is used as a foundation for two applications built for the SWAP case studies: Xarop and Bibster. The SWAP integrated application platform has been released as open-source software and is available to interested parties for further extension.
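The idea of routing messages as RDF triples through pluggable processing modules can be sketched as a simple pipeline. The platform itself is written in Java; the following Python sketch only illustrates the architectural idea, and all class, function and module names are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

Triple = Tuple[str, str, str]
Module = Callable[[List[Triple]], List[Triple]]


class Pipeline:
    """Routes a message (a list of triples) through registered modules in order."""

    def __init__(self) -> None:
        self.modules: List[Module] = []

    def register(self, module: Module) -> "Pipeline":
        """Plug in a new processing module; returns self to allow chaining."""
        self.modules.append(module)
        return self

    def route(self, message: Iterable[Triple]) -> List[Triple]:
        triples = list(message)
        for module in self.modules:
            triples = module(triples)
        return triples


def drop_duplicates(triples: List[Triple]) -> List[Triple]:
    """Example module: remove repeated triples while keeping their order."""
    seen = set()
    unique = []
    for triple in triples:
        if triple not in seen:
            seen.add(triple)
            unique.append(triple)
    return unique


def keep_predicate(predicate: str) -> Module:
    """Example module factory: keep only triples with the given predicate."""
    def module(triples: List[Triple]) -> List[Triple]:
        return [t for t in triples if t[1] == predicate]
    return module


pipeline = Pipeline().register(drop_duplicates).register(keep_predicate("title"))

message = [
    ("pub1", "title", "A Sample Title"),
    ("pub1", "title", "A Sample Title"),
    ("pub1", "year", "2004"),
]
print(pipeline.route(message))
```

Because each module has the same signature, a new module can be added with a single `register` call without changing the routing code.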

Sesame is an architecture that allows persistent storage of RDF data and schema information and subsequent querying of that information. For persistent storage of RDF data, Sesame needs a scalable repository. Naturally, a Database Management System (DBMS) comes to mind, as these have been used for decades for storing large quantities of data. To keep Sesame DBMS-independent, all DBMS-specific code is concentrated in a single architectural layer of Sesame: the Storage And Inference Layer (SAIL). The SAIL is an application programming interface (API) that offers RDF-specific methods to its clients and translates these methods into calls to its specific DBMS. For some applications, such as P2P information sharing, the use of a DBMS as a storage device is not a good option because of the administrative overhead created by the database. For rather small data sets that require fast response times, a lightweight solution is preferable. In the SWAP project, an in-memory storage for RDF data has been developed as such a lightweight solution. The repository implementation can be accessed via the SAIL API and is part of the official Sesame distribution. The implementation has been shown to outperform the database-based implementations by orders of magnitude with respect to response time, and it caused a significant speedup in the exchange of RDF data in the SWAPSTER system. The development of the in-memory storage makes Sesame a lightweight storage infrastructure that is better suited for distributed and mobile applications.
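To illustrate why an in-memory store can be a lightweight alternative for small data sets, the following sketch keeps triples in ordinary Python collections and answers simple pattern queries. It is not the SAIL API (which is a Java interface) and makes no claim about how the Sesame in-memory repository is implemented; all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class MemoryStore:
    """A tiny in-memory triple store with a subject index."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()
        self._by_subject: Dict[str, Set[Triple]] = defaultdict(set)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        triple = (subject, predicate, obj)
        self._triples.add(triple)
        self._by_subject[subject].add(triple)

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> List[Triple]:
        """Return all triples matching the pattern; None acts as a wildcard."""
        if subject is not None:
            candidates = self._by_subject.get(subject, set())
        else:
            candidates = self._triples
        return sorted(
            t for t in candidates
            if (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


store = MemoryStore()
store.add("pub1", "title", "A Sample Title")
store.add("pub1", "year", "2004")
store.add("pub2", "year", "2004")

print(store.match(subject="pub1"))
print(store.match(predicate="year", obj="2004"))
```

No connection setup, schema creation or administration is needed here, which is the kind of overhead the text refers to; the trade-off is that the data does not survive the process and must fit in memory.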
