SEARCHIN: Searching in a networked world
The successful paradigm of searching for textual information on the Web cannot be applied directly to searching within the meta-information bases of emerging network-centric infrastructures. The searching problem therefore needs to be addressed in the context of the networked world, and search needs to become an integral part of future infrastructures. Through the proposed TOK-DEV project, the host organization seeks to extend its competences in parallel and distributed computing and to combine them with other areas of Computer Science (meta-data, information retrieval, semantics). These areas are required to achieve the scientific and technological advances necessary to bring the simplicity and effectiveness of Web search to the core of future global, network-centric computing systems.
The project will support the integration of 3 incoming experienced researchers into the host organization, along with the transfer of knowledge from 3 incoming senior researchers, and the exposure of local researchers to the research activities of partner organizations during 4 outgoing visits. This project will create a unique and sustainable team able to make the scientific contributions needed to realize the vision of effective and user-friendly search tools for network-centric infrastructures.
UNIVERSITY OF CYPRUS
Kallipoleos Avenue 75
NATIONAL AND KAPODISTRIAN UNIVERSITY OF ATHENS
UNIVERSITY OF MANCHESTER
Final Activity Report Summary - SEARCHIN (SEARCHing In a Networked world)
In particular, we worked on the following cases:
1. Software Retrieval in Grid and Cloud Computing Infrastructures: Software retrieval is concerned with locating and identifying appropriate software resources to satisfy users' requirements. It is considered to be one of the key technical issues in software reuse since "You must find it before you can reuse it". In this topic, we investigated the problem of supporting keyword-based searching for the discovery of software resources that are installed on the nodes of large-scale, federated Grid and Cloud computing infrastructures. We addressed a number of challenges that arise from the unstructured nature of software and the unavailability of software-related metadata in large-scale networked environments. We designed, developed and evaluated Minersoft, a harvester that visits Grid/Cloud infrastructures, crawls their file-systems, identifies and classifies software resources, extracts metadata, and discovers implicit associations between them.
The results of Minersoft harvesting are encoded in a weighted, typed graph, named the Software Graph. A number of algorithms were proposed to enrich this graph with structural and content associations, to annotate software resources with keywords, and to build inverted indexes that support keyword-based searching for software. We presented an evaluation study of our approach on a real test bed, using data extracted from production-quality Grid and Cloud computing infrastructures. Experimental results show that Minersoft is a powerful tool for software retrieval. The incoming fellow Dr George Pallis initiated the work in software retrieval. The work involved post-graduate and under-graduate students, and has so far led to one M.Sc. thesis at the University of Cyprus (another M.Sc. thesis is under way). Results have appeared in premium conferences and scientific journals, and datasets derived from this work have been made available to the scientific community.
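The keyword-annotation and inverted-index step described above can be illustrated with a minimal sketch. It is not Minersoft's implementation: the file paths, keywords, and the function names `build_inverted_index` and `search` are hypothetical, and the real system works on the weighted Software Graph rather than a flat dictionary.

```python
from collections import defaultdict


def build_inverted_index(resources):
    """Map each keyword to the set of resource ids annotated with it."""
    index = defaultdict(set)
    for resource_id, keywords in resources.items():
        for keyword in keywords:
            index[keyword.lower()].add(resource_id)
    return index


def search(index, query):
    """Return the resources that match every keyword of the query (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical software resources with the keywords a harvester might attach.
resources = {
    "/usr/lib/libfftw3.so": ["fftw", "fourier", "library"],
    "/opt/blast/bin/blastn": ["blast", "sequence", "alignment"],
    "/usr/share/doc/fftw/README": ["fftw", "documentation"],
}
index = build_inverted_index(resources)
matches = search(index, "fftw library")
```

A query such as `search(index, "fftw library")` intersects the posting sets of both keywords, so only resources annotated with all query terms are returned.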
2. A Query Formulation Language for the Data Web:
In this work, we presented a query formulation language (called MashQL) aiming at supporting the query and fusion of structured data published on the Web. The main novelty of MashQL is that it allows people with limited IT skills to explore and query one (or multiple) data sources without prior knowledge about the schema, structure, vocabulary, or any technical details of these sources. More importantly, to be robust and cover most practical cases, we do not assume that a data source has a schema, whether offline or inline. This poses several language-design and performance complexities that we addressed in our work. To illustrate the query formulation power of MashQL, and without loss of generality, we focused on a Data Web scenario, where data sources publish their data in RDF format. We chose to focus on querying RDF, as it is the most primitive data model; hence, MashQL can be similarly used for querying relational databases and XML.
We developed two implementations of MashQL: an online mashup editor and a Firefox add-on. The former illustrates how MashQL can be used to query and mash up the Data Web as simply as filtering and piping web feeds; the Firefox add-on illustrates using the browser as a web composer rather than only a navigator. Finally, we evaluated MashQL on querying two datasets, DBLP and DBPedia, and showed that our indexing techniques allow instant user interaction. The incoming fellow Dr Mustafa Jarrar initiated this line of work. The work involved post-graduate and under-graduate students, and has so far led to two M.Sc. theses and one B.Sc. thesis at the University of Cyprus. Results have appeared in premium CS conferences and scientific journals.
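The idea of exploring RDF data without a schema can be sketched as follows. This is not MashQL syntax or its implementation; it is a plain illustration that treats RDF as a list of (subject, predicate, object) triples and discovers the available properties from the data itself. The identifiers prefixed with `ex:` and the function names `match` and `properties_of` are hypothetical.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def properties_of(triples, subject):
    """List the distinct predicates used with a subject, read from the data, not a schema."""
    return sorted({p for (_, p, _) in match(triples, subject=subject)})


# Hypothetical bibliographic triples in the spirit of a DBLP-like source.
triples = [
    ("ex:paper1", "ex:title", "Searching in a networked world"),
    ("ex:paper1", "ex:author", "ex:alice"),
    ("ex:paper1", "ex:year", "2009"),
    ("ex:paper2", "ex:author", "ex:alice"),
    ("ex:paper2", "ex:year", "2010"),
]

available = properties_of(triples, "ex:paper1")
papers_by_alice = [s for (s, _, _) in match(triples, predicate="ex:author", obj="ex:alice")]
```

A query editor built on this idea could offer `available` as the list of choices for the next query step, so that the user never needs to know the vocabulary in advance.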
3. Information Dissemination and Location Queries in Dynamic Ad Hoc Networks:
Vehicular ad hoc networks (VANETs) have emerged as a platform to support intelligent inter-vehicle communication, to improve traffic safety and performance, and to collect and disseminate urban sensing data. The road-constrained, high mobility of vehicles, their unbounded power source, and the emergence of roadside wireless infrastructures make VANETs a challenging research topic. A key to the development of protocols for the retrieval of sensory information collected by moving vehicles lies in the knowledge of the topological characteristics of the VANET communication graph. In this line of work, we used data mining techniques to explore the dynamics of VANETs in urban environments. Using both real and realistic mobility traces, we studied the network shape of VANETs in urban environments under different transmission ranges and market penetration rates. Several latent facts about the VANET graph were revealed, and their implications for protocol design were examined. In this context, we developed a graphics-oriented, real-time visualisation tool for vehicular ad hoc network connectivity graphs.
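How a connectivity graph depends on the transmission range can be shown with a small sketch. It assumes a simple unit-disk model (two vehicles are linked when their distance does not exceed the range), which is a common simplification and not necessarily the model used in the project; the vehicle names, coordinates, and function names are hypothetical.

```python
import math
from itertools import combinations


def connectivity_graph(positions, tx_range):
    """Link every pair of vehicles whose distance is within the transmission range."""
    adjacency = {vehicle: set() for vehicle in positions}
    for a, b in combinations(positions, 2):
        if math.dist(positions[a], positions[b]) <= tx_range:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def connected_components(adjacency):
    """Group vehicles into clusters that can reach each other over multiple hops."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical snapshot of vehicle positions in metres.
positions = {"v1": (0, 0), "v2": (100, 0), "v3": (250, 0), "v4": (1000, 1000)}
clusters_short_range = connected_components(connectivity_graph(positions, 120))
clusters_long_range = connected_components(connectivity_graph(positions, 200))
```

With the shorter range the snapshot splits into three clusters, while the longer range merges `v1`, `v2` and `v3` into one, which is the kind of topological change such a study would track over time.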
The VANET work was conducted by incoming fellows Dr G. Pallis and Dr M. Spanakis, in collaboration with faculty members from the University of Cyprus, and several post-graduate and undergraduate students.
Other work in this thread focused on algorithms for reporting to every mobile user its k closest peers at any time. Our algorithm performs efficiently given the high mobility, large number, and skewed distribution of real-world cellular phone users. Our technique is applicable to any type of existing cellular network infrastructure. We exploit the arrangement of cellular network base-stations to achieve constant time (O(1)) for answering a single k-nearest neighbour (kNN) query and linear time (O(n)) for answering an all k-nearest neighbour (AkNN) query. This is the first work to tackle the AkNN problem in spatio-temporal applications, where both objects and queries are moving. Our theoretical analysis shows that we outperform state-of-the-art AkNN algorithms in our cellular network setup (O(n) as opposed to O(n log n)). Our experimental evaluation shows that we outperform state-of-the-art techniques for answering continuous kNN queries. This work was initiated by incoming fellow Dr G. Chatzimilioudis, in collaboration with faculty members from the Universities of Cyprus and Athens. A number of post-graduate and undergraduate students are also actively involved in implementation issues.
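To make the AkNN problem statement concrete, the following sketch computes the answer by brute force. It is a baseline for illustration only and is not the project's algorithm: it compares every pair of users and therefore does not reach the O(n) bound described above, and it does not use base-station information. The user names, coordinates, and the function name `all_knn` are hypothetical.

```python
import heapq
import math


def all_knn(positions, k):
    """For every user, return its k closest other users, nearest first (brute force)."""
    result = {}
    for user, point in positions.items():
        # Pair each other user with its distance so nsmallest can rank them.
        candidates = (
            (math.dist(point, other_point), other)
            for other, other_point in positions.items()
            if other != user
        )
        result[user] = [other for _, other in heapq.nsmallest(k, candidates)]
    return result


# Hypothetical snapshot of user positions at one time step.
positions = {"a": (0, 0), "b": (1, 0), "c": (3, 0), "d": (10, 0)}
neighbours = all_knn(positions, k=2)
```

In a continuous setting this computation would have to be repeated whenever users move, which is why the cost of a single AkNN evaluation matters.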
4. Power-efficient Architectures for Data Centres:
Power efficiency is becoming a critical problem in the operation of Data Centres. Incoming fellow Dr Michele Mazzucco initiated this line of work, applying his expertise in queuing theory to model some aspects of power-aware management of data-centre workloads.