"Over the last decades, the amount of data has increased in an unprecedented rate, leading to a new terminology: ""Big Data"". Big data are specified by their Volume, Variety, Velocity and by their Veracity/Imprecision. Based on these 4V specificities, it has become difficult to quickly acquire the most useful information from the huge amount of data at hand. Thus, it is necessary to perform data (pre-)processing as a first step. In spite of the existence of many techniques for this task, most of the state-of-the-art methods require additional information for thresholding and are neither able to deal with the big data veracity aspect nor with their computational requirements. This project's overarching aim is to fill these major research gaps with an optimised framework for big data pre-processing in certain and imprecise contexts. Our approach is based on Rough Set Theory (RST) for data pre-processing and Randomised Search Heuristics for optimisation and will be implemented under the Spark MapReduce model.
The project combines the expertise of the experienced researcher, Dr Zaineb Chelly Dagdia, in machine learning, rough set theory and information extraction with the knowledge of optimisation and randomised search heuristics of the supervisor, Dr Christine Zarges, at the University of Birmingham (UoB). Further expertise is provided by internal and external collaborators from academic and non-academic institutions, namely Prof Tino (UoB), Prof Merelo (University of Granada), Prof Lebbah (University of Paris 13) and Philippe Barra (Arrow Group). The involvement of Arrow Group, an SME based in France and specialised in big data, banking, finance and insurance, is of particular importance to ensure that real-world requirements are met throughout the development of the framework.
Fields of science
- natural sciences > computer and information sciences > artificial intelligence > heuristic programming
- natural sciences > computer and information sciences > data science > big data
- medical and health sciences > clinical medicine > cancer > breast cancer
- natural sciences > mathematics > applied mathematics > mathematical model
- natural sciences > computer and information sciences > data science > data mining