We are currently researching and enhancing a “standardised approach required for both analysis and evidence provenance”.
The expected progress beyond the state of the art relates to the platform's widespread use of the Cyber-investigation Analysis Standard Expression (CASE) format, which is directly relevant to all data-related components.
This format will facilitate the provenance and exchange of evidence and may also be used for tool validation in future.
The manner in which we map the CASE data format to INSPECTr storage is not only innovative but necessary for data analysis and discovery of linked data.
We map the unwieldy format into data structures that are better suited to fast queries and link analysis, namely a combination of scalable storage for binary files, graph databases and indexed documents.
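The mapping described above can be illustrated with a minimal sketch. It is not the INSPECTr implementation: it assumes a simplified, CASE-like JSON-LD bundle whose property names are only loosely modelled on CASE/UCO, uses a hypothetical `ex:blobPath` property to point at binary content, and uses plain Python containers as stand-ins for the graph database, the document index and the binary store.

```python
# Simplified, CASE-like JSON-LD bundle. Illustrative only: it is not validated
# against the CASE/UCO ontology, and "ex:blobPath" is a hypothetical property.
bundle = {
    "@graph": [
        {"@id": "kb:device-1", "@type": "uco-observable:Device"},
        {
            "@id": "kb:file-1",
            "@type": "uco-observable:File",
            "uco-core:hasFacet": [
                {
                    "@type": "uco-observable:FileFacet",
                    "uco-observable:fileName": "holiday.jpg",
                    "uco-observable:sizeInBytes": 2048,
                }
            ],
            "ex:blobPath": "evidence/holiday.jpg",
        },
        {
            "@id": "kb:rel-1",
            "@type": "uco-observable:ObservableRelationship",
            "uco-core:source": {"@id": "kb:file-1"},
            "uco-core:target": {"@id": "kb:device-1"},
            "uco-core:kindOfRelationship": "Contained_Within",
        },
    ]
}


def split_bundle(bundle):
    """Split a bundle into edges, flat documents and blob references."""
    edges = []      # stand-in for a graph database: (source, relation, target)
    documents = {}  # stand-in for a document index: id -> flat dict
    blobs = {}      # stand-in for binary storage: id -> location of the bytes

    for node in bundle.get("@graph", []):
        node_id = node["@id"]

        # Relationship objects become graph edges rather than documents.
        if "uco-core:source" in node and "uco-core:target" in node:
            relation = node.get("uco-core:kindOfRelationship", "related_to")
            edges.append(
                (node["uco-core:source"]["@id"], relation,
                 node["uco-core:target"]["@id"])
            )
            continue

        # Flatten facet properties into one document, dropping the prefixes.
        doc = {"type": node["@type"]}
        for facet in node.get("uco-core:hasFacet", []):
            for key, value in facet.items():
                if not key.startswith("@"):
                    doc[key.split(":", 1)[-1]] = value
        documents[node_id] = doc

        # Binary content is referenced by location, not stored in the index.
        if "ex:blobPath" in node:
            blobs[node_id] = node["ex:blobPath"]

    return edges, documents, blobs


def neighbours(edges, node_id):
    """Return (relation, other_id) pairs linked to node_id in either direction."""
    linked = []
    for source, relation, target in edges:
        if source == node_id:
            linked.append((relation, target))
        elif target == node_id:
            linked.append((relation, source))
    return linked


edges, documents, blobs = split_bundle(bundle)
```

In this sketch, a link-analysis question such as "what is connected to this device?" becomes a lookup over `edges` via `neighbours(edges, "kb:device-1")`, while attribute searches run against the flat entries in `documents`, which is the kind of separation the paragraph above describes.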
The consortium also expects to progress beyond the state of the art in various tasks linked to Natural Language Processing technology, which include:
i. Data independence: We are developing methods to process and correlate data independently of the input format.
ii. Seriality detection (cross-correlation and investigation profiling): Our cross-correlation method is original and specialised in finding series (patterns/clusters) in LEA data.
iii. Data extraction wrapper for OSINT: We have created a semi-automatic wrapper that can identify the parts of a webpage that may be of interest for investigative purposes. It will greatly simplify and speed up the creation of parsers for specific sites.
The development of highly complex use cases and related datasets, similar to real LEA investigations, goes beyond the existing “single purpose” datasets available outside this project.
The role and tasks of the criminal analyst, in support of complex investigations, are facilitated by the integration of all potential types of evidence source, as well as by the ingestion and standardisation of existing commercial and free specialised tools.
In terms of ethics, the INSPECTr project progresses beyond the state of the art in three ways.
First, it applies ethics and privacy-by-design to the law enforcement and digital forensics domain. Much of the current work on the principles of ethics and privacy-by-design has focussed on the design of commercial technologies, so INSPECTr is applying these theoretical concepts in a new domain.
Secondly, ethical and privacy concerns with respect to LEAs have mostly focussed on how police interact with the public, on surveillance practices, and on whether forensic activities are reliably accurate. In INSPECTr, however, ethical concepts about law enforcement are being combined with ethical considerations about technology, producing a more granular set of principles that leads to concrete design solutions.
Thirdly, because the INSPECTr project is developing beyond-state-of-the-art technologies, it necessarily raises novel ethical concerns. For example, during the Gender and AI workshop, state-of-the-art ethical concerns were discussed in relation to state-of-the-art technologies in order to develop beyond-state-of-the-art design solutions. This process simultaneously identifies and mitigates discrimination and algorithmic bias issues in the law enforcement and digital forensics domain, which has not traditionally been an area of focus for such topics.
At the conclusion of the INSPECTr project, we expect that there will be a more refined methodology for applying ethics and privacy-by-design principles to the law enforcement and digital forensics domains that is sensitive to the needs of end-users of the INSPECTr technologies. Further, a more detailed understanding of the legal frameworks that can be used to adequately regulate the use of LEA closed case file data in research will be reached, and this could be used in future research projects.