Community Research and Development Information Service - CORDIS

Periodic Report Summary 1 - SIIP (Speaker Identification Integrated Project)

Project Context and Objectives:
Fused Speaker Identification:

SIIP will develop a high-performance, innovative and sustainable Speaker Identification (SID) solution that runs over a large database (blacklist) of voiceprints (voice samples). The solution is based on the integration and fusion of a series of state-of-the-art speech analytics algorithms, which include voiceprint recognition, gender identification, age identification, language and accent identification, keyword and taxonomy spotting, and voice cloning detection. This fusion, named ‘F-SID’ (Fused Speaker Identification), is the core technology of SIIP. It is expected to yield a much higher true-positive identification rate than any of these algorithms used stand-alone, to reduce false-positive and false-negative detections, and to increase the reliability, confidence and judicial admissibility of the speaker identification.
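
The report does not say how F-SID combines the individual algorithms. As a minimal sketch only, assuming each sub-system emits a match score in [0, 1], a simple weighted (logistic) score-level fusion could look like the following. The weights, the bias and the name fuse_scores are hypothetical and chosen purely for illustration; voice cloning detection is left out of the sketch.

```python
import math
from typing import Dict

# Hypothetical weights, one per sub-system. In a real system they would be
# learned from labelled trials rather than set by hand.
WEIGHTS: Dict[str, float] = {
    "voiceprint": 3.0,
    "gender": 0.5,
    "age": 0.3,
    "language_accent": 0.8,
    "keyword": 0.4,
}
BIAS = -2.5  # hypothetical offset


def fuse_scores(scores: Dict[str, float]) -> float:
    """Combine per-algorithm match scores (each in [0, 1]) into one fused score.

    Sub-systems missing from `scores` contribute nothing; an unknown
    sub-system name raises KeyError.
    """
    z = BIAS + sum(WEIGHTS[name] * value for name, value in scores.items())
    return 1.0 / (1.0 + math.exp(-z))  # logistic squashing to (0, 1)


# Usage: the same voiceprint score, with and without supporting evidence.
voiceprint_only = fuse_scores({"voiceprint": 0.9})
with_support = fuse_scores({
    "voiceprint": 0.9,
    "gender": 1.0,
    "age": 1.0,
    "language_accent": 1.0,
    "keyword": 1.0,
})
```

In this sketch, agreement from the auxiliary algorithms raises the fused score above what the voiceprint score alone would give, which is the intuition behind fusing several weak cues with one strong one.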

SIIP solution adaptability:

The SIIP solution will enable law enforcement agencies (LEAs) to run F-SID over all voice communication media that can be lawfully intercepted under a court warrant against a specific suspect. These include Internet VoIP applications such as G-talk, Skype, Viber, Tango and ooVoo; PSTN; cellular; and SATCOM (satellite communication). Some of these applications are encrypted (e.g. Skype), while Tango, Viber and ooVoo are not. Although Skype is encrypted, an LEA may use a Black Box to decrypt it, obtaining the encryption keys from the vendor and using them to decrypt Skype traffic inside the monitoring/decoder. Other voice communication media are of the “recorded” type and include social media such as YouTube or Facebook.

SIIP Solution scalability:

SIIP can be implemented at the headquarters level (where the monitoring center is located) and at the tactical level (where the LEA investigation is carried out in geographical vicinity to the suspect's location, using a portable system that intercepts cellular channels, WiFi, xDSL or LAN), with data exchanged in real time between the two levels (Tactical<>Headquarters).

Multisource voiceprint enrolment:

Voiceprints are enrolled by voice sampling. For better enrolment of suspect voiceprints, SIIP will make use of various enrolment methods:
(i) Enrolment of voiceprints at the suspect's point of presence (POP), where high-quality voice recording sensors and kits should be deployed (covertly) near the suspect's physical location.
(ii) Enrolment of voiceprints from open-source (public) social media (e.g. YouTube, Facebook), where individuals upload content that includes their voice.
(iii) Enrolment of voiceprints from lawfully intercepted calls from telephony and Internet peer-to-peer VoIP sources.

Rich metadata enrolment:

SIIP will enrol rich metadata that will be associated with the enrolled voiceprints. This metadata includes, for example, IP addresses, e-mail addresses, chat nicknames, social network identifiers, social network connections, cell phone identities and many more. It will be enrolled from open Internet sources and from lawfully intercepted telephony and Internet (VoIP) data.

International cooperation between LEAs:

The use of the “Sharing Center” would significantly improve the effectiveness of international police cooperation and assist in the identification of suspects, who can then be brought to justice.
SIIP includes a centralized, secured database, to be located at INTERPOL, for analysing voice samples (retrieving and analysing voiceprints for identification purposes), enhanced with rich metadata on individuals who are subject to requests for international police cooperation.

Privacy by design concept:

The SIIP integrated solution will follow the privacy by design concept and ensure that privacy is respected at the level of each module developed and at the level of the overall system integration. Safeguards will be included at different levels of the system to prevent misuse of the SIIP solution.

Project Results:
The first year of the SIIP project aimed at defining the requirements and ethics issues, defining the architecture of the SIIP solution, and laying the foundations for work on the various developments needed to realize the proof-of-concept system and the final system, including the metadata fusion.

The concrete results of the SIIP project for the first year of its activities were:

• The precise operational requirements of end-users for speaker identification within INTERPOL, police and other law enforcement agencies (LEAs) have been identified.
• The end-user requirements were used to define the functional specifications and to establish criteria for later validation and testing.
• Use cases were defined to identify the different ways in which end-users will utilize the SIIP system (incl. the SISC); these use cases will determine the scenarios for the field tests in the last phase of the project.

• Ethical, legal and social issues arising in the use of speech recognition technology in the field of criminal justice were identified, analysed and assessed.
• Practical guidance was produced for end-users of speech identification technologies (e.g. on the risks of using such technologies in their work).
• Advice was provided throughout the course of the project so as to ensure that SIIP end-products are fully compatible/compliant with current trends in European and international privacy and data protection standards (e.g. PbD principles).

Both a high-level architecture of the entire system and a detailed design for each module were prepared:
• The architecture and design are based on the end-user and functional requirements, while addressing the DoW PbD (Privacy by Design) guidelines.
• The consortium designed the SIIP hardware and software layout and provided a recommendation document to the LEAs in the project. The design and layout were tested in a test-bed environment.

Within the first period of WP4, the consortium members made the following progress:
• Based on the input from other work packages, WP4 members designed and developed two non-relational database systems to maintain information coming from open source intelligence and from lawful interception systems (within the SIIP project, the interception system is implemented by a simulator)
• Setting up of several social network crawlers that are capable of gathering information from Facebook, Twitter, Google Plus, LinkedIn and YouTube
• Setting up of an engine on top of the crawlers that can initiate searches within the entire OSINT search ecosystem and bring the results into a common data structure
• Development of a lawful interception simulator that can emulate real interception systems with all of their characteristics, e.g. sound quality, typical noise, data structure and meta information
• Communication interfaces with other modules
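
The report does not describe the common data structure used by the OSINT engine. As a rough sketch under that caveat, results from differently shaped crawler outputs could be mapped onto one record type as shown below. The record fields, the source names source_a and source_b, and the raw keys are all hypothetical and do not correspond to any real social network API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class OsintRecord:
    """Hypothetical common structure for one crawled item."""
    source: str
    user_id: str
    display_name: str
    media_url: str
    extra: Dict[str, Any] = field(default_factory=dict)


# Hypothetical per-source mappings: raw key -> common field name.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "source_a": {"uid": "user_id", "name": "display_name", "url": "media_url"},
    "source_b": {"account": "user_id", "handle": "display_name", "clip": "media_url"},
}


def normalize(source: str, raw: Dict[str, Any]) -> OsintRecord:
    """Map one raw crawler result onto the common record type."""
    mapping = FIELD_MAPS[source]  # KeyError for an unknown source
    # Missing raw keys become empty strings so every common field is set.
    common = {target: str(raw.get(key, "")) for key, target in mapping.items()}
    # Anything the mapping does not cover is kept as source-specific metadata.
    extra = {k: v for k, v in raw.items() if k not in mapping}
    return OsintRecord(source=source, extra=extra, **common)


# Usage: two differently shaped results end up in the same structure.
rec_a = normalize("source_a", {"uid": "42", "name": "alice", "url": "http://example.org/v1", "lang": "en"})
rec_b = normalize("source_b", {"account": "7", "handle": "bob", "clip": "http://example.org/v2"})
```

Keeping unmapped keys in an extra dictionary suits a non-relational store, where records from different sources need not share a fixed schema.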

• The H-SIIP Enrollment System merges two enrolment engines
• The H-SIIP Enrollment System allows the creation of custom metadata fields
• The H-SIIP Enrollment services were integrated into the web-lab orchestrator
• The T-SIIP Application integrates the NUAN mobile library
• Initial design of the Info Sharing Center interfaces

• New methods applied on top of a state-of-the-art i-vector extractor were developed and tested
• Model adaptation exploiting unsupervised techniques was researched in the context of SIIP
• Adaptation toward robustness (noise and channel mismatch) was analysed using VTS techniques
• Novel technologies inspired by ASR research were deployed for speaker identification
• Development of the first version of the speaker-identification engines (supporting enrolment and identification, and communicating with the enrolment database and the orchestrator)
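
The report does not state how the engines compare voiceprints. As an illustration only, the enrolment and identification steps can be sketched with cosine similarity, a common baseline for comparing i-vector-style embeddings. The class SpeakerDatabase, its methods and the 0.7 threshold are hypothetical, and the short vectors stand in for real embeddings, which typically have hundreds of dimensions.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SpeakerDatabase:
    """Hypothetical in-memory store of enrolled voiceprints."""

    def __init__(self) -> None:
        self._voiceprints: Dict[str, Vector] = {}

    def enroll(self, speaker_id: str, embedding: Vector) -> None:
        """Store (or overwrite) the reference embedding for a speaker."""
        self._voiceprints[speaker_id] = list(embedding)

    def identify(self, embedding: Vector, threshold: float = 0.7) -> Optional[Tuple[str, float]]:
        """Return (speaker_id, score) of the best match, or None if below threshold."""
        best: Optional[Tuple[str, float]] = None
        for speaker_id, reference in self._voiceprints.items():
            score = cosine(embedding, reference)
            if best is None or score > best[1]:
                best = (speaker_id, score)
        if best is not None and best[1] >= threshold:
            return best
        return None


# Usage with toy three-dimensional "embeddings".
db = SpeakerDatabase()
db.enroll("speaker_A", [1.0, 0.0, 0.0])
db.enroll("speaker_B", [0.0, 1.0, 0.0])
match = db.identify([0.9, 0.1, 0.0])      # closest to speaker_A
no_match = db.identify([0.0, 0.0, 1.0])   # orthogonal to both references
```

Returning None below the threshold models the open-set case, where the speaker in the recording is not in the enrolled database at all.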

• Preliminary implementation of the initial Prototype Portal, based on specifications delivered by end-users and available for requirements elicitation
• Testing of scenarios on each module separately (e.g. HTTP Server, Database Module, Socket Module, etc.) to identify possible points of failure
• Generation of back-end engines and front-end layout (User Interfaces)

• The integration framework (SW and HW) was set up
• The integration of state-of-the-art speech processing modules provided by WP6 and WP5 has started
• The integration of the data collection workflow provided by WP4 has started

Initial dissemination activities were performed to establish awareness of the SIIP project through:
• Organization of the first SIIP workshop, collecting end-user feedback on the requirements
• Promotion of the SIIP project through presentations in specific conferences
• Creation of the SIIP project website

Potential Impact:
Economic - SIIP's breakthrough technologies for voice biometrics and speaker identification will enable LEAs to detect a suspect's identity quickly and accurately during lawful investigations. They could contribute substantially to identifying terrorist or criminal threats in time and to solving hostage-taking and ransom-demand cases. Effective suspect identification will help reduce the high costs of terror and crime and can translate into substantial monetary savings. It will also significantly shorten the time LEAs spend chasing wrong leads.

The work achieved so far will contribute significantly to speech recognition technologies. The consortium partners expect SIIP to have the following impacts on society and the economy:

Enhanced tools for fighting organized crime and terrorism - Speech recognition technology is an important tool in successfully prosecuting kinds of illegal activity that involve actors who are forensically aware, sophisticated and able to pay large sums for technological help and inside information.
