CORDIS - EU research results

Speaker Identification Integrated Project

Final Report Summary - SIIP (Speaker Identification Integrated Project)

Executive Summary:
One of the prominent challenges encountered by law enforcement agencies (LEAs) fighting crime and terrorism is the use of multiple and arbitrary identities by terrorists and criminals.

The SIIP project aimed to overcome this challenge and give LEAs better intelligence capabilities while working within the appropriate privacy, legal and ethical constraints.

Over its four years of operation, the SIIP project developed a high-performance, innovative Speaker Identification (SID) solution running over a large voiceprint database. The solution was based on the integration and fusion of a series of state-of-the-art speech analytic engines, including voiceprint recognition, gender identification, age identification, language and accent identification, and keyword and taxonomy spotting. This fusion was the core technology of SIIP and we named it ‘F-SID’ (Fused Speaker Identification). The SIIP integrated solution was based on the "privacy by design" principle, and privacy was respected at the level of each module developed and at the level of the overall system integration. Substantial safeguards, including a dedicated ethics advisory board, were put in place at the start of the project. The board continued to operate until after the final review, approximately 2 months after the end of the SIIP project.

The SIIP system was designed, developed and tested together with INTERPOL and four LEAs in Europe (Carabinieri-Italy, Metropolitan Police-UK, PJ-Portugal and BKA-Germany), taking into account and complying with European legal, societal, privacy and ethical norms as well as INTERPOL’s rules and regulations. An EU legal framework and code of conduct was set up to enable the use of the SIIP solution in accordance with EU regulations, both at the EU level and at the level of each EU country, and in a way that is compatible with INTERPOL’s rules and regulations.

Testing the technology showed much higher true-positive identification of individuals than any of these speech analytic engines achieved as a stand-alone, considerably reducing false-positive and false-negative detections and increasing the reliability, confidence and judicial admissibility of the speaker identification.

The international exchange of speaker identification data was also tested and appears to have the potential to significantly improve the effectiveness of international police cooperation and to assist in the identification of suspects who can then be brought to justice. The facilitation of such cooperation through INTERPOL, provided that INTERPOL receives the mandate to do so, could have a significant positive impact on the fight against crime and terrorism.

The partners in SIIP were at the forefront of technological development in this area and relished the joint work of addressing the challenges of voice analysis and speaker identification. The SIIP consortium consisted of a multidisciplinary team comprising research groups and commercial companies: 5 research institutes (IDIAP, INOV, WARWICK, RUG, LSC), 5 industrial partners (VRNT, SING, IBG, CS, NUAN) and 4 specialized SMEs (SAIL, DFI, SNTMA / LIVE and OK2GO). Each of the partners brought complementary expertise to the project. The support by INTERPOL brought the results of SIIP to approximately 190 member countries around the globe.

Project Context and Objectives:
Identity confirmation as a core issue for LEAs
To date, one of the prominent challenges encountered by law enforcement agencies (LEAs) and security agencies (SAs) in fighting crime and terrorism is the use of multiple and arbitrary identities by terrorists and criminals. Knowing that they are tracked by LEAs, they use increasingly sophisticated means to hide their real identity and real activities in the telecommunication domain (PSTN, Cellular, SATCOM) and in the Internet domain (peer-to-peer VoIP apps and social media) in order to mislead the LEAs and to make tracking or monitoring them very difficult or almost impossible.

For example, criminals and terrorists can use multiple prepaid cell phones at random, replacing and switching between them frequently, knowing that linking a prepaid cell phone identity (MSISDN/IMSI/IMEI) with the real subscriber identity is very difficult. Moreover, when using post-paid cell phones, criminals and terrorists change the SIM cards occasionally, making it genuinely difficult to link all these SIM card identities (‘IMSIs’). They may even use a public phone in the street or in a nearby coffee shop, a roamer phone or even a passer-by's cell phone. On the Internet and social media, criminals and terrorists easily use many different identities and nicknames across various applications (such as WhatsApp, Skype, YouTube, Facebook and many more).

Another challenge that LEAs/SAs face is the ‘Unknown 2nd side’ (or unknown participant) in a lawfully intercepted conversation with a suspect. This problem is another side of the first challenge above and is derived from it. It is important for LEAs to know who both participants are in a lawfully intercepted call, as unknown 2nd side conversations are estimated to make up 30% of all transcript products in lawful interception.

The third challenge for LEAs is to use high-performing and efficient voice recognition (‘VR’) biometric technologies while preserving the public's privacy and acting ethically in a way that respects societal norms. One example is innocent callers who routinely use a suspect's phone and therefore should not be eavesdropped upon. Another example is members of the suspect's family who routinely use the suspect's phone at home for personal matters, although the phone is under a court warrant permitting lawful interception. These "innocent" calls must be filtered out of the lawful interception process. Nevertheless, where innocent people are forced by the suspect to communicate with other suspects or criminals, these calls should be identified and intercepted.

Additional challenges that LEAs face in the context of speaker identification reliability are:
• Judicial admissibility of speaker identification results depends on national legislation, which is strongly influenced by the reliability of the automated voiceprint analysis.
• Speaker identification results need to be presented before the court in a standardized format to enhance such reliability and to avoid subjective interpretation in the final written account.

The SIIP project
The SIIP project aimed to overcome the above challenges in order to give LEAs better intelligence and incrimination capabilities while respecting privacy, legal and ethical considerations.

The developed SIIP solution addressed the three steps of the Speaker Identification Process:
(1) Intercepted voice record collection and calibration (calibration per data source, dialects, etc);
(2) Enrolment of voiceprints (i.e. collecting good quality voice samples of speakers, cleansing them, and saving them in a voiceprint reference database, the “blacklist”); and
(3) Identification (finding a match between the intercepted voice and the voiceprint reference database).

Main Research & Technical Objectives
The research objectives were divided into three categories: (A) Technological, (B) Legal, Ethical and Societal and (C) Use cases, Info sharing procedures and operational methodologies.

A. Technical Objectives

1. Developing and integrating SIIP solution, including:
– Unknown Participant Speaker Identification using the F-SID analysis 1:N (‘1’ designates the unknown participant speaker; ‘N’ designates the blacklist size of the voiceprints database, with voice samples of N different individuals). F-SID 1:N will try to find a match between the individual intercepted voiceprint and the ‘N’ different voiceprints in the blacklist database.
– S-SID (Social Media SID) using the F-SID analysis M:N; M>>N (‘M’ designates the number of content files with voice, collected from open source social media; ‘N’ designates the blacklist size of the voiceprints database, with voice samples of N different individuals). The number of collected files ‘M’ can be much larger than ‘N’ (hence M>>N). F-SID M:N will try to find a match between the M collected files and the N voiceprints in the blacklist. The matching process may detect a few files that were created by individuals who are on the blacklist. For example, typical figures can be: M~1,000,000, N~1,000.
– Innovative, breakthrough speech analytic algorithms.
– Use of high quality suspect metadata and voiceprint enrolment technologies/devices.
2. Employ a secured, privacy-preserving information sharing mechanism and international cooperation between LEAs at the EU level by developing and implementing the SIIP Information Sharing Center (SISC) of voiceprints and metadata, which will be located at and administered by INTERPOL in accordance with its rules and regulations.
3. Multiple modes of deployment of the SIIP solution: Implementation of the Speaker ID at the LEA headquarters level and also at the Tactical level as a portable system.
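
The 1:N and M:N matching modes listed under objective 1 can be illustrated with a minimal sketch. It assumes that each voiceprint has already been reduced to a fixed-length numeric vector and that matching uses cosine similarity against a fixed threshold. The function names, the vector representation, the sample data and the threshold value are illustrative assumptions, not the SIIP implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_one_to_n(probe, blacklist, threshold=0.8):
    """1:N mode: compare one probe voiceprint with N enrolled voiceprints.

    Returns (identity, score) for the best match at or above the
    threshold, or None when no enrolled voiceprint is close enough.
    """
    best_id, best_score = None, -1.0
    for identity, reference in blacklist.items():
        score = cosine_similarity(probe, reference)
        if score > best_score:
            best_id, best_score = identity, score
    if best_id is not None and best_score >= threshold:
        return best_id, best_score
    return None


def screen_m_to_n(files, blacklist, threshold=0.8):
    """M:N mode: run the 1:N search for each of M files and keep the hits."""
    hits = {}
    for file_id, probe in files.items():
        match = identify_one_to_n(probe, blacklist, threshold)
        if match is not None:
            hits[file_id] = match
    return hits


# Hypothetical data: N = 2 enrolled voiceprints, M = 3 collected files.
blacklist = {"speaker_a": [1.0, 0.0, 0.0], "speaker_b": [0.0, 1.0, 0.0]}
files = {
    "clip_1": [0.9, 0.1, 0.0],
    "clip_2": [0.0, 0.0, 1.0],
    "clip_3": [0.1, 0.95, 0.0],
}
hits = screen_m_to_n(files, blacklist)
```

With M much larger than N, most files are expected to produce no hit, which is why the sketch returns only the files that match rather than a full M-by-N score table.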

B. Legal, ethical, and societal acceptability research
SIIP Legal, Ethical and Societal Objectives are:

– Creation of a code of conduct for SIIP operation by end-users, ensuring compliance with EU/international ethical and privacy regulation and with the democratic rights of individuals as stated in the EU Charter of Fundamental Rights.
– Content and details to reveal in high-score and positive match alerts
o What should be presented to the user (reason for the alert -Taxonomy, Pre-defined Keywords Detection, Positive SID, etc.).
– Warrant handling procedures for:
o Suspect voiceprint enrolment (where, when and how).
o Getting access to Speaker ID, Call Content and Call Data in case of a high-score alert.
o Information sharing procedure: centralized information sharing management by INTERPOL; distribution of the public key for encryption of suspect records and the private key for decryption; standard operating procedures respectful of potential confidentiality issues, based on restrictions decided by the source of the information.

– Creation of an ethical framework for ensuring ethical compliance and the preservation of individual and public privacy during the development, integration and testing of the SIIP system.
– SIIP will be fully compatible with the latest insights on ethics and privacy by design and will be as unintrusive as possible.

– Analysis of the societal impact of the SIIP solution to ensure that the research done meets the needs of society.
– Show the substantial benefits of such a solution for society.
– Ensure that the research won’t have negative impacts on society when implemented and deployed.

C. Use cases, Info sharing procedures and operational methodologies
– Define operational methodologies, use cases, training guidelines and intervention strategies to enable an optimal use of SIIP systems across Europe (for its two level implementation modes, Headquarters and tactical) and beyond.
– Facilitate SIIP knowledge and information sharing between end users in Europe and beyond through an adequate operational methodologies kit.
– Recommend standardization policy for SIIP (operations and info sharing).

Project Results:
The results of SIIP can be summarized as follows:

• Fully developed system installed at SIIP LEA partners
Most importantly, the envisaged SIIP solution was fully developed, installed at LEAs and tested by the LEAs involved in the SIIP project as well as by others. The system was extensively used to test speaker identification, and the test results indicated significantly higher identification of people than existing technology.
• Well defined user requirements and use-cases
WP1 developed a series of highly useful and detailed user requirements as well as use cases. These were based on a series of real-life scenarios provided by the various end-users. These requirements and use cases steered the development and testing throughout the SIIP project.
• Large community of users and experts involved in the project
Mainly through Interpol, but also through the other LEAs and REA, many potential end users of the SIIP technology were made and kept aware of the R&D in the SIIP project. Three large scale demo and test events enabled people to see, feel and "touch" the technology.
• Clearly defined guidelines for legal, ethical and societal acceptance
The WP2 matrix of ethical, legal and societal recommendations focuses on the operation of the SIIP Simulator and the final SIIP solution. For the SIIP Simulator (the development stage) the GDPR was applicable. For the final SIIP solution (the deployment and use stage) the Law Enforcement Directive (LED) is applicable. The WP2 matrix can be used for similar projects in the future.
• Legally defined info-sharing mechanisms in terms of privacy & data protection
As part of WP2 and WP3, the SIIP partners worked to understand how sensitive personal data such as voiceprints can be shared. Detailed guidelines can be obtained from the appropriate SIIP partners.
• Open architecture which can integrate easily third parties’ speech analytics engines
The SIIP architecture was established relatively early in the project. The fusion engines were provided by various SIIP partners, requiring the ability to integrate analytic engines in multiple formats and with different backgrounds. This principle has been extended to enable LEAs to also use third party (i.e. non SIIP partner) analytic engines which may further improve the performance of the SIIP platform.
• Large variety of state of the art speech analytics engines (many vendors, different algorithms) – all fused and integrated into one SIIP platform
The various SIIP partners provided a number of different analytic tools. The true strength of the SIIP platform comes from the ability to mix and match these various engines into one very powerful identification solution. The system can either assign weightings to the
• Robust collection
The SIIP project developed unique tools to simplify work with social media, integrating 5 different social networks (YouTube, Twitter, Google+, LinkedIn, Facebook) as well as geolocation and keyword search in YouTube. Furthermore, an LI simulator (Cellular, SATCOM, PSTN) was developed, and special care was given to the ability to easily ingest probe audio recorded by sensors. The SIIP solution provides substantial automatic pre-processing of audio to improve the performance of the system.
• Exceptional dissemination
The participation of several LEAs and Interpol ensured extensive dissemination of information to a large variety of LEAs in almost 190 countries. Three major SIIP events were organized with the participation of dozens of users and experts from all over the world. The last field test (Lyon, November 2017) was used by end-users to solve real cases. Extensive end-user training was provided, including on-premises training (PJ, UK-Met, CP, Interpol) and a detailed SIIP user manual. The speaker identification market was analyzed and commercial opportunities are being pursued by the partners. The SIIP consortium boasts more than 20 publications. Many of the results are public and released on the SIIP website.

Potential Impact:
Impact on the LEAs
SIIP has provided a framework to take full advantage of the available legacy systems and new open source media in order to achieve accurate person identification. The system infrastructure necessary for the successful exploitation of such a system, by multiple agencies, was investigated. Technically, the system is operational and could be deployed across Europe at any point in time.
As the borders of countries within the European Union are diminishing, criminal networks now run their operations across borders more than ever. Now that the technological feasibility of SIIP has been proven, the actual full scale implementation of the SIIP technology could be a substantial step towards criminal identification and tracking, and the suggested approach is not restricted by national boundaries.

Whether this will happen is a question for the LEAs (those in SIIP and those outside of the project), which would need to be willing to adopt the technology in the years to come. The SIIP project proved that infrastructure can be designed to allow both tactical and operational analysis with cross-agency and cross-border utilization.

Economic impact
With the uptake of the integrated SIIP solution being unlikely in the short term, the various partners of the SIIP project have developed individual dissemination and commercialisation plans for the knowledge and technology developed in the SIIP project. The plans vary and range from continued academic research to full scale commercialisation of the developed technology. From this point of view, the SIIP project has a positive economic impact.

In terms of employment, the SIIP project generated a number of new jobs and preserved existing jobs in the EU. Most of the jobs were held by men, although a number of women were involved in the project, especially at the universities. The continued research and expected commercialization are expected to generate more jobs, although it is hard to estimate numbers at this time.

Impact on the state of the art
The work in SIIP has been groundbreaking. Over 20 publications have been published. Especially in the field of fusion, cross-referencing several sources, metadata and voice recognition engines significantly enhanced traditional forms of data monitoring. The SIIP research advanced the state of the art in speaker recognition systems, as well as in other disciplines including language and gender detection and keyword spotting/speech recognition.
A few examples in more detail:
Voice Biometric Algorithms - Most conventional SID systems perform fusion at a single level (i.e. the score or decision level). The core technology of SIIP is the fusion of multiple voice recognition engines, producing unique final scores and enabling a breakthrough suspect identification solution. The SIIP F-SID (Fused Speaker Identification) was based on effective fusion of speaker recognition engines. Fuzzy decision-based approaches were used to combine the different scores or decisions.
Voiceprints recognition - Speaker identification to date has been based primarily on the ISO/IEC 19795 biometric performance testing standards. The problem is that these standards focus on testing physiological biometric modalities, as opposed to modalities such as voice that incorporate behavioural elements. State-of-the-art performance testing is therefore only useful for an initial assessment of speaker identification algorithms. The SIIP approach proposed new solutions towards high-performance, innovative SID that go beyond existing state-of-the-art technology, especially in the way they integrate and fuse a series of individual processing modules that exploit not only pure acoustic features but also the lexical content of the input data and additional information extracted from social media. The approach addressed accuracy (such as genuine and impostor error rates) as a function of database size and sample complexity, thus maximizing controlled measurement of the impact of contextual data such as location, speech content and time on performance, and improving the potential utility and admissibility of speaker identification at the EC level.
Gender, Language, and Accent recognition – the SIIP consortium focused on automatic extraction of additional information from acoustic signal, besides speaker characteristics, carrying potentially relevant information about a suspect.
Keyword and Taxonomy spotting - Available speech recognition tools had fixed vocabularies that could not easily be modified by the user. The SIIP speech recognition system gave the end-user a much higher level of flexibility, including fast and easy integration of terminology into the speech recognition engine, so that words and phrases typically employed by suspects could be added dynamically.
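
As a rough illustration of the score-level fusion mentioned in the voice biometric example above, the sketch below combines per-engine scores with a weighted average and then computes false-positive and false-negative rates at a decision threshold. The engine names, weights, scores, trials and threshold are hypothetical. The report does not specify the fusion rule used in SIIP beyond mentioning fuzzy decision-based approaches, so a plain weighted average is substituted here for simplicity.

```python
def fuse_scores(scores, weights):
    """Weighted average of per-engine scores (each assumed to lie in [0, 1])."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(scores[name] * weights[name] for name in scores)
    return weighted_sum / total_weight


def error_rates(trials, threshold):
    """Compute (false_positive_rate, false_negative_rate).

    trials is a list of (fused_score, is_same_speaker) pairs; a trial is
    accepted as a match when its fused score is at or above the threshold.
    """
    false_pos = sum(1 for s, same in trials if s >= threshold and not same)
    false_neg = sum(1 for s, same in trials if s < threshold and same)
    impostors = sum(1 for _, same in trials if not same)
    genuines = sum(1 for _, same in trials if same)
    fpr = false_pos / impostors if impostors else 0.0
    fnr = false_neg / genuines if genuines else 0.0
    return fpr, fnr


# Hypothetical engine weights and one trial's per-engine scores.
weights = {"voiceprint": 0.6, "gender": 0.1, "language": 0.1, "keywords": 0.2}
trial_scores = {"voiceprint": 0.9, "gender": 1.0, "language": 0.5, "keywords": 0.5}
fused = fuse_scores(trial_scores, weights)

# Hypothetical labelled trials: (fused score, same speaker?).
trials = [(0.9, True), (0.4, True), (0.7, False), (0.2, False)]
rates = error_rates(trials, threshold=0.5)
```

Raising the threshold trades false positives for false negatives, which is one way to examine how a fused score compares with any single engine on the same labelled trials.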

Added Value at the European level
The technologies involved in creating SIIP’s integrated, comprehensive criminal identification system brought European academia and business to the forefront of technology and solutions development in this field. SIIP demonstrated how to close the loop between human factors, operations, design, development and implementation. This approach can be used as a template for future research of this kind and in this domain. In particular, the worldwide reach of INTERPOL, with its approximately 190 member countries, puts it in a unique position to support cooperation in operations as highly sensitive as those supported by the SIIP technology.

The main dissemination activities and exploitation of results
The SIIP project advanced research towards countering crime and terrorism through significant improvements in the ability of LEAs to identify speakers based on audio material. The results provided by the project needed to be thoroughly disseminated through the LEA communities and EU governments in order to ensure the success of the work. A number of activities were undertaken by the project partners to guarantee that the project results were properly disseminated. These included the creation and maintenance of a web site, participation in a series of conferences and external publications, organisation of a SIIP End-Users forum, the preparation of brochures, and the creation of a project video. The geographic distribution of the SIIP partners in Europe, as well as the international exposure provided by INTERPOL, also added significant value by ensuring that dissemination activities reached the necessary people within the European and international crime fighting and anti-terrorism communities.
The dissemination activities including the publication in articles, workshops and websites were monitored by the Internal Security Assessment and Ethics Committee (ISAEC) that was set up in the consortium in order to prevent any possible dual use of the project knowledge and results.