
Understanding Videos Automatically with the SensifAI Deep Learning Technology

Periodic Reporting for period 1 - SensifAI (Understanding Videos Automatically with the SensifAI Deep Learning Technology)

Reporting period: 2018-12-01 to 2019-05-31

Google is worth $1 trillion because it managed to make text searchable. However, 80% of internet traffic consists of videos, audio, and images, and this content is not searchable. Making videos searchable is extremely challenging, which is why most video tagging is still done manually and the results of automated video recognition remain limited. Mobile video recognition is also only starting to emerge.
SensifAI has developed cutting-edge audio-visual deep-learning technology, trained on millions of videos, to recognize audio and video content and tag it accurately. SensifAI automatically tags videos, images, and audio, making them searchable, and can be customized for a range of use cases. We believe our approach to contextual video analysis is unique and on the leading edge, as it recognizes scenes, actions, celebrities, landmarks, logos, music genres, moods and emotions, and speech. SensifAI delivers its video recognition technology in the cloud on the Amazon Web Services Marketplace, and it can also be embedded in devices such as smartphones (by OEMs).
Our software recently became available on the Amazon Web Services Marketplace, where we follow a unit-based pricing model ranging from €0.01/minute for recognizing landmarks, objects, celebrities, and unsafe content in images to €0.05/minute for general tagging and action/sport recognition. Our customers are OEMs, broadcasting and media companies, and anyone who needs searchable videos. Our most important suppliers are hosting and data providers. Our current cloud service users include multimedia companies, robotics companies, broadcasters, and video-sharing websites, while our customers for software embedded in devices are major smartphone manufacturers.
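As a rough illustration of how this unit-based pricing adds up, the sketch below (our own illustrative arithmetic, not AWS Marketplace billing code) estimates the cost of tagging a single video:

```python
# Illustrative cost estimate under the unit-based pricing listed above.
# Rates are euros per minute of processed media; actual billing is
# metered by AWS Marketplace and may differ.

RATES_EUR_PER_MIN = {
    "landmarks_objects_celebrities_unsafe": 0.01,
    "general_tagging_actions_sports": 0.05,
}

def estimate_cost(minutes: float, model: str) -> float:
    """Return the estimated tagging cost in euros for one video."""
    return minutes * RATES_EUR_PER_MIN[model]

# Example: tagging a 90-minute film with the general tagging model.
print(f"{estimate_cost(90, 'general_tagging_actions_sports'):.2f} EUR")  # 4.50 EUR
```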
SensifAI bvba was founded by three alumni and scientists from MIT, ETH Zurich, and KU Leuven, who accumulated extensive experience in audio-visual data processing through involvement in many international projects. The leadership team met at the KU Leuven Centre for Processing Speech and Images and collaborated there for several years, so they knew each other very well before starting SensifAI.
Imagine a day when the 30 million visually impaired Europeans use a wearable camera equipped with software that automatically describes their surroundings by recognizing the semantic concepts in the captured video, including descriptions of the scene, objects, and activities. Similarly, imagine a technology with which the 119 million aurally impaired people use a wearable microphone equipped with software that automatically describes surrounding environmental sounds by recognizing the semantic concepts in the captured audio. This day is closer with the launch of SensifAI.
During phase 1 of the SME Instrument, we made major technological and commercial progress. We became a launch partner of the Amazon SageMaker platform and released 17 different models for customizable video/audio/image recognition. We also added many new and attractive features to our cloud-based video recognition system on the SageMaker platform. During this project, more than 100 users subscribed to our platform through AWS, and Amazon supported us by providing free servers to run a live video recognition demo 24/7.
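For readers unfamiliar with how such marketplace models are consumed, the sketch below shows a typical invocation of a subscribed SageMaker endpoint using boto3. The endpoint name and payload schema are hypothetical placeholders; the actual interface of each SensifAI model is described on its AWS Marketplace listing.

```python
import json
import boto3

# Invoke a deployed SageMaker endpoint created from a marketplace model
# package. "sensifai-video-tagging" and the payload schema below are
# illustrative placeholders, not the documented SensifAI interface.
runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="sensifai-video-tagging",   # hypothetical endpoint name
    ContentType="application/json",
    Body=json.dumps({"video_url": "s3://my-bucket/videos/holiday.mp4"}),
)

tags = json.loads(response["Body"].read())
print(tags)  # e.g. a list of recognized concepts with confidence scores
```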
SensifAI also studied and assessed the video recognition market in depth, analyzed the competitive landscape, and negotiated with many potential customers including Samsung, VRT, Skyline, LG, Qualcomm, and MediaTek. In this project, business potential in different directions was extensively assessed and analyzed. We identified a huge market for on-device artificial intelligence for smartphones, driven by the emergence of Neural Processing Unit (NPU) chipsets recently introduced by three giants: Qualcomm, Huawei, and MediaTek. Based on this finding, we focused further on this sector, developed the required know-how, and launched the world's first fully on-device, deep-neural-net, real-time video recognition app for smartphones. This pioneers a new paradigm in artificial intelligence that unlocks a huge market by preserving users' privacy and delivering truly real-time systems without network latency. We also forged several partnerships with the main AI chipset manufacturers, gaining access to the latest AI chipsets before they reach the market. Building on this success, we started working on a B2C app that makes users' video/image recordings searchable, helping them find the image or video of interest by searching for keywords such as "Birthday cake of Sara" or "Christmas party with John".
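To make the search scenario above concrete: once recordings carry automatic tags, a query such as "Birthday cake of Sara" reduces to a keyword lookup over a tag index. Below is a minimal sketch of this idea (our own simplification, not the app's actual implementation):

```python
from collections import defaultdict

# Minimal inverted index from tags to video files; an illustrative
# simplification of tag-based search, not SensifAI's implementation.
index: dict[str, set[str]] = defaultdict(set)

def add_video(path: str, tags: list[str]) -> None:
    """Register a video under each of its (case-normalized) tags."""
    for tag in tags:
        index[tag.lower()].add(path)

def search(query: str) -> set[str]:
    """Return videos tagged with ALL keywords in the query."""
    words = query.lower().split()
    if not words or any(w not in index for w in words):
        return set()
    return set.intersection(*(index[w] for w in words))

add_video("vid_001.mp4", ["birthday", "cake", "Sara", "indoor"])
add_video("vid_002.mp4", ["christmas", "party", "John"])
print(search("birthday cake Sara"))  # {'vid_001.mp4'}
print(search("christmas John"))      # {'vid_002.mp4'}
```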

Exploitation and Dissemination:

1- Social Media activities
a) We had fewer than 200 LinkedIn followers at the beginning of the project and have more than 2,000 now.
b) We had fewer than 200 Twitter followers at the beginning of the project and have more than 1,000 now.
c) Our Crunchbase rank was around 90,000 at the beginning of the project and is now 6,000.
2- Press Release
a) We issued more than six press releases during the project; the most important resulted in features on the AWS blog, the Huawei Global Community, and MediaRoad. We are about to release 20 more press releases in international tech news outlets and weblogs during the next two weeks.
3- Live Demo:
We launched a 24/7 live demo, which has attracted many users because they can easily upload their own pictures and videos and see the results of our video tagging system on their multimedia content: https://demo.sensifai.com
4- Attending Conferences and Exhibitions
a) Mobile World Congress 2019, Barcelona
b) EIC Corporate Day: Deep Tech on display – showcasing innovation to P&G and partners including LG and Nokia, Brussels (we won the best pitch award of the session)
c) Amazon Tech Summit, Amsterdam
5- SEO
At the beginning of the project, users could not find us among the first 100 hits when searching Google for keywords like "Video Recognition API". We are now among the top 10 hits on both Google and Bing.
SensifAI Added Value Beyond the State of the Art: SensifAI is the only video analytics platform that incorporates both audio and visual data interactively and simultaneously, while all other competitors focus on visual data alone and ignore the importance of audio. SensifAI has developed a unique deep learning platform that mimics how the human brain comprehends audio-visual data simultaneously. This is why SensifAI is able to recognize all the types of things a human would: objects (e.g. pen, TV, laptop), scenes (e.g. indoor, park, beach), actions (e.g. dancing, blowing out candles, doing sport), celebrities (e.g. Cristiano Ronaldo, Angelina Jolie), landmarks (e.g. Eiffel Tower, Statue of Liberty), logos (e.g. the Sony or Huawei logo), music genres, moods and emotions, and speech.
Unique Paradigm Change in Contextual Video Analysis: We believe SensifAI's approach to contextual video analysis is unique and on the leading edge, and could represent the paradigm for the next generation of search. By increasing the number of recognized concepts and improving the machine learning algorithms, SensifAI will eventually recognize anything a human would.
Innovation 1: Using audio and visual data interactively and simultaneously for the first time in the world: SensifAI developed a deep learning platform that incorporates both audio and visual data interactively and simultaneously for video understanding, inspired by the human brain's audio-visual analysis. This significantly increases the accuracy of high-level video concept recognition (a minimal illustrative sketch follows the innovations list).
Innovation 2: Embedding video recognition software in the device for the first time in the world: video recognition software is based on deep-learning models that are computationally very demanding and normally require servers equipped with Graphics Processing Units (GPUs). Therefore, although market demand is very high, embedding those deep-learning models in a standalone device is very challenging. SensifAI, for the first time in the world, developed very compact deep-learning models and successfully embedded them in smartphones with great accuracy (see the second sketch after this list).
Innovation 3: General Image Recognition Software: We have developed general image recognition software that finds the objects and scenes in any given image.
Innovation 4: Action Classification Software: We have developed action classification software that recognizes important actions such as kissing, fighting, drinking, dancing, etc.
Innovation 5: Keyword Spotting System: We have developed a keyword spotting system that operates on the human voice segments of a given video to recognize the most important information conveyed in its speech.
Innovation 6: Music Emotion Recognition System: We have developed a music emotion recognition system to recognize the mood of a given video.
Innovation 7: Pipeline for Video Tagging: Thousands of concepts are already covered, and more are added daily; our product fine-tunes itself as it observes more and more examples.
Innovation 8: Human Face Analysis Pipeline: We have developed a pipeline capable of detecting human faces and recognizing celebrities.
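To make Innovation 1 concrete, the sketch below shows one common way to fuse audio and visual streams in a single network: each modality is encoded separately and a shared classifier operates on the joint embedding. This is a generic late-fusion illustration under our own assumptions, not SensifAI's proprietary architecture.

```python
import torch
import torch.nn as nn

class AudioVisualTagger(nn.Module):
    """Generic late-fusion model: separate audio/visual encoders
    feeding a shared multi-label tag classifier."""

    def __init__(self, visual_dim=2048, audio_dim=128, num_tags=1000):
        super().__init__()
        self.visual_proj = nn.Sequential(nn.Linear(visual_dim, 512), nn.ReLU())
        self.audio_proj = nn.Sequential(nn.Linear(audio_dim, 512), nn.ReLU())
        self.classifier = nn.Linear(512 + 512, num_tags)

    def forward(self, visual_feats, audio_feats):
        v = self.visual_proj(visual_feats)   # e.g. frame-level CNN features
        a = self.audio_proj(audio_feats)     # e.g. log-mel audio embeddings
        fused = torch.cat([v, a], dim=-1)    # joint audio-visual representation
        return torch.sigmoid(self.classifier(fused))  # per-tag scores

model = AudioVisualTagger()
scores = model(torch.randn(4, 2048), torch.randn(4, 128))
print(scores.shape)  # torch.Size([4, 1000])
```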
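For Innovation 2, shrinking a server-scale network so it can run on a phone typically combines a compact architecture with quantization. The snippet below illustrates one standard technique, post-training dynamic quantization in PyTorch, purely as an indication of the approach; SensifAI's actual on-device models and toolchain are not public.

```python
import torch
import torch.nn as nn

# A stand-in classifier head; in practice this would sit on a compact
# backbone (e.g. a MobileNet-style network), not replace it.
model = nn.Sequential(
    nn.Linear(1280, 256), nn.ReLU(),
    nn.Linear(256, 1000),
)

# Post-training dynamic quantization: weights stored as int8,
# activations quantized on the fly. This typically cuts model size
# roughly 4x and speeds up CPU inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Smaller artifact suitable for shipping to a mobile device.
torch.save(quantized.state_dict(), "tagger_int8.pt")
```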

User Needs, Benefits, and USPs
Assistive Technologies: SensifAI can revolutionize assistive technologies, enabling visually or aurally impaired people to get information about their surroundings through wearable cameras and microphones. This can drastically improve their standard of living and help them in their daily lives, e.g. by avoiding dangers such as a car accident.
Multimedia: The volume of personal and online video archives is skyrocketing thanks to recent advances in digital video capture. There is therefore growing demand for indexing, search, and summarization software to make efficient use of video information. Despite many advances in video editing and handling software, a suitable technology had not yet been developed, owing to numerous technological challenges. To extract semantic tags that match and satisfy human perception and description, SensifAI employs state-of-the-art deep-learning methods over both the audio and the visual data of millions of hours of videos.
Broadcasting companies, video-sharing and music-sharing websites, and indirectly their users, benefit from this technology. For example, it can be used to develop (I) content-aware video search engines, (II) concept-aware video recommender systems, (III) personalized video summarization, (IV) monitoring to block illegal video content, and (V) concept-based video advertisement.
Security: Security professionals usually have to monitor live security video constantly in public places, e.g. metros and airports, to predict threats and raise alerts in case of problems such as abnormal human activity, unattended bags, gunshots, etc. SensifAI can be used to automatically identify abnormal human behavior and dangerous objects such as guns, knives, and fire, and to predict threats and provide alerts.
Robotics: Natural human-machine interaction is required in many applications, such as social robotics. However, the ability to recognize the surrounding environment, e.g. objects, scenes, and sounds, remains the most challenging obstacle to reaching this goal. SensifAI's AVSR technology paves the way for robots that interact with humans in a natural manner.
Education: Universities and e-learning content providers usually deal with large archives of educational and documentary videos. SensifAI can help manage these archives by providing fast concept-based search, video summarization, and retrieval.