
Depth Sensing Systems for People Safety

Final Report Summary - D-SENS (Depth Sensing Systems for People Safety)

Executive Summary:
Recent technological developments in visual depth sensing, combined with the computational performance now available, are opening new perspectives for vision-based solutions that enhance human safety. Human safety plays a central role in many fields of our daily life. In the D-SenS project (Depth Sensing Systems for People Safety) we brought together state-of-the-art technology know-how and a thorough understanding of end-user and stakeholder requirements by engaging scientists and small-medium enterprises in joint technology development.

The main motivation of the D-SenS project was that vision-based scene and object analysis in many application domains is still far from reliable, and thus remains inapplicable to practical use cases. The main complexity of the underlying visual analysis task stems from the fact that image content is ambiguous by objective measures and exhibits daunting variability. This prevailing ambiguity is one of the main reasons for combining 2D image information (RGB) with 3D depth data (D). These highly de-correlated information channels remove numerous geometric (e.g. scale) and photometric (e.g. shadows and illumination variations) ambiguities and are therefore essential to meeting D-SenS's research goals.

At the technology level the D-SenS project aimed at accomplishing practically relevant depth-sensing vision systems around selected, previously unsolved problems which significantly impact human safety. Given the heterogeneity of partner profiles and goals pursued across the project partners, one primary objective was to partition and link potentially marketable task-oriented ideas and available know-how in order to establish multiple actively networking teams within the project. These teams were essential to drive development from initial concepts towards prototypical solutions along the shortest possible path.

The following research results have been accomplished in the D-SenS project: we addressed a set of novel visual monitoring applications for people safety, ranging from detecting and tracking humans in retail and public infrastructures, ambient-intelligent and road environments, to scenarios of left-item and intrusion detection. Our solutions combine the "best of two worlds" (high-quality depth and intensity data, and modern vision algorithms) and enable the development of robust vision-based safety systems. Robustness in this context stems from the fact that the vision system operates on highly reliable data that represents true scene geometry and remains invariant with respect to photometric variations. In the project this increased robustness has been shown to yield meaningful high-level concepts (humans, left items, intruders, scene model, motion patterns and activity) at a quality adequate for practical use, as demonstrated by the five use case prototypes tested and demoed in the application domains of smart buildings, assisted living and security.

An important vehicle towards efficiently reaching our task-oriented goals was to create a technology framework, the so-called Common Framework, which shares common functionalities among vision system prototypes and supports further development of system components and applications. The functionality of the Common Framework has been validated by the pilot applications developed in the selected application domains; in addition, joint demo sessions of a tutorial character were organised where consortium partners learned about the algorithmic mechanisms, advantages and limitations of the created prototypes. We are confident that the project results and the Common Framework form a sustainable and easy-to-extend technology basis for commercial exploitation, long-term co-operations and additional project concepts even beyond the D-SenS project.

Project Context and Objectives:
All applied tasks targeted in the project involve new, emerging application scenarios; therefore a thorough analysis was carried out prior to algorithmic development. The analysis covered aspects formulated in a use-case and marketing context: commercial potential, requirements, ease of deployment and operation, and probable limitations in a practical setting. At the same time the scientific and technological aspects were investigated: is the posed vision task feasible, and are there probable limitations when applying the technology? The targeted pilot applications were decomposed into shareable functional units mapped to various levels of visual data processing within the Common Framework. The resulting three levels of abstraction (practical tasks, functional units and sensory data) are illustrated in Figure 1, created such that maximum synergy between the pursued people safety applications can be accomplished.

The Common Framework library is an important result of the project, gathering and partitioning camera and depth/intensity data processing algorithms into a unified software library. The jointly performed Common Framework design identified various layers of information processing, such as calibration, analysis and interfacing, and many recurring functional units are shared among the individual use case prototypes. Besides the established processing modules, the Common Framework specification also includes common data formats, interface definitions and software implementation practice guidelines. The entire code base was jointly managed and version controlled throughout the project, along with small sample data sets, exemplary processing chains illustrating a minimal solution of a problem for easy understanding, and corresponding documentation.

The following focus areas were selected for use case definition, scientific investigation and implementation:
• People tracking and detection in smart surroundings,
• Intrusion detection within a spatial volume,
• People behavior analysis,
• Object tracking and monitoring,
• 3D scene reconstruction and analysis from a moving platform,
• Traffic flow monitoring and
• Crowd management.

In light of the defined focus areas and use cases, a review of the available technology (concepts, methodologies, and sensors) was performed and documented. The results of the review provided additional information for the conceptual design of the Common Framework, and led to a set of detailed focus-area-specific requirements in terms of functionality, detection accuracy, system and hardware, context and operation environment, and interfaces, as well as business-related requirements.
Gathering visual data in an early phase of the project was essential to assess the complexity of the given practical task. Data collection with depth sensors and stereo cameras was performed for guiding the development towards realistic scenarios. Depth sensing sensor setups (passive and active stereo) and acquisition software were distributed among the project partners, and a substantial amount (several hours) of data has been collected for each use case scenario. Data (time-varying data of depth, intensity and scene-specific priors) acquired in applied settings at the company partners beneficially complemented the laboratory data used at the scientific partners for elaborating solutions working under a large set of diverse conditions.

Five use case implementations were created to meet the commercial requirements of the participating SMEs and to demonstrate the Common Framework's flexibility and re-usability. Next, we provide a concise overview of the current status of the accomplished demonstrators:

People detection and tracking in smart surroundings: By employing depth data from two sensor types (MS Kinect/Kinect2 and a passive stereo camera setup), a real-time operational prototype performing human detection and tracking in indoor environments has been elaborated. Accurately quantifying indoor human demography in terms of number, location, motion patterns and dynamics provides information of great value in the retail, crowd management and safe infrastructures domains. Figure 2 shows some results of the human detection and tracking application.

Detecting intrusion within predefined spatial volumes: Data representing time-varying spatial geometry is well suited for analysis with respect to the presence or absence of arbitrary objects of sufficient size within predefined volume elements. Such a visual analytics functionality significantly impacts applied fields in safety (safe zones around machines) and security (intrusion detection for preventing theft). Figure 3 shows some results of the real-time intrusion detection prototype, which is capable of using two sensors observing the same intrusion volume, thus rendering the detection process substantially more robust with respect to occlusions and false alarms.

Robust fall detection in indoor environments: In this prototype, analysis of dynamic events and the surrounding indoor scene generates probabilistic alarms upon certain spatio-temporal patterns (fall, body pose). This application has great practical impact in ambient assisted living scenarios (monitoring the elderly, children or chronically ill patients). Figure 4 displays a sample result for the awareness of human presence in the fall detection prototype.

Reliable left-item detection in public spaces: Accurate detection of changes in specific crowded environments (subways, airports and other public spaces) is anticipated to enable a highly relevant security task: detecting left items and their status (owner nearby or abandoned) in the presence of clutter and occlusions. Figure 5 illustrates some results of the left-item detection prototype.

Depth-aware road surveillance on a mobile platform: Future generations of vehicles will increasingly be able to assess traffic and road conditions in front of or around them by automated means. A specific high-resolution wide-baseline camera setup (Figure 6) has been developed and integrated into a road monitoring prototype targeting the detection of traffic participants at large distances (up to 100 m) and assessing the road context from RGBD data, such as delineating rail tracks and thus assessing collision hazards prior to any incident.

Nearly all of these prototypes rely on privacy-preserving internal representations (point cloud, shape segments, geometry), therefore providing added support for deployment in privacy-sensitive public (e.g. subway) or private (e.g. elderly home) scenarios.

Facilitating the take-up of results: Operating at a large scale (geographic distribution, diverse scopes and backgrounds) within the project, the scientific and enterprise teams faced novel challenges: (i) how to integrate different research results into individual solutions which yield commercial benefits and (ii) how to establish a common understanding across viewpoints of a technical and task-oriented nature. Partitioning at the task level (use cases) and at the functionality level (Common Framework) proved to be a viable solution for creating actively engaged research-enterprise teams which share know-how with others. Organizationally, research was done in situ at the scientific partners, but as soon as a first framework became available, development turned into an iterative process, bringing practical insights (verification, demonstration, end-user feedback) from the SMEs back into the research process.
To achieve a collaborative integration of research and enterprise teams within D-SenS, there was an important focus on knowledge transfer and knowledge management, including the protection of IPRs. The consortium employed the following means and channels to accomplish these goals:
• Knowledge transfer and training for SMEs: regular telephone conferences between all consortium partners, scientific discussions between research institutes, and research counselling and visits at the SMEs facilitated the joint design, analysis and modification of algorithms and frameworks. In addition, 3-4 face-to-face workshops were held annually to present technical progress, revise development plans and demonstrate achieved results.
• Knowledge management: secure document and software configuration management frameworks (Twiki, Git) were employed to share knowledge and provide organizational flexibility so that we could support multiple use-case projects with shared functionalities and active engagement from all partners.
• IPR protection: the consortium agreed on legal principles governing the usage of, and rights to, previously existing and jointly developed know-how. The consortium managed to achieve openness and flexibility, posing no restrictions on the flow of ideas, in order to support technical innovation. Co-operations between SMEs targeting joint business cases also emerged.
• Dissemination and exploitation: based on the accomplished use case results, a D-SenS video has been prepared highlighting the technical innovations achieved in the developed technology prototypes and supporting commercialization efforts. The project website hosts demonstrations of the accomplished results. Dissemination at a scientific level (publications, tutorials, shared datasets) was also considered essential in order to generate valuable feedback from the scientific community, to produce further seeds of collaboration and to educate future staff. Accordingly, a number of scientific papers and presentations have been elaborated and presented at high-quality forums.

Project Results:

D-SenS Common Framework

The D-SenS Common Framework is a specific set of modular algorithms created to support efficient research and development of visual analytics algorithms in the D-SenS project and, most importantly, to transfer the developed vision technologies to the involved SME partners.
The D-SenS Common Framework provides depth data analytics functionalities and a development environment for creating analytics applications, both creating new business opportunities for the consortium SME partners and enhancing co-operation between the SMEs. During the D-SenS project, the Common Framework was used to implement the D-SenS use case application prototypes and thus acted as a common code base for sharing and re-using the developed software modules.
Based on the elaborated Common Framework (CF) tools and the use-case prototypes making use of CF functionalities, the SMEs will be able to quickly create complex and powerful products and services, while having well-documented state-of-the-art computer vision functionalities and RTD know-how at their disposal.
The development of the Common Framework started with requirement analysis and specification design, which led to the actual implementation in a later phase of the project. The D-SenS Common Framework adopts a modular layered approach (Figure 7).

The sensing layer is responsible for connecting the different sensors in the system. It provides an abstraction layer to the calibration layer so that different sensors expose a similar interface to the applications, making the addition of new sensors to the framework simpler. For each sensor, a sensor-specific wrapper module is required.
The calibration layer implements tools and functions for establishing the geometry of the sensor relative to the monitored area and for registering the views of different sensors into a unified co-ordinate system. Calibration tools include graphical tools to define the geometry from the sensor image, as well as surface detection tools that provide semi-automatic calibration.
Analysis is the most important and largest layer in the D-SenS Common Framework. The analysis layer contains 25 modules, representing 50% of all modules in the framework. It provides tools for segmentation, background modelling, object detection and tracking, point cloud processing, pattern recognition, object classification and human behavior statistics.
Event generation takes classification results from the analysis layer and can create events by combining and interpreting information from the analysis results. Different focus areas have defined different events, and the Common Framework provides a basis for creating different varieties of events.
The communication layer provides data communication interfaces for submitting events or transferring Common Framework internal data between distributed nodes.
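The layered, modular data flow described above can be sketched in C++. The real Common Framework interfaces are not published, so every name here (Frame, Module, Pipeline, DepthScale) is an illustrative assumption, not the project's actual API:

```cpp
#include <memory>
#include <utility>
#include <vector>

// Data handed between layers: a registered depth map plus intensity image.
struct Frame {
    int width = 0, height = 0;
    std::vector<float> depth;             // registered depth map, row-major
    std::vector<unsigned char> intensity; // corresponding intensity data
};

// Common interface every processing module implements, so modules from
// different layers can be chained interchangeably.
class Module {
public:
    virtual ~Module() = default;
    virtual Frame process(const Frame& in) = 0;
};

// Example analysis-layer module: converts raw depth in millimetres to metres.
class DepthScale : public Module {
public:
    explicit DepthScale(float factor) : factor_(factor) {}
    Frame process(const Frame& in) override {
        Frame out = in;
        for (float& d : out.depth) d *= factor_;
        return out;
    }
private:
    float factor_;
};

// A pipeline pushes each frame through the layers in order
// (sensing -> calibration -> analysis -> event generation).
class Pipeline {
public:
    void add(std::unique_ptr<Module> m) { modules_.push_back(std::move(m)); }
    Frame run(Frame f) const {
        for (const auto& m : modules_) f = m->process(f);
        return f;
    }
private:
    std::vector<std::unique_ptr<Module>> modules_;
};
```

In a design of this kind, adding a new sensor only requires a new wrapper module producing a `Frame`; downstream modules remain untouched.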

The Common Framework was implemented with C/C++ for different CPU platform targets depending on the requirements set by use cases.
• The main supported Linux platform is Ubuntu 12.10 running on an Intel Core i3 or better, for the high-performance full reference target.
• The Windows platform is XP or newer running on an Intel Core i3 or better, for the high-performance full reference target.
• Low-power targets for the framework are an ARM Cortex-A8 or better, or a passively cooled Intel Atom processor. The low-power target is a use-case-dependent subset of the framework that does not implement all processing modules of the Common Framework.

To support modularity and code re-use, the following data formats were agreed upon for connecting framework modules. Control events, configuration and external communication were found to be use-case-application dependent, thus no common formats were defined for them.
• Point cloud data and registered depth maps: corresponding data formats from PCL library
• Intensity (visual) data: OpenCV image data format
• 3D reference model data: corresponding data formats from PCL
• Control, event etc. formats: defined for each specific application
• Configuration parameters XML: defined for each specific application
• External communication data formats and protocols: defined for each specific application
The Common Framework modules have been used to create specific targeted application prototypes for the D-SenS application domains.

Use case A: People detection and tracking
The main purpose of this use case was to develop a general-purpose people tracking framework. The primary test environments were office and public spaces, especially open spaces with a notable people flow, such as lobbies.

The Use Case was specified as:

• Detecting people flow within the determined area.
• People flow consists of a flow of individual person(s) or a group of persons.
• Detecting and counting persons entering and leaving the area.
• Reporting the events.

Many people tracking products already available on the market are limited to a fixed camera pose, i.e. vertically mounted on the ceiling. The people tracking methods developed for D-SenS are based on plan-view projections: a height map and an occupancy map. They enable an arbitrary camera pose, making the method more flexible in various spaces. For example, camera coverage is extended in rooms with a low ceiling. These algorithms are also robust to noisy data and highly variable installation environments. Finally, camera calibration using depth sensing is much less cumbersome than with traditional RGB cameras.
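The plan-view idea can be illustrated with a minimal sketch: 3D points are binned into a grid on the ground plane, accumulating a point count (occupancy map) and the maximum height above ground (height map) per cell. The function name, coordinate convention (y up, x/z on the ground plane) and grid parameters are assumptions for illustration, not the D-SenS implementation:

```cpp
#include <algorithm>
#include <array>
#include <vector>

struct PlanViewMaps {
    std::vector<int> occupancy;  // number of points per ground-plane cell
    std::vector<float> height;   // max height above ground per cell
    int cols = 0, rows = 0;
};

// Bin 3D points (x, z on the ground plane; y = height above ground, metres)
// into a plan-view grid with square cells of cell_size metres.
PlanViewMaps planViewProject(const std::vector<std::array<float, 3>>& pts,
                             float x_min, float z_min,
                             int cols, int rows, float cell_size) {
    PlanViewMaps m{std::vector<int>(cols * rows, 0),
                   std::vector<float>(cols * rows, 0.0f), cols, rows};
    for (const auto& p : pts) {
        int c = static_cast<int>((p[0] - x_min) / cell_size);
        int r = static_cast<int>((p[2] - z_min) / cell_size);
        if (c < 0 || c >= cols || r < 0 || r >= rows) continue;  // outside grid
        int idx = r * cols + c;
        m.occupancy[idx] += 1;
        m.height[idx] = std::max(m.height[idx], p[1]);
    }
    return m;
}
```

Blobs in the resulting maps (e.g. cells with high occupancy and head-height peaks) are natural person candidates regardless of the camera's mounting angle, which is what makes the projection pose-independent.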

There are multiple outcomes of the project:

1. A people tracking framework that can be utilized in many applications. The framework contains multiple algorithms that are needed in most people tracking applications; therefore it makes further application development much faster.
2. Interfaces and recording tools to easily obtain and efficiently store data from multiple cameras, namely Primesense depth sensor and Bumblebee 2 stereo camera. A custom image compression method was developed to store depth data, using only a fraction of the original file size.
3. A highly robust Bayesian tracking algorithm based on occupancy and height maps.
4. An embedded, low-cost people tracking unit, consisting of a depth sensor and a miniature computer. A height-map-based algorithm is optimized to run on a low-power ARM processor in real time. It features automatic pose calibration, making device installation effortless.
5. A people flow counting demonstration application based on a client-server architecture.
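The report does not specify the custom depth compression scheme mentioned in item 2. As a hedged illustration of why depth maps compress so well, the following sketch uses delta coding followed by run-length encoding: depth images contain large smooth or constant regions, so most deltas are zero and collapse into short runs. This is an assumption-laden toy, not the project's actual codec:

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Encode a 16-bit depth map as (delta, run-length) pairs. Deltas wrap
// modulo 2^16, so decoding reverses them exactly.
std::vector<std::pair<int16_t, uint16_t>>
rleDeltaEncode(const std::vector<uint16_t>& depth) {
    std::vector<std::pair<int16_t, uint16_t>> out;
    uint16_t prev = 0;
    for (uint16_t d : depth) {
        int16_t delta = static_cast<int16_t>(d - prev);
        if (!out.empty() && out.back().first == delta && out.back().second < 65535)
            out.back().second++;          // extend the current run
        else
            out.push_back({delta, 1});    // start a new run
        prev = d;
    }
    return out;
}

std::vector<uint16_t>
rleDeltaDecode(const std::vector<std::pair<int16_t, uint16_t>>& enc) {
    std::vector<uint16_t> out;
    uint16_t prev = 0;
    for (const auto& [delta, run] : enc)
        for (uint16_t i = 0; i < run; ++i) {
            prev = static_cast<uint16_t>(prev + delta);
            out.push_back(prev);
        }
    return out;
}
```

On typical indoor depth frames, flat walls and floors produce long zero-delta runs, which is where the bulk of the size reduction comes from.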

Experiments and setup

The D-SenS algorithms in the Smart Building business area were tested against the approved specifications and against the performance requirements. Several measurement sessions in the applicable contexts were completed. Testing procedures were carried out at the demonstration and customers' premises, and the testing results were also reported and reviewed with the customers.

An evaluation scenario was also defined at an indoor location (UrbanMill) in Espoo, Finland. This scenario offers realistic situations in terms of clutter and human density.
Figure 8 illustrates the typical environment and system setup, and Figure 9 shows a screenshot of the demonstrator program running in the UrbanMill lobby.

Use case B: Intrusion detection within a volume
In this use case the goal was to implement a practical system for detecting volumetric intrusion in a predefined restricted area by using a depth-based camera. Intrusion detection techniques (e.g. person-machine collision prevention, off-limits area observation, etc.) are important monitoring activities that are useful in establishing safe and secure environments.
The proposed intrusion detection system performs intruder tracking from a distance without any human intervention. The system defines a virtual 3D shield around the asset that has to be protected, thus delimiting the protected boundaries in all three dimensions.
One peculiarity of this use case is the user interface, which is necessary to define the 3D area to be protected. This feature has been implemented as a graphical tool, enabling the user to create a shape (i.e. a "virtual cage") in the 3D space observed by the sensors, and to adapt the virtual cage by changing its size and position.
The use of a single sensor (and a single cage) was decided upon to limit the scope of the use case. In a real application it is expected that multiple sensors will be required in order to limit the occluded areas in the scene, and multiple virtual shields may be necessary for the protection of multiple assets.
Use case B accepts multiple types of sensors, thus complying with the "sensor-independent" approach implemented in the D-SenS framework and allowing the strengths of each technology to be exploited.

Intrusion detection is performed by detecting the location of foreground objects relative to the prohibited volume and subsequently employing a connected-components labelling approach to identify and visualize the group of intruding objects.
The intrusion detection rate delivered on a modern PC (i7-2600 CPU @ 3.40 GHz, 4 GB RAM) for the scenario using passive stereo vision was approximately 6 fps (at an image resolution of 1280x1024 pixels). For the active sensor case (image resolution of 640x480 pixels) we obtained a detection rate of 11 fps.
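The core volumetric test can be sketched in a few lines. Here the user-drawn virtual cage is simplified to an axis-aligned box, and a minimum point count suppresses single-pixel noise; the real prototype additionally runs connected-component labelling and supports arbitrary cage shapes, so this is only an illustrative reduction:

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Simplified "virtual cage": an axis-aligned box in sensor coordinates.
struct Box {
    std::array<float, 3> lo, hi;
    bool contains(const std::array<float, 3>& p) const {
        for (int i = 0; i < 3; ++i)
            if (p[i] < lo[i] || p[i] > hi[i]) return false;
        return true;
    }
};

// Returns true if at least min_points foreground points fall inside the
// protected volume; the threshold filters out isolated depth noise.
bool intrusionDetected(const std::vector<std::array<float, 3>>& foreground,
                       const Box& cage, std::size_t min_points) {
    std::size_t inside = 0;
    for (const auto& p : foreground)
        if (cage.contains(p) && ++inside >= min_points) return true;
    return false;
}
```

Because the test runs on foreground points only (after background subtraction), static objects legitimately stored inside the cage do not trigger alarms.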
The great advantage of this solution is its ease of setup and flexibility. In comparison with alternative asset protection options (e.g. IR barriers, physical glass barriers, etc.), this solution is particularly suitable for the temporary protection of assets, e.g. in a temporary gallery, where the setup time of the system is crucial.

Besides proving to be a powerful tool for detecting intrusions, this system is also suitable for applications that require privacy protection. As depth cameras can be configured to provide only the required depth data, all the intermediate intensity images can easily be discarded after the generation of the depth images.

Use case C: Fall detection
In this use case a system was developed for detecting and tracking people using off-the-shelf depth sensors and evaluating their behavior in order to detect fall events. There is a growing need for automatic fall detection systems, as the population in developed countries is aging rapidly while the staff and funds available per patient in, for example, nursing homes and hospitals are declining. In addition to being economical to purchase, install and maintain, these automated systems must reliably detect various types of fall events. Simultaneously, the system must minimize false alarms, and thus the burden on the human staff who are alerted whenever an event is detected; it must not violate people's privacy even if the sensors are installed in locations such as bathrooms; and it must function adequately regardless of the level of illumination in the scene.
The aim of use case C: Fall detection is to reliably detect falls in home-like environments. The software is designed to work in relatively static indoor environments (changes in e.g. room layout are possible, but large crowds decrease detection accuracy) with a pre-defined observation area of about 5 m x 5 m, under typical lighting conditions.
The fall detector application aims at meeting real-life demands by detecting, with high accuracy, fall events starting from various positions:
• standing
• sitting
• lying down
The application should also function properly regardless of the speed and direction of movement, changes in illumination, and reasonable levels of occlusion. In addition, the system should be easy to install and start up in a short amount of time.

Experiments and setup
The system was tested using the video and depth recordings by Advansee (Figure 10). The recordings were made in a typical living room and bathroom/toilet environment of a nursing home. They contain simulated normal behavior with 1-3 people at a time, including cleaning, moving furniture, walking around, sitting or lying down, kneeling to tie shoelaces, using the toilet/shower, etc. Mixed with this material are series of simulated fall events, ranging from fast, slipping-down type falls to slow collapses starting from a sitting position, or even rolling down from lying on the bed and disappearing behind it completely.

The fall event detector works as a two-phase system (Figure 11). First, a fast movement speed and/or a sudden drop in the tracked object's height triggers a "mild" fall suspicion. Using the tracked object's movement history, an SVM classifier determines for each frame whether the current situation merits a "serious" tag or not. When the fall event is elevated to the "serious" class, the fall alarm countdown is activated. When the fall event returns to the "mild" class, the countdown is paused. While in the "mild" class, the object's behavior is checked for any signs of recovery, which would merit cancelling the countdown and stopping the classifier.
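The countdown logic described above can be sketched as a small state machine. The per-frame SVM decision is abstracted into a boolean, and the countdown length is an illustrative parameter; the class and method names are assumptions, not the project's code:

```cpp
// Two-phase fall alarm: while the classifier reports "serious" the
// countdown runs; in "mild" frames it is paused; a recovery sign in a
// "mild" frame cancels the suspicion by resetting the countdown.
class FallAlarm {
public:
    explicit FallAlarm(int countdown_frames)
        : initial_(countdown_frames), remaining_(countdown_frames) {}

    // serious:   per-frame classifier output ("serious" vs "mild")
    // recovered: signs of recovery observed while in the "mild" state
    // Returns true once the countdown has elapsed, i.e. the alarm fires.
    bool update(bool serious, bool recovered) {
        if (serious && remaining_ > 0)
            --remaining_;                 // countdown runs in "serious"
        else if (!serious && recovered)
            remaining_ = initial_;        // recovery cancels the suspicion
        // otherwise: "mild" without recovery -> countdown stays paused
        return remaining_ == 0;
    }

private:
    int initial_;
    int remaining_;
};
```

The pausable countdown gives the tracked person time to stand up before staff are alerted, which directly addresses the false-alarm requirement stated above.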

The results from the data recorded by Advansee are very promising. The system can reliably raise an alarm on "simple" falls, and is able to detect even the "tricky" falls designed to pinpoint the weak points of the system at a considerably high rate. Most of the failures were in fact caused by clutter in the scene and by multiple simultaneously tracked objects, for example after the actors moved a mattress around the scene and then fell on top of it. It would seem further improvement could be achieved by improving the human detection and tracking modules, not by tinkering with the fall detector itself.

Use case D: Left item detection
The Left item detection use case covers the development and evaluation of a novel left-item detection approach which combines depth and intensity information to reliably detect and segment left-object candidates in scenes with observation zones of up to 10 by 10 meters in size. Unattended luggage poses an important security threat in transport and other critical infrastructures. Abandoned or left luggage detection also represents a key problem for surveillance personnel, since the accurate detection of rare abandonment events embedded in a cluttered environment is beyond the capability of a human observer.
The developed left item detection framework combines:
• Depth based detection of the stationary object candidates examining geometry elements newly introduced into a scene.
• Spatial location of the candidates with respect to a common ground plane (on it, above it).
• Object geometric properties (volume, size).
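The combination of the three cues above can be illustrated as a simple candidate filter: a stationary geometry blob qualifies as a left-item candidate if it sits on the ground plane and has item-like dimensions. The candidate structure and all thresholds below are illustrative assumptions, not the project's tuned values:

```cpp
// Geometry attributes of a stationary blob newly introduced into the scene.
struct Candidate {
    float height_above_ground;  // metres, bottom of the blob above the ground plane
    float volume;               // cubic metres
    float max_extent;           // largest bounding-box side, metres
};

// Heuristic combining the three cues: ground-plane contact plus
// item-sized geometry (large enough to matter, too small to be a human).
bool isLeftItemCandidate(const Candidate& c) {
    // On or near the common ground plane (rejects carried, i.e. floating, bags).
    if (c.height_above_ground > 0.15f) return false;
    // Plausible item volume: rejects tiny noise blobs and person-sized objects.
    if (c.volume < 0.005f || c.volume > 0.4f) return false;
    // A standing or sitting person exceeds this extent.
    if (c.max_extent > 1.2f) return false;
    return true;
}
```

In the actual system, candidates passing such a geometric filter would still be tracked over time to verify stillness and to characterize the owner relationship.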

The combination of these specific attributes results in an improved left item detector in terms of increased robustness against illumination variations, ability to discriminate between dropped-off and removed objects, and ability of distinguishing between items and humans.

A possible definition of a left item is a "non-human" object previously introduced and left in a scene, remaining still over a certain period of time. The detection task involves the left-item and human (owner) detection problems, and the characterization of the spatio-temporal relationship between object and owner. Figure 12 illustrates the detection results at the evaluation site.

Experiments and setup
Two evaluation scenarios were defined: at an indoor location (UrbanMill) in Espoo, Finland (Figure 12), and at the outdoor platform of the Roma Ostiense Train Station in Italy. Both scenarios offer realistic situations in terms of clutter, human density and variable conditions in terms of scene objects and illumination.

For both evaluation setups we used our in-house developed AIT sensor to extract intensity information, employing a passive canonical stereo setup (three monochrome cameras mounted in parallel) with a baseline of 0.4 m between the two cameras located at the ends of the rig.

The second evaluation site was an outdoor platform at the Roma Ostiense Train Station in Italy. In collaboration with the Italian railway company, Secom managed to deploy the left luggage test system directly on a train platform. With our setup we were able to record about six hours of video comprising many situations, ranging from an empty to a crowded platform, with trains stopping and leaving. The experimental setting was especially adequate for assessing the robustness of the left-item detection software in an open environment.

The set-up experimental scenarios, along with the existing in-lab scenarios, covered many complex situations, and valuable insights could be gained during the experiments. Most of these insights were used to improve the demonstrator; nevertheless, certain aspects, such as the noise inherently present in the input data, could not be eliminated, only handled algorithmically by employing noise-resilient, noise-filtering and robust statistical techniques capable of coping with outliers. The detection results from UrbanMill are shown in Figure 14, and the detection results from the railway station in Figure 15.

Use case E: Long-range image analysis from moving vehicles
In this use case, two applications have been investigated in the context of long-range image analysis from moving vehicles. The first focuses on change detection (for surveillance) and the second on long-range obstacle detection for intelligent vehicles. Both are summarized below, including some experimental results.
E1: Image analysis for surveillance vehicles
This use case explores and evaluates the benefit of depth sensing in the context of automatic change detection. This technology allows governmental agencies to efficiently monitor infrastructural assets, for example, detecting missing traffic signs and street ornaments due to theft or vandalism. The existing monocular change detection system of ViNotion was extended with depth-sensing capabilities by adding a second camera in a fixed stereo setup. The proposed depth-sensing extension alleviates the main limitations of the monocular change detection and contributes to a more effective and reliable system, especially in urban environments.
To ensure realistic testing, a custom-built stereo camera was mounted on top of a car. This camera consists of two 20-megapixel Ampleye Nox-20 UHD cameras with a variable baseline of up to 1.5 meters. With this setup, videos were recorded while driving through the cities of Amsterdam and Rotterdam. These videos were used for testing and validation of the system.
The goal of use case E1 is to improve upon the monocular change detection system using depth sensing. In short, the following improvements have been obtained:
• Improved (temporal) registration
Using the 3D geometry of the scene, it is possible to register two images acquired during different recording sessions more accurately, e.g. where one session was recorded hours or even days earlier. Our experiments showed a significant improvement in alignment accuracy: the alignment improved in almost all evaluated video frames. This in turn improved the subsequent image processing stages, which is clearly visible in the final change detection results. An example is shown in Figure 16, which shows two images of the same scene at different times of day, together with the changes detected by the monocular system and by the system extended with depth sensing. The novel use of 3D geometry clearly improves the alignment of the road markings, as can be seen in the right image of Figure 16.
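The principle of depth-aware registration can be illustrated with a minimal numpy sketch (not the project's actual implementation): given a per-pixel depth estimate, pixels are back-projected to 3D, transformed by the relative pose between the two recording sessions, and re-projected into the second image. The intrinsic matrix, sample pixels, depths and poses below are illustrative assumptions.

```python
import numpy as np

def backproject(pts, depth, K):
    """Back-project pixel coordinates (N, 2) with depths (N,) to 3-D points."""
    homog = np.hstack([pts, np.ones((len(pts), 1))])   # homogeneous pixels
    return (np.linalg.inv(K) @ homog.T).T * depth[:, None]

def reproject(P, K, R, t):
    """Project 3-D points into a camera with relative pose (R, t)."""
    Pc = (R @ P.T).T + t                               # into the second camera frame
    proj = (K @ Pc.T).T
    return proj[:, :2] / proj[:, 2:3]                  # perspective divide

# Illustrative intrinsics and two sample pixels with known depths (meters).
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
pts = np.array([[100.0, 200.0], [640.0, 360.0]])
depth = np.array([5.0, 12.0])

P = backproject(pts, depth, K)
# Identity pose: pixels map exactly onto themselves.
pts_same = reproject(P, K, np.eye(3), np.zeros(3))
# A 10 cm lateral shift displaces each pixel by f*tx/z, i.e. depth-dependently.
pts_shift = reproject(P, K, np.eye(3), np.array([0.1, 0.0, 0.0]))
```

Because the displacement depends on per-pixel depth, this kind of registration can align scenes that a single planar homography cannot, which is the benefit exploited in this use case.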
• Separation of physical changes and shadows
By exploiting the fact that shadows do not affect the depth of the scene, the majority of the (undesired) changes caused by shadows were rejected.
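This idea can be sketched as a simple per-pixel rule (a hedged illustration, not the project's algorithm; the thresholds `tau_i` and `tau_d` and the sample values are made up): a pixel whose intensity changes while its depth stays constant is treated as a shadow-like change and discarded.

```python
import numpy as np

def classify_changes(int_diff, depth_diff, tau_i=30.0, tau_d=0.2):
    """Label pixels: 0 = no change, 1 = shadow-like, 2 = physical change.
    A shadow alters intensity but leaves the measured depth untouched."""
    intensity_changed = np.abs(int_diff) > tau_i
    depth_changed = np.abs(depth_diff) > tau_d
    labels = np.zeros(int_diff.shape, dtype=int)
    labels[intensity_changed & ~depth_changed] = 1   # illumination / shadow
    labels[intensity_changed & depth_changed] = 2    # genuine physical change
    return labels

# Tiny example: one shadow-like pixel, one physical change, two unchanged.
int_diff = np.array([[50.0, 5.0], [40.0, 0.0]])     # intensity differences
depth_diff = np.array([[0.05, 0.0], [0.5, 0.0]])    # depth differences (m)
labels = classify_changes(int_diff, depth_diff)
# labels -> [[1, 0], [2, 0]]
```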

E2: Collision avoidance for intelligent vehicles
This use case considers an Advanced Driver Assist System (ADAS) that aims at warning drivers of potential collisions. The selected context is public transportation, more specifically a tram in an urban environment, since the potential benefit is large and there are specific potential customers in this application area. In contrast to commercially available automotive vision approaches, the newly developed system has a large operational range of up to 100 meters.
Within this collision warning system, depth sensing is exploited to distinguish obstacles from the ground-surface model. To this end, a custom-built stereo camera is mounted on top of a tram. The processing chain consists of online camera calibration, depth map computation, spline fitting of the ground-surface model, obstacle segmentation, and, in parallel, tramway rail extraction.
To ensure realistic testing, the stereo camera discussed in use case E1 was mounted on top of a tram. In cooperation with the Amsterdam tram company (GVB), videos were recorded while driving the tram through the city of Amsterdam. These videos were used for testing and validation of the system.
The main achievements of this use case are briefly summarized below:
• Online camera calibration
Due to the large baseline (1.5 m) of the stereo camera, conventional calibration procedures cannot be applied to this setup. A new (extrinsic) calibration method that can be applied while driving was proposed and implemented; this new method is better suited for practical use.
• Ground-surface modelling
Typically, the ground surface in a city is not flat. Even in a relatively flat country such as the Netherlands, it was found that the majority of urban roads have (changing) slopes, bumps and curved road surfaces. Especially when looking more than 100 meters ahead, the changing ground surface quickly becomes a problem. To compensate for such variation, the 3D ground surface is modelled with mathematical spline functions, which makes it possible to accurately distinguish obstacles from the ground surface.
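The principle behind the surface model can be illustrated with a small sketch (using a low-order polynomial fit as a stand-in for the spline model actually used; all numbers are synthetic): fit a smooth surface to the measured road heights, then flag points that rise clearly above the fitted surface as obstacle candidates.

```python
import numpy as np

# Synthetic road profile: a gently sloped, slightly curved road surface,
# plus two points that stick up from it (obstacles). All values are made up.
dist = np.linspace(5.0, 100.0, 50)                # distance ahead in meters
height = 0.002 * dist + 0.0001 * dist ** 2        # smooth ground surface
height[20] += 1.2                                 # obstacle at ~44 m
height[40] += 0.8                                 # obstacle at ~83 m

# Low-order polynomial fit as a stand-in for the spline-based surface model.
coeff = np.polyfit(dist, height, deg=2)
surface = np.polyval(coeff, dist)

# Points clearly above the fitted surface are flagged as obstacle candidates.
obstacles = np.where(height - surface > 0.3)[0]
```

A spline model, as used in the project, additionally adapts to locally changing slopes and bumps, which a single global polynomial cannot capture over long ranges.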
• Obstacle segmentation
The next step is to isolate obstacles from each other. To this end, a graph-based algorithm is employed that successfully clusters regions, as can be seen in Figure 17.
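The clustering step can be illustrated with a simple connected-components labelling over an obstacle occupancy grid (a minimal stand-in for the graph-based algorithm actually employed; the grid below is an invented example):

```python
import numpy as np

def connected_components(mask):
    """4-connected component labelling of a boolean obstacle mask via flood
    fill; a simple stand-in for the graph-based clustering in the pipeline."""
    labels = np.zeros(mask.shape, dtype=int)
    n = 0
    for y in range(mask.shape[0]):
        for x in range(mask.shape[1]):
            if mask[y, x] and labels[y, x] == 0:
                n += 1
                stack = [(y, x)]
                while stack:                         # flood-fill one cluster
                    cy, cx = stack.pop()
                    if (0 <= cy < mask.shape[0] and 0 <= cx < mask.shape[1]
                            and mask[cy, cx] and labels[cy, cx] == 0):
                        labels[cy, cx] = n
                        stack += [(cy + 1, cx), (cy - 1, cx),
                                  (cy, cx + 1), (cy, cx - 1)]
    return labels, n

# Two separate obstacle blobs in a small occupancy grid.
mask = np.array([[1, 1, 0, 0],
                 [0, 1, 0, 1],
                 [0, 0, 0, 1]], dtype=bool)
labels, n = connected_components(mask)
# n -> 2: the two blobs receive distinct labels
```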
• Tramway rail detection
Since a warning should only be given if there is an obstacle on the path of the tram, the tramway rails are detected to analyse the collision zone.

Finally, a few validation results are presented:
• It can be observed that the road region is detected accurately, often also in areas that contain bumps or varying upward or downward slopes. In general, the spline-fitting method reconstructs the road surface reliably up to at least 75 meters.
• For the obstacle detection, three different situations are distinguished:
o On straight, flat roads, the obstacle detection operates practically without error: 3573 obstacles (98.5%) were detected, with only 8 false detections. If partial detections are included, a detection rate of 99.9% is achieved.
o In road curves, the road detection sometimes performs sub-optimally, because the view is often blocked by buildings in the curve, resulting in a false model. In experiments with curves, 1047 obstacles (88.5%) were detected, with 25 false detections. With partial detections, a detection rate of 95.3% is obtained.
o At small hills or bridges, the elevation of the surface itself occludes the view of the downward slope. This is also a difficult case for the ground-surface modelling and is inherent to such vision-based systems. Frames with these situations were not processed separately, but are part of the above-mentioned results.

Potential Impact:
Contribution to the expected impacts
The two megatrends, urbanisation and an aging population, are a major cause of growing investment in safety applications. Future smart cities will, among other things, optimize visitor management in smart buildings and spaces, reduce traffic congestion, and improve assisted-living services. The main objective is clearly to provide fluent services to citizens and, above all, to improve the safety of people. At the same time, security requirements have become tighter in airports, hotels and other public places. Vision-based safety systems will become part of everyday life in an intelligent and inconspicuous fashion.
Frost & Sullivan have estimated a strong worldwide growth rate for video surveillance market revenues in the upcoming years, as seen in Figure 18. They also estimate that the video content analysis (security) market will reach $623.6 million in 2014, with a compound annual growth rate (CAGR) of 27.7% from 2007 to 2014. Similarly, independent market research agencies such as IMS Research forecast the video surveillance market to grow from a value of 2.1 billion dollars (1.5 billion euro) in 2009 to 3.3 billion dollars (2.4 billion euro) in 2015, a CAGR of 10%. Frost & Sullivan have also estimated that assisted-living market revenues in Europe alone will increase from $154.92 million in 2009 to $525.58 million by 2015 [Frost & Sullivan: Advances in digital video surveillance and optical security systems (2009)].
Computer vision technologies are widely available commodities within the safety and security business. One way to differentiate solutions and products in these sectors is to offer more innovative and intelligent video content analytics, using advanced technology that provides additional value to the end-user.

The results of the D-SenS project, the Common Framework and the specific algorithms, will make the development of depth-sensing solutions much faster and more agile after the project. For example, people flow detection, fall detection and behavior analytics are emerging technologies that are becoming rapidly deployable; with depth analysis, the SMEs can provide more cutting-edge solutions in those fields. During the project the SMEs also gained knowledge and technology that have greatly improved their competitiveness in the video content analytics market.

The Common Framework was the core of the project and provides a solid base for further development. For the source code, the D-SenS Git repository was created and managed for the individual software modules; in particular, the coordinated process of software version management is one of the achievements of the project.

Already by the end of the project, the results of D-SenS have proven to be very valuable in the video analytics market. End-users are very interested in the new use-case applications developed in the project, and the estimated sales revenue for the years 2015 – 2017 is over 7 M€.
It is worth noting that the phenomenon of affordable depth sensors and RGB-D cameras is so new that only recent market watches and reports include it. The gradual change to IP cameras in security solutions has been seen as one of the next steps. Depth sensing may not be the best technology for every application within the safety and security field, but it is evident that it has become applicable very rapidly. The different depth-sensing technologies hold huge market and innovation potential. During the D-SenS project, commercially available depth sensors have also evolved considerably and become more cost-effective, owing both to technology development and to the use of depth sensors in the video game market.
In the consortium, we believe it is of crucial importance to acquire knowledge of this technology now, in order to grow with this market opportunity. Already during the project, many market opportunities have been identified, and growing interest in the applications has become clear.

Economic impact on D-SenS consortium SMEs

Scope of the project:
As a result of the D-SenS project, the SMEs have analysis software that is "ready for the market" or "nearly ready for the market". The technology has been tested and verified not only in the lab, but also in real pilots, which allowed agile application development. To achieve full technology transfer, the software has been fully documented as the Common Framework, and an extensive training session was organised at the end of the project. Smaller, more specific use-case training sessions were also held frequently throughout the project. A plan for the next developments of the software has been created, as well as a roadmap for the future of the novel video analytics products, covering both maintenance and the development of new applications. The RTD partners prioritized the work so that activities were focused on the most promising features.

Synergy and benefits:
The project consortium consists of technology and application developers from different sectors of people safety. Internally, the project created stable synergies for the definition of the Common Framework and for knowledge sharing at the common level of software development. The cooperation among the SMEs has continued through the integration, testing and maintenance phases. The consortium SMEs have found opportunities to complement their own offerings with those of the other partners, as well as to use each other's existing technology solutions under specific agreements.
By the end of the D-SenS project, a strong network had been created among the SME partners. Each partner is willing to pass customer opportunities to the other partners and to work in collaboration with them. The synergies of the results developed in the project can be exploited, as the target is to create better applications for the customer in a faster and more agile way.

Business development value:
Most of the SMEs in the D-SenS project have well-established market segments. The consortium brings together innovative SMEs providing leading-edge solutions in the global marketplace. The participating SMEs want to open new frontiers in the computer vision field. As a result of this project, the SMEs got a head start with the technology, which has already generated new business and will continue to do so in the future.
The RTD partners provided the research and development capability necessary to achieve the technical results within the SMEs' scope. In the D-SenS project, the RTD partners identified and produced the most valuable software and methodologies, and helped the SMEs achieve their goals within a time frame and at a quality level that they could not have afforded alone.

Direct financial impact:
The main goal of the D-SenS project was to provide technology for the SME partners that benefits and improves their offerings. By the end of the project, the SME partners expect the developed technology to have a strong effect on their business growth. Almost all of the SME partners estimate significant growth in sales revenue in the years 2015 – 2017; the forecast sales revenue of the D-SenS applications for this period totals 7 335 000 €. Already during the project, the applications could be turned into real commercial applications, providing SME partners with specific customer opportunities or direct growth in sales.

Privacy impact:
The D-SenS project developed depth-sensor and video analysis technologies for people safety. In some application areas this can mean counting the number of people or understanding their behavior. The project did not perform any biometric identification or recognition, or any other activity that would compromise citizens' privacy. The D-SenS partners actively followed the EU-level and national-level discussions on legislation and regulations, and implemented an ethical review in every country involved in the project.

Project results and management of intellectual property
The D-SenS partners have agreed to adopt the default position regarding intellectual property, meaning that the SMEs retain full ownership of all Foreground information (D-SenS results). Most of the Foreground was generated by the RTDs with the help of the SMEs, and the RTDs have been remunerated accordingly.
The main principles of IPR in the D-SenS consortium are the following:
• We are committed to respect the background IPR of each partner.
• We focus the efforts into a Common Framework and fair sharing of results.
• We have different business models and segments, and competition between the SMEs is not foreseen as a dividing issue in the foreseeable future.
The Background information shall remain the property of the nominated provider, whether it is an SME-partner or an RTD-performer.
Most of the IP was created by the RTD partners, and there is absolute confidence between the RTD and SME partners that all IP is reported to the assigned partner. Confidentiality obligations in the Consortium Agreement ensure confidentiality between the partners.

Exploitation and dissemination of the results

Our major goal was to create a long-term business cooperation approach. The cooperation among the SMEs will continue beyond the time frame of the project, since all the SMEs share a large part of the common knowledge and core libraries needed for the maintenance and evolution of the different applications. The D-SenS SME value chain is presented in Figure 19.

Depth sensors were integrated into the embedded-system development tool chain. Starting from intelligent sensors, the common core offers tools for fast development of 3D image-based computer vision systems. By developing the advanced analysis algorithms and software, the SMEs gained an advantage over their competitors when offering their products to system integrators. System integrators increase their core business and competence related to people-safety applications and offer robust solutions to application providers and end-users.
The D-SenS project already contains a natural value chain, with companies that are clearly end-users to each other. To improve, facilitate and accelerate the exploitation of the results, the SMEs involve their actual end-users of video analysis systems, e.g. parking halls or rest homes, in following the project results. The SME partners provided the consortium with Letters of Interest from end-users, i.e. potential customers. For example, RDnet confirms that three globally leading companies are fully interested in using the results of the project in cooperation with RDnet and its partners.

Exploitation route to the market

The participating SMEs have existing, ready-made markets and sales channels within the people-safety sector. With the new depth-sensing technology they can offer, market and sell new, enhanced solutions to existing customers. At the same time, adopting the new technology opens new market segments and strengthens competitiveness in the old market sectors.
The project results cover the latest technology innovations. However, that alone is not enough, since the ultimate goal of the project was to create new revenue streams, the ability to hire employees, and improved productivity and competitiveness for the project partners. Most of these targets have been achieved. Importantly, the project results are easily deployed in everyday business.

Dissemination activities

The role of the dissemination activities was to communicate D-SenS results to a wider audience. The dissemination work was based on a dissemination plan created in an early phase of the project. This task was carried out in collaboration with the exploitation activities, in order to ensure that no confidential material was disclosed that might endanger the exploitation of the results. A public D-SenS project website was created and is being kept up to date.
The consortium wrote articles and gave seminars on selected results. The list of scientific (peer-reviewed) journal articles (also listed in final report template A1):
• W.P. Sandberg: Extending the Stixel World with Online Self-supervised Color Modeling for Road-versus-Obstacle Segmentation. October 2014, Publisher: IEEE
• D.W.J.M. van de Wouw: Improving IED object detection by exploiting scene geometry using stereo processing. February 2015, Publisher: SPIE
• M.H. Zwemer: A vision-based approach for tramway rail extraction. February 2015, Publisher: SPIE
• M.L.L. Rompen: Online Self-supervised Learning for Road Detection. May 2014, Publisher: IEEE.

The dissemination activities were (also listed in final report template A2):
• Csaba Beleznai, workshop: Reliable Left Luggage Detection Using Stereo Depth and Intensity Cues, 3rd Workshop on Consumer Depth Cameras for Computer Vision @ ICCV. December 2013, Sydney Australia
• Csaba Beleznai, summer school: 2D and 3D Computer Vision in Application-Oriented Tasks, Summer School on Image Processing. July 2013, Veszprem, Hungary
• Csaba Beleznai, summer school: 2D and 3D Computer Vision in Application-Oriented Tasks, Summer School on Image Processing. July 2014, Zagreb, Croatia.
• Daniel Moldovan, conference: 4th IEEE International Conference on Consumer Electronics – Berlin. September 2014, Berlin Germany.
• Daniel Moldovan, Machine Vision Trade Fair: Vision 2014. Stuttgart, Germany, November 2014.
• Roberto Marega, workshop: Soluzioni di videoanalisi applicate alla videosorveglianza (video analysis solutions applied to video surveillance) @ the Engineers' Association. October 2012, Rome, Italy.

The asset-monitoring achievements were also published in a selected list of web magazines in Italy.

A D-SenS video was created on selected topics for efficient dissemination of project results on the internet. Project partners participated in various events and exhibitions.
The final plan for the use and dissemination of the knowledge provides a good base for co-operation between all partners of the D-SenS group of companies.

We set up the demonstrator systems for both evaluating and disseminating the results. The demonstration locations were: the assisted-living demonstrator in Espoo, Finland (IsCom, ADVANSEE, VTT); the smart-building demonstrator in Espoo, Finland (Innorange, RDnet, VTT); and the security demonstrator in Finland and Italy (XTrust, Secom, AIT).
The demonstrations verified and tested the video and depth-sensing analysis in real-life situations. The demonstrations and piloting improved the market potential of the applications and generated data for marketing the applications to the customers of the participating SMEs. The demonstrations were also a good dissemination tool, and presenting them to potential end-users noticeably accelerated negotiations.

List of Websites:
The address of the project public website

The Project Coordinator, Meri Helmi, RDnet
The Exploitation Manager, Hannu Kulju, RDnet
The Technical Manager, Johannes Peltola, VTT
The Dissemination Manager, Roberto Marega, XTrust

RDnet Finland
Esa Reilio: +358 40 502 6762
Hannu Kulju: +358 40 5035 417
Meri Helmi: +358 40 485 7244
Ville Ojansivu: +358 50 589 0806

ViNotion, Netherlands
Egbert Jaspers: +31 6 15410981

Advansee, France
Thierry Corbiere: +33 680 457 375

XTrust, Italy
Roberto Marega: +39 334 80 85 220

Secom, Italy
Francesco Polzella: +3931073734

IsCom, Finland
Tommi Kokkonen: +358 400 565670

Innorange, Finland
Antti Lappeteläinen: +358 400 468003

VTT, Finland
Johannes Peltola: +358 40 769 4056

TU/e, Netherlands
Peter H.N. de With: +31 6 3970 7535

AIT, Austria
Manfred Gruber: +43 664 8157877