
Urban Sensing through User Generated Contents

Final Report Summary - URBAN SENSING (Urban Sensing through User Generated Contents)

Executive Summary:
The Urban Sensing project brought a new product to the urban design, city planning and urban management market: a platform that extracts patterns of use and citizens' perceptions of city spaces through robust analysis of User Generated Content (UGC) shared by city users and inhabitants over social networks and digital media.
Traditional data collection methods, such as surveys, interviews and ethnographic observations, have limitations: they are costly in time and resources, and the samples they can cover are generally small. More recently, methods based on real-time data and web harvesting have been adopted, but a focus on their spatial and temporal dimension is still lacking.
Applying text mining and conversation analysis to geo-localized UGC, Urban Sensing is able to provide meaningful indexes and dynamic maps depicting citizens' shared perceptions, emotions, hints and opinions regarding public services, urban spaces, time-based events, and the city as a whole.
A set of visualisation tools provides interaction with these maps and indexes, giving insight into how public policies, spatial interventions, events and transformations are perceived within a city. At the same time, it gives hints to designers, developers and entrepreneurs who adopt a more human-centred approach to our cities’ evolution.

Project Context and Objectives:
The Urban Sensing assumption is that by conducting an analysis of data sets based on text data extracted from UGC there is the possibility to recognize multiple stories, as they emerge, overlap and influence each other, unfolding from city users’ mental representations and spatial experiences of city spaces.
The idea is to focus on the operational level of the city, on its everyday aspects rather than on exceptional events or on tourists’ activity patterns. Moreover, the described environmental images are both spatial and temporal, as distribution of qualities is considered by city users in both time and space.
The aim is to continuously extract indications of urban emotions and well-being through real-time harvesting of content generated on social media, and to combine them with other data sets (e.g. urban morphology, architectural types, city budget, opportunities, public services).
The tools and services provided by Urban Sensing support the SMEs involved in the consortium in their decision-making processes through a set of customisations of the platform in three main information domains:
• Urban Policy assessment
o Analyse users’ perceptions related to specific geographic areas
o Understand how the population reacts to new urban policies
o Detect the lack of structures offered by institutions and city administrations
o Sensitize decision-makers and practitioners in the field of architecture and design, as well as the general public, about public space potentialities
• Exhibitions, Events assessment
o Discover meaningful relationships and connections between places, people and uses
o Understand how specific features of city spaces, services and events affect people's emotions
o Understand how specific user groups use public spaces
o Detect post-event/fair reactions and comments by citizens and participants
• Public transportation, commuting assessment
o Where people are requesting information about transportation
o When people are requesting information about transportation
o Information about the bus line
o Information about how many stops there are after the user’s position
o The user’s typically used stops
o The user’s route from stop to stop
o Working hours of the transportation authority
Closing a cybernetic feedback loop between cities and their citizens, the platform monitors and maps real-time information related to shared perceptions, emotions and opinions in urban environments. This provides a better understanding of how events, spatial interventions and transformations are perceived within a city, and at the same time it gives hints to designers, developers and entrepreneurs who are willing to adopt a more human-centred approach to cities’ evolution.
Traditional data collection methods such as surveys, interviews and ethnographic observations, or more recent methods based on real-time data harvesting and analysis (e.g. the use of mobile devices to collect spatio-temporal data), have so far provided a good mechanism for investigating specific behaviours related to certain urban areas.
Technology development and the emerging participation of internet users, in terms of social interaction, have redefined the ways in which information can be shared. Today all citizens can in principle produce and share information about their everyday life experiences, and they actually do so, mostly using social platforms such as Twitter and Facebook, among others. User Generated Content is content made publicly available over the Internet that reflects a certain degree of creative effort and that has been created outside professional routines and practices.
Our assumption is that by conducting an analysis of data sets based on text data extracted from UGC we can recognize multiple stories, as they emerge, overlap and influence each other, unfolding from city users’ mental representations and spatial experiences of city spaces.
These data, correlated with other data such as logs recording the time and type of particular events, can show citizens' patterns of use, spatial experiences and related perceptions of the urban environment, returning maps of a social, psychological and environmental ecology: the perceived cityscape.

Project Results:
During the last period of the project, from M10 (July 2013) to the end of the project at M29 (February 2015), all the project results were achieved.
Three different customisations of the Urban Sensing platform were designed, one for each of the SMEs involved, and a demonstration and validation session was arranged for each of them:
• Urban Policy assessment
The main beneficiary of this result, and of the related personalisation of the Urban Sensing platform, is Accurat (ACC). Owing to its commercial interest in analysing the dynamics of mass retail, ACC uses the Urban Sensing platform to:
o Map movements of people in the surroundings of important malls in the city centre and in the suburbs. The data are grouped and sliced by weekdays and weekends, to identify emerging trends and relate them to information on flows supplied by the retailers, in order to prove the concept and the validity of the indicators. The visualizations and the analysis also focus on how specific promotions can be correlated with flows of customers
o Isolate clusters of students in the surroundings of universities and monitor their city usage habits in order to understand patterns of movement and common trends (e.g. which neighbourhoods of the city students prefer at night and where they like to spend time). This type of data could be used by real estate investors to plan the construction of student dorms or services, or the repurposing of old buildings for that use
o Monitor promotional actions for specific products. The crowdsourced data are used to demonstrate how the impact of a promotional campaign can be evaluated geographically, so that the location of the commercial action can be adjusted
o Assess the distribution of nightlife across the city, in order to provide useful information to brands targeting a specific audience. Each social network has its own demographics; by highlighting differences in usage during evenings and nights on weekdays and during events, it is possible to map the habits of different groups of users in relation to the places they look for when they want to have fun (e.g. Instagram has a younger user base than Twitter, and this difference can provide useful city-wide insights)
This list represents the concrete demonstrations arranged at the end of the first iteration of the Urban Sensing project, in which ACC provided its customers, one for each area of interest (e.g. a liquor company, a real estate company, a mall), with evidence of people's behaviour in response to changes in product prices, special offers and monuments of interest. The data coming from social media were collected to show how ACC could improve its business activities by exploiting social media data for the geographical analysis needed by its clients
• Public transportation, commuting assessment
The main beneficiary of this result, and of the related personalisation of the Urban Sensing platform, was at first Mobivery (MOBI) and then SISU LABS (SISU). During this period MOBI declared bankruptcy, causing difficulties inside the consortium, which immediately replaced it with a new partner, SISU. Although MOBI’s bankruptcy caused many problems for project management, no delays were registered in the technical development, and the second iteration was closed on time, with output concerning information from users about public transport in Madrid, used to measure their level of satisfaction, usage habits and potential service problems.
After a personalisation of the Urban Sensing collectors, the platform was completely integrated with Malcom (MOBI’s platform developed for Empresa Municipal de Transportes de Madrid) and was able to collect from it information about the Empresa Municipal de Transportes de Madrid (EMT), such as:
o General information about the devices used by people
o Where people are requesting information about the EMT
o When people are requesting information
o Information about the bus line
o Information about how many stops there are after the user's position
o The user's typically used bus stops
o The user's route from stop to stop
o Working hours of EMT
As indicated before, all the information related to the Madrid area, using a predefined bounding box to cover the entire city transport area.
After the Malcom personalisation, the already developed Urban Sensing collectors for Twitter, Foursquare, etc. were tuned to the Spanish language (commonly used words and slang expressions) to better understand and classify text related to emotions and/or sentiments.
The complete personalisation of the Urban Sensing platform allows the generation of a complete graphical interface over the map of Madrid, showing the feedback obtained from users in a given area at a particular time.
Although all the development was completed, a real validation of the Urban Sensing platform could not be arranged because of MOBI’s default.
• Exhibitions, Events assessment
The main beneficiary of this result, and of the related personalisation of the Urban Sensing platform, is LUST (LST). The biggest change in this third version with respect to the previous customisations is that, in almost all the added functionalities, tweets and social media items are not only plotted on a map but can also be explored at the level of a single UGC item (in terms of content, context, emotion, etc.) or at the level of a selection. The main improvements in the third iteration are:
o New ways of visualizing information:
• Content and Context / Keywords and topics extracted from UGC are represented on a map through circular shapes
• Content and Context / Words associated to geometric clusters-areas processed by the platform can be visualized as word or tag clouds on a grid superimposed on the map
• Content and Context / Highlighting visually coherent groups among the clouds
• People Movement visualizations
• Height maps / Languages
• Cloud visualization (Isolines)
o Extraction of topics and contextualization:
• Languages / Spatial distribution of contributions in selected languages, coupled with feedback sentiments
• Add context to a set of data, by gathering content that is relevant to the object of the analysis, including news, Google search
• Inclusion of content from mainstream media feeds
• Inclusion of content from N-gram
• Trend detection
• Management system for setting up topic channels
o Interface:
• Perform multiple cities (comparative) analysis
• Possibility to add filters over the collected data
• Search within data on map (magnifying lens, visualise patterns and analyses on the map and/or side panel)
• Easy pull and push (import and export) of data streams through standardized APIs
• Dashboard showing textual description and graphs for each specific visualization
o General functionalities
• Support for Dutch in the natural language processing tools developed for English, Italian and Spanish languages
• Implementation of the emotion detection algorithms (and ranking)
These three iterations each improved the initial platform, and at the end of the project each SME was provided with the final version of the Urban Sensing platform.
To reach the final version of the platform an iterative approach was followed. The SMEs and the RTDPs cooperated from the requirements definition through to each demonstration, whether given to an SME's customers or internally within each SME.
The complete list of the identified requirements is available in D2.2 - Systems requirement specification, and it is structured in accordance with the usual distinction between functional and non-functional requirements. The first part of the document lists functional requirements, organized around gathering, processing and visualization of data. This organization follows the basic groups of functions to be provided by the application. The second part of the document lists the non-functional requirements, which include requirements related to performance, security, software quality and documentation. Each requirement is specified by means of the following attributes:
• Identifier – a unique ID of the requirement which will be used for referencing the requirement throughout the whole project
• Description – a description of the requirement, including how it differs from other similar requirements
• Traceability – a reference to the sources which led to the formulation of the requirement
• Verification – a description of how to verify the fulfilment of the requirement
The iterative approach affected not only the requirements definition but also the software architecture that characterises the Urban Sensing solution. During the first iteration the software architecture was specified conceptually, completely independent of any specific implementation technology; during the second and third iterations the conceptual specification was progressively supplemented with implementation notes specifying additional constraints on the technologies used for platform implementation, deployment and operating conditions.
Figure 1 shows the final Urban Sensing architecture, where:
• C/M – are the Data connectors and mediators. Data connectors connect the platform to external data sources; they can be:
o Poll connectors – they periodically fetch (poll) data from the source and push the data on for subsequent processing
o Push connectors – they are implemented as a service invoked by the external system, which provides the data
In both cases, data from connectors are pushed to the platform via the Data mediator, which transforms data from the source format into the Common Data Model (CDM). Connector development was a fundamental part of the Urban Sensing project, and a specific work package was dedicated to it. At the end of the project the following connectors had been developed and deployed:
o Facebook - This connector searches for places in Facebook using keywords representing the city and checks that the coordinates of each fetched place are inside the bounding box corresponding to the city. The connector calls the Facebook Graph API
o Flickr – This connector searches for photos taken in a certain time range and inside the current city bounding box coordinates. Then, for each collected photo, it requests details by calling a second API method
o Foursquare - This connector searches for check-ins at Foursquare places available in a specific area, using latitude and longitude, a radius and, optionally, a keyword to narrow the results
o Instagram - This connector searches for Instagram photos available within a specific area, using latitude and longitude and a radius from the centre
o Panoramio - This connector is responsible for connecting to the Panoramio API, looking for photos in the defined bounding box
o RSS – This connector fetches data from RSS endpoints of some web newspapers:
Italian newspaper ‘Corriere della sera’, Milan section:
http://www.corriere.it/rss/homepage_milano.xml
Italian newspaper ‘Repubblica’, Milan section:
http://milano.repubblica.it/rss/rss2.0.xml
Spanish newspaper ‘El mundo’, Madrid section:
http://estaticos.elmundo.es/elmundo/rss/madrid.xml
Spanish newspaper ‘El pais’, Madrid section:
http://ep00.epimg.net/rss/ccaa/madrid.xml
Dutch newspaper ‘AD’:
http://www.ad.nl/rss.xml
o Twitter – This connector is used to obtain tweets related to particular keywords or locations, or written by specific users. The Twitter connector uses the free API provided by Twitter, which sends tweets to the connector in real time, filtered by specific criteria
o Malcom - This connector can query the Amazon S3 storage of the Malcom platform, asking for specific logs coming from the EMT (Empresa Municipal de Transportes) mobile app. The connector fetches data using an access account (provided by Mobiguo) and the Java library from Amazon, then downloads the collected data needed to analyse the user experience
o Open Data - Open data sources are leveraged to enrich information related to user activities on maps. An open data file in JSON or CSV format can be loaded into a specific MongoDB collection and stored as JSON objects. Software modules on the server can use these data, searching for a geographical point within a geometry (e.g. a district) and retrieving the information stored for that area
• AnS - Annotation service. It forms the main processing units of the platform implementing information extraction tasks. Annotation service receives Posts and adds Annotations extracted from textual content
• MS - Messaging service. It provides subscription/notification interface used for real-time distribution of the processed posts to the platform components or to the external services
• IS - Indexing service. For each post and associated annotations, Indexing service forms a tuple of fields, which are indexed for efficient information retrieval. Indexed fields can be formed from all properties of all referenced entities, sentiment, encapsulating Post properties and post author identity properties
• QS - Query service. It is tightly coupled with the Indexing service, and both are implemented using the selected search platform, such as Solr or ElasticSearch. The query result is a filtered stream of entities encoded in JSON. It is possible to specify which subset of entity properties should be retrieved from the index. Different sets of fields can be selected for retrieval in the result and for filtering in the query
• AgS - Aggregation service. It computes the views with aggregated statistics for the filtered data. It is implemented using the filter/aggregate model where data are at first filtered using the specified query and then aggregated in the incremental way. Computed values of aggregated statistic are stored in the cells of the n-dimensional grid with dimensions corresponding to the indexed properties
• WebCl - Web client. It provides the web interface for the platform users. It allows constraints for data filtering to be specified and provides visualization of aggregated data. It is implemented as a web application using HTML5 and Ajax technology, which is connected to the Query and Aggregation services and to the external Web Map service
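The connector and mediator roles described above can be illustrated with a minimal Python sketch. It is not the project's implementation: the `Post` class is a hypothetical stand-in for a CDM record, the raw field names (`title`, `latitude`, `longitude`, `taken`) are invented for illustration, and the Madrid bounding box values are approximate placeholders rather than the box used by the consortium.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Iterable


@dataclass(frozen=True)
class BoundingBox:
    """Geographic bounding box of a city, in decimal degrees."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat: float, lon: float) -> bool:
        return self.south <= lat <= self.north and self.west <= lon <= self.east


@dataclass
class Post:
    """Hypothetical stand-in for one Common Data Model (CDM) record."""
    source: str
    text: str
    lat: float
    lon: float
    created_at: datetime
    annotations: dict = field(default_factory=dict)


def mediate_photo(raw: dict) -> Post:
    """Data mediator: map one source-specific record onto the CDM stand-in."""
    return Post(
        source="photo-service",  # hypothetical source label
        text=raw.get("title", ""),
        lat=float(raw["latitude"]),
        lon=float(raw["longitude"]),
        created_at=datetime.fromtimestamp(int(raw["taken"]), tz=timezone.utc),
    )


def poll_once(
    fetch: Callable[[], Iterable[dict]],
    mediate: Callable[[dict], Post],
    bbox: BoundingBox,
    push: Callable[[Post], None],
) -> int:
    """One polling cycle: fetch, mediate, keep posts inside the city, push on."""
    pushed = 0
    for raw in fetch():
        post = mediate(raw)
        if bbox.contains(post.lat, post.lon):
            push(post)
            pushed += 1
    return pushed


# Approximate, illustrative bounding box around Madrid (an assumption).
MADRID = BoundingBox(south=40.31, west=-3.89, north=40.64, east=-3.52)
```

A poll connector would call `poll_once` on a timer, whereas a push connector would call `mediate` and `push` directly from the handler invoked by the external system.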
While Data Connectors and Mediators are the data-gathering components that collect data from the different sources and format them in the Common Data Model (CDM), the CDM data are then annotated with Sentiment Analysis, Emotion Detection and Named Entity Recognition inside a set of components called Information Processing Components. These form a chain of processes that each CDM record must pass through before being stored on a cluster of NoSQL databases. This processing chain (described in detail in D4.1 - Information processing components) extracts the following information:
• Language identification (check for Italian, English, Spanish, Dutch)
• Sentiment analysis (evaluating the sentiment of the whole text and of individual parts of it. The evaluation reports a value from -3 to +3, covering negative, neutral and positive sentiment)
• Emotion detection (extracting the emotions associated with the whole text or with individual parts of it. The detected emotions are Anger, Fear, Disgust, Sadness, Joy and Surprise)
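To make the shape of these annotations concrete, the following Python sketch attaches a sentiment score in the range -3 to +3 and a set of emotion counts to a text. It is only a toy: the word lists are invented, the language is passed in rather than detected, and the simple lexicon lookup is a stand-in for the actual algorithms, which are documented in D4.1 and are not reproduced here.

```python
from collections import Counter

EMOTIONS = ("anger", "fear", "disgust", "sadness", "joy", "surprise")
LANGUAGES = ("it", "en", "es", "nl")  # Italian, English, Spanish, Dutch

# Toy lexicons, invented for illustration only.
SENTIMENT_LEXICON = {
    "love": 2, "great": 2, "good": 1,
    "bad": -1, "late": -1, "awful": -3, "hate": -3,
}
EMOTION_LEXICON = {
    "love": "joy", "great": "joy", "hate": "anger", "awful": "disgust",
    "scared": "fear", "wow": "surprise", "sad": "sadness",
}


def clamp(value: int, low: int = -3, high: int = 3) -> int:
    """Keep the score inside the -3..+3 range reported by the platform."""
    return max(low, min(high, value))


def annotate(text: str, language: str) -> dict:
    """Return language, clamped sentiment score and emotion counts for a text."""
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language}")
    tokens = [token.strip(".,!?;:").lower() for token in text.split()]
    score = sum(SENTIMENT_LEXICON.get(token, 0) for token in tokens)
    emotions = Counter(EMOTION_LEXICON[t] for t in tokens if t in EMOTION_LEXICON)
    return {
        "language": language,
        "sentiment": clamp(score),
        "emotions": dict(emotions),
    }
```

For example, `annotate("I hate this awful bus, always late!", "en")` sums to -7 with these toy weights and is therefore reported as -3, with one count each for anger and disgust.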
Each CDM is then augmented with this information together with a set of pre-processing information useful for the Aggregator engine:
• Pre-computed location at pixel resolution using the Mercator projection on a system with 19 zoom levels and tiles of 256x256 pixels (used to aggregate points by pixel to increase visualization performance at lower zoom levels)
• Pre-computed geographical area (neighbourhoods, blocks, etc) using open geographical data available for each city
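The pixel pre-computation can be sketched as follows, assuming the spherical Web Mercator scheme commonly used by tiled web maps and assuming the 19 zoom levels are numbered 0 to 18; neither detail is stated in the report. The function names are illustrative.

```python
import math

TILE_SIZE = 256   # tile edge in pixels, as stated in the report
MAX_ZOOM = 18     # assumption: 19 zoom levels numbered 0..18
MAX_LAT = 85.05112878  # latitude limit of the square Web Mercator world


def latlon_to_pixel(lat: float, lon: float, zoom: int = MAX_ZOOM) -> tuple:
    """Project WGS84 degrees to global pixel coordinates at a zoom level."""
    lat = max(-MAX_LAT, min(MAX_LAT, lat))
    world = TILE_SIZE * (2 ** zoom)  # width and height of the world in pixels
    x = (lon + 180.0) / 360.0 * world
    sin_lat = math.sin(math.radians(lat))
    y = (0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)) * world
    # Floor to whole pixels and keep the result inside the world square.
    px = max(0, min(int(x), world - 1))
    py = max(0, min(int(y), world - 1))
    return px, py


def pixel_at_zoom(px: int, py: int, from_zoom: int, to_zoom: int) -> tuple:
    """Re-use a pixel computed at a deep zoom for a shallower zoom level."""
    if to_zoom > from_zoom:
        raise ValueError("can only move to a lower (coarser) zoom level")
    shift = from_zoom - to_zoom
    return px >> shift, py >> shift
```

Storing the pixel once at the deepest zoom means that every coarser level can be derived with a bit shift, so points falling on the same pixel at a low zoom can be merged before drawing.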
Then this enriched CDM is stored in a NoSQL database to be used by the Aggregator and by the Visualization Engine. The Aggregation Service provides persistent storage for the CDM data and a query interface for the visualization engine and user interface. It is connected to the data gathering components and the Annotation service through the message queue using the memcached protocol, and it provides a REST-like interface for queries. Internally, it consists of the following sub-components:
• Memcached connector fetches CDM data from the message queue
• Indexer stores data into the persistent storage
• Query interface translates the Aggregator query language into the query format of the underlying persistent storage and provides an in-memory cache for result sets
The Aggregation Service is designed in a generic way and is agnostic to the repository used to store CDM data. The current implementation is based on a MongoDB repository; how it works is described in detail in D5.2 - Platform integration prototype.
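As a rough illustration of the query interface idea, the sketch below translates a small filter description into a MongoDB-style query document and caches result sets in memory. The Aggregator query language itself is specified in D5.2 and is not reproduced here: the filter keys (`sources`, `languages`, `min_sentiment`, `max_sentiment`, `since`, `until`) and the document field names are hypothetical. Only the `$in`, `$gte`, `$lte` and `$lt` operators are standard MongoDB query operators.

```python
import json
from typing import Callable


def to_mongo_filter(query: dict) -> dict:
    """Translate a hypothetical filter description into a MongoDB query document."""
    mongo = {}
    if "sources" in query:
        mongo["source"] = {"$in": list(query["sources"])}
    if "languages" in query:
        mongo["annotations.language"] = {"$in": list(query["languages"])}
    sentiment = {}
    if "min_sentiment" in query:
        sentiment["$gte"] = query["min_sentiment"]
    if "max_sentiment" in query:
        sentiment["$lte"] = query["max_sentiment"]
    if sentiment:
        mongo["annotations.sentiment"] = sentiment
    created = {}
    if "since" in query:
        created["$gte"] = query["since"]
    if "until" in query:
        created["$lt"] = query["until"]
    if created:
        mongo["created_at"] = created
    return mongo


class CachedQueryInterface:
    """Runs translated queries through an injected backend and caches results."""

    def __init__(self, run_query: Callable[[dict], list]):
        self._run_query = run_query
        self._cache = {}

    def find(self, query: dict) -> list:
        # A stable JSON rendering of the query serves as the cache key.
        key = json.dumps(query, sort_keys=True, default=str)
        if key not in self._cache:
            self._cache[key] = self._run_query(to_mongo_filter(query))
        return self._cache[key]
```

Because the backend is injected as a plain callable, the same interface stays agnostic to the repository; with MongoDB the callable could simply wrap a collection's `find` call.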
The Visualization Engine is the entry point for each user: it allows exploring data in different ways through a web interface. Each user can create one or more projects and within each project it is possible to create one or more layers over each city. Each layer consists of a set of selected sources, filters, and a visualization style. The user can easily configure all these parameters to customize each layer for their needs.
Filters and sources can be configured to:
• Specify the current data source
• Filter UGC by time in an aggregated or non-aggregated way
• Filter by languages
• Filter by sentiments
• Filter by emotions
Each layer can be configured with one of the following visualization styles:
• Point cloud (simple visualization of each UGC item at its own geographical coordinates)
• Height map (aggregated visualization that counts the amount of data in each cell of a grid configured by the user. The height of each cell depends on the total count of UGC items inside that grid cell)
• Height box map (similar to the height map, but displayed as a grid of rectangular boxes whose height equals the total amount of UGC inside each cell)
• Height sphere map (similar to the height map, but displaying spheres with an elevation and a radius equal to the total amount of UGC inside each cell)
• Direction map (the area is divided into a grid. For each cell, an arrow describes the density, speed and direction of the people moving from that cell)
• Line path (shows the path of movement of each single user)
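The aggregation behind the height map styles amounts to counting UGC items per grid cell. A minimal Python sketch, assuming the points are already expressed in pixel coordinates and that the grid is square, could look like this; the normalisation step is an illustrative choice and not taken from the report.

```python
from collections import Counter
from typing import Iterable, Tuple


def height_map(points: Iterable[Tuple[float, float]], cell_size: float) -> Counter:
    """Count points per grid cell; keys are (column, row) cell indices."""
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    return Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in points
    )


def normalised_heights(counts: Counter) -> dict:
    """Scale counts to the range 0..1 so the tallest cell has height 1."""
    peak = max(counts.values(), default=0)
    if peak == 0:
        return {}
    return {cell: n / peak for cell, n in counts.items()}
```

With `cell_size=10`, the points `(0, 0)`, `(5, 5)`, `(12, 3)` and `(25, 25)` fall into cells `(0, 0)` twice, `(1, 0)` once and `(2, 2)` once; changing the cell size corresponds to moving the grid dimension slider.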
Each visualization style can be customized as follows:
• Point cloud: a colour and alpha value can be set for the points
• Height map, height box map, height sphere map: the user can select a linear gradient with 3 colour stops and their alpha values. It is also possible to specify a discrete gradient, which is very useful for displaying the height map as isolines. The user can configure the grid dimension with a slider
• Direction map: the user can change the colour and general alpha of the arrows, as well as the grid dimension
• Line path: the user can change the 3-colour linear gradient scale used to depict each user path
Each visualization style can be configured to act as a one-shot visualization (all the data of a given time interval) or as a time-aggregated visualization (the visualization smoothly morphs from one state to another based on timestamps). Each layer configured as a time-aggregated visualization provides a slider and a playback control, which the user can use to move through time.
The following visualization styles have some additional configuration when they are used as time-aggregated visualizations:
• Point cloud: a slider adjusts the decay of each point; the decay is the amount of time a point remains visible during an aggregated visualization
• Line path: a slider adjusts the decay of each line. In this case, the decay is the amount of time that each visible line segment represents. Longer lines mean that less time passed between two or more places where a single user posted UGC. Shorter lines mean that more time passed between two or more UGC items
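The point cloud decay can be read as a sliding time window over the playback position. The Python sketch below selects the points that are visible at a given playback time; the linear fade in `fade_alpha` is an added assumption about how a point might disappear smoothly and is not described in the report.

```python
from typing import Any, List, Tuple


def visible_points(
    points: List[Tuple[float, Any]], now: float, decay: float
) -> List[Tuple[float, Any]]:
    """Keep (timestamp, payload) pairs posted within `decay` before `now`."""
    return [p for p in points if now - decay < p[0] <= now]


def fade_alpha(timestamp: float, now: float, decay: float) -> float:
    """Assumed linear fade: 1.0 when just posted, 0.0 once the decay elapsed."""
    age = now - timestamp
    if age < 0 or age >= decay:
        return 0.0
    return 1.0 - age / decay
```

At playback time 100 with a decay of 60, a point posted at time 50 is still drawn while one posted at time 0 is not; a longer decay keeps more history on screen at once.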
The user can change the order of the layers via drag and drop.
The visualizations are drawn on two different canvases:
• The visualization can be drawn as a Google Map 2D overlay. The Google Map tiles can be customized using a set of predefined styles or by adding a JSON style configuration
• The visualization can be drawn as a 3D visualization over a plane
The Visualization Engine uses the Three.js library to visualize the data. This library makes it possible to exploit OpenGL in the browser through its web standard, WebGL. This enables the data to be handled on the Graphics Processing Unit (GPU) instead of being processed more slowly by the CPU, supporting a wide variety of graphics processing possibilities not available on a standard 2D canvas.
All the visualizations are written as OpenGL shaders in the GLSL language. This takes advantage of multi-core GPUs to speed up calculations and to handle smooth transitions between visualization customizations such as styling or aggregated data playback.
One strength of visualizing data through the GPU is the possibility of handling a higher volume of points, lines and geometries on the same screen. On an HTML5 Canvas or with SVG, the maximum number of elements (points, lines, etc.) does not exceed a few thousand. With WebGL it is possible to draw more than a million points at the same time, or to move more than ten thousand multi-point lines through time.
The web application was built using one of the latest full-stack JavaScript frameworks available, Meteor.js (www.meteor.com). Meteor is a complete open source platform for building web and mobile apps in pure JavaScript, without the need to build server-side and client-side code separately. It provides a mechanism to update the screen live according to database changes. This means that two users can collaborate on the same interface and see each other’s changes. Its version 1.0 was released in October 2014. The Urban Sensing team adopted it because it made it possible to prototype the application rapidly without having to write a lot of server-side code for saving projects, layers and style configurations.
The development environment created during the project consists of a two-server infrastructure, vertically scaled in memory and CPU. The connectors use one server to gather and store data. The annotator and the aggregator engine, together with the visualization engine, use the second server. From this development environment, the Urban Sensing Consortium moved to a new, horizontally scaled infrastructure based on the following configuration:
• 1 server for Connectors/Mediators:
This is a server with 2GB of RAM and a 200GB hard disk. It is hosted in a different server farm from the other servers but in the same city
• 1 server for Annotator/Aggregator/Mongos:
This is a server with 6GB of RAM and 30GB of SSD
• 4 servers for MongoDB Shard hosted at node1, node2, node3, node4:
These are 4 servers with 12GB of RAM and 60GB of SSD. They are on the same server farm as the Annotator/Aggregator.
This configuration provides enough performance and memory capacity to process data and compute aggregations in near real time over a huge volume of data. The NoSQL database where the annotated CDM data are stored is now configured as a sharded cluster, which can easily be extended to scale horizontally, increasing memory (RAM and disk) and performance capacity.
Images and examples of the Urban Sensing visualisations are described in detail in Section 4 of D5.2 - Platform integration prototype.

Potential Impact:
Nowadays simply addressing the need for services is not enough; the user experience is becoming a central topic in planning a complex system, and experience is a multidimensional constellation of components.
Users’ perception is one of these components, and it is closely tied to the images users hold of a geographic area. The final mental image users have of a specific geographic area is measured by the ways in which the place is perceived and used by the user and his or her social group across intervals of time or culture.
Planners and designers have to deal with this perception in the same way as they do with more objective information, such as analytical data and statistics. Even if the numbers show that a service is working properly, if users have a different perception, this information immediately becomes crucial, even against the supposed objectivity of the analytical input.
Traditionally, SMEs offering services tied to specific geographic areas (urban planners, event organizers, etc.) start their work with a deep observation and inquiry of the specific spatial and temporal context, defining social and environmental characteristics and surveying the population to gather and map an understanding of the experiences in such contexts. SMEs working with iterative design processes may need to observe and analyse these contexts both before and after the design intervention, in order to gather feedback from real users. These research activities are usually time- and resource-consuming and require specific competences in research methods; moreover, the analyses are performed at a specific time and so may not reflect temporal dynamics and evolutions (e.g. the way the space is perceived after the design intervention or after time intervals).
The possibility offered by Urban Sensing goes beyond supporting sampling and the observation of contexts. The platform substantially cuts current costs in terms of time, providing a near real-time view of a particular aspect (noise, policy acceptance, event acceptance and citizen responses) and a direct overview of the possible outcome of applying a policy or organising specific events.
ACC, SISU and LUST, the SMEs beneficiaries of the Urban Sensing project, intend to use the platform “as is” after the end of the project, for clients and potential clients that want to get a grip on the city and how they can use the data that is continuously generated by people:
• As a service to get insights and discover trends in urban patterns, and act upon them
• As an add-on to existing initiatives on the urban environment (biennials, research institutes, online urban platforms, start-ups, etc.), where the product adds value by visualizing information on the one hand and by helping people to identify problems and opportunities on the other
The Urban Sensing SMEs identified the following ways of exploitation:
• The platform will be used to support SMEs’ services, and therefore widen the scope of their services (better projects, better clients, stronger proposition for future projects)
• Specific parts of the platform can be reused in different projects, think of the semantic orientation, the data collectors, the sentiment analysis, etc.
• The beneficiaries intend to sell the IPR in the project
• As a platform/product
• As a tool to deliver specific consultancy to clients in the form of a report
• By license sales and software-as-a-service (SaaS) use for specific clients
• By using visualizations and data for providing innovative editorial products (on-line, interactive)
• By licensing specific elements of the platform (certain “layers”). Users have access only to the “layers” that are licensed
And the following main markets of interest:
• Public institutions
o City Councils / policy makers
o Festivals
o Conventions
o Public transport authorities
o Biennials/exhibitions/events
o Fairs
o Tourism and territorial promotion
o Neighbourhood initiatives
o Smart city initiatives
o Future Cities initiatives
o Special initiatives (e.g. Expo 2015 Milan)
• Private investors
o Real estate investors
o Urban developers
o Infrastructure development
o Trend analysis
o Telecom
o Media outlets
o Targeted advertising
o Market research
o Public relations
o Live entertainment
o Geo-marketing
o Mobility
• Private companies and agencies looking for geo-marketing insights and brand perception in relation to territories:
o Urban planners
o Architecture firms
o City planners
o Urban Future initiatives
The Urban Sensing Consortium promoted the results of the project at several conferences across Europe and in the United States of America, and in several journals, not only scientific ones but also those aimed at the general public and at industry.

List of Websites:
www.urban-sensing.eu