Web application delivers text analysis tools for all
Text analysis tools are used to mine unstructured data in the form of text and turn it into structured, actionable information. They include tools for preparing and processing texts, as well as tools for analysing the data that results from this processing.
“These tools are used by researchers who have large quantities of digital information,” explains QUANTEDA project coordinator Kenneth Benoit, professor of computational social science at the London School of Economics and Political Science, UK. “This information might be in the form of historical documents, literature, government documents or social media posts. Analytic tools might be used to identify differences in language usage, or to examine and assess psychological perspectives expressed in the texts.”
Existing solutions fall broadly into two categories: software libraries for computer programmers, and end-user applications that require no programming ability. “While software libraries provide extensive technical capabilities, such power is only available to programmers with a high level of technical knowledge,” says Benoit. “User-friendly software on the other hand offers only limited functionality.” Benoit recognised the need to make text analysis tools available to a wider audience, in a form that is easy to use yet retains a high degree of flexibility and power.
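To give a rough sense of what turning unstructured text into structured information can look like, the sketch below counts how often each word appears in each document and arranges the counts as a table, with one row per document and one column per word. It is a minimal illustration under simplifying assumptions, not the method used by the QUANTEDA project: the function names `tokenize` and `document_feature_matrix` and the two sample sentences are invented for this example, and real tools handle punctuation, languages and scale far more carefully.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_feature_matrix(docs):
    """Return (vocabulary, rows); rows[i][j] counts vocabulary[j] in docs[i]."""
    # Hypothetical helper for illustration only, not any library's API.
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocabulary = sorted(set().union(*counts))
    rows = [[count[word] for word in vocabulary] for count in counts]
    return vocabulary, rows


# Invented sample texts standing in for real documents.
docs = [
    "The minister praised the new health policy.",
    "Critics said the policy ignores rural health.",
]

vocabulary, rows = document_feature_matrix(docs)

# Print the table: one line per document, one count per vocabulary word.
print(vocabulary)
for row in rows:
    print(row)
```

Once the texts are in this tabular form, differences in language usage between documents become a matter of comparing rows of numbers, which is the kind of analysis the article describes.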
Powerful text analysis
The QUANTEDA project set out to create open-source software to meet this need. A previous project funded by the European Research Council (https://erc.europa.eu/), called QUANTESS, was critical to the development of the back end of the application. Benoit sought to build on this and develop a marketable prototype that would require no programming experience.
The newly developed software runs entirely on a cloud server and is accessible through a web browser. This means that the ability to perform powerful analyses of large volumes of text is not limited by a user’s hardware capacity, making it far more accessible. The interface components were designed to be easily translated: the programme can be offered in Hindi, Chinese or Spanish, or in any other language for which the menus and help pages have been translated.
“We also wanted to ensure that this web application would be within the financial reach of students, early-career researchers and users from developing countries,” adds Benoit. “To achieve this, a flexible pricing policy has been developed and reflects ability to pay.” Benoit notes that many online applications, such as Dropbox, GitHub, RStudio and Slack, successfully use such a model.
Tools available for all
The application to come out of this latest project, called Quanteda Guru, is now available for trial or purchase. “In addition to academic researchers, we have identified other potential users who are data-rich but information-poor,” notes Benoit. “These include medical professionals, especially healthcare providers, and government departments such as ministries of justice or departments of health.”
Other potential end users include business intelligence and marketing firms, which often use social media, and insurance firms, which need to analyse and assess risk and accident reports. Law enforcement officials also need to analyse written reports, while customer assistance staff and call centres are often required to analyse and log incidents.
“We are currently working to add new features, improve the user experience, and scale up performance,” he adds. “The great thing is that this application is built on an open-source software library. This means that the source code of the analytic engine is open to scrutiny, scientifically validated, and subject to continual development and improvement.”