Quantitative Analysis of Textual Data for Social Sciences

Final Report Summary - QUANTESS (Quantitative Analysis of Textual Data for Social Sciences)

QUANTESS set out to advance methods and tools for the quantitative analysis of social science text. It has achieved this through applications that treat “text as data” in political science and other social sciences, through numerous published articles, and through three major software packages for the R language. Research outputs include new methodologies for scaling latent quantities from text, including bag-of-words methods and the automated application of dictionaries; methods for the automatic coding of texts that combine crowd-sourced sentence annotation with statistical scaling; and unsupervised methods for uncovering latent quantities from text, using either word counts or human-annotated codes.

All of these developments have been accompanied by a major software library for the R language, quanteda, which is currently downloaded by nearly 5,000 users per month. This tool enables powerful, flexible and fast natural language processing and quantitative analysis of text, using fully open-source, documented, and tested methods. Accompanying this package are spacyr (for tagging parts of speech, extracting entities, and parsing dependencies) and readtext (for making it easy to read any text into R, including converting files from a variety of formats).

Finally, the project has enabled several community-building initiatives: the founding of a Text as Data Society (which has held five annual conferences, including one hosted using project funds), a Text Analysis Developers’ Workshop, and educational dissemination activities to train students in the methods and tools developed by the project.