Elites, networks, and power in modern urban China (1830-1949).

Periodic Reporting for period 2 - ENPMUC (Elites, networks, and power in modern urban China (1830-1949).)

Reporting period: 2020-03-01 to 2021-08-31

The project examines the transformation of elites in China over a century (1830-1949), an era of tremendous change. It intends to break through the existing limits of access to historical information embedded in complex sources of differing natures that now form massive digital corpora. Understanding this historical process will provide keys for a significant revision of what has made modern China up to the present day. Our vision of modern China has been, and still is, tainted by the revolutionary experience and the interpretations that resulted from it. The approach we pursue to produce scalable, data-rich history will serve to collect and deliver historical information at an unprecedented scale, to reshape the analysis of existing sources, and to develop the tools and techniques for the exploration and exploitation of massive historical corpora.
The key objectives of the project include:
Analyzing urban elites in modern China at the level of actors rather than state institutions or community organizations.
Analyzing the vectors, patterns and timelines of the involvement of elites in public action.
Investigating the process of transnationalization of urban elites.
Demonstrating the capacity to radically change the scale and quality of historical information.
Establishing long-term digital historical resources in the form of innovative databases for extended research by the broader scholarly community.
The implementation of the project in the last 30 months followed several interconnected tracks:
Acquisition of the first major corpora in English and Chinese (newspapers, but also directories, dictionaries, etc.) in digital format, and creation of the documentary infrastructure to support access to, processing of, and preservation of these corpora (Solr database).
Recruitment and training of postdoctoral researchers (history, NLP) and specialists (data science, GIS) to constitute the core team of the project, around which a large circle of scholars in history, computing, and linguistics are collaborating.
Evaluation, selection and creation of the instruments and methods to be applied to the corpora, and integration of these instruments and methods into a coherent infrastructure with well-defined, fully transparent and reproducible workflows.
Training sessions in various advanced techniques (data visualization, R language, MCA, NLP) for the scholars involved in the project.
Creation of the beta version of our two major databases: Modern China Biographical Database and Modern China Geospatial Database, with a preliminary public interface (on-going and not made public yet).
Opening of a research blog which we deemed sufficient in the initial phase of the project to publicize our actions and accomplishments (1,800 unique visitors and 2,556 visits in February 2020).
Creation of a full-scale web portal that presents all the on-going projects and serves as an access hub to all our instruments and resources.
Creation of a collection on Peers – an open review platform developed by scholars from Aix-Marseille and Chicago University – for publishing multimedia scholarship. At this stage we have published four pamphlets – Digital History Lab – and initiated a series of Podcasts.
Organization of two international workshops, one on biographical databases, one on elites and networks in China. The former served to bring together top-rate experts to discuss and assess existing biographical databases. The latter brought together historians of China on the specific theme of elites, knowledge, and power. An edited volume is currently being prepared by the P.I.
Participation in various conferences (Biographical Data in a Digital World 2019, DADH 2019, AAS 2020 [panel accepted, but conference cancelled due to coronavirus], EHSSC 2021 [rescheduled from 2020]) and seminars (Bristol University, Aix-Marseille University, Naples University [postponed to autumn 2021]).
Significant advances were made in several directions:
Collection of biographical data on 78,000 individuals, both Chinese and non-Chinese, under their various denominations (1 to 9 names), ready for inclusion in the biographical database. This is a key element for identifying historical figures in historical sources across languages.
Collection of all the biographical pages on Chinese individuals in the Chinese and English Wikipedia, to be processed for data extraction. This is a major methodological breakthrough at the level of both collection and data extraction.
Indexing of all the English and Chinese corpora (newspapers) to enable the identification and extraction of all named entities (persons, institutions, locations, etc.) at the level of articles, and initial construction of a graph tool for the exploration of data.
For Chinese newspapers, the ENPMUC research team is the only one worldwide to have gained access to these resources together with the capability to process them with advanced data-mining techniques. We have developed a set of tools, the enpchina package, specifically tailored to harness these massive corpora of multilingual (English, Chinese) digitized sources (newspapers, who's who, biographical dictionaries, Wikipedia). The package provides an integrated environment that (1) ensures the transparency, traceability and replicability of workflows, (2) provides export facilities for easily sharing the data and results, and (3) enables historians with limited programming skills to control the entire process, from data extraction to exploration, analysis and publication of the results.
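To illustrate why recording each individual under all of his or her names matters for identification across languages, here is a minimal sketch in Python. It is not the enpchina package and does not reflect the project's actual data model: the function names, the record identifiers (P001, P002) and the two sample records are assumptions made purely for illustration, and the substring matching is deliberately naive.

```python
import unicodedata
from collections import defaultdict
from typing import Dict, List, Set


def normalize(name: str) -> str:
    """Fold Unicode form, case and whitespace so spelling variants compare equal."""
    return " ".join(unicodedata.normalize("NFKC", name).casefold().split())


def build_alias_index(people: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map every normalized name variant to the identifiers of the people who bear it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for person_id, names in people.items():
        for name in names:
            index[normalize(name)].add(person_id)
    return dict(index)


def find_people(text: str, index: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return the known name variants that occur in the text, with candidate identifiers.

    Naive substring search: a real pipeline would need tokenization and
    disambiguation, since short names can match inside longer ones.
    """
    haystack = normalize(text)
    return {alias: ids for alias, ids in index.items() if alias in haystack}


# Hypothetical records for illustration only (identifiers are invented).
PEOPLE = {
    "P001": ["Sun Yat-sen", "孫中山", "孫逸仙", "Sun Wen"],
    "P002": ["Soong Tse-ven", "T. V. Soong", "宋子文"],
}

INDEX = build_alias_index(PEOPLE)
print(find_people("Dr. SUN  Yat-sen met T. V. Soong in Shanghai.", INDEX))
print(find_people("孫中山先生抵滬", INDEX))
```

Because the same identifier is reachable from an English romanization or a Chinese name, mentions found in English and Chinese newspapers can be attached to one record. An alias shared by several people would return several candidate identifiers, which is where disambiguation would have to take over.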
As the project develops, we have been training PhD and postdoctoral researchers to face the long-term challenges of data-driven approaches to historical research. They represent the rising generation of scholars who, in turn, will be fully equipped to train the current and next generations of historians. As part of this crucial outcome, we are creating tutorials and other teaching materials in the form of R Markdown documents, replicable scripts and videos. Moreover, we have started to develop a complete training course in R programming for a broader community of social scientists, which will be freely available as a "MOOC" by the end of the project.