CORDIS - EU research results

Revealing Unseen Text with THz Waves

Periodic Reporting for period 1 - RUTE (Revealing Unseen Text with THz Waves)

Reporting period: 2022-01-16 to 2024-01-15

Ancient codices are often fragile, brittle, or damaged, making them difficult to handle without risking permanent damage. In some cases, multiple pages are stacked, rolled, or folded, and the manuscript is too delicate to open. The RUTE project aims to read these "closed books" without opening them, using Terahertz (THz) Time-Domain Spectroscopy and Imaging (THz-TDSI). This non-invasive technique leverages THz radiation's ability to penetrate soft materials like paper and detect unique spectral responses of ancient inks. The main objectives of RUTE are:
• O1: Create a unique dataset of THz images of manuscripts using ancient recipes for inks, supports, and bindings.
• O2: Develop a pipeline for separating, enhancing, and restoring single-page text from closed books.
• O3: Design methods for deblurring and denoising THz images using learned class-specific priors.
• O4: Develop an open-source and user-friendly interface for THz image reconstruction.
RUTE's methodology, involving beam shape effects removal and the use of text-tailored prior knowledge, aims to significantly enhance the readability of revealed text. The project outcomes will have a profound impact on the analysis, preservation, and digitization of valuable European and global cultural heritage assets. By unlocking hidden texts in ancient codices stored in museums, archives, and private collections, RUTE will uncover secrets concealed for centuries.
The RUTE project established a benchmark for using THz time-domain imaging with advanced computer vision and machine learning to uncover and enhance the readability of hidden text in closed books. Mockups simulating ancient manuscripts were created in collaboration with the chemistry lab of the Centre for Cultural Heritage Technology (CCHT) of the Italian Institute of Technology (host institute). The fellow completed a 13.5-hour "High Risk Safety Course at Ca' Foscari" for lab access certification.
Initially, three commercial inks were purchased, and ten inks, including iron-gall and copper-based inks, were synthesized using ancient recipes. To enrich the dataset, additional mockups were created with cellulose pellets and ancient-like paper, featuring text in synthesized iron-gall ink. Images of the mockups were acquired using a hyperspectral camera (HERA, Nireos) to verify whether the created inks had unique spectral fingerprints in the near-infrared region. The fellow received hands-on training with the hyperspectral camera and visited the Correr Museum’s Archive in Venice to secure access to invaluable historical documents. As part of the training and dataset creation phase, the fellow conducted THz image acquisition with the Department of Molecular Sciences and Nanosystems at Ca' Foscari using a commercial THz-TDS system, the TOPTICA TeraFlash Pro. Protocols for system settings, data naming, storage, and safety procedures were established. Challenges, such as artifacts caused by paper curvature, were addressed by repeating scans with the paper held flat and by digitally correcting images with a method developed in collaboration with a colleague.
The fellow transferred knowledge on data curation and pre-processing to CCHT’s PhD students and junior postdocs. The primary outcome was a paper presented at the International Workshop on Fine Art Pattern Extraction and Recognition (FAPER) during the International Conference on Image Analysis and Processing (ICIAP, 2022). In parallel, a novel method was designed for restoring THz time-domain images in reflection geometry, focusing on beam-shape effects removal and denoising; it was published in IEEE Transactions on Terahertz Science and Technology. A self-referenced method for correcting geometrical distortion due to sample surface curvature was also developed and published in the same journal.
During THz image acquisition, the challenge of optimizing step size without reducing resolution was addressed by developing a blind super-resolution approach, presented at the International Workshop on Mobile Terahertz Systems (IWMTS, 2023). Knowledge transfer led to a publication at the International Geoscience and Remote Sensing Symposium (IGARSS, 2024). Because lengthy scanning times make it impractical to acquire the many input images needed to train CNNs, a process for creating synthetic THz data was developed during a secondment at Instituto de Telecomunicações in Lisbon. This method, which combines the data structure of THz images with CNNs, was presented at the International Conference on Computer Vision Theory and Applications (VISAPP 2024), where the fellow also served as an oral session chair. This work revealed text from up to eight pages of a closed book and is being prepared for submission to IEEE Transactions on Image Processing.
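The trade-off between step size and scanning time mentioned above can be made concrete with a little arithmetic. The sketch below is only an illustration, not part of the RUTE pipeline: it assumes a simple rectangular raster scan, and the function names, the scan area, and the dwell time per point are all hypothetical values chosen for the example.

```python
import math


def scan_points(width_mm, height_mm, step_mm):
    """Count raster positions for a rectangular scan (both edges included)."""
    if step_mm <= 0:
        raise ValueError("step_mm must be positive")
    # The small epsilon guards against floating-point round-off in the division.
    nx = math.floor(width_mm / step_mm + 1e-9) + 1
    ny = math.floor(height_mm / step_mm + 1e-9) + 1
    return nx * ny


def scan_time_hours(width_mm, height_mm, step_mm, seconds_per_point):
    """Estimate total acquisition time; seconds_per_point is a hypothetical dwell time."""
    return scan_points(width_mm, height_mm, step_mm) * seconds_per_point / 3600.0


if __name__ == "__main__":
    # Hypothetical 50 mm x 50 mm area, scanned at two step sizes.
    for step in (1.0, 0.5):
        n = scan_points(50, 50, step)
        hours = scan_time_hours(50, 50, step, seconds_per_point=1.0)
        print(f"step {step} mm: {n} points, about {hours:.2f} h")
```

Under these assumptions, halving the step size roughly quadruples the number of scan positions, which is why recovering resolution in software from a coarser scan is attractive.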
The previously developed methods were tested on text-containing images using OCR algorithms tailored for handwritten text. The results are promising and are being written up for submission to the International Journal on Document Analysis and Recognition.
A strategic dissemination plan maximized RUTE’s impact. Scientific dissemination included presentations at ICIAP in Lecce, IWMTS in Bonn, and VISAPP in Rome. A dedicated website provided content for academic and general audiences. An international workshop titled “Illuminating the Past: Uncovering Hidden Layers in Cultural Heritage with Hyperspectral and THz Imaging” showcased project results and facilitated networking. Throughout the project, the fellow and the host institute maintained an active social media presence, sharing key results and updates. To explore further development, the fellow attended webinars on intellectual property, technology transfer, and gender aspects in research.
The RUTE project has provided new insights into using non-invasive THz imaging techniques for scanning fragile ancient manuscripts, establishing the fellow as an expert in THz time-domain image restoration with several state-of-the-art results:
1. Low-Rank Property Utilization: RUTE developed the first method leveraging the low-rank property of THz time-domain data acquired in reflection mode to restore images of complex cultural heritage items. This approach restores THz images in the 0.25–6 THz range, spanning over four octaves.
2. Combining Data Structure with CNNs: RUTE introduced the first method that combines the underlying data structure with CNNs for THz image restoration, presented at an international conference. This method, along with the open-source code and a data synthesis pipeline, will be submitted to a high-impact journal.
3. Blind Deblurring of Hyperspectral Images: RUTE extended text-tailored priors to hyperspectral document images, proposing the first method for blind deblurring of hyperspectral images containing text. This method significantly improved text readability, verified by an automatic handwritten recognition system.
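Two ideas from the list above lend themselves to a small numerical illustration: the octave count of the 0.25–6 THz band, and why a low-rank structure helps with denoising. The sketch below is not the published RUTE method. It assumes a toy model in which every pixel reflects a scaled copy of one pulse, so the pixels-by-time-samples matrix has rank one, and it swaps in a plain power-iteration rank-1 approximation as the restoration step. All function names and parameter values are hypothetical.

```python
import math
import random


def octaves(f_low, f_high):
    """Number of frequency doublings between f_low and f_high."""
    return math.log2(f_high / f_low)


def rank1_approx(matrix, iters=50, seed=0):
    """Rank-1 approximation of a matrix (list of rows) via power iteration."""
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [rng.uniform(0.5, 1.0) for _ in range(n_cols)]  # initial right vector
    for _ in range(iters):
        # u = A v, then v = A^T u, normalised to unit length
        u = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
        v = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            raise ValueError("cannot normalise a zero vector")
        v = [x / norm for x in v]
    # u now carries the singular value times the left singular vector
    u = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
    return [[u[i] * v[j] for j in range(n_cols)] for i in range(n_rows)]


def frobenius_error(a, b):
    """Frobenius norm of the difference between two equally sized matrices."""
    return math.sqrt(sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)))


def make_toy_data(n_pixels=20, n_samples=16, noise_std=0.05, seed=1):
    """Toy data: each pixel is a scaled copy of one Gaussian pulse, plus noise."""
    rng = random.Random(seed)
    pulse = [math.exp(-(((t - n_samples / 2) / 2.0) ** 2)) for t in range(n_samples)]
    amps = [rng.uniform(0.5, 1.5) for _ in range(n_pixels)]
    clean = [[a * p for p in pulse] for a in amps]
    noisy = [[x + rng.gauss(0.0, noise_std) for x in row] for row in clean]
    return clean, noisy


if __name__ == "__main__":
    print(f"0.25-6 THz spans {octaves(0.25, 6.0):.2f} octaves")
    clean, noisy = make_toy_data()
    restored = rank1_approx(noisy)
    print("error before:", frobenius_error(noisy, clean))
    print("error after: ", frobenius_error(restored, clean))
```

The octave count is simply the base-2 logarithm of the frequency ratio, here log2(6 / 0.25) = log2(24), which is a little more than four. In the toy model, projecting onto a single dominant component discards most of the noise because the noise is spread across all components while the signal is not. Real THz data would need a higher rank and a more careful model.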
The project's outcomes will significantly impact the digitization and preservation of ancient manuscripts by leveraging advanced computer vision and machine learning techniques. RUTE successfully read through closed manuscripts, revealing and enhancing text from up to eight pages, and developed approaches applicable beyond hyperspectral and THz imaging.
Illustration of the RUTE project (Marina Ljubenovic)