Printed Documents Authentication

Periodic Reporting for period 1 - PRINTOUT (Printed Documents Authentication)

Reporting period: 2020-06-15 to 2022-06-14

Cheap and easy access to peripheral devices such as printers and scanners, together with their wide use, has played a major role in the amount of printed information generated today. Advertisements, banknotes, books, newspapers, magazines, contracts and product packaging all involve some printing technology. As staffless and cashless stores are increasingly adopted in big cities, supermarkets and shops will offer only printed data (such as QR codes) for purchases and interaction with customers, making printing and scanning technologies crucial in the near future.

Despite this growing availability of printed information, the lack of regulation and of forensic procedures for this medium has allowed counterfeiters and other criminals to misuse the technology. For example, printed documents that serve as evidence in criminal investigations, such as those related to corruption and money laundering, can be found in a suspect's house; counterfeit currency can be printed and distributed in a neighborhood, harming the local economy; domestic or international terrorist plans can be found in a facility; pedophiles can print and distribute child pornography to evade the monitoring of the Internet by security agencies; and impostors can forge badges to gain access to restricted areas, compromising the organization and security of events.

Finally, modern printing and scanning technologies have made counterfeiting easier and more profitable than ever, as counterfeiters can faithfully reproduce the packaging of genuine products. The International Chamber of Commerce has warned of losses of €3.7 trillion due to counterfeiting and piracy, with 5.4 million jobs at risk by 2022. Product counterfeiting also has a significant impact on health: according to the World Health Organization, up to half of malaria medications could be fake.

In the PrintOut project, we aim to tackle these problems through research on Computer Vision and Machine Learning solutions for printed document forensics. The project addresses the following research problems:

(i) the lack of cheap procedures based on precise statistical models to perform robust digital image forensics on printed documents;
(ii) the lack of comprehensive training data for machine learning models;
(iii) open-set (unknown-class) classification;
(iv) security and adversarial attacks.

The work performed up to the end of the project includes:

1- Construction of VIPPrint, a dataset of printed images acquired in very challenging scenarios. At the time of writing, the dataset covers 12 modern color laser printers and contains both pristine images and fake images generated with deepfake techniques. The dataset is available on Zenodo.

2- A journal paper presenting the dataset and the challenges it poses for machine learning solutions was published in the open-access MDPI Journal of Imaging. In the same paper, we carried out a comparative study of recent machine learning approaches for source linking of printed patterns and for digital image manipulation detection in the printed domain.

3- We published our work on open-set printer source attribution using an ensemble of Siamese neural networks in the journal IEEE Access.

4- We also completed our research on ad-hoc image descriptors for the anti-counterfeiting of multilevel 2D barcodes. This work was published in the special session on the forensics of printed documents at the IEEE Workshop on Information Forensics and Security.

5- We worked on adversarial machine learning for printed document authentication. We carried out an in-depth comparative study of common adversarial attacks, proposed the first attack based on Expectation Over Transformation (EOT) against printer source attribution classifiers, and found a simple but effective adversarial training approach that makes classifiers robust against these attacks. This work is under review at the MMFORWILD 2022 workshop.

6- We started a collaboration with VidiTrust, an Italian company (Srl). We focused on outperforming their anti-counterfeiting system using Siamese networks, especially on new authentic barcodes that were not used to train the classifier. For this work, the researcher has been supervising the student Nischay Purnekar on his master's thesis. The work already shows promising results in both open-set and closed-set scenarios, and the student will continue it in his Ph.D. thesis.

7- We created a project website (https://www3.diism.unisi.it/~ferreira/printout/), where we present and motivate the research and post updates on its results.

8- The researcher has presented his solutions to both academia and the private sector, keeping in touch with entrepreneurs. A demo of the Siamese network solution for open-set attribution was shown to a group of master's students and a group of entrepreneurs who visited the University of Siena in May 2022. Additionally, the researcher participated in the Bright Night event and taught two seminars.
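Item 5 refers to an attack based on Expectation Over Transformation (EOT). The general EOT idea is to optimize a perturbation against the gradient averaged over a distribution of transformations, so that the adversarial example still works after those transformations are applied; for printed documents, print-and-scan distortions are the natural choice. The sketch below is a toy illustration under loudly stated assumptions, not the attack developed in the project: the classifier is a hypothetical linear scorer on a 1-D signal, and the "print-and-scan" channel is modeled only as a random circular shift plus an intensity gain. The names `transform`, `score`, `input_gradient` and `eot_attack`, and all parameter values, are hypothetical.

```python
import random


def transform(x, shift, gain):
    # Toy stand-in for the print-and-scan channel (an assumption, not the
    # project's model): circular misalignment by `shift` samples, then a gain.
    n = len(x)
    return [gain * x[(i - shift) % n] for i in range(n)]


def score(w, b, x):
    # Hypothetical linear source classifier: score > 0 means "printer A".
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def input_gradient(w, shift, gain):
    # d/dx_j of score(w, b, transform(x, shift, gain)) = gain * w[(j + shift) % n]
    n = len(w)
    return [gain * w[(j + shift) % n] for j in range(n)]


def sign(v):
    return (v > 0) - (v < 0)


def eot_attack(x, w, eps=0.05, steps=10, samples=64, max_shift=2, seed=0):
    """Lower the expected score over random transforms, within an L-inf ball of radius eps."""
    rng = random.Random(seed)
    alpha = eps / 4  # step size per iteration
    x_adv = list(x)
    for _ in range(steps):
        # EOT: Monte Carlo average of the input gradient over sampled transforms.
        # (For this linear toy the gradient does not depend on x_adv; a deep
        # classifier would be differentiated at x_adv on every iteration.)
        avg = [0.0] * len(x)
        for _ in range(samples):
            g = input_gradient(w, rng.randint(-max_shift, max_shift), rng.uniform(0.9, 1.1))
            avg = [a + gi / samples for a, gi in zip(avg, g)]
        # Signed step against the averaged gradient (pushes toward "not printer A").
        x_adv = [xa - alpha * sign(a) for xa, a in zip(x_adv, avg)]
        # Project back onto the eps-ball around x and the valid pixel range [0, 1].
        x_adv = [min(max(xa, xo - eps), xo + eps) for xa, xo in zip(x_adv, x)]
        x_adv = [min(max(xa, 0.0), 1.0) for xa in x_adv]
    return x_adv


# Usage on a hypothetical 1-D "scanned patch" with hypothetical weights.
w = [1.0, -1.0, 0.5, 2.0, -0.5, 1.0, 0.0, 1.5]
x = [0.2, 0.8, 0.5, 0.9, 0.1, 0.6, 0.4, 0.7]
x_adv = eot_attack(x, w)
```

Averaging over transformations is what distinguishes EOT from a plain projected-gradient attack: a perturbation tuned to a single, perfectly aligned input can be destroyed by a small misalignment, whereas one tuned to the average is meant to survive the whole distribution. With a real deep classifier, the print-and-scan model would also have to be differentiable or approximated.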

So far, the project has progressed beyond the state of the art as follows:

1- We proposed, for the first time, the use of Siamese networks for printer source attribution, and we were also the first to address open-set printer source attribution.

2- We proposed a Siamese-network solution to authenticate the barcodes of VidiTrust, an Italian company (Srl). The solution shows promising performance and outperforms VidiTrust's existing authentication system for three reasons: (i) it supports one-shot learning, so little data is needed for training; (ii) it works in an open-set setting, so it can handle barcode patterns of any color, including ones not seen during training; and (iii) because our Siamese networks compare a suspect barcode against a genuine one, we can perform test-time augmentation by including more genuine barcodes in the comparison, which improves classification accuracy.

3- We addressed adversarial machine learning for printer source attribution for the first time. We showed that reprinting and adversarial attacks can easily fool classifiers, but that simply fine-tuning these classifiers on reprinted data makes them more robust against several attacks, not only against the attack used for fine-tuning.
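Items 1 and 2 above rely on Siamese networks, which compare a query against reference samples instead of predicting one of a fixed set of classes. The sketch below shows only the decision step, assuming that a trained Siamese branch (omitted here) has already mapped each image to an embedding vector and that similarity is measured by Euclidean distance; the report does not specify the distance, the threshold, or the embedding size, so these are assumptions. Averaging the distance over several references of the same class mirrors the test-time augmentation of item 2, and rejecting a query whose best match is farther than a threshold gives the open-set behavior. The function names, class names, embeddings and threshold are hypothetical.

```python
import math
from statistics import mean


def mean_distance(query, references):
    # Average Euclidean distance between a query embedding and k reference
    # embeddings of one class. Using k > 1 references is the test-time
    # augmentation idea: the noise of any single pairwise comparison averages out.
    return mean(math.dist(query, ref) for ref in references)


def open_set_attribute(query, gallery, threshold):
    # Return the closest known class, or "unknown" if even the closest class
    # is farther than `threshold` (open-set rejection).
    best_label, best_dist = "unknown", math.inf
    for label, references in gallery.items():
        d = mean_distance(query, references)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label if best_dist <= threshold else "unknown"


# Hypothetical 2-D embeddings produced by an (omitted) trained Siamese branch.
gallery = {
    "printer_A": [[0.9, 0.1], [1.0, 0.0], [0.95, 0.05]],
    "printer_B": [[0.0, 1.0], [0.1, 0.9]],
}
print(open_set_attribute([0.92, 0.08], gallery, threshold=0.3))  # printer_A
print(open_set_attribute([-1.0, -1.0], gallery, threshold=0.3))  # unknown
```

In this sketch, adding a new printer or a new genuine barcode design only means adding its reference embeddings to `gallery`; nothing is retrained, which is the property item 2 attributes to the one-shot, open-set setting. For barcode authentication, the gallery would hold a single class of genuine barcodes, and a suspect barcode returned as "unknown" would be flagged as a potential counterfeit.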

The expected results and impacts are:

1- We aim to improve VidiTrust's anti-counterfeiting solution so that their models no longer need to be retrained for every new barcode. Current results are promising, and we aim to reduce the human effort needed to retrain classifiers whenever they launch new anti-counterfeiting products (barcodes), i.e., the open-set scenario.

2- Our dataset has drawn the digital image forensics community further into investigating print-and-scan attacks against image manipulation detectors. At least one work already uses it to detect deepfake images in the rebroadcast (printed) domain as well.

3- With our work on adversarial machine learning, we expect to inspire other researchers to extend adversarial attacks to classifiers that have been trained with adversarial samples.
