Periodic Reporting for period 1 - PRINTOUT (Printed Documents Authentication)
Reporting period: 2020-06-15 to 2022-06-14
In the PrintOut project, we address these problems through research on Computer Vision and Machine Learning solutions for printed-document forensics. The project targets the following research problems:
(i) lack of cheap procedures using precise statistical models to perform robust Digital Image Forensics on printed documents;
(ii) lack of comprehensive training data for machine learning models;
(iii) open-set (unknown-class) classification;
(iv) security/adversarial attacks.
1- Construction of the VIPPrint dataset of printed images captured under very challenging scenarios. At the time of writing, the dataset covers 12 modern color laser printers and contains both pristine images and fake images generated by deepfake techniques. The dataset is available on Zenodo.
2- A journal paper presenting the above-mentioned dataset and its challenges for machine learning solutions was published in the open-access MDPI Journal of Imaging. In that paper we also carried out a comparative study of recent machine learning approaches to source linking of printed patterns and to digital image manipulation detection in the printed domain.
3- We published our work on open-set printer source attribution using an ensemble of Siamese neural networks in IEEE Access.
4- We also finished the research on ad hoc image descriptors for multilevel 2D barcode anti-counterfeiting. This work was published at the IEEE Workshop on Information Forensics and Security, in the special session on the forensics of printed documents.
5- We worked on adversarial machine learning for printed-document authentication. We carried out an in-depth comparative study of common adversarial machine learning approaches, proposed the first attack based on Expectation Over Transformation against printer source attribution classifiers, and devised a simple but effective adversarial training approach that makes machine learning classifiers robust against these attacks. This work is under review at the MMFORWILD 2022 Workshop.
6- We started a collaboration with the Italian company VidiTrust Srl. We focused on improving on their anti-counterfeiting system using Siamese networks, especially for new authentic barcodes that were not used to train the classifier. For this work, the researcher has been advising and teaching a student, Nischay Purnekar, in his master's thesis. This work has already produced promising results in both open- and closed-set scenarios, and the student will continue it in his Ph.D. thesis.
7- A website for the project was created (https://www3.diism.unisi.it/~ferreira/printout/) where we present, motivate, and update the results of the research.
8- The researcher has been presenting his solutions to the academic and private sectors, keeping in touch with entrepreneurs. A demo of the Siamese network solution for open-set attribution was shown to a group of master's students and a group of entrepreneurs visiting the University of Siena in May 2022. Additionally, the researcher participated in the Bright Night event and taught two seminars.
1- We proposed, for the first time, the use of Siamese networks for the printer attribution problem, and we were also the first to address open-set printer source attribution.
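The open-set decision rule behind this result can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the names are hypothetical, plain cosine similarity stands in for the learned Siamese metric, and random vectors stand in for real document embeddings. A test document is matched against a reference prototype for each known printer; if even the best match falls below a threshold, the sample is rejected as coming from an unknown printer instead of being forced into a closed-set decision.

```python
import numpy as np

def cosine(a, b):
    # stand-in for the similarity a trained Siamese branch would output
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def attribute_printer(embedding, prototypes, threshold=0.8):
    """Open-set attribution: return the best-matching known printer,
    or "unknown" when no prototype is similar enough."""
    scores = {name: cosine(embedding, proto) for name, proto in prototypes.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else "unknown"

# toy prototypes for three known printers (hypothetical embeddings)
rng = np.random.default_rng(7)
prototypes = {f"printer_{i}": rng.normal(size=32) for i in range(3)}

known = prototypes["printer_1"] + 0.05 * rng.normal(size=32)  # near a known printer
unknown = rng.normal(size=32)                                  # unrelated source

print(attribute_printer(known, prototypes))
print(attribute_printer(unknown, prototypes))
```

The rejection threshold is what turns a closed-set matcher into an open-set one: it trades a small loss in closed-set accuracy for the ability to flag documents from printers never seen in training.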
2- We proposed a Siamese-network solution to authenticate barcodes from the Italian company VidiTrust Srl. This solution has promising performance and outperforms their existing authentication system for three reasons: (i) it has one-shot training capabilities, meaning it does not need much data to train; (ii) it has open-set capabilities, meaning it can deal with barcode patterns of any color; and (iii) because our Siamese networks compare a suspect barcode with a genuine one, we can perform test augmentation (including more genuine barcodes in the comparison), thus improving classification accuracy.
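The test-augmentation idea in (iii) can be sketched as follows. Again, a minimal illustration under stated assumptions: the function names are hypothetical, cosine similarity stands in for the trained Siamese metric, and synthetic vectors stand in for real barcode embeddings. The suspect barcode is compared against several genuine references and the similarities are averaged before thresholding, so adding more genuine barcodes sharpens the decision without any retraining.

```python
import numpy as np

def cosine_similarity(a, b):
    # stand-in for the score a trained Siamese network would produce
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_authentic(suspect, genuine_refs, threshold=0.8):
    """Test augmentation: compare the suspect embedding against every
    available genuine reference and average the similarities."""
    scores = [cosine_similarity(suspect, g) for g in genuine_refs]
    return bool(np.mean(scores) >= threshold)

# toy embeddings: genuine patterns cluster together, a counterfeit does not
rng = np.random.default_rng(0)
center = rng.normal(size=64)
genuine_refs = [center + 0.05 * rng.normal(size=64) for _ in range(5)]
authentic = center + 0.05 * rng.normal(size=64)
counterfeit = rng.normal(size=64)

print(is_authentic(authentic, genuine_refs))
print(is_authentic(counterfeit, genuine_refs))
```

Averaging over references also means a newly launched barcode design only requires collecting a few genuine samples, not retraining the network.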
3- We addressed the problem of adversarial machine learning for printer source attribution for the first time. We showed that reprinting and adversarial attacks can easily fool classifiers, but that simple fine-tuning of these classifiers on reprinted data makes them more robust against several attacks, not only the attack used for fine-tuning.
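The adversarial fine-tuning procedure can be illustrated with a deliberately small stand-in: a logistic regression attacked with the fast gradient sign method (FGSM), instead of the CNN classifiers and reprinting or Expectation Over Transformation attacks studied in the project. All names and data here are hypothetical; the sketch only shows the mechanics of crafting attacked samples and continuing training on a mix of clean and attacked data.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    # numerically stable logistic function
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def train(X, y, w=None, lr=0.1, epochs=300):
    """Logistic regression by gradient descent (toy stand-in for a CNN)."""
    w = np.zeros(X.shape[1]) if w is None else w.copy()
    for _ in range(epochs):
        w -= lr * X.T @ (sigmoid(X @ w) - y) / len(y)
    return w

def fgsm(X, y, w, eps=0.8):
    """FGSM attack: push each input in the direction that increases the loss."""
    grad_x = np.outer(sigmoid(X @ w) - y, w)  # d(cross-entropy)/d(input)
    return X + eps * np.sign(grad_x)

def accuracy(X, y, w):
    return float(np.mean((sigmoid(X @ w) > 0.5) == y))

# toy two-class data standing in for genuine vs. counterfeit print patterns
X = np.vstack([rng.normal(-1.0, 1.0, (200, 10)),
               rng.normal(1.0, 1.0, (200, 10))])
y = np.concatenate([np.zeros(200), np.ones(200)])

w_clean = train(X, y)
X_adv = fgsm(X, y, w_clean)  # attacks crafted against the clean model

# adversarial fine-tuning: continue training on clean + attacked samples mixed
w_tuned = train(np.vstack([X, X_adv]), np.concatenate([y, y]), w=w_clean)

print("clean accuracy:", accuracy(X, y, w_clean))
print("accuracy under attack:", accuracy(X_adv, y, w_clean))
print("fine-tuned accuracy on the mixed set:",
      accuracy(np.vstack([X, X_adv]), np.concatenate([y, y]), w_tuned))
```

The key design point, mirrored from the study, is that fine-tuning starts from the already-trained weights and mixes clean with attacked samples, rather than training from scratch on attacked data alone.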
The expected results and impacts are:
1- We aim to evolve the VidiTrust anti-counterfeiting solution, avoiding constant retraining of their models for every new barcode. Current results are promising, and we aim to reduce the human effort needed to retrain classifiers for each new anti-counterfeiting product (barcode) they launch (the open-set scenario).
2- Our proposed dataset has spurred further involvement of the digital image forensics community in investigating print-and-scan attacks against image manipulation detectors. Indeed, there is already one work using the proposed dataset to detect deepfake images in the rebroadcast (printed) domain.
3- With our work on adversarial machine learning, we expect to inspire other researchers to extend adversarial attacks to classifiers trained with adversarial samples.