Develop an innovative computer system which understands contractual documents by applying artificial intelligence

Periodic Reporting for period 1 - Computer Linguist (Develop an innovative computer system which understands contractual documents by applying artificial intelligence)

Reporting period: 2019-12-01 to 2020-11-30

Customers face several challenges in contract management, driven by contract volume and heterogeneity as well as growing requirements from stakeholders. Contract managers have to manage hundreds of contracts from different vendors with different renewal dates. Contracts have a non-standardized format and are mainly paper-based. There is no universal language for contracts, and similar terms are not worded in a standardized way. Contract management is a manual and labor-intensive process, and both data accuracy and timeliness are a concern. As a result, it is difficult to analyze and manage contracts effectively.

Contracts govern every aspect of business and a large part of our private lives. There is a strong need, especially among non-lawyers, for a better understanding of contractual terms and conditions as well as of the key information needed for contract management.

The overall objective of the project is the development of several smart legal services, which understand contractual language and extract key information.
During the project the innovation associate held various workshops with end users to understand the business problem, the context and the required end result. The intermediate result was a mockup of a website and a specification of a smart service.

A metadata extraction service has been created and deployed as a web service on an NLP platform. It automates the extraction of text data from contract documents so that our customers can retrieve the desired information with ease. Metadata extraction is done using an information extraction method called Named Entity Recognition (NER), which consists of two substeps: Named Entity Identification and Named Entity Classification. We worked with three methods for named entity recognition: pattern lookup using regular expressions, a rule-based approach, and machine learning models.

We applied a text preprocessing framework consisting of tokenization, normalization and noise removal. The model has been trained with labeled test data, which was prepared together with a customer. It has been implemented with the Python library spaCy using pattern matching, rule-based approaches, machine learning and a hybrid approach. We then built a system for finding named entities in contract documents, evaluated its confidence scores, and evaluated the model on validation data. Finally, we created a web application for the best-performing model and deployed it in the cloud on AWS ECS with Fargate, using Docker containers and Flask.
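The pattern-lookup method mentioned above can be illustrated with a minimal sketch. It is not the project's implementation: it uses only the Python standard library rather than spaCy, the labels `DATE`, `AMOUNT` and `PARTY` and all three regular expressions are hypothetical, and the preprocessing covers only normalization and noise removal, not tokenization.

```python
import re
import unicodedata

# Hypothetical patterns for illustration only; real contracts would need
# far broader coverage (other date formats, currencies, company forms).
PATTERNS = {
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "AMOUNT": re.compile(r"\b(?:EUR|USD)\s?\d{1,3}(?:,\d{3})*(?:\.\d{2})?\b"),
    "PARTY": re.compile(
        r"\b[A-Z][A-Za-z&]*(?: [A-Z][A-Za-z&]*)* (?:GmbH|AG|Ltd|Inc)\b"
    ),
}


def preprocess(text):
    """Normalize Unicode, drop control characters, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    # Noise removal: replace control characters (except tab and newline,
    # which the whitespace step below handles) with a space.
    text = re.sub(r"[\x00-\x08\x0b-\x1f]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def extract_entities(text):
    """Return entities found by pattern lookup, ordered by position.

    Offsets refer to the preprocessed text, not the raw input.
    """
    clean = preprocess(text)
    entities = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(clean):
            entities.append(
                {
                    "label": label,
                    "text": match.group(0),
                    "start": match.start(),
                    "end": match.end(),
                }
            )
    return sorted(entities, key=lambda e: e["start"])


if __name__ == "__main__":
    sample = (
        "This agreement between Acme Software GmbH and  Beta Ltd\n"
        "starts on 2020-01-15 for EUR 12,500.00."
    )
    for entity in extract_entities(sample):
        print(entity["label"], entity["text"])
```

Pattern lookup of this kind is precise for rigidly formatted values such as ISO dates, but it misses anything the patterns do not anticipate, which is one reason to combine it with rule-based and machine learning approaches in a hybrid setup.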

The main results are a working metadata extraction service with a clearly defined API and a prototype frontend, which can be used for validation, testing and prototyping of the service. The results have been properly checked into a code library and documented for different stakeholders.
The project has not progressed beyond the state of the art. The accuracy of the model was not sufficient for the customers' productive use.
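The report does not state which metric was used to judge accuracy. As an assumption, the sketch below shows one common way to score an extraction model on validation data: entity-level precision, recall and F1 over exact `(label, text)` matches. The function name `entity_scores` and the example data are hypothetical.

```python
def entity_scores(predicted, gold):
    """Entity-level precision, recall and F1 over exact (label, text) matches."""
    pred, ref = set(predicted), set(gold)
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical validation example: three annotated entities, two predictions.
    gold = [("DATE", "2020-01-15"), ("PARTY", "Acme GmbH"), ("AMOUNT", "EUR 100")]
    predicted = [("DATE", "2020-01-15"), ("PARTY", "Acme")]
    print(entity_scores(predicted, gold))
```

Exact matching is strict: a prediction that captures only part of a party name counts as both a false positive and a missed entity, so scores on contract text can fall well below what a manual spot check would suggest.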
Service API Results
Architecture
Prototype Frontend