Periodic Reporting for period 1 - EUCLCORP (European Union Case Law Corpus: creating a multilingual and searchable corpus of case law from EU member state courts and the European Court of Justice.)
Reporting period: 2016-07-01 to 2017-12-31
BACKGROUND
During research on the ERC-funded 'Law and Language at the European Court of Justice' (LLECJ) project, a gap in the resources currently available for analysing the case law of that court became apparent. First, while many excellent multilingual databases of EU law exist, no resource allowed users of EU law to compare, easily and comprehensively, the meanings of legal terms across EU languages and member state legal systems. Secondly, while the influence of the ECJ on member state law is well documented, influence also flows in the other direction, from member state law to EU law. The special relationship between the ECJ and national courts allows legal terms and concepts to migrate in both directions, but no resource currently allows users of EU law to track that migration. The EUCLCORP project was therefore designed to develop and test an innovative corpus to fill that gap.
PROJECT AIM
To develop and test an innovative EU Case Law Corpus (EUCLCORP), which will be a standardised, multidimensional and multilingual corpus of the case law of the Court of Justice of the European Union (ECJ) and of the constitutional/supreme courts of EU member states.
RESULTS
1. The project has achieved its objective by developing and testing EUCLCORP, which contains all electronically available ECJ judgments in 23 official EU languages (every official EU language except Irish), together with judgments from the constitutional/supreme courts of seven EU member states (France, the UK, Italy, Portugal, Spain, Finland and the Czech Republic).
2. The EUCLCORP project has also developed a design framework and tools for the systematic representation of court judgments. This is significant for three reasons: (a) different national courts have developed different standards for documenting judgments, which typically do not correspond to each other and make systematic comparison of those courts' judgments difficult; thanks to the framework developed within the project, all judgments in the corpus are documented in a standardised way; (b) that standardised documentation allows the same search strategy to be used across all judgments, regardless of language or court; (c) with minor adjustments, the framework can be applied to any kind of judgment, so judgments from further courts can easily be incorporated into the existing system and made searchable via the EUCLCORP web interface.
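The report does not describe EUCLCORP's internal schema, so the following is only a minimal Python sketch of what "documented in a standardised way" can mean in practice. The record type `Judgment`, its fields (`court`, `language`, `year`, `sections`), the two input layouts and the converter functions are all hypothetical. The point is that once judgments from differently structured courts are mapped onto one structure, a single search function (reason (b) above) works for all of them.

```python
from dataclasses import dataclass, field


@dataclass
class Judgment:
    """Hypothetical common record for one judgment, whatever its court of origin."""
    court: str
    language: str
    year: int
    sections: dict[str, str] = field(default_factory=dict)


def from_court_a(raw: dict) -> Judgment:
    # Hypothetical layout A: flat keys, ISO date string, French section names.
    return Judgment(
        court="A",
        language=raw["lang"],
        year=int(raw["date"][:4]),
        sections={"Grounds": raw["motifs"], "Ruling": raw["dispositif"]},
    )


def from_court_b(raw: dict) -> Judgment:
    # Hypothetical layout B: (heading, text) pairs using its own heading labels.
    labels = {"Reasons": "Grounds", "Order": "Ruling"}
    return Judgment(
        court="B",
        language=raw["language"],
        year=raw["year"],
        sections={labels[heading]: text for heading, text in raw["parts"]},
    )


def search(judgments: list[Judgment], term: str, section: str) -> list[Judgment]:
    # The same search works for every court because the structure is shared.
    return [j for j in judgments if term.lower() in j.sections.get(section, "").lower()]


judgments = [
    from_court_a({"lang": "fr", "date": "2015-03-12",
                  "motifs": "Le principe de proportionnalité s'applique.",
                  "dispositif": "Le recours est rejeté."}),
    from_court_b({"language": "en", "year": 2016,
                  "parts": [("Reasons", "The principle of proportionality applies."),
                            ("Order", "The appeal is dismissed.")]}),
]
hits = search(judgments, "proportionality", "Grounds")
print([(j.court, j.year) for j in hits])  # [('B', 2016)]
```

Only the English judgment is returned here: a shared structure makes the search strategy uniform, but the term itself is still language-specific, which is why cross-language functions such as parallel concordance lines matter.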
Details of the EUCLCORP project, including links to the resource itself, can be found at: www.llecj.karenmcauliffe.com/euclcorp
HOW DOES EUCLCORP WORK?
Unlike databases, in which users can carry out only relatively straightforward searches for the occurrence of specific terms or keywords, corpora allow users to track how particular linguistic expressions and features are used in context. EUCLCORP therefore allows users to extract words and phrases in context, discover how the ECJ and national courts use them, and get a sense of what they actually mean. At the heart of the corpus approach, which underlies every search in EUCLCORP, is the idea of collocation. Collocations are frequently co-occurring multiword expressions that form units of meaning. All languages consist of such units of meaning, and EUCLCORP allows users to identify them in the context of European Union case law. Because its focus is on meaning, EUCLCORP can arguably provide more valuable terminological information than a dictionary or terminology database. In particular, it can be used to create bespoke terminology databases based on the words and phrases relevant to an individual user.
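The report does not say how EUCLCORP scores collocations. Purely as an illustration, the minimal sketch below ranks the neighbours of a word with logDice, a widely used association score (it is the one used by Sketch Engine, for example), computed as 14 + log2(2·f(x,y) / (f(x) + f(y))). The window size, the toy lemmatised sentences and the function names are assumptions, not EUCLCORP's method or data.

```python
import math
from collections import Counter


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice association score: 14 + log2(2 * f_xy / (f_x + f_y))."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def collocates(sentences: list[list[str]], node: str, window: int = 3) -> list[tuple[str, float]]:
    """Rank words occurring within `window` tokens of `node` by logDice."""
    freq = Counter(tok for sent in sentences for tok in sent)
    co = Counter()
    for sent in sentences:
        for i, tok in enumerate(sent):
            if tok != node:
                continue
            start, end = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(start, end):
                if j != i:
                    co[sent[j]] += 1  # count each neighbour of the node word
    scores = [(w, log_dice(n, freq[node], freq[w])) for w, n in co.items()]
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda s: (-s[1], s[0]))


# Toy lemmatised sentences, invented for illustration (not EUCLCORP data).
sentences = [
    ["measure", "create", "obstacle", "to", "trade"],
    ["rule", "create", "confusion", "among", "consumer"],
    ["this", "create", "obstacle", "for", "trader"],
    ["obstacle", "remain"],
]
for word, score in collocates(sentences, "create", window=1):
    print(f"{word:10} {score:.2f}")
```

Here 'obstacle' ranks first because it co-occurs with 'create' twice, while each other neighbour co-occurs only once. On real data a larger window and a minimum frequency threshold would normally be used, since otherwise function words and one-off neighbours crowd the list.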
EUCLCORP allows users to perform complex terminological and phraseological searches in judgments using Corpus Query Language (CQL). Users can thus search for precisely defined expressions, which can include individual words, multiword expressions and complex grammatical patterns. Results can be shown either at sentence level or together with other co-occurring words; by contrast, current database resources return results at the level of the whole document. Specific functions include:
1. Lemma searching: users can search for all forms of a term using the lemma function. For example, searching for ‘see’ with this function ([lemma="see"]) returns all occurrences of the verb ‘see’ in all of its forms: ‘see’, ‘sees’, ‘seeing’, ‘saw’, ‘seen’. This is much more precise than the truncation/stemming functions used in databases: a truncated search such as ‘se*’ would miss ‘saw’ while also matching unrelated words such as ‘seal’.
2. Complex queries: CQL makes it possible to identify multiword expressions associated with particular terms. For example, a user may wish to find out how the verb ‘exclude’ is used in a construction containing ‘from’ + a noun (i.e. how the expression ‘exclude…from X’ is used). The relevant query, [lemma="exclude"] []{1,3} "from" (any form of ‘exclude’, followed by one to three tokens of any kind, followed by ‘from’), returns expressions such as the following from ECJ case law: ‘exclude any other person from enjoyment of such a right’, ‘excluding goods from the system of deducting VAT’, ‘excluded from benefitting from old-age insurance’, ‘exclude an economic operator from a procedure’. Again, this is a more precise and targeted search than a database allows.
3. Collocation analysis: this function allows the user to identify the contexts in which expressions most frequently and most typically occur. For example, a search for collocates of ‘create’ in ECJ judgments returns ‘obstacles’, ‘impression’, ‘confusion’, ‘uncertainty’, ‘risk’ and ‘inequality’. This makes it possible to see very quickly how terms are typically used across all judgments. This functionality is not available in existing database resources.
4. Parallel concordance lines: this function allows users to specify a search term in a source language and then identify all sentences that contain translation equivalents of the term in a target language. Users can thus not only identify translation equivalents across judgments but also see the context in which those terms are used.
5. Search within sections: users can restrict searches to specific sections (e.g. ‘Grounds’), a function that is not available in current database resources, and to specific year ranges (e.g. 1977-1995).
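To show what the query in function 2 actually asks for, and how the restrictions in function 5 combine with it, here is a minimal Python sketch. It is a toy emulation of the pattern [lemma="exclude"] []{1,3} "from" over pre-lemmatised tokens, not a CQL engine and not EUCLCORP's implementation; the "word/lemma" input format, the section labels, the years and all function names are assumptions. The lemma test on the first token is, on its own, what a lemma search (function 1) does.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Token:
    word: str
    lemma: str


@dataclass
class Sentence:
    tokens: list[Token]
    section: str  # hypothetical section label, e.g. "Grounds"
    year: int


def make_sentence(tagged: str, section: str, year: int) -> Sentence:
    # Input format "word/lemma word/lemma ..." is an assumption for this sketch.
    return Sentence([Token(*pair.split("/")) for pair in tagged.split()], section, year)


def match_gap_pattern(sent: Sentence, lemma: str, word: str,
                      min_gap: int = 1, max_gap: int = 3) -> list[str]:
    """Emulate the CQL pattern [lemma="LEMMA"] []{min_gap,max_gap} "WORD"."""
    toks, found = sent.tokens, []
    for i, tok in enumerate(toks):
        if tok.lemma != lemma:  # on its own, this test is a lemma search
            continue
        for gap in range(min_gap, max_gap + 1):
            j = i + gap + 1  # position of the token after `gap` arbitrary tokens
            if j < len(toks) and toks[j].word == word:
                found.append(" ".join(t.word for t in toks[i:j + 1]))
    return found


def query(corpus: list[Sentence], lemma: str, word: str,
          section: Optional[str] = None,
          years: Optional[Tuple[int, int]] = None) -> list[str]:
    """Run the pattern, optionally restricted to one section and a year range."""
    results = []
    for sent in corpus:
        if section is not None and sent.section != section:
            continue
        if years is not None and not years[0] <= sent.year <= years[1]:
            continue
        results.extend(match_gap_pattern(sent, lemma, word))
    return results


# Toy tagged sentences, invented for illustration.
corpus = [
    make_sentence("Member/member States/state may/may exclude/exclude goods/good "
                  "from/from the/the system/system", "Grounds", 1990),
    make_sentence("workers/worker were/be excluded/exclude from/from benefits/benefit",
                  "Facts", 1985),
    make_sentence("it/it excludes/exclude any/any other/other person/person "
                  "from/from enjoyment/enjoyment", "Grounds", 2005),
]
print(query(corpus, "exclude", "from"))
print(query(corpus, "exclude", "from", section="Grounds", years=(1977, 1995)))
```

The second toy sentence is not matched because ‘excluded’ is immediately followed by ‘from’, whereas []{1,3} requires at least one intervening token. Each hit is returned as the matched stretch of its sentence rather than as a whole document, which is the contrast with database results drawn above.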
POTENTIAL APPLICATIONS
The functions described above may be of practical use to translators and terminologists. Applications identified by our team include:
- By identifying typical expressions in the case law associated with particular terms, collocation analysis can provide users with a tool to create detailed terminology databases, or to update current terminology databases with contextual information.
- Collocation analysis of the same term across both ECJ and national court judgments allows users to identify typical usages of that term by the ECJ and within national legal systems. This may highlight potential areas of confusion where terms are used differently in different systems and contexts, and can thus inform terminological and translation choices.
- Parallel concordance lines are useful for identifying translation equivalents across languages and comparing different translation options in context.
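The report does not describe how EUCLCORP aligns judgments across languages. The minimal sketch below simply assumes that sentence-aligned (source, target) pairs already exist, keeps the pairs whose source sentence contains the search term, and lists which candidate target equivalents appear in each target sentence, so that competing translation options can be counted and compared in context. The English-French sentences, the candidate list and the function names are invented for illustration.

```python
import re
from collections import Counter


def contains(text: str, term: str) -> bool:
    # Whole-word, case-insensitive match; Python's re is Unicode-aware for str patterns.
    return re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE) is not None


def parallel_concordance(pairs: list[tuple[str, str]], source_term: str,
                         target_terms: list[str]) -> list[tuple[str, str, list[str]]]:
    """Keep aligned pairs whose source contains `source_term` and list the
    candidate target equivalents that occur in the aligned target sentence."""
    return [(src, tgt, [t for t in target_terms if contains(tgt, t)])
            for src, tgt in pairs if contains(src, source_term)]


# Toy English-French sentence pairs, assumed to be aligned already.
pairs = [
    ("The principle of legal certainty requires clarity.",
     "Le principe de sécurité juridique exige la clarté."),
    ("The appeal is dismissed.", "Le pourvoi est rejeté."),
    ("Legal certainty must be ensured.", "La sécurité juridique doit être garantie."),
]
hits = parallel_concordance(pairs, "legal certainty",
                            ["sécurité juridique", "certitude juridique"])
for src, tgt, found in hits:
    print(src, "=>", tgt, found)
# Tally how often each candidate equivalent is used across the hits.
print(Counter(t for _, _, found in hits for t in found))
```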
EUCLCORP was delivered on schedule and under budget on 31 December 2017.