Community Research and Development Information Service - CORDIS

Abstract

In 2004, the inter-institutional Patent Statistics Task Force decided to create a worldwide statistical database under the acronym PATSTAT: a single raw database of patent statistics, held by the EPO and developed in cooperation with WIPO, the OECD and Eurostat. This document presents a method for cleaning and harmonising the names of patent applicants within PATSTAT. The method builds on existing approaches. Its main working steps include character cleaning and standardisation, removal of legal form indications, removal of non-significant characters, approximate string searching and keyword searching. Applying the method considerably reduces the diversity of applicant names, which improves the quality of both the raw patent data and the aggregated patent statistics.
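
The working steps named in the abstract can be illustrated with a minimal Python sketch. It is not the procedure defined in the publication: the legal form list, the function names clean_name and similar, and the 0.85 similarity threshold are all assumptions chosen for illustration, and the approximate matching uses the standard library's difflib rather than any technique the authors specify. Keyword searching is not covered.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Hypothetical, deliberately short list of legal form indications.
# The real method would rely on a far more extensive, curated list.
LEGAL_FORMS = {
    "INC", "LTD", "LIMITED", "GMBH", "AG", "SA", "NV", "BV",
    "CORP", "CORPORATION", "CO", "LLC", "PLC",
}


def clean_name(raw: str) -> str:
    """Return a simplified, upper-case form of an applicant name."""
    # Character cleaning: decompose accented characters and drop the accents.
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.upper()
    # Standardising: join dotted abbreviations, so "A.G." becomes "AG".
    text = text.replace(".", "")
    # Non-significant characters: anything other than A-Z or 0-9 becomes a space.
    text = re.sub(r"[^A-Z0-9]+", " ", text)
    tokens = text.split()
    # Legal form removal: strip trailing legal form tokens, keeping at least one token.
    while len(tokens) > 1 and tokens[-1] in LEGAL_FORMS:
        tokens.pop()
    return " ".join(tokens)


def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Approximate match on cleaned names (threshold is an assumed value)."""
    ratio = SequenceMatcher(None, clean_name(a), clean_name(b)).ratio()
    return ratio >= threshold
```

Under these assumptions, clean_name maps both "Siemens A.G." and "SIEMENS AG" to the same string, so the two spellings collapse into one name, while similar can flag near-duplicates such as a misspelled variant for manual review.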

Additional information

Authors: MAGERMAN T, European Commission, Eurostat, Brussels (BE); VAN LOOY B, European Commission, Eurostat, Brussels (BE); SONG X, European Commission, Eurostat, Brussels (BE)
Bibliographic Reference: Luxembourg, Office for Official Publications of the European Communities, 2006. Various paging, free of charge
Availability: http://bookshop.europa.eu/is-bin/INTERSHOP.enfinity/WFS/EU-Bookshop-Site/en_GB/-/EUR/ViewPublication-Start?PublicationKey=KSAV06002 (Catalogue Number: KS-AV-06-002-EN-N)
ISBN: 92-79-02500-7
Record Number: 200719449 / Last updated on: 2007-10-15
Category: PUBLICATION
Original language: en
Available languages: en