Worldwide Intelligent Semantic Patent Extraction & Retrieval

Exploitable results

Patent archives contain bibliographic data, free text and patent drawings. Several products exist that allow search and retrieval of specific patents on the basis of their bibliographic or textual content. Until now, however, patent drawings have been an unexploited resource for tracking prior art, novelty and the infringement of intellectual property. As part of the IST project WISPER, a UK research group at BMT has produced the first search engine that allows patent drawings (henceforth images) to be compared to a target image submitted by a user. The system is automatic and does not require the textual annotation of images by hand.
We first describe what the data looks like. Patent drawings are typically black and white line compositions with few areas of uniform texture. A drawing is often contaminated by extra numbers and lines that relate object components to a key in the text body of the patent. The drawings are sometimes generated by machine, but the overwhelming majority are hand drawn in an ad hoc fashion. The image content is frequently meaningless on its own; objects cannot be easily named or segmented. In addition, the conceptual space covered by patent archives is vast and there is little uniformity of object representation within classes. Variation due to scale, translation, rotation and occlusion is common. The noise is such that different human searchers return varying result sets when faced with an identical search task, even when working with small archives of 500 images. All of these factors add to the technical difficulty of searching such an archive and work against the use of conventional classification or object-based solutions.
Our group approached the task by using a dedicated parser to extract the images, followed by a series of steps to standardise them into an amenable format. A sequence of novel treatments was then devised to simplify, fingerprint and index each image.
The precise nature of the fingerprint method used is commercially sensitive. Search results of the WISPER IPM (Image Processing Module) significantly outperform those achieved by human searchers. Crucially, large image archives cannot be adequately searched by humans at all. Large in this context means of the order of tens of thousands of images. Such a search is visually disorientating for humans, takes unacceptably long to complete and is error-prone. Therefore, any system that can produce results at a level higher than chance is an improvement. For multiple searches, done by multiple users, the WISPER IPM performs 3.25 times better than chance. The WISPER IPM has succeeded in reducing the variability in an originally very noisy dataset. Some irrelevant results are always returned, but these can be easily and quickly ignored.
The current status of the WISPER IPM is that of a laboratory demonstrator. Each image takes around 10 seconds to index, and this is done just once per image. Once the index is loaded in memory, the system can search clusters of up to 50,000 images in a few seconds. Our code and hardware are not optimised at all. We estimate that the GlobalPat patent archive produced by the European Patent Office contains about 2.5 million images, and hence there is a large potential for WISPER to become the first pan-European patent image search engine. The WISPER IPM methodology should be readily applicable to other large archives with similar data, in any domain in which it is unreasonable to expect humans to either annotate or explicitly search thousands of images.
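To put the figures quoted above in perspective, the following is a rough back-of-the-envelope sketch, not part of the WISPER system itself. It uses only the numbers stated in the text (about 10 seconds of indexing per image, about 2.5 million images in GlobalPat, clusters of up to 50,000 images). The function names, the `workers` parameter and the assumption of perfectly linear scaling across workers are illustrative assumptions, as is the idea of partitioning the whole archive into full-size clusters.

```python
import math

# Figures taken from the text; everything else here is an illustrative assumption.
SECONDS_PER_IMAGE = 10        # approximate one-off indexing cost per image
ARCHIVE_IMAGES = 2_500_000    # estimated size of the GlobalPat image archive
CLUSTER_SIZE = 50_000         # largest cluster searched in a few seconds

SECONDS_PER_DAY = 86_400


def indexing_days(n_images, seconds_per_image=SECONDS_PER_IMAGE, workers=1):
    """Estimated wall-clock days to index n_images once.

    Assumes (hypothetically) that the work divides evenly across `workers`.
    """
    total_seconds = n_images * seconds_per_image / workers
    return total_seconds / SECONDS_PER_DAY


def clusters_needed(n_images, cluster_size=CLUSTER_SIZE):
    """Number of clusters if the archive were split into clusters of at most cluster_size."""
    return math.ceil(n_images / cluster_size)


if __name__ == "__main__":
    print(f"Single worker: {indexing_days(ARCHIVE_IMAGES):.1f} days")
    print(f"Ten workers:   {indexing_days(ARCHIVE_IMAGES, workers=10):.1f} days")
    print(f"Clusters of {CLUSTER_SIZE}: {clusters_needed(ARCHIVE_IMAGES)}")
```

On these assumptions, a single unoptimised machine would need roughly 290 days for the one-off indexing pass, which suggests that parallel indexing or optimisation would be needed before the demonstrator could cover an archive of this size.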
