EDITO is developing an application for press editors to deliver advanced on-line news services. The system is intended to retrieve press cuttings from a large newspaper database, filtered according to the subject specified by the subscriber. It has tools for viewing, marking up and entering articles, for automatic global indexing over all articles and newspapers, and for individual enquiries in several languages. Besides using semantic modelling techniques to give accurately profiled information, the system will offer spin-off services such as access to full back issues from clippings and a searchable on-line database of past newspapers.
The Market Situation
A large number of players have been invading the information highways over the past year, proposing a wide range of retrieval tools, from basic compilers (Yahoo, Altavista, Excite, ...) to intelligent agents. Most of these are American in origin and pay little attention to multilingualism or local concerns.
In the US, press clipping services are now migrating to the on-line market where the electronic press has a 4% share. In the European Union where on-line electronic publishing has been limited to a few servers, four American distributors are planning press clipping services which will access multiple news sources. In France two of the main distributors are offering clippings from about ten news sources but these are only available on Minitel.
In Europe, electronic publishing of news is still an emerging market, whereas in the US it is more firmly established. There the nine main news providers supply information from 20 to 30 sources, but very few in full text; they have 100-600,000 subscribers each and pay the newspapers a commission of 10-20% of their revenue. This is growing at about 20% annually. Electronic distribution of the press has not made a significant impact on newspaper sales; instead it is considered an additional source of income.
Also in the US, the on-line press clipping market is growing very fast, at 40% annually, with two companies already having 25,000 subscribers. In Europe a demand already exists in the English-speaking market, but is underdeveloped in the French-speaking sector. Large companies are ready to subcontract the tedious task of information collection, while SMEs, which are inundated with data, are ready to buy pre-selected information if the price is right. On the Internet the situation is more complex: about 100 sources are distributed almost free of charge, and in various forms.
The French Market
An initial analysis has been carried out of the French market for on-line services which provide press articles and information. Its results can be summarized in three figures:
- 2 thousand actual subscribers to electronic press information,
- 2 million white-collar workers with a potential interest in press on-line services,
- 2 billion articles xeroxed every year (most without any copyright attached!).
Electronic press consumption in France remains impeded by a small installed base, with only 18% of homes equipped with PCs. Nevertheless, the French on-line information market is one of the largest in Europe, second only to the UK, thanks to the Minitel, which continues to support 25 thousand services, 6.5 million terminals and 80 million yearly connection hours, and generates a revenue of more than 1 BECU.
In 1995, on-line press information generated 370 MFF net turnover, with a monthly connection level of 200,000 hours. Turnover of newswire agencies (mainly Agence France Presse) was 760 MFF. The market for Press Clippings Services (estimated at 90 MFF) is definitely growing (25% yearly since 1993), but is still underexploited (less than 12% of available selections being sold).
EDITO has focused in the first year on French publishing concerns as a priority and has completed definition, specification, and technical development of a basic Prototype in French, in addition to a market analysis for French press cuttings/clippings.
Potential users of Press Clippings Services (PCS) in France are to be found within two very different categories:
1. Very small enterprises (establishments with 0-5 employees), which number 2.4 million, of which only 6% use PCS, and which require a service at a very attractive price (500 FF per month max.) based on simple searches (indexes, quotes, or proper nouns),
2. Large companies, which number 23,000, of which 30% use PCS (60% of the total distribution), and which require a baseline selection of articles to help streamline internal services, and sophisticated selection processes based on user profiles.
The basic filtering process requirements across the market imply use of: language engineering resources and natural language engines as a technological baseline, easy and interactive access through a user-friendly interface, support for emerging networking technologies, and wide or complete coverage through international multi-source handling.
Implementation of a first prototype system has been completed by integration of a development platform including the full-text engine (TOPIC), an indexation engine (ALETH), an RDBMS (SYBASE), and a terminology maintenance station, with remote access to the database for retrieval through the French Minitel. Development has also involved lexical and semantic analysis of a corpus made up of all articles published by the French partners in the consortium throughout October 1995, preparation of a test corpus including a multi-source database, and collection of a list of requests for validation purposes.
The prototype includes an operating model of the future core system with basic acquisition, indexation, and query engines. It uses generic linguistic resources, a limited terminology (intended for the validation tests), a monolingual natural language query interface in French, and two user interfaces (Minitel and an emulated terminal on PC).
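As a rough illustration of how acquisition, indexation and query stages fit together, the following minimal sketch builds an in-memory inverted index and selects the articles that match every term of a subscriber profile. It is an assumption-laden toy, not the prototype's implementation: the names `ClippingIndex`, `acquire` and `query` are hypothetical, matching is plain keyword intersection, and none of the linguistic processing performed by TOPIC or ALETH is modelled.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case a text and split it into word tokens (accented letters are kept)."""
    return re.findall(r"\w+", text.lower())


class ClippingIndex:
    """Hypothetical toy index: acquisition, indexation and query in one class."""

    def __init__(self):
        self.articles = {}                 # article id -> (source, text)
        self.postings = defaultdict(set)   # term -> ids of articles containing it

    def acquire(self, article_id, source, text):
        """Store an article and index each distinct term it contains."""
        self.articles[article_id] = (source, text)
        for term in set(tokenize(text)):
            self.postings[term].add(article_id)

    def query(self, profile_terms):
        """Return the sorted ids of articles containing every profile term."""
        terms = [t for phrase in profile_terms for t in tokenize(phrase)]
        if not terms:
            return []
        hits = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            hits &= self.postings.get(term, set())
        return sorted(hits)


# Illustrative data only: two invented articles from two invented sources.
index = ClippingIndex()
index.acquire("a1", "Journal A", "Le marché de la presse en ligne progresse")
index.acquire("a2", "Journal B", "La presse régionale et le Minitel")
print(index.query(["presse", "minitel"]))
```

A real system would replace the keyword intersection with linguistic analysis and terminology lookup, which is where the generic linguistic resources mentioned above would come in.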
The Way Ahead
At the start of 1997 the EDITO consortium suffered a takeover of two partners by a large conglomerate with interests outside of the scope of the project. In addition, the outcome of a recent court case threw into doubt the feasibility of the original project objectives from a legal (IPR) point of view. As a result the group were unable to sustain development according to the plan for the year and were forced to withdraw EDITO from the Telematics Applications Programme.
The experience gained during the lifetime of the project however remains of value to the partners involved, especially in the areas of requirements, market awareness, the US market lead and potential gains in the European market. This experience would also be valuable to any future involvement in the development of similar information services.
Access to relevant and valuable information is a definite need for anyone, whether a professional user (e.g. a researcher at an SME who needs to retrieve data from many sources and distribute it internally) or an individual.
Press editors are among these professional users: journalists and researchers often request information from external databases in order to build exhaustive files, which they define through precise, and personal, profiles.
As of today, a large volume of published information - more than 5 thousand newspapers - is available on-line, but through independent and heterogeneous applications. Very few sources are full-text searchable, and none provides for multi-source or multilingual requests.
Three products will result from the project: Automated Press Clippings, Press Electronic Distribution, and Documentary Data Base, with the aim of being multi-source, and providing for full-text and natural language requests, in several European languages - French, English, Dutch, and Italian, at their respective development stages.
Progress and results
The first product to be issued will be an Automated Press Clippings application in French. The first prototype will also demonstrate an interactive selection of press clippings based on user profiles expressed in French and Dutch.
The second phase will provide for full Dutch and Italian interrogation, connection to European bases in Belgium and Italy, and availability of preliminary functions for Electronic Press Distribution, and Documentary Data Base.
During and after completion of the project, marketing and technical information, as well as demonstration packages will be made accessible from the Web at www.axime.com.
Demonstration of the project will involve user groups, selected from representative individuals and professionals, who will test:
- the ergonomics of the search workstation connected to the consultation server,
- the adequacy of the selected clippings as compared with a given profile.
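One common way to quantify the adequacy of a selection against a profile is precision and recall, computed against a set of articles that testers have judged relevant. The sketch below is offered only as an illustration of that standard measure; the source does not state which measure the user groups would apply, and the function name and sample identifiers are hypothetical.

```python
def precision_recall(selected, relevant):
    """Compare the clippings a system selected with those judged relevant.

    precision = share of selected clippings that are relevant
    recall    = share of relevant clippings that were selected
    """
    selected, relevant = set(selected), set(relevant)
    hits = len(selected & relevant)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example: three clippings selected, four judged relevant, two in common.
p, r = precision_recall(["a1", "a2", "a3"], ["a2", "a3", "a4", "a5"])
print(f"precision={p:.2f} recall={r:.2f}")
```

High precision with low recall would indicate a profile that is too narrow; the reverse would indicate one that lets through too much unrelated material.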
User Groups shall comprise both a Core User Group, and an Interest Group. The former will consist of personnel from members of the Consortium. The latter shall include a broader range of personnel to ensure the applicability of the system to a wide range of domains (administration, communication, financial, etc.).
Rapid extensions of the system will be pursued through addition of new sources (i.e. other press editors to join the Consortium), new languages, and new services (such as TV and radio transcriptions). These additions will be facilitated by connection to similar servers throughout Europe, which the Consortium is clearly anticipating.
Funding Scheme: CSC - Cost-sharing contracts