Broadening the Bioinformatics Infrastructure to unicellular, animal, and plant science

Final Report Summary - ALLBIO (Broadening the Bioinformatics Infrastructure to unicellular, animal, and plant science)

Executive Summary:
Over the past decade, the world-wide scientific community has spent many billions of Euros on determining and deciphering the human genome. In this process, mankind learned much about itself, but the original claims (like, for example, president Clinton’s remark that cancer soon would be a disease of the past) were over-optimistic. Nevertheless, the human genome project produced a great reservoir of knowledge and a battery of excellent software. The AllBio team believes that many of the computational facilities of this human genome endeavour can also be put to good use in fields that deal with unicellular, plant and (farm) animal genomes. AllBio is hence developing a coordinated action plan for the use of ‘omics’ data related to these other species, using what has been learned in the human genome project. Obviously, a good fit with pan-European ELIXIR activities is one of the major guiding principles for the AllBio initiative, and the experiences gained in recent FP6 Networks of Excellence (like EMBRACE, BioSapiens and ENFIN) will be put to good use. The AllBio action plan comprises a series of consecutive steps, commencing with the identification of bioinformatics challenges faced in the fields of unicellular, plant and (farm) animal genome research. Research communities will be built consisting of life scientists and bioinformaticians who, together, will study the scientific problems, and define the actions needed to solve them. Where needed, AllBio will teach courses to enhance awareness of genome problems among bioinformaticians, and of solutions that might exist already among life scientists. At a higher level, AllBio will also seek to identify common denominators between the bioinformatics needs of its various communities.
After several community-building events held to gain an overview of the state of the art of bioinformatics in the non-human life science areas, the AllBio partners collected “test cases” from the user communities to identify gaps and existing challenges in the bioinformatics field. From the ~60 test cases collected via questionnaires and interviews, 14 (encompassing unicellular organisms, plants and farm animals) were selected to be addressed in so-called “hack-a-thons”. A hack-a-thon is an event in which software developers (here, bioinformatics developers) convene to work together on defined problems and challenges and to produce new software tools and webservices. In AllBio we used hack-a-thons to work on the test cases defined and provided by the user community. These hack-a-thons have been extremely successful: they not only produced new software solutions but also provided an excellent basis for cross-disciplinary interaction, integrating existing knowledge, bringing together “wet lab” and “dry lab” experts, and providing hands-on training in bioinformatics.
The scientific communities could promote and implement the hack-a-thons as a routine activity e.g. in the educational system or for PhD students in the life sciences. Encouraging interdisciplinary interaction in the hack-a-thons could leverage the existing resources and avoid duplication of efforts.
Based on the experience in AllBio, the consortium developed guidelines with recommendations on how to organise hack-a-thons and will publish these in a scientific article to make the information available to the life science communities.

Additional major achievements in AllBio are:
- The AllBio partners organised 53 events, workshops, seminars, training schools and tutorials over the period of three years, with a total of nearly 2,000 participants. The participants came from 36 countries, including 8 non-European countries.
- 13 publications were produced, of which 10 have been published in peer-reviewed journals, with 2 additional publications still in the pipeline
- 3 software packages have been newly developed in the frame of the hack-a-thon activities
- 3 new initiatives were started that lead to sustainability of the AllBio efforts
- Ready-to-use bioinformatics workflow solutions in the form of Virtual Machines were produced
- Launch of the AllBioCatalogue with > 4,000 entries
- Links created with > 30 European life science projects and initiatives
- Piloting of hack-a-thons as successful instruments to develop software solutions and to create synergies between “wet lab” and “dry lab” experts

Project Context and Objectives:
AllBio's primary objective was to reflect on the FP7 call “KBBE.2011.3.6-02: Supporting the development of Bioinformatics Infrastructures for the effective exploitation of genome data: Beyond health applications”. AllBio intended to develop a coordinated action plan for harvesting the information from 'omics' data related to species other than Homo sapiens.
The AllBio partners created links to other European networks and initiatives and worked to facilitate and stimulate the exchange of data, protocols, software, experiences, and ideas throughout all related bioinformatics and life science fields. AllBio explored opportunities for new communication routes to promote species-independent bioinformatics, which will distribute knowledge and expertise via concerted education and publication.
The initial vehicle for achieving the objectives was community building: the partners started in their immediate environment and looked for ways in which these different life science communities could be extended towards applicability in many life science fields. The partners identified and tested ideas on how to make software and databases/information collections more generic and more interoperable.

The successful concepts resulted in the following achievements:
- The community-building events proved a successful means of discovering needs and challenges; AllBio also approached national and European life science activities and entered into a broader discussion of how these initiatives can be encouraged to create more links with each other. As the number of activities is huge, a more strategic approach on a medium- to long-term scale will be needed to bring them all together to create synergies, avoid duplication and achieve the intended impact in the European context. This will also require more intensive engagement of funders (and suitable incentives) to encourage this interaction in the future.
- The BioCatalogue was extended by the collection of around 4,000 new links, mainly from the non-human life science fields, and published as the AllBioCatalogue; the next essential step is to integrate this into pan-European infrastructures to ensure sustainability and continuous curation.
- The EDAM ontology, which was created mainly on the basis of the human life science field, was tested for transfer to other life science fields in order to make it more generic and interoperable among all life science areas.
- The concept of hack-a-thons has been tested and validated as an excellent tool to foster interaction between all life science fields and to engage “wet lab” and “dry lab” scientists to work more closely together on practical solutions.
- The new concept of “brain-a-thons” evolved from the hack-a-thon exercises; it has the potential to support intelligent project design and to make life science projects more efficient with respect to outcome and re-usability of research data.

Indicators/Measurable objectives:

For the AllBio project the following measurable indicators have been defined to demonstrate the success of the project:
(i) Number of external collaborators: Via the community building activities, more than 1,000 researchers were informed about AllBio and invited to participate.

(ii) Number of well described test cases: More than 60 test cases have been collected from which 14 have been selected to be included in the AllBio hack-a-thons.

(iii) Number of registered tools, services and databases in the inventory: For the AllBioCatalogue around 130 webpages have been analysed. Around 4,000 new tools, webservices and databases were included; around 5,500 links relate to general and generic tools, whereas nearly 2,000 links refer to organism-specific information.

(iv) Strong coordination through a series of dissemination activities (in the DoW at least 20 workshops in total were anticipated): Overall, 27 community building events have been organised by the AllBio partners. In addition, the partners attended more than this number of external workshops and events to disseminate information about AllBio.

(v) Coordination and harmonisation with other international activities: Over the duration of the AllBio project more than 30 European projects and initiatives expressed their interest in cooperation with AllBio. The interaction with BioMedBridges, ELIXIR, GOBLET, IMG.ORG and ISBE was most intensive and is the basis for future joint actions.

In the original project application (DoW), a set of indicators was formulated to demonstrate the efficiency of the AllBio activities within the first 12 months:
- number of external collaborators: 40; AllBio reached more than 1,000 collaborators
- number of well described test cases: 40, of which at least 10 will enter WP3 for solution; more than 60 test cases have been collected, of which 14 were of sufficient quality to be submitted to the hack-a-thons.
- number of registered items in the inventory: 200; for the AllBioCatalogue registry, more than 8,000 entries were collected.

Project Results:
The fields of unicellular, plant and (farm) animal genome research are rapidly gaining importance, not least because the growing world population requires greater food production at a time when increasing concern about the environment demands that this increase be achieved with reduced use of fertilisers, fungicides, herbicides, insecticides and, importantly, water. Scientists are consequently also researching the possibility of growing crops under a wide variety of harsh conditions in terms of drought, salinity and extreme temperatures. Society demands that farm animals should be reared using fewer hormones and fewer antibiotics, and with optimal nutrition. A deep understanding of the functioning of bacteria is rapidly gaining importance in fields as diverse as human health, digesting plant material for energy production, and the fermentation of food. The research aimed at such lofty goals begins with a solid understanding of the information contained in the genomes of the species involved, and bioinformatics holds the key to this understanding. AllBio will chart this exciting landscape, and will ultimately draft a plan for the bioinformatics needed to support future European genome research in the unicellular, plant and (farm) animal fields.

WP1: Community building

The main objective of WP1, Community Building, has been to build a joint platform/ community of life science bioinformatics developers and users to identify demands and needs for the future. Test cases defined by interviews with the user community served as a basis to identify the main challenges in life science bioinformatics.
The first analysis of the community-building events revealed a great need for new and integrated bioinformatics tools and databases in the areas covered by this project. The identified communities submitted more than 60 test-cases during the first nine months of the project.
Local communities were built in several fields to further increase our joint insights in the scientific needs for the near future. Most of the activities directly or indirectly aimed to shed light on what were from the AllBio onset seen as the main topics of concern:

Fig. 1: Overview of the main topics of concern indicated by the life science user communities where solutions are needed.

Community building is an essential first step, as all areas developed more or less independently from each other. The communities are already large and will grow further, as the life sciences are a fast-developing field driven by newly emerging technological possibilities for the analysis of biomolecules. Over the past ten years or more, systems biology has also evolved as a new area in the life sciences that builds on (functional) genomics and creates models to understand living systems and to predict how a cell, a tissue, an organ or a whole organism will react.
From AllBio, but also from the interaction with other projects and initiatives (e.g. the European infrastructures), we understand that the life science fields (including the respective bioinformatics) have developed into silos with only minimal true communication and connection between each other. This is true, on the one hand, for the analytical methods: the number of available techniques increases steadily, and experts are needed who can conduct the experiments. On the other hand, the large number of new analytical methods generates a deluge of data that need to be managed and analysed. The field of bioinformatics is exploding and has itself divided into specialised areas, e.g. statistics, data management or sequence annotation.
In this respect, the AllBio partners realised major challenges that need to be addressed in the near future to accelerate the interaction between the life science fields and between developers and users of bioinformatics tools and webservices:
There is a gap between the “wet lab” and “dry lab” communities. More intensive interaction between these two principal communities is needed
- to allow users of bioinformatics tools to participate actively in the intelligent choice of the tools they could use to analyse their data
- to allow users a certain degree of independence from bioinformatics specialists in using and applying the tools
- to identify more quickly the needs that arise in daily work and that can only be defined by the users
- to transmit these needs and challenges more quickly to the bioinformatics experts so that they can develop solutions
- to develop appropriate solutions tailored to the users, so that new tools are intuitively understandable and can easily be applied

Based on the AllBio experience we came to the conclusion that teaching and training are the key to bringing these communities together. University curricula in biology normally do not include mandatory courses in bioinformatics. Biology students often avoid this training burden, with the result that, although the use of bioinformatics tools is a major part of modern biology research, many PhD students or even postdocs have an insufficient understanding of the mathematical analysis of their biological data. The teaching and training of bioinformatics must become a mandatory part of the educational system in Europe.
Teaching and training in bioinformatics must also be an integral part of the professional life of biologists. New tools and webservices are created at a rapid rate, and it is a challenge even for researchers in later career stages to keep up with this pace. Initiatives like GOBLET are essential to support continuous training and information flow between the bioinformatics field and the user communities. But Europe needs a more strategic concept of how this constant training and the use of the existing material and information resources can be achieved in a more systematic way. A solution can only be provided if we join with already existing activities and support the interaction between these projects. A more structured approach would be very efficient for Europe. The promotion of existing training opportunities in the scientific communities must be a priority. Therefore, AllBio participated in the creation of a global education and training initiative that will continue to work beyond the end of the AllBio project.

There is also the need for a more systematic approach to increase the interaction between the experts of the “wet lab” and “dry lab” areas.
The first experiences with the test-case collection and the interaction with the users revealed that it is of crucial importance to connect bioinformaticians and users of bioinformatics tools in much better ways than exist at present. Users often accept unnecessary difficulties and are not aware that these may be problems that can be discussed with specialists to find solutions. Conversely, experts in bioinformatics have no clear feedback on the needs and gaps in the user environment, as experts and users work in disconnected ways. Much more intensive communication is needed to unearth the appropriate information, and to open the minds of users to a kind of “wishful thinking” that helps developers to identify what is needed. It took some time to recognise these circumstances, and to react and adapt the dissemination activities to this new awareness of the existing problems.
Young researchers might be an excellent group to start with, as they are still open for new opportunities and challenges. To participate in a kind of “think tank” activities could be a great opportunity for young researchers to develop their careers; incentives must be defined to attract participants for these specific activities.
The hack-a-thons we organised during the AllBio project could serve as a model to promote communication and interaction between “wet lab” and “dry lab” scientists by working jointly on solutions. Both types of experts are needed for the whole pipeline, from the generation of ideas to the development and validation of solutions. With the hack-a-thons performed in AllBio we demonstrated an efficient pipeline from raising awareness, through the identification of problems, to the development of real solutions. Besides creating new tools, the participants of the hack-a-thons also succeeded in publishing their project results, which is a valuable contribution to their scientific careers.
The scientific communities could promote and implement the hack-a-thons as a routine activity, e.g. in the educational system or for PhD students in the life sciences. Universities might be able to give credit points for participation in hack-a-thons, and hack-a-thons could be offered as training events and become a routine element in PhD programs, ITNs etc. The AllBio partners intend to proceed with this activity after the end of the project.
The hack-a-thons could also serve as an excellent basis for integrating knowledge from different life science fields. The missing interaction between the different areas of the life sciences seems to be a problem that is difficult to overcome. Encouraging interdisciplinary interaction in the hack-a-thons could leverage the existing resources and avoid duplication of efforts.
In total, the AllBio partners organised 27 community-building events, bringing together ~1,050 participants from 29 countries all over Europe and beyond.

WP2: Inventory and roadmap

The objective of WP2 was to generate a catalogue of bioinformatics tools and services that are available in the different fields of the life sciences (AllBioCatalogue). Based on this inventory, tools that are missing in the life science fields but necessary to solve the test cases were identified; these provided the basic understanding for a roadmap defining needs in bioinformatics tools and services.
A comprehensive search for tools, databases and services was conducted throughout the project, focusing, in particular, on those relevant to animals, unicellular organisms and plants, and to those required for the completion of the test-cases outlined in WP1. The results have been further organised into smaller groups, based on function.
The final list consists of ~4,600 tools, Web services and links; these are published on the AllBio website, made available to the scientific community from November 2014:
http://www.allbioinformatics.eu/doku.php?id=public:tools_and_services

The first idea was to collect information separately for animals, plants and microorganisms, and then from the human field. The AllBio partners (mainly SLU, partner 1) searched, in total, nearly 130 websites and links to gather the information. The distribution of the organism-specific websites across the indicated areas is illustrated in Fig. 2, as follows:

Fig. 2: Organism-specific websites and portals were analysed for available tools, Web services and databases.

The tools and Web services from the search are grouped as follows (duplication was allowed, owing to overlap of some fields/topics):
a. General tools and Web services: 5,561
b. Organism-specific tools and Web services: 1,940
• Human: 442
• Model organisms (incl. non-vertebrate animals): 1,084
• Livestock/animals: 176
• Plants: 203
• Unicellular organisms: 71
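
The counts above can be tallied as a quick consistency check. The sketch below is only an illustration: the variable names are hypothetical, and reading the difference between the sub-category sum and the stated organism-specific total as entries listed in more than one group is an assumption based on the remark that duplication was allowed.

```python
# Counts as reported in the list above.
organism_specific = {
    "Human": 442,
    "Model organisms": 1084,
    "Livestock/animals": 176,
    "Plants": 203,
    "Unicellular organisms": 71,
}
reported_total = 1940  # stated total for organism-specific tools and Web services

# Sum of the sub-categories; may exceed the stated total because
# duplication between groups was allowed.
subtotal = sum(organism_specific.values())

# Assumption: the surplus corresponds to entries counted in more than one group.
surplus = subtotal - reported_total


def shares(counts):
    """Return each group's share of the summed counts, in percent."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


if __name__ == "__main__":
    print("sum of sub-categories:", subtotal)
    print("surplus over stated total:", surplus)
    for name, pct in shares(organism_specific).items():
        print(f"{name}: {pct}%")
```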

Detailed analysis of the available resources demonstrates that the majority of the tools, Web services and databases are designed for general use across all life science fields. For the future development and building of portals, it is recommended that providers give more detailed descriptions of their resources, in order to help users to identify more swiftly and efficiently the most appropriate tools for their respective models/organisms or experimental setups.
Organism-specific resources will rapidly increase in number. For the time-point at which the majority of the information was collected (mid-term 2012), it was surprising to find that very high numbers of resources were already available for non-human organisms (see summary in Fig. 3).

Fig. 3: Overview of the tools, Web services and databases for specific organisms in the life sciences.

The development of tools, Web services and databases is very dynamic, owing to rapid developments and advancements both across scientific disciplines in general and in analytical technologies in particular. This is evident from the large number of the tools, databases and services offered via the Internet. Therefore, for users, it is virtually impossible to maintain any kind of realistic and/or representative overview of the most recent developments across all the different areas of the life sciences. This is true even for bioinformatics experts. The information is fragmented and dispersed over vast numbers of websites. It remains a challenge, and a task for a long-term infrastructure, to develop a systematic approach to collect information from all these different sources, and provide a comprehensive and continuously updated resource. The European Infrastructure for Biological Information, ELIXIR (www.elixir-europe.org) and the Global Organisation for Bioinformatics Learning, Education and Training, GOBLET (www.mygoblet.org) are major recent initiatives that provide excellent and broad technical and training foundations for underpinning such an approach.
From the collection of information about bioinformatics tools, Web services and databases across the life sciences, the following observations were made (see also chapter 1: Search History):
- the development of new tools, Web services and databases is rapidly evolving and increasing in number, owing to advances in analytical technologies;
- information about bioinformatics tools and Web services is dispersed and incomplete;
- for most links that we searched, there was no adequate description either of the purpose of the website or of the information and links provided;
- for most tools and Web services, there is scarce or inadequate information about them;
- often, descriptions of tools and Web services do not indicate sufficiently clearly for which organisms they were created and/or are intended to be used;
- many tools and Web services are intended to work across all species (or are not specified), and are therefore not well described;
- some websites with interesting and relevant tools are not in English, and therefore limit the use of their information;
- there is a lack of overall strategy for providing comprehensive information to users and experts.

Based on the efforts we made in AllBio to generate a comprehensive inventory, we summarize our experiences as follows:
1. Europe needs a joint strategy for the collection of bioinformatics information and how to make that information available for users;
2. Europe needs a strategy to raise awareness of the problem of dispersed information resources across the life sciences, and an intelligent dissemination strategy to encourage developers to provide their information via a common infrastructure;
3. Developers need to decide on a minimal standard of meta-information for their tools/Web services/databases; this standard needs to be disseminated to other developers;
4. Infrastructures like ELIXIR and GOBLET need long-term perspectives for financial support to develop and disseminate bioinformatics information needed by European life scientists;
5. There is a need for a regular “think tank” activity to foster interactions between bioinformatics developers across all life-science fields.

WP3: Life Science Bioinformatics Projects

The objective of WP3 was to introduce the selected “test cases” from WP1 to the bioinformatics community and, together with that community, to find solutions for these test cases. The newly developed solutions were then validated under the coordination of WP5.
The AllBio Consortium collected ~60 test-cases via questionnaires and interviews, of which 15 (encompassing unicellular organisms, plants and farm animals) were deemed solvable with adaptations to software originally designed for working with human-genome data. Ultimately, eight were considered suitable for subsequent hack-a-thon sessions; the selection process is depicted in Figure 4.

Figure 4: Hack-a-thon workflow illustrating the fate of a test-case proposed by life scientists. Following a first phase of interviews, test-cases are collected, and various selection rounds determine (‘validate’) those suitable for subsequent hack-a-thon sessions. The hack-a-thons involve teams comprising the proposer (life scientist), a leader (bioinformatician), ‘hackers’ (bioinformaticians/computer scientists) and, ideally, a coordinator. If a tool or meta-tool arises from the work, it is proposed for testing during a validation workshop. Ultimately, the team prepares a freely available tool, writes a publication, and optionally performs other dissemination activities.

Six of these eight problems were solved, two of which led to publications. Part of the success of each hack-a-thon lay in lessons learned from previous hack-a-thon events. A rigorous regime of evaluation of past events led to the creation of a script for organising successful hack-a-thons – this is available from the AllBio website (www.allbioinformatics.eu).

During the selection process a catalogue of criteria was agreed to help with the selection of suitable test cases for a hack-a-thon. If several criteria are not fulfilled by the proposed test cases, the success rate of hack-a-thons decreases dramatically:
- The problem needs to be clearly defined
- There must be a clear challenge
- There must be a perspective that a solution could be created (“impossible” challenges were excluded)
- The scientists submitting the test case must have a clear motivation to participate
- A data set must be available to work with
- The described problem must not be too complex; it is helpful if the problem can be subdivided into different units
- The workflow must be resolvable into non-linear tasks, so that if one step does not work, progression of other tasks is not blocked
- A test-case should not require extensive computing time (at least not during the hack-a-thon meeting)
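
The criteria above can be read as a simple checklist. The following is a minimal sketch of such a checklist, not the procedure the consortium actually used: the class name, the field names and the `max_unmet` threshold are hypothetical, chosen only to illustrate the statement that suitability drops when several criteria are not fulfilled.

```python
from dataclasses import dataclass, fields


@dataclass
class TestCaseCheck:
    """One flag per selection criterion (field names are illustrative)."""
    clearly_defined: bool        # the problem is clearly defined
    clear_challenge: bool        # there is a clear challenge
    solution_feasible: bool      # a solution could plausibly be created
    proposer_motivated: bool     # the submitting scientists want to participate
    data_set_available: bool     # a data set is available to work with
    not_too_complex: bool        # the problem can be subdivided into units
    tasks_independent: bool      # one failing step does not block the others
    modest_computing_time: bool  # no extensive computing during the meeting


def unmet_criteria(check):
    """Return the names of all criteria that are not fulfilled."""
    return [f.name for f in fields(check) if not getattr(check, f.name)]


def is_suitable(check, max_unmet=1):
    """Assumed rule: tolerate at most `max_unmet` unfulfilled criteria."""
    return len(unmet_criteria(check)) <= max_unmet


if __name__ == "__main__":
    candidate = TestCaseCheck(
        clearly_defined=True, clear_challenge=True, solution_feasible=True,
        proposer_motivated=True, data_set_available=False,
        not_too_complex=True, tasks_independent=False,
        modest_computing_time=True,
    )
    print(unmet_criteria(candidate))
    print(is_suitable(candidate))
```

Listing the unmet criteria, rather than returning only a yes/no verdict, gives the proposer concrete points to improve before resubmitting a test case.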

The test case collection also included 2 cases based on newly funded projects that were about to start their practical work. The case providers asked for support in selecting the best bioinformatics tools to analyse the data to be generated in their projects. During the discussion with bioinformatics experts it became apparent that these projects lacked careful experimental design. The planned hack-a-thon was then modified into a “brain-a-thon” session (this term was invented by Erik Alexandersson and Oren Tzfadia, two leading partners in solving the test cases #9 and #12). The main task for the experts in the “brain-a-thon” session was to go through the work plan with the scientists step by step, along the pipeline from sample to data, and to optimize each aspect. This optimization work also included recommendations for more efficient data generation methods than originally proposed in the project.
From their own years of experience, the AllBio partners realized that a lack of careful experimental design and partner organization is often an underlying problem in cooperation projects, especially when the consortia work in an interdisciplinary way. In the new field of systems biology this problem is even more prominent and causes enormous difficulties when data generated during the project need to be integrated to build a model. Experts estimate that more than 98% of the data generated in life science projects are of bad quality and are not re-usable, due to the lack of careful experimental design and the missing implementation of standards, appropriate statistics etc.

Structured hack-a-thons or dedicated “brain-a-thons” could be an excellent activity when a project has been funded and is ready to start. The project partners could ask the help of an expert consultancy to develop a detailed and precise activity plan. This approach will help to
• Identify the optimal experimental approach for the efficient use of project budget and time
• Select the strategy for the mathematical analysis of the data generated (this ensures a smooth interaction and data exchange between the partners, ensures reproducibility of the experiments, easier to publish project results)
• Ensure the adoption of standards and a best practice for data management (re-usability of the data for the broader science community, good publishing strategy, data archiving etc.)
• Precisely define interfaces and interaction between the project partners

During the organisation of the hack-a-thons, the accessibility of webservices also emerged as an issue. Many data analysis approaches combine a number of tools in workflows; all of these tools need to be available and to work at a sufficient speed.
Virtual machines seem the ideal solution for hack-a-thons. “Virtual machines” combine several software packages in an appropriate environment and can be installed on any desktop computer as a complete package. The software tools and webservices in these virtual machines can then be used remotely or locally, and work independently of an internet connection. This solution is of great advantage, e.g.
- if the internet connection is poor
- if the data to be analysed are confidential, e.g. patient data
The initiative IMG.ORG (www.bioimg.org) was encouraged by the AllBio experience in the hack-a-thons and started to create virtual machines for different bioinformatics needs, fully installed applications and workflows that can be used by researchers in different areas of the life sciences communities.

WP4: Interoperability

When multiple tools are used in a workflow, one major challenge in data analysis is the lack of interoperability between them. Tools have often been designed independently of each other and lack appropriate interfaces. The lack of easy data exchange is often a bottleneck and severely limits the efficiency of data analysis.
The main aim of the interoperability WP4 was to coordinate the process of making life science-related software packages, services and data resources work together, in a similar way as has been done over the last decade for many human health-related packages. Specific objectives were the identification and harmonisation of existing ontologies that are necessary for interoperability in this field; to establish minimum standards for web service deployment; to provide an on-line, searchable catalogue of resources; and to co-ordinate these activities whilst staying up-to-date with complementary initiatives in other projects.
Two main activities were relevant in WP4:
- To transfer the interoperability achievements that have been made in the human field to other life science fields; as an example the metagenomics field has been selected
- To discuss the interoperability issue in the context of the AllBioCatalogue

A cornerstone for interoperability is an appropriate ontology. With the deluge of biological data produced in all life science fields, it became a problem that these data and databases have different formats and are embedded in heterogeneous environments. Problems are caused by a diversity of semantics and data organization. The concept of agreeing on a specific ontology was first developed in computer science to describe “...different conceptual frameworks that guide the collection, organization and publication of biological data. An ontology is similar to a paradigm but has very strict implications for formatting and meaning in a computational context. The use of ontologies is a means of communicating and resolving semantic and organizational differences between biological databases in order to enhance their integration.”
The urgent need, filled by EDAM, is for an ontology that semantically unifies the bioinformatics concepts in common use, provides the curator with a comprehensive controlled vocabulary that is broadly applicable, and supports new and powerful search, browse and query functions. EDAM (originally from “EMBRACE Data and Methods”) is an ontology of well established, familiar concepts that are prevalent within bioinformatics, including types of data and data identifiers, data formats, operations and topics. EDAM is a simple ontology, essentially a set of terms with synonyms and definitions, organised into an intuitive hierarchy for convenient use by curators, software developers and end-users. EDAM is also suitable for diverse applications, for example within workbenches and workflow-management systems, software distributions, and resource registries.
The EDAM ontology approach was analysed for its suitability for life science fields other than human. RUNMC together with UNIMAN organised a workshop series with experts from the metagenomics field to discuss this issue in detail. Some of the major problems faced by the metagenomics research field are related to storage, interoperability, data exchange, and ontologies. The meetings resulted in an extensive exchange of ideas that fed directly into the metagenomics validation workshop that RUNMC organized in September.
The metagenomics meeting dealt partly with the social problems encountered on the path to acceptance of this (or, for that matter, any) ontology by people in the metagenomics field, which feeds into general considerations around the adoption of standards and SOPs in the academic environment.
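The description of EDAM as a set of terms with synonyms and definitions arranged in a hierarchy can be pictured with a small data structure. The sketch below is only an illustration of that idea: the classes `Term` and `Vocabulary` are hypothetical, the example term and its definition are invented for the demonstration, and none of this reflects EDAM's actual identifiers, file format or tooling.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Term:
    label: str
    definition: str
    synonyms: List[str] = field(default_factory=list)
    parent: Optional[str] = None  # label of the broader term, if any


class Vocabulary:
    """A controlled vocabulary: terms with synonyms, kept in a hierarchy."""

    def __init__(self) -> None:
        self.terms: Dict[str, Term] = {}

    def add(self, term: Term) -> None:
        # A child may only be added once its parent is known.
        if term.parent is not None and term.parent not in self.terms:
            raise KeyError(f"unknown parent term: {term.parent}")
        self.terms[term.label] = term

    def find(self, name: str) -> Optional[Term]:
        """Look a term up by label or synonym, ignoring case."""
        wanted = name.lower()
        for term in self.terms.values():
            names = [term.label] + term.synonyms
            if wanted in (n.lower() for n in names):
                return term
        return None

    def path(self, label: str) -> List[str]:
        """Return the labels from the top of the hierarchy down to the term."""
        chain: List[str] = []
        current: Optional[Term] = self.terms[label]
        while current is not None:
            chain.append(current.label)
            current = self.terms[current.parent] if current.parent else None
        return list(reversed(chain))


vocab = Vocabulary()
# Illustrative entries only; not copied from the real ontology.
vocab.add(Term("Operation", "A function that processes data."))
vocab.add(Term("Alignment", "Arranging sequences to compare them.",
               synonyms=["Sequence alignment"], parent="Operation"))

print(vocab.find("sequence alignment").label)
print(vocab.path("Alignment"))
```

Looking terms up by synonym is what lets curators and end-users with different habits of naming arrive at the same concept, which is the practical benefit the text attributes to a shared vocabulary.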
As a result of the work done in D2.3 and D4.4 the partners came to the conclusion that it would be most useful for the various initiatives – including AllBio – to join efforts with other initiatives, especially with European infrastructures, to ensure a mid- to long-term lifetime of the resources and optimal curation and presentation for the users. This is true not only for the ontology efforts to achieve a broader acceptance and adoption, but also for the AllBioCatalogue registry of tools, webservices and databases. Therefore, the AllBio partners joined the BioMedBridges/ELIXIR initiative to ensure a life-time of the AllBio achievements beyond the life-time of the project.
Only this joint approach will ensure an optimal overview, availability, access and sustainability for the registry and the results from the efforts to create and implement a common ontology framework (EDAM).

WP5: Validation and Training

The objective of WP5 was the validation of the test case solutions produced in the hack-a-thons generated in WP3 for different areas of the life sciences.
The hack-a-thon exercise and the subsequent validation workshops clearly demonstrated a benefit for the scientific communities. The solved test cases show convincingly that it is possible to develop solutions for bioinformatics problems based on hack-a-thons, and that the whole process has high potential to lead to success stories.
Besides the production of valuable software solutions, the hack-a-thons demonstrated a benefit also in other aspects:
• It is very good for young scientists to participate in these hack-a-thons to gain experience and learn to work in a solution-oriented manner.
• The young bioinformaticians also learn how to work with non-bioinformaticians. Equally, lab-based biologists with little or no background in bioinformatics learn about principles of bioinformatics via the hack-a-thons; they can apply this knowledge later to plan their projects more carefully and to take the downstream data analysis procedures into account from the very beginning of an experiment, so as to produce high quality data and ensure optimal output and use of the data.
An important observation was that the group leaders were young scientists who took the opportunity to get in contact with other international groups and specialists to increase their collaborative network, their visibility and the possibility of a scientific article.

The fact that any of the submitted test cases (TCs), which describe more or less complex bioinformatics bottlenecks, were solved within 18 months on a voluntary basis, with funds only for organizing the ‘getting together’, is a success for AllBio and a strong proof of concept for the strategy proposed by the AllBio partners to approach bioinformatics problems:
➢ Describing the bioinformatics bottleneck
➢ Brainstorming (brain-a-thon) in small groups of biologists and bioinformaticians
➢ 1 to 2 well organized hack-a-thons to approach the problem

Summarizing the outcome and observations, the AllBio project demonstrated that, with little effort and financial support and the right strategy, many long-standing bioinformatics bottlenecks can be tackled through simple collaborative work. This gives young scientists the opportunity to become partners in a collaboration network of senior scientists and specialists, both to solve the bottleneck and to improve their curriculum vitae. Therefore, the AllBio partners strongly support the idea of continuing to fund activities such as AllBio to improve the efficiency of European research activities, also beyond bioinformatics.
Training workshops addressing specific bioinformatics topics relevant for the AllBio project were organised by the AllBio partners to achieve a better distribution of the available tools, databases and services throughout all life science communities. It was an important task to invite workshop participants ideally from all different life science areas. Via the training workshops the AllBio partners contribute to the dissemination of information about already existing material, contribute to community building, raise awareness for the AllBio activities and ensure dissemination. The training workshops were also instruments for the validation of the test case solutions and the relevant entries in the AllBioCatalogue.
As validation workshops the AllBio partners organized 11 workshops, gathering over 400 participants and over 50 lecturers/teachers, which is a great success for AllBio. The workshops covered all specific bioinformatics topics relevant for the AllBio project; participants could learn about the available (sometimes new) tools and databases needed for their research. More specifically, participants in one workshop were the first testers of a tool created to solve one test case. In the future, more time should be given to finishing the hack-a-thons, so that more workshops based on the project’s test case solutions can be organized.
Overall the AllBio partners organised 26 training events with ~ 800 participants in total from 29 countries in Europe and beyond.
This demonstrates, on the one hand, the urgent need for practical bioinformatics training, especially in the field of next generation sequencing data analysis, and, on the other hand, the capacity and willingness in Europe of a broad range of participants, mainly researchers, to take up such practical training sessions.
Finally, the high number of participants in these meetings is an important contribution to the dissemination of information of existing and new resources in bioinformatics.

Summary

Here we provide the achievements of the AllBio project as a summary of the tables below.
A) Meetings and training activities
In summary, the AllBio partners organised 53 events, workshops, seminars, training schools and tutorials over the period of three years with a total of nearly 2.000 participants. The participants came from 36 countries, among them participants from 8 non-European countries. In the two general categories, Scientific Community Events and Training Schools, we had the following numbers:

Scientific Community events:
- In total 27 events were organised
- In total 1.050 participants
- 29 countries, among them 6 non-European countries

Training Schools/Tutorials:
- In total 26 events were organised
- In total 778 participants
- 28 countries, among them 2 non-European countries
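The two category subtotals can be checked against the overall figures with a few lines of Python. This is only a sketch of the arithmetic; the variable names are illustrative. Country counts are deliberately not summed, because the same country can appear in both categories.

```python
# Figures taken from the two category lists in this summary.
community_events = {"events": 27, "participants": 1050}
training_schools = {"events": 26, "participants": 778}

total_events = community_events["events"] + training_schools["events"]
total_participants = (
    community_events["participants"] + training_schools["participants"]
)

print(total_events)        # 53
print(total_participants)  # 1828
```

The event total matches the 53 events stated above, and the participant sum of 1828 is consistent with the rounded figure of nearly 2.000 participants.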

Fig. 5: events organised by the AllBio partners distributed by years. In the first two years of the AllBio project community building had a major priority, in the third year we concentrated on training, validation and dissemination, therefore the number of training schools exceeded the scientific events in number.

From these impressive numbers we draw the following conclusions:

- There is a high interest in cross-disciplinary interaction
- Bioinformatics is a common need for all life science fields and represents therefore an excellent connecting topic to bring together the experts from all fields – also beyond life sciences as ICT is also connected to bioinformatics development
- Training in bioinformatics is an urgent need in all levels of career development, in academia as well as in industry; it needs a strategy development on a higher level to raise awareness among the scientists to get training, but also to provide sufficient training in bioinformatics for all European life scientists. GOBLET is an excellent approach, but needs a long-term perspective with sufficient financial support

B) Publications:
Over the three years of the project, the AllBio partners, together with cooperation partners, produced 9 peer-reviewed publications. All the articles are freely available, as they have been published in Open Access journals. Two additional publications are still to be submitted. In addition, 2 articles including work of AllBio were published in non-peer-reviewed journals.

C) Software packages:
In AllBio 3 software packages have been newly developed in the frame of the hack-a-thon activities:
• Web-portal to provide virtual machines for omics analysis (www.bioimg.org)
• GOBLET: Web-portal with teaching material on bioinformatics (www.goblet.org)
• ncARENA: an integrated resource for small non-coding RNAs functional annotation (webpage not yet available)

D) Initiation of new activities:
Based on the AllBio project, 3 new initiatives were started that lead to sustainability of the AllBio efforts:
- GOBLET, a global bioinformatics training portal
- IMG.ORG a portal to provide virtual machines to the communities
- The AllBioCatalogue will be part of the newly established BioMedBridges/ELIXIR registry for bioinformatics tools and webservices

Conclusions

From the intensive work within the AllBio project the following observations have been made:

a) Bringing together experts and knowledge from the different fields of the life sciences is a long-term task that needs a change in the mind-set of researchers; the added value that could be created by using the synergies of closer links between the fields would be considerable and would create opportunities for new research topics. Ideas are needed on how to bring about a fundamental change in attitude and how to foster and incentivise continuous communication and interaction between the life science fields and between “wet lab” and “dry lab” scientists.
b) We noticed that the lack of communication between the existing life science fields is also growing in the area of bioinformatics, due to the dynamic development across the whole area of bioinformatics and the increasing specialisation of the experts. This development into separated expert fields becomes even more obvious when looking at systems biology; here, close interaction between many different scientific disciplines is needed to build the models for biological systems. There is a clear need for a long-term activity to bring these different experts together for the benefit of all.
c) The hack-a-thon approach – including test case definitions – has been an extremely successful concept; the consortium realised a positive effect on the following levels:
- Awareness was raised among the “wet lab” scientists/users of bioinformatics tools and webservices to notify software developers about upcoming needs or problems they face in the daily routine work
- Standardisation is an important issue that needs to be addressed more in the software development and in the information for the users of software tools and webservices; hack-a-thons could also support the adoption of standards (e.g. EDAM ontology)
- Hack-a-thons are an excellent strategy to bring wet lab and dry lab scientists together
- Software tools resulting from a hack-a-thon activity provide excellent opportunities for young scientists to produce publications
- Hack-a-thons could be developed into a tool to promote the cooperation between human and other life science fields to work together on common solutions
- The hack-a-thons can be an excellent incentive to inform young scientists about existing bioinformatics software and handling of experimental data
d) We need more interaction between different funded projects, initiatives and existing infrastructures; the interaction between AllBio and the COST Actions from the biomedical field has been very much appreciated by the participants and resulted in the identification of common needs with respect to bioinformatics. Training in bioinformatics was expressed as an urgent need across all areas of biomedical research. This need was taken up and pushed the establishment of GOBLET, a bioinformatics training platform for the life sciences.
e) The hack-a-thons also revealed another general need in bioinformatics: bioinformatics tools and webservices are often distributed over a large number of webpages and portals, and are only fully usable if there is a sufficient Internet connection. If the Internet connection is poor or if sensitive data (e.g. patient data) are involved, a good solution is to have the relevant tools and services installed locally on desktop computers. The IMG.ORG initiative develops solutions for this problem by providing virtual machines that can be downloaded and installed on desktop computers or on servers with virtualisation capability and then used without any internet connection. These virtual machines integrate various tools in analytical pipelines in a convenient environment and are easy for users to handle.

Outlook

a) Hack-a-thons and brain-a-thons could be further developed as strategic tools to develop practical solutions for user needs in bioinformatics, involve all communities, and create better links between “wet lab” and “dry lab”.
b) The issue of bioinformatics training of users on a broad scale needs to be discussed with relevant stakeholders. A solution is needed to raise awareness, increase the bioinformatics skills in the user communities and encourage young life scientists to integrate “wet lab” and “dry lab” in their careers. GOBLET is a nucleus that could serve as an excellent starting point to develop a strategic solution for the existing lack of bioinformatics skills in the life sciences.
c) Further development of IMG.ORG and virtual machines will provide solutions for two issues: first, virtual machines provide whole analytical pipelines and an appropriate environment for users to analyse their data, and second, virtual machines allow the use of webservices independently of a sufficient internet connection.
d) Providing information on existing tools and services via the BioMedBridges/ELIXIR registry makes that information sustainable and offers the chance to bring together in this portal more relevant information than any other portal has been able to provide so far, for the benefit of the user community.

Potential Impact:
With the achievements resulting from the AllBio activities a nucleus was created from which a larger impact can be generated with a suitable sustainability strategy. Several aspects have been addressed that are detailed below.

Effects on European competitiveness:

One of the major concerns for the European Research Area is the fragmentation of knowledge, resources and expertise. The fragmentation is caused by the geographical distribution across the European countries (and beyond), but there is also fragmentation within the research fields, which limits access to knowledge, research results and resources and, as a consequence, slows down the effectiveness of research, the creation of new knowledge and targeted solutions and, finally, innovation.
Our project has greatly helped to identify critical points and develop ideas for solutions. In pilots we were able to test ideas and prove their potential to help overcome the fragmentation.
- Improvement of the availability of information, knowledge and research results via the AllBioCatalogue: A comprehensive registry of all tools, webservices and databases across all life science areas is missing in the scientific landscape. The information is dispersed over many countries, institutions and projects, and the meta-information about the tools is scarce or sometimes missing altogether. Bringing as much information as possible together in one portal would give users a unique opportunity to access information, compare tools and make an informed decision about which tools to use. The AllBioCatalogue was built on the basis of the BioCatalogue, and an intensive search was performed to enlarge the BioCatalogue with information about the non-human life sciences. Based on the experience gained in this exercise (described within the WP2 summary), the AllBio partners concluded that a considerable impact can only be achieved if the initiative is enlarged. Teaming up with the newly started BioMedBridges/ELIXIR initiative to build a comprehensive registry will result in a unique resource for users that will create much more impact than originally foreseen for the AllBioCatalogue. The ESFRI infrastructures have been established as sustainable entities to provide services, access to instrumentation and expert knowledge. These infrastructures work together across borders, involving all European countries. The joint registry for bioinformatics tools, webservices and databases will represent the most comprehensive collection of information in this field. It will cover all fields of the life sciences and bring the fragmented knowledge together in a unique library.
Considering the high dynamic of new developments in the area of bioinformatics, it will not be possible to make all information available via the central registry; still, the BioMedBridges/ELIXIR registry will provide a resource that is closest to a synopsis across all areas – which is supporting also the exchange of knowledge and cross-disciplinary interaction.
The advantages of joining the BioMedBridges/ELIXIR registry are:
(i) a more sustainable infrastructure, as ELIXIR has a mid- to long-term funding perspective and is already a legal structure that started working in 2013;
(ii) more information than AllBio can deliver alone, since BioMedBridges/ELIXIR is already a large-scale activity;
(iii) a first leap towards overcoming the fragmentation of the bioinformatics information available via the internet, for the benefit of the user;
(iv) avoidance of duplicated effort and a more efficient use of resources;
(v) the creation of synergies: within AllBio one major achievement was the community building and the creation of links with other initiatives. Linking with BioMedBridges/ELIXIR to jointly build the registry will have a considerable impact on the interaction between the life science fields.

- Creating synergies between the life science fields: In the life sciences the status quo also reveals a fragmented landscape where research is unconnected and conducted in silos. Innovation is often created by cross-disciplinary interaction. The community building events and the activities to interact with other projects and initiatives are an important contribution to the efforts to combine different expertise to advance science. Europe could gain a tremendous advantage if cross- and interdisciplinary interaction became a daily routine. The hack-a-thons could serve as an excellent tool to foster this interaction and support the creation of new ideas and solutions to challenges.

- Closer interaction between “wet lab” and “dry lab” researchers: The same fragmentation that we observed between the life science fields also exists between the “wet lab” and “dry lab” worlds. Normally these experts work separately and have no regular interaction and communication. Lively communication between these expert fields would accelerate the identification of gaps, problems or areas of suboptimal performance or organisation, and provide a fast route to user-driven problem-solving and new developments. Developing solutions jointly between user and developer will ensure that the products are targeted directly to user needs and are provided in a user-friendly package. The aforementioned hack-a-thons are excellent models to encourage and foster this interaction and will be an important contribution to advancing research in Europe and increasing the competitiveness of Europe in the world. The AllBio partners will develop a strategy for how the hack-a-thon model can be implemented in daily routine work and which incentives can be given to the communities to adopt this new method.

- “Brain-a-thons”: In the life sciences, and maybe also in other scientific disciplines, we observe the problem that projects lack a detailed experimental design. The efficiency of research output and the re-usability of data generated in research projects can be increased dramatically by optimizing the experimental design. In the AllBio hack-a-thon sessions we realized that a moderated discussion between the project partners, who represent different fields of expertise, combined with structured consultancy for optimization, leads to much higher efficiency for the projects. The AllBio partners will develop a strategy for how this experience can be translated into a service for the scientific community. An important secondary effect of such a consultancy process is the introduction of relevant standards and recommendations for efficient data management. Both will contribute to better re-usability of data. These “brain-a-thons” will be an important contribution to the optimization of research performance in Europe and to the leverage of existing knowledge.

- Bioinformatics Training: An urgent need for the whole of Europe is an increase in bioinformatics knowledge in academia as well as in industry. The start of the GOBLET initiative is a major achievement in supporting education and training in bioinformatics by joining efforts not only in Europe, but also integrating activities from other regions of the world. The extensive use of this knowledge and information resource will increase the level of expertise in Europe and help to create the intended impact of fostering research and contributing to the economic development of European societies.

- Standards: Standardisation has been identified as an important topic for Horizon 2020 in all research areas; in the life sciences, too, the development of standards and their adoption are urgent issues on the agenda and need to be addressed. Standards are excellent tools for market access and increase the viability of new products to be launched on the market. Standards ensure high quality of the research data, but their broad adoption across academia and industry has not yet been achieved. There is also a lack of recognition, long-term funding and incentives for the development of standards. The hack-a-thons and “brain-a-thons” will be excellent multipliers for raising awareness of standards, contributing to their development and increasing their broad adoption across the life sciences.

Dissemination of knowledge:

The AllBio project was disseminated along a series of action lines detailed below. In all the categories mentioned, the AllBio partners have communicated successfully about the AllBio project and its results. The following compilation summarises all efforts and achievements.

(i) Publications in peer-reviewed journals
In total the AllBio partners, with and without their cooperation partners, published 10 articles in peer-reviewed journals. All publications were submitted to Open Access journals and are freely available. Two publications were recently submitted and two additional publications are still under development and will be submitted early in 2015. A detailed list of the publications is given in template A1.

(ii) Publication in media for the broader public
One article about AllBio was published in the context of introducing the initiative to create a European infrastructure for systems biology. The article was published in the English edition of “systembiologie.de”. A video was produced with the coordinator Dr. Erik Bongcam-Rudloff about AllBio and published on youtube (https://www.youtube.com/watch?v=Wa8CKLBeKzk).

(iii) Webpages
The AllBio workshops and training events have been announced on several webpages, some of the announcements are also included as examples in the table for template A1.

(iv) Software packages
Two of the software packages that have been produced in the frame of the AllBio hack-a-thons are already the subject of publications accepted in peer-reviewed journals. One additional software package will be launched soon. All three software packages will also be uploaded to the BioMedBridges/ELIXIR registry and GOBLET and are freely available. As all software packages carry the AllBio label, this is also one element of the dissemination strategy.

(v) Presentations
The partners were also very active in the communication with other relevant initiatives within the scientific community. They introduced the AllBio project in presentations in conferences and workshops. These interactions resulted in perspectives for sustainability and continuation as a collaborative effort.
AllBio was presented e.g. in the following events during the last year:
• Coordinating meeting between European systems biology initiatives, 9 April 2013. Berlin, Germany
• “Next Generation Sequencing Conference (NGS) 2014”, June 2-4, 2014. Barcelona, Spain
• November 6th, 2013: “First official AGM of GOBLET. It marks the beginning of the future for GOBLET”. Norwich, UK

(vi) Workshops and conferences
Workshops and conferences were organised by the AllBio partners for community building, but also to promote the AllBio initiative and explore new links with other activities. In total, the partners organised 27 events with 1.050 participants overall from 29 countries, among them 6 non-European countries. Distributed over the years, these were 11 events in 2012, 9 events in 2013 and finally 7 events in 2014.

(vii) Training activities
Training events were organised mainly to validate the software developments, raise awareness for the AllBio efforts, collect continuously information about needs and gaps defined by the users, and finally promote bioinformatics and the AllBio achievements across all areas.
Over the three years the AllBio partners organised 26 training schools and tutorials with overall 778 participants from 28 countries, among them 6 non-European participants. Distributed over the years these were 2 events in 2012 and 2013, but 21 events in 2014.

(viii) Linking with European infrastructures and relevant initiatives
ALLBIO partners initiated contacts and/or collaborative work with the following projects/ initiatives:
EpiConcept: http://cost-epiconcept.eu
microb3: http://www.microb3.eu/
Stategra: www.stategra.eu
Deann: www.deann.eu
PROLIFIC: www.euprolific.eu/
GOBLET: www.mygoblet.org
EMBnet: www.embnet.org
Biobankcloud: www.biobankcloud.com/
SeqAhead: www.seqahead.eu
ELIXIR: www.elixir-europe.org
H3Abionet: http://h3abionet.org
ISCB: www.iscb.org
NETTAB: www.nettab.org
BioMedBridges: www.biomedbridges.eu/
ISBE: http://project.isbe.eu/
IMG.ORG: https://bioimg.org/
Contact with at least 20 COST Actions and other European initiatives from the BMS field, e.g. EU Fish Biomed, Rabbit Genome RGB-Net, BM1102 Ciliates as model systems, BM0904 HDLnet, BM1003 Microbial cell surface determinants of virulence, BM0902 Diagnosis of myeloproliferative disorders, 1000 Genomes, ICGC, GEUVADIS, SPIDIA, Virtual Liver Network

The most intensive interaction is now going on with GOBLET to foster and enlarge the portal and provide material for teaching and training, with BioMedBridges/ELIXIR to jointly develop the registry of tools, webservices and databases for the life sciences and the building of virtual machines for IMG.ORG.

Exploitation

In the frame of the hack-a-thons organised around the test-cases three software tools have been successfully developed as potentially exploitable foreground:
• Software package: SV Autopilot - Structural Variation AUTOmated PIpeLine Optimization Tool. Automated analysis pipeline for structural variation of genomes in all species
• Software package: PlantPathX - Pathway analysis of poorly annotated but sequenced plant genomes
• Software package: PotGen – Functional annotation of the potato genome

For these developments, no background knowledge from any partner existed or was needed before work on the software tools started; therefore there were no legal issues to consider for the software developments. The researchers participating in the development and the AllBio project partners discussed whether there was an interest in commercial exploitation of the software tools. All parties agreed that these software tools will be made freely available to the scientific community, embracing the Open Source and GPL models. These tools were not suitable for direct commercialisation, and a patent application was not considered either. It was decided that a better strategy would be to publish the tools and integrate them in relevant portals (www.bioimg.org) to make them available to the users.
The partners see an exploitation potential in offering service and training modules to academia and industry. The beneficiaries and the researchers who participated directly in the development of the software have the first right to use the opportunity to develop service and training offers. If these priority partners do not use the opportunity, the other AllBio beneficiaries will be offered the chance to develop service/training packages.
The same strategy was decided for the additional activities that resulted from the AllBio project:
• Web-portal to provide virtual machines for omics analysis (www.bioimg.org)
• GOBLET: Web-portal with teaching material on bioinformatics (www.goblet.org)
• ncARENA: an integrated resource for small non-coding RNAs functional annotation (webpage not yet available)

No further legal issues have arisen from the AllBio activities, as the partners agreed to share the generated knowledge among each other and with the public.

List of Websites:

www.allbioinformatics.eu
