## Final Report Summary - IDEAL (Integrated DEsign and AnaLysis of small population group trials)

Executive Summary:

Clinical trials are the main means of evaluating new therapies for use in humans. The specific layout of a clinical trial depends on various aspects. Increasingly, the scientific community has recognized that the size of the target population is relevant when planning a clinical trial. As statistical methods are considered the backbone of a clinical trial with respect to both design and analysis, the question arises whether the well-accepted, well-understood and well-evaluated standard techniques for designing and analysing clinical trials in moderate or large populations are also applicable to small clinical trials drawn from a limited population. To address this question, the EU FP7-funded project Integrated Design and Analysis of small population group trials (IDeAl) was set up to refine the statistical design and analysis methodology for clinical trials in small population groups by strictly following the concept of an improved integrative approach from various perspectives. These perspectives cover the assessment of randomization, the extrapolation of dose-response information, the study of adaptive trial designs, the development of optimal experimental designs in mixed models, pharmacokinetic and individualized designs, simulation of clinical studies, the involvement and identification of genetic factors, decision-theoretic considerations, and the evaluation of biomarkers, which is strongly related to regulators' requirements. The dissemination of the results is also a main purpose of IDeAl. Within its nine scientific work-packages, the IDeAl consortium has developed:

• a new methodology for the selection of the best practice randomization procedure and subsequent analysis for a small population clinical trial taking possible bias into account

• a new optimized design and analysis strategy for comparing dose response profiles to extrapolate clinical trial results from a large to a small population

• statistical methods to adapt the significance level and allow confirmatory decision-making in clinical trials with vulnerable, small populations

• design evaluation methods enabling small clinical trials to be analysed through modelling of continuous or discrete longitudinal outcomes

• approaches to planning and analysing trials for identifying individual response and examining treatment effects in small populations

• new methods for sample size calculation, type-I-error control, model averaging and parameter precision in small population group trials within non-linear mixed effects modelling

• new methods for identifying biomarkers and prognostic scores based on high dimensional genetic data in small population group trials

• approaches to optimising the overall value of drug development to patients, to regulators and to society under opacity in regulatory and payer rules, as well as in very rare diseases

• methodology to evaluate potential surrogate markers and to analyse data from a small number of small trials, with emphasis on fast and easy computational strategies

Together with the asterix and InSPiRe projects, the IDeAl findings were discussed with the regulators at the joint workshop hosted by the European Medicines Agency (EMA) in March 2017. It became clear that some findings will need time to be implemented, as they break the barriers of traditional thinking in clinical trial methodology. The output of IDeAl is currently documented in 63 peer-reviewed scientific publications, more than 170 presentations and, additionally, a series of webinars.

For further information see the website www.ideal.rwth-aachen.de.

Project Context and Objectives:

1. Aim

The aim of the Integrated Design and Analysis of small population group clinical trials (IDeAl) project was to refine the statistical methodology for small population group trials by strictly following the concept of an improved integration of design, conduct and analysis of clinical trials from various perspectives. These methodologies were aimed at the efficient assessment of the safety and/or efficacy of a treatment and were intended to be universally applicable rather than specific to particular diseases.

Statistical methodologies for design and analysis are the backbone of clinical trials aiming to evaluate new therapies. The theory of statistical design methodology for clinical trials in large population groups is highly elaborated, well accepted and has reached a high standard. In particular, the operating characteristics of design and analysis methods for clinical trials in arbitrarily large populations are well known, but they may change in small population groups. The applicability of standard clinical trial approaches to small populations has come under increasing scrutiny and criticism, and the scientific community has been seeking more advanced or new methods, recognizing that current theory does not reflect the special problems arising in clinical trials for small population groups. One example is the “noise to effect” ratio: in large populations, the impact of bio-noise resulting from avoidable and unavoidable non-systematic errors in the design and conduct of a trial can be handled by increasing the sample size. Obviously, this is not possible in small population groups, where the geographically sparse distribution of patients in the EU and worldwide, together with a very limited number of available patients per unit of time, results in low recruitment rates. Treatment effects may therefore be overlooked when classical design concepts are applied, because only large effects can be detected. The problem is also closely related to unacceptably long recruitment periods. Further, in contrast to the “standard” clinical development programmes used in the regulatory setting of drug legislation for large populations, conducting a series of large clinical trials is obviously not feasible in small population groups.

2. Key challenges of the project

The key challenges arise from the major application areas of small population clinical trials for treatment evaluation. Currently, these are rare diseases, paediatric trials, responder subgroups and personalized medicine.

Rare Diseases

In the EU, diseases that affect on average not more than 5 in 10 000 people are called rare. The group of patients affected by a specific rare disease can therefore be very small. A significant number of diseases occur in only 2 patients (e.g. Obesity-colitis-hypothyroidism-cardiac hypertrophy-developmental delay syndrome), see Orphanet (2016). Around 8 000 rare diseases exist worldwide, affecting around 30 million people in the EU.

The ability of conventional statistical methods to evaluate new therapeutic approaches for a given rare disease is limited. For instance, long recruitment times as well as the geographically sparse distribution of patients create organisational challenges. Biased study results due to time trends in the data may be more likely. Prolonged recruitment may reduce the motivation of patients as well as physicians, because they lose faith in the topicality of the research or because new treatments have become available. These and other problems may be more likely in small population clinical trials. However, it is not clear what counts as “small”, i.e. at which size the validity of conventional statistical methods is reduced. To give some figures on “what is a small trial”, one can refer to the 63 orphan drug approvals in the EU from 2000 to 2010: here, 22 of 38 randomised controlled trials had a total sample size below 50. One would expect that with a sample size of 50 or below, long-run arguments are likely to fail, reducing the validity of conventional statistical approaches for demonstrating the efficacy and safety of therapies.

Paediatric Trials

Specific to paediatric trials are ethical limitations as well as the heterogeneity of age classes. This restricts clinical trials, at least in size, yet evidence is needed for accepting new paediatric-specific therapies. Motivated by the observation that market forces alone have proven insufficient to stimulate adequate research into medicinal products for the paediatric population, and into their development and authorization, a new regulatory procedure was introduced, the so-called paediatric investigation plan (PIP), for which the Paediatric Committee (PDCO) at the European Medicines Agency (EMA) is responsible. The scope of PIPs may range from one extreme of a full programme (including pre-clinical research, pharmacokinetics, pharmacodynamics, dose finding studies and two fully powered pivotal Phase III studies) for diseases existing only in childhood to the other extreme of, for example, only a single (pharmacokinetic) case series in children. In the EU regulation, the option of fully or partially extrapolating knowledge and data from adults to paediatric populations is an obvious and widely applied approach to reduce the burden of drug development in children. However, sound statistical design and analysis methods to support extrapolation have been under-investigated so far.

Personalized Medicine

Recent considerations in the evaluation of new therapies take into account that the treatment effect varies between patients. This has led to efforts to improve individual patients' responses by tailoring the treatment, the key idea in personalized medicine. Ultimately, this can be understood as identifying the treatment that works best in a particular patient. In the statistical sense, personalized medicine implies a reduction of variance, resulting in new challenges with respect to study design and analysis. Taken to the extreme, small population groups lead to the popular vision of a population of size one.

Subgroups

Small population groups may occur as subgroups of responders to therapies that might fail to succeed in the whole population. Subgroups may also be defined by individually tailored therapies or as regional subpopulations. Further, small subgroups may arise in situations of public health urgency. An EMA guideline on the confirmatory investigation of subgroups in clinical trials with respect to therapy response is currently being developed. It has to be realized that the resulting subgroups, whether identified in the planning phase or in the analysis phase of a trial, may be considerably small. The statistical evaluation is then affected by the risk of type-I-error inflation as well as by a lack of power resulting from the size of the subgroups.
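To make the type-I-error inflation concrete, the short Python sketch below computes the familywise error rate when k subgroups are each tested at level alpha = 0.05. It assumes independent tests with all null hypotheses true; this is a simplification, since real subgroup tests are usually correlated, which lowers the inflation somewhat.

```python
def familywise_error_rate(alpha: float, k: int) -> float:
    """Probability of at least one false-positive result when k independent
    subgroup tests are each performed at level alpha and all null
    hypotheses are true."""
    return 1.0 - (1.0 - alpha) ** k


if __name__ == "__main__":
    for k in (1, 2, 5, 10):
        print(f"{k:2d} subgroup tests: FWER = {familywise_error_rate(0.05, k):.3f}")
```

With five unadjusted subgroup tests the chance of at least one false-positive finding is already about 23%.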

The methodological framework for clinical trials in small populations from the regulatory perspective is described in the EU by the EMA guideline and in the US by the draft guidance on rare diseases. The major messages of the CHMP guideline [CHMP 2007] are that there is no special method for designing, carrying out and analysing clinical trials in small population groups. Further, it recommends using as much information as possible when designing a clinical trial and extracting as much information as possible from a clinical trial to make a valid benefit-risk assessment possible. Additionally, the EMA guideline [EMA 2012] states that unnecessary clinical trials can be avoided by extrapolation, i.e. by transferring knowledge from a large population to a small population.

As elaborated in the guideline, knowledge about variability is essential for efficient study design, and the well-planned use of the best available techniques to obtain and analyse information is crucial. However, a bad design may itself be a source of additional variation or may necessitate complex statistical models to extract all information from the data. The guideline also mentions less conventional methodological approaches that increase the efficiency of design and analysis for small population group clinical trials, and states that these methods are not often used because of their increased complexity.

Starting from the current regulatory guideline on small population clinical trials in the EU, IDeAl identified challenges in statistical methodology that currently hamper the conduct of clinical trials in small population groups, such as reduced power due to limited sample or population size, slower recruitment, elevated heterogeneity in patients' outcomes, difficulties in decision-making because of the limited repeatability of trials, criticism of the validity of traditional statistical methods that are based on long-run arguments, and difficulties in measuring patient outcomes. These points are linked to the following research areas in small clinical trials:

• addressing pharmacological aspects,

• using prognostic information,

• using pharmacogenetic information for tailored therapeutics, including n-of-1 trials,

• extrapolation of information from large populations to small population groups,

• design aspects, including optimal design and doses, choice of analysis,

• adaptive designs, including response adaptive methods and sequential designs,

• randomisation, including non-parametric methods,

• choice of endpoints,

• using Bayesian arguments and decision analysis.

IDeAl also recognized that current developments in biostatistical methodology frequently take only a single aspect into consideration; e.g. adaptive clinical trial designs are usually developed without incorporating information from the randomization procedure or from prognostic factors. The IDeAl approach instead harmonizes methodologies in an integrative way. In other words, there is an unmet need for tailored statistical design and analysis methods where the population size is limited and/or the sample size is small.

In summary, the IDeAl approach of integratively improving the methodology for clinical trials in small population groups covers the following areas:

• the assessment of randomization,

• the extrapolation of dose-response information,

• the study of adaptive trial designs,

• the development of optimal experimental designs in mixed models,

• the development of pharmacokinetic and individualized designs,

• the development of methods for simulation of clinical studies,

• the involvement and identification of genetic factors,

• decision-theoretic considerations as well as

• the evaluation of biomarkers and surrogate endpoints.

In 2013, the IDeAl project was launched as a collaborative research activity of 10 European partners from 7 European countries accompanied by an external advisory board involving all relevant areas required to statistically design and analyse small population clinical trials.

3. Objectives

The project explored new methods for the design and analysis of clinical studies and integrated and synthesised these into an effective strategy, so that the efficiency of clinical trials evaluating therapies for rare diseases can be significantly increased.

The objectives of IDeAl were to

• find the adequate randomization procedure for small population group trials by assessing established randomisation procedures and formulating a randomization-based test.

• develop adequate statistical methodologies to extrapolate the dose-response information from source to target population.

• incorporate external information into adaptively designed clinical studies for small population groups.

• develop optimal designs in non-linear mixed models to analyse studies in small population groups.

• design pharmacogenetic small population group trials including cross-over trials, n-of-1 trials and enrichment trials.

• develop pharmacometric methods to enable the simulation of clinical trials in small population groups based on non-linear mixed effects models.

• develop new statistical models for predicting the response to therapy in small population group trials based on genetic factors and other covariates.

• improve the rational basis for decisions and help align different stakeholder perspectives.

• develop an efficient and feasible framework for biomarker and surrogate endpoints in small population group clinical trials.

• disseminate the newly developed statistical methodology of the IDeAl project.

Project Results:

Most statistical design and analysis methods for clinical trials have been developed in the setting of relatively large sample sizes, with confirmatory clinical trials often recruiting several hundred or even thousands of patients. These methods may not be suitable for evaluating therapies in small populations. The general objective of the IDeAl project is broken down into 10 scientific work-packages focussing on the assessment of randomization, the extrapolation of dose-response information, the study of adaptive trial designs, the development of optimal experimental designs in mixed models, pharmacokinetic and individualized designs, simulation of clinical studies, the involvement and identification of genetic factors, decision-theoretic considerations, the evaluation of biomarkers and the dissemination of results. One additional work-package provides support for project management. The IDeAl project is accompanied by an advisory board of international experts with different professional backgrounds, representing patients' interests, the views of the pharmaceutical industry, and clinical and regulatory aspects.

The following describes the achievements of the IDeAl project within the funding period ending on 30 April 2017, broken down by work-package. Several further activities are planned and will be conducted in the near future; some of these are described in section 3.

1. Management Structure (WP1)

A comprehensive framework for the proper and trustful implementation of all contractual, scientific, administrative and financial tasks within the project work plan was built up by the coordinator to achieve effective and efficient project coordination. IDeAl is organized as a cooperative project of 11 work-packages. The work-package leads formed the steering committee, which acted from various perspectives and held several face-to-face meetings. In addition, email contacts were set up for fast decision-making. The consortium was accompanied by an external advisory board.

The management structure and mechanisms for communication were successfully implemented and practised as a project culture to guarantee a high level of internal communication and a smooth information flow, and to ensure knowledge exchange and monitoring of the research work between the project partners. The main means of communication were email correspondence, face-to-face meetings and telephone conferences. The most efficient way of communication among the consortium members was email, intensified through bilateral phone calls and ad hoc appointments with the coordinator for open dialogue, strategic discussions and the definition of future steps in research and dissemination.

The coordination of the project was very successful, as measured by the output of the project. All partners are involved in the impressive scientific and dissemination output, working together in close collaboration. As one partner stated, IDeAl established a “new, unique scientific family”: a network of scientists from the EU and beyond, originally working in different areas of clinical trials, who started working together with the launch of the project. This network could only work with an efficient management structure. All groups decided on the progress of the project work in a cooperative, seamless way. On this basis, IDeAl was able to work closely with other projects such as asterix and InSPiRe without conflicting intellectual property rights.

Instruments supporting this process were the web-based project management platform ‘IDeAl Cloud’, the webpage and email contacts. Furthermore, going beyond the proposal, the coordinator together with the work-package leads set up and chaired the “Young Scientists Working-Group” to further strengthen scientific exchange and internal communication between the principal investigators and all research assistants working under the IDeAl umbrella.

In addition, the scientific progress of the project was accompanied by the advice of the IDeAl External Advisory Board, consisting of international experts representing patients’ interests, the views of the pharmaceutical industry, and clinical and regulatory aspects. The board members particularly supported the consortium in circulating the project results to the scientific community, in consultation and agreement with the coordinator.

Further supportive measures during the project run time were managing contracts between partners, including the external advisory board, assisting with the implementation of EU financial guidance, organising meetings and making templates available.

2. Recommendations

2.1. Assessment of randomization procedures and randomisation based tests (WP2)

Randomization is the gold standard for allocating treatments to patients in clinical trials. The element of chance in the allocation process is used to avoid, or at least minimize, the influence of bias on the estimate of the treatment difference. The properties of randomization procedures are well studied from a theoretical point of view, but little work has been done with respect to practical situations. In particular, most evaluations rely on long-run arguments, which are hardly applicable in small clinical trials. Moreover, the choice of the randomization procedure for a particular clinical trial is generally left to the scientist's “feeling” and is frequently not well motivated by scientific arguments. To avoid false decisions about a treatment effect caused by an unsuitable choice of randomization procedure, a best-practice randomization procedure should be identified. To assess the value of randomization procedures for designing small clinical trials in particular, a completely new methodology had to be developed.

Bias assessment of randomisation procedures

The assessment of bias in small population group trials calls for a completely new analysis tool. The statistical analysis of the effect of selection bias on the type-I-error probability was extended to a mathematical description of the model under misclassification (Langer, 2014). Recognizing that the biasing policy cannot be applied to time-to-event data, a modified biasing policy was developed (Rückbeil et al., 2017). Moreover, the biasing policy was adapted to the specific research question in multi-arm clinical trials (Uschner, Hilgers, et al. 2017 submitted). Models for investigating the impact of chronological bias on the type-I-error probability were introduced by Tamm and Hilgers (2014). In a next step, a combined model was introduced to investigate the additive joint effect of selection and time-trend bias on the type-I-error probability. Further, another approach (Schindler, 2016), the so-called linked assessment criterion, was developed; it is based on a normalized multi-criterion function and enables the combination of different criteria measured on different scales (e.g. balancing behaviour, correct guessing) to assess randomization procedures. Applied to different randomisation procedures, the models show the influence of the two biases on the type-I-error probability depending on the amount of bias. For instance, a simulation study of two-arm parallel small population group trials showed that the simulated type-I-error rates under selection and chronological bias deviate markedly depending on the size of the trial, the magnitude of the particular bias and the randomization procedure used.

Recommendation 1: Do not select a randomization procedure on arbitrary grounds; use scientific arguments that take into account the expected magnitude of bias.
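The kind of simulation described above can be illustrated, for selection bias only, with the following Python sketch. It assumes a two-arm trial with a permuted block design, an investigator following the Blackwell-Hodges convergence strategy (guess the arm that is under-represented in the current block) and a biasing policy that shifts the expected response by +eta or -eta according to the guess; the analysis is a z-test with known sigma = 1. All function names and parameter values are hypothetical, and the code is not the model of Langer (2014) nor the randomizeR implementation.

```python
import random
from statistics import NormalDist


def permuted_block_sequence(n, block_size, rng):
    """Allocation sequence (1 = experimental, 0 = control) from a permuted
    block design with equal allocation within each block."""
    seq = []
    while len(seq) < n:
        block = [1] * (block_size // 2) + [0] * (block_size // 2)
        rng.shuffle(block)
        seq.extend(block)
    return seq[:n]


def convergence_guess(history, block_size):
    """Convergence strategy: guess the arm that is under-represented in the
    current block so far; return None when both arms are tied."""
    pos = len(history) % block_size
    current_block = history[len(history) - pos:]
    n_exp = sum(current_block)
    n_ctl = len(current_block) - n_exp
    if n_exp < n_ctl:
        return 1
    if n_ctl < n_exp:
        return 0
    return None


def simulated_type1_error(eta, n=24, block_size=4, n_sim=4000, seed=1):
    """Monte Carlo rejection rate of a two-sided z-test (alpha = 0.05,
    sigma = 1 assumed known) with NO true treatment effect. The
    investigator enrols a patient with expected response +eta when
    guessing 'experimental' and -eta when guessing 'control'."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(0.975)
    rejections = 0
    for _ in range(n_sim):
        seq = permuted_block_sequence(n, block_size, rng)
        y_exp, y_ctl = [], []
        for i, arm in enumerate(seq):
            guess = convergence_guess(seq[:i], block_size)
            shift = 0.0 if guess is None else (eta if guess == 1 else -eta)
            response = shift + rng.gauss(0.0, 1.0)
            (y_exp if arm == 1 else y_ctl).append(response)
        diff = sum(y_exp) / len(y_exp) - sum(y_ctl) / len(y_ctl)
        z = diff / (1.0 / len(y_exp) + 1.0 / len(y_ctl)) ** 0.5
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sim


if __name__ == "__main__":
    for eta in (0.0, 0.25, 0.5, 1.0):
        print(f"eta = {eta:.2f}: simulated type-I-error = {simulated_type1_error(eta):.3f}")
```

With eta = 0 the rejection rate stays near the nominal 5%; as eta grows, the predictable last allocations of each block let the bias leak into the treatment comparison and the rate rises well above 5%. Repeating the loop for other procedures or block sizes gives the kind of comparison on which a scientifically argued choice of randomization procedure can be based.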

Development of adequate randomisation procedures for small population groups

Although various randomization procedures have been proposed, no single procedure performs uniformly best. In the design phase of a clinical trial, the scientist has to decide on the "best practice" randomization procedure to be used, taking into account the practical research conditions of the trial with respect to the potential for bias. Until now, little support has been available to guide the scientist in making this decision, i.e. in weighting the properties of randomization procedures against the practical conditions of the research question to be answered by the clinical trial.

Although a large number of software products assist researchers in implementing randomization, no tool has been proposed in the literature that covers a wide range of procedures and allows a comparative evaluation of randomization procedures reflecting the specific clinical situation. A framework (Evaluation of Randomization procedures to clinical trial Design Optimization, ERDO) was developed to assess the impact of chronological and selection bias on the probability of a type-I-error in a parallel group randomized clinical trial, in order to derive scientific arguments for the selection of an appropriate randomization procedure (Hilgers, Uschner et al. 2017 submitted). To conduct an ERDO, Uschner, Schindler et al. (2017) developed the R package randomizeR, which addresses this unmet need. The software randomizeR allows the generation of randomization sequences and the assessment of randomization procedures with respect to bias. A YouTube video and a manual facilitate the use of the freely available software.

Recommendation 2: In a randomized clinical trial, emphasis should be given to the selection of the randomization procedure, following ERDO using randomizeR.

Development of randomisation tests for small population groups

Kennes et al. (2015) provided an asymptotic likelihood ratio test to analyse randomized clinical trials that may be subject to selection bias for normally distributed responses. These results agree well with the likelihood ratio test of Ivanova et al. (2005) for binary responses. Tamm and Hilgers (2014) reported that unobserved time trends may induce a strong time-trend bias in the treatment effect estimate and the test decision. According to our results, medium block sizes are sufficient to restrict chronological bias to an acceptable extent. Regardless of the block size, a blocked ANOVA should be used, because the t-test is far too conservative, even for weak time trends. Similarly, Uschner, Hilgers, et al. (2017 submitted) proposed a bias-corrected test for multi-arm clinical trials.

We further propose exact randomization tests for small population group trials based on the applied randomization procedure, i.e. randomization-based inference. We found that randomization tests in small population group trials can be distorted by selection bias and developed a bias-corrected test based on a restriction of the reference set. The test yields unbiased p-values in the presence of selection bias, independent of the distribution of the responses. In addition, we propose an algorithm for the efficient generation of the restricted reference set. Finally, the problem of dealing with missing observations in randomization-based inference is treated by Hilgers, Rosenberger and Heussen (2017 submitted).

Recommendation 3: In a randomized clinical trial, we recommend conducting a sensitivity analysis to assess the impact of bias on the type-I-error probability.
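The recommendation to prefer a blocked ANOVA over the t-test under a time trend can be checked with a small simulation. The Python sketch below assumes 24 patients allocated in permuted blocks of size 4, a linear time trend over the recruitment period and no treatment effect, and compares the pooled two-sample t-test with a blocked ANOVA (block and treatment as factors). The critical values are hard-coded 97.5% t quantiles for 22 and 17 degrees of freedom so that only the standard library is needed; the trend magnitudes and all names are illustrative assumptions, not the settings of Tamm and Hilgers (2014).

```python
import random

# Two-sided 5% critical values of the t distribution (97.5% quantiles),
# hard-coded so that only the standard library is needed.
T_CRIT_POOLED = 2.0739   # pooled t-test: df = 24 - 2 = 22
T_CRIT_BLOCKED = 2.1098  # blocked ANOVA: df = 24 - 6 blocks - 1 = 17
N_PATIENTS, BLOCK_SIZE = 24, 4


def one_trial(trend, rng):
    """Return (|t| pooled, |t| blocked) for one simulated trial with a
    linear time trend and no treatment effect."""
    arms = []
    for _ in range(N_PATIENTS // BLOCK_SIZE):
        block = [1, 1, 0, 0]
        rng.shuffle(block)
        arms.extend(block)
    y = [trend * i / N_PATIENTS + rng.gauss(0.0, 1.0) for i in range(N_PATIENTS)]

    y_exp = [v for v, a in zip(y, arms) if a == 1]
    y_ctl = [v for v, a in zip(y, arms) if a == 0]
    m_exp, m_ctl = sum(y_exp) / len(y_exp), sum(y_ctl) / len(y_ctl)
    diff = m_exp - m_ctl
    inv_n = 1 / len(y_exp) + 1 / len(y_ctl)

    # Pooled two-sample t-test ignores the blocks, so the trend inflates
    # its variance estimate.
    ss_within = sum((v - m_exp) ** 2 for v in y_exp) + sum((v - m_ctl) ** 2 for v in y_ctl)
    t_pooled = diff / (ss_within / (N_PATIENTS - 2) * inv_n) ** 0.5

    # Blocked ANOVA: every block is balanced, so block and treatment sums of
    # squares are orthogonal and can be removed from the total separately.
    grand = sum(y) / N_PATIENTS
    n_blocks = N_PATIENTS // BLOCK_SIZE
    ss_total = sum((v - grand) ** 2 for v in y)
    ss_block = sum(
        BLOCK_SIZE * (sum(y[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE]) / BLOCK_SIZE - grand) ** 2
        for b in range(n_blocks)
    )
    ss_treat = len(y_exp) * (m_exp - grand) ** 2 + len(y_ctl) * (m_ctl - grand) ** 2
    mse = (ss_total - ss_block - ss_treat) / (N_PATIENTS - n_blocks - 1)
    t_blocked = diff / (mse * inv_n) ** 0.5
    return abs(t_pooled), abs(t_blocked)


def rejection_rates(trend, n_sim=4000, seed=2):
    """Simulated type-I-error rates (pooled t-test, blocked ANOVA)."""
    rng = random.Random(seed)
    rej_pooled = rej_blocked = 0
    for _ in range(n_sim):
        t_p, t_b = one_trial(trend, rng)
        rej_pooled += t_p > T_CRIT_POOLED
        rej_blocked += t_b > T_CRIT_BLOCKED
    return rej_pooled / n_sim, rej_blocked / n_sim


if __name__ == "__main__":
    for trend in (0.0, 1.0, 5.0):
        pooled, blocked = rejection_rates(trend)
        print(f"trend = {trend}: t-test {pooled:.3f}, blocked ANOVA {blocked:.3f}")
```

Under a strong trend the pooled t-test rejects far less often than the nominal 5%, because the trend enters its variance estimate, while the blocked ANOVA removes the between-block trend and stays close to 5%.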
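The principle of randomization-based inference can be sketched as follows, assuming a permuted block design with equal allocation. Under the null hypothesis the responses are fixed, so the reference set consists of all allocation sequences the design could have produced, and the p-value is the share of those sequences whose test statistic is at least as extreme as the observed one. This sketch uses the full, unrestricted reference set and the absolute difference in means; it does not implement the bias-corrected restricted reference set described above, and the function names are hypothetical.

```python
from itertools import combinations, product


def block_patterns(block_size):
    """All equally likely allocation patterns (1 = experimental, 0 = control)
    of one permuted block with equal allocation."""
    patterns = []
    for exp_positions in combinations(range(block_size), block_size // 2):
        patterns.append(tuple(1 if i in exp_positions else 0 for i in range(block_size)))
    return patterns


def mean_difference(y, arms):
    exp = [v for v, a in zip(y, arms) if a == 1]
    ctl = [v for v, a in zip(y, arms) if a == 0]
    return sum(exp) / len(exp) - sum(ctl) / len(ctl)


def randomization_test(y, observed_arms, block_size):
    """Two-sided exact randomization p-value under a permuted block design:
    the share of all possible allocation sequences whose |difference in
    means| is at least the observed one (responses fixed under H0)."""
    n_blocks = len(y) // block_size
    observed = abs(mean_difference(y, observed_arms))
    reference_set = [
        sum(blocks, ()) for blocks in product(block_patterns(block_size), repeat=n_blocks)
    ]
    n_extreme = sum(
        abs(mean_difference(y, arms)) >= observed - 1e-12 for arms in reference_set
    )
    return n_extreme / len(reference_set)


if __name__ == "__main__":
    y = [10, 0, 0, 10, 10, 0, 0, 10]
    arms = [1, 0, 0, 1, 1, 0, 0, 1]
    print(randomization_test(y, arms, block_size=4))
```

In this example two blocks of size 4 give 6 x 6 = 36 possible sequences; only the observed allocation and its mirror image separate the responses completely, so p = 2/36 (about 0.056), the smallest p-value this design can produce. This illustrates why the reference set, and hence the randomization procedure, matters so much in very small trials.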

2.2. Extrapolation of dose-response information (WP3)

Differences in the response to medicinal products between populations are of great importance in clinical research, and these populations can be small. Consequently, the main objective of work-package 3 was the development of new statistical methodology for extrapolating dose-response information and conclusions available from a given source population to make inferences about another target population. In this way, for example, unnecessary studies can be avoided. Regression models are a very important tool for providing dose-response information in this context. In many cases, the question arises whether two dose-response curves can be assumed to be similar. This problem also appears when demonstrating non-inferiority and/or equivalence of different treatments (Liu et al., 2009). We derived new statistical procedures addressing the problem of comparing curves and extrapolating information, with a particular focus on trials with small sample sizes. The main achievements are the following:

New statistical measures for similarity of dose-response between a source and a target population

We started our work by improving the accuracy and reducing the computational effort of confidence bands for the difference of two curves. The currently available methods (see for example Gsteiger et al., 2011) are based on the intersection-union test of Berger (1982), which yields procedures with extremely low power. Our approach is based on estimates of new measures of similarity between two dose-response curves (such as the maximum deviation), and using bootstrap methods we achieved a clear improvement. Additionally, we developed a new statistical test for the hypothesis of similarity of dose-response curves. The test decides for equivalence of the curves if an estimate of a distance is smaller than a given threshold, which is obtained by a (non-standard) constrained parametric bootstrap procedure. The finite sample properties were investigated by means of a simulation study (see Dette et al., 2017, and Möllenhoff, 2016, for the corresponding R package “TestingSimilarity”). These procedures have been developed in close cooperation with Professor Frank Bretz (Head of Biostatistics, Novartis), member of the Advisory Board, to ensure that all important features of drug development are addressed by the new methodology.

Recommendation 4: Dose-response curves should be compared using the bootstrap approach developed by Dette et al. (2017) rather than the approach of Gsteiger et al. (2011).
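As an illustration of the maximum-deviation similarity measure, the Python sketch below evaluates the largest absolute difference between two dose-response curves over the dose range. The Emax model, the parameter values and the margin epsilon are hypothetical; in the method described above the parameters are estimated from data and the decision relies on a constrained parametric bootstrap, which this sketch does not reproduce.

```python
def emax_curve(dose, e0, e_max, ed50):
    """Emax dose-response model f(d) = E0 + Emax * d / (ED50 + d)."""
    return e0 + e_max * dose / (ed50 + dose)


def max_deviation(source, target, d_max, n_grid=401):
    """Maximum absolute deviation between two Emax curves on [0, d_max],
    approximated on an equally spaced grid of doses."""
    grid = [d_max * k / (n_grid - 1) for k in range(n_grid)]
    return max(abs(emax_curve(d, *source) - emax_curve(d, *target)) for d in grid)


if __name__ == "__main__":
    source = (0.0, 1.0, 1.0)  # hypothetical (E0, Emax, ED50), source population
    target = (0.0, 1.2, 1.0)  # hypothetical (E0, Emax, ED50), target population
    epsilon = 0.25            # hypothetical similarity margin
    d = max_deviation(source, target, d_max=4.0)
    print(f"max deviation = {d:.3f}; below margin: {d < epsilon}")
```

Here the curves differ by at most 0.16, reached at the highest dose, which lies below the assumed margin. In practice this plug-in distance would be computed from fitted curves and compared against a bootstrap-calibrated critical value rather than read off directly.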

Extrapolation of efficacy and safety information

Our goal here was to quantify the information from the source population in order to extrapolate it to the target population. We used the Minimum Effective Dose (MED) as a metric (see Ting, 2006), yielding a measure of similarity of dose response. The MED can be used to claim equivalence (to a certain degree) of information from the source and the target population. Confidence intervals and statistical tests were developed for this metric (see Bretz et al., 2017 submitted).

Recommendation 5: If the aim of the study is the extrapolation of efficacy and safety information, we recommend comparing the MEDs of the two populations.
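For an Emax model f(d) = E0 + Emax * d / (ED50 + d), the MED, taken here as the smallest dose whose effect over placebo reaches a clinically relevant difference Delta, has a closed form: solving Emax * d / (ED50 + d) = Delta gives d = Delta * ED50 / (Emax - Delta). The Python sketch below compares point estimates of the MED in two populations; the model choice, the parameter values and the function name are assumptions, and the confidence intervals and tests of Bretz et al. are not reproduced.

```python
def med_emax(e_max, ed50, delta):
    """Smallest dose whose effect over placebo reaches delta under an Emax
    model: solving Emax * d / (ED50 + d) = delta gives
    d = delta * ED50 / (Emax - delta). Returns None if delta is never reached."""
    if e_max <= delta:
        return None
    return delta * ed50 / (e_max - delta)


if __name__ == "__main__":
    delta = 0.5                                              # hypothetical relevant effect
    med_source = med_emax(e_max=1.0, ed50=2.0, delta=delta)  # hypothetical source population
    med_target = med_emax(e_max=0.8, ed50=2.0, delta=delta)  # hypothetical target population
    print(f"MED source = {med_source:.2f}, MED target = {med_target:.2f}, "
          f"ratio = {med_target / med_source:.2f}")
```

With these values the MED is 2.00 in the source and 3.33 in the target population; a lower maximum effect in the target population shifts the MED upwards, which is exactly the kind of difference a formal comparison of MEDs would have to detect or rule out.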

Robustness against incorrect model assumptions

Concerning the robustness of all these new techniques, we performed numerous simulation studies investigating the sensitivity of the procedures to misspecification of the functional form (e.g. Schorning et al. 2016). These showed a very robust performance of all the derived methodology.

Recommendation 6: The derived methodology shows a very robust performance and can be used also in cases where no precise information about the functional form of the regression curves is available.

Minimisation of false claims through optimal experimental design and dissemination

Optimal designs for the comparison of curves have been developed that minimize the maximum width of the confidence band for the difference between two regression functions. In particular, it was demonstrated that using optimal designs instead of commonly used designs reduces the width of the confidence band by more than 50% (see Dette and Schorning, 2016; Dette, Schorning and Konstantinou, 2016).

Recommendation 7: When planning a dose-finding study comparing two populations, we recommend using optimal designs in order to achieve substantially more precise results.

2.3. Adaptive design studies (WP4)

In adaptive designs, accumulated data are used to allow learning on the spot and, if necessary, to redesign the ongoing trial at an adaptive interim analysis to increase the chances of success. Popular adaptations include changing the sample size, selecting subgroups or dropping certain treatment groups. Such features are especially meaningful in small populations, where it is infeasible to conduct a series of (large) clinical trials. In Bauer et al. (2015) we summarized the developments of the past 25 years (see also Bauer et al. 2016): we reviewed the key methodological concepts, summarized regulatory and industry perspectives on such designs, and discussed case studies.

Development of evidence levels for small population groups

In small population groups, full independent development programs to demonstrate the efficacy of an intervention are often not feasible. Therefore, there is a shortage of first-hand information that can support evidence in favour of a treatment. For example, children are regarded as a vulnerable population and, in terms of medical research, have to be protected from unnecessary risk. This inhibits research in children and leads to a situation in which many medicines are registered for adults but not for children. Under EU regulation, paediatric investigation plans should be agreed on in early phases of drug development in adults. Here, extrapolation from adults (“source”) to children (“target”) is widely applied to reduce the burden and avoid unnecessary clinical trials in children. We proposed adaptive paediatric investigation plans that explicitly foresee a re-evaluation of the early decision based on information accumulated later from adults or elsewhere (Bauer and König, 2016).

We focused on the combination of target and source population data (extrapolation). We translated frequentist decision criteria (alpha-level boundaries for p-values) into the Bayesian framework (Hlavin, König et al., 2016). We introduced a “scepticism factor” to formulate a framework based on prior beliefs, in order to investigate when the significance level for the test of the primary endpoint in confirmatory trials can be relaxed (and thus the sample size reduced) in the target population. The less sceptical one is that extrapolation from the source population is applicable, the higher the adjusted significance level for the pivotal trial in the target population and, therefore, the smaller the required sample size. Another way to adjust the significance level for efficacy testing is to incorporate safety data as well: we suggested a two-step safety selection and testing procedure for multi-armed clinical trials (Hlavin, Hampson and König, 2016).
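The direction of this relationship can be illustrated with a deliberately simplified Bayesian argument, which is not the method of Hlavin, König et al. (2016). The sketch assumes that a prior probability of efficacy in the target population, prior_source, has been derived from the source data; a hypothetical scepticism factor in [0, 1] shrinks it towards a neutral 0.5; and the target significance level is chosen so that the probability that a significant result is a false positive equals that of a standard trial at alpha_ref = 0.025 with a neutral prior. Power is held fixed, and all names and numbers are illustrative assumptions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def false_positive_risk(alpha, power, prior_effective):
    """P(no true effect | significant result) given a prior probability of an effect."""
    false_pos = alpha * (1.0 - prior_effective)
    true_pos = power * prior_effective
    return false_pos / (false_pos + true_pos)


def adjusted_alpha(prior_source, scepticism, alpha_ref=0.025, power=0.9):
    """Hypothetical significance level for the target-population trial.

    scepticism = 1 ignores the source data (neutral prior of 0.5); scepticism = 0 fully
    trusts the prior probability of efficacy, prior_source (< 1), from the source.
    alpha is chosen so that the false positive risk equals that of a standard trial
    run at alpha_ref with a neutral prior (power is held fixed for simplicity).
    """
    prior = scepticism * 0.5 + (1.0 - scepticism) * prior_source
    fpr_ref = false_positive_risk(alpha_ref, power, 0.5)
    alpha = fpr_ref * power * prior / ((1.0 - fpr_ref) * (1.0 - prior))
    return min(alpha, 0.5)


def relative_sample_size(alpha, alpha_ref=0.025, power=0.9):
    """n(alpha) / n(alpha_ref) from the normal-approximation sample size formula."""
    z_beta = STD_NORMAL.inv_cdf(power)
    num = STD_NORMAL.inv_cdf(1.0 - alpha) + z_beta
    den = STD_NORMAL.inv_cdf(1.0 - alpha_ref) + z_beta
    return (num / den) ** 2


for s in (1.0, 0.5, 0.0):
    a = adjusted_alpha(prior_source=0.8, scepticism=s)
    print(f"scepticism={s:.1f}  alpha={a:.4f}  n/n_ref={relative_sample_size(a):.2f}")
```

Under these assumptions the adjusted level simplifies to alpha_ref multiplied by the prior odds of efficacy, so a fully sceptical analyst recovers 0.025, while trusting a prior probability of 0.8 allows 0.1 and a correspondingly smaller trial.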

Recommendation 8: In confirmatory testing, we recommend adapting the significance level by incorporating other information (e.g. using information from drug development programs in adults for designing and analysing paediatric trials).

Adaptive designs for confirmatory model based decisions

We developed an adaptive graph-based multiple testing procedure that allows multiple objectives to be tested and design adaptations to be made in a confirmatory clinical trial (Klinglmüller et al., 2014). Because the adaptive test does not require knowledge of the multivariate distribution of the test statistics, it is applicable in a wide range of scenarios, including trials with multiple treatment comparisons, endpoints or subgroups, or combinations thereof. If it is decided at the interim analysis to continue the trial as planned, the adaptive test reduces to the originally planned multiple testing procedure. Only if adaptations are actually implemented does an adjusted test need to be applied.
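The "originally planned multiple testing procedure" in this setting is typically a graphical procedure. The following sketch implements the standard sequentially rejective graphical test of Bretz et al. (2009), in which each hypothesis receives a share of alpha and passes it along weighted edges once it is rejected. It is not the adaptive extension of Klinglmüller et al. (2014); the function name and the example graphs are illustrative.

```python
def graphical_test(p_values, weights, transitions, alpha=0.025):
    """Sequentially rejective graphical multiple test (Bretz et al., 2009).

    weights: initial local significance weights (summing to at most 1).
    transitions: matrix g[i][j], the fraction of H_i's weight passed to H_j
    once H_i is rejected (zero diagonal, rows summing to at most 1).
    Returns the indices of the rejected hypotheses in order of rejection.
    """
    active = set(range(len(p_values)))
    w = list(weights)
    g = [list(row) for row in transitions]
    rejected = []
    while True:
        # Any hypothesis meeting its local level can be rejected next.
        candidates = [j for j in sorted(active) if p_values[j] <= w[j] * alpha]
        if not candidates:
            return rejected
        j = candidates[0]
        rejected.append(j)
        active.remove(j)
        # Pass the weight of H_j along its outgoing edges.
        new_w = list(w)
        for l in active:
            new_w[l] = w[l] + w[j] * g[j][l]
        # Re-route edges that previously went through H_j.
        new_g = [list(row) for row in g]
        for l in active:
            for k in active:
                denom = 1.0 - g[l][j] * g[j][l]
                if l == k or denom <= 0.0:
                    new_g[l][k] = 0.0
                else:
                    new_g[l][k] = (g[l][k] + g[l][j] * g[j][k]) / denom
        w, g = new_w, new_g


# Holm procedure for two hypotheses: split alpha equally, pass it on after a rejection.
holm = ([0.5, 0.5], [[0.0, 1.0], [1.0, 0.0]])
print(graphical_test([0.01, 0.04], *holm, alpha=0.05))   # both rejected
# Fixed sequence: full population first, subgroup only if the first is rejected.
fixed_sequence = ([1.0, 0.0], [[0.0, 1.0], [0.0, 0.0]])
print(graphical_test([0.03, 0.001], *fixed_sequence))    # nothing rejected
```

Because many common procedures (Bonferroni, Holm, fixed sequence, gatekeeping strategies) are special cases of such graphs, one implementation covers trials with several treatment comparisons, endpoints or subgroups.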

The MCPMod approach has recently attracted a lot of attention, as it was the first statistical methodology to be ‘qualified’ by the European Medicines Agency. MCPMod was originally developed for Phase IIb dose finding studies to characterize the dose response relationship under model uncertainty once a significant dose response signal has been established. We developed a new closed MCPMod methodology for confirmatory clinical trials that allows individual claims that a drug has a positive effect at a specific dose (König et al., 2016). We applied the closed MCPMod methodology to adaptive two-stage designs using adaptive combination tests (Krasnozhon, Bornkamp et al., 2016).
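The first ("MCP") step of MCPMod tests for a dose response signal with one optimal contrast per candidate model. The sketch below assumes equal group sizes, three hypothetical standardised candidate shapes and a normal approximation of the multivariate t distribution for the critical value; it is neither the closed MCPMod procedure nor the model-fitting ("Mod") step, for which the R package DoseFinding provides a full implementation.

```python
import numpy as np

DOSES = np.array([0.0, 0.05, 0.2, 0.6, 1.0])

# Hypothetical candidate shapes, fixed before the trial from "guesstimates".
CANDIDATES = {
    "linear": DOSES,
    "emax": DOSES / (0.2 + DOSES),
    "exponential": np.exp(DOSES / 0.5) - 1.0,
}


def optimal_contrast(mu):
    """Optimal contrast for equal group sizes: centred, unit-length mean profile."""
    c = mu - mu.mean()
    return c / np.linalg.norm(c)


def mcp_step(group_means, pooled_sd, n_per_group, alpha=0.025, n_sim=100_000, seed=1):
    """Multiple contrast test of the MCPMod 'MCP' step (one-sided, equal group sizes).

    The critical value is the (1 - alpha) quantile of the maximum of correlated
    standard normals, a large-sample approximation to the multivariate t.
    """
    contrasts = np.array([optimal_contrast(mu) for mu in CANDIDATES.values()])
    se = pooled_sd * np.sqrt((contrasts ** 2).sum(axis=1) / n_per_group)
    t_stats = contrasts @ group_means / se
    corr = contrasts @ contrasts.T   # valid for unit-length contrasts and equal n
    rng = np.random.default_rng(seed)
    null_max = rng.multivariate_normal(np.zeros(len(corr)), corr, size=n_sim).max(axis=1)
    critical = float(np.quantile(null_max, 1.0 - alpha))
    return dict(zip(CANDIDATES, t_stats)), critical


observed_means = np.array([0.0, 0.12, 0.3, 0.45, 0.5])   # hypothetical data
stats, crit = mcp_step(observed_means, pooled_sd=1.0, n_per_group=50)
print({name: round(float(t), 2) for name, t in stats.items()}, round(crit, 2))
print("dose-response signal:", max(stats.values()) > crit)
```

Taking the maximum over several contrasts protects against a wrong guess of the dose-response shape, while the critical value accounts for the strong correlation between the contrasts and is therefore less conservative than a Bonferroni correction.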

In a recent review conducted by the European Medicines Agency (Hofer et al., 2017 submitted), it was shown that most adaptive design proposals were in oncology. Unfortunately, the important case of time-to-event endpoints is not addressed by standard adaptive theory. We proposed an alternative frequentist adaptive test that allows adaptations using all interim data (Magirr et al., 2016), and showed that other standard adaptive methods may ignore a substantial subset of the observed event times. We also developed group sequential permutation tests for situations where the underlying censoring mechanisms differ between the treatment groups (Brueckner et al., 2017 submitted).

Recommendation 9: If the design is modified during the conduct of a confirmatory clinical trial, we recommend using adaptive methods to ensure that the type-I-error is sufficiently controlled, so as not to endanger confirmatory conclusions. Especially in clinical trials with multiple objectives, special care has to be taken to address the several sources of multiplicity.

Adaptive designs to enable comparative effectiveness analysis

Before a new drug can be prescribed by medical doctors to patients on a regular basis, its efficacy has to be demonstrated and the drug has to be assessed by regulatory authorities, health technology assessment (HTA) bodies and reimbursing bodies. We developed adaptive clinical trial designs that address the needs of regulators and reimbursers simultaneously. Questions to be considered here include whether a particular subgroup of patients, e.g. defined by genetic biomarkers, benefits (more) from the experimental treatment, and how historical data can be incorporated into decision making so as to also enable comparisons against control treatments.

In the light of personalized (precision) medicine, there is considerable debate about whether subgroups, e.g. identified by a genetic biomarker and typically (very) small in size, benefit (more) from a treatment or not. We defined different utility functions to address the needs from a sponsor and a public health perspective (Graf, Posch and König, 2014; Ondra et al., 2016) in order to identify the optimal trial design. The optimization included, for example, the required sample sizes, the targeted population(s) for the trial (the full population or the targeted subgroup only) and the underlying multiple test procedure.

We showed that there can be a substantial inflation of the type-I-error rate if investigators perform design adaptations such as treatment selection, sample size reassessment or changes of the randomisation allocation ratio and naively apply conventional frequentist hypothesis tests that ignore the adaptive nature of the trial (Graf, Bauer et al., 2014). We also showed that response adaptive designs have several caveats, such as inflation of the type-I-error rate or loss of power, when the number of patients to be recruited is limited (Krasnozhon, Bornkamp et al., 2016). Instead of performing adaptations after each single observation, we suggest adaptive designs using adaptive combination tests, where design modifications are performed at a single interim analysis only.
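Both points can be reproduced in a small simulation. The sketch below assumes a one-sample z-test under the null hypothesis, a first stage of 20 patients and a hypothetical, deliberately unfavourable reassessment rule (only 5 further patients if the interim z-statistic exceeds 1, otherwise 200). It compares the naive pooled z-test with the standard inverse normal combination test, whose stage weights are fixed before the trial. Names and numbers are illustrative, not taken from the cited papers.

```python
import numpy as np
from statistics import NormalDist

STD_NORMAL = NormalDist()


def inverse_normal_combination(p1, p2, w1=np.sqrt(0.5), alpha=0.025):
    """One-sided two-stage inverse normal combination test.

    The weight w1 (and w2 = sqrt(1 - w1^2)) must be fixed at the planning stage and
    p2 must be computed only from patients recruited after the interim analysis.
    """
    w2 = np.sqrt(1.0 - w1 ** 2)
    z = w1 * STD_NORMAL.inv_cdf(1.0 - p1) + w2 * STD_NORMAL.inv_cdf(1.0 - p2)
    return bool(z >= STD_NORMAL.inv_cdf(1.0 - alpha))


def simulate_type1_error(n_sim=200_000, n1=20, alpha=0.025, seed=2024):
    """Rejection rates under H0 when the stage-2 sample size depends on the interim.

    The reassessment rule (few extra patients if the interim z-statistic exceeds 1,
    many otherwise) is hypothetical and chosen to expose the problem.
    """
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal(n_sim)          # standardised stage-1 mean
    z2 = rng.standard_normal(n_sim)          # standardised stage-2 mean, independent
    n2 = np.where(z1 > 1.0, 5, 200)          # data-driven sample size reassessment
    crit = STD_NORMAL.inv_cdf(1.0 - alpha)
    naive = (np.sqrt(n1) * z1 + np.sqrt(n2) * z2) / np.sqrt(n1 + n2)   # pooled z-test
    combined = np.sqrt(0.5) * z1 + np.sqrt(0.5) * z2                  # pre-fixed weights
    return float((naive >= crit).mean()), float((combined >= crit).mean())


naive_rate, combination_rate = simulate_type1_error()
print(f"naive pooled test: {naive_rate:.4f}, combination test: {combination_rate:.4f}")
```

In this set-up the naive test rejects well above the nominal 2.5%, whereas the combination test stays at the nominal level because its weights do not depend on the realised second-stage sample size.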

Another important issue is how comparative analyses can be performed if RCTs are not feasible or not ethical. Owing to various data sharing initiatives, there are now unprecedented opportunities as well as challenges (König et al., 2015). We propose a new framework for evidence generation called “threshold-crossing” (Eichler et al., 2016). The key element of threshold-crossing is the upfront specification of an efficacy threshold based on existing real-world data (RWD) and/or past RCT data. Efficacy is then established in a single-arm trial by crossing this pre-defined threshold. However, as the comparison in a threshold-crossing design is against historical controls, it is prone to bias.
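For a binary endpoint, the single-arm analysis of a threshold-crossing trial can be as simple as an exact binomial test against the pre-specified response rate. The sketch assumes a one-sided level of 0.025, a hypothetical threshold of 20% and 30 patients; it says nothing about how the threshold is derived from RWD or past RCTs, and it does not correct for the biases of the historical comparison.

```python
from math import comb


def binomial_upper_p_value(successes, n, threshold):
    """Exact one-sided p-value P(X >= successes) for X ~ Binomial(n, threshold)."""
    return sum(comb(n, k) * threshold ** k * (1.0 - threshold) ** (n - k)
               for k in range(successes, n + 1))


def threshold_crossed(successes, n, threshold, alpha=0.025):
    """Efficacy claim if the observed response rate significantly exceeds the threshold.

    The threshold must be fixed before the single-arm trial starts (e.g. from
    historical controls); the test does not correct for biases of that comparison.
    """
    return binomial_upper_p_value(successes, n, threshold) <= alpha


# Hypothetical example: threshold of 20% response, 30 patients in a single-arm trial.
for responders in (10, 12):
    p = binomial_upper_p_value(responders, 30, 0.20)
    print(responders, round(p, 4), threshold_crossed(responders, 30, 0.20))
```

With these numbers, 12 responders (40%) cross the threshold while 10 responders (33%) do not, which shows how demanding a formal test is when only 30 patients are available.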

Recommendation 10: If randomized controlled clinical trials are infeasible, we propose “threshold-crossing” designs within an adaptive development program as a way forward to enable comparisons between different treatment options.

2.4. Optimal design in mixed models (WP5)

Nonlinear mixed effects models are used in model-based drug development to analyse all longitudinal data obtained during clinical trials. This is especially promising in small group trials, as all measurements collected during the trial are used to evaluate treatments. Therefore, finding good designs for these studies is important to obtain precise results and/or good power, especially when there are limitations on the sample size and on the number of samples/visits per patient.

Following our pioneering work on optimal design based on the Fisher Information Matrix for nonlinear mixed effects models with continuous data, the aims of this work-package were to

• Extend and evaluate this approach for longitudinal models with discrete data, repeated time to event and joint models

• Propose robust approaches with respect to parameter values, such as two-stage adaptive designs

• Propose robust approaches with respect to model uncertainty in the design and analysis of pivotal trials analysed through modelling

The goal was also to make the developments available in free software tools.

Evaluation of the Fisher Information Matrix for discrete and time to event longitudinal data

Evaluation of the Fisher Information Matrix for mixed models is often based on first-order linearization of the model, which works poorly for discrete or time to event longitudinal data.

We developed two new methods to evaluate the Fisher Information Matrix without linearization. Both approaches first use Monte Carlo (MC) integration and then either Adaptive Gaussian Quadrature (MC-AGQ, Ueckert and Mentré, 2017) or Hamiltonian Monte Carlo (MC-HMC, Rivière et al., 2016). Both approaches were evaluated and compared on four different examples with continuous, binary, count or time to event repeated data.

Using clinical trial simulation, we showed that both approaches adequately predict the standard errors. The MC-AGQ approach is less computationally demanding for models with few random effects, whereas the computational effort of MC-HMC increases only linearly with the number of random effects, which makes it more suitable for larger models. For both approaches, we showed the importance of using a large number of samples in the MC step. For the MC-AGQ method, we illustrated on the binary example the influence of the design (number of patients / number of repetitions) on the power to detect a treatment effect (Ueckert and Mentré, 2017).
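The idea of combining Monte Carlo integration over the data with numerical integration over the random effects can be sketched for a hypothetical binary model, logit P(y_ij = 1) = beta0 + beta1·trt_i + b_i with b_i ~ N(0, omega^2). The expected Fisher Information Matrix is estimated as the average outer product of the score over simulated outcome vectors; the marginal likelihood uses plain (non-adaptive) Gauss-Hermite quadrature and the score is obtained by finite differences. This is a simplified stand-in for MC-AGQ, not the MIXFIM implementation, and the model, parameter values and function names are assumptions.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss
from statistics import NormalDist

NODES, WEIGHTS = hermgauss(20)   # Gauss-Hermite rule for the weight exp(-x^2)


def marginal_loglik(theta, y, trt):
    """log p(y | theta) for one subject; the random intercept is integrated out."""
    beta0, beta1, log_omega = theta
    b = np.sqrt(2.0) * np.exp(log_omega) * NODES       # quadrature points for b_i
    eta = beta0 + beta1 * trt + b[:, None]             # shape (n_nodes, n_obs)
    loglik_given_b = (y * eta - np.log1p(np.exp(eta))).sum(axis=1)
    return np.log((WEIGHTS / np.sqrt(np.pi)) @ np.exp(loglik_given_b))


def score(theta, y, trt, h=1e-5):
    """Central finite-difference gradient of the marginal log-likelihood."""
    grad = np.zeros(len(theta))
    for k in range(len(theta)):
        step = np.zeros(len(theta))
        step[k] = h
        grad[k] = (marginal_loglik(theta + step, y, trt)
                   - marginal_loglik(theta - step, y, trt)) / (2.0 * h)
    return grad


def fisher_information(theta, n_patients, n_obs, n_mc=2000, seed=0):
    """MC estimate of the expected FIM, E[score score^T], for a balanced two-arm design."""
    rng = np.random.default_rng(seed)
    beta0, beta1, log_omega = theta
    total = np.zeros((3, 3))
    for trt in (0.0, 1.0):
        info = np.zeros((3, 3))
        for _ in range(n_mc):
            b = rng.normal(0.0, np.exp(log_omega))
            prob = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * trt + b)))
            y = rng.binomial(1, prob, size=n_obs).astype(float)
            s = score(theta, y, trt)
            info += np.outer(s, s) / n_mc
        total += (n_patients / 2) * info
    return total


def predicted_se(theta, n_patients, n_obs, **kwargs):
    """Predicted standard errors: square roots of the diagonal of the inverse FIM."""
    fim = fisher_information(np.asarray(theta, dtype=float), n_patients, n_obs, **kwargs)
    return np.sqrt(np.diag(np.linalg.inv(fim)))


def wald_power(theta, n_patients, n_obs, alpha=0.05, **kwargs):
    """Approximate power of the two-sided Wald test for the treatment effect beta1."""
    z = NormalDist()
    se_beta1 = predicted_se(theta, n_patients, n_obs, **kwargs)[1]
    return z.cdf(abs(theta[1]) / se_beta1 - z.inv_cdf(1.0 - alpha / 2.0))


theta = np.array([-0.5, 1.0, 0.0])   # hypothetical beta0, beta1, log(omega)
for n_patients, n_obs in ((40, 2), (40, 6), (80, 2)):
    print(n_patients, n_obs, round(wald_power(theta, n_patients, n_obs, n_mc=500), 2))
```

Comparing designs with more patients against designs with more repetitions per patient in this way mirrors the kind of trade-off described for the binary example; as noted above, the number of MC samples (n_mc) must be large enough for the predictions to be stable.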

The MC-HMC method was implemented in the R package MIXFIM, available on CRAN.

Recommendation 11: For the evaluation of designs of studies with longitudinal discrete or time to event data, the Fisher Information Matrix should be evaluated without linearization. Using the new MC-HMC approach (in MIXFIM) provides adequate prediction of standard errors and allows several designs to be compared.

Adaptive two-stage designs in non-linear mixed effects models

One limitation of the optimal design approach for nonlinear mixed effects models is the a priori knowledge needed on the parameter values. Adaptive designs are an alternative, increasingly developed for randomised clinical trials or dose-ranging studies, but rarely applied with nonlinear mixed effects models. Two-stage designs are more practical to implement in clinical settings than fully adaptive designs, especially for small population groups.

We extended our package PFIM (www.pfim.biostat.fr) and released a new version, PFIM4.0, which allows a prior information matrix to be included in design evaluation or optimisation and can therefore be used for multi-stage designs.

We developed multi-stage designs and evaluated them by clinical trial simulation, using a pharmacokinetic/pharmacodynamic example with continuous longitudinal data and a total of N=50 patients. After each cohort of patients, the parameters were estimated and used to design the sampling times for the next cohort, keeping the prior information already obtained when computing the Fisher Information Matrix.

We used first-order linearization to evaluate the Fisher Information Matrix, as implemented in PFIM, which is suitable for continuous data. We studied the efficiency of single-stage designs optimised with correct or wrong parameters. We then evaluated two-stage designs in which the first cohort was designed from wrong parameters, varying the balance of patients between the two cohorts. We also studied the added value of designs with more stages.
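The key step of such a two-stage design, optimising the second cohort while adding the information already collected as a prior information matrix, can be sketched with a hypothetical mono-exponential model C(t) = a·exp(-k t) with additive error. The sketch keeps only fixed effects (no random effects), searches two sampling times per patient on a small grid by D-optimality, and plugs in the true parameters where a real trial would use the estimates from cohort 1. It is not PFIM; all names and values are assumptions.

```python
import itertools
import numpy as np

GRID = np.array([0.25, 0.5, 1.0, 2.0, 4.0, 6.0, 8.0, 12.0, 24.0])  # candidate times (h)


def sample_information(times, a, k, sigma=0.1):
    """FIM of one patient sampled at `times` for C(t) = a * exp(-k t) + N(0, sigma^2)."""
    decay = np.exp(-k * times)
    jac = np.column_stack([decay, -a * times * decay])   # dC/da, dC/dk
    return jac.T @ jac / sigma ** 2


def best_times(a, k, n_patients, n_samples=2, prior_fim=None):
    """D-optimal sampling times on GRID, adding a prior information matrix if given."""
    prior = np.zeros((2, 2)) if prior_fim is None else prior_fim
    best, best_logdet = None, -np.inf
    for times in itertools.combinations(GRID, n_samples):
        fim = prior + n_patients * sample_information(np.array(times), a, k)
        sign, logdet = np.linalg.slogdet(fim)
        if sign > 0 and logdet > best_logdet:
            best, best_logdet = times, logdet
    return np.array(best)


def d_efficiency(fim, reference):
    """Relative D-efficiency (ratio of determinants to the power 1/p, here p = 2)."""
    return (np.linalg.det(fim) / np.linalg.det(reference)) ** 0.5


true_a, true_k = 10.0, 0.3      # hypothetical true parameters
guess_k = 0.05                  # wrong initial guess used to design cohort 1
n1 = n2 = 25                    # balanced two-stage design, N = 50

times1 = best_times(true_a, guess_k, n1)
# In practice the parameters are re-estimated from cohort 1; here we plug in the truth.
prior = n1 * sample_information(times1, true_a, true_k)
times2 = best_times(true_a, true_k, n2, prior_fim=prior)

two_stage = prior + n2 * sample_information(times2, true_a, true_k)
one_stage_wrong = 50 * sample_information(times1, true_a, true_k)
one_stage_true = 50 * sample_information(best_times(true_a, true_k, 50), true_a, true_k)
print("cohort 1:", times1, "cohort 2:", times2)
print("efficiency, wrong one-stage:", round(d_efficiency(one_stage_wrong, one_stage_true), 2))
print("efficiency, two-stage:      ", round(d_efficiency(two_stage, one_stage_true), 2))
```

Because the second cohort is optimised on top of the information already available, the two-stage design can never be worse than repeating the mis-specified first-stage design, and it usually recovers much of the efficiency lost through the wrong initial guess.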

We showed the good properties of adaptive two-stage designs when the initial guess of the parameters is wrong (Lestini et al., 2015). In the studied example, the efficiency of the balanced two-stage design was almost as good as that of the one-stage design that would have been obtained if the true parameters were known. With this small number of patients (N=50), the best two-stage design was the balanced design with an equal number of patients in each cohort. These results are consistent with those previously obtained with a simpler example (Dumont et al. 2016). Having three or five stages did not improve the efficiency of the designs and is more complex to implement.

The good properties of the two-stage balanced design should also hold for models with discrete longitudinal data (using MC-AGQ or MC-HMC to compute the Fisher Information Matrix), but this has not been verified by clinical trial simulation.

Recommendation 12: When there is little information on the value of the parameters at the design stage, adaptive designs can be used. Two-stage balanced designs are a good compromise. The new version of PFIM can be used for adaptive design with continuous longitudinal data.

Model uncertainty in design for analyses of pivotal trials

It is important to contribute to the dissemination of model-based analysis of pivotal clinical trials in drug evaluation for small population groups. These approaches use all recorded individual information and can therefore reduce sample sizes. One main limitation, as seen by health authorities, is the possible lack of control of the type-I-error when model selection is performed. Model averaging approaches are a good alternative. The idea of pre-specifying a number of candidate models is already applied in drug development, for instance for dose-response studies in the MCPMod approach, but has only recently been extended to mixed effects models. Before the analysis step, studies need to be designed so that they are adequate across a set of candidate nonlinear mixed effects models.

We proposed using compound D-optimality criteria to design studies that are robust across a set of pre-specified models. We also proposed achieving robustness with respect to the parameter values by defining a prior distribution for each parameter and using the expected Fisher Information Matrix, hence DE-optimality (for one model) or compound DE-optimality. As an additional integration is needed to compute the expected Fisher Information Matrix over the distribution of the parameters, we extended the MC step of the MC-HMC method (Loingeville et al., 2017). We implemented this extension in a working version of MIXFIM.

We evaluated these new developments on the longitudinal count data example, in which the effect of dose on the Poisson parameter is modelled (Rivière et al., 2016; Ueckert and Mentré, 2017). We specified five different models of the dose effect (Loingeville et al., 2017). We optimised the two doses to be used in addition to placebo, separately for each model using D- and then DE-optimality, and then across the five models (assuming equal weights of 1/5). We evaluated the loss of efficiency of each design for each model.

We found that the optimal doses varied from one model to another and, for some models, also changed when the approach robust to parameter uncertainty was used (Loingeville et al., 2017). Moreover, the loss of efficiency when using the design optimised for one model while another model is true can be large: for instance, using the optimal design for a linear model when an Emax model is true leads to an efficiency of only 44%.

The robust design across the five models is a compromise between the optimal designs obtained for each model (Loingeville et al., 2017). Its efficiency is greater than 80% for each of the five models.
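The logic of locally optimal versus compound D-optimal designs and of the efficiency comparison can be sketched without random effects. The example below assumes a Poisson model log(lambda) = th0 + th1·g(d) with five hypothetical dose-effect shapes g and hypothetical parameter values, chooses placebo plus two active doses from a grid with equal allocation, and uses the compound criterion with equal weights 1/5, each log-determinant divided by the number of parameters. It does not implement DE-optimality or the MIXFIM computations, and its efficiencies are not those reported above.

```python
import itertools
import numpy as np

DOSE_GRID = np.round(np.linspace(0.05, 1.0, 20), 2)   # candidate active doses

# Hypothetical dose-effect shapes g(d) for log(lambda) = th0 + th1 * g(d),
# each with hypothetical local parameter values (th0, th1).
MODELS = {
    "linear": (lambda d: d, (1.0, 1.0)),
    "emax": (lambda d: d / (0.1 + d), (1.0, 1.0)),
    "log-linear": (lambda d: np.log1p(10.0 * d), (1.0, 0.4)),
    "quadratic": (lambda d: d ** 2, (1.0, 1.0)),
    "exponential": (lambda d: np.expm1(2.0 * d), (1.0, 0.2)),
}


def log_det_fim(model, doses):
    """log det of the Poisson-GLM information matrix for equal allocation to `doses`."""
    g, (th0, th1) = MODELS[model]
    x = np.column_stack([np.ones(len(doses)), g(np.asarray(doses))])
    lam = np.exp(th0 + th1 * x[:, 1])
    weights = np.full(len(doses), 1.0 / len(doses))
    fim = (x * (weights * lam)[:, None]).T @ x     # sum_i w_i lambda_i x_i x_i^T
    return np.linalg.slogdet(fim)[1]


def candidate_designs():
    """Placebo plus two distinct active doses from the grid."""
    for d1, d2 in itertools.combinations(DOSE_GRID, 2):
        yield (0.0, d1, d2)


def optimise(criterion):
    return max(candidate_designs(), key=criterion)


def d_efficiency(model, design, n_params=2):
    best = optimise(lambda d: log_det_fim(model, d))
    return float(np.exp((log_det_fim(model, design) - log_det_fim(model, best)) / n_params))


local = {m: optimise(lambda d, m=m: log_det_fim(m, d)) for m in MODELS}
# Compound criterion: equal weights 1/5, each log det normalised by its number of parameters.
robust = optimise(lambda d: np.mean([log_det_fim(m, d) / 2 for m in MODELS]))
for m in MODELS:
    doses = tuple(float(x) for x in local[m][1:])
    print(f"{m:12s} local doses {doses}  efficiency of robust design "
          f"{d_efficiency(m, robust):.2f}")
```

Printing the efficiency of each locally optimal design under the other models as well would reproduce the type of cross-model comparison described above.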

We developed an approach for one-stage designs that are robust across a set of candidate models with uncertainty in the parameters. We showed the robustness of the resulting design when finding optimal doses in a longitudinal count data model. The next steps are to make these new developments available in MIXFIM (presently available upon request) and to incorporate and evaluate this approach within two-stage adaptive designs.

Recommendation 13: When there is uncertainty about the model and its parameters, a robust approach across candidate models should be used to design studies with longitudinal data.

2.5. Design of pharmacogenetic trials (WP6)

In this work-package we have concentrated in particular on designs and analyses that pay close attention to sources of variation, exploiting, where appropriate, the predictive value of covariates but also the ability, for certain diseases, of patients to act as their own control. We have taken care to distinguish two major purposes of clinical trials: to establish whether treatments are effective and the wider and more difficult task of establishing when and for whom they are effective. We have further paid attention to the interpretation of the evidence resulting from a clinical trial (Senn, 2017b,c).

Necessary conditions for justifying a theranostic programme

The objective of this work-package was to identify under what conditions it is worth trying to identify a subpopulation for differential treatment.

We recognised early on, through interaction within the project, that excellent work was being done in WP9 on a closely related problem, namely when one should stop trials in small populations. The two are related because, if one should stop before one starts, studying the population in question is not worthwhile. We therefore decided to interact with WP9. Nevertheless, we have done considerable work on one particular question of relevance in this context, namely how evidence may be found to establish whether further separate differentiation of a population that would otherwise be treated the same is warranted on the basis of response.

Recommendation 14: We recommend that response should not be defined using arbitrary and naïve dichotomies but that it should be analysed carefully paying due attention to components of variance and where possible using designs to identify them.

Development of within-patient trial designs

The object of this work-package was to put the analysis of n-of-1 trials on a firm and logical foundation, bearing in mind two different purposes that such trials may have: first, establishing whether a treatment works at all (Lonergan et al. 2017); second, establishing to what extent the effect varies from patient to patient. A thorough examination of this has been provided in a paper in PLOS One, and associated code for analysis in R has been written (Araujo et al., 2017).

Recommendation 15: For the analysis of n-of-1 trials, we recommend using a modified fixed-effects meta-analysis where the object is to establish that the treatment works, and an approach through mixed models if variation in response to treatment is to be studied.
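The distinction between the two purposes can be sketched with textbook meta-analysis estimators applied to per-patient treatment differences. The sketch uses the standard inverse-variance fixed-effect estimate and, as a simple stand-in for a mixed model, the DerSimonian-Laird random-effects estimate with its between-patient variance tau^2. It does not reproduce the modified fixed-effects analysis or the mixed-model code of Araujo et al. (2017); the simulated series and all names are hypothetical. With only a few cycles per patient, patient-specific variances are poorly estimated, so in practice a pooled within-patient variance may be preferable.

```python
import numpy as np


def summarise_patients(differences):
    """Per-patient mean treatment difference and its squared standard error."""
    means = np.array([np.mean(d) for d in differences])
    variances = np.array([np.var(d, ddof=1) / len(d) for d in differences])
    return means, variances


def fixed_effect(means, variances):
    """Inverse-variance weighted (fixed-effect) estimate and its standard error."""
    w = 1.0 / variances
    return float(np.sum(w * means) / np.sum(w)), float(np.sqrt(1.0 / np.sum(w)))


def dersimonian_laird(means, variances):
    """Random-effects estimate with the DerSimonian-Laird between-patient variance."""
    w = 1.0 / variances
    pooled, _ = fixed_effect(means, variances)
    q = np.sum(w * (means - pooled) ** 2)
    tau2 = max(0.0, (q - (len(means) - 1)) / (np.sum(w) - np.sum(w ** 2) / np.sum(w)))
    w_star = 1.0 / (variances + tau2)
    est = float(np.sum(w_star * means) / np.sum(w_star))
    return est, float(np.sqrt(1.0 / np.sum(w_star))), float(tau2)


# Hypothetical series: 12 patients, 3 treatment cycles each, average effect 1.0,
# patient-by-treatment SD 0.5 and within-patient SD 1.0 for each cycle difference.
rng = np.random.default_rng(42)
patient_effects = 1.0 + 0.5 * rng.standard_normal(12)
differences = [eff + rng.standard_normal(3) for eff in patient_effects]
means, variances = summarise_patients(differences)
print("does it work (fixed effect):", fixed_effect(means, variances))
print("average and variation (random effects):", dersimonian_laird(means, variances))
```

The fixed-effect analysis answers whether the treatment worked in the patients studied; the random-effects analysis additionally estimates how much the effect varies between patients, at the price of a wider standard error for the average.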

Development of between-patient trial designs

Extracting as much information as possible from between-patient trials involves a number of aspects, which we considered (see also Collignon et al., 2016). A side issue is the recent claim that significance tests give positive results too easily, a situation which, if true, would have devastating consequences for trials in small populations. Part of the work has been devoted to addressing this, and other parts to a) making efficient use of covariates and b) making appropriate use of historical information. This latter aspect of the work is ongoing.

Further work has considered the effect of sequential analysis on inferences. The attached diagram, taken from Senn (2014), shows that the stopping rule does not influence the inferences from a meta-analysis, provided that the trials are weighted by the information they provide. Thus, inferences from combining small trials in rare diseases are unaffected by whether the trials were sequential or not.

Recommendation 16: When analysing between-patient studies we recommend avoiding information destroying transformations (such as dichotomies) and exploiting the explanatory power of covariates, which may be identified from ancillary studies and patient databases.

Sample Size determination

This task concerned sample size determination for clinical trials. The decision-theoretic aspect is covered by WP9. We have concentrated instead on addressing the challenge of n-of-1 trials, where many components of variation are involved and make sample size determination complex. Theory has been developed (Senn, 2017a) to cover this, practical approaches have been developed and code has been written in R®, SAS® and GenStat® to carry out this task.
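Why the purpose matters can be shown with a generic two-component variance model, which is an illustrative assumption and not the approach of Senn (2017a). Each patient contributes the mean of `cycles` treatment differences with within-patient SD sigma_within, and true effects vary between patients with SD tau. If the question is whether the treatment works in the patients studied, only the within-patient noise enters; if the population average effect is of interest, tau^2 enters as well. The function name and numbers are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_of_1_patients(delta, sigma_within, tau, cycles, purpose, alpha=0.05, power=0.9):
    """Number of patients for a series of n-of-1 trials (normal approximation).

    sigma_within: SD of a single cycle's treatment difference within a patient.
    tau: SD of the true treatment effect across patients (patient-by-treatment).
    purpose: "does_it_work" treats the patients' effects as fixed, so only
    within-patient noise matters; "population_average" also carries tau^2.
    """
    z = NormalDist()
    z_total = z.inv_cdf(1.0 - alpha / 2.0) + z.inv_cdf(power)
    within = sigma_within ** 2 / cycles
    if purpose == "does_it_work":
        variance_per_patient = within
    elif purpose == "population_average":
        variance_per_patient = tau ** 2 + within
    else:
        raise ValueError(f"unknown purpose: {purpose!r}")
    # Var(mean effect) = variance_per_patient / n must satisfy delta / SE >= z_total.
    return ceil(z_total ** 2 * variance_per_patient / delta ** 2)


for purpose in ("does_it_work", "population_average"):
    print(purpose, [n_of_1_patients(0.5, 1.0, 0.5, k, purpose) for k in (1, 3, 6)])
```

Adding cycles per patient keeps shrinking the required number of patients when only within-patient noise matters, but cannot remove the tau^2 term that dominates once the population average is the target.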

Recommendation 17: When conducting a series of n-of-1 trials, we recommend paying close attention to the purpose of the study and calculating the sample size accordingly, using the approach of Senn (2017a).

2.6. Simulation of clinical trials (WP7)

Analysis of clinical trial data using nonlinear mixed-effects models can have important advantages, both with respect to the type of information gained and the statistical power for making inference (Karlsson et al., 2013). In general, the main disadvantage of a non-linear mixed effects (NLME) modelling approach lies in the assumptions needed to create an NLME model. However, with the movement towards mechanistic models based on biological understanding (Danhof et al., 2008; Marshall et al., 2006), the validity of model assumptions becomes easier to evaluate. Mechanism-based NLME models can be of special interest in small population groups for multiple reasons (Lesko, 2012):

(1) Practical limitations might severely hamper the possibility of sufficiently powering a study based on a statistical test making fewer assumptions.

(2) When the population is a subset of subjects with a certain disease (e.g. paediatric patients), there can be good opportunities for extrapolation based on prior information from a larger, previously studied population (Gisleskog, 2002).

(3) The ability to pool data across studies, arms and periods can be essential to obtain enough information to assess the performance of a specific treatment in a small sub-population, and this is often not possible without the application of a model.

(4) Application of NLME modelling as the primary analysis of clinical studies can also open up the possibility of innovative designs.

Improved methodology for power/sample size calculations with non-linear mixed effects model based analysis

Power (the ability to identify a true drug effect of a certain size) is often of great importance in clinical trial planning and design, as it quantifies the degree to which an experiment is able to distinguish a certain effect size from the null hypothesis. Determining the power of a study is easy when simple models for the description of the observations and a specific effect size are assumed; e.g. when the data are assumed to be normal and a fixed effect size is chosen, the power can be calculated analytically. However, for more complex, longitudinal models the joint distribution of the observations is less obvious and even the effect size might not be easily derivable. In this situation, usually no analytic expression for the power can be obtained and one has to resort to Monte-Carlo simulations. Ideally, a Monte-Carlo study uses a model containing all available knowledge for a particular compound to simulate replicates of the trial, and the intended analysis model (not necessarily equivalent to the simulation model) to analyse these replicates. For each analysed replicate the hypothesis test is carried out, and the fraction of rejected null hypotheses provides an estimate of the power of the study. Clearly, this power estimate requires a large number of simulations and estimations to be stable, which can be time-consuming, especially when non-linear mixed effects models are used for the analysis.

A novel parametric power estimation (PPE) algorithm utilizing the theoretical distribution of the test statistic under the alternative hypothesis was developed in this work and compared to classical Monte-Carlo studies (fig. 1). The PPE algorithm estimates the unknown non-centrality parameter of this theoretical distribution from a limited number of Monte-Carlo simulations and estimations. From the estimated parameter, a complete power versus sample size curve can then be obtained analytically without additional simulations, drastically reducing runtimes for this computation (Ueckert et al., 2016).
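The principle of PPE can be sketched with a much simpler analysis model than an NLME model. The sketch assumes a normal two-arm trial analysed with a likelihood ratio test (LRT) whose statistic is approximately non-central chi-square with one degree of freedom under the alternative, estimates the non-centrality from 500 simulated trials by the method of moments (mean = df + lambda), and assumes that the non-centrality grows linearly with the total sample size. The estimator and function names are simplifying assumptions, not the algorithm implemented in PsN.

```python
import numpy as np
from scipy.stats import chi2, ncx2


def lrt_statistic(y, group):
    """-2 log likelihood ratio for a treatment effect in a normal two-arm model."""
    rss0 = np.sum((y - y.mean()) ** 2)
    rss1 = sum(np.sum((y[group == g] - y[group == g].mean()) ** 2) for g in (0, 1))
    return len(y) * np.log(rss0 / rss1)


def simulate_lrt(n_total, effect, n_rep, rng):
    """LRT statistics from n_rep simulated trials with n_total/2 patients per arm."""
    group = np.repeat([0, 1], n_total // 2)
    return np.array([lrt_statistic(rng.normal(effect * group, 1.0), group)
                     for _ in range(n_rep)])


def ppe_power_curve(lrt_values, n0, n_grid, df=1, alpha=0.05):
    """Power curve from a non-central chi-square fitted to LRT values at size n0.

    The non-centrality is estimated by the method of moments (mean = df + lambda)
    and assumed to grow linearly with the sample size.
    """
    nc_hat = max(float(np.mean(lrt_values)) - df, 1e-8)
    crit = chi2.ppf(1.0 - alpha, df)
    return np.array([ncx2.sf(crit, df, nc_hat * n / n0) for n in n_grid])


rng = np.random.default_rng(7)
n0, n_grid = 60, [20, 40, 60, 80, 100, 120]
lrt = simulate_lrt(n0, effect=0.5, n_rep=500, rng=rng)
mc_power = float(np.mean(lrt > chi2.ppf(0.95, 1)))
ppe = ppe_power_curve(lrt, n0, n_grid)
print(f"Monte-Carlo power at N={n0}: {mc_power:.3f}")
print(dict(zip(n_grid, np.round(ppe, 3))))
```

One set of simulations at a single sample size thus yields the whole power curve, whereas classical Monte-Carlo power estimation would need a separate set of simulations and estimations for every sample size.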

A complicating factor in hypothesis testing with non-linear mixed effects models is keeping control of the type-I-error. One way to assess the actual significance level of the hypothesis test is to perform a permutation test. To facilitate this often computationally intensive procedure, a permutation test tool was developed within the free software PsN (http://psn.sourceforge.net) and xpose (Lindbom et al., 2004; Keizer et al., 2013; Johnsson and Karlsson, 1999; Harling et al., 2016; Deng et al., 2015).
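The idea of a permutation test for calibrating the significance level can be sketched on a simple normal two-arm model. Shuffling the treatment labels destroys any true treatment effect, so the statistics recomputed on the permuted data form an empirical null distribution from which a corrected critical value and a p-value follow, without relying on the chi-square approximation. For NLME models each permutation requires a full re-estimation, which is what makes tool support valuable; the sketch assumes this is conceptually what the PsN tool automates, and its functions and data are hypothetical.

```python
import numpy as np


def lrt_statistic(y, group):
    """-2 log likelihood ratio for a treatment effect in a normal two-arm model."""
    rss0 = np.sum((y - y.mean()) ** 2)
    rss1 = sum(np.sum((y[group == g] - y[group == g].mean()) ** 2) for g in (0, 1))
    return len(y) * np.log(rss0 / rss1)


def permutation_test(y, group, n_perm=2000, alpha=0.05, seed=11):
    """Permutation reference distribution of the LRT: critical value and p-value.

    Shuffling the treatment labels breaks any true treatment effect, so the permuted
    statistics approximate the distribution under H0.
    """
    rng = np.random.default_rng(seed)
    observed = lrt_statistic(y, group)
    null = np.array([lrt_statistic(y, rng.permutation(group)) for _ in range(n_perm)])
    critical = float(np.quantile(null, 1.0 - alpha))
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return float(observed), critical, float(p_value)


rng = np.random.default_rng(5)
group = np.repeat([0, 1], 20)
y = rng.normal(0.0, 1.0, size=40)          # hypothetical data simulated under H0
observed, critical, p_value = permutation_test(y, group)
print(f"LRT = {observed:.2f}, permutation critical value = {critical:.2f} "
      f"(chi-square: 3.84), p = {p_value:.3f}")
```

If the permutation critical value differs noticeably from the nominal chi-square value, the latter would not control the type-I-error for this design and model, and the permutation-based value can be used instead.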

Recommendation 18: If fast computations of power curves are needed for a non-linear mixed effects model, we recommend using the parametric power estimation algorithm as implemented in the stochastic simulation and estimation (SSE) tool of PsN (potentially with a type-I-error correction based on the “randtest” tool in PsN).

Demonstration of the value with mechanism based models in planning and analyzing studies in small population groups

We established proof-of-principle examples of how highly mechanistic systems pharmacology and/or systems biology models can be utilized in planning the analysis of clinical trials in small population groups. Based on simulations with the mechanism-based models, more parsimonious models suitable for estimation can be utilized to understand drug effects and be linked to the mechanism-based model (Wang et al., 2016; Wellhagen et al., 2015).

Recommendation 19: The simulation methods described above can be utilized to investigate the effects of using different, smaller, more parsimonious models to evaluate data from complicated biological systems prior to running a clinical study.

Handling model uncertainty in small population group clinical trial simulations

Model uncertainty is, for natural reasons, largest when estimation is based on a small sample (e.g. in small population groups), and at the same time a small sample size makes it particularly challenging to characterize that uncertainty accurately. Five projects were undertaken to investigate different aspects of the model uncertainty of NLME models:

(1) Assessing Parameter Uncertainty Distributions Using Sampling Importance Resampling,

(2) Delta objective function value distributions as a method to diagnose uncertainty distributions,

(3) Preconditioning of Nonlinear Mixed Effects Models for Stabilization of the Covariance Matrix,

(4) model averaging, and

(5) model-based adaptive optimal design.

(1) Sampling Importance Resampling (SIR) (Rubin, 1988) was implemented as a user-friendly script in free software for non-linear mixed effects modelling. Optimal SIR settings were investigated and tested on 30 real data examples. Diagnostics to judge SIR convergence were developed and can be applied to compare different uncertainty distributions. SIR now constitutes a powerful alternative for estimating and utilizing parameter uncertainty, especially in the context of small populations (Dosne, Bergstrand et al., 2016).

(2) Confidence intervals determined by bootstrap and by stochastic simulation and re-estimation were compared. This analysis showed that, with regard to providing uncertainty estimates, the bootstrap may be unsuitable for non-linear mixed effects analyses even with datasets that would commonly be considered “large enough”. The bootstrap delta objective function value distribution provides an easy way to assess whether the bootstrap results in parameter vectors contradicted by the original data (Dosne, Niebecker, Karlsson, 2016).

(3) A preconditioning method for NLME models was developed to increase the computational stability of the variance-covariance matrix. Preconditioning is a widely used technique to increase the computational stability of numerically solving large sparse systems of linear equations (Benzi, 2002). An automated preconditioning routine was made available as part of the software package Perl-speaks-NONMEM (PsN). The results demonstrated that the variance-covariance matrix and the R-matrix, if computed correctly, can give a strong indication of the non-estimability of the model parameters, while other methods may not be able to do so (Aoki et al., 2016).

(4) Model averaging methods were investigated for dose selection studies (phase IIb). The proposed method reduces the analysis bias originating from the model selection bias of analyses based on a single model structure, and can increase the probability of making correct decisions at the end of trials compared to conventional ANOVA-based study protocols (Aoki et al., 2014; Aoki et al., 2017).

(5) Model-based adaptive optimal designs (MBAOD) were investigated for bridging studies from adults to children and were able to reduce model parameter uncertainty. Comparing the relative estimation errors of the final parameter estimates showed that MBAOD performed as well as traditional design approaches while requiring fewer children to fulfil a commonly used precision criterion in most of the simulations (Strömberg and Hooker, 2015, 2016, 2017).
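The first two diagnostics can be sketched on a toy model, y ~ N(mu, sigma^2) with theta = (mu, log sigma), where the objective function value (OFV) is -2 log-likelihood. SIR draws parameter vectors from a multivariate normal proposal around the estimate, weights each draw by likelihood divided by proposal density (flat prior) and resamples without replacement; the bootstrap dOFV evaluates each bootstrap estimate on the original data. The proposal inflation factor, sample sizes and names are hypothetical, and the sketch is not the implementation used in the cited work.

```python
import numpy as np
from scipy.stats import multivariate_normal


def ofv(theta, y):
    """Objective function value -2 log L for y ~ N(mu, sigma^2), theta = (mu, log sigma)."""
    mu, log_sigma = theta
    sigma2 = np.exp(2.0 * log_sigma)
    return len(y) * np.log(2.0 * np.pi * sigma2) + np.sum((y - mu) ** 2) / sigma2


def mle(y):
    """Closed-form maximum likelihood estimates (ML standard deviation, ddof=0)."""
    return np.array([y.mean(), np.log(y.std())])


def sir(y, n_samples=2000, n_resamples=500, inflation=1.5, seed=3):
    """Sampling importance resampling around the MLE (flat prior on theta)."""
    rng = np.random.default_rng(seed)
    theta_hat, n = mle(y), len(y)
    # Proposal: multivariate normal from the asymptotic covariance, inflated.
    cov = inflation * np.diag([np.exp(2.0 * theta_hat[1]) / n, 1.0 / (2.0 * n)])
    samples = rng.multivariate_normal(theta_hat, cov, size=n_samples)
    # Importance ratio = likelihood / proposal density, on the log scale.
    log_w = (np.array([-0.5 * ofv(s, y) for s in samples])
             - multivariate_normal(theta_hat, cov).logpdf(samples))
    w = np.exp(log_w - log_w.max())
    keep = rng.choice(n_samples, size=n_resamples, replace=False, p=w / w.sum())
    return samples[keep]


def bootstrap_dofv(y, n_boot=500, seed=4):
    """dOFV of bootstrap estimates evaluated on the original data."""
    rng = np.random.default_rng(seed)
    base = ofv(mle(y), y)
    return np.array([ofv(mle(rng.choice(y, size=len(y), replace=True)), y) - base
                     for _ in range(n_boot)])


y = np.random.default_rng(1).normal(10.0, 2.0, size=30)   # hypothetical small study
resampled = sir(y)
dofv = bootstrap_dofv(y)
print("SIR mean of (mu, log sigma):", resampled.mean(axis=0))
print("bootstrap dOFV mean:", dofv.mean(), "(reference: 2, the chi-square(2) mean)")
```

Comparing the dOFV distribution with a chi-square distribution with as many degrees of freedom as estimated parameters shows whether the bootstrap produces parameter vectors that the original data contradict; a distribution clearly to the right of the reference points in that direction.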

Recommendation 20: We recommend the use of Sampling Importance Resampling to characterize the uncertainty of non-linear mixed effects model parameter estimates in small sample size studies. Non-estimability of parameters may be assessed using preconditioning. The use of the bootstrap model averaging method (Method 2) (Aoki et al., 2016) is recommended when conducting model-based decision-making after a trial. Robust model-based adaptive optimal designs may be used to improve model certainty in clinical trials.

2.7. Genetic factors influencing the response to the therapy (WP8)

The power of small population group trials is diminished by patient heterogeneity. Currently it is possible to gather large amounts of so-called “omics” (genomics, proteomics, metabolomics) data, which could be useful to describe this heterogeneity and increase the power of clinical trials, as well as to define groups of patients for personalized therapies. However, owing to the relatively small sample sizes, high-dimensional “omics” data require extensive pre-processing. The main goal here is to reduce the effective model size so that the model parameters can be precisely estimated with the limited number of patients. Within the IDeAl project, several new statistical methods for dimensionality reduction and identification of important genetic predictors were developed. Simulation studies confirm the good properties of these methods in the context of predicting patients’ response to treatment. New theoretical results were also obtained, which identify the range of biological scenarios under which popular methods for identifying important predictors are effective.

Bayesian methods for identification of genetic pathways involved in the development of disease and the response to the therapy

We developed a new approximate Bayesian methodology for identifying genetic pathways. The method clusters genes into pathways using high-dimensional gene expression data and can, in principle, be applied to reduce the dimensionality of any type of “omics” data. It is based on a non-trivial application of the K-means algorithm, in which the centre of each cluster is formed by a set of “principal components” and the distance of a given gene to a cluster centre is determined by the value of the BIC criterion in the respective multiple regression model. The dimensionality of a given pathway (number of principal components) is estimated using the PEnalized SEmi-integrated Likelihood method (PESEL), described in detail in Sobczyk et al. (2016) and implemented in the R package “PESEL”. The number of pathways is estimated using the modified Bayesian Information Criterion (mBIC), which allows prior biological knowledge to be incorporated. The full methodology is implemented in the R package “varclust” (Sobczyk and Josse, 2016) and can analyse data sets much larger than those that can be handled by competing methods.
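The K-means step described above can be sketched as follows. This is a minimal illustration under simplifying assumptions: the number of clusters K is fixed, every cluster centre uses the same number r of principal components (PESEL is not implemented), and the mBIC step for choosing K is omitted. The farthest-first seeding rule and all function names are hypothetical choices for this sketch, not the varclust API.

```python
import numpy as np


def principal_components(X, r):
    """First r principal component scores of the columns of X (n x m)."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :r] * s[:r]


def bic_regression(y, Z):
    """BIC of the least-squares regression of y on an intercept and the columns of Z."""
    n = y.size
    Z1 = np.column_stack([np.ones(n), Z])
    beta, *_ = np.linalg.lstsq(Z1, y, rcond=None)
    rss = np.sum((y - Z1 @ beta) ** 2)
    return n * np.log(rss / n) + Z1.shape[1] * np.log(n)


def initial_labels(X, K, rng):
    """Farthest-first seeding: each new seed is the variable least correlated with earlier seeds."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    seeds = [int(rng.integers(X.shape[1]))]
    while len(seeds) < K:
        seeds.append(int(np.argmin(corr[:, seeds].max(axis=1))))
    return np.argmax(corr[:, seeds], axis=1)


def cluster_variables(X, K, r=1, n_iter=30, seed=0):
    """K-means-type clustering of the columns of X with principal-component centres."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    labels = initial_labels(X, K, rng)
    for _ in range(n_iter):
        centres = []
        for k in range(K):
            members = np.flatnonzero(labels == k)
            if members.size == 0:  # re-seed an empty cluster with a random variable
                members = rng.integers(p, size=1)
            centres.append(principal_components(X[:, members], min(r, members.size)))
        # assign each variable to the centre whose regression yields the smallest BIC
        new_labels = np.array([np.argmin([bic_regression(X[:, j], C) for C in centres])
                               for j in range(p)])
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```

With a fixed r, the BIC penalty is identical for every centre, so the assignment reduces to comparing residual sums of squares; letting r vary per cluster, as PESEL does, is what makes the BIC comparison informative in the full method.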

Recommendation 21: We recommend using “varclust” for clustering gene expression data and for extracting a relatively small number of potential predictors of patients’ response to treatment.

Development and application of high dimensional model selection for identification of regulatory regions influencing detected pathways

Several methods of gene mapping were developed that can be used to identify regulatory regions as well as genes influencing important patient characteristics. We proposed a new method for identifying important genes in admixed populations (Szulc et al., 2017). The method, based on mBIC, enhances the power of gene identification by using both the genotype and the ancestry information of genetic markers. Two new convex methods for gene mapping, SLOPE and group SLOPE, were developed (Bogdan et al., 2015; Brzyski et al., 2017; Brzyski et al., 2017 submitted); they control the fraction of false discoveries when the true number of important genes is small or moderately large. Group SLOPE turns out to be particularly interesting for identifying rare recessive variants, which might be related to the development of rare diseases. The practical limitations of convex methods for identifying important predictors are explained mathematically in Su et al. (2017 submitted).
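As a rough illustration of how SLOPE works, the sketch below minimises 0.5 * ||y - Xb||^2 + sum_i lambda_i |b|_(i), where |b|_(1) >= |b|_(2) >= ... are the sorted absolute coefficients and lambda is non-increasing. It uses the BH-type sequence lambda_i = Phi^{-1}(1 - i q / (2p)) from the SLOPE literature, plain proximal gradient iterations, and a pool-adjacent-violators evaluation of the sorted-L1 proximal operator. It assumes unit-norm columns and unit noise variance (otherwise lambda should be scaled by sigma); it is a sketch, not the implementation in the authors' R packages, and group SLOPE is not covered.

```python
import numpy as np
from statistics import NormalDist


def bh_lambdas(p, q=0.1):
    """BH-type SLOPE sequence lambda_i = Phi^{-1}(1 - i q / (2p)), i = 1..p (non-increasing)."""
    nd = NormalDist()
    return np.array([nd.inv_cdf(1 - i * q / (2 * p)) for i in range(1, p + 1)])


def prox_sorted_l1(y, lam):
    """Proximal operator of J(b) = sum_i lam_i |b|_(i) for a non-increasing lam."""
    sign, a = np.sign(y), np.abs(y)
    order = np.argsort(a)[::-1]            # positions of |y| in decreasing order
    z = a[order] - lam                     # shift sorted magnitudes by the penalties
    # pool adjacent violators: project z onto the non-increasing cone
    vals, sizes = [], []
    for v in z:
        vals.append(v)
        sizes.append(1)
        while len(vals) > 1 and vals[-2] <= vals[-1]:
            v2, s2 = vals.pop(), sizes.pop()
            v1, s1 = vals.pop(), sizes.pop()
            vals.append((v1 * s1 + v2 * s2) / (s1 + s2))
            sizes.append(s1 + s2)
    x = np.empty_like(a)
    x[order] = np.repeat(np.maximum(vals, 0.0), sizes)  # clip at zero and undo the sort
    return sign * x


def slope(X, y, lam, n_iter=500):
    """Proximal gradient (ISTA) for 0.5 * ||y - X b||^2 + J_lam(b)."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)
        b = prox_sorted_l1(b - step * grad, step * lam)
    return b
```

Because the largest coefficients receive the largest penalties and the smaller ones progressively smaller penalties, SLOPE adapts its threshold to the number of signals, which is the mechanism behind the false discovery control mentioned above.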

mBIC2 and geneSLOPE can efficiently localize influential genes while controlling the fraction of false discoveries. Like LASSO, SLOPE allows for FDR control when the number of true causal genes is small or moderately large.

Recommendation 22: When mapping genes in admixed populations, we recommend using the information on the ancestry of genetic markers. We also recommend using both regular and group SLOPE, since regular SLOPE has higher power to detect additive gene effects, while group SLOPE allows rare recessive variants to be identified.

Statistical model relating response to the therapy in small population group trials based on identified genetic factors and other covariates, as well as their interactions

The developed methods (mBIC2 and SLOPE) were used to estimate the genetic background and gene-treatment interactions and to predict patients’ response to treatment. Subsequently, a procedure for identifying patients who respond to the treatment was proposed. The new methods were compared with classical approaches based on single marker tests and least squares estimation in the full model, as well as with the modern technique of adaptive least absolute shrinkage and selection operator (LASSO). Partial results of this study are reported in Frommlet et al. (2017 in preparation).
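For readers unfamiliar with mBIC2, the sketch below scores linear models with the criterion in the form we understand it from the mBIC literature, mBIC2 = n log(RSS) + k log(n) + 2k log(p/4) - 2 log(k!), where k is the number of selected markers out of p; this form should be checked against the original definition before use. The forward-selection search is an assumption made for brevity and is not necessarily the search strategy used in the IDeAl work.

```python
import math

import numpy as np


def rss(X, y, cols):
    """Residual sum of squares of the least-squares fit of y on an intercept and X[:, cols]."""
    n = y.size
    Z = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    r = y - Z @ beta
    return float(r @ r)


def mbic2(rss_value, n, p, k):
    """mBIC2 = n log(RSS) + k log(n) + 2k log(p/4) - 2 log(k!)."""
    return (n * math.log(rss_value) + k * math.log(n)
            + 2 * k * math.log(p / 4) - 2 * math.lgamma(k + 1))


def forward_mbic2(X, y, max_k=10):
    """Greedy forward selection; returns the visited model with the smallest mBIC2."""
    n, p = X.shape
    selected = []
    best_model, best_score = [], mbic2(rss(X, y, []), n, p, 0)
    for _ in range(max_k):
        remaining = [j for j in range(p) if j not in selected]
        # add the marker that reduces the residual sum of squares the most
        j_best = min(remaining, key=lambda j: rss(X, y, selected + [j]))
        selected.append(j_best)
        score = mbic2(rss(X, y, selected), n, p, len(selected))
        if score < best_score:
            best_model, best_score = list(selected), score
    return sorted(best_model), best_score
```

The 2k log(p/4) term grows with the number of candidate markers, which is what keeps the criterion from selecting many spurious markers when p exceeds n; the -2 log(k!) term relaxes the penalty as more markers enter, which distinguishes mBIC2 from mBIC.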

SLOPE, mBIC2 and adaptive LASSO have much better predictive properties than the methods based on single marker tests and the least-squares approach based on all available genetic data.

Single marker tests are very inefficient when the number of causal variants, k, is moderate or large, while the least squares approach performs poorly when k is small. mBIC2 and SLOPE have predictive properties similar to those of adaptive LASSO, with mBIC2 performing best (highest precision in predicting the prognostic index and identifying responsive patients) when the number of genetic markers is larger than the sample size.

SLOPE and mBIC2 achieve these good predictive properties using far fewer biomarkers than adaptive LASSO, which selects many uninformative SNPs. Comparing SLOPE and mBIC2, the two methods perform similarly for small k, while for larger k mBIC2 has better predictive properties.

Recommendation 23: If model building is based on highly correlated gene expression data, we recommend the use of SLOPE due to its computational tractability and good predictive properties.

1.8. Decision analysis (WP9)

The IDeAl project has covered several methodological areas concerning the design and analysis of clinical trials for small population groups. Work-package 9 (WP9) adds to this by analyzing decision making in trial design contexts. Furthermore, this work-package studies the interactions between different decision-making stakeholders and provides recommendations for regulators, reimbursers and trial sponsors.

We first analyzed the decision rules that different stakeholders may have. These types of decision models are used in all subsequent manuscripts. Jobjörnsson et al. (2016) consider a sponsor’s Phase III go/no-go decision and choice of sample size; given a successful trial, they also model the sponsor’s pricing and the reimburser’s reaction to it. We next analysed how the sponsor’s willingness to invest in a population of candidate drugs relates to the public incentive structure, expressed as requirements on clinical evidence (Miller and Burman, 2016 submitted). When a potentially predictive biomarker is present, we model how the design of the trial affects expected public benefit as well as commercial value (Ondra et al., 2016). Further aspects of adaptations are considered in Ondra et al. (2017 in preparation). Dosing and sizing are modelled, and a decision-theoretic framework for programme optimization is sketched, in Burman (2015). A purely societal perspective is set up in Jobjörnsson et al. (2016), where the goal function is simply to maximize the total health benefit in a limited population. In addition to several of the aspects studied in other WP9 publications, the thesis by Jobjörnsson (2016; Section 3.3) models the impact of a lack of transparency in the regulators’ benefit-risk evaluation on the optimal decisions taken by the commercial sponsor.

A general suggestion is to formulate decision rules in a formal Bayesian decision-theoretic framework. Even sub-optimal decisions can be modelled (Jobjörnsson et al., 2016) by explicitly assessing, from one stakeholder’s point of view, the uncertainty about how another stakeholder will make decisions in different scenarios.
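As a toy example of a Bayesian decision-theoretic go/no-go rule for a sponsor, the sketch below places a normal prior on the treatment effect, assumes a one-sided z-test with n patients per arm, and computes the prior predictive probability of a significant result. Expected utility is a hypothetical reward for success minus per-patient and fixed costs; the sponsor chooses the sample size that maximises it and goes ahead only if the maximum is positive. All utilities, costs and function names are illustrative assumptions and do not reproduce the models of the cited papers or the BDPOPT package.

```python
from statistics import NormalDist

N01 = NormalDist()


def prob_success(n, prior_mean, prior_sd, sigma, alpha=0.025):
    """Prior predictive probability that a one-sided z-test with n patients per arm is significant."""
    se = sigma * (2.0 / n) ** 0.5
    z = N01.inv_cdf(1 - alpha)
    # marginally over the prior, the estimated effect is N(prior_mean, prior_sd^2 + se^2)
    return N01.cdf((prior_mean - z * se) / (prior_sd**2 + se**2) ** 0.5)


def expected_utility(n, gain, cost_per_patient, fixed_cost, **prior):
    """Hypothetical sponsor utility: reward if significant, minus trial costs."""
    return gain * prob_success(n, **prior) - cost_per_patient * 2 * n - fixed_cost


def go_no_go(n_grid, gain, cost_per_patient, fixed_cost, **prior):
    """Choose the per-arm sample size maximising expected utility; 'go' only if it is positive."""
    best_n = max(n_grid, key=lambda n: expected_utility(n, gain, cost_per_patient, fixed_cost, **prior))
    best_u = expected_utility(best_n, gain, cost_per_patient, fixed_cost, **prior)
    return best_u > 0, best_n, best_u
```

For example, `go_no_go(range(5, 201), gain=100.0, cost_per_patient=0.1, fixed_cost=5.0, prior_mean=0.5, prior_sd=0.3, sigma=1.0)` returns a "go" decision together with the utility-maximising sample size. Replacing the sponsor's utility with a societal one, for instance total health benefit in the patient population, changes the optimal sample size, which is the kind of discrepancy discussed in this section.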

In the different publications, we have delivered guidance on how to formulate decision rules for different stakeholders. The second deliverable was a software tool that allows numerical solution of a wide variety of trial design optimization problems using a Bayesian decision-theoretic approach. The R package BDPOPT has been used for further research within IDeAl and has been made publicly available (Jobjörnsson, 2015). Results in terms of design optimization are provided in the different publications for the various situations they study. As seen in Jobjörnsson (2016) regarding regulatory rules, and in Jobjörnsson et al. (2016) regarding reimbursement rules, failure to communicate precise rules to other stakeholders may lead to suboptimal design and development decisions by sponsors. One recommendation is therefore to increase transparency in regulatory and payer decisions.

The methodology used in the work-package is based on decision theory. It has a distinct flavor of social science, when addressing policy issues, when discussing the formulation of utilities, and in assumptions about (so called) rational agents. This methodology also has some relevance to the important ethical issues around experimentation on human beings. We find that what is best for a patient, who may be included in a clinical trial, may be quite different from what gives the highest overall societal utility. We argue that the well-being of the individual patient must have priority (see Burman’s presentation at the EMA meeting, March 2017; cf. Ondra et al., 2016, page 14).

The third deliverable concerns investment decisions. It is perhaps not surprising that rational sponsors are keener to invest in drugs with larger market potential, and that sample sizes then also tend to increase. We find that this behavior is partly optimal from a public health perspective as well. However, there is often a discrepancy between sponsor and societal optimality. In the Ondra et al. (2016) model, larger sample sizes are generally favored from a public health view, and designs motivated by public health considerations more often focus on the biomarker-positive subpopulation. By applying mechanism design, that is, by explicitly considering how regulations will affect sponsor decisions, societal rules can be optimized. In the Miller and Burman (2016 submitted) framework, the sample size decreases with lower prevalence of the disease, and the regulatory requirements should be tailored to the population size. We recommend that societal decision rules be determined based on an understanding, and explicit modelling, of how they will inter-depend with commercial drug development decisions.

Recommendation 24: Formulate decision rules in a formal Bayesian decision theoretic framework.

Recommendation 25: Societal decision rules (regulation, reimbursement) should be determined based on explicit modelling of how they will inter-depend with commercial drug developing decisions.

Recommendation 26: Increase transparency in regulatory and payer decisions.

Recommendation 27: The well-being of the individual trial patient must have priority.

1.9. Biomarker surrogate endpoints (WP10)

The major objective of WP10 was to develop an efficient and feasible framework for biomarker and surrogate endpoint evaluation in small population group clinical trials. This framework should incorporate, to the maximal extent possible, missing-data aspects, design aspects such as randomisation methodology, optimal design, adaptive designs, decision theory, mixed models and cross-over trials, as well as genetic markers and dose-response information. Simulation-based and other efficient estimation and evaluation methods should be used.

A viable framework for biomarker and surrogate endpoint evaluation in small population groups

Causal inference concepts have been used in the surrogate marker evaluation literature, but these developments were largely independent of the meta-analytic and information-theoretic approaches. Yet it is valuable to integrate all these frameworks to arrive at an optimal surrogate marker evaluation framework. Therefore, Alonso, Van der Elst and Molenberghs (2015) proposed a causal-inference-based framework for the evaluation of surrogate endpoints. The relationship between the causal-inference framework and the two existing frameworks was examined: the relationship with the meta-analytic paradigm in Alonso et al. (2015) and Van der Elst et al. (2016), and the relationship with the information-theoretic framework in Alonso et al. (2016). The results are also presented in the book by Alonso et al. (2017), in which Chapter 15 is devoted to surrogate endpoints in rare diseases.

Recommendation 28: For small trials, in particular when they vary in size, we recommend the use of the causal inference framework combined with efficient computational methods.

Surrogate endpoints and missing values

Missing data frequently arise in clinical trials, but the sensitivity of the different surrogate marker evaluation methods to missingness had not yet been studied. A large body of theory and methods for dealing with missing values has been developed in other areas of statistics (e.g. longitudinal data analysis and survey research), and a number of these results are valuable for surrogate endpoint evaluation as well. The conventional meta-analytic and information-theoretic frameworks imply maximum likelihood estimation, which is valid when data are missing at random. However, maximum likelihood may be prohibitive in small studies, and therefore pseudo-likelihood and inverse probability weighting methods have been developed for missing data, allowing the use of efficient computational methods based on pseudo-likelihood. Results are presented in Hermans, Birhanu, Sotto et al. (2017).

Recommendation 29: In case of the evaluation of surrogate endpoints in small trials subject to missingness, we recommend the use of pseudo-likelihood estimation with proper inverse probability weighted and doubly robust corrections.
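The inverse probability weighting and doubly robust ideas behind Recommendation 29 can be illustrated on the simplest possible target, the mean of an outcome that is missing at random given a covariate. The sketch below is a generic illustration and not the pseudo-likelihood surrogate evaluation method itself: it assumes a logistic model for the probability of being observed, a linear outcome model, a normalised (Hajek) IPW estimator, and the standard augmented IPW (doubly robust) estimator, which remains consistent if either of the two working models is correct.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson; X must already contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1 - p)
        beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))
    return beta


def ipw_and_dr_mean(x, y, observed):
    """IPW and doubly robust (AIPW) estimates of E[Y] when Y is missing at random given x."""
    X = np.column_stack([np.ones_like(x), x])
    r = observed.astype(float)
    pi = 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, r)))  # estimated P(observed | x)
    y0 = np.where(observed, y, 0.0)                     # missing outcomes never enter
    ipw = np.sum(r * y0 / pi) / np.sum(r / pi)          # normalised (Hajek) IPW estimator
    # outcome regression fitted on complete cases, predicted for everyone
    gamma, *_ = np.linalg.lstsq(X[observed], y[observed], rcond=None)
    m = X @ gamma
    dr = np.mean(r * (y0 - m) / pi + m)
    return ipw, dr
```

In a setting where higher covariate values make an outcome more likely to be observed, the complete-case mean is biased upwards, while both weighted estimators recover the population mean.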

The incorporation of design aspects

Having markers available is an important start towards using them efficiently, but it is by no means sufficient. Specific design aspects have to be taken into account (adaptive designs, cross-over trials), and the validation studies need to be optimised from various angles (using randomisation methodology, optimal design results and decision theory). State-of-the-art (non-linear) mixed model methodology also needs to be incorporated. For this, it is important to have efficient and stable estimation strategies at one’s disposal (Flórez Poveda et al., 2017 submitted).

Recommendation 30: In case of hierarchical and otherwise complex designs, we recommend using principled, yet fast and stable, two-stage approaches.

The use of genetic information

The book by Alonso et al. (2017) describes results on the use of genetic markers and genomics-based markers. Specific challenges make the traditional validation framework less appropriate, in particular the huge amount of data combined with relatively little replication (see also Nasiri et al., 2017). This was addressed in a concerted effort targeting biomarkers that can realistically be used in this context; Chapters 16 and 17 of the book are particularly relevant. The book in general, and these chapters in particular, are accompanied by user-friendly SAS macros, R functions and Shiny Apps.

Recommendation 31: For genetic and other high-dimensional markers, we recommend using the methodology expressly developed for this context, in conjunction with the software tools made available.

Incorporating dose-response information

Dose-response information is extremely valuable for markers in general and surrogate endpoints in particular, because it allows the differential effect of surrogates to be studied as a function of dose. Such differential aspects had been acknowledged but not properly studied. This problem has been placed in the broader context of multivariate and even high-dimensional surrogate endpoints and studied in Alonso et al. (2017; Chapter 16) within the so-called QSTAR framework.

Recommendation 32: For a surrogate with dose-response or other multivariate information, we recommend using the results of the Quantitative Structure Transcription Assay Relationship (QSTAR) framework.

Efficient computational methods, simulation-based and other

When surrogate markers are evaluated, multiple units (centers, trials, etc.) are needed, no matter which paradigm is used. It is well known that full likelihood estimation is usually prohibitive in such complex hierarchical settings, in particular when trials are of unequal (and small) size. This phenomenon has been examined by Van der Elst et al. (2016). Building on this, Hermans, Birhanu et al. (2017, 2017 submitted) propose solutions based on weighting methods for simple but generic longitudinal settings with units of unequal size; these articles and the references therein provide a theoretical basis. Further, Flórez Poveda et al. (2017) provide a theoretical and practical examination of such weighting methods for the specific context of surrogate endpoints. Throughout the book by Alonso et al. (2017), SAS macros, R functions and Shiny Apps are provided that implement these methods in a user-friendly way.

Recommendation 33: In case of the evaluation of surrogate endpoints in small studies, we recommend using weighting based methods, because the methodology has been shown to work well theoretically, because it has been implemented in user-friendly SAS and R software, and because its practical performance is fast and stable.
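To make the two-stage, weighting-based idea concrete, the sketch below computes a trial-level and an individual-level R^2 in the spirit of the meta-analytic surrogate evaluation framework. It makes strong simplifying assumptions: stage 1 estimates the per-trial treatment effects on the surrogate and the true endpoint as differences in arm means; stage 2 regresses the true-endpoint effects on the surrogate effects with weights proportional to trial size; the individual-level association is the squared correlation of residuals around trial-by-arm means. The weighting scheme and function name are illustrative and do not reproduce the methods of the cited papers or their software.

```python
import numpy as np


def two_stage_surrogacy(trial, treat, s, t):
    """Trial-level R^2 (trial-size weighted) and individual-level R^2 from a two-stage fit.

    trial: trial identifiers; treat: 0/1 arm indicator; s, t: surrogate and true endpoints.
    """
    alpha, beta, size, res_s, res_t = [], [], [], [], []
    for i in np.unique(trial):
        m = trial == i
        z, si, ti = treat[m], s[m], t[m]
        # stage 1: per-trial treatment effects as differences in arm means
        alpha.append(si[z == 1].mean() - si[z == 0].mean())
        beta.append(ti[z == 1].mean() - ti[z == 0].mean())
        size.append(m.sum())
        # residuals around trial-by-arm means feed the individual-level association
        for arm in (0, 1):
            res_s.append(si[z == arm] - si[z == arm].mean())
            res_t.append(ti[z == arm] - ti[z == arm].mean())
    alpha, beta = np.asarray(alpha), np.asarray(beta)
    w = np.asarray(size, dtype=float)
    w /= w.sum()
    # stage 2: weighted regression of beta on alpha; weighted R^2 = squared weighted correlation
    da, db = alpha - w @ alpha, beta - w @ beta
    r2_trial = (w @ (da * db)) ** 2 / ((w @ da**2) * (w @ db**2))
    r2_indiv = np.corrcoef(np.concatenate(res_s), np.concatenate(res_t))[0, 1] ** 2
    return float(r2_trial), float(r2_indiv)
```

Each stage is a closed-form computation, which illustrates why two-stage and weighting approaches remain fast and stable where a full hierarchical likelihood fit over many small, unequal trials can fail to converge.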

3. Beyond IDeAl DoW

As described in the previous chapters, IDeAl has contributed a significant number of new results to the most important areas of statistical design and analysis of small population clinical trials, thereby refining current methodology. In addition, the IDeAl description of work (DoW) stimulated further research within the group, which was pursued in parallel and goes far beyond the initial IDeAl research plans. Some of these further results have already been summarized in scientific publications, whereas others are still work in progress. Several presentations of this work are planned:

• Invited talk by Stephen Senn “Randomisation isn’t perfect but doing better is harder than you think.” 3 May 2017, Fourth Bayesian, Fiducial, and Frequentist Conference (BFF4)

• Invited talk by Frank Bretz “Threshold-crossing: A Useful Way to Establish the Counterfactual in Clinical Trials?” at the BBS Spring Seminar “The use of external data for decision making”, 5 May 2017

• Invited talk by Franz König “Threshold-crossing: A Useful Way to Establish the Counterfactual in Clinical Trials?” at PSI Conference, 16 May 2017.

• Invited talk by Stephen Senn “Thinking Statistically. What Counts and What Doesn't?” at CASI 2017, 37th Conference on Applied Statistics. Ireland. 15.-17. May 2017.

• Invited talk by Ralf-Dieter Hilgers about IDeAl and randomization, entitled “IDeAl Randomization”, on 17 May 2017 at the Department of Biometrie and Clinical Research, Seminar über neuere Methoden der Biometrie (seminar on recent methods in biometry)

• Several presentations and posters at the ISCB, 9-13 July 2017, in Vigo, Spain

• Invited talk by Ralf-Dieter Hilgers on IDeAl findings within the contributed session on small clinical trials at the CEN-ISBS Vienna 2017 meeting, Vienna, 28 August-1 September 2017

• Invited talk by Ralf-Dieter Hilgers about IDeAl findings and future perspectives at the asterix end symposium, 18th-19th, 2017, in Zaandam

• Invited talk by Ralf-Dieter Hilgers as keynote speaker, entitled “Statistical designs of small population trials”, and participation in a panel discussion at Novartis, Basel, 16-17 October 2017

• Invited tutorial “Regulatory statistics with some European perspectives” by Franz König, Martin Posch and Frank Bretz at the 73rd Deming Conference on Applied Statistics, 7 December 2017, Atlantic City, USA. http://www.demingconference.com/

• Invited short course “Adaptive designs and multiple testing” by F. König, M. Posch and F. Bretz at the 73rd Deming Conference on Applied Statistics, 7 December 2017, Atlantic City, USA. http://www.demingconference.com/

• Presentation at the Biometrisches Kolloquium (German Region) March, 2018 in Frankfurt/Main within the session “Rare Diseases”

• Presentation at the conference “Design of Experiments: New Challenges”, 30 April to 4 May 2018, at the Centre International de Rencontres Mathématiques, Marseille

• Organisation of a session on randomization by Nicole Heussen at the Ninth International Workshop on Simulation, Barcelona, 18-22 June 2018

Some joint research led to recommendations for specific applications. For instance, the lack of standards for reporting clinical trials that use a crossover layout to evaluate analgesic treatments for chronic pain resulted in a paper published in PAIN (Gewandter et al., 2016), which recommends paying special attention to missing data, the analysis method, and the reporting of sensitivity analyses, e.g. of the treatment x period interaction. Statistical design considerations for first-in-human studies, which are usually small and are necessary in all drug development programs, were discussed in Bird et al. (2017). The six key issues highlighted in that paper are dose determination, availability of pharmacokinetic results, dosing interval, stopping rules, appraisal by a safety committee, and a clear algorithm when combining approvals for single and multiple ascending dose studies. Further, in basal cell carcinoma, the lack of knowledge from historical data led to the recommendation of a registry (Rübben et al., 2016).

In the area of personalized medicine, interindividual differences in the magnitude of response to an exercise training program (subject-by-training interaction; “individual response”) have received increasing scientific interest and have been investigated from a statistical perspective (Hecksteden et al., 2015).

The following description of concomitant research is structured according to IDeAl’s finding that three levels of action are necessary to improve statistical design and analysis methodology in small population clinical trials (Hilgers, König et al., 2016).

The first level concerns the rigorous use of the best design and analysis methods currently known and available. As linear mixed effects models are now well established, they can be used successfully to evaluate endpoints in longitudinal data settings. Van der Elst described useful linear mixed effects models for estimating the reliability of repeatedly measured endpoints (Van der Elst, Molenberghs, Hilgers et al., 2016). This is, of course, only one aspect of endpoint evaluation, and Van der Elst applied the methodology successfully to a trial with a small number of participants. Linear mixed effects models are also useful for describing changes in treatment effects by using the slope rather than the difference in change. The slope is more sensitive to change and may show smaller variation, which makes it easier to detect differences between treatments. This was observed when linear mixed effects models were applied to the 2-year SARA registry data of the European Friedreich’s Ataxia Consortium for Translational Studies (EFACTS) (Reetz et al., 2016).
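The simplest instance of such a reliability estimate is the intraclass correlation from a random-intercept model, y_ij = mu + b_i + e_ij, with reliability var(b) / (var(b) + var(e)). The sketch below estimates it for balanced data (every subject measured on the same number of occasions) using the classical one-way ANOVA moment estimator. This is an assumption made for brevity; the models of Van der Elst et al. are more general and handle unbalanced data and additional covariance structures.

```python
import numpy as np


def icc_oneway(Y):
    """Reliability (ICC) from a balanced subjects x occasions matrix via one-way ANOVA.

    Corresponds to the random-intercept model y_ij = mu + b_i + e_ij with
    ICC = var(b) / (var(b) + var(e)).
    """
    n, k = Y.shape
    subject_means = Y.mean(axis=1)
    grand = Y.mean()
    msb = k * np.sum((subject_means - grand) ** 2) / (n - 1)          # between-subject mean square
    msw = np.sum((Y - subject_means[:, None]) ** 2) / (n * (k - 1))   # within-subject mean square
    var_b = max((msb - msw) / k, 0.0)  # truncate a negative moment estimate at zero
    return var_b / (var_b + msw)
```

With a between-subject standard deviation of 2 and a measurement error standard deviation of 1, the estimate should be close to 4 / (4 + 1) = 0.8.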

The second level concerns limitations of well-known approaches; in this respect, the use of randomization-based inference in linear mixed effects models was considered. As IDeAl has shown that randomization procedures differ, the question arises whether population-based and randomization-based inference in linear mixed effects models with longitudinal data coincide in small trials (Burger, 2017 in preparation).

Another question concerns the unknown value of stratified analysis and stratified randomization in small population clinical trials. As stratified randomization is recommended in particular for small clinical trials (CHMP 2007), it remains to be answered whether stratification is useful in the design phase, i.e. implemented as a stratified or covariate-adaptive randomization procedure, or in the analysis phase, i.e. implemented by including covariates in the statistical model. This is currently being investigated for dichotomous endpoints by Fitzner et al. (2017 in preparation).

Adaptive enrichment designs have received attention because they have the potential to make the drug development process for personalized medicine more efficient. In Sugitani et al. (2017, submitted) we investigated different flexible alpha allocation strategies that allow testing of both an overall population and a targeted subgroup. We showed that allowing an adaptive interim analysis to decide whether the full population or only the targeted subgroup should be included for the remainder of the trial can substantially improve power.
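A generic two-stage adaptive enrichment test can be sketched as a closed testing procedure with the inverse normal combination of stage-wise p-values. The intersection hypothesis of "no effect in the full population" and "no effect in the subgroup" is tested with a weighted Bonferroni p-value, where the weight a determines how alpha is split between the two populations. At the interim analysis, the trial continues with the full population, the subgroup, or both. This is a standard textbook construction offered as an assumption-laden illustration, not the specific allocation strategies of Sugitani et al.; the equal stage weights, a = 0.5 and the function names are hypothetical.

```python
from statistics import NormalDist

N01 = NormalDist()
EPS = 1e-12


def inverse_normal(p1, p2, w1=0.5 ** 0.5):
    """Inverse normal combination of independent one-sided stage-wise p-values."""
    w2 = (1 - w1**2) ** 0.5
    p1, p2 = (min(max(p, EPS), 1 - EPS) for p in (p1, p2))  # keep inv_cdf finite
    z = w1 * N01.inv_cdf(1 - p1) + w2 * N01.inv_cdf(1 - p2)
    return 1 - N01.cdf(z)


def weighted_bonferroni(p_f, p_s, a):
    """Intersection p-value giving weight a to the full population and 1 - a to the subgroup."""
    return min(1.0, p_f / a, p_s / (1 - a))


def enrichment_test(p1, p2, selection, a=0.5, alpha=0.025):
    """Closed test of H_F (full population) and H_S (subgroup) after interim selection.

    p1, p2: dicts of stage 1 and stage 2 p-values keyed by "F" and "S";
    selection: "both", "full" or "sub" (populations continued after the interim).
    """
    continued = {"both": ("F", "S"), "full": ("F",), "sub": ("S",)}[selection]
    p1_int = weighted_bonferroni(p1["F"], p1["S"], a)
    if selection == "both":
        p2_int = weighted_bonferroni(p2["F"], p2["S"], a)
    else:
        p2_int = p2[continued[0]]  # only one hypothesis continues to stage 2
    reject_int = inverse_normal(p1_int, p2_int) <= alpha
    return {h: bool(h in continued and reject_int and inverse_normal(p1[h], p2[h]) <= alpha)
            for h in ("F", "S")}
```

For instance, if the interim analysis drops the full population and continues in the subgroup only, `enrichment_test({"F": 0.04, "S": 0.01}, {"S": 0.005}, selection="sub")` rejects the subgroup hypothesis while the full-population hypothesis is retained. Because the stage-wise p-values are combined with prespecified weights, the familywise type I error is controlled even though the selection is data-driven.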

The third level concerns the development of new statistical approaches to the design and analysis of small population clinical trials. Although randomization-based inference has been shown to be useful, in particular in small clinical trials, it is not understood how missing observations should be handled in randomization-based inference. The question traces back to the formulation of the so-called reference set and was recently investigated in a paper by Hilgers, Rosenberger and Heussen (2017 submitted).
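To make the notion of a reference set concrete, the sketch below performs a Monte Carlo randomization test in which the reference set consists of allocation sequences generated by the same randomization procedure that produced the observed allocation. Here that procedure is assumed to be permuted block randomization with equal allocation. The sketch covers complete data only and deliberately does not address the open question of how missing observations should be handled; block size, test statistic and function names are illustrative assumptions.

```python
import numpy as np


def permuted_block_sequence(n, block_size, rng):
    """One 0/1 allocation sequence from a permuted block design with equal allocation."""
    n_blocks = -(-n // block_size)  # ceiling division
    blocks = [rng.permutation(np.repeat([0, 1], block_size // 2)) for _ in range(n_blocks)]
    return np.concatenate(blocks)[:n]


def randomization_test(y, z, block_size=4, n_sim=10000, seed=0):
    """Two-sided Monte Carlo randomization p-value for the difference in means.

    The reference set is sampled from the same permuted block procedure that
    produced z; complete data are assumed.
    """
    rng = np.random.default_rng(seed)
    observed = y[z == 1].mean() - y[z == 0].mean()
    count = 0
    for _ in range(n_sim):
        zs = permuted_block_sequence(y.size, block_size, rng)
        stat = y[zs == 1].mean() - y[zs == 0].mean()
        count += abs(stat) >= abs(observed)
    return (count + 1) / (n_sim + 1)
```

If some outcomes were missing, one would have to decide whether the reference set is built from sequences over all randomized patients or only over those observed, and this choice is exactly where the reference-set question raised above becomes relevant.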

Second, missing values sometimes arise because measurements are undetectable, i.e. below a detection or quantification limit. This problem is relevant in pharmacometrics and occurs in most clinical trials. How to derive estimates in this setting in multicentre trials, where measurements from various laboratories are included in the analysis, is investigated by Berger et al. (2017 in preparation).
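One standard way to use undetectable measurements instead of discarding them is to treat them as left-censored at the limit of quantification. The sketch below estimates the mean and standard deviation of normally distributed measurements by maximum likelihood using the EM algorithm. Because each observation carries its own limit, laboratories with different limits can be combined in one analysis. It is an illustrative assumption, not the method of Berger et al.: it uses a single normal distribution without covariates or centre effects and the truncated-normal moment formulas E[X | X < c] = mu - sigma * lam and Var[X | X < c] = sigma^2 (1 - beta * lam - lam^2), with beta = (c - mu) / sigma and lam = phi(beta) / Phi(beta).

```python
import numpy as np
from statistics import NormalDist

N01 = NormalDist()


def censored_normal_em(y, censored, lloq, n_iter=200):
    """ML estimates of (mu, sigma) for normal data left-censored at per-observation limits.

    y: measured values (ignored where censored); censored: boolean mask;
    lloq: limit of quantification for every observation (may differ by laboratory).
    """
    obs = y[~censored]
    c = lloq[censored]
    n = y.size
    mu, sigma = obs.mean(), obs.std()
    for _ in range(n_iter):
        # E-step: moments of X | X < c under the current parameters
        beta = (c - mu) / sigma
        lam = np.array([N01.pdf(b) / N01.cdf(b) for b in beta])  # inverse Mills ratio
        e1 = mu - sigma * lam
        e2 = sigma**2 * (1 - beta * lam - lam**2) + e1**2
        # M-step: complete-data ML estimates from the expected sufficient statistics
        mu = (obs.sum() + e1.sum()) / n
        sigma = np.sqrt((np.sum(obs**2) + e2.sum()) / n - mu**2)
    return mu, sigma
```

Simply dropping the censored values and averaging the rest overestimates the mean, because only the upper part of the distribution is retained; the censored likelihood corrects for this.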

New medicines for children should be subject to rigorous examination while avoiding unnecessary experimentation. Extrapolating from adult data can reduce uncertainty about a drug’s effects in younger patients, meaning that smaller trials may suffice. Assuming that the paediatric trial is conducted only after a significant beneficial effect has been demonstrated in adults, this goal is achieved by adopting a Bayesian approach that incorporates the adult data into the design and analysis of the paediatric trial (Hampson et al., 2017 in preparation).
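A minimal way to see how adult data can shrink paediatric uncertainty is a conjugate normal-normal update in which the adult estimate forms the prior for the paediatric treatment effect, with its precision multiplied by a discount weight between 0 (ignore the adults) and 1 (full pooling). This fixed-weight discounting, similar in spirit to a power prior, is an assumption chosen for illustration and is not necessarily the prior construction used by Hampson et al.; all argument names are hypothetical.

```python
from statistics import NormalDist


def borrow_adult_prior(adult_mean, adult_se, child_mean, child_se, weight=0.5):
    """Normal-normal update of the paediatric effect with a down-weighted adult prior.

    weight in [0, 1] scales the prior precision (0 = ignore adults, 1 = full pooling).
    Returns the posterior mean, posterior SD and posterior probability of a benefit (> 0).
    """
    prior_prec = weight / adult_se**2
    data_prec = 1.0 / child_se**2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * adult_mean + data_prec * child_mean)
    prob_benefit = 1 - NormalDist(post_mean, post_var**0.5).cdf(0.0)
    return post_mean, post_var**0.5, prob_benefit
```

With weight 0 the posterior reproduces the paediatric estimate alone; increasing the weight pulls the estimate towards the adult result and narrows the posterior, which is how a smaller paediatric trial can still reach a convincing conclusion. The choice of weight encodes how similar the two populations are believed to be.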

The design of a combined seamless phase I/II trial, as recommended by the Japan Interventional Radiology in Oncology Study Group (Kobayashi et al., 2009), is elaborated in a bachelor’s thesis supervised by Ralf-Dieter Hilgers and Marcia Rückbeil. The design is intended to address questions of safety and efficacy while keeping the sample size very small. Furthermore, a master’s thesis supervised by Ralf-Dieter Hilgers and Marcia Rückbeil tackles the problem of how to incorporate registry data into a randomized clinical trial using a frequentist approach.

Finally, big data aspects are present in small population research as well. A bridge between big data and small population clinical trials was built, resulting in recommendations for a European Union action plan (Auffray et al., 2016).

Ralf-Dieter Hilgers is currently working on two papers summarizing IDeAl results at different levels. The first is a synergy paper, written together with Kit Roes (asterix) and Nigel Stallard (InSPiRe), that develops a joint statement of all three projects as an overall perspective. The second, summarizing the IDeAl view more specifically, is being prepared by the IDeAl group.

1.4 Summary

To summarize IDeAl’s results:

• In WP 2 we developed a new methodology for the selection of the best practice randomization procedure and subsequent analysis for a small population clinical trial taking possible bias into account.

• In WP 3 we developed a new optimized design and analysis strategy for comparing dose response profiles to extrapolate clinical trial results from a large to a small population.

• In WP 4 we developed statistical methods to adapt the significance level and allow confirmatory decision-making in clinical trials with vulnerable, small populations.

• In WP 5 we developed design evaluation methods enabling small clinical trials to be analysed through modelling of continuous or discrete longitudinal outcomes.

• In WP 6 we developed approaches to planning and analysing trials for identifying individual response and examining treatment effects in small populations.

• In WP 7 we developed new methods for sample size calculation, type 1 error control, model averaging and parameter precision in small population group trials within non-linear mixed effects modelling.

• In WP 8 we developed new methods for identifying biomarkers and prognostic scores based on high dimensional genetic data in small population group trials.

• In WP 9 we evaluated how to optimise the overall value of drug development to patients, to regulators and to society under opacity in regulatory and payer rules as well as in very rare diseases.

• In WP 10 we developed methodology to evaluate potential surrogate markers and to analyse data from small numbers of small trials, with emphasis on fast and easy computational strategies.

From these results we derive the following list of recommendations:

Recommendation 1. Do not select a randomization procedure based on arbitrary arguments; use scientific arguments that take into account the expected magnitude of bias.

Recommendation 2. In the case of a randomized clinical trial, emphasis should be given to the selection of the randomization procedure, following ERDO and using randomizeR.

Recommendation 3. In the case of a randomized clinical trial, we recommend conducting a sensitivity analysis to assess the impact of bias on the type-I-error probability.

Recommendation 4. The comparison of dose response curves should be done by the bootstrap approach developed by Dette et al. (2017) instead of Gsteiger et al. (2011).

Recommendation 5. If the aim of the study is the extrapolation of efficacy and safety information, we recommend considering and comparing the MEDs of the two given populations.

Recommendation 6. The derived methodology shows a very robust performance and can be used also in cases where no precise information about the functional form of the regression curves is available.

Recommendation 7. When planning a dose-finding study comparing two populations, we recommend using optimal designs in order to achieve substantially more precise results.

Recommendation 8. In case of confirmatory testing, we recommend adapting the significance level by incorporating other information (e.g. using information from drug development programs in adults for designing and analysing pediatric trials).

Recommendation 9. In case of design modification during the conduct of a confirmatory clinical trial, we recommend using adaptive methods to ensure that the type-I-error is sufficiently controlled not to endanger confirmatory conclusions. Especially in clinical trials with multiple objectives, special care has to be taken to address the several sources of multiplicity.

Recommendation 10. If randomized controlled clinical trials are infeasible, we propose “threshold-crossing” designs within an adaptive development program as a way forward to enable comparison between different treatment options.

Recommendation 11. For evaluation of designs of studies with longitudinal discrete or time to event data, evaluation of the Fisher Information matrix should be done without linearization. Using the new approach MC-HMC (in MIXFIM) will provide adequate prediction of standard errors and allow to compare several designs.

Recommendation 12. When there is little information on the value of the parameters at the design stage, adaptive designs can be used. Two-stage balanced designs are a good compromise. The new version of PFIM can be used for adaptive design with continuous longitudinal data.

Recommendation 13. When there is uncertainty in the model and its parameters, a robust approach across candidate models should be used to design studies with longitudinal data.

Recommendation 14. We recommend that response should not be defined using arbitrary and naïve dichotomies but that it should be analysed carefully paying due attention to components of variance and where possible using designs to identify them.

Recommendation 15. For the analysis of n-of-1 trials, we recommend using a modified fixed-effects meta-analysis where the objective is to establish that the treatment works, and an approach through mixed models if variation in response to treatment is to be studied.

Recommendation 16. When analysing between-patient studies we recommend avoiding information destroying transformations (such as dichotomies) and exploiting the explanatory power of covariates, which may be identified from ancillary studies and patient databases.

Recommendation 17. When conducting a series of n-of-1 trials, we recommend paying close attention to the purpose of the study and calculating the sample size accordingly, using the approach of Senn (2017).

Recommendation 18. If fast computations of power curves are needed for a non-linear mixed effects model, we recommend using the parametric power estimation algorithm as implemented in the stochastic simulation and estimation (SSE) tool of PsN (potentially with a type I error correction based on the “randtest” tool in PsN).

Recommendation 19. The simulation methods described above can be utilized to investigate the effects of using different, smaller, more parsimonious models to evaluate data from complicated biological systems prior to running a clinical study.

Recommendation 20. We recommend the use of Sampling Importance Resampling to characterize the uncertainty of non-linear mixed effects model parameter estimates in small sample size studies. Non-estimability of parameters may be assessed using preconditioning. The bootstrap model averaging method (Method 2) (Aoki et al., 2016) is recommended when conducting model-based decision-making after a trial. Robust model-based adaptive optimal designs may be used to improve model certainty in clinical trials.

Recommendation 21. We recommend using “varclust” for clustering of gene expression data and extraction of a relatively small number of potential predictors of patients’ response to the treatment based on gene expression data.

Recommendation 22. It is recommended to use information on the ancestry of genetic markers when mapping genes in admixed populations. It is also recommended to use both regular and group SLOPE, since regular SLOPE has higher power to detect additive gene effects, while group SLOPE allows for the identification of rare recessive variants.

Recommendation 23. If model building is based on highly correlated gene expression data, we recommend the use of SLOPE due to its computational tractability and good predictive properties.

Recommendation 24. Formulate decision rules in a formal Bayesian decision theoretic framework.

Recommendation 25. Societal decision rules (regulation, reimbursement) should be determined based on explicit modelling of how they will inter-depend with commercial drug developing decisions.

Recommendation 26. Increase transparency in regulatory and payer decisions.

Recommendation 27. The well-being of the individual trial patient must have priority.

Recommendation 28. In the case of small trials, particularly when the trials vary in size, we recommend the use of the causal inference framework, combined with efficient computational methods.

Recommendation 29. In case of the evaluation of surrogate endpoints in small trials subject to missingness, we recommend the use of pseudo-likelihood estimation with proper inverse probability weighted and doubly robust corrections.

Recommendation 30. In case of hierarchical and otherwise complex designs, we recommend using principled, yet fast and stable, two-stage approaches.

Recommendation 31. In case of genetic and otherwise high-dimensional markers, we recommend using the methodology expressly developed for this context, in conjunction with the software tools made available.

Recommendation 32. In case of a surrogate with dose-response or otherwise multivariate information present, we recommend using the results of the Quantitative Structure Transcription Assay Relationship framework.

Recommendation 33. In case of the evaluation of surrogate endpoints in small studies, we recommend using weighting-based methods, because the methodology has been shown to work well theoretically, has been implemented in user-friendly SAS and R software, and is fast and stable in practice.
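Recommendation 20 names Sampling Importance Resampling (SIR; Rubin, 1988) for characterizing parameter uncertainty. The following is a minimal, illustrative sketch of the generic SIR idea in Python with NumPy, not the PsN implementation: draws from a proposal distribution are weighted by the ratio of target to proposal density and then resampled in proportion to those weights. The helper names (`sir`, `log_normal_pdf`) and the one-dimensional normal toy problem are assumptions made for illustration only; in a non-linear mixed effects setting the proposal would, for example, be a multivariate normal around the parameter estimates and the target an unnormalized likelihood-based density.

```python
import numpy as np


def sir(proposal_draws, log_target, log_proposal, m, replace=False, seed=None):
    """Sampling Importance Resampling (hypothetical helper, not the PsN code).

    proposal_draws : array of shape (M, d) sampled from the proposal density.
    log_target     : callable, log of the (unnormalised) target density per row.
    log_proposal   : callable, log of the proposal density per row.
    m              : number of draws to keep; should be much smaller than M.
    """
    rng = np.random.default_rng(seed)
    # Importance ratio target/proposal on the log scale for each proposal draw.
    log_w = log_target(proposal_draws) - log_proposal(proposal_draws)
    # Subtract the maximum before exponentiating to avoid overflow.
    w = np.exp(log_w - log_w.max())
    p = w / w.sum()
    # Resample m draws with probability proportional to the importance weights.
    idx = rng.choice(len(proposal_draws), size=m, replace=replace, p=p)
    return proposal_draws[idx]


def log_normal_pdf(x, mean, sd):
    # Row-wise log density of independent normals, summed over columns.
    z = (x - mean) / sd
    return np.sum(-0.5 * z**2 - np.log(sd) - 0.5 * np.log(2 * np.pi), axis=1)


# Toy check (assumed example): target N(1, 0.5^2), wider proposal N(0, 1.5^2).
rng = np.random.default_rng(1)
draws = rng.normal(0.0, 1.5, size=(20000, 1))
resampled = sir(
    draws,
    log_target=lambda x: log_normal_pdf(x, 1.0, 0.5),
    log_proposal=lambda x: log_normal_pdf(x, 0.0, 1.5),
    m=2000,
    seed=2,
)
print(resampled.mean(), resampled.std())  # roughly 1 and 0.5
```

Resampling without replacement (the default in this sketch) keeps each proposal draw at most once, so the resampled set only approximates the target; this is why the proposal sample should be much larger than `m`. Setting `replace=True` instead draws each of the `m` values independently in proportion to its weight.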

The following list contains references not included in the appendix.

(1) Benzi, M. (2002): Preconditioning techniques for large linear systems: a survey. Journal of Computational Physics 182, 418–477.

(2) Berger, R. L. (1982): Multiparameter hypothesis testing and acceptance sampling. Technometrics 24, 295–300.

(3) Committee for Medicinal Products for Human Use (2006): Guideline on clinical trials in small populations. [Online]. [Cited: February 1, 2013.] www.ema.europa.eu/docs/en_GB/document_library/Scientific_guideline/2009/09/WC500003615.pdf.

(4) Danhof, M., de Lange, E. C. M., Della Pasqua, O. E., Ploeger, B. A. & Voskuyl, R. A. (2008): Mechanism-based pharmacokinetic-pharmacodynamic (PK-PD) modeling in translational drug research. Trends in Pharmacological Sciences 29, 186–191.

(5) Dumont, C., Chenel, M. & Mentré, F. (2016): Two-stage adaptive designs in nonlinear mixed effects models: application to pharmacokinetics in children. Communications in Statistics - Simulation and Computation 45, 1511–1525.

(6) Gisleskog, P. O., Karlsson, M. O. & Beal, S. L. (2002): Use of prior information to stabilize a population data analysis. Journal of Pharmacokinetics and Pharmacodynamics 29, 473–505.

(7) Gsteiger, S., Bretz, F. & Liu, W. (2011): Simultaneous Confidence Bands for Nonlinear Regression Models with Application to Population Pharmacokinetic Analyses. Journal of Biopharmaceutical Statistics 21(4), 708–725.

(8) Ivanova, A., Barrier, R. C. & Berger, V. W. (2005): Adjusting for observable selection bias in block randomized trials. Statistics in Medicine 24, 1537–1546.

(9) Jonsson, E. N. & Karlsson, M. O. (1999): Xpose--an S-PLUS based population pharmacokinetic/pharmacodynamic model building aid for NONMEM. Computer Methods and Programs in Biomedicine 58, 51–64.

(10) Karlsson, K. E., Vong, C., Bergstrand, M., Jonsson, E. N. & Karlsson, M. O. (2013): Comparisons of Analysis Methods for Proof-of-Concept Trials. CPT: Pharmacometrics & Systems Pharmacology 2, e23.

(11) Keizer, R. J., Karlsson, M. O. & Hooker, A. (2013): Modeling and Simulation Workbench for NONMEM: Tutorial on Pirana, PsN, and Xpose. CPT: Pharmacometrics & Systems Pharmacology 2, e50.

(12) Kobayashi, T. et al. (2009): Phase I/II clinical study of percutaneous vertebroplasty (PVP) as palliation for painful malignant vertebral compression fractures (PMVCF): JIVROSG-0202. Annals of Oncology 20, 1943–1947.

(13) Lesko, L. J. (2012): Drug Research and Translational Bioinformatics. Clinical Pharmacology & Therapeutics 91, 960–962.

(14) Lestini, G., Dumont, C. & Mentré, F. (2015): Influence of the size of cohorts in adaptive design for nonlinear mixed effects models: an evaluation by simulation for a pharmacokinetic and pharmacodynamic model for a biomarker in oncology. Pharmaceutical Research 32, 3159–3169.

(15) Lindbom, L., Ribbing, J. & Jonsson, E. N. (2004): Perl-speaks-NONMEM (PsN)—a Perl module for NONMEM related programming. Computer Methods and Programs in Biomedicine 75, 85–94

(16) Liu, W., Bretz, F., Hayter, A. J. & Wynn, H. (2009): Assessing nonsuperiority, noninferiority, or equivalence when comparing two regression models over a restricted covariate region. Biometrics 65(4), 1279–1287.

(17) Marshall, S., Macintyre, F., James, I., Krams, M. & Jonsson, N. E. (2006): Role of mechanistically-based pharmacokinetic/pharmacodynamic models in drug development: a case study of a therapeutic protein. Clinical Pharmacokinetics 45, 177–197.

(18) Rubin, D. B. (1988): Using the SIR Algorithm to Simulate Posterior Distributions. Bayesian Statistics 3, 395–402.

(19) Ting, N. (2006): Dose Finding in Drug Development. New York: Springer.

Potential Impact:

1. Potential impact

Ultimately, the main positive impact of IDeAl will be on patients with rare diseases, on patients in other small groups (such as children or biomarker-defined subpopulations) and, through spill-over effects, also on patients in larger disease populations. The positive effects for individuals will be mediated partly by progress in trial methodology and partly by improved regulatory and reimbursement decisions. Methodological progress will lead to more cost-effective and reliable trials, facilitating clinical research and paving the way for medical and pharmaceutical advances. Building a more rational and transparent basis for societal decisions will incentivize commercial investments and lead to the marketing of a larger number of pharmaceuticals.

Science

The development of statistical and pharmacometric methodology is the backbone of the IDeAl project. Successful methodological development is a necessary, but not sufficient, condition for meaningful impact for patients and for society. We start by discussing IDeAl’s role in the relevant methodological sciences and return to its important contributions to wider society below.

Direct scientific impact can be indicated by the number of publications and presentations, by the journals targeted and by the number of citations of individual articles. Bibliometric measures have well-known pros and cons, and we use such approaches only briefly to give some indication of the quality of the output. At the time of writing, 63 articles supported by the IDeAl grant have been published or accepted for publication in scientific journals from a range of disciplines.

Articles have been published in some of the very best statistical journals, such as Annals of Statistics, Journal of the American Statistical Association (JASA), Biostatistics, Biometrics and Statistics in Medicine. IDeAl has published seven articles in the latter journal, which can be said to be the premier statistical journal dedicated to medical applications. Journals in Statistics and related fields have traditionally had much lower Impact Factors than those in many other disciplines; that is, rapid citation is relatively infrequent. As the project ended only recently, the vast majority of articles were published this year or last year. It is therefore worth noting that several IDeAl articles in statistical journals have already been cited frequently. For example, Google Scholar lists 50 citations for Bogdan et al. (Ann Appl Stat, 2015), and 20+ citations each for Alonso et al. (Biostatistics, 2015), Bauer et al. (Stat Med, 2016) and König et al. (Biom J, 2015).

Articles have also been published in leading journals in a range of quantitative disciplines outside Statistics, such as Pharmacometrics (J Pharmacokin Pharmacodyn; Clin Pharmacol Ther), Health Economics (JHE) and Epidemiology (Eur J Epid). This indicates that the research within IDeAl has had an impact more broadly within methodological fields. For example, Greenland et al. (Eur J Epid, 2016) has already been cited 99 times, only slightly more than one year after publication.

A few medical journals have also been targeted by IDeAl-sponsored publications. For example, both Gewandter et al. (Pain, 2014) and Hecksteden et al. (J Appl Physiol, 2015) have more than 30 citations each in the Google Scholar database. Furthermore, IDeAl contributed to publications in the high-ranking medical journals Lancet Neurology (Reetz, 2016) and JAMA Dermatology (Rübben, 2016), as well as in a rare-disease-specific journal (Hilgers, Roes and Stallard 2016, Orphanet Journal).

Work from the IDeAl project has generated more than 170 presentations at conferences, workshops, etc., and has exposed a large number of other scientists to the ideas and methodology developed within the project. A number of PhD students and early-career scientists have been trained through IDeAl-related work and have moved on to serve the scientific community.

The Drug Information Association (DIA) is the leading professional organization for individuals working in drug discovery, development and regulation, and has over 18 000 members. DIA’s Adaptive Design Scientific Working Group (ADSWG) is arguably the world’s leading working group on issues around the design of clinical trials. The IDeAl project has presented twice in the ADSWG Key Opinion Leader (KOL) lecture series. At the beginning of the IDeAl project, two of the IDeAl work package leaders, König and Burman, proposed that DIA’s ADSWG should form a subteam to work on designs for small populations. This subteam has brought together scientists from industry, academia and regulatory agencies, mainly from the US and Europe. To date, the subteam has published almost 20 scientific articles, underlining the impact and leadership that IDeAl has exercised globally and across disciplines.

Clinical research

The methodology developed and refined within the IDeAl project will influence clinical research in a number of ways, ranging from the development and validation of genetic biomarkers, over extrapolation techniques, to optimizing trial design and analysis.

The importance of biomarkers in clinical research is increasing rapidly, and this trend may be especially relevant for rare diseases. First, these diseases often have a stronger and simpler genetic causality than is likely the case for large-population health problems such as obesity, type II diabetes mellitus and smoking addiction. Secondly, response biomarkers (as opposed to, e.g., genetically based predictive biomarkers) may be more important when the limited size of the patient population makes it infeasible to use dichotomous hard endpoints or survival time as the primary variable in confirmatory trials. Thirdly, modern drug development often aims at personalizing treatments, with biomarkers used to define subpopulations for which different treatments may be optimal. Existing methods for the identification of important genes have been refined, and new methods have been added to the geneticists’ arsenal. For example, the “group SLOPE” approach (Brzyski et al., 2017 submitted) is especially useful for understanding the genetic origin of rare diseases. To aid the extrapolation of clinical data from one population to another, e.g. from an adult population to a much smaller pediatric one, an important step can be to compare the dose-response curves for a response biomarker, which is easier to measure than the hard clinical endpoint. IDeAl has improved the previous standard methodology for comparing two such curves and has shown that efficiency can be increased considerably using a bootstrap approach (Dette et al., 2017). Response biomarkers should preferably be validated, and IDeAl has worked on integrating different approaches to find optimal ways of doing so. For small population groups, a causal inference framework has proven especially useful (Alonso et al., 2017). This book also proposes methods tailored to the specific problems of validating genetic markers based on data from several studies.
In contrast, the two papers by Ondra et al. (2016, 2017 in preparation) demonstrate how the design of a single clinical trial should be optimized when a potentially important predictive biomarker is available that can be used to select the trial population.

Such enrichment designs, which can be either fixed or adaptive, and the validation of response variables with an improved signal-to-noise ratio are two examples of the design-related part of IDeAl’s work. The design of clinical trials in small population groups can be made more efficient and effective in a number of other ways. On the one hand, much of the methodology to increase study power can be applied in large as well as in small populations. On the other hand, the balance shifts from robust inference towards effective inference as population size decreases. For cardiovascular and metabolic diseases with large patient populations, it may be feasible to randomize over 10 000 patients in a parallel-group design to address mortality (and other hard endpoints). Such a design is robust in the sense that a significant result relies on very few non-verifiable assumptions. Mortality, life or death, is undeniably an important outcome to the patient, and the risk of bias is relatively small. That the sample size needs to be huge to achieve sufficient power, and that the corresponding trial may cost hundreds of millions of euros, may be acceptable if there are millions of eligible patients and the potential value of a successful treatment is great. For a much smaller patient population, the trial size will necessarily be much smaller. To be able to compare treatment options, it is therefore crucial to obtain as much information per patient as possible. This means that basing the inference on some reasonable assumptions may be justified if the signal-to-noise ratio can thereby be improved. IDeAl methodology to achieve this includes response biomarkers (Van der Elst et al., 2016), pharmacometric modelling (Karlsson et al., 2013) and cross-over or n-of-1 designs (Hecksteden et al., 2015). Repeated measurements on each patient can be utilized through pharmacokinetic/pharmacodynamic (PK/PD) modelling to learn more from each patient.
With small sample sizes, it is increasingly challenging to estimate the parameters of this type of non-linear model. IDeAl has therefore improved computational methods to allow more reliable estimation, so that the potential gains in terms of sample size reduction can indeed be realized by clinical research groups conducting small trials. Optimal design methodology can be applied to further enhance the information retrieved. Such methodology is combined with adaptive design features to decrease the reliance on a priori assumptions, and novel computational methodology is provided (Loingeville et al., 2017). Although many clinical researchers are well aware of the benefits of conducting randomized clinical experiments, standard randomization procedures, which may be perfectly applicable to large trials, are often also used when the sample size is very small. IDeAl has worked tirelessly to raise awareness throughout the academic and industrial communities of the importance of tailoring the randomization to the specific trial circumstances, including the sample size and the risk of selection bias (e.g. Reetz et al., 2016, Gewandter et al., 2014, Greenland et al., 2016, Hilgers, Roes and Stallard 2016, Jonker et al. 2016, Lendrem et al., 2015, Hilgers et al., 2017). Wider adoption of the ERDO framework, developed by IDeAl, will lead to more rational randomized patient allocation procedures, giving trial results that are more robust to selection bias and to inflation of the conditional type I error rate.

To facilitate applications in clinical research, IDeAl has developed several software packages, e.g. for randomization (Uschner et al., 2017); identification of genetic pathways (Brzyski et al., 2017 in preparation); surrogate markers (Alonso et al., 2017); extrapolation and curve comparisons (Möllenhoff, 2016); decision-theoretic design optimization (Jobjörnsson, 2015); n-of-1 trials (Araujo et al., 2017); and optimal designs for nonlinear mixed effects models (Riviere and Mentré, 2015).

Industry

The pharmaceutical and biotech industry in Europe will benefit from methodology that improves the identification of genetic/genomic markers, facilitating target identification in drug discovery. The discovery of new mechanisms of action will allow a wider range of drug classes to be developed, tested and, if found to be beneficial in terms of benefit/risk, marketed. A significant hindrance to the development of new drug classes is the enormous cost of drug development, which is to a large extent attributable to clinical trials. The IDeAl-developed methods briefly discussed under “Clinical research” above, which increase trial efficiency, power and probability of success, constitute a partial antidote to rising trial costs. Furthermore, research within IDeAl has shown how the expected net present value can be maximized by tuning design parameters such as sample size and trial prevalence (Ondra et al., 2017 in preparation). The pricing of a new pharmaceutical has also been optimized (Jobjörnsson et al., 2016).

Commercial drug development is heavily dependent on EU regulations, EMA decisions and national reimbursement decisions. IDeAl has demonstrated that if pharmaceutical companies experience a lack of transparency in such societal decision rules, such as uncertainty about how benefit/risk and cost/effectiveness are weighted, the industry will not be able to design the best possible trial programmes (Jobjörnsson et al., 2016; Jobjörnsson, 2016).

Regulatory processes and health care systems

It is therefore important that regulators and payers strive for greater clarity. To be concrete, the relative importance of an anticipated safety outcome compared to the intended positive effect of a pharmaceutical can be discussed and defined at an early stage. Also, EMA and price regulators can work together to understand what is most important to the patients and the wider community. If European regulators and payers can formulate clearer rules and align their decisions, both industry and patients will benefit. This was one of the messages that IDeAl sent to regulators and payers when the European Medicines Agency (EMA) arranged the “Seventh Framework Programme small-population research methods projects and regulatory application workshop” on 29-30 March 2017 at the EMA headquarters in London.

This workshop was set up to discuss ideas from IDeAl and the related EU projects asterix and InSPiRe. The workshop, which was recorded and can be viewed at www.ema.europa.eu, summarized much of the regulatory-related output of the projects and sparked much discussion. In her closing remarks, Anja Schiel, the chair of EMA’s Biostatistics Working Party, commented on the value of the three projects: “I must say: Yes, cost effectiveness is on your side if I see what you put as an output and what it costed. That was a cheap gain, to put it mildly”.

IDeAl has also interacted directly with regulatory representatives through a number of other channels. Examples include comments on several regulatory documents, including draft guidance, the IRDiRC-EMA workshop on rare diseases, and the inclusion of regulators in the ADSWG small population working group. In addition, one of the IDeAl researchers is currently seconded to the EMA as a statistical reviewer. Moreover, IDeAl has been the starting point for further research together with regulators, e.g. the FDA project of Holger Dette and France Mentré on the evaluation of model-based statistical approaches to bioequivalence. Finally, the revision of the EMA guideline on “Clinical trials in small populations” is announced for 2018, and the findings of the three projects asterix, IDeAl and InSPiRe will form the backbone of the revision.

Patients and community

IDeAl set out to improve the methodology for clinical trials in small population groups. The most direct results are therefore methodological advances, first communicated through publications and conference presentations. The positive results regarding the identification of genetic markers, improving trial efficiency and optimising design, to mention a few, will gradually be taken up by clinical researchers. Our dissemination activities are raising awareness in the wider clinical community through a number of different routes. The smaller group of European regulators has been targeted more directly, e.g. through the two EMA workshops. The European industry is reached partly through direct research contacts with statistical methodology groups at major companies, partly through many presentations at the key European pharmaceutical statistics meetings, and partly through the more widely targeted dissemination activities.

Novel and improved IDeAl methodology will therefore be used by different stakeholders to optimise trial design, analysis and regulations. The ultimate beneficiaries will be current and future patients with rare diseases or belonging to other small (sub)populations. Primarily, patients will benefit from new medical treatments, whose development is facilitated by IDeAl’s results on genetics and by methods that incentivise trials by cutting costs and improving quality. Secondarily, IDeAl methodology to reduce sample sizes will expose fewer patients to clinical experimentation and help novel drugs reach the patients in need faster. Trial patients will also benefit if IDeAl’s ideas on improving trial ethics gain wider traction.

The community benefits directly from the improved health of EU citizens. Indirectly, this may translate into economic gains. Mechanism design approaches may also help optimise society’s allocation of resources. Finally, the build-up of personal competence will aid European industry and research institutions. IDeAl thus clearly contributes to rare disease research. As Dr. Irene Norstedt (Head of the Innovative and Personalised Medicine Unit, European Commission) stated at the “Conference on the Development and Access of Medicines for Rare Diseases”, held under the Maltese Presidency of the Council of the European Union in Valletta on 21 March 2017, the EU assessed IDeAl as one of the lighthouse projects in rare disease research funded by EU programs.

This makes clear that EU funding kicked off a snowball of methodological research in small population clinical trials, which currently demonstrates European leadership. The FDA project of Holger Dette and France Mentré on model-based methods to analyse bioequivalence studies is a positive sign on the one hand; on the other hand, it bears the risk that European leadership will be lost if further funding is not provided.

2. Main dissemination activities (WP11)

From the very beginning, IDeAl used different means to disseminate its results in various areas. The organizational structure was a key factor in successfully implementing the dissemination activities at a worldwide level. IDeAl’s dissemination activities address different levels; highlights in disseminating progress in the statistical methodology of small population clinical trials include:

• networking, leading to connections with five other EU projects and involvement in the steering committee of the IRDiRC task force on small population clinical trials

• written output, including currently 63 peer-reviewed publications with new research findings, social media channels and the IDeAl website

• oral output, including more than 170 presentations

• awareness sessions, including short courses, tutorials and an IDeAl webinar series (11 webinars, accessible via the IDeAl website)

• input to regulatory documents and guidelines, including comments on four guidelines, active input to the EMA “Extrapolation guideline” and a joint workshop at the EMA

The distribution of IDeAl work package leads across Europe, together with the cooperation of the external advisory board, offered both challenges and opportunities. The IDeAl group accepted the challenges and used the opportunities to disseminate its results and ideas to the stakeholders. Some findings will need time to be implemented, as they break barriers of traditional thinking in clinical trial methodology.

Dissemination activities started within the project network and extended via the external advisory board. IDeAl researchers travelled abroad to present the first results of the ongoing project at international conferences and workshops, both internal and external (see highlight sections on WPs). Short-term face-to-face visits to other partners were also conducted: Ralf-Dieter Hilgers and Nicole Heussen (UKA) visited UHasselt and MUW; Carl-Fredrik Burman and Sebastian Jobjörnsson (CTH) visited MUW; France Mentré (INSERM), Sebastian Ueckert (UU) and Marie-Karelle Riviere (INSERM) visited UHasselt; and Małgorzata Bogdan (PWR) visited Chiara Sabatti (EAB, Stanford University). Ralf-Dieter Hilgers and Professor Rosenberger visited each other to work on the selection of best practice randomization procedures; the results were presented at the FDA in a video-streamed talk on 6 May 2016. Professor Nicole Heussen presented her findings on missing data mechanisms in randomization-based inference at George Mason University on 29 April 2016. Diane Uschner presented the randomizeR software at the NIH on 28 October 2015.

Furthermore, the IDeAl project also stimulated additional research visits in the opposite direction. William F. Rosenberger (EAB) stayed for 3 months at the Department of Medical Statistics at RWTH Aachen University, funded by a Fulbright grant, to work with several groups of the IDeAl consortium; for example, he visited Holger Dette and Franz König during his stay. His research visit was funded independently of the IDeAl grant. Gernot Wassmer (EAB) visited MUW to work on an invited paper on adaptive designs (the research stay was externally funded; cf. section ‘publications resulting from the IDeAl project in the period’).

The IDeAl consortium established a “Young Scientist research group”, which met more than 10 times at various conferences and IDeAl meetings, promoting interaction between the work packages. The young scientist group also serves as an investment in the future dissemination of the IDeAl findings. Nine young scientists have already obtained positions in the pharmaceutical industry or with regulators.

The consortium organized dedicated sessions at conferences, e.g. MCP 2015 in Hyderabad, as well as full conferences. Highlights were the Design and Analysis of Experiments in Healthcare workshop in Cambridge in 2015 and, together with asterix and InSPiRe, the joint Small Population Symposium in Vienna in 2014 and the Seventh Framework Programme (FP7) Small-population Research Methods Projects and Regulatory Application workshop at the EMA in 2017.

The IDeAl consortium also organized two half-day seminars together with the International Biometric Society (IBS) – Viennese Section; the seminar on ‘Innovative Methods in Drug Development’ in 2016 was co-hosted with the FP7 project asterix.

The one-week workshop on “Design and Analysis of Experiments in Healthcare” was organized by Rosemary Bailey (EAB), Ralf-Dieter Hilgers and Holger Dette at the Isaac Newton Institute, Cambridge, UK, in July 2015. It was accompanied by a one-day industry workshop on “Design of Experiments in Drug Development”, where the challenges and results of IDeAl were discussed with selected experts in statistical design from all over the world.

These activities led to the inclusion of IDeAl in the International Rare Diseases Research Consortium (IRDiRC) task force to advance progress in the field of “Small Population Clinical Trials”. Ralf-Dieter Hilgers was nominated as a member of its steering committee in 2016. The IDeAl consortium participated in the Joint Workshop on “Small Population Clinical Trials Challenges in the Field of Rare Diseases”, organized by IRDiRC and hosted by the EMA on 3 March 2016, and contributed to the IRDiRC report “Small Population Clinical Trials: Challenges in the Field of Rare Diseases” (July 2016).

IDeAl cooperated with four other EU research projects: EFACTS (FP7 Health 242193), which addresses Friedreich Ataxia (see Reetz et al., 2016); the IMI project DDMoRe (FP7 IMI 115156; see Rivière et al., 2016); and the two other FP7-funded projects aiming to refine the statistical methods for small population group trials, asterix (FP7 Health 603160) and InSPiRe (FP7 Health 602144). Related publications include Auffray et al. (2016), Graf, Posch and König (2014), König et al. (2015), Hlavin, König et al. (2016), Eichler et al. (2016), Magirr et al. (2016) and Jonker et al. (2016), as well as the seminal paper of Hilgers, Roes and Stallard (2016).

The IDeAl group has published 63 papers in peer-reviewed journals (status June 2017); 34 of these are open access and 11 were written in collaboration with a member of the external advisory board. The publications range from review articles to expert opinions, from applications to theoretical papers, and include descriptions for a lay audience.

The papers received a lot of attention. For example, a joint paper on sharing clinical trial data with co-authors from InSPiRe, asterix and the European Medicines Agency (König et al., 2015) is currently listed as one of the most accessed and cited papers in Biometrical Journal. Three IDeAl papers were among the most accessed papers in the prestigious journal “Statistics in Medicine”: a paper on “Mastering variation: variance components and personalised medicine” (Senn, 2015), a paper on extrapolation (Hlavin et al., 2016) and the featured article on 25 years of adaptive design (Bauer et al., 2015). The latter was also listed among the most cited papers in Statistics in Medicine in 2015 and 2016. In Pharmaceutical Statistics, the paper on adaptive paediatric investigation plans (Bauer and König, 2016) was among the most accessed in 2016.

As pointed out in one of the papers (Hilgers, König et al., 2016), the scope ranges from (i) advocating the rigorous use of the best available statistical methods, through (ii) showing the limitations of traditional methods where they fail, to (iii) developing new statistical design and analysis methods for small population clinical trials. Among the review articles, the joint paper with the asterix and InSPiRe coordinators Kit Roes and Nigel Stallard in the Orphanet Journal shows options for development beyond the EMA guideline (Hilgers, Roes, Stallard, 2016). Applying the best available statistical analysis methods to Friedreich ataxia registry data led to a joint publication in Lancet Neurology with the FP7-funded EFACTS project (Reetz, 2016). Beyond these contributions to the biostatistical literature, the consortium also reached out to a wider audience: the project description in the May 2017 issue of impact – “Multidisciplinary health research” reached, among its 35 000 readers across Europe, North and South America, Asia-Pacific and Africa, a core audience including national and regional funding agencies, research funding bodies, national, regional and local government, public sector organisations, policy and legislation organisations, universities, research institutes, research centres, NGOs and key industry/private sectors. IDeAl’s work does not end with the project lifetime: at least 11 further papers are in preparation and an additional 14 have already been submitted.

In addition to publishing papers in “traditional” journals, the IDeAl consortium also used several blogs to disseminate results and hold discussions in a timely manner.

Another important means of informing the interested public was the electronic biannual newsletter prepared by the IDeAl team. Seven issues were sent by e-mail to over 200 registered e-mail addresses in 28 countries, and all newsletters have been published on the IDeAl webpage. The newsletters covered, for example, new research findings, lists of papers accepted in peer-reviewed journals, information about meetings and talks, and new software tools. Almost all software tools are described, together with their theoretical background, in the corresponding publications. Fifteen software packages were delivered; those published on CRAN come with a manual. All software tools can be accessed via the project website.

IDeAl also used social media accounts on Twitter (account “@ideal_fp7”; >120 followers) and LinkedIn (group “IDEAL - FP7 Project “; >50 followers) to inform the broad public and important stakeholders about its progress and to promote IDeAl-related events and main research activities. Press releases were also issued to promote meetings and highlights such as awards for research papers (see webpage).

With more than 170 oral presentations, IDeAl reached all relevant stakeholders. These include presentations at the European Medicines Agency (EMA, 2014, 2016), the Food and Drug Administration (FDA) in 2016, the Committee for Orphan Medicinal Products (COMP, 2017), EURORDIS (2014), the Pharmaceuticals and Medical Devices Agency, Japan (PMDA) in 2015, the adaptive design working group of the Drug Information Association (DIA ADSWG) in 2014, Key Opinion Leaders (KOL) in 2016 and the National Institutes of Health (NIH, 2015). To bring research into practice, it was important to provide sufficient training to clinicians, statisticians, sponsors and investigators of clinical trials. In addition to (usually short) oral presentations at conferences, we successfully provided tailored training (e.g. short courses, workshops, tutorials and summer schools) together with important and relevant learned societies such as EURORDIS, ISCB and IBS. In total, IDeAl members held 21 workshops, short courses and tutorials. Some of them are still available as videos, such as the talks from the 2015 workshop at the Isaac Newton Institute.

In 2016 the IDeAl consortium launched its own webinar series comprising 11 online lectures. The videos are still available via the project website, which links to the IDeAl YouTube channel. The webinars present the main research results of each work package. On average, 50 participants from various stakeholder groups followed each webinar, increasing the visibility of the project, while other researchers used the service to download the recorded videos from the website. Additional webinars were given, for example, to the DIA adaptive design working group, the American Statistical Association (ASA), the Royal Statistical Society (RSS) and Statisticians in the Pharmaceutical Industry (PSI). Giving webinars online helped IDeAl reach a broad, international audience in an easily accessible way.

Finally, IDeAl had regular interactions with regulatory bodies throughout the conduct of the project (>15 occasions). IDeAl has commented on 5 EMA guidelines, has contributed to the reflection paper on extrapolation of efficacy and safety in paediatric medicine development, has contributed to IRDiRC’s report on small population clinical trials in rare diseases and conducted a joint workshop together with asterix and InSPiRe at the EMA in March 2017 to agree on relevant regulatory standards and methods in small population clinical trials. IDeAl members were invited to give special lectures to other international regulatory bodies including the US FDA (Hilgers and Heussen in 2016) and the Japanese regulatory agency PMDA (Bauer, König together with Posch in 2015).

In March 2017 Ralf-Dieter Hilgers was invited to give a presentation at the Strategic Review and Learning meeting of the Committee for Orphan Medicinal Products (COMP), held in Malta during the Maltese Presidency. His talk, entitled “Innovative statistical design methodologies for clinical trials in small populations focussing on rare diseases”, reported on IDeAl research findings and was of high interest to the COMP.

In 2016 the senior medical director of the EMA initiated a joint project on how to incorporate external data sources (from other RCTs, but also real-world data (RWD)), proposing threshold-crossing designs (Eichler et al., 2016). Furthermore, Franz König contributed to a review of the experience gained with marketing authorisations in Europe (Hofer et al. 2017, in preparation).

Although this was not explicitly funded by the project, IDeAl has started to disseminate its results and to contribute to applied research projects. Ralf-Dieter Hilgers was involved in two eRARE project proposals, one of which is currently funded (PDerm, PI: Ralf Ludwig, Lübeck). The contribution to the analysis of the Friedreich ataxia data (EFACTS) was also presented at the third IRDiRC conference in Paris in February 2017. A further opportunity to discuss the findings with the oncology community was an invited talk at the German Cancer Research Center (DKFZ) in January 2017.

3. Exploitation of results

The exploitation activities cover various fields:

• contributions to regulatory guidelines, past and future

• delivery of software code

• scientific publications and conference presentations with lasting impact

• application of the new methodology in clinical trials in academia and industry

• inclusion of the new methods in university courses

Contributing to regulatory guidelines, past and future, is one of the main concerns of IDeAl’s findings and dissemination. The prospect of a revision of the CHMP “Guideline on clinical trials in small populations” underlies the grant proposal; the revision is expected for 2018, when all three projects come to an end. One of the asterix work package leaders is expected to take the lead in the revision, and IDeAl, being closely related to asterix, will bring its findings into the discussion in the regulatory context. IDeAl has already commented on related guidance, e.g. the 'Draft Guideline on evaluation of anticancer medicinal products in man'.

Further, IDeAl stays in contact with regulators on various occasions; e.g. IDeAl participated in the development of a “Framework of collaboration between the European Medicines Agency and academia”. IDeAl findings will help to implement the guidance on “Extrapolation of efficacy and safety in paediatric medicine development”, which is currently under discussion. Additionally, some of the research findings may have the potential to change regulatory requirements and may be selected for certification procedures for micro-, small- and medium-sized enterprises (SMEs) (see http://www.ema.europa.eu/). In this context, one may think in particular of the developments in WP2 (selection of the best practice randomization procedure), WP3 (extrapolation of dose-response information), WP9 (decision-theoretic evaluation of drug legislation) and WP10 (evaluation of surrogate endpoints). Certification of procedures would corroborate European leadership in the design and analysis of small population clinical trials and support the IRDiRC targets. However, the funding of the fees for certification of procedures is currently unclear. Some activities have also led to funding outside the EU, for instance the FDA project of Holger Dette and France Mentré on the “Evaluation of Model-Based BioEquivalence (MBBE) statistical approaches for sparse designs PK studies”.

Exploitation also covers the delivered software. IDeAl’s software code has attracted considerable interest. The following list briefly describes the delivered programs.

1. Araujo, A. (2016): R-Code “Statistical Analysis of Series of N-of-1 Trials Using R”, http://www.ideal.rwth-aachen.de/wp-content/uploads/2014/02/nof1_rand_cycles_v8.pdf

2. Brzyski, D., Peterson, C., Candes, E.J., Bogdan, M., Sabatti, C., Sobczyk, P. (2016): R package "geneSLOPE" for genome-wide association studies with SLOPE. https://cran.r-project.org/web/packages/geneSLOPE/index.html

3. Graf, A., Bauer, P., Glimm, E., König, F. (2014): R-Code to calculate worst case type-I-error inflation in multiarmed clinical trials, http://onlinelibrary.wiley.com/doi/10.1002/bimj.201300153/suppinfo

4. Jobjörnsson, S. (2015): R package "bdpopt" for optimization of Bayesian Decision Problems. https://cran.r-project.org/web/packages/bdpopt/index.html

5. Hlavin, G. (2016): application for extrapolation to adjust significance level based on prior information, http://www.ideal-apps.rwth-aachen.de:3838/Extrapolation/

6. Möllenhoff, K. (2015): R package "TestingSimilarity" for testing similarity of dose response curves. https://cran.r-project.org/web/packages/TestingSimilarity/

7. Riviere, M.K., Mentré, F. (2015): R package “MIXFIM” for the evaluation and optimization of the Fisher Information Matrix in Non-Linear Mixed Effect Models using Markov Chain Monte Carlo for both discrete and continuous data. https://cran.r-project.org/web/packages/MIXFIM/

8. Schindler, D., Uschner, D., Manolov, M., Pham, M., Hilgers, R.-D., Heussen, N. (2016): R package "randomizeR" on Randomization for clinical trials. https://cran.r-project.org/web/packages/randomizeR/

9. Senn, S. (2014): R, GenStat and SAS Code for Sample Size Considerations in N-of-1 trials, http://www.ideal.rwth-aachen.de/wp-content/uploads/2014/02/Sample-Size-Considerations-for-N-of-1-trials.zip

10. Sobczyk, P., Josse, J., Bogdan, M. (2015): R package "varclust" for dimensionality reduction via variables clustering. https://psobczyk.shinyapps.io/varclust_online/

11. Sobczyk, P., Josse, J., Bogdan, M. (2017): R package "pesel" Automatic estimation of number of principal components in PCA with PEnalized SEmi-integrated Likelihood (PESEL). https://github.com/psobczyk/pesel

12. Szulc, P., Frommlet, F., Tang, H., Bogdan, M. (2017): R application for joint genotype and admixture mapping in admixed populations, http://www.math.uni.wroc.pl/~mbogdan/admixtures/

13. Van der Elst, W., Alonso, A., Molenberghs, G. (2017): R package "EffectTreat" on the Prediction of Therapeutic Success. https://cran.r-project.org/web/packages/EffectTreat/index.html

14. Van der Elst, W., Meyvisch, P., Alonso, A., Ensor, H.M., Weir, C.J., Molenberghs, G. (2017): R Package "Surrogate" for evaluation of surrogate endpoints in clinical trials. https://cran.r-project.org/web/packages/Surrogate/

15. Van der Elst, W., Molenberghs, G., Hilgers, R.-D., Heussen, N. (2016): R package "CorrMixed" for the estimation of within subject correlations based on linear mixed effects models. https://cran.r-project.org/web/packages/CorrMixed/index.html

Over the last two years, most of the packages have been downloaded between 100 and 300 times per month, while EffectTreat reached peak downloads of 830 and Surrogate of 2515.

Scientific publications and conference presentations with lasting impact have already been described in the chapter “Beyond DoW”.

The IDeAl findings will also be included in regular university courses, in particular in the educational programmes of medical faculties, as well as in consulting on clinical trials. An example of the latter is the involvement of Ralf-Dieter Hilgers as responsible biostatistician in the Europe-wide randomized, double-blind, placebo-controlled, parallel-group, multicentre study of the efficacy and safety of nicotinamide in patients with Friedreich's ataxia (NICOFA), led by Professor Jörg Schulz, RWTH Aachen University.

The IDeAl project, as well as the asterix and InSPiRe projects, has resulted in the development of innovative methodology for the statistical design and analysis of small population clinical trials, reflecting current and future European leadership in this area. Some methods will soon be, or are already being, implemented, whilst other, more ground-breaking methods will need further work before they can be implemented in practice. The three project coordinators, Ralf-Dieter Hilgers, Kit Roes and Nigel Stallard, continue to collaborate and are highly motivated to work together to advance knowledge and promote best practice in this area. Ongoing close work includes the development of a proposal for a European Reference Network for statistical methodology in the design and analysis of small population clinical trials, as suggested by Gerard Long (Eurordis).

List of Websites:

Website: www.ideal.rwth-aachen.de

Participants contact:

Coordinator: Professor Ralf-Dieter Hilgers (WP2), Department of Medical Statistics, University Clinic Aachen, RWTH Aachen University, Pauwelsstr. 19, 52074 Aachen, Germany. rhilgers@ukaachen.de

Professor Holger Dette (WP3), Ruhr University Bochum, Germany

Professor Franz König (WP4), Medical University of Vienna, Austria

Professor France Mentré (WP5), Institut National de la Santé et de la Recherche Médicale, France

Professor Stephen Senn (WP6), Luxembourg Institute of Health (formerly Centre de Recherche Public de la Santé), Luxembourg

Professor Mats Karlsson (WP7), Uppsala University, Sweden

Professor Malgorzata Bogdan (WP8), Politechnika Wrocławska, Poland

Professor Carl-Fredrik Burman (WP9), Chalmers University of Technology, Sweden

Professor Geert Molenberghs (WP10), Hasselt University, Belgium

Professor Christoph Male (WP11), Medical University of Vienna, Austria

Clinical trials are the main means of evaluating new therapies for use in humans. The specific layout of a clinical trial depends on various aspects. The scientific community has increasingly recognized that the size of the target population is relevant when planning a clinical trial. As statistical methods are considered the backbone of clinical trial design and analysis, the question arises whether the well-accepted, well-understood and well-evaluated standard techniques for designing and analysing clinical trials in moderate or large populations are also applicable to small clinical trials drawn from a limited population. To address this question, the EU FP7-funded Integrated Design and Analysis of small population group trials (IDeAl) project was set up to refine the statistical design and analysis methodology for clinical trials in small population groups by strictly following an improved integrative approach from various perspectives. These perspectives cover the assessment of randomization, the extrapolation of dose-response information, the study of adaptive trial designs, the development of optimal experimental designs in mixed models, pharmacokinetic and individualized designs, the simulation of clinical studies, the involvement and identification of genetic factors, decision-theoretic considerations, and the evaluation of biomarkers, which are strongly related to regulatory requirements. The dissemination of the results was also a main purpose of IDeAl. Within its nine scientific work packages, the IDeAl consortium has developed:

• a new methodology for the selection of the best practice randomization procedure and subsequent analysis for a small population clinical trial taking possible bias into account

• a new optimized design and analysis strategy for comparing dose response profiles to extrapolate clinical trial results from a large to a small population

• statistical methods to adapt the significance level and allow confirmatory decision-making in clinical trials with vulnerable, small populations

• design evaluation methods enabling small clinical trials to be analysed through modelling of continuous or discrete longitudinal outcomes

• approaches to planning and analysing trials for identifying individual response and examining treatment effects in small populations

• new methods for sample size calculation, type 1 error control, model averaging and parameter precision in small population group trials within non-linear mixed effects modelling

• new methods for identifying biomarkers and prognostic scores based on high dimensional genetic data in small population group trials

• approaches for optimising the overall value of drug development to patients, to regulators and to society under opacity in regulatory and payer rules, as well as in very rare diseases

• methodology to evaluate potential surrogate markers and to analyse data from a small number of small trials, with emphasis on fast and easy computational strategies

Together with the asterix and InSPiRe projects, the IDeAl findings were discussed with regulators at the joint workshop hosted by the European Medicines Agency (EMA) in March 2017. It became clear that some findings will need time to be implemented, as they break barriers of traditional thinking in clinical trial methodology. The output of IDeAl is currently described in 63 peer-reviewed scientific publications, more than 170 presentations and a series of webinars.

For further information see the website www.ideal.rwth-aachen.de.

Project Context and Objectives:

1. Aim

The aim of the Integrated Design and Analysis of small population group clinical trials (IDeAl) project was to refine the statistical methodology for small population group trials by strictly following the concept of an improved integration of the design, conduct and analysis of clinical trials from various perspectives. These methodologies were aimed at the efficient assessment of the safety and/or efficacy of a treatment, and were intended to be universally applicable rather than specific to particular diseases.

Statistical methodologies for design and analysis are the backbone of clinical trials that evaluate new therapies. The theory of statistical design for clinical trials in large population groups is highly elaborated, well accepted and of a high standard. In particular, the operating characteristics of design and analysis methods for clinical trials in arbitrarily large populations are well known, but they may change in small population groups. However, the applicability of standard clinical trial approaches to small populations has come under increasing scrutiny and criticism, and the scientific community has been seeking more advanced or new methods, recognizing that current theory does not reflect the special problems arising in clinical trials for small population groups. One example is the “noise-to-effect” ratio: in large populations, the impact of bio-noise resulting from avoidable and unavoidable non-systematic errors in the design and conduct of a trial can be handled by increasing the sample size. This is not possible in small population groups, where the geographically sparse distribution of patients in the EU and worldwide, together with the very limited number of patients available per unit of time, results in low recruitment rates. Treatment effects may therefore be overlooked when classical design concepts are applied, because only large effects can be detected. The problem is also strongly related to unacceptably prolonged recruitment periods. Further, in contrast to the “standard” clinical development programmes used in the regulatory setting of drug legislation for large populations, conducting a series of large clinical trials is not feasible in small population groups.
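The noise-to-effect argument can be made concrete with the textbook normal-approximation formula for comparing two means, n = 2 (z_{1-α/2} + z_{1-β})² / d² patients per arm, where d = δ/σ is the standardized treatment effect. The Python sketch below is only an illustration under stated assumptions (two equally sized arms, a known common standard deviation, a two-sided z-test); it is not an IDeAl method, and the function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z(p: float) -> float:
    """Quantile of the standard normal distribution."""
    return NormalDist().inv_cdf(p)


def min_detectable_effect(n_per_arm: int, alpha: float = 0.05, power: float = 0.8) -> float:
    """Smallest standardized effect (delta / sigma) that a two-arm trial with
    n_per_arm patients per arm detects with the given power (two-sided z-test,
    normal approximation, known common sigma: illustrative assumptions)."""
    return (z(1 - alpha / 2) + z(power)) * sqrt(2 / n_per_arm)


def n_per_arm(effect: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Patients per arm needed to detect a standardized effect, rounded up."""
    return ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / effect ** 2)


if __name__ == "__main__":
    # Total sample size -> smallest detectable standardized effect.
    for total in (50, 100, 500):
        print(total, round(min_detectable_effect(total // 2), 2))
```

With a total of 50 patients (25 per arm), the smallest standardized effect detectable with 80% power at a two-sided 5% level is about 0.79, compared with about 0.25 for 500 patients; this is the sense in which only large effects can be observed in small trials.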

2. Key challenges of the project

The key challenges originate from the major application areas of small population clinical trials for treatment evaluation. Currently these are rare diseases, paediatric trials, responder subgroups and personalized medicine.

Rare Diseases

In the EU, diseases that affect on average not more than 5 in 10 000 people are called rare. The group of patients affected by a specific rare disease can therefore be very small. A significant number of diseases have been described in only two patients (e.g. Obesity-colitis-hypothyroidism-cardiac hypertrophy-developmental delay syndrome), see Orphanet (2016). Around 8 000 rare diseases exist worldwide; in the EU they affect around 30 million people.

The ability of conventional statistical methods to evaluate new therapeutic approaches for any given rare disease is limited. For instance, long recruitment times and the geographically sparse distribution of patients result in organisational challenges. Biased study results due to time trends in the data may be more likely. Prolonged recruitment may reduce the motivation of patients and physicians, either because they lose faith in the timeliness of the research or because new treatments become available. These problems, among others, may be more likely in small population clinical trials. However, it is not clear what counts as “small”, i.e. which size leads to reduced validity of conventional statistical methods. To give some figures on “what is a small trial”, one can refer to the 63 orphan drug approvals in the EU from 2000 to 2010: here 22 of 38 randomised controlled trials had a total sample size below 50. One would expect that with a sample size of 50 or below, long-run arguments are likely to fail, reducing the validity of conventional statistical approaches for demonstrating the efficacy and safety of therapies.

Paediatric Trials

Ethical limitations and the heterogeneity of age classes are specific to paediatric trials. They restrict clinical trials, at least in size, yet evidence is needed before new paediatric-specific therapies can be accepted. Motivated by the observation that market forces alone have proven insufficient to stimulate adequate research into medicinal products for the paediatric population, and into their development and authorization, a new regulatory procedure, the so-called paediatric investigation plan (PIP), was introduced and is handled by the Paediatric Committee (PDCO) at the European Medicines Agency (EMA). The scope of PIPs may range from one extreme of a full programme (including pre-clinical research, pharmacokinetics, pharmacodynamics, dose finding studies and two fully powered pivotal Phase III studies) for diseases existing only in childhood to the other extreme of, for example, only a single (pharmacokinetic) case series in children. In the EU regulation, fully or partially extrapolating knowledge and data from adults to paediatric populations is an obvious and widely applied approach to reduce the burden of drug development in children. However, sound statistical design and analysis methods to support extrapolation have so far been under-investigated.

Personalized Medicine

Recent considerations in the evaluation of new therapies take into account that the treatment effect varies between patients. This has led to efforts to improve the individual patient’s response by tailoring the treatment, the key idea of personalized medicine. Ultimately, this can be understood as evaluating which treatment works best in a particular patient. In statistical terms, personalized medicine implies a reduction of variance, which raises new challenges for study design and analysis. Small population groups are thus linked to the popular vision of studies with a sample size of one.

Subgroups

Small population groups may occur as subgroups of responders to therapies that might fail to succeed in the whole population. Subgroups may also be defined by individually tailored therapies or as regional subpopulations. Further, small population subgroups may arise in instances of public health urgency. An EMA guideline on the confirmatory investigation of subgroups in clinical trials with respect to therapy response is currently being developed. It has to be recognized that the resulting subgroups, whether identified in the analysis phase or in the planning phase of a trial, might be considerably small. The statistical evaluation is therefore affected both by the risk of type-I-error inflation and by the lack of power resulting from the small size of the subgroups.
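The type-I-error inflation from testing several subgroups can be illustrated with a simple calculation. Assuming k independent subgroup tests, each performed at level α, and no true effect in any subgroup, the probability of at least one false-positive finding (the familywise error rate) is 1 − (1 − α)^k. The Python sketch below uses hypothetical function names; independence is a simplifying assumption, since real subgroups usually overlap or are correlated. It also shows the Bonferroni level α/k, which controls this error but further reduces the power of already small subgroup analyses.

```python
def familywise_error(alpha: float, k: int) -> float:
    """Probability of at least one false positive when k independent subgroup
    tests are each run at level alpha and all null hypotheses are true
    (independence is an illustrative assumption)."""
    return 1 - (1 - alpha) ** k


def bonferroni_level(alpha: float, k: int) -> float:
    """Per-test level that keeps the familywise error rate at most alpha,
    whatever the dependence between the k tests."""
    return alpha / k


if __name__ == "__main__":
    # Number of subgroups -> familywise error rate, Bonferroni per-test level.
    for k in (1, 2, 5, 10):
        print(k, round(familywise_error(0.05, k), 3), bonferroni_level(0.05, k))
```

For five subgroups tested at α = 0.05 each, the familywise error rate is about 0.23, while the Bonferroni correction would require testing each subgroup at level 0.01.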

The methodological framework for clinical trials in small populations from the regulators’ perspective is described in the EU by the EMA guidance and in the US by the draft guidance on rare diseases. The major messages of the CHMP guidance [CHMP 2007] are that there is no special method for designing, carrying out and analysing clinical trials in small population groups. Further, it recommends using as much information as possible when designing a clinical trial and extracting as much information as possible from a clinical trial, so that a valid benefit-risk assessment is possible. Additionally, the EMA guideline [EMA 2012] states that unnecessary clinical trials can be avoided by extrapolation, i.e. by transferring knowledge from a large population to a small population.

As elaborated in the guideline, knowledge about the variability is essential for efficient study design, and well-planned use of the best available techniques to obtain and analyse information is crucial. However, a bad design may itself be a source of additional variation or may necessitate complex statistical models to extract all the information from the data. The guideline also mentions less conventional methodological approaches that increase the efficiency of the design and analysis of small population group clinical trials, but states that these methods are not often used because of their increased complexity.

Starting from the current regulatory guideline on small population clinical trials in the EU, IDeAl identified challenges in statistical methodology that currently hamper the conduct of clinical trials in small population groups, such as reduced power due to limited sample or population size, slower recruitment, elevated heterogeneity in patient outcomes, difficulties in decision-making because of the limited repeatability of trials, criticism of the validity of traditional statistical methods based on long-run arguments, and difficulties in measuring patient outcomes. These points are linked to the following research areas in small clinical trials:

• addressing pharmacological aspects,

• using prognostic information,

• pharmacogenetic information for tailored therapeutics including n-of-1 trials,

• extrapolation of information from large population to small population groups,

• design aspects, including optimal design and doses, choice of analysis,

• adaptive designs, including response adaptive methods and sequential designs,

• randomisation, including non-parametric methods,

• choice of endpoints,

• using Bayesian arguments and decision analysis.

IDeAl also recognized that current developments in biostatistical methodology frequently consider only a single aspect; e.g. an adaptive design is usually developed without incorporating information from the randomization procedure or from prognostic factors. The IDeAl approach is instead an integrative one that harmonizes methodologies. In other words, there is an unmet need for tailored statistical design and analysis methods where the population size is limited and/or the sample size is small.

Summing up, the IDeAl approach to integratively improving the methodology for clinical trials in small population groups covers the following areas:

• the assessment of randomization,

• the extrapolation of dose-response information,

• the study of adaptive trial designs,

• the development of optimal experimental designs in mixed models,

• the development of pharmacokinetic and individualized designs,

• the development of methods for simulation of clinical studies,

• the involvement and identification of genetic factors,

• decision-theoretic considerations as well as

• the evaluation of biomarkers and surrogate endpoints.

In 2013, the IDeAl project was launched as a collaborative research activity of 10 European partners from 7 European countries, accompanied by an external advisory board covering all relevant areas required to statistically design and analyse small population clinical trials.

3. Objectives

The project explored new methods for the design and analysis of clinical studies and integrated and synthesised them into an effective strategy, so that the efficiency of clinical trials evaluating therapies for rare diseases can be significantly increased.

The objectives of IDeAl were to

• find the adequate randomization procedure for small population group trials by assessing established randomisation procedures and formulating a randomization-based test.

• develop adequate statistical methodologies to extrapolate the dose-response information from source to target population.

• incorporate external information into adaptively designed clinical studies for small population groups.

• develop optimal design in non-linear mixed models to analyse studies in small population groups.

• design pharmacogenetic small population group trials including cross-over trials, n-of-1 trials and enrichment trials.

• develop pharmacometrical methods to enable simulation clinical trials in small population groups based on non-linear mixed effects models.

• develop of new statistical models for prediction of the response to the therapy in small population group trials based on genetic factors and other covariates.

• improve the rational basis for decisions and help align different stakeholder perspectives.

• develop an efficient and feasible framework for biomarker and surrogate endpoints in small population group clinical trials.

• disseminate the newly developed statistical methodology of the IDeAl project.

Project Results:

Most statistical design and analysis methods for clinical trials have been developed in the setting of relatively large sample sizes, with confirmatory clinical trials often recruiting several hundreds or even thousands of patients. These methods may not be suitable for evaluating therapies in small populations. The general objective of the IDeAl project is broken down into 10 scientific work-packages focusing on the assessment of randomization, the extrapolation of dose-response information, the study of adaptive trial designs, the development of optimal experimental designs in mixed models, pharmacokinetic and individualized designs, simulation of clinical studies, the involvement and identification of genetic factors, decision-theoretic considerations, the evaluation of biomarkers, and the dissemination of results. One additional work-package provides support for project management. The IDeAl project is accompanied by an advisory board of international experts with different professional backgrounds, representing patients' interests, the views of the pharmaceutical industry, and clinical and regulatory aspects.

The following describes the achievements of the IDeAl project within the funding period ending on 30 April 2017, broken down by work-package. Several further activities are planned and will be conducted in the near future; some of these are described in section 3.

1. Management Structure (WP1)

To achieve effective and efficient project coordination, the coordinator built up a comprehensive framework for the proper and trustworthy implementation of all contractual, scientific, administrative and financial tasks within the project work plan. IDeAl is organized as a cooperative project of 11 work-packages. The work-package leads formed the steering committee, which held several face-to-face meetings; in addition, email contact was used for fast decision making. The consortium was accompanied by an external advisory board.

The management structure and communication mechanisms were successfully implemented and practised as part of the project culture to guarantee a high level of internal communication and a smooth information flow, and to ensure knowledge exchange and monitoring of the research work between the project partners. The main means of communication were email correspondence, face-to-face meetings and telephone conferences. The most efficient means of communication among consortium members was email, intensified through bilateral phone calls and ad hoc appointments with the coordinator for open dialogue, strategic discussions and the definition of future steps in research and dissemination.

The coordination of the project was very successful, as measured by the project's output. All partners were involved in the impressive scientific and dissemination output, working together in close collaboration. As one partner put it, IDeAl established a “new, unique scientific family”: a network of scientists from the EU and beyond, originally working in different areas of clinical trials, who started working together with the launch of the project. Of course, this network could only work with an efficient management structure. All groups decided on the progress of the project work in a cooperative, seamless way. On this basis, IDeAl was able to establish close contact and work together with other projects such as Asterix and InSPiRe without conflicting intellectual property rights.

Instruments supporting this process were the web-based project management platform ‘IDeAl Cloud’, the webpage and email contacts. Furthermore, going beyond the proposal, the coordinator together with the work-package leads set up and chaired the “Young Scientists Working-Group” to further strengthen the scientific exchange and the internal communication between the principal investigators and all research assistants working under the IDeAl umbrella.

In addition, the scientific work of the project was accompanied by the advice of the IDeAl External Advisory Board, consisting of international experts representing patients’ interests, the views of the pharmaceutical industry, and clinical and regulatory aspects. The board members particularly supported the consortium members in circulating the project results to the scientific community, in consultation and agreement with the coordinator.

Supporting measures during the project run time included managing contracts between partners (including the external advisory board), implementing and assisting with EU financial guidance, organising meetings and providing templates.

2. Recommendations

2.1. Assessment of randomization procedures and randomisation based tests (WP2)

Randomization is the gold standard for allocating treatments to patients in clinical trials. The element of chance in the allocation process is used to avoid, or at least minimize, the influence of bias on the estimate of the treatment difference. The properties of randomization procedures are well studied from a theoretical point of view, but little work has been done with respect to practical situations. In particular, most evaluations rely on long-run arguments, which are hardly applicable in small clinical trials. Moreover, the choice of the randomization procedure for a particular clinical trial is generally left to the scientist's “feeling” and is frequently not well motivated by scientific arguments. To avoid false decisions about a treatment effect caused by a poorly chosen procedure, the best-practice randomization procedure should be identified. Assessing the value of randomization procedures for the design of small clinical trials in particular required a completely new methodology.

Bias assessment of randomisation procedures

The assessment of bias in small population group trials called for a completely new analysis tool. The statistical analysis of the effect of selection bias on the type-I-error probability was extended to a mathematical description of the model under misclassification (Langer, 2014). Recognizing that the biasing policy cannot be applied to time-to-event data, a modified biasing policy was developed (Rückbeil et al., 2017). Finally, the biasing policy was adapted to the specific research question in multi-arm clinical trials (Uschner, Hilgers, et al. 2017 submitted). Models for investigating the impact of chronological bias on the type-I-error probability were introduced by Tamm and Hilgers (2014). In a next step, a combined model was introduced to investigate the additive joint effect of selection and time trend bias on the type-I-error probability. In a further approach (Schindler, 2016), the so-called linked assessment criterion, based on a normalized multi-criterion function, was developed; it allows different criteria measured on different scales (e.g. balancing behaviour, correct guessing) to be combined when assessing randomization procedures. Applied to different randomization procedures, the models show how the type-I-error probability is influenced by the two biases, depending on the amount of bias. For instance, a simulation study of two-arm parallel small population group trials showed that the simulated type-I-error rates under selection and chronological bias deviate markedly depending on the size of the trial, the magnitude of the particular bias and the randomization procedure used.
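To make the type-I-error argument concrete, the following Python sketch simulates selection bias in a two-arm trial with permuted block randomization. It assumes the convergence strategy as biasing policy (the investigator guesses the arm allocated less often so far and enrols a patient with a correspondingly shifted expected response), normally distributed responses with known unit variance analysed by a two-sided z-test, and illustrative values for the trial size, block size and bias effect `eta`. It is a hypothetical illustration, not the implementation used in the cited papers or in randomizeR.

```python
import random
from statistics import NormalDist


def permuted_block_sequence(n, block_size, rng):
    """Permuted block randomization with 1:1 allocation (1 = experimental, 0 = control)."""
    seq = []
    while len(seq) < n:
        block = [1] * (block_size // 2) + [0] * (block_size // 2)
        rng.shuffle(block)
        seq.extend(block)
    return seq[:n]


def convergence_shifts(seq, eta):
    """Assumed biasing policy (convergence strategy): guess the arm allocated less often
    so far and shift the enrolled patient's expected response by +eta (guess: experimental),
    -eta (guess: control) or 0 (tie, no guess)."""
    shifts, n_e, n_c = [], 0, 0
    for t in seq:
        if n_e < n_c:
            shifts.append(eta)
        elif n_e > n_c:
            shifts.append(-eta)
        else:
            shifts.append(0.0)
        n_e += t
        n_c += 1 - t
    return shifts


def simulated_type1_error(n=24, block_size=4, eta=0.5, alpha=0.05, reps=20000, seed=1):
    """Rejection rate of a two-sided z-test (assumed known sigma = 1) when there is
    no treatment effect, so every rejection is a type-I error."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        seq = permuted_block_sequence(n, block_size, rng)
        y = [shift + rng.gauss(0.0, 1.0) for shift in convergence_shifts(seq, eta)]
        y_e = [v for v, t in zip(y, seq) if t == 1]
        y_c = [v for v, t in zip(y, seq) if t == 0]
        diff = sum(y_e) / len(y_e) - sum(y_c) / len(y_c)
        se = (1 / len(y_e) + 1 / len(y_c)) ** 0.5
        rejections += abs(diff) / se > z_crit
    return rejections / reps
```

Calling `simulated_type1_error(eta=0.0)` should stay close to the nominal 5%, whereas a positive `eta` inflates the rejection rate; varying `block_size` and `n` shows how strongly the result depends on the procedure and the trial size.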

Recommendation 1: Do not select a randomization procedure on arbitrary grounds; use scientific arguments that take into account the expected magnitude of bias.

Development of adequate randomisation procedures for small population groups

Although various randomization procedures have been proposed, no single procedure performs uniformly best. In the design phase of a clinical trial, the scientist has to decide on the "best practice" randomization procedure, taking into account the practical research conditions of the trial with respect to the potential for bias. Until now, little support has been available to guide the scientist in making this decision, i.e. in weighing the properties of the randomization procedure against the practical conditions of the research question to be answered by the clinical trial.

Although a large number of software products assist researchers in implementing randomization, no tool that covers a wide range of procedures and allows a comparative evaluation of randomization procedures reflecting the specific clinical situation had been proposed in the literature. A framework called ERDO (Evaluation of Randomization procedures to clinical trial Design Optimization) was developed to assess the impact of chronological and selection bias in a parallel group randomized clinical trial on the probability of a type-I-error, in order to derive scientific arguments for the selection of an appropriate randomization procedure (Hilgers, Uschner et al. 2017 submitted). To conduct an ERDO, Uschner, Schindler et al. (2017) developed the R package randomizeR, which addresses this unmet need. The software randomizeR allows the generation of randomization sequences and the assessment of randomization procedures with respect to bias. A YouTube video and a manual facilitate the use of the freely available software.
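randomizeR itself is an R package. The Python functions below are only a hypothetical illustration of what generating randomization sequences means for four common procedures (complete randomization, the random allocation rule, permuted blocks and Efron's biased coin with p = 2/3), together with one simple balance criterion, the maximal imbalance during the trial. They do not reproduce the randomizeR interface, and the trial size of 24 is an arbitrary example.

```python
import random


def complete_randomization(n, rng):
    # each patient independently to arm 1 (experimental) or arm 0 (control)
    return [rng.randint(0, 1) for _ in range(n)]


def random_allocation_rule(n, rng):
    # exactly n // 2 patients on arm 1; every such sequence equally likely
    seq = [1] * (n // 2) + [0] * (n - n // 2)
    rng.shuffle(seq)
    return seq


def permuted_block(n, block_size, rng):
    # random allocation rule applied within consecutive blocks (block_size assumed even)
    seq = []
    while len(seq) < n:
        block = [1] * (block_size // 2) + [0] * (block_size // 2)
        rng.shuffle(block)
        seq.extend(block)
    return seq[:n]


def efron_biased_coin(n, rng, p=2 / 3):
    # favour the arm that is currently behind with probability p
    seq, diff = [], 0  # diff = (#arm 1) - (#arm 0)
    for _ in range(n):
        prob_1 = 0.5 if diff == 0 else (1 - p if diff > 0 else p)
        t = 1 if rng.random() < prob_1 else 0
        seq.append(t)
        diff += 1 if t else -1
    return seq


def max_imbalance(seq):
    # largest absolute difference in group sizes at any point of the trial
    diff = worst = 0
    for t in seq:
        diff += 1 if t else -1
        worst = max(worst, abs(diff))
    return worst


rng = random.Random(2017)
procedures = {
    "CR": lambda: complete_randomization(24, rng),
    "RAR": lambda: random_allocation_rule(24, rng),
    "PBR(4)": lambda: permuted_block(24, 4, rng),
    "EBC(2/3)": lambda: efron_biased_coin(24, rng),
}
for name, draw in procedures.items():
    mean_imb = sum(max_imbalance(draw()) for _ in range(2000)) / 2000
    print(f"{name}: mean maximal imbalance {mean_imb:.2f}")
```

In an ERDO-style comparison one would generate many sequences per procedure and weigh such balance measures against the bias-related type-I-error rates before choosing a procedure.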

Recommendation 2: In a randomized clinical trial, emphasis should be given to the selection of the randomization procedure, following ERDO and using randomizeR.

Development of randomisation tests for small population groups

Kennes et al. (2015) provided an asymptotic likelihood ratio test for normally distributed responses to analyse randomized clinical trials that may be subject to selection bias. These results agree well with the likelihood ratio test of Ivanova et al. (2005) for binary responses. Tamm and Hilgers (2014) showed that unobserved time trends may induce a strong time trend bias in the treatment effect estimate and the test decision. According to our results, medium block sizes are sufficient to restrict chronological bias to an acceptable extent. Regardless of the block size, a blocked ANOVA should be used, because the t-test is far too conservative even for weak time trends. Similarly, Uschner, Hilgers, et al. (2017 submitted) proposed a bias-corrected test for multi-arm clinical trials.
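The conservativeness of the t-test under a time trend can be reproduced with a short simulation. The sketch below assumes a linear time trend `theta * i / n` with no treatment effect, permuted blocks of size 4 with `n` divisible by the block size, and uses the normal quantile as an approximation to the t critical value (reasonable here because the degrees of freedom are large). It compares the two-sample t statistic with the statistic from an additive block-plus-treatment ANOVA; all parameter values are illustrative assumptions.

```python
import random
from statistics import NormalDist


def permuted_block(n, block_size, rng):
    # 1:1 permuted block randomization (1 = experimental, 0 = control)
    seq = []
    while len(seq) < n:
        block = [1] * (block_size // 2) + [0] * (block_size // 2)
        rng.shuffle(block)
        seq.extend(block)
    return seq[:n]


def rejection_rates(n=100, block_size=4, theta=4.0, alpha=0.05, reps=5000, seed=7):
    """Type-I-error rates of the t-test and of a blocked ANOVA under a linear time trend.
    Assumes n is a multiple of block_size; no treatment effect is simulated."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # normal approximation to the t quantile
    n_blocks = n // block_size
    rej_t = rej_block = 0
    for _ in range(reps):
        seq = permuted_block(n, block_size, rng)
        y = [theta * i / n + rng.gauss(0.0, 1.0) for i in range(n)]
        y_e = [v for v, t in zip(y, seq) if t == 1]
        y_c = [v for v, t in zip(y, seq) if t == 0]
        m_e, m_c = sum(y_e) / len(y_e), sum(y_c) / len(y_c)
        diff = m_e - m_c
        inv_n = 1 / len(y_e) + 1 / len(y_c)
        # two-sample t-test: the trend inflates the pooled within-arm variance
        ss_within = sum((v - m_e) ** 2 for v in y_e) + sum((v - m_c) ** 2 for v in y_c)
        rej_t += abs(diff) / (ss_within / (n - 2) * inv_n) ** 0.5 > crit
        # additive block + treatment ANOVA; the sums-of-squares decomposition is exact
        # because every block holds the same number of patients per arm
        grand = sum(y) / n
        ss_total = sum((v - grand) ** 2 for v in y)
        ss_blocks = sum(
            block_size * (sum(y[b * block_size:(b + 1) * block_size]) / block_size - grand) ** 2
            for b in range(n_blocks)
        )
        ss_treat = len(y_e) * (m_e - grand) ** 2 + len(y_c) * (m_c - grand) ** 2
        s2_resid = (ss_total - ss_blocks - ss_treat) / (n - n_blocks - 1)
        rej_block += abs(diff) / (s2_resid * inv_n) ** 0.5 > crit
    return rej_t / reps, rej_block / reps
```

With these settings the t-test should reject far less often than the nominal level, because the trend inflates the pooled variance while blocking keeps the treatment comparison itself almost unaffected; the blocked analysis should stay close to the nominal level.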

We further propose exact randomization tests for small population group trials based on the randomization procedure actually applied, i.e. randomization-based inference. We found that randomization tests in small population group trials can be distorted by selection bias and developed a bias-corrected test based on a restriction of the reference set. The test yields unbiased p-values in the presence of selection bias, independent of the distribution of the responses. We also propose an algorithm for the efficient generation of the restricted reference set. Finally, the problem of dealing with missing observations in randomization-based inference is treated by Hilgers, Rosenberger and Heussen (2017 submitted).
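A randomization test builds its reference set from the randomization procedure actually used rather than from a distributional assumption. The following Monte Carlo sketch assumes permuted block randomization and the absolute difference in means as test statistic; it samples the reference set instead of enumerating it and does not include the bias correction (restricted reference set) described above. Function names and defaults are hypothetical.

```python
import random


def permuted_block(n, block_size, rng):
    # 1:1 permuted block randomization (1 = experimental, 0 = control)
    seq = []
    while len(seq) < n:
        block = [1] * (block_size // 2) + [0] * (block_size // 2)
        rng.shuffle(block)
        seq.extend(block)
    return seq[:n]


def mean_difference(y, seq):
    y_e = [v for v, t in zip(y, seq) if t == 1]
    y_c = [v for v, t in zip(y, seq) if t == 0]
    return sum(y_e) / len(y_e) - sum(y_c) / len(y_c)


def randomization_test(y, seq, block_size, n_ref=5000, seed=11):
    """Two-sided Monte Carlo randomization test: responses are held fixed (no treatment
    effect for any patient under H0) and only the allocation is re-drawn with the same
    permuted block procedure that produced `seq`."""
    rng = random.Random(seed)
    t_obs = abs(mean_difference(y, seq))
    hits = sum(
        abs(mean_difference(y, permuted_block(len(y), block_size, rng))) >= t_obs
        for _ in range(n_ref)
    )
    return (1 + hits) / (1 + n_ref)  # the observed allocation counts as one member
```

Adding one to numerator and denominator keeps the Monte Carlo p-value valid; with the full reference set enumerated instead of sampled, the test would be exact.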

Recommendation 3: In a randomized clinical trial, we recommend conducting a sensitivity analysis to assess the impact of bias on the type-I-error probability.

2.2. Extrapolation of dose response information (WP3)

Differences in how populations respond to medicinal drugs have become highly important in clinical research, and these populations can be small. Consequently, the main objective of work-package 3 was the development of new statistical methodology for extrapolating dose response information and conclusions available from a given source population to make inference for another target population. This can, for example, avoid unnecessary studies. In this context, regression models are a very important tool to provide dose response information. In many cases the question arises whether two dose response curves can be assumed to be similar. This problem also appears when establishing non-inferiority and/or equivalence of different treatments (Liu et al., 2009). We derived new statistical procedures addressing the problem of comparing curves and extrapolating information, with a particular focus on trials with small sample sizes. The main achievements are the following:

New statistical measures for similarity of dose-response between a source and a target population

We started our work by improving the accuracy and reducing the computational effort of confidence bands for the difference of two curves. The currently available methods (see for example Gsteiger et al., 2011) are based on the union-intersection test of Berger (1982), which yields procedures with extremely low power. In our approach, which is based on estimates of new measures of similarity (such as the maximum deviation) between two dose response curves, we achieved a clear improvement using bootstrap methods. Additionally, we developed a new statistical test for the hypothesis of similarity of dose response curves. The test decides for equivalence of the curves if an estimate of a distance is smaller than a given threshold, which is obtained by a (non-standard) constrained parametric bootstrap procedure. The finite sample properties were investigated by means of a simulation study (see Dette et al., 2017; see Möllenhoff, 2016, for the corresponding R package “TestingSimilarity”). These procedures were developed in close cooperation with Professor Frank Bretz (Head of Biostatistics, Novartis), member of the Advisory Board, to ensure that all important features in drug development are addressed by the new methodology.
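The maximum-deviation idea can be sketched in a few lines. The version below is a deliberately simplified illustration, not the constrained bootstrap test of Dette et al. (2017) or the TestingSimilarity package: it assumes straight-line dose-response models with normal errors in both populations, evaluates the maximum absolute difference between the fitted curves on a dose grid, and declares similarity when a parametric percentile-bootstrap upper 95% bound for that distance falls below a user-chosen threshold `eps`. The functions and defaults are hypothetical.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b, residual sd)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, (rss / (n - 2)) ** 0.5


def max_deviation(f1, f2, grid):
    # maximum absolute distance between two fitted lines over the dose grid
    return max(abs((f1[0] + f1[1] * d) - (f2[0] + f2[1] * d)) for d in grid)


def similarity_test(x1, y1, x2, y2, eps, alpha=0.05, n_boot=2000, seed=5):
    """Simplified percentile-bootstrap equivalence check (not the constrained bootstrap)."""
    rng = random.Random(seed)
    lo, hi = min(x1 + x2), max(x1 + x2)
    grid = [lo + (hi - lo) * k / 100 for k in range(101)]
    f1, f2 = fit_line(x1, y1), fit_line(x2, y2)
    d_hat = max_deviation(f1, f2, grid)
    boot = []
    for _ in range(n_boot):
        # parametric bootstrap: resample responses from the fitted normal models
        y1b = [f1[0] + f1[1] * x + rng.gauss(0.0, f1[2]) for x in x1]
        y2b = [f2[0] + f2[1] * x + rng.gauss(0.0, f2[2]) for x in x2]
        boot.append(max_deviation(fit_line(x1, y1b), fit_line(x2, y2b), grid))
    boot.sort()
    upper = boot[math.ceil((1 - alpha) * n_boot) - 1]
    return d_hat, upper, upper < eps
```

The percentile bound used here tends to be conservative because the maximum of absolute differences is biased upwards; avoiding such inefficiency is one motivation for the constrained bootstrap described above.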

Recommendation 4: The comparison of dose response curves should be done by the bootstrap approach developed by Dette et al. (2017) instead of Gsteiger et al. (2011).

Extrapolation of efficacy and safety information

Our goal here was to quantify the information from the source population in order to extrapolate to the target population. We used the Minimum Effective Dose (MED) as a metric (see Ting, 2006) yielding a measure of similarity of dose response. The MED can be used to claim equivalence (to a certain degree) of the information from the source and the target population. Confidence intervals and statistical tests were developed for this metric (see Bretz et al., 2017 submitted).
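Under an Emax model, mu(d) = E0 + Emax * d / (ED50 + d), the MED for a clinically relevant improvement Delta over placebo has a closed form: solving Emax * d / (ED50 + d) = Delta gives MED = Delta * ED50 / (Emax - Delta), which exists only when Delta < Emax. The sketch below assumes this model and hypothetical parameter values for a source and a target population; it shows only the point value and not the confidence intervals or tests of Bretz et al.

```python
def emax_mean(dose, e0, emax, ed50):
    # Emax dose-response model (assumed functional form)
    return e0 + emax * dose / (ed50 + dose)


def emax_med(delta, emax, ed50):
    """Smallest dose whose mean improvement over placebo reaches delta (delta, emax > 0)."""
    if delta >= emax:
        return None  # the clinically relevant effect is never reached
    return delta * ed50 / (emax - delta)


# hypothetical source and target populations differing only in ED50
med_source = emax_med(delta=0.5, emax=1.0, ed50=10.0)
med_target = emax_med(delta=0.5, emax=1.0, ed50=15.0)
print(med_source, med_target, med_target / med_source)
```

Comparing the two MEDs (for example through their ratio, as printed here) gives a single interpretable number for how far the dose information transfers between populations.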

Recommendation 5: If the aim of the study is the extrapolation of efficacy and safety information, we recommend considering and comparing the MEDs of the two populations.

Robustness against incorrect model assumptions

Concerning the robustness of all these new techniques, we conducted numerous simulation studies investigating the sensitivity of the procedures to misspecification of the functional form (e.g. Schorning et al. 2016). These showed a very robust performance of all derived methods.

Recommendation 6: The derived methodology shows a very robust performance and can also be used in cases where no precise information about the functional form of the regression curves is available.

Minimisation of false claims through optimal experimental design and dissemination

Optimal designs for the comparison of curves have been developed that minimize the maximum width of the confidence band for the difference between two regression functions. In particular, it was demonstrated that using optimal designs instead of commonly used designs reduces the width of the confidence band by more than 50% (see Dette and Schorning, 2016; Dette, Schorning and Konstantinou, 2016).
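The gain from optimal design can be illustrated in the simplest possible case. Assume a straight-line model on the dose range [0, 1] for both populations, the same design in each, and ignore that the simultaneous critical value of the band itself depends slightly on the design. The maximal band width is then proportional to the square root of the largest standardized prediction variance f(d)' M^-1 f(d). The sketch compares five equally spaced doses with the design placing half of the patients at each end of the range, which for this model is the G-optimal design (by the Kiefer-Wolfowitz equivalence theorem). The model and designs are assumptions for illustration only.

```python
def max_std_variance(design, grid):
    """Largest f(d)' M^-1 f(d) over `grid` for mu(d) = a + b*d, with f(d) = (1, d)
    and design given as [(dose, weight), ...]."""
    m00 = sum(w for _, w in design)
    m01 = sum(w * d for d, w in design)
    m11 = sum(w * d * d for d, w in design)
    det = m00 * m11 - m01 ** 2
    i00, i01, i11 = m11 / det, -m01 / det, m00 / det  # entries of M^-1
    return max(i00 + 2 * i01 * d + i11 * d * d for d in grid)


grid = [k / 100 for k in range(101)]
uniform = [(k / 4, 0.2) for k in range(5)]   # doses 0, 0.25, 0.5, 0.75, 1
optimal = [(0.0, 0.5), (1.0, 0.5)]           # G-optimal for the straight line
# For two independently fitted curves on the same design, the variance of the fitted
# difference is 2 * sigma^2 * f' M^-1 f / n, so band widths scale with its square root.
width_ratio = (max_std_variance(optimal, grid) / max_std_variance(uniform, grid)) ** 0.5
print(width_ratio)
```

In this toy case the optimal design narrows the band by about 18% (width ratio sqrt(2/3)); the larger reductions reported above refer to the models studied in the cited papers.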

Recommendation 7: When planning a dose-finding study comparing two populations, we recommend using optimal designs in order to achieve substantially more precise results.

2.3. Adaptive design studies (WP4)

In adaptive designs, accumulated data are used to allow learning on the spot and, if necessary, to redesign the ongoing trial at an adaptive interim analysis to increase the chances of success. Popular adaptations include changing the sample size, subgroup selection or dropping certain treatment groups. Such features are especially meaningful in small populations, where it is infeasible to conduct a series of (large) clinical trials. In Bauer et al. (2015) we summarized the developments over the past 25 years (see also Bauer et al. 2016), reviewing the key methodological concepts, summarizing regulatory and industry perspectives on such designs and discussing case studies.

Development of evidence levels for small population groups

In small population groups, full independent development programs to demonstrate the efficacy of an intervention are often not feasible. Therefore, there is a shortage of first-hand information that can support evidence in favour of a treatment. For example, children are regarded as a vulnerable population and, in terms of medical research, have to be protected from unnecessary risk. This inhibits research in children and leads to a situation in which many medicines are registered for adults but not for children. By EU regulation, paediatric investigation plans should be agreed on in early phases of drug development in adults. Here, extrapolation from adults (“source”) to children (“target”) is widely applied to reduce the burden and avoid unnecessary clinical trials in children. We proposed adaptive paediatric investigation plans that explicitly foresee a re-evaluation of the early decision based on information accumulated later from adults or elsewhere (Bauer and König, 2016).

We focused on the combination of target and source population data (extrapolation). We translated frequentist decision criteria (alpha-level boundaries for p-values) into the Bayesian framework (Hlavin, König et al., 2016). We introduced a “scepticism factor” to formulate a framework based on prior beliefs, in order to investigate when the significance level for the test of the primary endpoint in confirmatory trials can be relaxed (and thus the sample size reduced) in the target population. The less sceptical one is that extrapolation from the source population is applicable, the higher the adjusted significance level for the pivotal trial in the target population, and therefore the smaller the required sample size. Another way to adjust the significance level for efficacy testing is to incorporate safety data as well. We suggested a two-step safety selection and testing procedure for multi-armed clinical trials (Hlavin, Hampson and König, 2016).
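The last step of this argument, that a relaxed significance level reduces the required sample size, can be checked with the standard normal-approximation formula for a one-sided two-sample comparison, n per arm = 2 * ((z_{1-alpha} + z_{1-beta}) * sigma / Delta)^2. The sketch assumes a standardized effect Delta/sigma = 0.5 and 80% power; it does not implement the scepticism-factor derivation of Hlavin, König et al. (2016), and the adjusted alpha values are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_arm(delta, sigma=1.0, alpha=0.025, power=0.8):
    """Per-arm sample size for a one-sided two-sample z-test (normal approximation)."""
    z = NormalDist().inv_cdf
    return math.ceil(2 * ((z(1 - alpha) + z(power)) * sigma / delta) ** 2)


for alpha in (0.025, 0.05, 0.1):  # hypothetical adjusted significance levels
    print(alpha, n_per_arm(0.5, alpha=alpha))
```

For instance, `n_per_arm(0.5)` gives 63 patients per arm at one-sided alpha = 0.025, and relaxing alpha to 0.1 reduces this to 37.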

Recommendation 8: In case of confirmatory testing, we recommend adapting the significance level by incorporating other information (e.g. using information from drug development programs in adults for designing and analyzing pediatric trials).

Adaptive designs for confirmatory model based decisions

We developed an adaptive graph-based multiple testing procedure that allows testing of multiple objectives and design adaptations in a confirmatory clinical trial (Klinglmüller et al., 2014). Because the adaptive test does not require knowledge of the multivariate distribution of the test statistics, it is applicable in a wide range of scenarios, including trials with multiple treatment comparisons, endpoints or subgroups, or combinations thereof. If it is decided at the interim analysis to continue the trial as planned, the adaptive test reduces to the originally planned multiple testing procedure. Only if adaptations are actually implemented does an adjusted test need to be applied.
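When no adaptation is made, the adaptive test reduces to the pre-planned graph-based procedure. The sketch below implements the standard sequentially rejective, Bonferroni-based graphical procedure: initial weights `w` split alpha across the hypotheses, and a transition matrix `G` states how the level of a rejected hypothesis is passed on to the remaining ones. The adaptive extension of Klinglmüller et al. (2014), which recomputes the test after design changes, is not included, and the function name is hypothetical.

```python
def graphical_procedure(p, w, G, alpha=0.05):
    """Sequentially rejective Bonferroni-based graphical test.
    p: p-values; w: initial weights (sum <= 1); G: transition matrix with zero diagonal
    and row sums <= 1. Returns the sorted indices of rejected hypotheses."""
    w = list(w)
    G = [list(row) for row in G]
    active = set(range(len(p)))
    rejected = []
    while True:
        candidates = [j for j in sorted(active) if p[j] <= w[j] * alpha]
        if not candidates:
            return sorted(rejected)
        j = candidates[0]
        active.remove(j)
        rejected.append(j)
        # pass the weight of the rejected hypothesis along its outgoing edges
        for l in active:
            w[l] += w[j] * G[j][l]
        # update the transition weights among the remaining hypotheses
        new_G = [row[:] for row in G]
        for l in active:
            for k in active:
                if l == k:
                    new_G[l][k] = 0.0
                    continue
                denom = 1 - G[l][j] * G[j][l]
                new_G[l][k] = (G[l][k] + G[l][j] * G[j][k]) / denom if denom > 0 else 0.0
        G = new_G
        w[j] = 0.0
```

With w = (0.5, 0.5) and G = [[0, 1], [1, 0]] the procedure reduces to Holm's test; with w = (1, 0) and G = [[0, 1], [0, 0]] it becomes a fixed-sequence test.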

The MCPMod approach has recently attracted a lot of attention as it was the first statistical methodology to be ‘qualified’ by the European Medicines Agency. MCPMod was originally developed for Phase IIb dose finding studies to characterize the dose response relationship under model uncertainty once a significant dose response signal has been established. We developed a new closed MCPMod methodology for confirmatory clinical trials that allows individual claims that a drug has a positive effect at a specific dose (König et al., 2016). We applied the closed MCPMod methodology to adaptive two-stage designs using adaptive combination tests (Krasnozhon, Bornkamp et al., 2016).

A recent review conducted by the European Medicines Agency (Hofer et al., 2017 submitted) showed that most adaptive design proposals were in oncology. Unfortunately, the important case of time-to-event endpoints was not adequately addressed by standard adaptive theory. We proposed an alternative frequentist adaptive test that allows adaptations using all interim data (Magirr et al., 2016), and showed that other standard adaptive methods may ignore a substantial subset of the observed event times. We developed group sequential permutation tests for situations where the underlying censoring mechanisms differ between the treatment groups (Brueckner et al., 2017 submitted).

Recommendation 9: In case of design modifications during the conduct of a confirmatory clinical trial, we recommend using adaptive methods to ensure that the type-I-error is sufficiently controlled not to endanger confirmatory conclusions. Especially in clinical trials with multiple objectives, special care has to be taken to address the several sources of multiplicity.

Adaptive designs to enable comparative effectiveness analysis

Before a new drug can be prescribed by medical doctors to patients on a regular basis, its efficacy has to be demonstrated and the drug assessed by regulatory authorities, HTA and reimbursing bodies. We developed adaptive clinical trial designs that address the needs of regulators and reimbursers simultaneously. Questions to be considered here include whether there is a particular subgroup of patients, e.g. defined by genetic biomarkers, that benefits (more) from the experimental treatment, and how historical data can be incorporated into decision making to also enable comparisons against control treatments.

In the light of personalized (precision) medicine, there is an intense debate about whether subgroups, e.g. identified by a genetic biomarker and typically (very) small in size, benefit (more) from a treatment. We defined different utility functions addressing the sponsor and public health perspectives (Graf, Posch and König, 2014 and Ondra et al., 2016) in order to identify the optimal trial design. The optimization included, for example, the required sample sizes and the targeted population(s) for the trial (the full population or the targeted subgroup only) as well as the underlying multiple test procedure.

We showed there can be a substantial inflation of the type-I-error rate if investigators perform design adaptations such as treatment selection, sample size reassessment and change of randomisation allocation ratios and naively apply conventional frequentist hypothesis tests ignoring the adaptive nature of the trial (Graf, Bauer et al., 2014). We showed that response adaptive designs have several caveats such as inflation of the type-I-error rate or loss of power when the number of patients to be recruited is limited (Krasnozhon, Bornkamp et al., 2016). Instead of performing adaptations after each single observation, we suggest adaptive designs using adaptive combination tests, where design modifications are performed at a single interim analysis only.

Another important issue is how to perform comparative analyses if RCTs are not feasible or not ethical. Due to various data sharing initiatives, there are now unprecedented opportunities as well as challenges (König et al., 2015). We propose a new framework for evidence generation called “threshold-crossing” (Eichler et al., 2016). The key issue for threshold-crossing is the upfront specification of an efficacy threshold based on existing real-world data (RWD) and/or past RCT data. Efficacy is then established in a single-arm trial against this pre-defined threshold. However, as the comparison in a threshold design is against historical controls, it is prone to bias.
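For a binary endpoint, a threshold-crossing analysis can be as simple as an exact one-sided binomial test of the single-arm response rate against the pre-specified threshold. The sketch assumes a binary response and a threshold already fixed from RWD or past RCT data; the threshold of 20% and the sample size of 20 used below are purely hypothetical.

```python
from math import comb


def exceedance_p_value(responders, n, threshold):
    """One-sided exact binomial p-value P(X >= responders) for X ~ Bin(n, threshold)."""
    return sum(comb(n, k) * threshold ** k * (1 - threshold) ** (n - k)
               for k in range(responders, n + 1))


def responders_needed(n, threshold, alpha=0.025):
    """Smallest number of responders whose p-value does not exceed alpha (None if impossible)."""
    for x in range(n + 1):
        if exceedance_p_value(x, n, threshold) <= alpha:
            return x
    return None


print(responders_needed(20, 0.2))
```

Here `responders_needed(20, 0.2)` returns 9, i.e. at least 9 of 20 responders would be required before efficacy is claimed at one-sided alpha = 0.025. Such a comparison against historical data remains prone to bias, as noted above.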

Recommendation 10: If randomized controlled clinical trials are infeasible, we propose “threshold-crossing” designs within an adaptive development program as a way forward to enable comparison between different treatment options.

2.4. Optimal design in mixed models (WP5)

Nonlinear mixed effects models are used in model-based drug development to analyse all longitudinal data obtained during clinical trials. This is especially promising in small group trials, as all measurements collected during the trial are used to evaluate treatments. Finding good designs for these studies is therefore important to obtain precise results and/or good power, especially when there are limitations on the sample size and on the number of samples/visits per patient.

Following our pioneer work in optimal design based on the Fisher Information Matrix for non-linear mixed effects models with continuous data, the aims of this work-package were to

• Extend and evaluate this approach for longitudinal models with discrete data, repeated time to event and joint models

• Propose approaches that are robust with respect to parameter values, such as two-stage adaptive designs

• Propose approaches that are robust with respect to model uncertainty in the design and analysis of pivotal trials analysed through modelling

The goal was also to make the developments available in free software tools.

Evaluation of Fisher matrix for discrete and time to event longitudinal data

Evaluation of the Fisher Matrix for mixed models is often based on first-order linearization of the model which works poorly for discrete or time to event longitudinal data.

We developed two new methods to evaluate the Fisher Information Matrix. Both approaches first use Monte Carlo (MC) integration and then either Adaptive Gaussian Quadrature (MC-AGQ, Ueckert and Mentré, 2017) or Hamiltonian Monte Carlo (MC-HMC, Rivière et al., 2016). Both approaches were evaluated and compared on four different examples with continuous, binary, count or time-to-event repeated data.

We showed the adequacy of both approaches in predicting standard errors using clinical trial simulation. The MC-AGQ approach is less computationally demanding for models with few random effects, whereas the computational effort of MC-HMC increases only linearly with the number of random effects, making it more suitable for larger models. For both approaches, we showed the importance of using a large number of samples in the MC step. For the MC-AGQ method, we illustrated on the binary example the influence of the design (number of patients / number of repetitions) on the power to detect a treatment effect (Ueckert and Mentré, 2017).
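The following sketch illustrates the principle of evaluating the Fisher Information Matrix without linearization, in the spirit of MC-AGQ but much simplified: an MC step draws data sets from the model, and the marginal likelihood of each data set is obtained by numerical integration over the random effect, so the FIM is estimated as the expected outer product of the score. It assumes a hypothetical random-intercept logistic model for `n_rep` binary observations per patient with parameters (beta, omega), uses a fixed trapezoidal grid instead of adaptive Gaussian quadrature and finite-difference scores, and is not the MIXFIM implementation.

```python
import math
import random


def marginal_loglik(s, n_rep, beta, omega, n_nodes=401, width=8.0):
    """log of the integral over u ~ N(0, 1) of p^s (1 - p)^(n_rep - s),
    p = logistic(beta + omega * u), using a fixed trapezoidal grid (no linearization)."""
    h = 2 * width / (n_nodes - 1)
    total = 0.0
    for k in range(n_nodes):
        u = -width + k * h
        eta = beta + omega * u
        # numerically stable log p; log(1 - p) = log p - eta
        log_p = -math.log1p(math.exp(-eta)) if eta > 0 else eta - math.log1p(math.exp(eta))
        log_q = log_p - eta
        weight = 0.5 if k in (0, n_nodes - 1) else 1.0
        total += weight * math.exp(s * log_p + (n_rep - s) * log_q - 0.5 * u * u)
    return math.log(total * h / math.sqrt(2 * math.pi))


def score(s, n_rep, theta, h=1e-5):
    # central finite-difference gradient of the marginal log-likelihood
    grad = []
    for i in range(2):
        up, dn = list(theta), list(theta)
        up[i] += h
        dn[i] -= h
        grad.append((marginal_loglik(s, n_rep, *up) - marginal_loglik(s, n_rep, *dn)) / (2 * h))
    return grad


def fisher_information(n_rep, theta, n_mc=5000, seed=3):
    """Per-patient FIM E[score score^T]; the expectation over the data is taken by MC.
    The data enter only through the number of responses s, so scores are cached per s."""
    beta, omega = theta
    rng = random.Random(seed)
    cache = {s: score(s, n_rep, theta) for s in range(n_rep + 1)}
    fim = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(n_mc):
        b = rng.gauss(0.0, omega)
        p = 1 / (1 + math.exp(-(beta + b)))
        s = sum(rng.random() < p for _ in range(n_rep))
        g = cache[s]
        for r in range(2):
            for c in range(2):
                fim[r][c] += g[r] * g[c] / n_mc
    return fim


def standard_errors(fim, n_patients):
    # square roots of the diagonal of the inverse of n_patients * fim
    (a, b), (c, d) = fim
    det = (a * d - b * c) * n_patients ** 2
    return math.sqrt(n_patients * d / det), math.sqrt(n_patients * a / det)
```

Comparing, for example, `standard_errors(fisher_information(3, (0.0, 1.0)), 50)` with `standard_errors(fisher_information(12, (0.0, 1.0)), 50)` shows how the number of repetitions per patient affects the predicted precision of both parameters.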

The MC-HMC method was implemented in the R package MIXFIM, available on CRAN.

Recommendation 11: For evaluating designs of studies with longitudinal discrete or time-to-event data, the Fisher Information Matrix should be computed without linearization. The new MC-HMC approach (in MIXFIM) provides adequate prediction of standard errors and allows several designs to be compared.

Adaptive two-stage designs in non-linear mixed effects models

One limitation of the optimal design approach for nonlinear mixed effects models is the a priori knowledge needed on the parameter values. Adaptive designs are an alternative, increasingly developed for randomised clinical trials and dose-ranging studies but rarely applied to nonlinear mixed effects models. Two-stage designs are more practical to implement in clinical settings than fully adaptive designs, especially for small population groups.

We extended our package PFIM (www.pfim.biostat.fr) and released a new version, PFIM4.0, which allows a prior information matrix to be included in design evaluation or optimisation and can therefore be used for multi-stage designs.

We developed multi-stage designs and evaluated them by clinical trial simulation, using a pharmacokinetic/pharmacodynamic example with continuous longitudinal data and a total of N=50 patients. After each cohort of patients, the parameters were estimated and used to design the sampling times for the next cohort, keeping the prior information already obtained when computing the Fisher Information Matrix.

We used first order linearization to evaluate the Fisher Information Matrix as implemented in PFIM which is suitable for continuous data. We studied the efficiency of single stage designs optimised with correct or wrong parameters. We then evaluated two stage designs, where the first cohort was designed from wrong parameters, varying the balance of patients in the two cohorts. We also studied the added value of designs with more stages.

We showed the good properties of adaptive two-stage designs when the initial guess of the parameters is wrong (Lestini et al., 2015). In the studied example, the efficiency of the balanced two-stage design was almost as good as that of the one-stage design that would have been obtained if the true parameters were known. With this small number of patients (N=50), the best two-stage design was the balanced design with an equal number of patients in each cohort. These results are consistent with those previously obtained with a simpler example (Dumont et al. 2016). Having three or five stages did not improve the efficiency of the designs, and such designs are more complex to implement.

The good properties of balanced two-stage designs should also hold for models with discrete longitudinal data (using MC-AGQ or MC-HMC to compute the Fisher Information Matrix), but this was not checked by clinical trial simulation.

Recommendation 12: When there is little information on the value of the parameters at the design stage, adaptive designs can be used. Two-stage balanced designs are a good compromise. The new version of PFIM can be used for adaptive design with continuous longitudinal data.

Model uncertainty in design for analyses of pivotal trials

It is important to contribute to the dissemination of model-based analysis of pivotal clinical trials in drug evaluation for small population groups. These approaches allow all recorded individual information to be used and therefore decrease sample sizes. One main limitation, as seen by health authorities, is the possible lack of control of the type-I-error when performing model selection. Model averaging approaches are a good alternative. The idea of pre-specifying a number of candidate models is already applied in drug development, for instance for dose-response studies in the MCPMod approach, but was only recently extended to mixed effects models. Before the analysis step, studies need to be designed that are adequate across a set of candidate nonlinear mixed effects models.

We proposed using compound D-optimality criteria to design studies that are robust across a set of pre-specified models. We also proposed achieving robustness with respect to the parameter values by defining a prior distribution on each parameter and using the expected Fisher Information Matrix, hence DE-optimality (for one model) or compound DE-optimality. As a further integration is needed to compute the expected Fisher Information Matrix over the distribution of the parameters, we extended the MC step in the MC-HMC method (Loingeville et al., 2017). We implemented this extension in a working version of MIXFIM.

We evaluated these new developments on the count longitudinal data example, in which the effect of dose on the Poisson parameter is modelled (Rivière et al., 2016, Ueckert and Mentré, 2017). We specified five different models of the dose effect (Loingeville et al., 2017) and optimised the two doses to be used in addition to placebo. We did this separately for each model, using D- and then DE-optimality, and then across the five models (assuming equal weights of 1/5). We evaluated the loss of efficiency of each design for each model.

We found that the optimal doses varied from one model to another and, for some models, also changed when the approach robust to parameter values was used (Loingeville et al., 2017). However, the loss of efficiency from using the design optimised for one model when another model is true can be substantial: for instance, using the design optimal for a linear model when an Emax model is true led to an efficiency of only 44%.

The robust design across the five models is a compromise between the optimal designs obtained for each model (Loingeville et al., 2017). Its efficiency is greater than 80% under each of the five models.
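Efficiencies such as the 44% and 80% above are commonly expressed as relative D-efficiency, (det M(ξ) / det M(ξ*))^(1/p), where ξ* is the optimal design for the true model and p its number of parameters. The sketch below assumes that definition and uses a hypothetical straight-line model on doses in [0, 1], where the D-optimal design puts half of the subjects at each end; it is not the longitudinal count model of the study.

```python
import numpy as np

def d_efficiency(M_design, M_optimal):
    """Relative D-efficiency (det M / det M*)^(1/p); 1 means as informative as the optimum."""
    p = M_design.shape[0]
    _, logdet_design = np.linalg.slogdet(M_design)
    _, logdet_opt = np.linalg.slogdet(M_optimal)
    return float(np.exp((logdet_design - logdet_opt) / p))

def fim_straight_line(doses, weights):
    """Normalised FIM of E[y] = a + b*d with unit error variance."""
    M = np.zeros((2, 2))
    for d, w in zip(doses, weights):
        x = np.array([1.0, d])
        M += w * np.outer(x, x)
    return M

# D-optimal design for a straight line on [0, 1]: half the subjects at each end.
M_opt = fim_straight_line([0.0, 1.0], [0.5, 0.5])
# A three-point design, as might be chosen as a compromise when curvature is possible.
M_3pt = fim_straight_line([0.0, 0.5, 1.0], [1 / 3, 1 / 3, 1 / 3])
print(round(d_efficiency(M_3pt, M_opt), 3))  # -> 0.816
```

The compromise design keeps about 82% efficiency for the straight line while still placing subjects in the middle of the dose range, which is the same kind of trade-off that the robust design across the five models makes.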

We developed an approach for one-stage designs that are robust across a set of candidate models with uncertainty in the parameters. We showed the robustness of the design obtained with this approach when finding optimal doses in a longitudinal count data model. The next steps are to make these new developments available in MIXFIM (presently available upon request), and to incorporate and evaluate this approach within two-stage adaptive designs.

Recommendation 13: When there is uncertainty about the model and its parameters, a robust approach across candidate models should be used to design studies with longitudinal data.

1.5. Design of pharmacogenetic trials (WP6)

In this work-package we have concentrated in particular on designs and analyses that pay close attention to sources of variation, exploiting, where appropriate, the predictive value of covariates but also, for certain diseases, the ability of patients to act as their own control. We have taken care to distinguish two major purposes of clinical trials: to establish whether treatments are effective, and the wider and more difficult task of establishing when and for whom they are effective. We have further paid attention to the interpretation of the evidence resulting from a clinical trial (Senn, 2017b,c).

Necessary conditions for justifying a theranostic programme

The objective of this part of the work-package was to identify under what conditions it is worth trying to identify a subpopulation for differential treatment.

We recognised early on, through interaction within the project, that excellent work was being done in WP9 on a closely related problem, namely the question of when one should stop trials in small populations. The two are related because, if one should stop before one starts, studying the population in question is not worthwhile. We therefore decided to interact with WP9. Nevertheless, we have done a substantial amount of work on one particular question of relevance in this context, namely how evidence may be found to establish whether further differentiation, on the basis of response, of a population that would otherwise be treated the same is warranted.

Recommendation 14: We recommend that response should not be defined using arbitrary and naïve dichotomies but that it should be analysed carefully paying due attention to components of variance and where possible using designs to identify them.

Development of within-patient trial designs

The objective of this part of the work-package was to put the analysis of n-of-1 trials on a firm and logical foundation, bearing in mind the two different purposes they may have: first, establishing whether a treatment works at all (Lonergan et al. 2017); second, establishing to what extent the effect varies from patient to patient. A thorough examination of this has been provided in a paper in PLOS One, and associated code for analysis in R has been written (Araujo et al., 2017).

Recommendation 15: For the analysis of n-of-1 trials, we recommend using a modified fixed-effects meta-analysis when the objective is to establish whether the treatment works, and an approach through mixed models if variation in response to treatment is to be studied.
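The two purposes can be illustrated with a small sketch on hypothetical data. The first question ("does it work in these patients?") is answered here by a plain inverse-variance fixed-effects meta-analysis of the per-patient mean differences; this is not the modified version of Araujo et al. (2017). For the second question, a DerSimonian-Laird moment estimate of the treatment-by-patient variance τ² is used as a simple stand-in for the recommended mixed model.

```python
import math

# Hypothetical n-of-1 series: per patient, treatment-minus-control differences per cycle.
cycles = {
    "P1": [1.2, 0.8, 1.5],
    "P2": [0.3, 0.9, 0.6],
    "P3": [2.1, 1.7, 2.4],
    "P4": [-0.2, 0.4, 0.1],
}

def patient_summary(diffs):
    """Mean cycle difference and its sampling variance s^2 / k."""
    k = len(diffs)
    mean = sum(diffs) / k
    s2 = sum((x - mean) ** 2 for x in diffs) / (k - 1)
    return mean, s2 / k

def fixed_effects(summaries):
    """Inverse-variance weighted mean effect and its standard error."""
    w = [1 / v for _, v in summaries]
    est = sum(wi * m for wi, (m, _) in zip(w, summaries)) / sum(w)
    return est, math.sqrt(1 / sum(w))

def between_patient_variance(summaries):
    """DerSimonian-Laird moment estimate of the treatment-by-patient variance tau^2."""
    w = [1 / v for _, v in summaries]
    est, _ = fixed_effects(summaries)
    q = sum(wi * (m - est) ** 2 for wi, (m, _) in zip(w, summaries))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    return max(0.0, (q - (len(summaries) - 1)) / c)

summaries = [patient_summary(d) for d in cycles.values()]
est, se = fixed_effects(summaries)
print(f"does it work? effect {est:.2f}, 95% CI {est - 1.96 * se:.2f} to {est + 1.96 * se:.2f}")
print(f"does the effect vary? tau^2 = {between_patient_variance(summaries):.3f}")
```

The fixed-effects standard error uses only within-patient variation, which is appropriate for the first question; τ² is precisely the component that the fixed-effects analysis ignores.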

Development of between-patient trial designs

Extracting as much information as possible from between-patient trials involves a number of aspects, which were considered (see also Collignon et al., 2016). A side issue is the recent claim that significance tests give positive results too easily, a situation which, if true, would have devastating consequences for trials in small populations. Some of the work has been devoted to addressing this, and other work to a) making efficient use of covariates and b) making appropriate use of historical information. This latter aspect of the work is ongoing.

Further work has considered the effect of sequential analysis on inferences. The attached diagram, taken from Senn (2014), shows that the stopping rule has no influence on the inferences from a meta-analysis provided that the trials are weighted by the information they provide. Thus, inferences from combining small trials in rare diseases are unaffected by whether or not the trials were sequential.

Recommendation 16: When analysing between-patient studies we recommend avoiding information destroying transformations (such as dichotomies) and exploiting the explanatory power of covariates, which may be identified from ancillary studies and patient databases.
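The value of a prognostic covariate can be seen in a small simulation. The sketch below assumes a hypothetical two-arm trial of 40 patients with a normally distributed outcome and a baseline covariate that explains a large part of the outcome variance; it compares the standard error of the treatment effect from ordinary least squares with and without adjustment for that covariate.

```python
import numpy as np

def treatment_se(X, y):
    """OLS estimate and standard error of the coefficient in column 1 (treatment indicator)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1], float(np.sqrt(cov[1, 1]))

rng = np.random.default_rng(16)
n = 40                                    # a small two-arm trial
treat = np.repeat([0.0, 1.0], n // 2)
baseline = rng.standard_normal(n)         # hypothetical prognostic covariate
y = 0.5 * treat + 0.8 * baseline + 0.6 * rng.standard_normal(n)

ones = np.ones(n)
est_u, se_u = treatment_se(np.column_stack([ones, treat]), y)
est_a, se_a = treatment_se(np.column_stack([ones, treat, baseline]), y)
print(f"unadjusted: {est_u:.2f} (SE {se_u:.2f}); adjusted: {est_a:.2f} (SE {se_a:.2f})")
```

Because the covariate removes most of the residual variance, the adjusted analysis needs far fewer patients for the same precision, which matters most when the population is small.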

Sample Size determination

This task concerned sample size determination for clinical trials. The decision-theoretic aspect is covered by WP9. We have concentrated instead on the challenge of n-of-1 trials, where many components of variation are involved, which makes sample size determination complex. Theory has been developed to cover this (Senn, 2017a), practical approaches have been developed, and code has been written in R®, SAS® and GenStat® to carry out this task.

Recommendation 17: When conducting a series of n-of-1 trials, we recommend paying close attention to the purpose of the study and calculating the sample size accordingly, using the approach of Senn (2017a).
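Why the purpose matters can be shown with a simple normal-approximation calculation; this is a sketch under assumed variance components, not the approach of Senn (2017a). Assume each patient completes k cycles and the treatment-minus-control difference in cycle j of patient i is d_ij = δ + u_i + e_ij, with treatment-by-patient variance σ_u² and cycle-level variance σ_e². For the question "does the treatment work in these patients?" only σ_e²/k enters the per-patient variance; for the population-average effect σ_u² is added.

```python
import math
from statistics import NormalDist

def n_patients(delta, sd_cycle, k_cycles, sd_patient_by_treatment=0.0,
               alpha=0.05, power=0.8):
    """Normal-approximation number of patients for a series of n-of-1 trials.

    Assumed model: d_ij = delta + u_i + e_ij with Var(u_i) = sd_patient_by_treatment^2
    and Var(e_ij) = sd_cycle^2. Setting sd_patient_by_treatment = 0 targets the
    'does it work in these patients' question; a positive value targets the
    population-average effect.
    """
    z = NormalDist().inv_cdf
    var_per_patient = sd_patient_by_treatment ** 2 + sd_cycle ** 2 / k_cycles
    n = (z(1 - alpha / 2) + z(power)) ** 2 * var_per_patient / delta ** 2
    return math.ceil(n)

print(n_patients(delta=1.0, sd_cycle=2.0, k_cycles=3))                               # -> 11
print(n_patients(delta=1.0, sd_cycle=2.0, k_cycles=3, sd_patient_by_treatment=1.0))  # -> 19
```

Adding cycles reduces only the σ_e²/k term, so when σ_u² dominates, more patients rather than more cycles are needed.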

1.6. Simulation of clinical trials (WP7)

Analysis of clinical trial data using nonlinear mixed-effects models can have important advantages both with respect to the type of information gained and the statistical power for making inferences (Karlsson et al., 2013). In general, the main disadvantage of a non-linear mixed effects (NLME) modelling approach lies in the assumptions needed to build the NLME model. However, with the movement towards mechanistic models based on biological understanding (Danhof et al., 2008, Marshall et al., 2006), the validity of model assumptions becomes easier to evaluate. Mechanism based NLME models can be of special interest in small population groups for multiple reasons (Lesko, 2012):

(1) Practical limitations might severely hamper the possibility of sufficiently powering a study based on a statistical test that makes fewer assumptions,

(2) When the population is a subset of subjects with a certain disease (e.g. pediatric patients), there can be good opportunities for extrapolation based on prior information from a larger, previously studied population (Gisleskog, 2002),

(3) The ability to pool data across studies, arms and periods can be essential to obtain enough information to assess the performance of a specific treatment in a small sub-population and this is often not possible without application of a model, and

(4) Application of NLME modelling as the primary analysis of clinical studies can also open up for innovative designs.

Improved methodology for power/sample size calculations with non-linear mixed effects model based analysis

Power (the ability to identify a true drug effect of a certain size) is often of great importance in clinical trial planning and design, as it quantifies the degree to which an experiment is able to distinguish a certain effect size from the null hypothesis. Determining the power of a study is easy when simple models for the description of the observations and a specific effect size are assumed; e.g. when the data are assumed to be normal and a fixed effect size is chosen, the power can be calculated analytically. However, for more complex, longitudinal models the joint distribution of the observations is less obvious, and even the effect size might not be easily derivable. In this situation, usually no analytic derivation of the power can be obtained and one has to resort to Monte-Carlo simulations. Ideally, a Monte-Carlo study uses a model containing all available knowledge for a particular compound to simulate replicates of the trial, and the intended analysis model (not necessarily equivalent to the simulation model) to analyse these replicates. For each analysed replicate the hypothesis test is carried out, and the fraction of rejected null hypotheses provides an estimate of the power of the study. Clearly, this power estimate requires a large number of simulations and estimations to be stable, which can be time-consuming to obtain, especially when non-linear mixed effects models are used for the analysis.
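The Monte-Carlo procedure can be sketched for the simplest case, in which the simulation and analysis models coincide: a hypothetical two-arm normal trial analysed with a large-sample z-test. In the NLME setting each replicate would instead be re-estimated with the modelling software, which is what makes the procedure expensive.

```python
import numpy as np

def mc_power(n_per_arm, effect, sd=1.0, n_rep=2000, crit=1.959964, seed=1):
    """Monte-Carlo power of a two-arm comparison analysed with a large-sample z-test.

    Each replicate trial is simulated and analysed; the power estimate is the
    fraction of replicates in which the null hypothesis (no effect) is rejected.
    """
    rng = np.random.default_rng(seed)
    control = rng.normal(0.0, sd, size=(n_rep, n_per_arm))
    active = rng.normal(effect, sd, size=(n_rep, n_per_arm))
    diff = active.mean(axis=1) - control.mean(axis=1)
    se = np.sqrt((active.var(axis=1, ddof=1) + control.var(axis=1, ddof=1)) / n_per_arm)
    power = float(np.mean(np.abs(diff / se) > crit))
    mc_se = float(np.sqrt(power * (1 - power) / n_rep))   # Monte-Carlo error of the estimate
    return power, mc_se

power, mc_se = mc_power(n_per_arm=20, effect=1.0)
type1, _ = mc_power(n_per_arm=20, effect=0.0)
print(f"power {power:.3f} +/- {mc_se:.3f}; empirical type-I error {type1:.3f}")
```

The Monte-Carlo error shrinks only as 1/sqrt(n_rep), and every point on a power curve needs its own set of replicates.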

A novel parametric power estimation (PPE) algorithm, which uses the theoretical distribution of the test statistic under the alternative hypothesis, was developed in this work and compared with classical Monte-Carlo studies (fig. 1). The PPE algorithm estimates the unknown non-centrality parameter of this theoretical distribution from a limited number of Monte-Carlo simulations and estimations. Furthermore, from the estimated parameter a complete power versus sample size curve can be obtained analytically without additional simulations, drastically reducing the runtime of this computation (Ueckert et al., 2016).
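The idea behind PPE can be sketched for a likelihood ratio test with one degree of freedom. The sketch makes several simplifying assumptions that may differ from the published algorithm: the noncentrality λ is estimated by the method of moments (E[ΔOFV] = 1 + λ), it is assumed to scale linearly with sample size, and the ΔOFV values are drawn directly from a noncentral chi-square instead of coming from real simulation and estimation cycles.

```python
import math
import numpy as np

CRIT_1DF = 1.959963984540054 ** 2      # 95% quantile of the chi-square(1) distribution

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def power_1df(lam, crit=CRIT_1DF):
    """P(X > crit) for X ~ noncentral chi-square with 1 df and noncentrality lam.

    Uses X = (Z + sqrt(lam))^2 with Z standard normal.
    """
    r, c = math.sqrt(lam), math.sqrt(crit)
    return (1.0 - norm_cdf(c - r)) + norm_cdf(-c - r)

def ppe_power_curve(delta_ofv, n_simulated, sample_sizes):
    """Sketch of parametric power estimation for a 1-df likelihood ratio test.

    Estimates the noncentrality by the method of moments (E[X] = 1 + lam) from the
    simulated statistics, then assumes it scales linearly with the sample size.
    """
    lam_hat = max(0.0, float(np.mean(delta_ofv)) - 1.0)
    return {n: power_1df(lam_hat * n / n_simulated) for n in sample_sizes}

# Stand-in for dOFV values from a limited number of simulation/estimation cycles at N = 50.
rng = np.random.default_rng(11)
delta_ofv = rng.noncentral_chisquare(df=1, nonc=8.0, size=200)
curve = ppe_power_curve(delta_ofv, n_simulated=50, sample_sizes=[25, 50, 75, 100])
for n, pw in curve.items():
    print(f"N = {n:3d}: power {pw:.2f}")
```

Only one set of simulations (at N = 50) is needed; the rest of the curve follows analytically from the estimated noncentrality.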

A complicating factor in hypothesis testing with non-linear mixed effects models is keeping control of the type-I-error. One way to assess the actual significance level of the hypothesis test is to perform a permutation test. To facilitate this often computationally intensive procedure, a permutation test tool was developed within the free software PsN (http://psn.sourceforge.net) and xpose (Lindbom et al., 2004, Keizer et al., 2013, Johnsson and Karlsson, 1999, Harling et al., 2016, Deng et al., 2015).

Recommendation 18: If fast computations of power curves are needed for a non-linear mixed effects model, we recommend using the parametric power estimation algorithm as implemented in the stochastic simulation and estimation (SSE) tool of PsN (potentially with a type-I-error correction based on the “randtest” tool in PsN).
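The logic of a permutation-based type-I-error check can be sketched as follows. This does not reproduce the PsN tool, which re-estimates the full NLME model for each permuted dataset; here a closed-form likelihood ratio statistic for a normal model on a hypothetical 12-patient study stands in. Permuting the treatment labels generates the null distribution, whose 95% quantile can replace the nominal chi-square critical value of 3.84.

```python
import numpy as np

def lrt_statistic(y, treat):
    """-2 log-likelihood ratio for adding a treatment effect to a normal mean model."""
    rss0 = np.sum((y - y.mean()) ** 2)
    fitted = np.where(treat == 1, y[treat == 1].mean(), y[treat == 0].mean())
    rss1 = np.sum((y - fitted) ** 2)
    return len(y) * np.log(rss0 / rss1)

def permutation_test(y, treat, n_perm=2000, alpha=0.05, seed=0):
    """Null distribution of the statistic obtained by randomly permuting treatment labels."""
    rng = np.random.default_rng(seed)
    observed = lrt_statistic(y, treat)
    null = np.array([lrt_statistic(y, rng.permutation(treat)) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    critical = float(np.quantile(null, 1 - alpha))   # empirical replacement for 3.84
    return observed, p_value, critical

rng = np.random.default_rng(42)
treat = np.repeat([0, 1], 6)                         # a very small two-arm study
y = 0.8 * treat + rng.standard_normal(12)
obs, p, crit = permutation_test(y, treat)
print(f"LRT {obs:.2f}, permutation p = {p:.3f}, empirical 5% critical value {crit:.2f} (nominal 3.84)")
```

In very small samples the likelihood ratio statistic tends to be anti-conservative, so the empirical critical value is typically above 3.84.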

Demonstration of the value of mechanism based models in planning and analyzing studies in small population groups

We established proof-of-principle examples of how highly mechanistic systems pharmacology and/or systems biology models can be utilized in planning the analysis of clinical trials in small population groups. Based on simulations with the mechanism based models, more parsimonious models suitable for estimation can be utilized to understand drug effects and to link them to the mechanism based model (Wang et al., 2016, Wellhagen et al., 2015).

Recommendation 19: The simulation methods described above can be utilized to investigate the effects of using different, smaller, more parsimonious models to evaluate data from complicated biological systems prior to running a clinical study.

Handling model uncertainty in small population group clinical trial simulations

Model uncertainty is naturally largest when estimation is based on a small sample (e.g. in small population groups), and at the same time a small sample size makes it particularly challenging to characterize that uncertainty accurately. Five projects were undertaken to investigate different aspects of model uncertainty in NLME models:

(1) Assessing Parameter Uncertainty Distributions Using Sampling Importance Resampling,

(2) Delta objective function value distributions as a method to diagnose uncertainty distributions,

(3) Preconditioning of Nonlinear Mixed Effects Models for Stabilization of the Covariance Matrix,

(4) Model averaging, and

(5) Model based adaptive optimal design.

(1) Sampling Importance Resampling (SIR) (Rubin, 1988) was implemented as a user-friendly script in free non-linear mixed effects modelling software. Optimal SIR settings were investigated and tested on 30 real data examples. Diagnostics to judge SIR convergence were developed; they can also be applied to compare different uncertainty distributions. SIR now constitutes a powerful alternative for estimating and utilizing parameter uncertainty, especially in the context of small populations (Dosne, Bergstrand et al., 2016).

(2) Confidence intervals determined by bootstrap and by stochastic simulation and re-estimation were compared. This analysis showed that, with regard to providing uncertainty estimates, the bootstrap may be unsuitable for non-linear mixed effects analyses even with datasets that would commonly be considered “large enough”. The bootstrap delta objective function value distribution provides an easy way to assess whether the bootstrap results in parameter vectors contradicted by the original data (Dosne, Niebecker, Karlsson, 2016).

(3) A preconditioning method for NLME models was developed to increase the computational stability of the variance-covariance matrix. Preconditioning is a widely used technique to increase computational stability when numerically solving large sparse systems of linear equations (Benzi, 2002). An automated preconditioning routine was made available as part of the software package Perl-speaks-NONMEM (PsN). The results demonstrated that, if computed correctly, the variance-covariance matrix and the R-matrix can give a strong indication of the non-estimability of the model parameters, whereas other methods may not be able to do so (Aoki et al., 2016).

(4) Model averaging methods were investigated for dose selection studies (phase IIb). The proposed method reduces the analysis bias that originates from model selection when the analysis is based on a single model structure. It can increase the probability of making correct decisions at the end of trials compared with conventional ANOVA-based study protocols (Aoki et al., 2014, Aoki et al., 2017).

(5) Model based adaptive optimal designs (MBAOD) were investigated for bridging studies from adults to children and were able to reduce model parameter uncertainty. A comparison of the relative estimation error of the final parameter estimates showed that MBAOD performed as well as traditional design approaches while, in most of the simulations, requiring fewer children to fulfil a commonly used precision criterion (Strömberg and Hooker, 2015, 2016, 2017).
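The SIR procedure of project (1) can be sketched in three steps: sample parameter vectors from a proposal based on the estimated covariance matrix, weight each vector by the importance ratio exp(-ΔOFV/2) divided by the relative proposal density, and resample without replacement in proportion to these weights. The toy model (a normal sample with parameters mean and log-SD), the proposal inflation of 1.5 and the sample sizes 2000 and 500 are illustrative assumptions, not the settings recommended by Dosne, Bergstrand et al. (2016).

```python
import numpy as np

rng = np.random.default_rng(1988)
y = rng.normal(10.0, 2.0, size=15)          # a small, hypothetical dataset

def ofv(theta):
    """-2 log-likelihood of y under N(mean, exp(log_sd)^2); theta = (mean, log_sd)."""
    mean, log_sd = theta
    return np.sum(np.log(2 * np.pi) + 2 * log_sd + (y - mean) ** 2 / np.exp(2 * log_sd))

# Maximum likelihood estimate and its asymptotic covariance (the usual proposal).
n = len(y)
theta_hat = np.array([y.mean(), np.log(y.std())])
cov_hat = np.diag([y.var() / n, 1 / (2 * n)])

def sir(theta_hat, cov, n_samples=2000, n_resamples=500, inflation=1.5):
    """Sampling importance resampling of parameter uncertainty."""
    proposal_cov = inflation * cov
    draws = rng.multivariate_normal(theta_hat, proposal_cov, size=n_samples)
    # Proposal log density relative to its mode (normalising constants cancel).
    inv = np.linalg.inv(proposal_cov)
    centred = draws - theta_hat
    log_q = -0.5 * np.einsum("ij,jk,ik->i", centred, inv, centred)
    # Importance ratio: exp(-dOFV / 2) divided by the relative proposal density.
    d_ofv = np.array([ofv(t) for t in draws]) - ofv(theta_hat)
    log_ir = -0.5 * d_ofv - log_q
    w = np.exp(log_ir - log_ir.max())
    idx = rng.choice(n_samples, size=n_resamples, replace=False, p=w / w.sum())
    return draws[idx]

resampled = sir(theta_hat, cov_hat)
lo, hi = np.percentile(resampled[:, 0], [2.5, 97.5])
print(f"95% interval for the mean: {lo:.2f} to {hi:.2f}")
```

Unlike the bootstrap, SIR does not require re-estimating the model on resampled data, which is one reason it is attractive when the number of subjects is small.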

Recommendation 20: We recommend the use of Sampling Importance Resampling to characterize the uncertainty of non-linear mixed effects model parameter estimates in small sample size studies. Non-estimability of parameters may be assessed using preconditioning. The use of the bootstrap model averaging method (Method 2) (Aoki et al., 2016) is recommended when conducting model-based decision-making after a trial. Robust model based adaptive optimal designs may be used to improve model certainty in clinical trials.

1.7. Genetic factors influencing the response to the therapy (WP8)

The power of clinical trials in small population groups is diminished by patient heterogeneity. It is now possible to gather large amounts of so-called “omics” (genomics, proteomics, metabolomics) data, which could be useful to describe this heterogeneity and to increase the power of clinical trials, as well as to define groups of patients for personalized therapies. However, given the relatively small sample sizes, the high dimensional “omics” data require extensive pre-processing. The main goal here is to reduce the effective model size, so that the model parameters can be estimated precisely with the limited number of patients. Within the IDeAl project, several new statistical methods for dimensionality reduction and for the identification of important genetic predictors were developed. Simulation studies confirm good properties of these methods in the context of predicting the patients’ response to the treatment. In addition, new theoretical mathematical results were obtained, which make it possible to identify the range of biological scenarios under which popular methods for identifying important predictors are effective.

Bayesian methods for identification of genetic pathways involved in the development of disease and the response to the therapy

We developed a new approximate Bayesian methodology for the identification of genetic pathways. The method clusters genes into pathways using high-dimensional gene expression data and in principle can be applied for dimensionality reduction of any type of “omics” data. It is based on a non-trivial application of the K-means algorithm, where the centre of each cluster is formed by a set of “principal components” and the distance of a given gene to a cluster centre is determined by the value of the BIC criterion in the respective multiple regression model. The dimensionality of a given pathway (number of principal components) is estimated using the PEnalized SEmi-Integrated Likelihood method (PESEL), described in detail in Sobczyk et al. (2016) and implemented in the R package “PESEL”. The number of pathways is estimated using the modified Bayesian Information Criterion (mBIC), which allows prior biological knowledge to be incorporated. The full methodology is implemented in the R package “varclust” (Sobczyk and Josse, 2016) and can analyse data sets much larger than those that can be analysed with competing methods.

Recommendation 21: We recommend using “varclust” for clustering of gene expression data and extraction of a relatively small number of potential predictors of patients’ response to the treatment based on gene expression data.
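The K-means idea behind this clustering can be illustrated with a deliberately simplified sketch. It is not the varclust package: the number of clusters and the subspace dimension are fixed by hand (instead of mBIC and PESEL), and a gene is assigned to the cluster whose principal components explain most of its variance (instead of the BIC criterion). The simulated data, two hypothetical pathway signals with ten genes each, are likewise assumptions.

```python
import numpy as np

def subspace_clustering(X, n_clusters, dim=1, n_iter=20, seed=0):
    """K-means-like clustering of the columns (genes) of X around low-dimensional subspaces."""
    rng = np.random.default_rng(seed)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)          # standardise each gene
    # Seed with one random gene, then repeatedly add the gene least correlated
    # with the genes already chosen (a farthest-first heuristic).
    corr = np.abs(np.corrcoef(Z, rowvar=False))
    seeds = [int(rng.integers(Z.shape[1]))]
    while len(seeds) < n_clusters:
        seeds.append(int(corr[seeds].max(axis=0).argmin()))
    bases = [Z[:, [j]] / np.linalg.norm(Z[:, j]) for j in seeds]
    labels = None
    for _ in range(n_iter):
        # Squared projection norm on an orthonormal basis = explained variance of the gene.
        fit = np.stack([np.sum((B.T @ Z) ** 2, axis=0) for B in bases])
        new = fit.argmax(axis=0)
        if labels is not None and np.array_equal(new, labels):
            break
        labels = new
        bases = []
        for k in range(n_clusters):
            cols = Z[:, labels == k]
            if cols.shape[1] == 0:                    # re-seed an empty cluster
                cols = Z[:, [int(rng.integers(Z.shape[1]))]]
            u, _, _ = np.linalg.svd(cols, full_matrices=False)
            bases.append(u[:, :dim])                  # the cluster's "principal components"
    return labels

rng = np.random.default_rng(42)
n = 50
f = rng.standard_normal((n, 2))                       # two hypothetical pathway signals
X = np.hstack([f[:, [0]] + 0.3 * rng.standard_normal((n, 10)),
               f[:, [1]] + 0.3 * rng.standard_normal((n, 10))])
labels = subspace_clustering(X, n_clusters=2)
print(labels)
```

The principal components of each cluster can then serve as a small number of candidate predictors in place of thousands of individual genes.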

Development and application of high dimensional model selection for identification of regulatory regions influencing detected pathways

Several methods of gene mapping were developed, which can be used to identify regulatory regions as well as genes influencing important patient characteristics. We proposed a new method for the identification of important genes in admixed populations (Szulc et al., 2017). The method, based on mBIC, enhances the power of gene identification by using both the genotype and the ancestry information of genetic markers. Two new convex methods for gene mapping, SLOPE and group SLOPE, were developed (Bogdan et al., 2015, Brzyski et al., 2017 and Brzyski et al., 2017 submitted); they control the fraction of false discoveries when the true number of important genes is small or moderately large. The group SLOPE method turns out to be of particular interest for identifying rare recessive variants, which might be related to the development of rare diseases. The practical limitations of convex methods for identifying important predictors are explained mathematically in Su et al. (2017 submitted).

mBIC2 and geneSLOPE can efficiently localize influential genes while controlling the fraction of false discoveries. Similarly to LASSO, SLOPE allows for FDR control when the number of true causal genes is small or moderately large.

Recommendation 22: It is recommended to use the information on the ancestry of genetic markers when mapping genes in admixed populations. It is also recommended to use both regular and group SLOPE, since regular SLOPE has a higher power of detection of additive gene effects, while group SLOPE allows for identification of rare recessive variants.
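SLOPE penalises the sorted absolute coefficients with a decreasing sequence λ_1 ≥ … ≥ λ_p, commonly taken as λ_i = Φ⁻¹(1 - iq/(2p)) for a target FDR level q. The sketch below implements that sequence, the proximal operator of the sorted-L1 penalty (sort the magnitudes, subtract λ, project onto non-increasing sequences with the pool-adjacent-violators algorithm, clip at zero) and a plain proximal gradient fit. It assumes unit-norm columns and known noise SD 1, and does not cover group SLOPE or geneSLOPE.

```python
import numpy as np
from statistics import NormalDist

def slope_lambdas(p, q=0.1):
    """Benjamini-Hochberg-type sequence lambda_i = Phi^{-1}(1 - i*q/(2p)), i = 1..p."""
    z = NormalDist().inv_cdf
    return np.array([z(1 - i * q / (2 * p)) for i in range(1, p + 1)])

def prox_sorted_l1(v, lam):
    """Proximal operator of the sorted-L1 penalty sum_i lam_i |v|_(i) (lam non-increasing)."""
    v = np.asarray(v, dtype=float)
    order = np.argsort(-np.abs(v))           # indices sorting |v| in decreasing order
    w = np.abs(v)[order] - lam
    blocks = []                              # each block: [first index, sum of values, count]
    for i, val in enumerate(w):
        blocks.append([i, val, 1])
        # Pool adjacent violators so that block means are non-increasing.
        while len(blocks) > 1 and blocks[-2][1] / blocks[-2][2] <= blocks[-1][1] / blocks[-1][2]:
            _, s, c = blocks.pop()
            blocks[-1][1] += s
            blocks[-1][2] += c
    x_sorted = np.empty_like(w)
    for start, s, c in blocks:
        x_sorted[start:start + c] = max(s / c, 0.0)   # clip at zero
    x = np.empty_like(v)
    x[order] = x_sorted                      # undo the sorting
    return np.sign(v) * x

def slope_fit(X, y, lam, n_iter=500):
    """Proximal gradient (ISTA) for 0.5*||y - X b||^2 + sum_i lam_i |b|_(i)."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the gradient
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)
        b = prox_sorted_l1(b - step * grad, step * lam)
    return b

rng = np.random.default_rng(3)
n, p = 100, 50
X = rng.standard_normal((n, p))
X /= np.linalg.norm(X, axis=0)               # unit-norm columns
beta_true = np.zeros(p)
beta_true[:3] = 6.0
y = X @ beta_true + rng.standard_normal(n)   # noise SD = 1
beta_hat = slope_fit(X, y, slope_lambdas(p, q=0.1))
print("selected markers:", np.flatnonzero(beta_hat))
```

With all λ_i equal the proximal step reduces to soft-thresholding, i.e. SLOPE becomes the LASSO; the decreasing sequence is what adapts the penalty to the number of selected variables.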

Statistical model relating response to the therapy in small population group trials based on identified genetic factors and other covariates, as well as their interactions

The developed methods (mBIC2 and SLOPE) were used to estimate the genetic background and gene-treatment interaction and to predict the patients’ response to the treatment. Subsequently, a procedure for identifying the patients responsive to the treatment was proposed. New methods were compared to the classical approaches based on the single marker tests and least squares estimation in the full model as well as to the modern technique of adaptive least absolute shrinkage and selection operator (LASSO). Partial results of this study are reported in Frommlet et al. (2017 in preparation).

SLOPE, mBIC2 and adaptive LASSO have much better predictive properties than the methods based on single marker tests and the least-squares approach based on all available genetic data.

Single marker tests are very inefficient when the number of causal variants, k, is moderate or large, while the least squares approach works badly when k is small. mBIC2 and SLOPE have predictive properties similar to the ones of adaptive LASSO, with mBIC2 performing the best (having the largest precision in predicting the prognostic index and identifying responsive patients) when the number of genetic markers is larger than the sample size.

SLOPE and mBIC2 achieve these good predictive properties using far fewer biomarkers than adaptive LASSO, which selects many uninformative SNPs. Comparing SLOPE and mBIC2, we observe that the methods work similarly for small k, while for larger k the predictive properties of mBIC2 are better.
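For a linear model with k selected markers out of p, the mBIC2 criterion as we understand it from the literature is n log(RSS) + k log(n) + 2k log(p/4) - 2 log(k!); the last term relaxes the penalty as more markers enter, which makes the criterion behave like FDR control. The sketch below minimises it by exhaustive search over models with at most three markers on hypothetical data; in realistic genome-wide settings a heuristic search would be needed instead.

```python
import itertools
import math
import numpy as np

def mbic2(rss, n, k, p):
    """mBIC2 = n log(RSS) + k log(n) + 2k log(p/4) - 2 log(k!)  (smaller is better)."""
    return n * math.log(rss) + k * math.log(n) + 2 * k * math.log(p / 4) - 2 * math.lgamma(k + 1)

def rss_of(X, y, cols):
    """Residual sum of squares of the least-squares fit with an intercept and the given columns."""
    Xs = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    r = y - Xs @ beta
    return float(r @ r)

def best_model(X, y, max_size=3):
    """Exhaustive search for the subset of at most max_size markers minimising mBIC2."""
    n, p = X.shape
    candidates = (c for size in range(max_size + 1)
                  for c in itertools.combinations(range(p), size))
    return min(candidates, key=lambda c: mbic2(rss_of(X, y, c), n, len(c), p))

rng = np.random.default_rng(5)
n, p = 60, 30
X = rng.standard_normal((n, p))               # hypothetical marker data
y = 1.0 * X[:, 0] - 1.0 * X[:, 1] + rng.standard_normal(n)
selected = best_model(X, y)
print("selected markers:", selected)
```

Because the penalty grows with log(p), the criterion becomes stricter as the number of candidate markers increases, which is what keeps false discoveries in check when p exceeds n.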

Recommendation 23: If model building is based on highly correlated gene expression data, we recommend the use of SLOPE due to its computational tractability and good predictive properties.

1.8. Decision analysis (WP9)

The IDeAl project has covered several methodological areas concerning the design and analysis of clinical trials for small population groups. Work-package 9 (WP9) adds to this by analyzing decision making in trial design contexts. Furthermore, this work-package studies the interactions of different decision making stakeholders, and it provides recommendations for regulators, reimbursers and trial sponsors.

We first analyzed the decision rules that different stakeholders may have. This type of decision model is used in all subsequent manuscripts. Jobjörnsson et al. (2016) consider a sponsor’s Phase III go/no-go decision and choice of sample size; given a successful trial, they also model the sponsor’s pricing and the reimburser’s reaction to it. We next analysed how a sponsor’s willingness to invest in a population of candidate drugs relates to the public incentive structure, laid out in terms of requirements on clinical evidence (Miller and Burman, 2016 submitted). When a potentially predictive biomarker is present, we model how the design of the trial affects expected public benefit as well as commercial value (Ondra et al., 2016). Further aspects of adaptations are considered in Ondra et al. (2017 in preparation). Dosing and sizing are modelled, and a decision theoretic framework for programme optimization is sketched, in Burman (2015). A purely societal perspective is set up in Jobjörnsson et al. (2016), where the goal function is simply to maximize the total health benefit in a limited population. In addition to several of the aspects studied in other WP9 publications, the thesis by Jobjörnsson (2016; Section 3.3) models the impact of a lack of transparency in the regulators’ benefit-risk evaluation on the optimal decisions taken by the commercial sponsor.

A general suggestion is to formulate decision rules in a formal Bayesian decision theoretic framework. Even sub-optimal decisions can be modelled (Jobjörnsson et al., 2016) by explicitly assessing, from one stakeholder’s point of view, the uncertainty about how another stakeholder will make decisions in different scenarios.
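A minimal example of such a formal framework is a sponsor choosing the per-arm sample size that maximises expected utility, including the option not to run the trial. All ingredients here are hypothetical assumptions rather than the models of the WP9 publications or the interface of the BDPOPT package: a normal prior on the treatment effect, known outcome SD, a payoff earned only if the trial is significant, fixed and per-patient costs, and a cap given by the size of the small population.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf
Z_CRIT = NormalDist().inv_cdf(0.975)    # two-sided 5% test; success = significant benefit

def prob_success(n_per_arm, prior_mean, prior_sd, sigma):
    """Bayesian assurance: P(trial is significant) averaged over a normal prior on the effect."""
    se = sigma * math.sqrt(2.0 / n_per_arm)
    # The z-statistic is marginally N(prior_mean/se, 1 + prior_sd^2/se^2).
    return 1.0 - PHI((Z_CRIT - prior_mean / se) / math.sqrt(1.0 + (prior_sd / se) ** 2))

def expected_utility(n_per_arm, gain, fixed_cost, cost_per_patient, **prior):
    if n_per_arm == 0:
        return 0.0                      # "no go": the trial is not run
    return (gain * prob_success(n_per_arm, **prior)
            - fixed_cost - cost_per_patient * 2 * n_per_arm)

def optimal_n(max_patients, **kwargs):
    """Grid search over per-arm sample sizes, capped by the size of the small population."""
    return max(range(0, max_patients // 2 + 1), key=lambda n: expected_utility(n, **kwargs))

settings = dict(gain=100.0, fixed_cost=5.0, cost_per_patient=0.1,
                prior_mean=0.5, prior_sd=0.3, sigma=1.0)
n_star = optimal_n(400, **settings)
print(n_star, round(expected_utility(n_star, **settings), 2))
```

A reimburser or regulator could be represented in the same way by making `gain` depend on their anticipated decision rule, which is where the interaction between stakeholders enters.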

In the different publications, we have delivered guidance on how to formulate decision rules for different stakeholders. The second deliverable was a software tool that allows numeric solution of a wide variety of trial design optimization problems using a Bayesian decision theoretic approach. The R package BDPOPT has been utilized for further research within IDeAl and has also been made publicly available (Jobjörnsson, 2015). Results in terms of design optimization are provided in the different publications for the situations they study. As seen in Jobjörnsson (2016) regarding regulatory rules, and in Jobjörnsson et al. (2016) regarding reimbursement rules, failure to communicate precise rules to other stakeholders may lead to suboptimal design and development decisions by sponsors. One recommendation is therefore to increase transparency in regulatory and payer decisions.

The methodology used in the work-package is based on decision theory. It has a distinct flavor of social science, when addressing policy issues, when discussing the formulation of utilities, and in assumptions about (so called) rational agents. This methodology also has some relevance to the important ethical issues around experimentation on human beings. We find that what is best for a patient, who may be included in a clinical trial, may be quite different from what gives the highest overall societal utility. We argue that the well-being of the individual patient must have priority (see Burman’s presentation at the EMA meeting, March 2017; cf. Ondra et al., 2016, page 14).

The third deliverable concerns investment decisions. It is perhaps not surprising that we find that rational sponsors are keener to invest in drugs with larger market potential, and that sample sizes also tend to increase. We find that this behavior is partly optimal also from a public health perspective. However, there is often a discrepancy between sponsor and societal optimality. In the Ondra et al. (2016) model, larger sample sizes are generally favored from a public health view, and designs motivated by public health considerations more often focus on the biomarker positive subpopulation. By applying mechanism design, that is, by explicitly considering how regulations will affect sponsor decisions, societal rules can be optimized. In the Miller and Burman (2016 submitted) framework, the sample size decreases with lower prevalence of the disease, and the regulatory requirements should be tailored to the population size. It is recommended that societal decision rules be determined based on an understanding, and explicit modelling, of how they will inter-depend with commercial drug development decisions.

Recommendation 24: Formulate decision rules in a formal Bayesian decision theoretic framework.

Recommendation 25: Societal decision rules (regulation, reimbursement) should be determined based on explicit modelling of how they will inter-depend with commercial drug developing decisions.

Recommendation 26: Increase transparency in regulatory and payer decisions.

Recommendation 27: The well-being of the individual trial patient must have priority.

1.9. Biomarker surrogate endpoints (WP10)

The major objective of WP10 was to develop an efficient and feasible framework for biomarkers and surrogate endpoints in clinical trials in small population groups. Proper incorporation of missing-data aspects, of design aspects such as randomisation methodology, optimal design, adaptive designs, decision theory, mixed models and cross-over trials, and of genetic markers and dose-response information should be considered to the maximal extent. Simulation-based and other efficient estimation and evaluation methods should be used.

A viable framework for biomarker and surrogate endpoint evaluation in small population groups

Causal inference concepts have been used in the surrogate marker evaluation literature, but these developments were largely independent of meta-analytic and information-theoretic approaches. Yet, it is valuable to integrate all these frameworks to arrive at an optimal surrogate marker evaluation framework. Therefore, Alonso, Van der Elst, and Molenberghs (2015) proposed a causal-inference based framework for the evaluation of surrogate endpoints. The relationship between the causal-inference framework and two existing frameworks was examined: the relationship with the meta-analytic paradigm by Alonso et al. (2015) and Van der Elst et al. (2016), and the relationship with the information-theoretic framework in Alonso et al. (2016). The results are also presented in the book by Alonso et al. (2017). In particular, Chapter 15 is devoted to surrogate endpoints in rare diseases.

Recommendation 28: In the case of small trials, in particular trials that vary in size, we recommend the use of the causal inference framework, combined with efficient computational methods.

Surrogate endpoints and missing values

Missing data frequently arise in clinical trials, but the sensitivity of the different surrogate marker evaluation methods to missingness had not yet been studied. A large body of theory and methods to deal with missing values has been developed in other areas of statistics (e.g. in the context of longitudinal data analysis and in survey research), and a number of these results are valuable for surrogate endpoint evaluation as well. The conventional meta-analytic and information-theoretic frameworks imply maximum likelihood estimation, which is valid when the data are missing at random. However, maximum likelihood may be prohibitive in small studies, and therefore pseudo-likelihood and inverse probability weighting methods have been developed for missing data, which allow the use of efficient computational methods based on pseudo-likelihood. Results are presented in Hermans, Birhanu, Sotto et al. (2017).

Recommendation 29: In case of the evaluation of surrogate endpoints in small trials subject to missingness, we recommend the use of pseudo-likelihood estimation with proper inverse probability weighted and doubly robust corrections.
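The basic mechanism of inverse probability weighting can be shown on a single endpoint. The sketch assumes hypothetical data that are missing at random given a binary baseline covariate, estimates the probability of being observed within each covariate stratum, and weights each observed outcome by its inverse. It illustrates only the weighting idea, not the pseudo-likelihood surrogate evaluation of Hermans, Birhanu, Sotto et al. (2017), and it omits the doubly robust correction.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 400
x = rng.integers(0, 2, size=n)                      # binary baseline covariate
y = 1.0 + 2.0 * x + rng.standard_normal(n)          # endpoint; true mean = 2.0
p_obs = np.where(x == 1, 0.4, 0.9)                  # MAR: missingness depends on x only
observed = rng.random(n) < p_obs

# Estimate P(observed | x) non-parametrically within each covariate stratum.
pi_hat = np.array([observed[x == v].mean() for v in (0, 1)])[x]

cc_mean = y[observed].mean()                        # complete-case analysis
w = 1.0 / pi_hat[observed]
ipw_mean = np.sum(w * y[observed]) / np.sum(w)      # normalised (Hajek-type) IPW estimator
print(f"complete-case mean {cc_mean:.2f}, IPW mean {ipw_mean:.2f}, truth 2.00")
```

The complete-case mean is biased because patients with x = 1 are under-represented among the observed; the weights restore their share.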

The incorporation of design aspects

Having markers available is an important start, but it is by no means sufficient to ensure their most efficient use. Specific design aspects have to be taken into account (adaptive designs, cross-over trials). The validation studies need to be optimised from various angles (using randomisation methodology, optimal design results, and decision theory). State-of-the-art (non-linear) mixed model methodology also needs to be incorporated. For this, it is important to have efficient and stable estimation strategies at one’s disposal (Flórez Poveda et al., 2017 submitted).

Recommendation 30: In case of hierarchical and otherwise complex designs, we recommend using principled, yet fast and stable, two-stage approaches.

The use of genetic information

The book by Alonso et al. (2017) describes results on the use of genetic markers and genomics based markers. There are specific challenges that make the traditional validation framework less appropriate, in particular the availability of huge amounts of data with relatively little replication (see also Nasiri et al., 2017). This was addressed in a concerted effort, targeting biomarkers that can realistically be used in this context. In particular, Chapters 16 and 17 of the book are relevant here. The book in general and these chapters in particular are accompanied by user-friendly SAS macros, R functions, and Shiny Apps.

Recommendation 31: In the case of genetic and otherwise high-dimensional markers, we recommend the use of the methodology expressly developed for this context, in conjunction with the software tools made available.

Incorporating dose-response information

Dose-response information is extremely valuable in the context of markers in general and surrogate endpoints in particular. It makes it possible to study the differential effect of surrogates as a function of dose. Such differential aspects had been acknowledged, but not properly studied. This problem has been placed in the broader context of multivariate and even high-dimensional surrogate endpoints and studied in Alonso et al. (2017; Chapter 16) in the so-called QSTAR framework.

Recommendation 32: In the case of a surrogate with dose-response or otherwise multivariate information, we recommend using the results of the Quantitative Structure Transcription Assay Relationship (QSTAR) framework.

Efficient computational methods, simulation-based and other

When surrogate markers are evaluated, the use of multiple units (centers, trials, etc.) is needed, no matter which paradigm is used. It is well known that full likelihood estimation is usually prohibitive in such complex hierarchical settings, in particular when trials are of unequal (and small) sizes. This phenomenon has been examined by Van der Elst et al. (2016). Based on this, Hermans, Birhanu et al. (2017, 2017 submitted) propose solutions, based on weighting methods, for simple but generic longitudinal settings with units of unequal size. These articles and the references therein provide a theoretical basis. Further, Flórez Poveda et al. (2017) provide a theoretical and practical examination of such weighting methods for the specific context of surrogate endpoints. Throughout the book by Alonso et al. (2017), SAS macros, R functions, and Shiny Apps are provided that implement these methods in a user-friendly way.

Recommendation 33: In case of the evaluation of surrogate endpoints in small studies, we recommend using weighting based methods, because the methodology has been shown to work well theoretically, because it has been implemented in user-friendly SAS and R software, and because its practical performance is fast and stable.

3. Beyond IDeAl DoW

As described in the previous chapters, IDeAl has contributed a significant number of new results to the most important areas of statistical design and analysis of small population clinical trials. These results already refine current methodology. In addition, the IDeAl description of work (DoW) stimulated further research within the group, which was pursued in parallel and goes far beyond the initial IDeAl research plans. Some of these further results have already been summarized in scientific publications, whereas others are still work in progress and in preparation. Several presentations of this work are planned:

• Invited talk by Stephen Senn “Randomisation isn’t perfect but doing better is harder than you think” at the Fourth Bayesian, Fiducial, and Frequentist Conference (BFF4), 3 May 2017

• Invited talk by Frank Bretz “Threshold-crossing: A Useful Way to Establish the Counterfactual in Clinical Trials?” at the BBS Spring Seminar “The use of external data for decision making”, 5 May 2017

• Invited talk by Franz König “Threshold-crossing: A Useful Way to Establish the Counterfactual in Clinical Trials?” at the PSI Conference, 16 May 2017

• Invited talk by Stephen Senn “Thinking Statistically. What Counts and What Doesn't?” at CASI 2017, 37th Conference on Applied Statistics, Ireland, 15-17 May 2017

• Invited talk by Ralf-Dieter Hilgers on IDeAl and randomization entitled “IDeAl Randomization”, 17 May 2017, at the Department of Biometrie and Clinical Research, Seminar über neuere Methoden der Biometrie

• Several presentations and posters at the ISCB, 9-13 July 2017, Vigo, Spain

• Invited talk by Ralf-Dieter Hilgers on IDeAl findings in the contributed session on small clinical trials at the CEN-ISBS Vienna 2017 meeting, Vienna, 28 August to 1 September 2017

• Invited talk by Ralf-Dieter Hilgers on IDeAl findings and future perspectives at the asterix end symposium, 18th-19th, 2017, Zaandam

• Invited keynote talk by Ralf-Dieter Hilgers entitled “Statistical designs of small population trials” and participation in the panel discussion at Novartis, Basel, 16-17 October 2017

• Invited tutorial “Regulatory statistics with some European perspectives” by Franz König, Martin Posch and Frank Bretz at the 73rd Deming Conference on Applied Statistics, 7 December 2017, Atlantic City, USA. http://www.demingconference.com/

• Invited short course “Adaptive designs and multiple testing” by F. König, M. Posch and F. Bretz at the 73rd Deming Conference on Applied Statistics, 7 December 2017, Atlantic City, USA. http://www.demingconference.com/

• Presentation at the Biometrisches Kolloquium (German Region), March 2018, Frankfurt/Main, in the session “Rare Diseases”

• Presentation at the conference “Design of Experiments: New Challenges”, 30 April to 4 May 2018, Centre International de Rencontres Mathématiques, Marseille

• Organisation of a session on randomization by Nicole Heussen at the Ninth International Workshop on Simulation, to be held in Barcelona, 18-22 June 2018

Some joint research led to recommendations for special applications. For instance, the lack of standards for reporting clinical trials using a crossover layout to evaluate analgesic treatments for chronic pain resulted in a paper published in PAIN (Gewandter et al., 2016). The paper recommends paying special attention to missing data, the analysis method, and the reporting of sensitivity analyses, e.g. of the treatment-by-period interaction. Statistical design considerations for first-in-human studies, which are usually small and are needed in all drug development programs, were discussed by Bird et al. (2017). The six key issues highlighted in the paper are dose determination, availability of pharmacokinetic results, dosing interval, stopping rules, appraisal by a safety committee, and the need for a clear algorithm when combining approvals for single and multiple ascending dose studies. Further, in basal cell carcinoma, the lack of knowledge from historical data led to the recommendation to establish a registry (Rübben et al., 2016).

In the area of personalized medicine, interindividual differences in the magnitude of response to an exercise training program (subject-by-training interaction; “individual response”) have received increasing scientific interest and have been investigated from a statistical perspective (Hecksteden et al., 2015).

The following description of concomitant research is structured according to IDeAl’s finding that three levels of action are necessary to improve the statistical design and analysis methodology of small population clinical trials (Hilgers, König et al., 2016).

The first level concerns the rigorous use of the best currently known and available design and analysis methods. As linear mixed effects models are now well known, they can be used successfully to evaluate endpoints in the longitudinal data setting. Van der Elst described useful linear mixed effects models to estimate the reliability of repeatedly measured endpoints (Van der Elst, Molenberghs, Hilgers et al., 2016). This is, of course, only one aspect of endpoint evaluation, and Van der Elst successfully applied the methodology to a trial that was small with respect to the number of participants. Linear mixed effects models are also useful to describe changes in treatment effects by using the slope rather than the difference in change. The slope is more sensitive to change and may show smaller variation, which makes it easier to detect differences between treatments. This was observed when linear mixed effects models were applied to the two-year SARA (Scale for the Assessment and Rating of Ataxia) registry data of the European Friedreich’s Ataxia Consortium for Translational Studies (EFACTS) (Reetz et al., 2016).
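Why a slope can vary less than a change score is easy to see in a small simulation. The Python sketch below uses a simplified two-stage summary-measure approach (a per-patient least-squares slope) rather than a full linear mixed effects model, with a hypothetical visit schedule and hypothetical variance components; it is not based on the EFACTS data.

```python
import random

def ols_slope(times, values):
    """Least-squares slope of one patient's measurements on visit time."""
    t_bar = sum(times) / len(times)
    y_bar = sum(values) / len(values)
    num = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    den = sum((t - t_bar) ** 2 for t in times)
    return num / den

def variance(xs):
    """Sample variance with n - 1 denominator."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

rng = random.Random(7)
times = [0.0, 0.5, 1.0, 1.5, 2.0]     # hypothetical visit schedule in years
slopes, changes = [], []
for _ in range(2000):
    baseline = rng.gauss(15, 5)       # hypothetical score at baseline
    rate = rng.gauss(1.0, 0.3)        # patient-specific progression per year
    scores = [baseline + rate * t + rng.gauss(0, 1.5) for t in times]
    slopes.append(ols_slope(times, scores))
    # annualised change using only the first and the last visit
    changes.append((scores[-1] - scores[0]) / (times[-1] - times[0]))

print(f"variance of slopes:  {variance(slopes):.2f}")
print(f"variance of changes: {variance(changes):.2f}")
```

Under these assumptions the theoretical variances are 0.09 + 1.5²/2.5 = 0.99 for the slope and 0.09 + 2 · 1.5²/2² = 1.215 for the annualised change, because the slope uses all five visits while the change uses only two.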

The second level concerns limitations of well-known approaches; in this respect, the use of randomization-based inference in linear mixed effects models was considered. As IDeAl has shown that randomization procedures differ, the question arises how far population-based and randomization-based inference in linear mixed effects models with longitudinal data agree in small trials (Burger, 2017 in preparation).
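Randomization-based inference replaces the population model by the reference set of allocation sequences that the randomization procedure could have produced. The Python sketch below shows a Monte Carlo re-randomization test for a difference in means under permuted block randomization. It assumes complete data and a simple two-arm comparison rather than a linear mixed effects model; the data and function names are hypothetical.

```python
import random

def permuted_blocks(n, block_size, rng):
    """1:1 allocation (1 = treatment, 0 = control) from randomly permuted blocks."""
    seq = []
    while len(seq) < n:
        block = [1, 0] * (block_size // 2)
        rng.shuffle(block)
        seq.extend(block)
    return seq[:n]

def mean_difference(y, assignment):
    treated = [v for v, a in zip(y, assignment) if a == 1]
    control = [v for v, a in zip(y, assignment) if a == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def rerandomization_p_value(y, observed, block_size=4, n_sim=10_000, seed=1):
    """Two-sided Monte Carlo re-randomization test.

    The reference set consists of the sequences the permuted block procedure
    could have produced; complete data are assumed.
    """
    rng = random.Random(seed)
    t_obs = abs(mean_difference(y, observed))
    hits = sum(
        abs(mean_difference(y, permuted_blocks(len(y), block_size, rng))) >= t_obs - 1e-12
        for _ in range(n_sim)
    )
    return (hits + 1) / (n_sim + 1)

observed = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0]        # three blocks of four
y = [13.2, 10.1, 9.8, 12.8, 10.4, 13.5, 12.9, 10.0, 13.1, 13.4, 9.7, 10.2]
p_value = rerandomization_p_value(y, observed)
print(f"re-randomization p-value: {p_value:.4f}")
```

With three blocks of four, the reference set contains 6³ = 216 sequences, and only the observed sequence and its mirror image reach the observed difference, so the exact p-value is 2/216 ≈ 0.009; the Monte Carlo estimate approximates this value. How to define the reference set when outcomes are missing is precisely the open question taken up at the third level below.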

Another question concerns the so far unknown value of stratified analysis and stratified randomization in small population clinical trials. As stratified randomization is recommended in particular for small clinical trials (CHMP 2007), it remains to be answered whether stratification is more useful in the design phase, i.e. implemented as a stratified or covariate-adaptive randomization procedure, or in the analysis phase, i.e. implemented by including covariates in the statistical model. This is currently being investigated for dichotomous endpoints by Fitzner et al. (2017 in preparation).
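Stratified randomization at the design stage can be sketched as a separate permuted block list per stratum. The Python sketch below assumes 1:1 allocation, a block size of 4 and hypothetical stratum labels; covariate-adaptive procedures such as minimisation are not shown, and the function name is hypothetical.

```python
import random

def stratified_block_randomization(strata, block_size=4, seed=2017):
    """Assign arms "A"/"B" with an independent permuted block list per stratum.

    strata: stratum label of each patient in order of enrolment.
    """
    rng = random.Random(seed)
    pending = {}                      # stratum -> remaining arms of its current block
    allocation = []
    for s in strata:
        if not pending.get(s):        # start a new block when the current one is used up
            block = ["A", "B"] * (block_size // 2)
            rng.shuffle(block)
            pending[s] = block
        allocation.append(pending[s].pop())
    return allocation

strata = ["age<65", "age>=65", "age<65", "age<65", "age>=65", "age<65",
          "age>=65", "age>=65", "age<65", "age<65", "age>=65", "age<65"]
allocation = stratified_block_randomization(strata)
by_stratum = {s: [a for a, t in zip(allocation, strata) if t == s] for s in set(strata)}
print(by_stratum)
```

Within each stratum the imbalance never exceeds half the block size, which is what makes stratification attractive in small trials, at the price of more predictable allocations.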

Adaptive enrichment designs have received attention because they have the potential to make the drug development process for personalized medicine more efficient. In Sugitani et al. (2017, submitted), we investigated different flexible alpha allocation strategies that allow testing of both an overall population and a targeted subgroup. We showed that the power can be substantially improved by allowing an adaptive interim analysis to decide whether the full population or only the targeted subgroup should be included for the remainder of the trial.
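The kind of testing strategy investigated here can be illustrated with the standard combination of closed testing and an inverse-normal combination test. The Python sketch below assumes a weighted Bonferroni test for the intersection hypothesis, equal stage weights, one-sided stage-wise p-values from independent patient cohorts, and the case that only the subgroup continues after the interim analysis. It is not the specific alpha allocation strategy of Sugitani et al.; all names and p-values are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def z_from_p(p):
    """One-sided p-value to z-score, clipped away from 0 and 1."""
    p = min(max(p, 1e-15), 1 - 1e-15)
    return STD_NORMAL.inv_cdf(1 - p)

def inverse_normal(p1, p2, w1=0.5 ** 0.5):
    """Combine independent stage-wise p-values with weights w1^2 + w2^2 = 1."""
    w2 = (1 - w1 ** 2) ** 0.5
    return 1 - STD_NORMAL.cdf(w1 * z_from_p(p1) + w2 * z_from_p(p2))

def weighted_bonferroni(p_full, p_sub, w_full=0.5):
    """p-value for the intersection of H_full and H_sub (weight w_full on H_full)."""
    return min(1.0, p_full / w_full, p_sub / (1 - w_full))

def reject_subgroup_after_enrichment(p1_full, p1_sub, p2_sub, alpha=0.025, w_full=0.5):
    """Closed test when only the subgroup continues after the interim analysis.

    H_sub is rejected if the combination test rejects both the intersection
    hypothesis and H_sub. Stage 2 enrols subgroup patients only, so the
    second-stage test of the intersection uses p2_sub alone.
    """
    p_intersection = inverse_normal(weighted_bonferroni(p1_full, p1_sub, w_full), p2_sub)
    p_subgroup = inverse_normal(p1_sub, p2_sub)
    return p_intersection <= alpha and p_subgroup <= alpha

print(reject_subgroup_after_enrichment(0.2, 0.01, 0.01))   # True
print(reject_subgroup_after_enrichment(0.6, 0.4, 0.3))     # False
```

Because the weights are fixed in advance and the stages use different patients, the decision at the interim analysis (which population to continue with) can be data-driven without inflating the familywise type-I-error rate.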

The third level concerns the development of new statistical approaches to design and analyse small population clinical trials. Randomization-based inference has been shown to be useful, in particular in small clinical trials, but how to deal with missing observations in randomization-based inference is not yet understood. The question traces back to the formulation of the so-called reference set and was recently investigated in a paper by Hilgers, Rosenberger and Heussen (2017 submitted).

Second, missing values sometimes occur because measurements fall below the limit of detection. This problem is relevant in pharmacometrics and arises in most clinical trials. How to derive estimates in this setting in multicentre trials, where measurements from various laboratories are included in the analysis, is investigated in a paper by Berger (Berger et al., 2017 in preparation).
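A standard way to handle values below a laboratory-specific limit of detection is a censored (Tobit-type) likelihood, in which each such value contributes the probability of falling below that laboratory's limit instead of a density. The Python sketch below assumes normally distributed measurements, two hypothetical laboratories with different limits and a crude grid search instead of a proper optimiser; it is not the method of Berger et al., and all names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def censored_loglik(mu, sigma, data):
    """Log-likelihood of normal data with laboratory-specific left-censoring.

    data: list of (value, lod) pairs; value is None when the measurement was
    below that laboratory's limit of detection (lod).
    """
    dist = NormalDist(mu, sigma)
    ll = 0.0
    for value, lod in data:
        if value is None:
            ll += math.log(max(dist.cdf(lod), 1e-300))   # P(Y < LOD)
        else:
            ll += -0.5 * ((value - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
    return ll

def grid_mle(data):
    """Crude grid search; a real analysis would use a numerical optimiser."""
    grid_mu = [i * 0.05 for i in range(101)]             # 0.00 .. 5.00
    grid_sigma = [0.5 + i * 0.05 for i in range(31)]     # 0.50 .. 2.00
    return max(((m, s) for m in grid_mu for s in grid_sigma),
               key=lambda ms: censored_loglik(ms[0], ms[1], data))

rng = random.Random(5)
lods = {"lab_1": 1.0, "lab_2": 1.5}                      # hypothetical detection limits
data = []
for lab, lod in lods.items():
    for _ in range(60):
        y = rng.gauss(2.0, 1.0)
        data.append((y if y >= lod else None, lod))

mu_hat, sigma_hat = grid_mle(data)
detected = [v for v, _ in data if v is not None]
naive_mean = sum(detected) / len(detected)               # ignores censoring
print(f"censored MLE: mu {mu_hat:.2f}, sigma {sigma_hat:.2f}; naive mean {naive_mean:.2f}")
```

Simply averaging the detected values overestimates the mean, because the lowest values are exactly the ones that go missing; the censored likelihood uses the information that they were below the respective limit.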

New medicines for children should be subject to rigorous examination whilst taking steps to avoid unnecessary experimentation. Extrapolating from adult data can reduce uncertainty about a drug’s effects in younger patients, meaning that smaller trials may suffice. Assuming that the paediatric trial is conducted only if a significant beneficial effect has been demonstrated in adults, this goal is achieved by adopting a Bayesian approach that incorporates the adult data into the design and analysis of the paediatric trial (Hampson et al., 2017 in preparation).
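A generic way to borrow adult information is a normal-normal model in which the adult estimate acts as a prior whose precision is multiplied by a discount factor a0 between 0 and 1; for a normal likelihood with known variance and a flat initial prior, this coincides with a power prior. The Python sketch below uses hypothetical effect estimates and does not model the condition that the paediatric trial is run only after a significant adult result; it is not the design of Hampson et al.

```python
from statistics import NormalDist

def posterior_with_discounted_prior(adult_est, adult_se, child_est, child_se, a0):
    """Normal-normal update of the paediatric treatment effect.

    a0 in [0, 1] discounts the adult information: a0 = 0 ignores it,
    a0 = 1 treats adult data as if they came from the paediatric population.
    """
    prior_prec = a0 / adult_se ** 2
    data_prec = 1 / child_se ** 2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * adult_est + data_prec * child_est) / post_prec
    return NormalDist(post_mean, post_prec ** -0.5)

for a0 in (0.0, 0.5, 1.0):
    post = posterior_with_discounted_prior(0.5, 0.1, 0.4, 0.3, a0)
    print(f"a0={a0}: mean {post.mean:.3f}, sd {post.stdev:.3f}, "
          f"P(effect > 0) {1 - post.cdf(0):.3f}")
```

The choice of a0 expresses how similar adult and paediatric effects are believed to be; in practice it would be justified clinically and its impact examined in a sensitivity analysis.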

The design of a combined seamless phase I/II trial as recommended by the Japan Interventional Radiology in Oncology Study Group (Kobayashi et al., 2009) is being elaborated in a bachelor’s thesis supervised by Ralf-Dieter Hilgers and Marcia Rückbeil. The design is intended to address questions of safety and efficacy while keeping the sample size very small. Furthermore, a master’s thesis supervised by Ralf-Dieter Hilgers and Marcia Rückbeil tackles the problem of how to incorporate registry data into a randomized clinical trial using a frequentist approach.

Finally, it should be noted that big data aspects are present in small population research as well. Auffray et al. (2016) built a bridge between big data and small population clinical trials, resulting in recommendations for a European Union action plan.

Ralf-Dieter Hilgers is currently working on two papers summarizing IDeAl results at different levels. The first, a synergy paper with Kit Roes (asterix) and Nigel Stallard (InSPiRe), develops a joint statement of all three projects as an overall perspective. The second, a paper of the IDeAl group summarizing the IDeAl view more specifically, is in preparation.

1.4 Summary

To summarize IDeAl’s results:

• In WP 2 we developed a new methodology for the selection of the best practice randomization procedure and subsequent analysis for a small population clinical trial taking possible bias into account.

• In WP 3 we developed a new optimized design and analysis strategy for comparing dose response profiles to extrapolate clinical trial results from a large to a small population.

• In WP 4 we developed statistical methods to adapt the significance level and allow confirmatory decision-making in clinical trials with vulnerable, small populations.

• In WP 5 we developed design evaluation methods enabling small clinical trials to be analysed through modelling of continuous or discrete longitudinal outcomes.

• In WP 6 we developed approaches to planning and analysing trials for identifying individual response and examining treatment effects in small populations.

• In WP 7 we developed new methods for sample size calculation, type-I-error control, model averaging and parameter precision in small population group trials within non-linear mixed effects modelling.

• In WP 8 we developed new methods for identifying biomarkers and prognostic scores based on high dimensional genetic data in small population group trials.

• In WP 9 we evaluated how to optimise the overall value of drug development to patients, to regulators and to society under opacity in regulatory and payer rules as well as in very rare diseases.

• In WP 10 we developed methodology to evaluate potential surrogate markers and to analyse data from small numbers of small trials, with emphasis on fast and easy computational strategies.

From these results we derive the following list of recommendations:

Recommendation 1. Do not select a randomization procedure on the basis of arbitrary arguments; use scientific arguments taking into account the expected magnitude of bias.

Recommendation 2. In the case of a randomized clinical trial, emphasis should be given to selecting the randomization procedure by following ERDO using the R package randomizeR.

Recommendation 3. In the case of a randomized clinical trial, we recommend conducting a sensitivity analysis to assess the impact of bias on the type-I-error probability.

Recommendation 4. The comparison of dose response curves should be done by the bootstrap approach developed by Dette et al. (2017) instead of Gsteiger et al. (2011).

Recommendation 5. If the aim of the study is the extrapolation of efficacy and safety information, we recommend considering and comparing the minimum effective doses (MEDs) of the two given populations.

Recommendation 6. The derived methodology shows a very robust performance and can be used also in cases where no precise information about the functional form of the regression curves is available.

Recommendation 7. When planning a dose-finding study comparing two populations, we recommend using optimal designs in order to achieve substantially more precise results.

Recommendation 8. In case of confirmatory testing, we recommend adapting the significance level by incorporating other information (e.g. using information from drug development programs in adults for designing and analysing pediatric trials).

Recommendation 9. In case of design modifications during the conduct of a confirmatory clinical trial, we recommend using adaptive methods to ensure that the type-I-error is sufficiently controlled so as not to endanger confirmatory conclusions. Especially in clinical trials with multiple objectives, special care has to be taken to address several sources of multiplicity.

Recommendation 10. In case randomized controlled clinical trials are infeasible, we propose “threshold-crossing” designs within an adaptive development program as a way forward to enable comparison between different treatment options.

Recommendation 11. For evaluation of designs of studies with longitudinal discrete or time to event data, evaluation of the Fisher Information matrix should be done without linearization. Using the new approach MC-HMC (in MIXFIM) will provide adequate prediction of standard errors and allow to compare several designs.

Recommendation 12. When there is little information on the value of the parameters at the design stage, adaptive designs can be used. Two-stage balanced designs are a good compromise. The new version of PFIM can be used for adaptive design with continuous longitudinal data.

Recommendation 13. When there is uncertainty about the model and its parameters, a robust approach across candidate models should be used to design studies with longitudinal data.

Recommendation 14. We recommend that response should not be defined using arbitrary and naïve dichotomies but that it should be analysed carefully paying due attention to components of variance and where possible using designs to identify them.

Recommendation 15. For the analysis of n-of-1 trials, we recommend using an approach that is a modified fixed-effects meta-analysis for the case where establishing the treatment works is the object and an approach through mixed models if variation in response to treatment is to be studied.

Recommendation 16. When analysing between-patient studies we recommend avoiding information destroying transformations (such as dichotomies) and exploiting the explanatory power of covariates, which may be identified from ancillary studies and patient databases.

Recommendation 17. When conducting a series of n-of-1 trials, we recommend paying close attention to the purpose of the study and calculating the sample size accordingly using the approach of Senn (2017).

Recommendation 18. If fast computations of power curves are needed from a non-linear mixed effects model, we recommend using the parametric power estimation algorithm as implemented in the stochastic simulation and estimation (SSE) tool of PsN (potentially with a type-I-error correction based on the “randtest” tool in PsN).

Recommendation 19. The simulation methods described above can be utilized to investigate the effects of using different, smaller, more parsimonious models to evaluate data from complicated biological systems prior to running a clinical study.

Recommendation 20. We recommend the use of Sampling Importance Resampling to characterize the uncertainty of non-linear mixed effects model parameter estimates in small sample size studies. Non-estimability of parameters may be assessed using preconditioning. The use of the bootstrap model averaging method (Method 2) (Aoki et al., 2016) is recommended when conducting model-based decision-making after a trial. Robust model-based adaptive optimal designs may be used to improve model certainty in clinical trials.

Recommendation 21. We recommend using “varclust” for clustering of gene expression data and extraction of a relatively small number of potential predictors of patients’ response to the treatment based on gene expression data.

Recommendation 22. It is recommended to use the information on the ancestry of genetic markers when mapping genes in admixed populations. It is also recommended to use both regular and group SLOPE, since regular SLOPE has a higher power of detection of additive gene effects, while group SLOPE allows for identification of rare recessive variants.

Recommendation 23. If model building is based on highly correlated gene expression data, we recommend the use of SLOPE due to its computational tractability and good predictive properties.

Recommendation 24. Formulate decision rules in a formal Bayesian decision theoretic framework.

Recommendation 25. Societal decision rules (regulation, reimbursement) should be determined based on explicit modelling of how they will inter-depend with commercial drug developing decisions.

Recommendation 26. Increase transparency in regulatory and payer decisions.

Recommendation 27. The well-being of the individual trial patient must have priority.

Recommendation 28. In the case of small trials, in particular trials of variable size, we recommend the use of the causal inference framework, combined with efficient computational methods.

Recommendation 29. In case of the evaluation of surrogate endpoints in small trials subject to missingness, we recommend the use of pseudo-likelihood estimation with proper inverse probability weighted and doubly robust corrections.

Recommendation 30. In case of hierarchical and otherwise complex designs, we recommend using principled, yet fast and stable, two-stage approaches.

Recommendation 31. In case of genetic and otherwise high-dimensional markers, we recommend the use of the methodology expressly developed for this context, in conjunction with the software tools made available.

Recommendation 32. In case of a surrogate with dose-response or otherwise multivariate information present, we recommend using the results of the Quantitative Structure Transcription Assay Relationship (QSTAR) framework.

Recommendation 33. In case of the evaluation of surrogate endpoints in small studies, we recommend using weighting based methods, because the methodology has been shown to work well theoretically, because it has been implemented in user-friendly SAS and R software, and because its practical performance is fast and stable.
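Recommendation 20 relies on Sampling Importance Resampling (Rubin, 1988). The following Python sketch shows the three SIR steps for a single parameter: draw from a proposal distribution, weight each draw by the ratio of target to proposal density, and resample in proportion to the weights. The target here is a hypothetical unnormalised normal density; in a non-linear mixed effects analysis it would be derived from the model likelihood. Resampling is done with replacement, which is one common variant.

```python
import math
import random
import statistics

def normal_logpdf(x, mean, sd):
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd) - 0.5 * math.log(2 * math.pi)

def sir(log_target, proposal_mean, proposal_sd, m=5000, r=1000, seed=3):
    """Sampling importance resampling for a scalar parameter."""
    rng = random.Random(seed)
    # 1. sample m draws from a normal proposal
    draws = [rng.gauss(proposal_mean, proposal_sd) for _ in range(m)]
    # 2. log importance ratios: target density over proposal density
    log_w = [log_target(x) - normal_logpdf(x, proposal_mean, proposal_sd) for x in draws]
    top = max(log_w)
    weights = [math.exp(lw - top) for lw in log_w]   # rescaled to avoid overflow
    # 3. resample r draws with probabilities proportional to the weights
    return rng.choices(draws, weights=weights, k=r)

def log_target(theta):
    """Hypothetical unnormalised log-density with mean 2.0 and sd 0.5."""
    return -0.5 * ((theta - 2.0) / 0.5) ** 2

sample = sir(log_target, proposal_mean=1.5, proposal_sd=1.5)
print(f"SIR mean {statistics.fmean(sample):.2f}, sd {statistics.stdev(sample):.2f}")
```

A proposal that is wider than the target, as here, keeps the importance weights bounded; a proposal that is too narrow would miss the tails of the parameter uncertainty.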

The following list contains references not included in the appendix.

(1) Benzi, M. (2002): Preconditioning techniques for large linear systems: a survey. Journal of Computational Physics 182, 418–477.

(2) Berger, R. L. (1982): Multiparameter hypothesis testing and acceptance sampling. Technometrics 24, 295–300.

(3) Committee for Medicinal Products for Human Use (CHMP) (2009): Guideline on clinical trials in small populations. [Online] 2006. [Cited: February 1, 2013.] www.ema.europa.eu/docs/en_GB/document_library/Scientific_guideline/2009/09/WC500003615.pdf.

(4) Danhof, M., de Lange, E. C. M., Della Pasqua, O. E., Ploeger, B. A. & Voskuyl, R. A. (2008): Mechanism-based pharmacokinetic-pharmacodynamic (PK-PD) modeling in translational drug research. Trends in Pharmacological Sciences 29, 186–191.

(5) Dumont, C., Chenel, M. & Mentré, F. (2016): Two-stage adaptive designs in nonlinear mixed effects models: application to pharmacokinetics in children. Communications in Statistics - Simulation and Computation 45, 1511–1525.

(6) Gisleskog, P. O., Karlsson, M. O. & Beal, S. L. (2002): Use of prior information to stabilize a population data analysis. Journal of Pharmacokinetics and Pharmacodynamics 29, 473–505.

(7) Gsteiger, S., Bretz, F. & Liu, W. (2011): Simultaneous Confidence Bands for Nonlinear Regression Models with Application to Population Pharmacokinetic Analyses. Journal of Biopharmaceutical Statistics 21(4), 708–725.

(8) Ivanova, A., Barrier, R. C. & Berger, V. W. (2005): Adjusting for observable selection bias in block randomized trials. Statistics in Medicine 24, 1537–1546.

(9) Jonsson, E. N. & Karlsson, M. O. (1999): Xpose – an S-PLUS based population pharmacokinetic/pharmacodynamic model building aid for NONMEM. Computer Methods and Programs in Biomedicine 58, 51–64.

(10) Karlsson, K. E., Vong, C., Bergstrand, M., Jonsson, E. N. & Karlsson, M. O. (2013): Comparisons of Analysis Methods for Proof-of-Concept Trials. CPT: Pharmacometrics & Systems Pharmacology 2, e23.

(11) Keizer, R. J., Karlsson, M. O. & Hooker, A. (2013): Modeling and Simulation Workbench for NONMEM: Tutorial on Pirana, PsN, and Xpose. CPT: Pharmacometrics & Systems Pharmacology 2, e50.

(12) Kobayashi, T. et al. (2009): Phase I/II clinical study of percutaneous vertebroplasty (PVP) as palliation for painful malignant vertebral compression fractures (PMVCF): JIVROSG-0202. Annals of Oncology 20, 1943–1947.

(13) Lesko, L. J. (2012): Drug Research and Translational Bioinformatics. Clinical Pharmacology & Therapeutics 91, 960–962.

(14) Lestini, G., Dumont, C. & Mentré, F. (2015): Influence of the size of cohorts in adaptive design for nonlinear mixed effects models: an evaluation by simulation for a pharmacokinetic and pharmacodynamic model for a biomarker in oncology. Pharmaceutical Research 32, 3159–3169.

(15) Lindbom, L., Ribbing, J. & Jonsson, E. N. (2004): Perl-speaks-NONMEM (PsN) – a Perl module for NONMEM related programming. Computer Methods and Programs in Biomedicine 75, 85–94.

(16) Liu, W., Bretz, F., Hayter, A. J. & Wynn, H. (2009): Assessing nonsuperiority, noninferiority, or equivalence when comparing two regression models over a restricted covariate region. Biometrics 65(4), 1279–1287.

(17) Marshall, S., Macintyre, F., James, I., Krams, M. & Jonsson, N. E. (2006): Role of mechanistically-based pharmacokinetic/pharmacodynamic models in drug development: a case study of a therapeutic protein. Clinical Pharmacokinetics 45, 177–197.

(18) Rubin, D. B. (1988): Using the SIR Algorithm to Simulate Posterior Distributions. Bayesian Statistics 3, 395–402.

(19) Ting, N. (2006): Dose Finding in Drug Development. New York: Springer.

Potential Impact:

1. Potential impact

Ultimately, the main positive impact of IDeAl will be to patients with rare diseases, patients in other small groups (such as children or biomarker-defined subpopulations) and – through spill-over effects – also to patients in larger disease populations. The positive effects for individuals will partly be mediated by progress in trial methodology, partly by improved regulatory and reimbursement decisions. Methodological progress will lead to more cost-effective and reliable trials, facilitating clinical research and paving the way for medical and pharmaceutical advances. Building a more rational and transparent basis for societal decisions will incentivize commercial investments and lead to the marketing of a larger number of pharmaceuticals.

Science

The development of statistical and pharmacometric methodology is the backbone of the IDeAl project. Successful methodological development is a necessary, but not sufficient, condition for meaningful impact on patients and society. We start by discussing IDeAl’s role in the relevant methodological sciences and return to important contributions to wider society below.

Direct scientific impact can be indicated by the number of publications and presentations, by the journals targeted, and by the number of citations of individual articles. Bibliometric measurements have well-known pros and cons, and we use such approaches only briefly to give some indication of the quality of the output. At the time of writing, 63 articles with support from the IDeAl grant have been published or accepted for publication in scientific journals from a range of different disciplines.

Articles have been published in some of the very best statistical journals, such as Annals of Statistics, Journal of the American Statistical Association (JASA), Biostatistics, Biometrics, and Statistics in Medicine. IDeAl has published seven articles in the latter journal, which can be said to be the premier statistical journal dedicated to medical applications. Journals in statistics and related fields have traditionally had much lower Impact Factors than journals in many other disciplines; that is, rapid citations are relatively infrequent. As the project ended recently, the vast majority of articles were published this year or last year. It is therefore worth noting that several IDeAl articles in statistical journals have already been cited frequently. For example, Google Scholar lists 50 citations for Bogdan et al. (Ann Appl Stat, 2015), and 20+ citations each for Alonso et al. (Biostatistics, 2015), Bauer et al. (Stat Med, 2016) and König et al. (Biom J, 2015).

Articles have also been published in leading journals in a range of quantitative disciplines outside statistics, such as pharmacometrics (J Pharmacokin Pharmacodyn; Clin Pharmacol Ther), health economics (JHE) and epidemiology (Eur J Epid). This indicates that the research within IDeAl has had impact more broadly within methodological fields. For example, there are already 99 citations of Greenland et al. (Eur J Epid, 2016), little more than one year after publication.

IDeAl-sponsored publications have also appeared in several medical journals. For example, both Gewandter et al. (Pain, 2014) and Hecksteden et al. (J Appl Physiol, 2015) have more than 30 citations each in the Google Scholar database. Further, IDeAl contributed to publications in the high-ranking medical journals Lancet Neurology (Reetz, 2016) and JAMA Dermatology (Rübben, 2016), as well as in a rare-disease-specific journal (Hilgers, Roes and Stallard, 2016, Orphanet Journal of Rare Diseases).

Work from the IDeAl project has generated more than 170 presentations at conferences, workshops, etc. and has exposed a large number of other scientists to the ideas and methodology developed within the project. A number of PhD students and early-career scientists have been trained through IDeAl-related work and have moved on to serve the scientific community.

The Drug Information Association (DIA) is the leading professional organization for individuals working with drug discovery, development and regulation, and has over 18 000 members. DIA’s Adaptive Design Scientific Working Group (ADSWG) is arguably the world’s leading working group for issues around the design of clinical trials. The IDeAl project has presented twice in the ADSWG Key Opinion Leader (KOL) lecture series. At the beginning of the IDeAl project, two of the IDeAl work package leaders, König and Burman, proposed that DIA’s ADSWG should form a subteam to work on designs for small populations. This subteam has brought together scientists from industry, academia and regulatory agencies, mainly from the US and Europe. To date, the subteam has published almost 20 scientific articles, underlining the global and cross-disciplinary impact and leadership of IDeAl.

Clinical research

The methodology developed and refined within the IDeAl project will influence clinical research in a number of ways, ranging from the development and validation of genetic biomarkers, through extrapolation techniques, to optimizing trial design and analysis.

The importance of biomarkers in clinical research is increasing rapidly, and this trend may be especially important for rare diseases. First, these diseases often have a stronger and simpler genetic causality than is likely the case for large-population health problems such as obesity, type II diabetes mellitus and smoking addiction. Secondly, response biomarkers (as opposed to, e.g., genetically based predictive biomarkers) may be more important when the limited size of the patient population makes it infeasible to use dichotomous hard endpoints or survival time as the primary variable in confirmatory trials. Thirdly, modern drug development often aims at personalizing treatments; biomarkers are used to define subpopulations for which different treatments may be optimal. Existing methods for the identification of important genes have been refined, and new methods have been added to the geneticists’ arsenal. As an example, the “group SLOPE” approach (Brzyski et al., 2017 submitted) is especially useful for understanding the genetic origin of rare diseases. To aid the extrapolation of clinical data from one population to another, e.g. from an adult population to a much smaller pediatric one, an important step can be to compare the dose-response curves for a response biomarker, which is easier to measure than the hard clinical endpoint. IDeAl has improved the previous standard methodology for comparing two such curves and has shown that efficiency can be increased considerably by using a bootstrap approach (Dette et al., 2017). Response biomarkers should preferably be validated, and IDeAl has worked on integrating different approaches to find optimal ways of doing so. For small population groups, a causal inference framework has proven especially useful (Alonso et al., 2017). This book also proposes methods tailored to the specific problems of validating genetic markers based on data from several studies.
As opposed to this, the two papers by Ondra et al. (2016, 2017 in preparation) demonstrate how the design of a single clinical trial should be optimized when there exists a potentially important predictive biomarker that can be used to qualify the population.

Such enrichment designs, which can be either fixed or adaptive, and the validation of response variables with an improved signal-to-noise ratio are two examples of the design-related part of IDeAl’s work. The design of clinical trials in small population groups can be made more efficient and effective in a number of other ways. On the one hand, much of the methodology to increase study power can be applied in large as well as in small populations. On the other hand, the balance shifts from robust inference towards effective inference as population sizes become smaller. For cardiovascular and metabolic diseases with large patient populations, it may be feasible to randomize over 10 000 patients in a parallel group design to address mortality (and other hard endpoints). Such a design is robust in the sense that a significant result relies on very few non-verifiable assumptions. Mortality, life or death, is undeniably an important outcome to the patient, and there is a relatively small risk of bias. That the sample size needs to be huge to achieve sufficient power, and that the corresponding trial may cost hundreds of millions of euros, may be acceptable if there are millions of eligible patients and the potential value of a successful treatment is great. For a much smaller patient population, the trial will by necessity be much smaller. To be able to compare treatment options, it is therefore crucial to obtain as much information per patient as possible. This means that basing the inference on some reasonable assumptions may be justified if the signal-to-noise ratio can thereby be improved. IDeAl methodology to achieve this includes response biomarkers (Van der Elst et al., 2016), pharmacometric modelling (Karlsson et al., 2013) and cross-over or n-of-1 designs (Hecksteden et al., 2015). Repeated measures for each patient can be utilized through pharmacokinetic/pharmacodynamic (PK/PD) modelling to learn more from each patient.
With small sample sizes, it becomes increasingly challenging to estimate the parameters of such non-linear models. IDeAl has therefore improved computational methods to allow more reliable estimation, so that clinical research groups conducting small trials can actually realize the potential gains in terms of sample size reduction. Optimal design methodology can be applied to further enhance the information retrieved. Such methodology is combined with adaptive design features to decrease the reliance on a priori assumptions, and novel computational methodology for this is provided (Loingeville et al., 2017). Although many clinical researchers are well aware of the benefits of conducting randomized clinical experiments, standard randomization procedures, which may be perfectly applicable to large trials, are often used even when the sample size is very small. IDeAl has worked tirelessly to raise awareness throughout the academic and industrial communities (e.g. Reetz et al., 2016, Gewandter et al., 2014, Greenland et al., 2016, Hilgers, Roes and Stallard 2016, Jonker et al. 2016, Lendrem et al., 2015, Hilgers et al., 2017) of the importance of tailoring the randomization to the specific trial circumstances, including the sample size and the risk of selection bias. Wider adoption of the ERDO framework developed by IDeAl will lead to more rational randomized patient allocation procedures, giving trial results that are more robust to selection bias and to inflation of the conditional type-I-error rate.
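
The information-per-patient argument above can be made concrete with a textbook normal-approximation sample size comparison. The sketch below assumes a continuous endpoint with a common standard deviation sigma per measurement, a within-patient correlation rho between the two periods of an AB/BA cross-over, no carry-over, a two-sided significance level of 5% and 90% power. The function names and all numbers are hypothetical illustrations, not taken from the IDeAl publications.

```python
from math import ceil
from statistics import NormalDist


def _z_sum(alpha, power):
    """Sum of the normal quantiles for a two-sided level alpha and the target power."""
    std = NormalDist()
    return std.inv_cdf(1 - alpha / 2) + std.inv_cdf(power)


def total_n_parallel(delta, sigma, alpha=0.05, power=0.9):
    """Total sample size of a 1:1 parallel group trial (normal approximation).

    The difference in arm means has variance 2*sigma**2/n with n patients per arm.
    """
    n_per_arm = ceil(2 * (sigma * _z_sum(alpha, power) / delta) ** 2)
    return 2 * n_per_arm


def total_n_crossover(delta, sigma, rho, alpha=0.05, power=0.9):
    """Total sample size of an AB/BA cross-over trial (normal approximation).

    Each patient yields one within-patient difference with variance
    2*sigma**2*(1 - rho); carry-over is assumed to be absent.
    """
    n = ceil(2 * (1 - rho) * (sigma * _z_sum(alpha, power) / delta) ** 2)
    return n + n % 2  # round up to an even number to balance the two sequences


if __name__ == "__main__":
    print(total_n_parallel(delta=0.5, sigma=1.0))            # parallel group design
    print(total_n_crossover(delta=0.5, sigma=1.0, rho=0.7))  # AB/BA cross-over
```

With a standardized effect of 0.5 and rho = 0.7, the sketch gives 170 patients for the parallel group design and 26 for the cross-over, close to the theoretical ratio (1 - rho)/2. The gain rests on the no-carry-over assumption, which is exactly the kind of reasonable but not fully verifiable assumption that small population trials may have to accept.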

To facilitate applications in clinical research, IDeAl has developed several software packages, e.g. for randomization (Uschner et al., 2017); identification of genetic pathways (Brzyski et al., 2017 in preparation); surrogate markers (Alonso et al., 2017); extrapolation and curve comparisons (Möllenhoff, 2016); decision-theoretic design optimization (Jobjörnsson, 2015); n-of-1 trials (Araujo et al., 2017); and optimal designs for nonlinear mixed effects models (Riviere and Mentré, 2015).
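
The risk of selection bias mentioned above is often quantified by how predictable a randomization procedure is. The minimal sketch below assumes the classic convergence guessing strategy (the investigator always guesses the arm that has received fewer patients so far, and a tie counts as a coin flip) and a hypothetical two-arm trial with 12 patients. It enumerates all equally likely allocation sequences and compares complete randomization with permuted block randomization using blocks of size 4. It only illustrates the kind of criterion involved; it is neither the ERDO framework nor the randomizeR package.

```python
from itertools import permutations, product


def proportion_correct_guesses(sequences):
    """Expected proportion of correct guesses under the convergence strategy.

    All sequences (strings over 'A'/'B' of equal length) are treated as equally
    likely. Before each allocation the investigator guesses the arm that has
    received fewer patients so far; a tie is scored as 0.5 (a coin-flip guess).
    """
    total = 0.0
    for seq in sequences:
        n_a = n_b = 0
        for arm in seq:
            if n_a == n_b:
                total += 0.5
            elif (n_a < n_b) == (arm == "A"):
                total += 1.0  # guessed the under-allocated arm, and it was right
            if arm == "A":
                n_a += 1
            else:
                n_b += 1
    return total / (len(sequences) * len(sequences[0]))


N = 12  # hypothetical trial size

# Complete randomization: every one of the 2**N sequences is equally likely.
complete = ["".join(s) for s in product("AB", repeat=N)]

# Permuted block randomization, block size 4: each block is one of the
# 6 balanced arrangements of AABB, chosen independently.
blocks = sorted({"".join(p) for p in permutations("AABB")})
permuted_blocks = ["".join(bs) for bs in product(blocks, repeat=N // 4)]

if __name__ == "__main__":
    print(proportion_correct_guesses(complete))
    print(proportion_correct_guesses(permuted_blocks))
```

Complete randomization gives exactly 0.5, whereas permuted blocks of size 4 give 17/24 (about 0.71): the better balance of the block design is bought with higher predictability. Exhaustive enumeration avoids Monte Carlo noise but is only feasible for small trials; larger ones would require simulation.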

Industry

The pharmaceutical and biotech industry in Europe will benefit from methodology that improves the identification of genetic/genomic markers, facilitating target identification in drug discovery. Finding new mechanisms of action will allow a wider range of drug classes to be developed, tested and, if found beneficial in terms of benefit/risk, marketed. A significant hindrance to the development of new drug classes is the enormous cost of drug development, which is to a large extent attributable to clinical trials. The IDeAl-developed methods briefly discussed under “Clinical research” above, which increase trial efficiency, power and probability of success, constitute a partial antidote to escalating trial costs. Furthermore, research within IDeAl has shown how the expected net present value can be maximized by tuning design parameters such as sample size and trial prevalence (Ondra et al., 2017 in preparation). The pricing of a new pharmaceutical has also been optimized (Jobjörnsson et al., 2016).
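
The idea of tuning a design parameter to maximize an expected value can be sketched with a deliberately simplified toy model. It assumes that a payoff `value` is earned only if a one-sided two-sample z-test at level 2.5% is significant, and that each enrolled patient costs `cost_per_patient`. Discounting, biomarker subgroups and trial prevalence are ignored, so this is an expected net value rather than a net present value, and it is not the model of Ondra et al. or Jobjörnsson et al.; all names and numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def success_probability(n_per_arm, delta, sigma, alpha=0.025):
    """Power of a one-sided two-sample z-test with n_per_arm patients per arm."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    return STD_NORMAL.cdf(delta / (sigma * sqrt(2 / n_per_arm)) - z_alpha)


def expected_net_value(n_per_arm, delta, sigma, value, cost_per_patient):
    """Hypothetical utility: payoff on success minus trial cost (no discounting)."""
    trial_cost = cost_per_patient * 2 * n_per_arm
    return value * success_probability(n_per_arm, delta, sigma) - trial_cost


def optimal_n_per_arm(delta, sigma, value, cost_per_patient, n_max=1000):
    """Grid search over n_per_arm = 2..n_max; max() returns the smallest maximiser."""
    return max(
        range(2, n_max + 1),
        key=lambda n: expected_net_value(n, delta, sigma, value, cost_per_patient),
    )


if __name__ == "__main__":
    for cost in (0.05, 0.1, 0.5):
        n = optimal_n_per_arm(delta=0.5, sigma=1.0, value=100.0, cost_per_patient=cost)
        print(cost, n, round(success_probability(n, 0.5, 1.0), 3))
```

In this toy model, raising the per-patient cost pushes the optimal sample size down, and with no payoff the smallest trial on the grid is optimal. A grid search is used because it is transparent and cheap for a single integer design parameter.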

Commercial drug development is heavily dependent on EU regulations, EMA decisions and national reimbursement decisions. IDeAl has demonstrated that if such societal decision rules lack transparency for pharmaceutical companies, e.g. if it is uncertain how benefit/risk and cost-effectiveness are weighed, industry will not be able to design the best possible trial programmes (Jobjörnsson et al., 2016; Jobjörnsson, 2016).

Regulatory processes and health care systems

It is therefore important that regulators and payers strive for greater clarity. Concretely, the relative importance of an anticipated safety outcome compared with the intended positive effect of a pharmaceutical could be discussed and defined at an early stage. EMA and price regulators could also work together to understand what is most important to patients and the wider community. If European regulators and payers can formulate clearer rules and align their decisions, both industry and patients will benefit. This was one of the messages that IDeAl sent to regulators and payers at the “Seventh Framework Programme small-population research methods projects and regulatory application workshop”, arranged by the European Medicines Agency (EMA) on 29-30 March 2017 at the EMA headquarters in London.

This workshop was set up to discuss ideas from IDeAl and the related EU projects asterix and InSPiRe. The workshop, which was recorded and can be viewed at www.ema.europa.eu, summarized much of the regulatory-related output of the projects and sparked much discussion. In her closing remarks, Anja Schiel, the chair of EMA’s Biostatistics Working Party, commented on the value of the three projects: “I must say: Yes, cost effectiveness is on your side if I see what you put as an output and what it costed. That was a cheap gain, to put it mildly”.

IDeAl has also interacted directly with regulatory representatives through a number of other channels. Examples include comments on several regulatory documents, including draft guidance, the IRDiRC-EMA workshop on rare diseases, and the inclusion of regulators in the ADSWG small population working group. In addition, one of the IDeAl researchers is currently seconded to the EMA as a statistical reviewer. IDeAl has also been the starting point for further research together with regulators, e.g. the FDA project of Holger Dette with France Mentré on the evaluation of model-based bioequivalence statistical approaches. Finally, the revision of the EMA guidance on “Clinical trials in small populations” is announced for 2018, and the findings of the three projects asterix, IDeAl and InSPiRe will be the backbone of the revision.

Patients and community

IDeAl set out to improve the methodology for clinical trials in small population groups. The most direct results are therefore methodological advances, first communicated through publications and conference presentations. The positive results regarding the identification of genetic markers, improved trial efficiency and optimised design, to mention a few, will gradually be taken up by clinical researchers. Our dissemination activities are raising awareness in the wider clinical community through a number of different routes. The smaller group of European regulators has been targeted more directly, e.g. through the two EMA workshops. The European industry is reached partly through direct research contacts with statistical methodology groups at major companies, partly through many presentations at the key European pharmaceutical statistics meetings, and partly through the more widely targeted dissemination activities.

Novel and improved IDeAl methodology will therefore be used by different stakeholders to optimise trial design, analysis and regulation. The ultimate beneficiaries will be current and future patients with rare diseases or belonging to other small (sub)populations. Primarily, patients will benefit from new medical treatments, whose development is facilitated by IDeAl’s results on genetics as well as by incentivising trials through lower costs and improved quality. Secondarily, IDeAl methodology to reduce sample sizes will expose fewer patients to clinical experimentation and also help novel drugs reach the patients in need faster. Trial patients will also benefit if IDeAl’s ideas on improving trial ethics gain wider traction.

The community benefits directly from the improved health of EU citizens; indirectly, this may translate into economic gains. Mechanism design approaches may also help optimise society’s allocation of resources. Finally, the build-up of personal competence will aid European industry and research institutions. IDeAl thus clearly contributes to rare disease research: as Dr. Irene Norstedt (Head of Innovative and Personalised Medicine Unit, European Commission) stated at the “Conference on the Development and Access of Medicines for Rare Diseases”, held under the Maltese Presidency of the Council of the European Union in Valletta on 21 March 2017, the EU assessed IDeAl as one of the lighthouse projects in rare disease research funded by EU programmes.

This makes clear that EU funding has set a snowball of methodological research in small population clinical trials rolling, which currently reflects European leadership. The FDA project of Holger Dette and France Mentré on model-based methods to analyse bioequivalence studies is, on the one hand, a positive sign; on the other hand, it bears the risk that European leadership will be lost if further funding is not provided.

2. Main dissemination activities (WP11)

From the very beginning, IDeAl used different means to disseminate its results in various areas. The organizational structure was a key factor in successfully implementing the dissemination activities at a worldwide level. IDeAl’s dissemination activities address different levels; highlights of the progress in the statistical methodology of small population clinical trials include:

• networking, leading to connections with five other EU projects and to involvement in the steering committee of the IRDiRC task force on small population clinical trials

• written output, including 63 peer-reviewed publications with new research findings, social media channels and the IDeAl website

• oral output, including more than 170 presentations

• awareness sessions, including short courses, tutorials and an IDeAl webinar series (11 webinars, accessible via the IDeAl website)

• input to regulatory documents and guidelines, with comments on four guidelines, active input to the EMA “Extrapolation guideline” and a joint workshop at the EMA

The distribution of the IDeAl work package leads across Europe, together with the cooperation with the external advisory board, offered both challenges and opportunities. The IDeAl group accepted the challenges and used the opportunities to disseminate its results and ideas to the stakeholders. Some findings will need time to be implemented, as they break barriers of traditional thinking in clinical trial methodology.

The dissemination activities started within the project network and extended outward via the external advisory board. IDeAl researchers travelled abroad to present the first results of the ongoing project at international conferences and workshops, both internal and external (see the highlight sections on the WPs). Short-term face-to-face visits to other partners were also conducted: Ralf-Dieter Hilgers and Nicole Heussen (UKA) visited UHasselt and MUW; Carl-Fredrik Burman and Sebastian Jobjörnsson (CTH) visited MUW; France Mentré (INSERM), Sebastian Ueckert (UU) and Marie-Karelle Riviere (INSERM) visited UHasselt; and Malgorzata Bogdan (PWR) visited Chiara Sabatti (EAB, Stanford University). Ralf-Dieter Hilgers and Professor Rosenberger visited each other to work on the selection of best practice randomization procedures. The results were presented in a video-streamed talk at the FDA on 6 May 2016. Professor Nicole Heussen presented her findings on missing-data mechanisms in randomization-based inference at George Mason University on 29 April 2016. Diane Uschner presented the randomizeR software at the NIH on 28 October 2015.

Furthermore, the IDeAl project also stimulated research visits in the opposite direction. William F. Rosenberger (EAB) stayed for three months at the Department of Medical Statistics at RWTH Aachen University, funded by a Fulbright grant, to work with several groups of the IDeAl consortium; during his stay he also visited Holger Dette and Franz König. His research visit was funded independently of the IDeAl grant. Gernot Wassmer (EAB) visited MUW to work on an invited paper on adaptive designs (externally funded research stay); cf. section ‘publications resulting from the IDeAl project in the period’.

The IDeAl consortium established a “Young Scientist research group”, which met more than ten times at various conferences and IDeAl meetings, promoting interaction between the work packages. The young scientist group is also an investment in the future dissemination of the IDeAl findings: nine young scientists have already taken up positions in the pharmaceutical industry or with regulators.

The consortium organized dedicated sessions at conferences, e.g. MCP 2015 in Hyderabad, as well as full conferences. Highlights were the Design and Analysis of Experiments in Healthcare workshop in Cambridge in 2015 and, together with asterix and InSPiRe, the joint Small Population Symposium in Vienna in 2014 and the Seventh Framework Programme (FP7) Small-population Research Methods Projects and Regulatory Application workshop at the EMA in 2017.

The IDeAl consortium also organized two half-day seminars together with the International Biometric Society (IBS) – Viennese Section; the 2016 seminar on ‘Innovative Methods in Drug Development’ was co-hosted with the FP7 project asterix.

The one-week workshop on “Design and Analysis of Experiments in Healthcare” was organized by Rosemary Bailey (EAB), Ralf-Dieter Hilgers and Holger Dette at the Isaac Newton Institute, Cambridge, UK, in July 2015. It was accompanied by a one-day industry workshop on “Design of Experiments in Drug Development”, where the challenges and results of IDeAl were discussed with selected experts in statistical design from all over the world.

These activities led to IDeAl being included in the International Rare Diseases Research Consortium (IRDiRC) task force to advance progress in the field of “Small Population Clinical Trials”. Ralf-Dieter Hilgers was nominated as a member of the steering committee in 2016. The IDeAl consortium participated in the Joint Workshop on “Small Population Clinical Trials Challenges in the Field of Rare Diseases”, organized by IRDiRC and hosted by the EMA on 3 March 2016, and contributed to the IRDiRC report “Small Population Clinical Trials: Challenges in the Field of Rare Diseases” (July 2016).

IDeAl cooperated with four other EU research projects: EFACTS (FP7 Health 242193), addressing Friedreich’s ataxia (see Reetz et al., 2016); the IMI project DDMoRe (FP7 IMI 115156; see Rivière et al., 2016); and asterix (FP7 Health 603160) and InSPiRe (FP7 Health 602144), the two other FP7-funded projects aiming to refine the statistical methods for small population group trials (see the publications Auffray et al. (2016), Graf, Posch and König (2014), König et al. (2015), Hlavin, König et al. (2016), Eichler et al. (2016), Magirr et al. (2016) and Jonker et al. (2016), as well as the seminal paper of Hilgers, Roes and Stallard (2016)).

The IDeAl group has published 63 papers (status June 2017) in peer-reviewed journals, of which 34 are open access and 11 were written in collaboration with a member of the external advisory board. The publications range from review articles to expert opinions and from applications to theoretical papers, and include lay descriptions.

The papers received considerable attention. For example, a joint paper with co-authors from InSPiRe, asterix and the European Medicines Agency on sharing clinical trial data (König et al., 2015) is currently listed as one of the most accessed and cited papers in Biometrical Journal. Three IDeAl papers were among the most accessed papers in the prestigious journal “Statistics in Medicine”: a paper on “Mastering variation: variance components and personalised medicine” (Senn, 2015), one on extrapolation (Hlavin et al., 2016) and the featured article on 25 years of adaptive design (Bauer et al., 2015). The latter was also listed among the most cited papers in Statistics in Medicine in 2015 and 2016. In Pharmaceutical Statistics, the paper on adaptive paediatric investigation plans (Bauer and König, 2016) was among the most accessed in 2016.

As pointed out in one of the papers (Hilgers, König et al., 2016), the scope ranges from (i) advocating the rigorous use of the best available statistical methods, through (ii) showing the limitations of traditional methods where they fail, to (iii) developing new statistical designs and analysis methods for small population clinical trials. Among the review articles, the joint paper with the asterix and InSPiRe coordinators Kit Roes and Nigel Stallard in the Orphanet Journal shows the options for development beyond the EMA guideline (Hilgers, Roes, Stallard, 2016). Applying the best available statistical analysis methods to Friedreich’s ataxia registry data led to a joint publication in Lancet Neurology with the FP7-funded EFACTS project (Reetz et al., 2016). Beyond these numerous contributions to the biostatistical literature, the consortium has also reached out to a wider audience. The project description in the May 2017 issue of impact (“Multidisciplinary health research”) reaches a core audience among its 35 000 readers across Europe, North and South America, Asia-Pacific and Africa, including national and regional funding agencies, research funding bodies, national, regional and local government, public sector organisations, policy and legislation organisations, universities, research institutes, research centres, NGOs and key industry/private sectors. IDeAl’s work does not end with the project lifetime: at least 11 further papers are in preparation and an additional 14 have already been submitted.

In addition to publishing papers in “traditional” journals, the IDeAl consortium also used several blogs to disseminate results and hold timely discussions.

Another important means of informing the interested public was the electronic bi-annual newsletter prepared by the IDeAl team. Seven issues were sent by e-mail to over 200 registered e-mail addresses in 28 countries, and all newsletters have been published on the IDeAl webpage. They covered, for example, new research findings, lists of accepted papers in peer-reviewed journals, information about meetings and talks, and new software tools. Almost all software tools are described, together with their theoretical background, in the corresponding publications. Fifteen software tools have been released, each with a manual, several of them as packages on CRAN, and all of them can be accessed via the project website.

IDeAl also used social media accounts on Twitter (account “@ideal_fp7”; >120 followers) and LinkedIn (group “IDEAL - FP7 Project”; >50 followers) to inform the broad public and important stakeholders about its progress and to promote IDeAl-related events and main research activities. Press releases were also issued to promote meetings and highlights such as awards for research papers (see webpage).

With more than 170 oral presentations, IDeAl reached all relevant stakeholders. These include presentations at the European Medicines Agency (EMA) in 2014 and 2016, the Food and Drug Administration (FDA) in 2016, the Committee for Orphan Medicinal Products (COMP) in 2017, EURORDIS in 2014, the Pharmaceuticals and Medical Devices Agency, Japan (PMDA) in 2015, the adaptive design working group of the Drug Information Association (DIA ADSWG) in 2014, Key Opinion Leaders (KOL) in 2016 and the National Institutes of Health (NIH) in 2015. To bring research into practice, it was important to provide sufficient training to clinicians, statisticians, sponsors and investigators of clinical trials. In addition to (usually short) oral presentations at conferences, we successfully provided tailored training (e.g. short courses, workshops, tutorials or summer schools) together with important and relevant learned societies such as EURORDIS, ISCB or IBS. In total, IDeAl members held 21 workshops, short courses and tutorials. Some of them are still available as videos, such as the talks of the 2015 workshop at the Isaac Newton Institute.

In 2016 the IDeAl consortium launched its own IDeAl webinar series comprising 11 online lectures. The videos are still available via the project website, which links to the IDeAl YouTube channel. The webinars present the main research results of each work package. On average, 50 participants from various stakeholder groups followed the webinars, increasing the visibility of the project, while other researchers used the service to download the streamed videos from the website. Additional webinars were given, for example, to the DIA adaptive design working group, the American Statistical Association (ASA), the Royal Statistical Society (RSS) and Statisticians in the Pharmaceutical Industry (PSI). Giving online webinars helped to reach a broad, international audience in an easily accessible way.

Finally, IDeAl had regular interactions with regulatory bodies throughout the conduct of the project (on more than 15 occasions). IDeAl has commented on 5 EMA guidelines, has contributed to the reflection paper on extrapolation of efficacy and safety in paediatric medicine development and to IRDiRC’s report on small population clinical trials in rare diseases, and conducted a joint workshop together with asterix and InSPiRe at the EMA in March 2017 to agree on relevant regulatory standards and methods in small population clinical trials. IDeAl members were invited to give special lectures to other international regulatory bodies, including the US FDA (Hilgers and Heussen in 2016) and the Japanese regulatory agency PMDA (Bauer and König, together with Posch, in 2015).

In March 2017, Ralf-Dieter Hilgers was invited to give a presentation at the Strategic Review and Learning meeting of the Committee for Orphan Medicinal Products (COMP), held in Malta during the Maltese Presidency. His talk, entitled “Innovative statistical design methodologies for clinical trials in small populations focussing on rare diseases”, reported on IDeAl research findings and was of high interest to the COMP.

In 2016 the senior medical director of the EMA initiated a joint project on how to incorporate external data sources (from other RCTs, but also real-world data (RWD)), proposing threshold-crossing designs (Eichler et al., 2016). Furthermore, Franz König contributed to a review of the experience gained with marketing authorizations in Europe (Hofer et al. 2017, in preparation).

Beyond the activities explicitly funded by the project, IDeAl has begun to disseminate its results through contributions to applied research projects. Ralf-Dieter Hilgers was involved in two eRARE project proposals, one of which is currently funded (PDerm, PI: Ralf Ludwig, Lübeck). The contribution to the analysis of the Friedreich’s ataxia data (EFACTS) was also presented at the third IRDiRC conference in February 2017 in Paris. Another opportunity to discuss the findings with the oncology community was an invited talk at the German Cancer Research Center (DKFZ) in January 2017.

3. Exploitation of results

The exploitation activities cover various fields:

• contributions to regulatory guidelines, past and future

• delivery of software code

• scientific publications and conference presentations with a lasting impact

• application of new methodology in clinical trials in academia and industry

• inclusion of new methods in university courses

Contributions to regulatory guidelines, past and future, are one of the main concerns of IDeAl’s findings and dissemination. The prospect of a revision of the CHMP “Guideline on clinical trials in small populations” underlies the grant proposal; the revision is expected in 2018, when all three projects come to an end. One of the asterix work package leads may take the lead in the revision, and IDeAl, being closely related to asterix, will bring its findings into the discussion in this regulatory context. IDeAl has already commented on related guidance, e.g. the 'Draft Guideline on evaluation of anticancer medicinal products in man'.

Further, IDeAl stays in contact with regulators on various occasions; e.g. IDeAl participated in the development of a “Framework of collaboration between the European Medicines Agency and academia”. IDeAl findings will help to implement the guidance on “Extrapolation of efficacy and safety in paediatric medicine development”, which is currently under discussion. Additionally, some of the research findings have the potential to change regulatory requirements and may be selected for certification procedures for micro-, small- and medium-sized enterprises (SMEs) (see http://www.ema.europa.eu/). In this context, one may think in particular of the developments in WP2 (selection of the best practice randomization procedure), WP3 (extrapolation of dose response information), WP9 (decision theoretic evaluation of drug legislation) and WP10 (evaluation of surrogate endpoints). Certification of procedures would corroborate the European leadership in the design and analysis of small population clinical trials and support IRDiRC targets; however, the funding of the fees for certification of procedures is currently unclear. Some activities have also led to funding from outside the EU, for instance the FDA project of Holger Dette and France Mentré on the “Evaluation of Model-Based BioEquivalence (MBBE) statistical approaches for sparse designs PK studies”.

Exploitation also includes the delivered software. IDeAl’s software has attracted considerable interest. The following list briefly describes the delivered programs.

1. Araujo, A. (2016): R-Code “Statistical Analysis of Series of N-of-1 Trials Using R”, http://www.ideal.rwth-aachen.de/wp-content/uploads/2014/02/nof1_rand_cycles_v8.pdf

2. Brzyski, D., Peterson, C., Candes, E.J., Bogdan, M., Sabatti, C., Sobczyk, P. (2016): R package "geneSLOPE" for genome-wide association studies with SLOPE. https://cran.r-project.org/web/packages/geneSLOPE/index.html

3. Graf, A., Bauer, P., Glimm, E., König, F. (2014): R-Code to calculate worst case type-I-error inflation in multiarmed clinical trials, http://onlinelibrary.wiley.com/doi/10.1002/bimj.201300153/suppinfo

4. Jobjörnsson, S. (2015): R package "bdpopt" for optimization of Bayesian Decision Problems. https://cran.r-project.org/web/packages/bdpopt/index.html

5. Hlavin, G. (2016): application for extrapolation to adjust significance level based on prior information, http://www.ideal-apps.rwth-aachen.de:3838/Extrapolation/

6. Möllenhoff, K. (2015): R package "TestingSimilarity" for testing similarity of dose response curves. https://cran.r-project.org/web/packages/TestingSimilarity/

7. Riviere, M.K., Mentré, F. (2015): R package "MIXFIM" for the evaluation and optimization of the Fisher Information Matrix in Non-Linear Mixed Effect Models using Markov chain Monte Carlo for both discrete and continuous data. https://cran.r-project.org/web/packages/MIXFIM/

8. Schindler, D., Uschner, D., Manolov, M., Pham, M., Hilgers, R.-D., Heussen, N. (2016): R package "randomizeR" on Randomization for clinical trials. https://cran.r-project.org/web/packages/randomizeR/

9. Senn, S. (2014): R, GenStat and SAS Code for Sample Size Considerations in N-of-1 trials, http://www.ideal.rwth-aachen.de/wp-content/uploads/2014/02/Sample-Size-Considerations-for-N-of-1-trials.zip

10. Sobczyk, P., Josse, J., Bogdan, M. (2015): R package "varclust" for dimensionality reduction via variables clustering. https://psobczyk.shinyapps.io/varclust_online/

11. Sobczyk, P., Josse, J., Bogdan, M. (2017): R package "pesel" Automatic estimation of number of principal components in PCA with PEnalized SEmi-integrated Likelihood (PESEL). https://github.com/psobczyk/pesel

12. Szulc, P., Frommlet, F., Tang, H., Bogdan, M. (2017): R application for joint genotype and admixture mapping in admixed populations, http://www.math.uni.wroc.pl/~mbogdan/admixtures/

13. Van der Elst, W., Alonso, A., Molenberghs, G. (2017): R package "EffectTreat" on the Prediction of Therapeutic Success. https://cran.r-project.org/web/packages/EffectTreat/index.html

14. Van der Elst, W., Meyvisch, P., Alonso, A., Ensor, H.M., Weir, C.J., Molenberghs, G. (2017): R Package "Surrogate" for evaluation of surrogate endpoints in clinical trials. https://cran.r-project.org/web/packages/Surrogate/

15. Van der Elst, W., Molenberghs, G., Hilgers, R.-D., Heussen, N. (2016): R package "CorrMixed" for the estimation of within subject correlations based on linear mixed effects models. https://cran.r-project.org/web/packages/CorrMixed/index.html

Over the last two years, most packages have been downloaded between 100 and 300 times per month, whereas EffectTreat has had peak downloads of 830 and Surrogate of 2515.

Scientific publications and conference presentations with a lasting impact are already described in the chapter “Beyond DoW”.

The IDeAl findings will, of course, also be included in regular university courses, in particular in the educational programs at medical faculties, as well as in consultation on clinical trials. An example of the latter is the involvement of Ralf-Dieter Hilgers as responsible biostatistician in the Europe-wide randomized, double-blind, placebo-controlled, parallel-group, multicentre study of the efficacy and safety of nicotinamide in patients with Friedreich’s ataxia (NICOFA), led by Professor Jörg Schulz, RWTH Aachen University.

The IDeAl project, as well as the asterix and InSPiRe projects, has resulted in the development of innovative methodology for the statistical design and analysis of small population clinical trials and reflects current and future European leadership in this area. Some methods are already being implemented or will be soon, whilst other, more ground-breaking methods will need further work prior to implementation in practice. The three project coordinators, Ralf-Dieter Hilgers, Kit Roes and Nigel Stallard, continue to collaborate and are highly motivated to work together to advance knowledge and promote best practice in this area. Ongoing close work includes the development of a proposal for a European Reference Network for statistical methodology in the design and analysis of small population clinical trials, as suggested by Gerard Long (Eurordis).

List of Websites:

Website: www.ideal.rwth-aachen.de

Participants contact:

Coordinator: Professor Ralf-Dieter Hilgers (WP2), Department of Medical Statistics, University Clinic Aachen, RWTH Aachen University, Pauwelsstr. 19, 52074 Aachen, Germany. rhilgers@ukaachen.de

Professor Holger Dette (WP3), Ruhr University Bochum, Germany

Professor Franz König (WP4), Medical University of Vienna, Austria

Professor France Mentré (WP5), Institut National de la Santé et de la Recherche Médicale, France

Professor Stephen Senn (WP6), Luxembourg Institute of Health (former Centre de Recherche Public de la Santé), Luxembourg

Professor Mats Karlsson (WP7), Uppsala University, Sweden

Professor Malgorzata Bogdan (WP8), Politechnika Wroclawska, Poland

Professor Carl-Fredrik Burman (WP9), Chalmers University of Technology, Sweden

Professor Geert Molenberghs (WP10), Hasselt University, Belgium

Professor Christoph Male (WP11), Medical University of Vienna, Austria