## Final Report Summary - ISMPH (Inference for a Semi-Markov Process using Hazards Specification)

During this project, we focused on the project's three main objectives and three training objectives. We paid particular attention to modeling lifetime data in survival analysis using cure rate models; this family of models allows a proportion of the items/individuals in the population under study never to experience any recurrence of the event of interest. These items/individuals are called cured, non-susceptible, long-term survivors, or immune. Cure rate models have been studied extensively during the last few decades, owing both to the substantial improvement in treatments for a range of diseases and to the fact that, in many social phenomena, some individuals are not susceptible to the event of interest. Applications of cure rate models can be found in biomedical studies, criminology, finance, industrial reliability, etc. (see, e.g., [1]).

The most popular cure rate models can be defined through a competing cause scenario: the times-to-event due to the individual competing causes are assumed to be independent and identically distributed random variables, while the total number of competing causes is an unobservable (latent) discrete random variable. It is worth mentioning that the models of classical survival analysis may be viewed as special cases of this family. The most important problems in this area are the estimation of the cured proportion (the proportion of items that are not subject to the event of interest or, equivalently, whose number of competing causes is zero) and inference about the recurrence (failure) time of the susceptibles.
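
In symbols (our own notation, introduced only for illustration: $N$ is the latent number of competing causes, $W_1,\dots,W_N$ are the i.i.d. cause-specific times with common survival function $S(t)=1-F(t)$, and $G_N(s)=E[s^N]$ is the probability generating function of $N$), the standard construction behind this scenario reads:

$$
T=\min\{W_1,\dots,W_N\}\quad (T=\infty \text{ if } N=0),\qquad
S_{\mathrm{pop}}(t)=P(T>t)=\sum_{n=0}^{\infty}P(N=n)\,S(t)^{n}=G_N\bigl(S(t)\bigr),
$$

$$
p_0=\lim_{t\to\infty}S_{\mathrm{pop}}(t)=G_N(0)=P(N=0).
$$

For instance, a Poisson($\theta$) number of causes gives $G_N(s)=e^{-\theta(1-s)}$, hence $S_{\mathrm{pop}}(t)=e^{-\theta F(t)}$ with cured proportion $p_0=e^{-\theta}$.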

After an extensive review of the relevant literature, our study focused on cases where the number of competing causes follows a Bernoulli, Poisson, COM-Poisson, weighted Poisson, or generalized discrete Linnik distribution. Regarding the last two cases, a new weighted Poisson distribution, motivated by the discrete asymmetric triangular distribution, was introduced. This distribution offers great flexibility for modeling over/underdispersed and/or zero-inflated/deflated data. Special cases of the resulting cure rate model include the well-studied promotion time cure rate model and the COM-Poisson cure rate model (this work has already been submitted for publication in a refereed journal). Moreover, a flexible family of cure rate models, based on the Linnik distribution and appropriately relaxing some assumptions of the classical promotion time (proportional hazards) cure rate model, was also studied. Special cases of this model include, among others, the promotion time, the geometric (proportional odds), and the negative binomial cure rate models. In addition, this model generalizes specific families of transformation cure rate models and some well-studied destructive cure rate models (this work has also been submitted for publication in a refereed journal, and it is in accordance with the main training objective A of the project, i.e. "to gain a thorough insight in the theory of modeling lifetime data").
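
Under the competing cause scenario, the population survival function equals the probability generating function of the latent number of causes evaluated at the common survival function S(t), and the cured proportion is P(N = 0). The Python sketch below illustrates this for the Bernoulli, Poisson, COM-Poisson, and negative binomial (geometric when the dispersion equals 1) cases, using the standard forms of their generating functions; the new weighted Poisson and Linnik-based models are not reproduced. Function names, the parameterization, the truncation of the COM-Poisson series, and the unit-exponential S(t) are assumptions chosen for illustration.

```python
import math

# Probability generating functions G_N(s) = E[s^N] for some of the latent
# distributions named above. Names and parameterizations are illustrative.

def pgf_bernoulli(s, p):
    # N ~ Bernoulli(p): G(s) = 1 - p + p*s (the classical mixture cure model)
    return 1.0 - p + p * s

def pgf_poisson(s, theta):
    # N ~ Poisson(theta): G(s) = exp(-theta*(1 - s)) (promotion time model)
    return math.exp(-theta * (1.0 - s))

def pgf_negbin(s, theta, eta):
    # N ~ negative binomial with mean theta and dispersion eta > 0:
    # G(s) = (1 + eta*theta*(1 - s))^(-1/eta); eta = 1 is the geometric case
    return (1.0 + eta * theta * (1.0 - s)) ** (-1.0 / eta)

def _com_poisson_z(x, nu, n_terms):
    # Truncated normalizing series Z(x, nu) = sum_n x^n / (n!)^nu
    total, term = 0.0, 1.0  # term = x^n / (n!)^nu, starting at n = 0
    for n in range(n_terms):
        total += term
        term *= x / (n + 1) ** nu
    return total

def pgf_com_poisson(s, lam, nu, n_terms=200):
    # N ~ COM-Poisson(lam, nu): P(N = n) proportional to lam^n / (n!)^nu,
    # so G(s) = Z(lam*s, nu) / Z(lam, nu); nu = 1 recovers Poisson(lam)
    return _com_poisson_z(lam * s, nu, n_terms) / _com_poisson_z(lam, nu, n_terms)

def population_survival(t, pgf, baseline_survival=lambda u: math.exp(-u)):
    # S_pop(t) = G_N(S(t)); the unit-exponential S(t) is only a placeholder
    return pgf(baseline_survival(t))
```

For example, `population_survival(t, lambda s: pgf_poisson(s, 1.5))` decreases from 1 at t = 0 towards the cured proportion `pgf_poisson(0.0, 1.5)`, i.e. exp(-1.5), as t grows; setting `nu = 1` in the COM-Poisson case reproduces the Poisson (promotion time) model.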

Furthermore, a logistic regression model was used to model the probability that an individual is cured. The common hazard function h(t) of the time-to-event due to the i-th competing cause was studied under parametric, non-parametric, and semi-parametric frameworks (this is in accordance with the main training objective B, i.e. "to study the hazard model in relation with and its connection to the transition times of a semi-Markov process and a set of relevant covariates"); specifically, a Weibull distribution, a piecewise linear function, or Cox’s proportional hazards model was assumed for h(t) (this is also in accordance with the main objective A of the project, i.e. "To study the hazard models and their connection to the transition times of a semi-Markov process, taking into account several forms for the baseline hazard function and the link function").
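
A minimal Python sketch of this structure, assuming the Bernoulli (mixture) formulation S_pop(t | x) = p0(x) + (1 - p0(x)) S_u(t), a logistic link for the cure probability p0(x), and two of the baseline choices mentioned above: a Weibull distribution and a hazard that is piecewise linear between chosen knots. The sign convention of the link, the knot placement, the constant hazard after the last knot, and all parameter values are our own hypothetical choices; the Cox (semi-parametric) case and the estimation procedures of the project are not reproduced.

```python
import math

# Mixture cure model sketch: P(cured | x) from a logistic link, and the
# survival S_u(t) of the susceptibles from either a Weibull or a piecewise
# linear hazard. All parameter values used below are hypothetical.

def cure_probability(x, beta):
    # Logistic link: P(cured | x) = 1 / (1 + exp(-x'beta)); x[0] = 1 is the intercept
    eta = sum(b * xi for b, xi in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-eta))

def weibull_survival(t, shape, scale):
    # Corresponds to the hazard h(t) = (shape/scale) * (t/scale)^(shape - 1)
    return math.exp(-((t / scale) ** shape))

def piecewise_linear_cumhaz(t, knots, hazards):
    # h(t) is linear between (knots[k], hazards[k]) and constant after the
    # last knot; knots must start at 0 and be increasing.
    H = 0.0
    for k in range(len(knots) - 1):
        a, b = knots[k], knots[k + 1]
        if t <= a:
            break
        upper = min(t, b)
        slope = (hazards[k + 1] - hazards[k]) / (b - a)
        h_upper = hazards[k] + slope * (upper - a)
        H += 0.5 * (hazards[k] + h_upper) * (upper - a)  # trapezoid is exact for a linear h
    if t > knots[-1]:
        H += hazards[-1] * (t - knots[-1])
    return H

def piecewise_linear_survival(t, knots, hazards):
    return math.exp(-piecewise_linear_cumhaz(t, knots, hazards))

def population_survival(t, x, beta, susceptible_survival):
    # S_pop(t | x) = p0(x) + (1 - p0(x)) * S_u(t), which levels off at p0(x)
    p0 = cure_probability(x, beta)
    return p0 + (1.0 - p0) * susceptible_survival(t)

# Example with hypothetical values: intercept plus one binary covariate
beta = [-0.5, 1.0]
x = [1.0, 1.0]
s_weib = population_survival(2.0, x, beta, lambda t: weibull_survival(t, 1.5, 3.0))
s_pl = population_survival(
    2.0, x, beta, lambda t: piecewise_linear_survival(t, [0.0, 1.0, 4.0], [0.1, 0.4, 0.2]))
```

Keeping the cure probability and the susceptible survival as separate pieces means the baseline can be swapped without touching the link, which mirrors the separation between link function and baseline hazard discussed in this report.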

It is worth mentioning that our efforts focused mainly on models whose link function leads to an improper population hazard function. When cured items are present, it is natural to expect the population survival function to be improper, since it levels off at a value greater than zero as time goes to infinity. Hence, under the competing cause scenario described above, the common hazard function h(t) of the time-to-event due to the i-th competing cause plays the role of the baseline hazard function, while the distribution of the latent discrete random variable determines the link function of the model (i.e. different discrete distributions generate specific families of link functions). For example, a population (improper) proportional hazards structure arises when the latent discrete random variable is Poisson distributed and covariates enter only through the mean of the Poisson distribution (this model is also known as the promotion time cure rate model).
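
The proportional hazards structure of the promotion time model can be checked numerically. The Python sketch below assumes a Poisson mean theta(x) = exp(x'beta) and a Weibull distribution F for each competing cause (both illustrative choices; the coefficients are hypothetical). It shows that the ratio of population hazards for two covariate vectors does not depend on t, while S_pop(t | x) levels off at exp(-theta(x)).

```python
import math

# Promotion time cure rate model sketch: N ~ Poisson(theta(x)) with
# theta(x) = exp(x'beta), and a Weibull distribution F for each cause.
# Then S_pop(t | x) = exp(-theta(x) * F(t)) and h_pop(t | x) = theta(x) * f(t).

def theta(x, beta):
    return math.exp(sum(b * xi for b, xi in zip(beta, x)))

def weibull_cdf(t, shape, scale):
    return 1.0 - math.exp(-((t / scale) ** shape))

def weibull_pdf(t, shape, scale):
    z = (t / scale) ** shape
    return (shape / t) * z * math.exp(-z)  # valid for t > 0

def pop_survival(t, x, beta, shape, scale):
    return math.exp(-theta(x, beta) * weibull_cdf(t, shape, scale))

def pop_hazard(t, x, beta, shape, scale):
    # -d/dt log S_pop(t | x) = theta(x) * f(t); its integral over (0, inf)
    # is theta(x) < inf, which is why S_pop is improper
    return theta(x, beta) * weibull_pdf(t, shape, scale)

beta = [0.2, -0.5]  # hypothetical coefficients (intercept, covariate)
x_a, x_b = [1.0, 1.0], [1.0, 0.0]
for t in (0.5, 1.0, 2.0, 5.0):
    ratio = pop_hazard(t, x_a, beta, 1.5, 1.0) / pop_hazard(t, x_b, beta, 1.5, 1.0)
    print(t, ratio)  # approximately exp(-0.5) at every t
```

Here the role of the baseline is played by the density f(t) of F rather than by a proper hazard, which is why the population hazard integrates to the finite value theta(x).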

The effectiveness of the proposed methodology and cure rate models was illustrated with a real data set on cutaneous melanoma; in addition, further data sets from the social and health sciences have been studied. Some of them are currently being analyzed with the new models, and others have already been prepared for future use (this is also in accordance with the main training objective C, i.e. "the enhancement of applicant’s skills on dealing with real data sets which come from Health and Social Sciences and Industrial Processes").

As a result of our efforts, four papers ([2]-[5]) have been prepared; two of them have already been published in refereed journals, and the others have been submitted for publication. One of these, a discussion paper with reliability and statistical quality control content ([2]), was published in the framework of the activity "Search for applications in several research areas ..." mentioned in Part B of the project (WPC); it showed that discrete versions of models such as Cox’s proportional hazards model may offer an effective alternative way of studying the theory of start-up demonstration tests.

Moreover, work on further generalizations of the above models and on non-/semi-parametric estimation of their parameters is under way; the first numerical results are quite promising. A fully non-parametric treatment of cure rate models remains one of our next goals, and more attention will also be given to destructive schemes (destructive cure rate models) under the new models above.

The models and methodology proposed in this project contribute to the study of lifetime data arising from many real-life problems. Consider, for example, how useful it is (for both financial and social reasons) to adequately predict: the time between two non-fatal cardiovascular events and the factors that affect it; the time between two failures of a system (a communication system, a computer network, etc.) and its relation to a set of external/internal factors; and the time a student needs to reach the next level of knowledge or succeed in an educational process, and how this is affected by the socio-economic status of the parents.

References:

1. Maller, R.A. and Zhou, X. (1996). Survival Analysis with Long-term Survivors. J. Wiley & Sons, New York.

2. Balakrishnan, N., Koutras, M.V. and Milienos, F.S. (2014). Start-up demonstration tests: models, methods and applications, with some unifications. Applied Stochastic Models in Business and Industry, 30, 373-413 (discussion paper).

3. Balakrishnan, N., Koutras, M.V., Milienos, F.S. and Pal, S. (2015). Piecewise linear approximations for cure rate models and associated inferential issues. Methodology and Computing in Applied Probability, to appear.

4. Koutras, M.V. and Milienos, F.S. (2016). A flexible family of transformation cure rate models. Submitted for publication.

5. Balakrishnan, N., Koutras, M.V. and Milienos, F.S. (2016). A weighted Poisson distribution and its application to cure rate models. Submitted for publication.