# ASIARESIST Report Summary

Project ID:
ICA4-CT-2001-10028

Funded under:
FP5-INCO 2

Country:
Italy

## Statistical analysis for bacterial resistance data

At the end of the project, the data contained in the database can be summarised as follows. There are 826 strains in all, of which 623 are confirmed: 166 come from Malaysia, 360 from Thailand, and 300 from Vietnam. Of the confirmed strains, 558 are chloramphenicol-resistant and 65 are chloramphenicol-susceptible.

A first data analysis through the Project Central Data Base was performed during the writing of the paper entitled "Intra- and inter-laboratory performance of antibiotic disk diffusion susceptibility testing of bacterial control strains of relevance for monitoring aquaculture environments".

The data analysis was performed to determine whether sets of raw data containing errors could be singled out, and the answer was yes. The procedure was as follows:

- for each group of raw data on a single strain and antibiotic, we calculated the average and standard deviation;

- we applied the Gaussian model with these parameters;

- we calculated the Euclidean distance of the real data from the estimates obtained with this model;

- when this distance exceeded an optimised binarisation threshold (Otsu's method), the raw data were considered not normally distributed (that is, containing some type of error);

- for normally distributed sets, we calculated the confidence interval for the population mean using Student's t distribution;

- for non-normally distributed sets, we calculated the confidence interval for the population mean using Chebyshev's inequality.
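The screening steps above can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not the project's actual code: the report does not say how the data were compared with the Gaussian model, so the sketch matches sorted readings against the fitted normal quantiles, and it applies Otsu's criterion directly to the list of distances rather than to a histogram. All function names are hypothetical, and the Student's t critical value is supplied by the caller (for example from a t table).

```python
import math
from statistics import NormalDist, mean, stdev


def gaussian_distance(values):
    """Euclidean distance between sorted data and fitted normal quantiles.

    Assumption: quantile matching is one plausible reading of "distance of
    the real data from the estimates"; the report does not specify it.
    """
    n = len(values)
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return 0.0  # identical readings: nothing to compare against
    model = NormalDist(mu, sigma)
    expected = [model.inv_cdf((i + 0.5) / n) for i in range(n)]
    sq = sum((x - e) ** 2 for x, e in zip(sorted(values), expected))
    # Scale by sigma so that distances from different sets are comparable.
    return math.sqrt(sq) / sigma


def otsu_threshold(distances):
    """Split point that maximises the between-class variance (Otsu's criterion)."""
    d = sorted(distances)
    n, total = len(d), sum(d)
    best_var, best_t, running = -1.0, d[0], 0.0
    for k in range(1, n):  # the first k values form the "low" class
        running += d[k - 1]
        if d[k] == d[k - 1]:
            continue  # no split possible between equal values
        mu0, mu1 = running / k, (total - running) / (n - k)
        var = (k / n) * ((n - k) / n) * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, (d[k - 1] + d[k]) / 2
    return best_t


def t_interval(values, t_crit):
    """Confidence interval for the mean from a caller-supplied t critical value."""
    half = t_crit * stdev(values) / math.sqrt(len(values))
    return mean(values) - half, mean(values) + half


def chebyshev_interval(values, alpha=0.05):
    """Distribution-free interval: P(|mean - mu| >= k*s/sqrt(n)) <= 1/k**2.

    Assumption: the sample standard deviation stands in for the true one.
    """
    half = stdev(values) / math.sqrt(len(values) * alpha)
    return mean(values) - half, mean(values) + half
```

A set whose distance exceeds `otsu_threshold(...)` computed over all sets would be flagged as non-normal and given the Chebyshev interval, which is wider than the t interval at the same confidence level because it makes no distributional assumption.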

After that, for each strain-antibiotic pair, we determined the groups of statistically similar sets:

- when all the sets involved are normal, we can use the one-factor fixed-effects ANOVA test combined with Duncan's multiple range test;

- when some sets (even only one) are not normal, we can use the Kruskal-Wallis test for independent samples.
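The choice between the two tests can be sketched as follows. This is a minimal standard-library illustration with hypothetical function names, not the project's implementation. It computes only the test statistics: p-values and Duncan's multiple range test are left out, and in practice a statistics package such as SciPy (`scipy.stats.f_oneway`, `scipy.stats.kruskal`) would supply the p-values.

```python
import math
from statistics import mean


def anova_f(groups):
    """One-way fixed-effects ANOVA: F statistic and its degrees of freedom."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    ms_between, ms_within = ss_between / df_between, ss_within / df_within
    f_stat = ms_between / ms_within if ms_within > 0 else math.inf
    return f_stat, df_between, df_within


def kruskal_h(groups):
    """Kruskal-Wallis H statistic with the standard tie correction."""
    pooled = sorted(x for g in groups for x in g)
    n_total = len(pooled)
    rank_of, tie_term, i = {}, 0, 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    rank_part = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (n_total * (n_total + 1)) * rank_part - 3 * (n_total + 1)
    correction = 1 - tie_term / (n_total ** 3 - n_total)
    return h / correction if correction > 0 else 0.0


def compare_sets(groups, is_normal):
    """Use ANOVA only when every set passed the normality screen."""
    if all(is_normal):
        return "anova", anova_f(groups)[0]
    return "kruskal-wallis", kruskal_h(groups)
```

Here `is_normal` is assumed to be a list of booleans, one per set, produced by the earlier normality screening; a single `False` is enough to switch to the rank-based test.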

Each partner performed different types of data analysis using the search engine and our statistical processing; these results are reported in their respective final reports.
