Species Sensitivity Distribution Estimation from Uncertain (QSAR-based) Effects Data
Tom Aldenberg and Emiel Rorije
In environmental risk assessment, Species Sensitivity Distributions (SSDs) can be applied to estimate a PNEC (Predicted No-Effect Concentration) for a chemical substance when sufficient data on species toxicities are available. The European Chemicals Agency (ECHA) recommends data on 10 biological species. The question addressed in this paper is whether QSAR-predicted toxicities can be included in SSD-based PNEC estimates, and whether any modifications are needed to account for the uncertainty in the QSAR-model estimates. We address this problem from a probabilistic modelling point of view. Starting from classical analysis of variance (ANOVA), we review how the error-in-data SSD problem resembles the separation of total variance into between-group and within-group components. ECHA guidance suggests averaging similar endpoint data for a species, which is consistent with the use of group means in ANOVA. This exercise reveals that accounting for error in the data reduces the estimate of the between-species variation, i.e. the SSD variance, rather than enlarging it. A Bayesian analysis permits assessment of the uncertainty of the SSD mean and variance parameters for given values of the mean species toxicity error. This requires a hierarchical model. Prototyping this model for an artificial five-species data set suggests that the influence of data error is relatively minor. Moreover, neglecting this data error results in a slightly conservative estimate of the SSD. Hence, we suggest including (model-predicted) data as model point estimates and handling the SSD as usual. The Bayesian simulation of the error-in-data SSD leads to predictive distributions, which are averages of the posterior "spaghetti plot" densities or cumulative distributions. We derive new predictive extrapolation constants with several improvements over previous median-uncertainty log10 HC5 estimates: they are easily calculable from spreadsheet Student-t functions, they are based on a more realistic uniform prior for the SSD standard deviation, they are single-number extrapolation constants, and they are more sensitive to small sample size.
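The ANOVA-style argument in the abstract can be illustrated with a simple method-of-moments sketch. This is not the hierarchical Bayesian model of the paper; it only assumes that each observed log10 toxicity equals the true species value plus independent error with a known, common standard deviation, so that the observed variance is the sum of the SSD variance and the error variance. The data values, the error standard deviation of 0.3, and the function name `ssd_moments` are hypothetical, and the HC5 values are plug-in normal 5th percentiles rather than the predictive extrapolation constants derived in the paper.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def ssd_moments(log10_tox, error_sd):
    """Return (mean, observed variance, error-corrected between-species variance).

    Assumes observed variance = SSD variance + error variance, with a known
    error SD common to all species (an illustrative assumption).
    """
    m = mean(log10_tox)
    total_var = variance(log10_tox)  # sample variance, n - 1 denominator
    # Subtract the error variance; floor at zero to keep the estimate valid.
    between_var = max(total_var - error_sd ** 2, 0.0)
    return m, total_var, between_var


# Hypothetical five-species data set of log10 toxicities (made-up numbers).
log10_tox = [0.2, 0.9, 1.4, 1.8, 2.7]
m, total_var, between_var = ssd_moments(log10_tox, error_sd=0.3)

# Plug-in 5th percentiles (log10 HC5) of a normal SSD, with and without
# correcting the variance for data error.
hc5_naive = NormalDist(m, sqrt(total_var)).inv_cdf(0.05)
hc5_corrected = NormalDist(m, sqrt(between_var)).inv_cdf(0.05)

print(f"observed variance:      {total_var:.3f}")
print(f"corrected SSD variance: {between_var:.3f}")
print(f"log10 HC5, error neglected: {hc5_naive:.3f}")
print(f"log10 HC5, error corrected: {hc5_corrected:.3f}")
```

Under these assumptions the corrected SSD variance is smaller than the observed variance, so neglecting the data error gives a lower, i.e. more conservative, log10 HC5.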