Exploring and validating statistical reliability in forensic conservation genetics.

Published online: 24 Jan 2019
Buschbom, J.
Contact email(s): buschbom@posteo.de & fg@thuenen.de

Safeguarding biological diversity, from evolutionary lineages to ecosystems, is a major undertaking for humanity. Forensic conservation genetics for the protection of wild flora and fauna aims to provide statistical inference tools and services for enforcing conservation and management strategies at local to global scales. This paper reviews statistical criteria that provide insight into, and assess, the reliability of conclusions drawn from statistical inference. The translation of these fundamental criteria into practice is illustrated with applications from evolutionary and forensic genetics, focusing specifically on the inference of geographic origin using population assignment approaches. The key concept that end-users of statistical results (e.g., conservation activists, certification participants, and courts and juries evaluating a proposed expert conclusion) need to understand and take into account is the statistical reliability of an inferred estimate. Reliability places a result in an appropriate context, defines the scope (space of applicability) within which it is valid, and thus provides perspective. In statistical terms, this context corresponds to the probability density surfaces of the sample, parameter and result spaces. As measures of dispersion (e.g., size, range, variance), the reviewed ancillary statistical criteria convey the structure and characteristics of these surfaces. The more continuous and wide their scope, the more insight into reliability they provide. Validity (convergence and consistency, measured by precision and accuracy and supplemented by robustness and congruence), efficiency ("speed") and sufficiency ("power"), as well as model specification (definition, selection and assessment) and hypothesis falsification, quantify how confident one can be in the parameter values, support measures, predictive data and test decisions returned by statistical methods.
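Precision and accuracy, two of the validity measures named above, can be made concrete with a small simulation. The following is a minimal Python sketch, not taken from the reviewed work, assuming a single biallelic locus whose hypothetical "true" allele frequency is estimated from n randomly sampled gene copies. Accuracy is measured as bias, precision as the standard deviation across replicates, and both are combined in the root-mean-square error (RMSE). All function and variable names are illustrative only.

```python
import random
import statistics

def replicate_estimates(true_p, n, reps, rng):
    """Estimate an allele frequency from n sampled gene copies, repeated reps times."""
    estimates = []
    for _ in range(reps):
        count = sum(rng.random() < true_p for _ in range(n))
        estimates.append(count / n)
    return estimates

def bias_sd_rmse(estimates, true_value):
    """Accuracy (bias), precision (SD) and overall error (RMSE) of replicate estimates."""
    bias = statistics.fmean(estimates) - true_value
    sd = statistics.pstdev(estimates)
    rmse = statistics.fmean([(e - true_value) ** 2 for e in estimates]) ** 0.5
    return bias, sd, rmse

rng = random.Random(1)  # fixed seed for reproducibility
TRUE_P = 0.3            # hypothetical "true" allele frequency (an assumption)
results = {}
for n in (10, 100, 1000):
    results[n] = bias_sd_rmse(replicate_estimates(TRUE_P, n, 2000, rng), TRUE_P)
    bias, sd, rmse = results[n]
    print(f"n={n:5d}  bias={bias:+.4f}  sd={sd:.4f}  rmse={rmse:.4f}")
```

In such a run the bias stays close to zero while the standard deviation shrinks roughly as sqrt(p(1 - p)/n), which is the behaviour expected of a consistent estimator; the squared RMSE decomposes exactly into squared bias plus variance.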
In this way, statistical reliability forms the subject, process and goal of the statistical validation of reference datasets and inference approaches. This review introduces each ancillary criterion and discusses general strategies for practical application, as well as available implementations for genetic population assignment. It points out the fundamental importance of genome-wide sequence information for reference samples from across the distribution range of non-model organisms. Such reference datasets provide the information-rich genomic data required for developing sufficient and versatile statistical inference approaches. Together, they form the prerequisites for accurate, decisive and reliable tools for conservation, management and law enforcement. Validating their quality characteristics lays the basis for the widespread practical acceptance and use of such tools, including in non-model organisms.
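The validation of reference datasets for population assignment can be illustrated with a leave-one-out self-assignment test: each reference individual is removed from its own population, allele frequencies are re-estimated, and the individual is assigned to the population under which its genotype is most likely. The sketch below is a minimal Python illustration under stated assumptions, not the author's method or any particular software's implementation. It assumes a simple frequency-based likelihood assignment under Hardy-Weinberg and linkage equilibrium, biallelic loci, a pseudocount of 0.5 to avoid zero frequencies, and two simulated (entirely hypothetical) reference populations. Comparing 100 loci with a 5-locus subset hints at how the information content of the reference data affects assignment power.

```python
import math
import random

def simulate_population(freqs, n_ind, rng):
    """Diploid genotypes (count of allele A, 0-2) per biallelic locus under HWE."""
    return [[sum(rng.random() < p for _ in range(2)) for p in freqs] for _ in range(n_ind)]

def allele_freqs(genotypes, pseudo=0.5):
    """Per-locus frequency of allele A, with a pseudocount so no frequency is 0 or 1."""
    n_copies = 2 * len(genotypes)
    n_loci = len(genotypes[0])
    return [(sum(g[l] for g in genotypes) + pseudo) / (n_copies + 2 * pseudo)
            for l in range(n_loci)]

def log_likelihood(genotype, freqs):
    """Log-probability of a multilocus genotype under HWE and linkage equilibrium."""
    ll = 0.0
    for k, p in zip(genotype, freqs):
        ll += math.log(math.comb(2, k)) + k * math.log(p) + (2 - k) * math.log(1 - p)
    return ll

def leave_one_out_accuracy(reference):
    """Proportion of reference individuals re-assigned to their own population."""
    correct = total = 0
    for pop, inds in reference.items():
        for i, g in enumerate(inds):
            scores = {}
            for other, others in reference.items():
                # Exclude the tested individual from its own population's frequencies.
                train = others if other != pop else inds[:i] + inds[i + 1:]
                scores[other] = log_likelihood(g, allele_freqs(train))
            correct += max(scores, key=scores.get) == pop
            total += 1
    return correct / total

rng = random.Random(7)
N_LOCI, N_IND = 100, 30
freqs_a = [rng.uniform(0.2, 0.8) for _ in range(N_LOCI)]
# Hypothetical second population: every locus differs by 0.2 in allele frequency.
freqs_b = [p + 0.2 if p < 0.5 else p - 0.2 for p in freqs_a]
reference = {"A": simulate_population(freqs_a, N_IND, rng),
             "B": simulate_population(freqs_b, N_IND, rng)}
reference_few = {pop: [g[:5] for g in inds] for pop, inds in reference.items()}
acc_many = leave_one_out_accuracy(reference)
acc_few = leave_one_out_accuracy(reference_few)
print(f"100 loci: {acc_many:.2f}  5 loci: {acc_few:.2f}")
```

Leaving the tested individual out of its own reference sample avoids the optimistic bias that arises when an individual contributes to the very frequencies it is scored against.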

Key words