ABSTRACT
Methods and implementations of DNA-based identification are well established in several forensic contexts. However, assessing the statistical power of these methods has been largely overlooked, except in the simplest cases. In this paper we outline general methods for such power evaluation, and apply them to a large set of family reunification cases, where the objective is to decide whether a person of interest (POI) is identical to the missing person (MP) in a family, based on the DNA profile of the POI and available family members. As such, this application closely resembles database searching and disaster victim identification (DVI). If parents or children of the MP are available, they will typically provide sufficient statistical evidence to settle the case. However, if one must resort to more distant relatives, it is not a priori obvious that a reliable conclusion is likely to be reached. In these cases power evaluation can be highly valuable, for instance in the recruitment of additional family members. To assess the power in an identification case, we advocate the combined use of two statistics: the Probability of Exclusion, and the Probability of Exceedance. The former is the probability that the genotypes of a random, unrelated person are incompatible with the available family data. If this is close to 1, it is likely that a conclusion will be achieved regarding general relatedness, but not necessarily the specific relationship. To evaluate the ability to recognize a true match, we use simulations to estimate exceedance probabilities, i.e. the probability that the likelihood ratio will exceed a given threshold, assuming that the POI is indeed the MP. All simulations are done conditionally on available family data. Such conditional simulations have a long history in medical linkage analysis, but to our knowledge this is the first systematic forensic genetics application. 
Moreover, mutations cannot be ignored for forensic markers, so current models and implementations must be extended. All the tools are freely available in Familias (http://www.familias.no), powered by the R library paramlink. The above approach is applied to a large and important data set: 'The missing grandchildren of Argentina'. We evaluate the power of 196 families from the DNA reference databank (Banco Nacional de Datos Genéticos, http://www.bndg.gob.ar). We show that 58 of the families have poor statistical power and require additional genetic data to enable a positive identification.
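As a rough illustration of the two statistics described above, the following Python sketch treats the simplest possible reference family: a single parent of the MP, typed at independent markers. It is not the Familias or paramlink implementation. All function names, genotypes and allele frequencies are hypothetical, and the sketch ignores mutations, a simplification that the abstract itself warns against for real forensic markers.

```python
import random
from math import prod


def exclusion_probability(parent, freqs):
    """P(a random unrelated person shares no allele with the parent) at one marker."""
    shared = sum(freqs[a] for a in set(parent))
    return (1.0 - shared) ** 2


def combined_exclusion(parent_profile, freq_list):
    """Probability of exclusion at one or more of several independent markers."""
    return 1.0 - prod(1.0 - exclusion_probability(g, f)
                      for g, f in zip(parent_profile, freq_list))


def lr_parent_child(child, parent, freqs):
    """Single-marker LR: P(child | parent-child) / P(child | unrelated), no mutations."""
    x, y = child
    given_parent = 0.0
    for t in parent:  # each parental allele is transmitted with probability 1/2
        if t == x:
            given_parent += 0.5 * freqs[y]
        if t == y and x != y:
            given_parent += 0.5 * freqs[x]
    unrelated = freqs[x] ** 2 if x == y else 2 * freqs[x] * freqs[y]
    return given_parent / unrelated


def simulate_lrs(parent_profile, freq_list, n_sims=2000, seed=1):
    """Simulate MP profiles conditional on the parent's genotypes; return their LRs."""
    rng = random.Random(seed)
    lrs = []
    for _ in range(n_sims):
        lr = 1.0
        for parent, freqs in zip(parent_profile, freq_list):
            alleles = list(freqs)
            weights = [freqs[a] for a in alleles]
            inherited = rng.choice(parent)                    # allele from the typed parent
            other = rng.choices(alleles, weights=weights)[0]  # allele from the population
            lr *= lr_parent_child((inherited, other), parent, freqs)
        lrs.append(lr)
    return lrs


def exceedance_probability(lrs, threshold):
    """Fraction of simulated true matches whose LR exceeds the threshold."""
    return sum(lr > threshold for lr in lrs) / len(lrs)


if __name__ == "__main__":
    freqs = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}      # hypothetical allele frequencies
    profile = [(1, 2), (4, 4), (2, 3), (1, 3)]    # hypothetical parent genotypes
    freq_list = [freqs] * len(profile)
    print("Probability of exclusion:", combined_exclusion(profile, freq_list))
    lrs = simulate_lrs(profile, freq_list)
    print("P(LR > 10 | POI is MP):", exceedance_probability(lrs, 10))
```

With more distant reference relatives, the likelihood computation needs a general pedigree algorithm, which is what dedicated software provides. The structure stays the same: simulate the MP conditionally on the typed relatives, compute the LR for each simulated profile, and report the fraction above the threshold.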
Subjects
DNA Fingerprinting, Nucleic Acid Databases, Likelihood Functions, Pedigree, Algorithms, Argentina, Humans, Probability, Software
ABSTRACT
Statistical interpretation of forensic genetic evidence requires allele frequency estimates for the studied markers in the reference population. Differences in the genetic makeup of populations can be reflected in statistically different allele frequency distributions. Collecting such information is not always possible for a given population, so alternative approaches are needed to compensate for the missing data. A number of statistics have been proposed to control for population stratification in paternity testing and forensic casework, Fst correction being the only one recommended by the forensic community. In this study we aimed to evaluate the performance of Fst in correcting for population stratification in forensics. Using simulations, we first tested the dependence of Fst on the relative sizes of the sub-populations, and then measured the effect of Fst corrections on Paternity Index (PI) values compared to those obtained with the local reference database. The results provide clear-cut evidence that (i) Fst values are strongly dependent on the sampling scheme, so that in most situations it would be almost impossible to estimate real values of Fst; and (ii) Fst corrections might unfairly correct PI values for stratification, suggesting the use of local databases whenever possible to estimate the frequencies of genetic profiles and PI values.
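The two quantities examined in this abstract can be sketched in a few lines of Python. The sketch below is not the authors' simulation design, which the abstract does not specify. It assumes a heterozygosity-based definition of Fst for two sub-populations with relative sizes w1 and 1 - w1, and a theta-corrected PI for a trio in which the paternal allele is known, obtained from the Balding-Nichols sampling formula conditioned on the alleles of the mother and the alleged father. All names and numbers are hypothetical.

```python
def heterozygosity(freqs):
    """Expected heterozygosity, 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def fst_two_subpops(freqs1, freqs2, w1):
    """Fst = (H_T - H_S) / H_T for two sub-populations with weights w1 and 1 - w1."""
    w2 = 1.0 - w1
    alleles = set(freqs1) | set(freqs2)
    pooled = {a: w1 * freqs1.get(a, 0.0) + w2 * freqs2.get(a, 0.0) for a in alleles}
    h_t = heterozygosity(pooled)                                   # total population
    h_s = w1 * heterozygosity(freqs1) + w2 * heterozygosity(freqs2)  # within sub-populations
    return (h_t - h_s) / h_t


def paternity_index(paternal_allele, alleged_father, mother, p, theta=0.0):
    """PI for a trio with a known paternal allele of population frequency p.

    Assumption: the random man's allele follows the Balding-Nichols sampling
    formula given the alleles already seen in the mother and alleged father.
    With theta = 0 this reduces to the uncorrected PI.
    """
    transmit = alleged_father.count(paternal_allele) / 2.0  # 1 if homozygous, 1/2 if heterozygous
    observed = list(alleged_father) + list(mother)
    m = observed.count(paternal_allele)
    n = len(observed)
    random_man = (m * theta + (1.0 - theta) * p) / (1.0 + (n - 1) * theta)
    return transmit / random_man


if __name__ == "__main__":
    pop1 = {"A": 0.2, "B": 0.8}   # hypothetical sub-population frequencies
    pop2 = {"A": 0.8, "B": 0.2}
    for w1 in (0.5, 0.7, 0.9):
        print("w1 =", w1, "Fst =", fst_two_subpops(pop1, pop2, w1))
    for theta in (0.0, 0.01, 0.03):
        print("theta =", theta, "PI =", paternity_index("a", ("a", "a"), ("b", "c"), 0.1, theta))
```

In this sketch the same pair of sub-population frequencies yields different Fst values as w1 changes, which is the kind of dependence on the sampling scheme the abstract reports.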