2.
Med Phys ; 44(6): 2207-2222, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28382718

ABSTRACT

PURPOSE: The objective was to design and implement a bivariate extension to the contaminated binormal model (CBM) to fit paired receiver operating characteristic (ROC) datasets, possibly degenerate, with proper ROC curves. Paired datasets yield two correlated ratings per case. Degenerate datasets have no interior operating points, and proper ROC curves do not inappropriately cross the chance diagonal. The existing method, developed more than three decades ago, utilizes a bivariate extension to the binormal model, implemented in CORROC2 software, which yields improper ROC curves and cannot fit degenerate datasets. CBM can fit proper ROC curves to unpaired (i.e., yielding one rating per case) and degenerate datasets, and there is a clear scientific need to extend it to handle paired datasets. METHODS: In CBM, nondiseased cases are modeled by a probability density function (pdf) consisting of a unit-variance peak centered at zero. Diseased cases are modeled with a mixture distribution whose pdf consists of two unit-variance peaks, one centered at positive µ with integrated probability α, the mixing fraction parameter, corresponding to the fraction of diseased cases where the disease was visible to the radiologist, and one centered at zero, with integrated probability (1-α), corresponding to disease that was not visible. It is shown that: (a) for nondiseased cases the bivariate extension is a unit-variance bivariate normal distribution centered at (0,0) with a specified correlation ρ1; (b) for diseased cases the bivariate extension is a mixture distribution with four peaks, corresponding to disease not visible in either condition, disease visible in only one condition (contributing two peaks), and disease visible in both conditions. An expression for the likelihood function is derived. 
A maximum likelihood estimation (MLE) algorithm, CORCBM, was implemented in the R programming language that yields parameter estimates and the covariance matrix of the parameters, and other statistics. A limited simulation validation of the method was performed. RESULTS: CORCBM and CORROC2 were applied to two datasets containing nine readers each contributing paired interpretations. CORCBM successfully fitted the data for all readers, whereas CORROC2 failed to fit a degenerate dataset. All fits were visually reasonable. All CORCBM fits were proper, whereas all CORROC2 fits were improper. CORCBM and CORROC2 were in agreement (a) in declaring only one of the nine readers as having significantly different performances in the two modalities; (b) in estimating higher correlations for diseased cases than for nondiseased ones; and (c) in finding that the intermodality correlation estimates for nondiseased cases were consistent between the two methods. All CORCBM fits yielded higher area under curve (AUC) than the CORROC2 fits, consistent with the fact that a proper ROC model like CORCBM is based on a likelihood-ratio-equivalent decision variable, and consequently yields higher performance than the binormal model-based CORROC2. The method gave satisfactory fits to four simulated datasets. CONCLUSIONS: CORCBM is a robust method for fitting paired ROC datasets, always yielding proper ROC curves, and able to fit degenerate datasets.
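
The univariate CBM described in this abstract can be illustrated with a minimal Python sketch. It is not taken from CORCBM (which the abstract says is written in R); the function names `cbm_operating_point` and `cbm_auc` are hypothetical, and the closed-form AUC is derived here from the stated mixture model (the invisible-disease component contributes chance-level 0.5, the visible component contributes Φ(µ/√2)).

```python
import math


def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cbm_operating_point(zeta, mu, alpha):
    """Return (FPF, TPF) at reporting threshold zeta under univariate CBM.

    Nondiseased ratings: N(0, 1).
    Diseased ratings: alpha * N(mu, 1) + (1 - alpha) * N(0, 1).
    """
    fpf = 1.0 - phi(zeta)
    tpf = (1.0 - alpha) * (1.0 - phi(zeta)) + alpha * (1.0 - phi(zeta - mu))
    return fpf, tpf


def cbm_auc(mu, alpha):
    """Area under the CBM ROC curve implied by the mixture model above."""
    return 0.5 * (1.0 - alpha) + alpha * phi(mu / math.sqrt(2.0))
```

For µ ≥ 0 the sketch gives TPF ≥ FPF at every threshold, which is one way to see that the curve does not dip below the chance diagonal.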


Subject(s)
Algorithms , Likelihood Functions , ROC Curve , Area Under Curve , Humans , Models, Statistical , Software
4.
Radiology ; 282(1): 236-250, 2017 Jan.
Article in English | MEDLINE | ID: mdl-27439324

ABSTRACT

Purpose To conduct a multi-institutional, multireader study to compare the performance of digital tomosynthesis, dual-energy (DE) imaging, and conventional chest radiography for pulmonary nodule detection and management. Materials and Methods In this binational, institutional review board-approved, HIPAA-compliant prospective study, 158 subjects (43 subjects with normal findings) were enrolled at four institutions. Informed consent was obtained prior to enrollment. Subjects underwent chest computed tomography (CT) and imaging with conventional chest radiography (posteroanterior and lateral), DE imaging, and tomosynthesis with a flat-panel imaging device. Three experienced thoracic radiologists identified true locations of nodules (n = 516, 3-20-mm diameters) with CT and recommended case management by using Fleischner Society guidelines. Five other radiologists marked nodules and indicated case management by using images from conventional chest radiography, conventional chest radiography plus DE imaging, tomosynthesis, and tomosynthesis plus DE imaging. Sensitivity, specificity, and overall accuracy were measured by using the free-response receiver operating characteristic method and the receiver operating characteristic method for nodule detection and case management, respectively. Results were further analyzed according to nodule diameter categories (3-4 mm, >4 mm to 6 mm, >6 mm to 8 mm, and >8 mm to 20 mm). Results Maximum lesion localization fraction was higher for tomosynthesis than for conventional chest radiography in all nodule size categories (3.55-fold for all nodules, P < .001; 95% confidence interval [CI]: 2.96, 4.15). Case-level sensitivity was higher with tomosynthesis than with conventional chest radiography for all nodules (1.49-fold, P < .001; 95% CI: 1.25, 1.73). 
Case management decisions showed better overall accuracy with tomosynthesis than with conventional chest radiography, as given by the area under the receiver operating characteristic curve (1.23-fold, P < .001; 95% CI: 1.15, 1.32). There were no differences in any specificity measures. DE imaging did not significantly affect nodule detection when paired with either conventional chest radiography or tomosynthesis. Conclusion Tomosynthesis outperformed conventional chest radiography for lung nodule detection and determination of case management; DE imaging did not show significant differences over conventional chest radiography or tomosynthesis alone. These findings indicate performance likely achievable with a range of reader expertise. © RSNA, 2016 Online supplemental material is available for this article.


Subject(s)
Multiple Pulmonary Nodules/diagnostic imaging , Multiple Pulmonary Nodules/therapy , Radiographic Image Enhancement/methods , Radiography, Dual-Energy Scanned Projection , Radiography, Thoracic , Adult , Aged , Case-Control Studies , Female , Humans , Male , Middle Aged , Sensitivity and Specificity , Sweden , Tomography, X-Ray Computed , United States , X-Ray Intensifying Screens
5.
Med Phys ; 43(5): 2548, 2016 May.
Article in English | MEDLINE | ID: mdl-27147365

ABSTRACT

PURPOSE: The free-response receiver operating characteristic (FROC) method is being increasingly used to evaluate observer performance in search tasks. Data analysis requires definition of a figure of merit (FOM) quantifying performance. While a number of FOMs have been proposed, the recommended one, namely, the weighted alternative FROC (wAFROC) FOM, is not well understood. The aim of this work is to clarify the meaning of this FOM by relating it to the empirical area under a proposed wAFROC curve. METHODS: The weighted wAFROC FOM is defined in terms of a quasi-Wilcoxon statistic that involves weights, coding the clinical importance, assigned to each lesion. A new wAFROC curve is proposed, the y-axis of which incorporates the weights, giving more credit for marking clinically important lesions, while the x-axis is identical to that of the AFROC curve. An expression is derived relating the area under the empirical wAFROC curve to the wAFROC FOM. Examples are presented with small numbers of cases showing how AFROC and wAFROC curves are affected by correct and incorrect decisions and how the corresponding FOMs credit or penalize these decisions. The wAFROC, AFROC, and inferred ROC FOMs were applied to three clinical data sets involving multiple reader FROC interpretations in different modalities. RESULTS: It is shown analytically that the area under the empirical wAFROC curve equals the wAFROC FOM. This theorem is the FROC analog of a well-known theorem developed in 1975 for ROC analysis, which gave meaning to a Wilcoxon statistic based ROC FOM. A similar equivalence applies between the area under the empirical AFROC curve and the AFROC FOM. The examples show explicitly that the wAFROC FOM gives equal importance to all diseased cases, regardless of the number of lesions, a desirable statistical property not shared by the AFROC FOM. Applications to the clinical data sets show that the wAFROC FOM yields results comparable to that using the AFROC FOM. 
CONCLUSIONS: The equivalence theorem gives meaning to the weighted AFROC FOM: it is identical to the empirical area under the weighted AFROC curve.
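
A weighted quasi-Wilcoxon statistic of the kind this abstract describes can be sketched as follows. This is an illustrative Python sketch, not the authors' software. The data layout, the function name `wafroc_fom`, the use of the highest-rated non-lesion mark on each nondiseased case, the rating of minus infinity for unmarked lesions and unmarked cases, and the 0.5 credit for ties are all assumptions made here for illustration.

```python
NEG_INF = float("-inf")


def psi(x, y):
    """Wilcoxon kernel: 1 if y > x, 0.5 if tied, 0 otherwise."""
    if y > x:
        return 1.0
    if y == x:
        return 0.5
    return 0.0


def wafroc_fom(nondiseased_marks, diseased_lesions):
    """Weighted quasi-Wilcoxon figure of merit (illustrative layout).

    nondiseased_marks: one list per nondiseased case holding the ratings
        of its non-lesion marks (empty list = no marks).
    diseased_lesions: one list per diseased case holding (weight, rating)
        pairs, one per lesion; rating is None for an unmarked lesion and
        the weights on each case are assumed to sum to 1.
    """
    total = 0.0
    for marks in nondiseased_marks:
        # Highest-rated false positive on this case, or -inf if unmarked.
        max_fp = max(marks) if marks else NEG_INF
        for lesions in diseased_lesions:
            for weight, rating in lesions:
                z = NEG_INF if rating is None else rating
                total += weight * psi(max_fp, z)
    return total / (len(nondiseased_marks) * len(diseased_lesions))
```

Because the weights on each diseased case sum to 1, every diseased case contributes at most 1 to the inner sum regardless of how many lesions it contains, which mirrors the equal-importance property mentioned in the abstract.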


Subject(s)
Models, Statistical , ROC Curve , Algorithms , Area Under Curve , Breast/diagnostic imaging , Breast Diseases/diagnostic imaging , Calcinosis/diagnostic imaging , Computer Simulation , Data Interpretation, Statistical , Datasets as Topic , Humans , Mammography/instrumentation , Mammography/methods , Models, Anatomic , Phantoms, Imaging , Positron-Emission Tomography/instrumentation , Positron-Emission Tomography/methods , Software , Tomography, X-Ray Computed/instrumentation , Tomography, X-Ray Computed/methods
6.
Phys Med ; 32(4): 568-74, 2016 Apr.
Article in English | MEDLINE | ID: mdl-27061872

ABSTRACT

PURPOSE: To investigate the relationship between image quality measurements and the clinical performance of digital mammographic systems. METHODS: Mammograms containing subtle malignant non-calcification lesions and simulated malignant calcification clusters were adapted to appear as if acquired by four types of detector. Observers searched for suspicious lesions and gave these a malignancy score. Analysis was undertaken using jackknife alternative free-response receiver operating characteristics weighted figure of merit (FoM). Images of a CDMAM contrast-detail phantom were adapted to appear as if acquired using the same four detectors as the clinical images. The resultant threshold gold thicknesses were compared to the FoMs using a linear regression model and an F-test was used to find if the gradient of the relationship was significantly non-zero. RESULTS: The detectors with the best image quality measurement also had the highest FoM values. The gradient of the inverse relationship between FoMs and threshold gold thickness for the 0.25mm diameter disk was significantly different from zero for calcification clusters (p=0.027), but not for non-calcification lesions (p=0.11). Systems performing just above the minimum image quality level set in the European Guidelines for Quality Assurance in Breast Cancer Screening and Diagnosis resulted in reduced cancer detection rates compared to systems performing at the achievable level. CONCLUSIONS: The clinical effectiveness of mammography for the task of detecting calcification clusters was found to be linked to image quality assessment using the CDMAM phantom. The European Guidelines should be reviewed as the current minimum image quality standards may be too low.


Subject(s)
Breast Neoplasms/diagnostic imaging , Mammography/methods , Breast Neoplasms/metabolism , Breast Neoplasms/pathology , Calcinosis/diagnostic imaging , Calcinosis/metabolism , Calcinosis/pathology , Female , Guidelines as Topic , Humans , Mammography/standards , Radiographic Image Enhancement/methods
7.
Eur Radiol ; 26(3): 874-83, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26105023

ABSTRACT

OBJECTIVE: To compare the performance of different types of detectors in breast cancer detection. METHODS: A mammography image set containing subtle malignant non-calcification lesions, biopsy-proven benign lesions, simulated malignant calcification clusters and normals was acquired using amorphous-selenium (a-Se) detectors. The images were adapted to simulate four types of detectors at the same radiation dose: digital radiography (DR) detectors with a-Se and caesium iodide (CsI) convertors, and computed radiography (CR) detectors with a powder phosphor (PIP) and a needle phosphor (NIP). Seven observers marked suspicious and benign lesions. Analysis was undertaken using jackknife alternative free-response receiver operating characteristics weighted figure of merit (FoM). The cancer detection fraction (CDF) was estimated for a representative image set from screening. RESULTS: No significant differences in the FoMs between the DR detectors were measured. For calcification clusters and non-calcification lesions, both CR detectors' FoMs were significantly lower than for DR detectors. The calcification cluster's FoM for CR NIP was significantly better than for CR PIP. The estimated CDFs with CR PIP and CR NIP detectors were up to 15% and 22% lower, respectively, than for DR detectors. CONCLUSION: Cancer detection is affected by detector type, and the use of CR in mammography should be reconsidered. KEY POINTS: The type of mammography detector can affect the cancer detection rates. CR detectors performed worse than DR detectors in mammography. Needle phosphor CR performed better than powder phosphor CR. Calcification clusters detection is more sensitive to detector type than other cancers.


Subject(s)
Breast Neoplasms/diagnostic imaging , Calcinosis/diagnostic imaging , Mammography/instrumentation , Aged , Early Detection of Cancer/instrumentation , Early Detection of Cancer/methods , Female , Humans , Mammography/methods , Mass Screening/instrumentation , Mass Screening/methods , Middle Aged , Needles , Observer Variation , ROC Curve , Radiographic Image Enhancement/methods
8.
AJR Am J Roentgenol ; 203(2): 387-93, 2014 Aug.
Article in English | MEDLINE | ID: mdl-25055275

ABSTRACT

OBJECTIVE. The objective of our study was to investigate the effect of image processing on the detection of cancers in digital mammography images. MATERIALS AND METHODS. Two hundred seventy pairs of breast images (both breasts, one view) were collected from eight systems using Hologic amorphous selenium detectors: 80 image pairs showed breasts containing subtle malignant masses; 30 image pairs, biopsy-proven benign lesions; 80 image pairs, simulated calcification clusters; and 80 image pairs, no cancer (normal). The 270 image pairs were processed with three types of image processing: standard (full enhancement), low contrast (intermediate enhancement), and pseudo-film-screen (no enhancement). Seven experienced observers inspected the images, locating and rating regions they suspected to be cancer for likelihood of malignancy. The results were analyzed using a jackknife-alternative free-response receiver operating characteristic (JAFROC) analysis. RESULTS. The detection of calcification clusters was significantly affected by the type of image processing: The JAFROC figure of merit (FOM) decreased from 0.65 with standard image processing to 0.63 with low-contrast image processing (p = 0.04) and from 0.65 with standard image processing to 0.61 with film-screen image processing (p = 0.0005). The detection of noncalcification cancers was not significantly different among the image-processing types investigated (p > 0.40). CONCLUSION. These results suggest that image processing has a significant impact on the detection of calcification clusters in digital mammography. For the three image-processing versions and the system investigated, standard image processing was optimal for the detection of calcification clusters. The effect on cancer detection should be considered when selecting the type of image processing in the future.


Subject(s)
Breast Neoplasms/diagnostic imaging , Calcinosis/diagnostic imaging , Mammography/methods , Radiographic Image Enhancement/methods , Radiographic Image Interpretation, Computer-Assisted/methods , Aged , Biopsy , Female , Humans , Middle Aged , United Kingdom
9.
Acad Radiol ; 21(4): 538-45, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24594424

ABSTRACT

RATIONALE AND OBJECTIVES: The purpose of this study was to compare lesion-detection performance when interpreting computed tomography (CT) images that are acquired for attenuation correction when performing single photon emission computed tomography/computed tomography (SPECT/CT) myocardial perfusion studies. In the United Kingdom, there is a requirement that these images be interpreted; thus, it is necessary to understand observer performance on these images. MATERIALS AND METHODS: An anthropomorphic chest phantom with inserted spherical lesions of different sizes and contrasts was scanned on five different SPECT/CT systems using site-specific CT protocols for SPECT/CT myocardial perfusion imaging. Twenty-one observers (0-4 years of CT experience) searched 26 image slices (17 abnormal, containing 1-3 lesions, and 9 normal, containing no lesions) for each CT acquisition. The observers marked and rated perceived lesions under the free-response paradigm. Four analyses were conducted using jackknife alternative free-response receiver operating characteristic (JAFROC) analysis: (1) 20-pixel acceptance radius (AR) with all 21 readers, abbreviated to 20/ALL analysis, (2) 40-pixel AR with 21 readers (40/ALL), (3) 20-pixel AR with 14 readers experienced in CT (20/EXP), and (4) 20-pixel AR with 7 readers with no CT experience (20/NOT). The significance level of the test was set so as to conservatively control the overall probability of a type I error to <0.05. RESULTS: The mean JAFROC figure of merit (FOM) for the five CT acquisitions for the 20/ALL study were 0.602, 0.639, 0.372, 0.475, and 0.719 with a significant difference in lesion-detection performance evident between all individual treatment pairs (P < .0001) with the exception of the 1-2 pairing, which was not significant (these differed only in milliamp seconds). System 5, which had the highest performance, had the smallest slice thickness and the largest matrix size. 
For the other analyses, the system orderings remained unchanged, and the significance of FOM difference findings remained identical to those for 20/ALL, with one exception: for 20/EXP analysis the 1-2 difference became significant with the higher milliamp seconds superior. Improved detection performance was associated with a smaller slice thickness, increased matrix size, and, to a lesser extent, increased tube charge. CONCLUSIONS: Protocol variations for CT-based attenuation correction (AC) in SPECT/CT imaging have a measurable impact on lesion-detection performance. The results imply that z-axis resolution and matrix size had the greatest impact on lesion detection, with a weaker but detectable dependence on the product of milliamp and seconds.


Subject(s)
Algorithms , Incidental Findings , Lung Neoplasms/diagnostic imaging , Phantoms, Imaging , Radiographic Image Enhancement/methods , Radiographic Image Interpretation, Computer-Assisted/methods , Tomography, X-Ray Computed/methods , Artifacts , Clinical Competence , Humans , Observer Variation , Radiography, Thoracic/methods , Reproducibility of Results , Sensitivity and Specificity , Tomography, Emission-Computed, Single-Photon/methods
10.
Acad Radiol ; 20(7): 915-9, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23583665

ABSTRACT

In the receiver operating characteristic paradigm the observer assigns a single rating to each image and the location of the perceived abnormality, if any, is ignored. In the free-response receiver operating characteristic paradigm the observer is free to mark and rate as many suspicious regions as are considered clinically reportable. Credit for a correct localization is given only if a mark is sufficiently close to an actual lesion; otherwise, the observer's mark is scored as a location-level false positive. Until fairly recently there existed no accepted method for analyzing the resulting relatively unstructured data containing random numbers of mark-rating pairs per image. This report reviews the history of work in this field, which has now spanned more than five decades. It introduces terminology used to describe the paradigm, proposed measures of performance (figures of merit), ways of visualizing the data (operating characteristics), and software for analyzing free-response receiver operating characteristic studies.
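
The proximity rule described in this abstract (a mark earns credit only if it is sufficiently close to an actual lesion) can be sketched in Python. The abstract does not specify the details, so the following are assumptions labeled as such: Euclidean distance to the nearest lesion center, a fixed acceptance radius, keeping the highest rating when several marks hit the same lesion, and the hypothetical function name `score_marks`.

```python
import math


def score_marks(marks, lesions, acceptance_radius):
    """Classify free-response marks on one image (illustrative rule).

    marks: list of (x, y, rating) tuples made by the observer.
    lesions: list of (x, y) true lesion centers.
    Returns (lesion_ratings, false_positive_ratings), where
    lesion_ratings[i] is None if lesion i was not localized.
    """
    lesion_ratings = [None] * len(lesions)
    false_positive_ratings = []
    for x, y, rating in marks:
        # Find the lesion nearest to this mark, if any lesions exist.
        nearest, nearest_dist = None, None
        for i, (lx, ly) in enumerate(lesions):
            dist = math.hypot(x - lx, y - ly)
            if nearest_dist is None or dist < nearest_dist:
                nearest, nearest_dist = i, dist
        if nearest is not None and nearest_dist <= acceptance_radius:
            # Assumption: repeated hits keep only the highest rating.
            current = lesion_ratings[nearest]
            if current is None or rating > current:
                lesion_ratings[nearest] = rating
        else:
            # Too far from every lesion: location-level false positive.
            false_positive_ratings.append(rating)
    return lesion_ratings, false_positive_ratings
```

The outputs (one rating or None per lesion, plus a variable-length list of false-positive ratings) are an example of the mark-rating data, random in number per image, that the abstract refers to.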


Subject(s)
Breast Neoplasms/diagnostic imaging , Data Interpretation, Statistical , Mammography/statistics & numerical data , ROC Curve , Female , Humans , Mammography/methods , Radiographic Image Interpretation, Computer-Assisted/methods , Reproducibility of Results , Software
11.
Radiology ; 268(1): 46-53, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23481165

ABSTRACT

PURPOSE: To establish the extent to which test set reading can represent actual clinical reporting in screening mammography. MATERIALS AND METHODS: Institutional ethics approval was granted, and informed consent was obtained from each participating screen reader. The need for informed consent with respect to the use of patient materials was waived. Two hundred mammographic examinations were selected from examinations reported by 10 individual expert screen readers, resulting in 10 reader-specific test sets. Data generated from actual clinical reports were compared with three test set conditions: clinical test set reading with prior images, laboratory test set reading with prior images, and laboratory test set reading without prior images. A further set of five expert screen readers was asked to interpret a common set of images in two identical test set conditions to establish a baseline for intraobserver variability. Confidence scores (from 1 to 4) were assigned to the respective decisions made by readers. Region-of-interest (ROI) figures of merit (FOMs) and side-specific sensitivity and specificity were described for the actual clinical reporting of each reader-specific test set and were compared with those for the three test set conditions. Agreement between pairs of readings was performed by using the Kendall coefficient of concordance. RESULTS: Moderate or acceptable levels of agreement were evident (W = 0.69-0.73, P < .01) when describing group performance between actual clinical reporting and test set conditions that were reasonably close to the established baseline (W = 0.77, P < .01) and were lowest when prior images were excluded. Higher median values for ROI FOMs were demonstrated for the test set conditions than for the actual clinical reporting values; this was possibly linked to changes in sensitivity. 
CONCLUSION: Reasonable levels of agreement between actual clinical reporting and test set conditions can be achieved, although inflated sensitivity may be evident with test set conditions.
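
For readers unfamiliar with the Kendall coefficient of concordance (W) reported in this abstract, a minimal Python sketch of the textbook formula W = 12S / (m²(n³ − n)) follows. It assumes m raters who each supply a complete ranking of n items with no ties (no tie correction is applied), and the function name `kendalls_w` is hypothetical; how the study itself ranked readings is not stated in the abstract.

```python
def kendalls_w(ranks):
    """Kendall's coefficient of concordance without tie correction.

    ranks: list of m lists; each inner list holds one rater's ranks
        (1..n, no ties) for the same n items in the same order.
    """
    m = len(ranks)
    n = len(ranks[0])
    # Sum of the ranks each item received across raters.
    rank_sums = [sum(rater[i] for rater in ranks) for i in range(n)]
    mean_rank_sum = m * (n + 1) / 2.0
    # S: squared deviations of the rank sums from their mean.
    s = sum((rs - mean_rank_sum) ** 2 for rs in rank_sums)
    return 12.0 * s / (m ** 2 * (n ** 3 - n))
```

W ranges from 0 (no agreement among the rankings) to 1 (identical rankings).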


Subject(s)
Breast Neoplasms/diagnostic imaging , Mammography , Professional Competence , Decision Making , Diagnosis, Differential , Female , Humans , Observer Variation , Reproducibility of Results , Sensitivity and Specificity
12.
Acad Radiol ; 19(12): 1474-83, 2012 Dec.
Article in English | MEDLINE | ID: mdl-23040503

ABSTRACT

RATIONALE AND OBJECTIVES: Studies of medical image interpretation have focused on either assessing radiologists' performance using, for example, the receiver operating characteristic (ROC) paradigm, or assessing the interpretive process by analyzing their eye-tracking (ET) data. Analysis of ET data has not benefited from threshold-bias independent figures of merit (FOMs) analogous to the area under the ROC curve. The aim was to demonstrate the feasibility of such FOMs and to measure the agreement between FOMs derived from free-response ROC (FROC) and ET data. METHODS: Eight expert breast radiologists interpreted a case set of 120 two-view mammograms while eye-position data and FROC data were continuously collected during the interpretation interval. Regions that attracted prolonged (>800 ms) visual attention were considered to be virtual marks, and ratings based on the dwell and approach-rate (inverse of time-to-hit) were assigned to them. The virtual ratings were used to define threshold-bias independent FOMs in a manner analogous to the area under the trapezoidal alternative FROC (AFROC) curve (0 = worst, 1 = best). Agreement at the case level (0.5 = chance, 1 = perfect) was measured using the jackknife, and 95% confidence intervals (CI) for the FOMs and agreement were estimated using the bootstrap. RESULTS: The AFROC mark-ratings' FOM was largest at 0.734 (CI 0.65-0.81) followed by the dwell at 0.460 (0.34-0.59) and then by the approach-rate FOM 0.336 (0.25-0.46). The differences between the FROC mark-ratings' FOM and the perceptual FOMs were significant (P < .05). All pairwise agreements were significantly better than chance: ratings vs. dwell 0.707 (0.63-0.88), dwell vs. approach-rate 0.703 (0.60-0.79) and ratings vs. approach-rate 0.606 (0.53-0.68). The ratings vs. approach-rate agreement was significantly smaller than the dwell vs. approach-rate agreement (P = .008). 
CONCLUSIONS: Leveraging current methods developed for analyzing observer performance data could complement current ways of analyzing ET data and lead to new insights.


Subject(s)
Eye Movement Measurements , Mammography , ROC Curve , Radiographic Image Interpretation, Computer-Assisted , Female , Humans
13.
Med Phys ; 39(6): 3202-13, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22755704

ABSTRACT

PURPOSE: This study aims to investigate if microcalcification detection varies significantly when mammographic images are acquired using different image qualities, including: different detectors, dose levels, and different image processing algorithms. An additional aim was to determine how the standard European method of measuring image quality using threshold gold thickness measured with a CDMAM phantom and the associated limits in current EU guidelines relate to calcification detection. METHODS: One hundred and sixty two normal breast images were acquired on an amorphous selenium direct digital (DR) system. Microcalcification clusters extracted from magnified images of slices of mastectomies were electronically inserted into half of the images. The calcification clusters had a subtle appearance. All images were adjusted using a validated mathematical method to simulate the appearance of images from a computed radiography (CR) imaging system at the same dose, from both systems at half this dose, and from the DR system at quarter this dose. The original 162 images were processed with both Hologic and Agfa (Musica-2) image processing. All other image qualities were processed with Agfa (Musica-2) image processing only. Seven experienced observers marked and rated any identified suspicious regions. Free response operating characteristic (FROC) and ROC analyses were performed on the data. The lesion sensitivity at a nonlesion localization fraction (NLF) of 0.1 was also calculated. Images of the CDMAM mammographic test phantom were acquired using the automatic setting on the DR system. These images were modified to the additional image qualities used in the observer study. The images were analyzed using automated software. In order to assess the relationship between threshold gold thickness and calcification detection a power law was fitted to the data. 
RESULTS: There was a significant reduction in calcification detection using CR compared with DR: the alternative FROC (AFROC) area decreased from 0.84 to 0.63 and the ROC area decreased from 0.91 to 0.79 (p < 0.0001). This corresponded to a 30% drop in lesion sensitivity at a NLF equal to 0.1. Detection was also sensitive to the dose used. There was no significant difference in detection between the two image processing algorithms used (p > 0.05). It was additionally found that lower threshold gold thickness from CDMAM analysis implied better cluster detection. The measured threshold gold thickness passed the acceptable limit set in the EU standards for all image qualities except half dose CR. However, calcification detection varied significantly between image qualities. This suggests that the current EU guidelines may need revising. CONCLUSIONS: Microcalcification detection was found to be sensitive to detector and dose used. Standard measurements of image quality were a good predictor of microcalcification cluster detection.


Subject(s)
Calcinosis/diagnostic imaging , Mammography/methods , Radiographic Image Enhancement/methods , Breast Neoplasms/complications , Breast Neoplasms/diagnostic imaging , Calcinosis/complications , Humans , Image Processing, Computer-Assisted , Phantoms, Imaging , Quality Control , ROC Curve , Radiation Dosage
14.
Phys Med Biol ; 57(10): 2873-904, 2012 May 21.
Article in English | MEDLINE | ID: mdl-22516804

ABSTRACT

Laboratory receiver operating characteristic (ROC) studies, which are often used to evaluate medical imaging systems, differ from 'live' clinical interpretations in several respects that could compromise their clinical relevance. The aim was to develop methodology for quantifying the clinical relevance of a laboratory ROC study. A simulator was developed to generate ROC ratings data and binary clinical interpretations classified as correct or incorrect for a common set of images interpreted under clinical and laboratory conditions. The area under the trapezoidal ROC curve (AUC) was used as the laboratory figure-of-merit and the fraction of correct clinical decisions as the clinical figure-of-merit. Conventional agreement measures (Pearson, Spearman, Kendall and kappa) between the bootstrap-induced fluctuations of the two figures of merit were estimated. A jackknife pseudovalue transformation applied to the figures of merit was also investigated as a way to capture agreement existing at the individual image level that could be lost at the figure-of-merit level. It is shown that the pseudovalues define a relevance-ROC curve. The area under this curve (rAUC) measures the ability of the laboratory figure-of-merit-based pseudovalues to correctly classify incorrect versus correct clinical interpretations. Therefore, rAUC is a measure of the clinical relevance of an ROC study. The conventional measures and rAUC were compared under varying simulator conditions. It was found that design details of the ROC study, namely the number of bins, the difficulty level of the images, the ratio of disease-present to disease-absent images and the unavoidable difference between laboratory and clinical performance levels, can lead to serious underestimation of the agreement as indicated by conventional agreement measures, even for perfectly correlated data, while rAUC showed high agreement and was relatively immune to these details. 
At the same time rAUC was sensitive to factors such as intrinsic correlation between the laboratory and clinical decision variables and differences in reporting thresholds that are expected to influence agreement both at the individual image level and at the figure-of-merit level. Suggestions are made for how to conduct relevance-ROC studies aimed at assessing agreement between laboratory and clinical interpretations. The method could be used to evaluate the clinical relevance of alternative scalar figures of merit, such as the sensitivity at a predefined specificity.
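
The jackknife pseudovalue idea in this abstract can be sketched in Python. This is an illustrative sketch under stated assumptions, not the authors' simulator: the trapezoidal AUC is computed as the Wilcoxon statistic, pseudovalues use the usual formula N·θ − (N−1)·θ(−k) with one image removed at a time, and the hypothetical function `relevance_auc` treats incorrectly interpreted images as the "negative" class and correctly interpreted images as the "positive" class.

```python
def trapezoidal_auc(negatives, positives):
    """Empirical (trapezoidal) AUC, computed as the Wilcoxon statistic."""
    total = 0.0
    for x in negatives:
        for y in positives:
            total += 1.0 if y > x else 0.5 if y == x else 0.0
    return total / (len(negatives) * len(positives))


def jackknife_pseudovalues(negatives, positives):
    """One AUC pseudovalue per image (negatives first, then positives).

    Requires at least two images in each class so that every
    leave-one-out AUC is defined.
    """
    n = len(negatives) + len(positives)
    theta = trapezoidal_auc(negatives, positives)
    pseudovalues = []
    for i in range(len(negatives)):
        theta_i = trapezoidal_auc(negatives[:i] + negatives[i + 1:], positives)
        pseudovalues.append(n * theta - (n - 1) * theta_i)
    for j in range(len(positives)):
        theta_j = trapezoidal_auc(negatives, positives[:j] + positives[j + 1:])
        pseudovalues.append(n * theta - (n - 1) * theta_j)
    return pseudovalues


def relevance_auc(pseudovalues, clinically_correct):
    """AUC of pseudovalues for separating incorrect from correct
    clinical interpretations (assumed reading of rAUC)."""
    incorrect = [p for p, ok in zip(pseudovalues, clinically_correct) if not ok]
    correct = [p for p, ok in zip(pseudovalues, clinically_correct) if ok]
    return trapezoidal_auc(incorrect, correct)
```

In this sketch a high pseudovalue means the image pulls the laboratory AUC up, so a relevance AUC near 1 indicates that images helping laboratory performance tend to be the ones interpreted correctly in the clinic.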


Subject(s)
Image Interpretation, Computer-Assisted/methods , Models, Theoretical , Area Under Curve , Humans , Laboratories , Observer Variation , ROC Curve
15.
Semin Nucl Med ; 41(6): 401-18, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21978444

ABSTRACT

A common task in medical imaging is assessing whether a new imaging system, or a variant of an existing one, is an improvement over an existing imaging technology. Imaging systems are generally quite complex, consisting of several components (for example, image acquisition hardware, image processing and display hardware and software, and image interpretation by radiologists), each of which can affect performance. Although it may appear odd to include the radiologist as a "component" of the imaging chain, because the radiologist's decision determines subsequent patient care, the effect of the human interpretation has to be included. Physical measurements such as modulation transfer function and signal-to-noise ratio are useful for characterizing the nonhuman parts of the imaging chain under idealized and often unrealistic conditions, such as uniform background phantoms and target objects with sharp edges. Measuring the performance of the entire imaging chain, including the radiologist, and using real clinical images requires different methods that fall under the rubric of observer performance methods or "ROC" analysis, which involve collecting rating data on images. The purpose of this work is to review recent developments in this field, particularly with respect to the free-response method, where location information is also collected.


Subject(s)
Diagnosis, Computer-Assisted/methods , Diagnostic Imaging/methods , Image Processing, Computer-Assisted/methods , ROC Curve , Area Under Curve , False Negative Reactions , False Positive Reactions , Humans , Observer Variation , Phantoms, Imaging , Predictive Value of Tests , Reproducibility of Results , Task Performance and Analysis
16.
Nucl Instrum Methods Phys Res A ; 648 Supplement 1: S297-S301, 2011 Aug 21.
Article in English | MEDLINE | ID: mdl-21804679

ABSTRACT

A frequent problem in imaging is assessing whether a new imaging system is an improvement over an existing standard. Observer performance methods, in particular the receiver operating characteristic (ROC) paradigm, are widely used in this context. In ROC analysis lesion location information is not used and consequently scoring ambiguities can arise in tasks, such as nodule detection, involving finding localized lesions. This paper reviews progress in the free-response ROC (FROC) paradigm in which the observer marks and rates suspicious regions and the location information is used to determine whether lesions were correctly localized. Reviewed are FROC data analysis, a search-model for simulating FROC data, predictions of the model and a method for estimating the parameters. The search model parameters are physically meaningful quantities that can guide system optimization.

18.
Acad Radiol ; 17(5): 628-38, 2010 May.
Article in English | MEDLINE | ID: mdl-20380980

ABSTRACT

RATIONALE AND OBJECTIVES: Sample-size estimation is an important consideration when planning a receiver operating characteristic (ROC) study. The aim of this work was to assess the prediction accuracy of a sample-size estimation method using the Monte Carlo simulation method. MATERIALS AND METHODS: Two ROC ratings simulators characterized by low reader and high case variabilities (LH) and high reader and low case variabilities (HL) were used to generate pilot data sets in two modalities. Dorfman-Berbaum-Metz multiple-reader multiple-case (DBM-MRMC) analysis of the ratings yielded estimates of the modality-reader, modality-case, and error variances. These were input to the Hillis-Berbaum (HB) sample-size estimation method, which predicted the number of cases needed to achieve 80% power for 10 readers and an effect size of 0.06 in the pivotal study. Predictions that generalized to readers and cases (random-all), to cases only (random-cases), and to readers only (random-readers) were generated. A prediction-accuracy index defined as the probability that any single prediction yields true power in the 75%-90% range was used to assess the HB method. RESULTS: For random-case generalization, the HB-method prediction-accuracy was reasonable, approximately 50% for five readers and 100 cases in the pilot study. Prediction-accuracy was generally higher under LH conditions than under HL conditions. Under ideal conditions (many readers in the pilot study) the DBM-MRMC-based HB method overestimated the number of cases. The overestimates could be explained by the larger modality-reader variance estimates when reader variability was large (HL). The largest benefit of increasing the number of readers in the pilot study was realized for LH, where 15 readers were enough to yield prediction accuracy >50% under all generalization conditions, but the benefit was smaller for HL, where prediction accuracy was approximately 36% for 15 readers under random-all and random-reader conditions. 
CONCLUSION: The HB method tends to overestimate the number of cases. Random-case generalization had reasonable prediction accuracy. Provided about 15 readers were used in the pilot study the method performed reasonably under all conditions for LH. When reader variability was large, the prediction-accuracy for random-all and random-reader generalizations was compromised. Study designers may wish to compare the HB predictions to those of other methods and to sample-sizes used in previous similar studies.
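
The prediction-accuracy index defined in this abstract (the probability that a single prediction yields true power in the 75%-90% range) can be estimated from simulation output as a simple fraction. The sketch below assumes that the true power achieved by each pilot-study prediction has already been obtained, for example by Monte Carlo simulation, and that the range is inclusive at both ends; neither detail is given in the abstract, and the function name is hypothetical.

```python
def prediction_accuracy(true_powers, low=0.75, high=0.90):
    """Fraction of predictions whose true power lies in [low, high].

    `true_powers` holds one value per pilot-study prediction: the power
    actually achieved when the predicted number of cases is used.
    Inclusive bounds are an assumption, not taken from the abstract.
    """
    if not true_powers:
        raise ValueError("at least one prediction is required")
    hits = sum(1 for p in true_powers if low <= p <= high)
    return hits / len(true_powers)
```

For example, `prediction_accuracy([0.70, 0.80, 0.90, 0.95])` counts the second and third predictions as hits and returns 0.5.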


Subject(s)
Algorithms , Data Interpretation, Statistical , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Pattern Recognition, Automated/methods , ROC Curve , Observer Variation , Reproducibility of Results , Sample Size , Sensitivity and Specificity
19.
AJR Am J Roentgenol ; 194(2): 469-74, 2010 Feb.
Article in English | MEDLINE | ID: mdl-20093611

ABSTRACT

OBJECTIVE: Orthopedic injury and intracranial hemorrhage are commonly encountered in emergency radiology, and accurate and timely diagnosis is important. The purpose of this study was to determine whether the diagnostic accuracy of handheld computing devices is comparable to that of monitors that might be used in emergency teleconsultation. SUBJECTS AND METHODS: Two handheld devices, a Dell Axim personal digital assistant (PDA) and an Apple iPod Touch device, were studied. The diagnostic efficacy of each device was tested against that of secondary-class monitors (primary class being clinical workstation display) for each of two image types (posteroanterior wrist radiographs and slices from CT of the brain), yielding four separate observer performance studies. Participants read a bank of 30 wrist or brain images searching for a specific abnormality (distal radial fracture, fresh intracranial bleed) and rated their confidence in their decisions. A total of 168 readings by examining radiologists of the American Board of Radiology were gathered, and the results were subjected to receiver operating characteristic analysis. RESULTS: In the PDA brain CT study, the scores of PDA readings were significantly higher than those of monitor readings for all observers (p ≤ 0.01) and for radiologists who were not neuroradiology specialists (p ≤ 0.05). No statistically significant differences between handheld device and monitor findings were found for the PDA wrist images or in the iPod Touch device studies, although some comparisons approached significance. CONCLUSION: Handheld devices show promise in the field of emergency teleconsultation for detection of basic orthopedic injuries and intracranial hemorrhage. Further investigation is warranted.


Subject(s)
Brain Injuries/diagnostic imaging , Computers, Handheld , Data Display , Emergencies , Radiology/instrumentation , User-Computer Interface , Wrist Injuries/diagnostic imaging , Humans , ROC Curve , Software , Tomography, X-Ray Computed
20.
AJR Am J Roentgenol ; 192(6): W271-4, 2009 Jun.
Article in English | MEDLINE | ID: mdl-19457787

ABSTRACT

OBJECTIVE: In this experimental study we assessed the diagnostic performance of digital linear slit scanning radiography compared with computed radiography (CR) for the detection of urinary calculi in an anthropomorphic phantom imitating patients weighing approximately 58-88 kg. CONCLUSION: Compared with CR, linear slit scanning radiography is superior for the detection of urinary stones and may be used for pretreatment localization and follow-up at a lower patient exposure.


Subject(s)
Body Burden , Radiation Protection/methods , Radiographic Image Enhancement/methods , Tomography, X-Ray Computed/methods , Urinary Calculi/diagnostic imaging , Humans , Phantoms, Imaging , Radiation Protection/instrumentation , Radiographic Image Enhancement/instrumentation