ABSTRACT
AIMS: Cancers of unknown primary site account for 3%-5% of all malignant neoplasms. Current diagnostic workflows based on immunohistochemistry and imaging tests have low accuracy and are highly subjective. We aimed to develop and validate a gene-expression classifier that identifies the likely primary site of metastatic cancers more accurately.
METHODS: We built the largest Reference Database (RefDB) reported to date. It is composed of microarray data from 4429 tumour samples of known origin, obtained from 100 different sources and divided into 25 cancer superclasses comprising 58 cancer subclasses. Using specific expression profiles of 95 genes, we developed a gene-expression classifier, which was first trained and tested by cross-validation. We then performed a double-blinded retrospective validation study using a real-time PCR-based assay on a set of 105 metastatic formalin-fixed, paraffin-embedded (FFPE) samples. A histopathological review by two independent pathologists served as the reference diagnosis.
RESULTS: In cross-validation, the gene-expression classifier correctly identified the expected cancer superclass in 86.6% of the 4429 RefDB samples, with a specificity of 99.43%. On the validation set of metastatic FFPE samples, the algorithm's accuracy was 83.81%, with a specificity of 99.04%. The overall reproducibility of the gene-expression classifier system showed a precision of 97.22%, with inter-assay, intra-assay and intra-lot coefficients of variation all below 4.1%.
CONCLUSION: We developed a complete, integrated workflow for classifying metastatic tumour samples, which may help to define the primary site of tumours.
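The figures reported above (accuracy, specificity, coefficient of variation) can be illustrated with a short sketch. The abstract does not say how specificity was aggregated across the 25 superclasses, so this example assumes one common convention for multiclass classifiers: one-vs-rest specificity, TN / (TN + FP), averaged over classes. The function names and the toy labels are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the reference diagnosis."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_specificity(y_true, y_pred):
    """Assumed convention: one-vs-rest specificity TN / (TN + FP), averaged over classes."""
    classes = sorted(set(y_true) | set(y_pred))
    specs = []
    for c in classes:
        # Negatives for class c are samples whose reference label is not c.
        tn = sum(t != c and p != c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        if tn + fp:
            specs.append(tn / (tn + fp))
    return mean(specs)


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, expressed as a percentage."""
    return 100 * stdev(values) / mean(values)


# Hypothetical toy data: reference diagnoses vs. classifier calls.
y_true = ["breast", "lung", "colon", "lung"]
y_pred = ["breast", "lung", "lung", "lung"]
print(accuracy(y_true, y_pred))          # 0.75
print(mean_specificity(y_true, y_pred))  # about 0.833
print(coefficient_of_variation([10.0, 10.2, 9.8]))  # about 2.0
```

With many classes, most samples are true negatives for any given class, so averaged one-vs-rest specificity tends to be high even when accuracy is moderate. This is consistent with specificities near 99% alongside accuracies of 84%-87%. As a check on the validation figure, 88 correct calls out of 105 samples gives 83.81%.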