ABSTRACT
Breast cancer is the most common malignant neoplasm and the leading cause of cancer mortality among women worldwide. Existing prediction models based on risk factors perform poorly in specific populations, so an appropriate, calibrated breast cancer prediction model for Cuban women is essential. This article proposes a conceptual model for estimating breast cancer risk in Cuban women using machine learning algorithms and risk factors. The model has three main components: knowledge representation, risk estimation modeling, and risk predictor evaluation. Nine widely used machine learning algorithms were applied to generate risk predictors with the proposed model. Two data sources served as case studies: the first comprised data collected from Cuban women, and the second comprised data on US Hispanic women from the Breast Cancer Surveillance Consortium dataset. The results indicate that the model estimates breast cancer risk effectively and could be a valuable tool for early detection of breast cancer and identification of at-risk patients. In the first experiment, the best predictor of breast cancer risk for the Cuban female population was the Random Forest algorithm, with a weighted score of 5.981, a training accuracy of 0.996, and a training AUC of 0.997. In the second experiment, the risk predictors generated by the proposed model from the Cuban data achieved better AUC and accuracy values than those generated from the US Hispanic data, and are potentially generalizable to other Hispanic populations. Implementing this model could be an economically viable way to reduce breast cancer mortality in Latin American countries such as Cuba.