ABSTRACT
Human immunodeficiency virus (HIV) and its clinical entity, the acquired immunodeficiency syndrome (AIDS), continue to represent an important health burden worldwide. Although great advances have been made towards determining how viral genetic diversity affects clinical outcome, genetic association studies have been hindered by the complexity of the virus's interactions with the human host. This study provides an innovative approach for the identification and analysis of epidemiological associations between mutations in the HIV viral infectivity factor (Vif) protein and four clinical endpoints: viral load and CD4 T cell counts, each measured both at the time of clinical debut and over the historical follow-up of patients. Furthermore, this study highlights an alternative approach to the analysis of imbalanced datasets, in which patients without a specific mutation outnumber those with it. Imbalanced datasets remain a challenge for the development of classification algorithms through machine learning. This research uses decision trees, naïve Bayes (NB), support vector machines (SVMs), and artificial neural networks (ANNs). This paper proposes a new methodology based on undersampling to deal with imbalanced datasets and introduces two novel, distinct approaches (MAREV-1 and MAREV-2). Because these approaches do not rely on human-predetermined, hypothesis-driven combinations of motifs with known functional or clinical relevance, they provide a unique opportunity to discover novel complex motif combinations of interest. Moreover, the motif combinations found can be analyzed with traditional statistical approaches, avoiding the need for multiple-testing corrections.
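The abstract does not specify how MAREV-1 or MAREV-2 perform undersampling. Purely as a generic illustration of the underlying idea, the sketch below shows plain random undersampling: every minority-class patient (mutation present) is kept, and an equally sized random subset of the majority class is drawn, yielding a balanced set that could then be passed to any of the classifiers named above. This is not the authors' method; the function name `random_undersample`, the 0/1 label encoding (1 = mutation present), and the patient identifiers are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_undersample(
    samples: Sequence[T], labels: Sequence[int], seed: int = 0
) -> Tuple[List[T], List[int]]:
    """Randomly drop majority-class samples until both classes are equal in size.

    Hypothetical helper. Assumes binary labels:
    1 = mutation present, 0 = mutation absent.
    """
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    if any(y not in (0, 1) for y in labels):
        raise ValueError("labels must be 0 or 1")
    rng = random.Random(seed)  # seeded for reproducible subsets
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority index plus an equally sized random draw from the majority.
    kept = minority + rng.sample(majority, len(minority))
    kept.sort()  # restore the original sample order
    return [samples[i] for i in kept], [labels[i] for i in kept]


if __name__ == "__main__":
    patients = [f"p{i}" for i in range(10)]  # hypothetical patient IDs
    has_mutation = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
    X, y = random_undersample(patients, has_mutation, seed=42)
    print(sum(y), len(y))  # prints "2 4"
```

Random undersampling discards data from the majority class, so in practice it is often repeated with several seeds and the classifier results are compared across draws.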
Subject(s)
HIV Infections, HIV-1, Humans, Amino Acid Sequence, vif Gene Products, Human Immunodeficiency Virus/genetics, vif Gene Products, Human Immunodeficiency Virus/metabolism, Bayes Theorem, Mutation, Machine Learning, HIV-1/metabolism
ABSTRACT
BACKGROUND: The virion infectivity factor (Vif) is an accessory protein that is essential for HIV replication in host cells. Vif neutralizes the antiviral host protein APOBEC3 by recruiting the E3 ubiquitin ligase complex. METHODOLOGY: Fifty thousand Vif models were generated with the ab initio relax protocol of the Rosetta algorithm from sets of three- and nine-residue fragments, using a Monte Carlo fragment-insertion and simulated-annealing strategy that favors protein-like features, followed by all-atom refinement. In the protocol, a constraints file was used to define the spatial relationship between the side chains of the Cys/His residues and the zinc ion that together form the zinc-finger motif essential for Vif function. We also performed centroid analysis and structural analysis of zinc-finger formation and of the arrangement of residues in the protein-binding domains. Additionally, molecular docking was used to explore the details of the Vif-A3G and Vif-EloBC interactions. Furthermore, molecular dynamics simulation was used to evaluate the stability of the Vif-EloBC-A3G and Vif-EloC complexes. PRINCIPAL FINDINGS: The zinc in the HCCH domain significantly alters the folding of Vif and changes the structural dynamics of the HCCH region. Ab initio modeling indicated that the Vif zinc finger possibly displays tetrahedral geometry, as suggested by Mehle et al. (2006). Our model also showed that residues L146 and L149 of the BC-box motif bind to EloC through hydrophobic interactions, and that residue P162 of the PPLP motif is important for EloB binding. CONCLUSIONS/SIGNIFICANCE: The model presented here is the first complete three-dimensional structure of Vif. The modeled interactions of Vif with the A3G protein and the EloBC complex agree with the empirical data currently available in the literature and could therefore provide valuable structural information for advances in rational drug design.
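To make the "tetrahedral geometry" finding concrete, the sketch below shows one simple way such a check could be done on a model: compute all six ligand-Zn-ligand angles and compare them with the ideal tetrahedral angle, arccos(-1/3) ≈ 109.47°. This is an illustrative assumption, not the analysis reported above. The function names, the choice of coordinating atoms (for example the His nitrogen and Cys sulfur atoms of the HCCH motif), and the 15° tolerance are all hypothetical, and the example coordinates are an idealized test geometry rather than data from the Vif model.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]

# Ideal tetrahedral bond angle, arccos(-1/3), about 109.47 degrees.
IDEAL_TETRAHEDRAL_DEG = math.degrees(math.acos(-1.0 / 3.0))


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _angle_deg(u: Vec, v: Vec) -> float:
    """Angle between two vectors in degrees."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        raise ValueError("a ligand atom coincides with the zinc position")
    cos = max(-1.0, min(1.0, dot / (nu * nv)))  # clamp rounding error
    return math.degrees(math.acos(cos))


def coordination_angles(zn: Vec, ligands: Sequence[Vec]) -> List[float]:
    """All ligand-Zn-ligand angles (degrees), one per unordered ligand pair."""
    vecs = [_sub(p, zn) for p in ligands]
    return [
        _angle_deg(vecs[i], vecs[j])
        for i in range(len(vecs))
        for j in range(i + 1, len(vecs))
    ]


def is_roughly_tetrahedral(zn: Vec, ligands: Sequence[Vec], tol_deg: float = 15.0) -> bool:
    """True if four ligands all sit within tol_deg of the ideal tetrahedral angle."""
    if len(ligands) != 4:
        return False
    return all(abs(a - IDEAL_TETRAHEDRAL_DEG) <= tol_deg for a in coordination_angles(zn, ligands))


if __name__ == "__main__":
    zn = (0.0, 0.0, 0.0)
    ideal = [(1.0, 1.0, 1.0), (1.0, -1.0, -1.0), (-1.0, 1.0, -1.0), (-1.0, -1.0, 1.0)]
    planar = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]
    print(is_roughly_tetrahedral(zn, ideal), is_roughly_tetrahedral(zn, planar))  # prints "True False"
```

A square-planar arrangement fails the check because its angles are 90° and 180°, which makes the angle test a quick way to tell the two common four-coordinate geometries apart.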