Results 1 - 20 of 23
1.
Neural Netw ; 167: 233-243, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37660672

ABSTRACT

Domain shifts in the training data are common in practical applications of machine learning; they occur, for instance, when the data come from different sources. Ideally, an ML model should work well independently of these shifts, for example, by learning a domain-invariant representation. However, common ML losses do not give strong guarantees on how consistently the ML model performs across domains, in particular, whether the model performs well on one domain at the expense of its performance on another. In this paper, we build new theoretical foundations for this problem by contributing a set of mathematical relations between classical losses for supervised ML and the Wasserstein distance in joint space (i.e., representation and output space). We show that classification or regression losses, when combined with a GAN-type discriminator between domains, form an upper bound on the true Wasserstein distance between domains. This implies a more invariant representation and also more stable prediction performance across domains. The theoretical results are corroborated empirically on several image datasets. Our proposed approach systematically produces the highest minimum classification accuracy across domains, and the most invariant representation.
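
Below is a minimal PyTorch sketch of the generic construction the abstract describes: supervised losses on two domains combined with a GAN-type discriminator acting on the joint (representation, output) space. It is a standard domain-adversarial training loop, not the authors' implementation; the layer sizes, optimizers, and the `domain_weight` coefficient are placeholder assumptions.

```python
import torch
import torch.nn as nn

# Shared feature extractor, task classifier, and a GAN-type domain
# discriminator operating on the joint (representation, label logits) space.
encoder = nn.Sequential(nn.Linear(64, 32), nn.ReLU())
classifier = nn.Linear(32, 10)
discriminator = nn.Sequential(nn.Linear(32 + 10, 16), nn.ReLU(), nn.Linear(16, 1))

task_loss = nn.CrossEntropyLoss()
adv_loss = nn.BCEWithLogitsLoss()
opt_model = torch.optim.Adam(list(encoder.parameters()) + list(classifier.parameters()), lr=1e-3)
opt_disc = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

def joint(x):
    z = encoder(x)
    return z, classifier(z)

def training_step(x_a, y_a, x_b, y_b, domain_weight=1.0):
    # --- discriminator update: learn to tell domain A from domain B in joint space ---
    with torch.no_grad():
        z_a, logits_a = joint(x_a)
        z_b, logits_b = joint(x_b)
    d_a = discriminator(torch.cat([z_a, logits_a], dim=1))
    d_b = discriminator(torch.cat([z_b, logits_b], dim=1))
    d_loss = adv_loss(d_a, torch.ones_like(d_a)) + adv_loss(d_b, torch.zeros_like(d_b))
    opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()

    # --- model update: supervised losses on both domains plus a term that
    #     pushes the joint representations of the two domains together ---
    z_a, logits_a = joint(x_a)
    z_b, logits_b = joint(x_b)
    supervised = task_loss(logits_a, y_a) + task_loss(logits_b, y_b)
    d_a = discriminator(torch.cat([z_a, logits_a], dim=1))
    d_b = discriminator(torch.cat([z_b, logits_b], dim=1))
    # encoder/classifier try to make the discriminator assign the "wrong" domain
    confusion = adv_loss(d_a, torch.zeros_like(d_a)) + adv_loss(d_b, torch.ones_like(d_b))
    loss = supervised + domain_weight * confusion
    opt_model.zero_grad(); loss.backward(); opt_model.step()
    return loss.item()
```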


Subject(s)
Machine Learning
2.
Neural Comput ; 35(7): 1288-1339, 2023 Jun 12.
Article in English | MEDLINE | ID: mdl-37187163

ABSTRACT

We consider the scenario of deep clustering, in which the available prior knowledge is limited. In this scenario, few existing state-of-the-art deep clustering methods perform well on both noncomplex-topology and complex-topology data sets. To address the problem, we propose a constraint utilizing symmetric InfoNCE, which complements the objective of a deep clustering method so that the trained model is effective not only for noncomplex-topology but also for complex-topology data sets. Additionally, we provide several theoretical explanations of why the constraint can enhance the performance of deep clustering methods. To confirm the effectiveness of the proposed constraint, we introduce a deep clustering method named MIST, which combines an existing deep clustering method with our constraint. Our numerical experiments with MIST demonstrate that the constraint is effective. In addition, MIST outperforms other state-of-the-art deep clustering methods on most of the 10 commonly used benchmark data sets.
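
For reference, a generic symmetric InfoNCE loss (the building block of the proposed constraint) can be written compactly. The PyTorch function below averages the InfoNCE losses in both conditioning directions; the cosine similarity and temperature value are assumptions, and this is not the MIST implementation itself.

```python
import torch
import torch.nn.functional as F

def symmetric_infonce(z1, z2, temperature=0.1):
    """Symmetric InfoNCE between two batches of paired representations.

    z1, z2: tensors of shape (batch, dim) whose i-th rows form a positive pair.
    Returns the average of the InfoNCE losses computed in both directions.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature               # pairwise similarities
    targets = torch.arange(z1.size(0))               # positives lie on the diagonal
    loss_12 = F.cross_entropy(logits, targets)       # z1 -> z2 direction
    loss_21 = F.cross_entropy(logits.t(), targets)   # z2 -> z1 direction
    return 0.5 * (loss_12 + loss_21)
```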

3.
Neural Netw ; 144: 394-406, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34562813

ABSTRACT

Uncertainty evaluation is a core technique when deep neural networks (DNNs) are used in real-world problems. In practical applications, we often encounter unexpected samples that were not seen during training. Not only achieving high prediction accuracy but also detecting uncertain data is important for safety-critical systems. In statistics and machine learning, Bayesian inference has been exploited for uncertainty evaluation. Bayesian neural networks (BNNs) have recently attracted considerable attention in this context, as a DNN trained using dropout can be interpreted as a Bayesian method. Based on this interpretation, several methods to calculate the Bayes predictive distribution for DNNs have been developed. Though the Monte Carlo method called MC dropout is a popular method for uncertainty evaluation, it requires many repeated feed-forward calculations of DNNs with randomly sampled weight parameters. To overcome this computational issue, we propose a sampling-free method to evaluate uncertainty. Our method converts a neural network trained using dropout into the corresponding Bayesian neural network with variance propagation. Our method is applicable not only to feed-forward NNs but also to recurrent NNs such as LSTMs. We report the computational efficiency and statistical reliability of our method in numerical experiments on language modeling using RNNs and on out-of-distribution detection with DNNs.
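
The computational idea of variance propagation, replacing repeated stochastic forward passes with one deterministic pass that carries a mean and a variance per unit, can be illustrated for dropout and linear layers. The NumPy sketch below assumes independence between units, omits nonlinearities and recurrent layers, and uses placeholder shapes and dropout rate; it is not the authors' full method.

```python
import numpy as np

def dropout_moments(mean, var, drop_rate):
    """Moments of inverted dropout h * b / (1 - p) with b ~ Bernoulli(1 - p)."""
    keep = 1.0 - drop_rate
    new_mean = mean                                   # E[h b / keep] = E[h]
    new_var = (var + drop_rate * mean ** 2) / keep    # Var[h b / keep]
    return new_mean, new_var

def linear_moments(mean, var, W, b):
    """Moments of W x + b, assuming the components of x are independent."""
    return W @ mean + b, (W ** 2) @ var

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(32, 16)), np.zeros(32)
W2, b2 = rng.normal(size=(1, 32)), np.zeros(1)

x = rng.normal(size=16)
m, v = x, np.zeros_like(x)                 # a deterministic input has zero variance
m, v = linear_moments(m, v, W1, b1)
m, v = dropout_moments(m, v, drop_rate=0.5)
m, v = linear_moments(m, v, W2, b2)
print("predictive mean:", m, "predictive variance:", v)
```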


Subject(s)
Neural Networks, Computer; Bayes Theorem; Monte Carlo Method; Reproducibility of Results; Uncertainty
4.
Neural Comput ; 31(8): 1718-1750, 2019 08.
Article in English | MEDLINE | ID: mdl-31260393

ABSTRACT

In this letter, we propose a variable selection method for general nonparametric kernel-based estimation. The proposed method consists of two stages: (1) construct a consistent estimator of the target function, and (2) approximate the estimator using a few variables by ℓ1-type penalized estimation. The proposed method can be applied to various nonparametric kernel estimators such as kernel ridge regression, kernel density estimation, and density-ratio estimation. We prove that the proposed method has the property of variable selection consistency when the power series kernel is used. Here, the power series kernel is a certain class of kernels containing polynomial and exponential kernels. This result is regarded as an extension of the variable selection consistency for the nonnegative garrote (NNG), a special case of the adaptive Lasso, to kernel-based estimators. Several experiments, including simulation studies and real-data applications, show the effectiveness of the proposed method.
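
The two-stage structure, fit a consistent kernel estimator and then approximate its fitted values with an ℓ1-penalized model that uses few variables, can be mimicked with off-the-shelf tools. The scikit-learn sketch below uses a plain linear Lasso as a deliberately simplified stand-in for the second stage, so it only illustrates the workflow; it does not use the power series kernel or the theory discussed in the abstract, and the data and hyperparameters are placeholders.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 2 * X[:, 0] + np.sin(X[:, 1]) + 0.1 * rng.normal(size=200)  # only variables 0 and 1 matter

# Stage 1: a consistent nonparametric estimate of the target function.
stage1 = KernelRidge(kernel="rbf", alpha=1e-2, gamma=0.1).fit(X, y)
f_hat = stage1.predict(X)

# Stage 2: approximate the stage-1 fit with an l1-penalized model so that
# only a few input variables receive nonzero weight.
stage2 = Lasso(alpha=0.05).fit(X, f_hat)
selected = np.flatnonzero(stage2.coef_)
print("selected variables:", selected)   # should typically pick up variables 0 and 1
```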


Subject(s)
Machine Learning; Adult; Computer Simulation; Diabetes Mellitus/classification; Female; Humans; Logistic Models; Neoplasms/classification; Post-Cardiac Arrest Syndrome/classification; Renal Insufficiency, Chronic/classification; Statistics, Nonparametric
5.
Entropy (Basel) ; 21(7), 2019 Jul 17.
Article in English | MEDLINE | ID: mdl-33267416

ABSTRACT

The quality of online services highly depends on the accuracy of the recommendations they can provide to users. Researchers have proposed various similarity measures, based on the assumption that similar people like or dislike similar items or people, in order to improve the accuracy of their services. Additionally, statistical models, such as stochastic block models, have been used to understand network structures. In this paper, we discuss the relationship between similarity-based methods and statistical models using Bernoulli mixture models and the expectation-maximization (EM) algorithm. The Bernoulli mixture model naturally leads to a completely positive matrix as the similarity matrix. We prove that most of the commonly used similarity measures yield completely positive similarity matrices. Based on this relationship, we propose an algorithm to transform a similarity matrix into the Bernoulli mixture model. Such a correspondence provides a statistical interpretation of similarity-based methods. Using this algorithm, we conduct numerical experiments with synthetic data and real-world data provided by an online dating site, and report the efficiency of the recommendation system based on Bernoulli mixture models.
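
A textbook EM routine for the Bernoulli mixture model, the statistical model underlying the discussion, is sketched below in NumPy. The random initialization and fixed iteration count are assumptions, and this is not the paper's algorithm for transforming a similarity matrix into a Bernoulli mixture.

```python
import numpy as np

def bernoulli_mixture_em(X, n_components, n_iter=100, seed=0, eps=1e-9):
    """Fit a Bernoulli mixture to binary data X (n_samples, n_features) with EM."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(n_components, 1.0 / n_components)        # mixing weights
    mu = rng.uniform(0.25, 0.75, size=(n_components, d))   # Bernoulli parameters

    for _ in range(n_iter):
        # E-step: responsibilities from per-component log-likelihoods
        log_p = (X @ np.log(mu + eps).T
                 + (1 - X) @ np.log(1 - mu + eps).T
                 + np.log(pi + eps))
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: update mixing weights and Bernoulli means
        nk = resp.sum(axis=0)
        pi = nk / n
        mu = (resp.T @ X) / (nk[:, None] + eps)
    return pi, mu, resp
```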

6.
Entropy (Basel) ; 21(8), 2019 Aug 15.
Article in English | MEDLINE | ID: mdl-33267508

ABSTRACT

We propose a new clustering method based on a deep neural network. Given an unlabeled dataset and the number of clusters, our method directly groups the dataset into the given number of clusters in the original space. We use a conditional discrete probability distribution defined by a deep neural network as a statistical model. Our strategy is first to estimate the cluster labels of unlabeled data points selected from a high-density region, and then to conduct semi-supervised learning to train the model using the estimated cluster labels and the remaining unlabeled data points. Finally, using the trained model, we obtain the estimated cluster labels of all the given unlabeled data points. An advantage of our method is that it does not require restrictive assumptions: existing clustering methods with deep neural networks assume that the cluster balance of a given dataset is uniform, whereas ours does not. Moreover, it can be applied to various data domains as long as the data are expressed as feature vectors. In addition, we observe that our method is robust against outliers. Therefore, the proposed method is expected to perform, on average, better than previous methods. We conducted numerical experiments on five commonly used datasets to confirm the effectiveness of the proposed method.

7.
Neural Netw ; 95: 44-56, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28886404

ABSTRACT

This paper develops a general framework of statistical inference on discrete sample spaces, on which a neighborhood system is defined by an undirected graph. A scoring rule measures the goodness of fit of a model to observed samples, and we employ its localized version, local scoring rules, which do not require the normalization constant. We show that the local scoring rule is closely related to a discrepancy measure called the composite local Bregman divergence. We then investigate the statistical consistency of local scoring rules in terms of the graphical structure of the sample space. Moreover, we propose a robust and computationally efficient estimator based on our framework. In numerical experiments, we investigate the relation between the neighborhood system and estimation accuracy, and we numerically evaluate the robustness of localized estimators.


Subject(s)
Neural Networks, Computer; Data Interpretation, Statistical; Likelihood Functions
8.
Neural Netw ; 94: 173-191, 2017 Oct.
Article in English | MEDLINE | ID: mdl-28797759

ABSTRACT

We propose a unified formulation of robust learning methods for classification and regression problems. In these learning methods, the hinge loss is used together with outlier indicators in order to detect outliers in the observed data. To analyze the robustness property, we evaluate the breakdown point of the learning methods in situations where the outlier ratio is not necessarily small. Although minimizing the hinge loss with outlier indicators is a non-convex optimization problem, we prove that any local optimal solution of our learning algorithms has the robustness property. The theoretical findings are confirmed in numerical experiments.
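
One natural, though not necessarily the authors', way to treat outlier indicators is to alternate between fitting a hinge-loss classifier on the points currently marked as inliers and re-marking as outliers the points with the largest hinge losses. The scikit-learn sketch below illustrates that alternating heuristic; the outlier fraction, stopping rule, and choice of LinearSVC are assumptions.

```python
import numpy as np
from sklearn.svm import LinearSVC

def robust_hinge_fit(X, y, outlier_fraction=0.1, n_rounds=10, C=1.0):
    """Alternating heuristic: fit a hinge-loss classifier on presumed inliers,
    then flag the worst-fitting points as outliers, and repeat.

    y is expected to take values in {-1, +1}.
    """
    n = len(y)
    n_out = int(outlier_fraction * n)
    inlier = np.ones(n, dtype=bool)            # outlier indicators (True = inlier)
    clf = LinearSVC(C=C, loss="hinge", max_iter=10000)
    for _ in range(n_rounds):
        clf.fit(X[inlier], y[inlier])
        margins = y * clf.decision_function(X)
        losses = np.maximum(0.0, 1.0 - margins)     # hinge loss of every point
        new_inlier = np.ones(n, dtype=bool)
        if n_out > 0:
            new_inlier[np.argsort(losses)[-n_out:]] = False   # drop the worst points
        if np.array_equal(new_inlier, inlier):
            break
        inlier = new_inlier
    return clf, inlier
```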


Subject(s)
Neural Networks, Computer; Support Vector Machine
9.
Neural Comput ; 29(5): 1406-1438, 2017 05.
Article in English | MEDLINE | ID: mdl-28333592

ABSTRACT

Nonconvex variants of support vector machines (SVMs) have been developed for various purposes. For example, robust SVMs attain robustness to outliers by using a nonconvex loss function, while extended ν-SVM (Eν-SVM) extends the range of the hyperparameter ν by introducing a nonconvex constraint. Here, we consider an extended robust support vector machine (ER-SVM), a robust variant of Eν-SVM. ER-SVM combines two types of nonconvexity from robust SVMs and Eν-SVM. Because of the two nonconvexities, the existing algorithm we proposed needs to be divided into two parts depending on whether the hyperparameter value is in the extended range or not. The algorithm also heuristically solves the nonconvex problem in the extended range. In this letter, we propose a new, efficient algorithm for ER-SVM. The algorithm deals with two types of nonconvexity while never entailing more computations than either Eν-SVM or robust SVM, and it finds a critical point of ER-SVM. Furthermore, we show that ER-SVM includes the existing robust SVMs as special cases. Numerical experiments confirm the effectiveness of integrating the two nonconvexities.

10.
Neural Comput ; 26(11): 2541-69, 2014 Nov.
Article in English | MEDLINE | ID: mdl-25058701

ABSTRACT

Financial risk measures have recently been used in machine learning. For example, the ν-support vector machine (ν-SVM) minimizes the conditional value at risk (CVaR) of the margin distribution. The measure is popular in finance because of its subadditivity property, but it is very sensitive to a few outliers in the tail of the distribution. We propose a new classification method, the extended robust SVM (ER-SVM), which minimizes an intermediate risk measure between the CVaR and the value at risk (VaR), with the expectation that the resulting model is less sensitive to outliers than ν-SVM. ER-SVM can be regarded as an extension of robust SVM, which uses a truncated hinge loss. Numerical experiments suggest that ER-SVM can achieve better prediction performance with a proper parameter setting.
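
For concreteness, VaR at level α can be taken as the (1 − α)-quantile of the loss distribution, and CVaR as the mean of the losses in that worst α-tail; CVaR is therefore at least as large as VaR and is pulled up by tail outliers. A small NumPy illustration on simulated margin losses (conventions at the tail boundary are glossed over in this sketch):

```python
import numpy as np

def var_cvar(losses, alpha=0.1):
    """VaR and CVaR of the worst `alpha` fraction of losses.

    VaR is the (1 - alpha)-quantile; CVaR is the mean of losses at or above it.
    """
    losses = np.asarray(losses)
    var = np.quantile(losses, 1.0 - alpha)
    cvar = losses[losses >= var].mean()
    return var, cvar

rng = np.random.default_rng(0)
margins = rng.normal(loc=1.0, scale=1.0, size=1000)   # y * f(x) for 1000 points
losses = -margins                                     # large loss = badly misclassified
print(var_cvar(losses, alpha=0.1))                    # CVaR >= VaR
```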


Subject(s)
Artificial Intelligence; Models, Theoretical; Risk Reduction Behavior; Support Vector Machine; Algorithms; Financial Management; Humans
11.
Neural Netw ; 57: 29-38, 2014 Sep.
Article in English | MEDLINE | ID: mdl-24914491

ABSTRACT

We propose a unified machine learning model (UMLM) for two-class classification, regression, and outlier (or novelty) detection via a robust optimization approach. The model embraces various machine learning models, such as support vector machine-based and minimax probability machine-based classification and regression models. The unified framework makes it possible to compare and contrast existing learning models and to explain their differences and similarities. In this paper, after relating existing learning models to UMLM, we show some theoretical properties of UMLM. Concretely, we give an interpretation of UMLM as minimizing a well-known financial risk measure (worst-case value-at-risk (VaR) or conditional VaR), derive generalization bounds for UMLM using such a risk measure, and prove that solving problems of UMLM leads to estimators with minimized generalization bounds. These theoretical properties also apply to the related existing learning models.


Subject(s)
Algorithms; Support Vector Machine; Models, Theoretical; Risk
12.
Neural Comput ; 25(10): 2734-75, 2013 Oct.
Article in English | MEDLINE | ID: mdl-23777524

ABSTRACT

We address the problem of estimating the difference between two probability densities. A naive approach is a two-step procedure: first estimate the two densities separately and then compute their difference. However, this procedure does not necessarily work well because the first step is performed without regard to the second, and thus a small estimation error incurred in the first step can cause a large error in the second. In this letter, we propose a single-shot procedure for directly estimating the density difference without separately estimating the two densities. We derive a nonparametric finite-sample error bound for the proposed single-shot density-difference estimator and show that it achieves the optimal convergence rate. We then show how the proposed density-difference estimator can be used in L²-distance approximation. Finally, we experimentally demonstrate the usefulness of the proposed method in robust distribution comparison tasks such as class-prior estimation and change-point detection.
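
The single-shot idea can be made concrete with a least-squares fit of a Gaussian-kernel model to the density difference, which admits a closed-form solution. The NumPy sketch below follows that generic least-squares density-difference recipe; the kernel width, regularization strength, and use of the pooled samples as centers are assumptions, and it is a simplified illustration rather than the paper's estimator.

```python
import numpy as np

def lsdd(X, Y, sigma=1.0, lam=1e-3):
    """Least-squares density-difference: fit g(x) ~ p(x) - q(x) directly.

    X ~ p, Y ~ q (arrays of shape (n, d)). Gaussian-kernel centers are the
    pooled samples. Returns coefficients, centers, and a plug-in estimate of
    the squared L2 distance between p and q.
    """
    C = np.vstack([X, Y])                       # kernel centers
    d = C.shape[1]

    def gauss(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2 * sigma ** 2))

    # H_{ll'} = integral over x of k(x, c_l) k(x, c_l') (closed form for Gaussians)
    sqCC = ((C[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    H = (np.pi * sigma ** 2) ** (d / 2) * np.exp(-sqCC / (4 * sigma ** 2))
    # h_l = E_p[k(x, c_l)] - E_q[k(y, c_l)], estimated by sample means
    h = gauss(X, C).mean(axis=0) - gauss(Y, C).mean(axis=0)

    theta = np.linalg.solve(H + lam * np.eye(len(C)), h)
    l2_sq = 2 * theta @ h - theta @ H @ theta   # plug-in estimate of ||p - q||_2^2
    return theta, C, l2_sq
```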


Subject(s)
Artificial Intelligence; Data Mining/statistics & numerical data; Algorithms; Australia; Databases, Factual; Diabetes Mellitus/epidemiology; Germany; Humans; Logistic Models; Software
13.
Neural Comput ; 25(5): 1324-70, 2013 May.
Article in English | MEDLINE | ID: mdl-23547952

ABSTRACT

Divergence estimators based on direct approximation of density ratios without going through separate approximation of numerator and denominator densities have been successfully applied to machine learning tasks that involve distribution comparison such as outlier detection, transfer learning, and two-sample homogeneity test. However, since density-ratio functions often possess high fluctuation, divergence estimation is a challenging task in practice. In this letter, we use relative divergences for distribution comparison, which involves approximation of relative density ratios. Since relative density ratios are always smoother than corresponding ordinary density ratios, our proposed method is favorable in terms of nonparametric convergence speed. Furthermore, we show that the proposed divergence estimator has asymptotic variance independent of the model complexity under a parametric setup, implying that the proposed estimator hardly overfits even with complex models. Through experiments, we demonstrate the usefulness of the proposed approach.
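
Likewise, the relative density ratio p(x)/(αp(x) + (1 − α)q(x)) can be fit directly by a least-squares kernel model with a closed-form solution. The NumPy sketch below is a generic illustration of that recipe; the kernel width, regularization strength, α, and the choice of centers are placeholder assumptions, not the paper's exact estimator.

```python
import numpy as np

def relative_ratio_fit(X, Y, alpha=0.1, sigma=1.0, lam=1e-3):
    """Fit the relative density ratio p / (alpha*p + (1-alpha)*q) directly.

    X ~ p, Y ~ q. Returns a function that evaluates the fitted relative ratio.
    """
    C = X.copy()                                  # kernel centers taken from p-samples

    def gauss(A):
        sq = ((A[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2 * sigma ** 2))

    Phi_x, Phi_y = gauss(X), gauss(Y)
    # H = alpha * E_p[phi phi^T] + (1 - alpha) * E_q[phi phi^T]
    H = alpha * Phi_x.T @ Phi_x / len(X) + (1 - alpha) * Phi_y.T @ Phi_y / len(Y)
    h = Phi_x.mean(axis=0)                        # h = E_p[phi]
    theta = np.linalg.solve(H + lam * np.eye(len(C)), h)
    return lambda Z: gauss(Z) @ theta
```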

14.
Neural Comput ; 25(3): 759-804, 2013 Mar.
Article in English | MEDLINE | ID: mdl-23272917

ABSTRACT

A wide variety of machine learning algorithms such as the support vector machine (SVM), minimax probability machine (MPM), and Fisher discriminant analysis (FDA) exist for binary classification. The purpose of this letter is to provide a unified classification model that includes these models through a robust optimization approach. This unified model has several benefits. One is that the extensions and improvements intended for SVMs become applicable to MPM and FDA, and vice versa. For example, we can obtain nonconvex variants of MPM and FDA by mimicking Perez-Cruz, Weston, Hermann, and Schölkopf's (2003) extension from convex ν-SVM to nonconvex Eν-SVM. Another benefit is to provide theoretical results concerning these learning methods at once by dealing with the unified model. We give a statistical interpretation of the unified classification model and prove that the model is a good approximation for the worst-case minimization of an expected loss with respect to the uncertain probability distribution. We also propose a nonconvex optimization algorithm that can be applied to nonconvex variants of existing learning methods and show promising numerical results.


Subject(s)
Support Vector Machine
15.
Neural Netw ; 24(7): 735-51, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21571502

ABSTRACT

The goal of the two-sample test (a.k.a. the homogeneity test) is, given two sets of samples, to judge whether the probability distributions behind the samples are the same or not. In this paper, we propose a novel non-parametric two-sample test based on a least-squares density-ratio estimator. Through various experiments, we show that the proposed method overall produces a smaller type-II error (i.e., the probability of judging the two distributions to be the same when they are actually different) than a state-of-the-art method, at the cost of a slightly larger type-I error (i.e., the probability of judging the two distributions to be different when they are actually the same).
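
The test itself can be sketched as a permutation test: compute a divergence-like statistic from a least-squares density-ratio fit on the original two samples, then recompute it on random relabelings of the pooled data to obtain a null distribution. The NumPy sketch below illustrates that generic scheme; the plug-in Pearson-divergence statistic, kernel width, regularization, and number of permutations are assumptions rather than the paper's exact procedure.

```python
import numpy as np

def ratio_statistic(X, Y, sigma=1.0, lam=1e-3):
    """Plug-in Pearson-divergence statistic from a least-squares density-ratio fit."""
    C = X                                         # kernel centers from the first sample
    def gauss(A):
        sq = ((A[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2 * sigma ** 2))
    Phi_x, Phi_y = gauss(X), gauss(Y)
    H = Phi_y.T @ Phi_y / len(Y)                  # E_q[phi phi^T]
    h = Phi_x.mean(axis=0)                        # E_p[phi]
    theta = np.linalg.solve(H + lam * np.eye(len(C)), h)
    r_x, r_y = Phi_x @ theta, Phi_y @ theta       # fitted ratio on each sample
    return 0.5 * (r_x.mean() - 2 * r_y.mean() + 1)

def permutation_test(X, Y, n_perm=200, seed=0):
    """Two-sample test: p-value of the observed statistic under random relabelings."""
    rng = np.random.default_rng(seed)
    observed = ratio_statistic(X, Y)
    pooled = np.vstack([X, Y])
    n = len(X)
    null = []
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        null.append(ratio_statistic(pooled[idx[:n]], pooled[idx[n:]]))
    p_value = (np.sum(np.array(null) >= observed) + 1) / (n_perm + 1)
    return observed, p_value
```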


Subject(s)
Least-Squares Analysis; Algorithms
16.
Neural Netw ; 24(2): 183-98, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21059481

ABSTRACT

Methods for directly estimating the ratio of two probability density functions have been actively explored recently, since they can be used for various data processing tasks such as non-stationarity adaptation, outlier detection, and feature selection. In this paper, we develop a new method that incorporates dimensionality reduction into a direct density-ratio estimation procedure. Our key idea is to find a low-dimensional subspace in which the densities are significantly different and to perform density-ratio estimation only in this subspace. The proposed method, D³-LHSS (Direct Density-ratio estimation with Dimensionality reduction via Least-squares Hetero-distributional Subspace Search), is shown to overcome the limitations of baseline methods.


Subject(s)
Artificial Intelligence; Least-Squares Analysis; Models, Theoretical
17.
Neural Netw ; 23(7): 843-64, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20542407

ABSTRACT

The purpose of this paper is to study loss functions in multiclass classification. In classification problems, the decision function is estimated by minimizing an empirical loss function, and the output label is then predicted using the estimated decision function. We propose a class of loss functions obtained by deforming the log-likelihood loss function. There are four main reasons why we focus on the deformed log-likelihood loss function: (1) it is a class of loss functions that has not been deeply investigated so far; (2) in terms of computation, a boosting algorithm with a pseudo-loss is available to minimize the proposed loss function; (3) the proposed loss functions provide a clear correspondence between decision functions and conditional probabilities of output labels; (4) the proposed loss functions satisfy statistical consistency of the classification error rate, which is a desirable property in classification problems. Based on (3), we show that the deformed log-likelihood loss provides a model of mislabeling that is useful as a statistical model of medical diagnostics. We also propose a robust loss function against outliers in multiclass classification based on our approach. The robust loss function is a natural extension of the existing robust loss function for binary classification. A model of mislabeling and a robust loss function are useful for coping with noisy data. Numerical studies are presented to show the robustness of the proposed loss function. A mathematical characterization of the deformed log-likelihood loss function is also presented.


Subject(s)
Artificial Intelligence; Learning; Models, Statistical; Neural Networks, Computer; Algorithms; Likelihood Functions
18.
BMC Bioinformatics ; 10 Suppl 1: S52, 2009 Jan 30.
Article in English | MEDLINE | ID: mdl-19208155

ABSTRACT

BACKGROUND: Although microarray gene expression analysis has become popular, it remains difficult to interpret the biological changes caused by stimuli or variation of conditions. Clustering genes and associating each group with biological functions are often-used methods. However, such methods only detect partial changes within cell processes. Herein, we propose a method for discovering global changes within a cell by associating observed conditions of gene expression with gene functions. RESULTS: To elucidate the associations, we introduce a novel feature selection method called Least-Squares Mutual Information (LSMI), which computes mutual information without density estimation and can therefore detect nonlinear associations within a cell. We demonstrate the effectiveness of LSMI through comparison with existing methods. The results of the application to yeast microarray datasets reveal that non-natural stimuli affect various biological processes, whereas others have no significant relation to specific cell processes. Furthermore, we discover that biological processes can be categorized into four types according to their responses to various stimuli: DNA/RNA metabolism, gene expression, protein metabolism, and protein localization. CONCLUSION: We proposed a novel feature selection method called LSMI and applied it to mining the association between conditions of yeast and biological processes through microarray datasets. LSMI allows us to elucidate the global organization of cellular process control.


Subject(s)
Computational Biology/methods; Oligonucleotide Array Sequence Analysis/methods; Algorithms; Biological Phenomena; Cluster Analysis; Databases, Genetic; Gene Expression Profiling/methods
19.
Neural Comput ; 21(2): 533-59, 2009 Feb.
Article in English | MEDLINE | ID: mdl-19196229

ABSTRACT

The goal of regression analysis is to describe the stochastic relationship between an input vector x and a scalar output y. This can be achieved by estimating the entire conditional density p(y|x). In this letter, we present a new approach for nonparametric conditional density estimation. We develop a piecewise-linear path-following method for kernel-based quantile regression. It enables us to estimate the cumulative distribution function of p(y|x) in piecewise-linear form for all x in the input domain. Theoretical analyses and experimental results are presented to show the effectiveness of the approach.
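
The estimate-quantiles-then-read-off-the-CDF idea can be imitated with any quantile-regression learner. The scikit-learn sketch below uses gradient-boosted quantile regression instead of the paper's kernel-based path-following algorithm, so it only illustrates how a grid of conditional quantiles yields a conditional CDF; the model, the grid of τ values, and the synthetic data are placeholders.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(500, 1))
y = np.sin(2 * X[:, 0]) + 0.3 * (1 + np.abs(X[:, 0])) * rng.normal(size=500)

# Fit one quantile regressor per level tau; together they trace out
# the conditional CDF of y given x on the grid of taus.
taus = np.linspace(0.05, 0.95, 19)
models = [GradientBoostingRegressor(loss="quantile", alpha=t).fit(X, y) for t in taus]

x0 = np.array([[0.5]])                             # query point
quantiles = np.array([m.predict(x0)[0] for m in models])
quantiles = np.maximum.accumulate(quantiles)       # enforce monotonicity across taus
for t, q in zip(taus, quantiles):
    print(f"P(y <= {q:+.2f} | x=0.5) ~= {t:.2f}")
```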


Subject(s)
Neural Networks, Computer; Regression Analysis; Statistics, Nonparametric; Age Factors; Algorithms; Bone Density; Computer Simulation; Data Interpretation, Statistical; Female; Housing/economics; Housing/statistics & numerical data; Humans; Male
20.
Neural Comput ; 20(6): 1596-630, 2008 Jun.
Article in English | MEDLINE | ID: mdl-18194110

ABSTRACT

We discuss robustness against mislabeling in multiclass labels for classification problems and propose two boosting algorithms, the normalized Eta-Boost.M and Eta-Boost.M, based on the Eta-divergence. These two boosting algorithms are closely related to models of mislabeling in which a label is erroneously exchanged for another. For the two boosting algorithms, theoretical aspects supporting robustness to mislabeling are explored. We apply the two proposed boosting methods to synthetic and real data sets to investigate their performance, focusing on robustness, and confirm the validity of the proposed methods.


Subject(s)
Algorithms; Artificial Intelligence; Neural Networks, Computer; Classification/methods; Models, Statistical; Nonlinear Dynamics