Results 1 - 12 of 12
1.
Article in English | MEDLINE | ID: mdl-38607713

ABSTRACT

In learning from crowds, the annotations of training data are obtained through crowd-sourcing services: multiple annotators each complete their own small part of the annotations, and labeling mistakes that depend on the annotator occur frequently. Modeling the label-noise generation process with a noise transition matrix is a powerful tool for tackling label noise. In real-world crowd-sourcing scenarios, noise transition matrices are both annotator- and instance-dependent. However, due to the high complexity of annotator- and instance-dependent transition matrices (AIDTM), annotation sparsity, i.e., each annotator labels only a tiny fraction of the instances, makes modeling AIDTM very challenging. Without prior knowledge, existing works simplify the problem by assuming the transition matrix is instance-independent or by using simple parametric forms, which sacrifices modeling generality. Motivated by this, we target a more realistic problem: estimating general AIDTM in practice. Without losing modeling generality, we parameterize AIDTM with deep neural networks. To alleviate the modeling challenge, we assume that every annotator shares its noise pattern with similar annotators, and estimate AIDTM via knowledge transfer. We hence first model the mixture of noise patterns over all annotators, and then transfer this modeling to individual annotators. Furthermore, since the transfer from the mixture of noise patterns to individuals may cause two annotators with highly different noise generation to perturb each other, we employ knowledge transfer between identified neighboring annotators to calibrate the modeling. Theoretical analyses show that both the knowledge transfer from the global mixture to individuals and the knowledge transfer between neighboring individuals effectively help mitigate the challenge of modeling general AIDTM. Experiments confirm the superiority of the proposed approach on synthetic and real-world crowd-sourcing data.
The implementation is available at https://github.com/tmllab/TAIDTM.
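As a minimal sketch of the underlying idea (not the paper's deep-network estimator or its knowledge-transfer steps), the following shows how a noise transition matrix enters training via standard forward loss correction; the 3-class matrices and posterior below are purely illustrative:

```python
import numpy as np

def forward_corrected_nll(probs, noisy_label, T):
    """Forward correction: map the model's clean-class posterior `probs`
    through the transition matrix T (T[i, j] = P(noisy=j | clean=i))
    before taking the negative log-likelihood of the observed noisy label."""
    noisy_probs = probs @ T  # predicted distribution over *noisy* labels
    return -np.log(noisy_probs[noisy_label])

# An annotator/instance-dependent setup would use a different T per
# (annotator, instance) pair; here we contrast two fixed annotators.
T_clean = np.eye(3)                               # perfect annotator
T_noisy = np.full((3, 3), 0.1) + 0.7 * np.eye(3)  # rows sum to 1: 0.8 on the diagonal

probs = np.array([0.7, 0.2, 0.1])  # model's posterior over clean classes
loss_clean = forward_corrected_nll(probs, 0, T_clean)
loss_noisy = forward_corrected_nll(probs, 0, T_noisy)
```

Under the noisy annotator the corrected likelihood of the observed label is diluted by the off-diagonal mass, so the loss is larger than under the perfect annotator for the same posterior.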

2.
IEEE Trans Image Process ; 33: 3047-3058, 2024.
Article in English | MEDLINE | ID: mdl-38656838

ABSTRACT

Detecting infrared small targets against cluttered backgrounds is mainly challenged by dim textures, low contrast, and varying shapes. This paper proposes an approach that facilitates infrared small target detection by learning contrast-enhanced, shape-biased representations. The approach cascades a contrast-shape encoder and a shape-reconstructable decoder to learn discriminative representations that can effectively identify target objects. The contrast-shape encoder applies a stem of central difference convolutions and a few large-kernel convolutions to extract shape-preserving features from input infrared images. This specific convolutional design effectively overcomes the challenges of low contrast and varying shapes in a unified way. Meanwhile, the shape-reconstructable decoder accepts the edge map of the input infrared image and is learned by simultaneously optimizing two shape-related consistencies: the internal one decodes the encoder representations via upsampling reconstruction and constrains segmentation consistency, whilst the external one cascades three gated ResNet blocks to hierarchically fuse edge maps with decoder representations and constrains contour consistency. This decoding strategy bypasses the challenges of dim textures and varying shapes. In our approach, the encoder and decoder are learned end-to-end, and the resulting shape-biased encoder representations are well suited to identifying infrared small targets. Extensive experimental evaluations on public benchmarks demonstrate the effectiveness of our approach.
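A central difference convolution blends a vanilla convolution with a term that subtracts the centre pixel, biasing responses toward local intensity differences. The single-channel, valid-padding sketch below illustrates that behaviour under stated simplifications (the paper's stem operates on learned multi-channel features; `theta` is the usual blending weight):

```python
import numpy as np

def central_difference_conv2d(img, kernel, theta=0.7):
    """Central difference convolution (single channel, valid padding):
    response = vanilla conv - theta * centre_pixel * sum(kernel).
    With theta > 0, flat regions are suppressed and edges are emphasised."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    wsum = kernel.sum()
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = img[i:i + kh, j:j + kw]
            vanilla = (patch * kernel).sum()
            centre = patch[kh // 2, kw // 2]
            out[i, j] = vanilla - theta * centre * wsum
    return out

# On a constant image with theta = 1 the difference term exactly cancels
# the vanilla response, so flat background produces zero output.
flat = np.ones((5, 5))
k = np.ones((3, 3)) / 9.0
resp = central_difference_conv2d(flat, k, theta=1.0)
```

This cancellation on flat regions is what gives the operator its contrast-enhancing, shape-preserving bias.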

3.
IEEE Trans Image Process ; 33: 1149-1161, 2024.
Article in English | MEDLINE | ID: mdl-38300775

ABSTRACT

The composed query image retrieval task aims to retrieve a target image from the database using a query that combines two modalities: a reference image and a sentence declaring that some details of the reference image should be modified or replaced by new elements. Tackling this task requires learning a multimodal embedding space that pulls semantically similar queries and targets close together while pushing dissimilar ones as far apart as possible. Most existing methods start from the perspective of model structure and design clever interaction modules to promote better fusion and embedding of the different modalities. However, their learning objectives use conventional query-level examples as negatives while neglecting the composed query's multimodal characteristics, leading to inadequate utilization of the training data and a suboptimal metric space. To this end, this paper proposes to improve the learning objective by constructing and mining hard negative examples from the perspective of multimodal fusion. Specifically, we compose the reference image with logically unpaired sentences, rather than paired ones, to create component-level negative examples that make better use of the data and enhance the optimization of the metric space. In addition, we propose a new sentence augmentation method that generates even less distinguishable multimodal negative examples at the element level, helping the model learn a better metric space. Extensive comparison experiments on four real-world datasets confirm the effectiveness of the proposed method.
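The component-level negative idea can be sketched with a toy triplet objective: the same reference image is fused with an unpaired sentence to build a hard negative. Everything below is illustrative (random 8-d embeddings, additive fusion standing in for the paper's fusion module):

```python
import numpy as np

def triplet_loss(query, pos, neg, margin=0.2):
    """Hinge triplet loss with cosine distance on L2-normalised embeddings."""
    def d(a, b):
        a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)
        return 1.0 - a @ b
    return max(0.0, d(query, pos) - d(query, neg) + margin)

def compose(img_emb, txt_emb):
    """Toy multimodal fusion; simple addition stands in for a learned module."""
    return img_emb + txt_emb

rng = np.random.default_rng(0)
ref_img, paired_txt, unpaired_txt = rng.normal(size=(3, 8))

query = compose(ref_img, paired_txt)          # reference image + its paired sentence
target = compose(ref_img, paired_txt)         # stand-in target embedding
# Component-level hard negative: same reference image, logically unpaired sentence.
hard_negative = compose(ref_img, unpaired_txt)

loss = triplet_loss(query, target, hard_negative)
```

Because the negative shares the image component with the query, it is harder to separate than an arbitrary query-level negative, which is exactly what sharpens the metric space.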

4.
IEEE Trans Pattern Anal Mach Intell ; 45(7): 9186-9205, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37015650

ABSTRACT

Unmanned aerial vehicle (UAV) tracking is of great significance for a wide range of applications, such as delivery and agriculture. Previous benchmarks in this area mainly focused on small-scale tracking problems while ignoring the amount of data, the types of data modalities, the diversity of target categories and scenarios, and the evaluation protocols involved, greatly obscuring the potential of deep UAV tracking. In this article, we propose WebUAV-3M, the largest public UAV tracking benchmark to date, to facilitate both the development and evaluation of deep UAV trackers. WebUAV-3M contains over 3.3 million frames across 4,500 videos and offers 223 highly diverse target categories. Each video is densely annotated with bounding boxes by an efficient and scalable semi-automatic target annotation (SATA) pipeline. Importantly, to exploit the complementary strengths of language and audio, we enrich WebUAV-3M by providing both natural language specifications and audio descriptions. We believe these additions will greatly boost future research on exploring language features and audio cues for multi-modal UAV tracking. In addition, a fine-grained UAV tracking-under-scenario constraint (UTUSC) evaluation protocol and seven challenging scenario subtest sets are constructed to enable the community to develop, adapt, and evaluate various types of advanced trackers. We provide extensive evaluations and detailed analyses of 43 representative trackers and envision future research directions in deep UAV tracking and beyond. The dataset, toolkits, and baseline results are available at https://github.com/983632847/WebUAV-3M.

5.
Article in English | MEDLINE | ID: mdl-37027685

ABSTRACT

Beyond high accuracy, good interpretability is critical when deploying a face forgery detection model for visual content analysis. In this paper, we propose learning patch-channel correspondence to facilitate interpretable face forgery detection. Patch-channel correspondence aims to transform the latent features of a facial image into multi-channel interpretable features, where each channel mainly encodes a corresponding facial patch. Toward this end, our approach embeds a feature reorganization layer into a deep neural network and simultaneously optimizes the classification task and the correspondence task via alternating optimization. The correspondence task accepts multiple zero-padded facial patch images and represents them as channel-aware interpretable representations. The task is solved by step-wise learning of channel-wise decorrelation and patch-channel alignment. Channel-wise decorrelation decouples latent features into class-specific discriminative channels to reduce feature complexity and channel correlation, while patch-channel alignment then models the pairwise correspondence between feature channels and facial patches. In this way, the learned model can automatically discover salient features associated with potential forgery regions during inference, providing discriminative localization of visualized evidence for face forgery detection while maintaining high detection accuracy. Extensive experiments on popular benchmarks clearly demonstrate the effectiveness of the proposed approach in interpreting face forgery detection without sacrificing accuracy. The source code is available at https://github.com/Jae35/IFFD.
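Channel-wise decorrelation is commonly enforced by penalising the off-diagonal entries of the channel covariance matrix. The sketch below shows one standard form of such a penalty (the paper's exact loss and alternating schedule are not reproduced):

```python
import numpy as np

def decorrelation_loss(features):
    """Sum of squared off-diagonal channel covariances.
    `features` has shape (num_samples, num_channels); a low value means
    the channels carry pairwise-decorrelated information."""
    centred = features - features.mean(axis=0, keepdims=True)
    cov = centred.T @ centred / (len(features) - 1)
    off_diag = cov - np.diag(np.diag(cov))
    return (off_diag ** 2).sum()

rng = np.random.default_rng(0)
independent = rng.normal(size=(1000, 4))                # nearly decorrelated channels
duplicated = np.repeat(independent[:, :1], 4, axis=1)   # fully redundant channels

low = decorrelation_loss(independent)
high = decorrelation_loss(duplicated)
```

Redundant channels (each a copy of the first) incur a much larger penalty than independent ones, which is the pressure that drives each channel toward encoding its own facial patch.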

6.
Article in English | MEDLINE | ID: mdl-37015525

ABSTRACT

While deep models have proved successful at learning rich knowledge from massive well-annotated data, they may pose a privacy-leakage risk in practical deployment. It is necessary to find an effective trade-off between high utility and strong privacy. In this work, we propose a discriminative-generative distillation approach to learn privacy-preserving deep models. Our key idea is to use models as a bridge to distill knowledge from private data and then transfer it to a student network via two streams. First, the discriminative stream trains a baseline classifier on the private data and an ensemble of teachers on multiple disjoint private subsets. Then, the generative stream takes the classifier as a fixed discriminator and trains a generator in a data-free manner. The generator is then used to produce massive synthetic data, which are further applied to train a variational autoencoder (VAE). Among these synthetic data, a few are fed into the teacher ensemble to query labels via differentially private aggregation, while most are embedded into the trained VAE for reconstructing synthetic data. Finally, semi-supervised student learning is performed to simultaneously handle two tasks: knowledge transfer from the teachers, with distillation on the few privately labeled synthetic data, and knowledge enhancement, with tangent-normal adversarial regularization on many triples of reconstructed synthetic data. In this way, our approach can control the query cost over private data and mitigate accuracy degradation in a unified manner, leading to a privacy-preserving student model. Extensive experiments and analysis clearly show the effectiveness of the proposed approach.
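The differentially private aggregation step can be sketched with the classic noisy-max mechanism (as in PATE-style teacher ensembles): per-class vote counts are perturbed with Laplace noise before taking the arg-max. The ensemble size, class count, and epsilon below are illustrative:

```python
import numpy as np

def dp_aggregate(teacher_votes, num_classes, eps=1.0, rng=None):
    """Noisy-max aggregation: add Laplace(1/eps) noise to the per-class
    vote counts of the teacher ensemble, then return the arg-max label.
    The noise hides any single teacher's (hence any single private
    sample's) influence on the released label."""
    rng = rng or np.random.default_rng()
    counts = np.bincount(teacher_votes, minlength=num_classes).astype(float)
    counts += rng.laplace(scale=1.0 / eps, size=num_classes)
    return int(np.argmax(counts))

rng = np.random.default_rng(0)
votes = np.array([2, 2, 2, 2, 2, 2, 2, 2, 1, 0])  # 8 of 10 teachers agree on class 2
label = dp_aggregate(votes, num_classes=3, eps=5.0, rng=rng)
```

With a strong consensus the noisy winner almost surely matches the majority vote; each such query consumes privacy budget, which is why only a few synthetic samples are labeled this way.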

7.
IEEE Trans Neural Netw Learn Syst ; 32(3): 1276-1288, 2021 Mar.
Article in English | MEDLINE | ID: mdl-32305944

ABSTRACT

Recent deep trackers have shown superior performance in visual tracking. In this article, we propose a cascaded correlation refinement approach to improve the robustness of deep tracking. The core idea is to address accurate target localization and reliable model update in a collaborative way. To this end, our approach cascades multiple stages of correlation refinement to progressively refine target localization. The localized object can then be used to learn an accurate on-the-fly model that improves the reliability of model update. Meanwhile, we introduce an explicit measure to identify tracking failure and then leverage a simple yet effective look-back scheme to adaptively combine the initial model and the on-the-fly model when updating the tracking model. As a result, the tracking model can localize the target more accurately. Extensive experiments on OTB2013, OTB2015, VOT2016, VOT2018, UAV123, and GOT-10k demonstrate that the proposed tracker achieves the best robustness among state-of-the-art trackers.
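The look-back idea can be illustrated with a toy model-update rule: when a confidence measure signals likely failure, the update pulls back toward the initial model instead of trusting the (possibly drifted) on-the-fly model. The blending weights, threshold, and rate below are hypothetical, not the paper's values:

```python
import numpy as np

def look_back_update(initial_model, current_model, confidence,
                     threshold=0.3, rate=0.1):
    """Toy look-back update for a linear tracking model (a weight vector).
    On detected failure (low confidence) fall back toward the initial
    model; otherwise perform the usual small-rate blend."""
    if confidence < threshold:                       # failure detected
        return 0.5 * initial_model + 0.5 * current_model
    return (1 - rate) * current_model + rate * initial_model

init = np.array([1.0, 0.0])      # model learned on the first frame
drifted = np.array([0.0, 1.0])   # on-the-fly model after drift

recovered = look_back_update(init, drifted, confidence=0.1)  # pull back hard
steady = look_back_update(init, drifted, confidence=0.9)     # normal update
```

The failure branch halves the drift in one step, whereas the normal branch keeps the on-the-fly model dominant; the real tracker couples this with its explicit failure measure.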

8.
Article in English | MEDLINE | ID: mdl-32324551

ABSTRACT

A key to person re-identification is achieving consistent local details for discriminative representation across variable environments. Current stripe-based feature learning approaches have delivered impressive accuracy, but they do not make a proper trade-off between diversity, locality, and robustness, and easily suffer from part semantic inconsistency caused by the conflict between rigid partition and misalignment. This paper proposes a receptive multi-granularity learning approach to stripe-based feature learning. The approach performs local partition on the intermediate representations to operate on receptive region ranges, rather than on input images or output features as current approaches do, and thus can enhance the representation of locality while retaining proper local association. Toward this end, the local partitions are adaptively pooled using significance-balanced activations for uniform stripes. Random shifting augmentation is further introduced to increase the variance of person-appearing regions within bounding boxes and thereby ease misalignment. With a two-branch network architecture, discriminative identity representations can be learned at different scales. In this way, our model provides a more comprehensive and efficient feature representation without larger model storage costs. Extensive intra-dataset and cross-dataset evaluations demonstrate the effectiveness of the proposed approach. In particular, our approach achieves a state-of-the-art accuracy of 96.2%@Rank-1 and 90.0%@mAP on the challenging Market-1501 benchmark.
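Stripe-based pooling can be sketched as follows: a feature map is split into uniform horizontal stripes and each stripe is pooled with activation-weighted (softmax) pooling, so strongly activated locations dominate. This is a simplified stand-in for the paper's significance-balanced pooling, with illustrative shapes:

```python
import numpy as np

def stripe_pool(feature_map, num_stripes=4):
    """Split a (H, W, C) feature map into horizontal stripes and pool
    each one with softmax weights over per-location activation energy,
    yielding one C-dim descriptor per stripe."""
    H, W, C = feature_map.shape
    assert H % num_stripes == 0, "uniform stripes require H divisible"
    h = H // num_stripes
    descriptors = []
    for s in range(num_stripes):
        stripe = feature_map[s * h:(s + 1) * h].reshape(-1, C)  # (h*W, C)
        sig = stripe.sum(axis=1)                 # per-location significance
        w = np.exp(sig - sig.max())
        w /= w.sum()                             # softmax weights
        descriptors.append((w[:, None] * stripe).sum(axis=0))
    return np.stack(descriptors)                 # (num_stripes, C)

feat = np.random.default_rng(0).random((8, 4, 16))
desc = stripe_pool(feat, num_stripes=4)
```

Each stripe descriptor is a convex combination of its locations' features, so it stays inside the activation range while emphasising the most significant positions.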

9.
Article in English | MEDLINE | ID: mdl-31714225

ABSTRACT

Deep trackers have proven successful in visual tracking. Typically, these trackers employ optimally pre-trained deep networks to represent all diverse objects with multi-channel features from some fixed layers. The employed networks are usually trained to extract rich knowledge from the massive data used in object classification, so they are capable of representing generic objects very well. However, these networks are too complex to represent a specific moving object, leading to poor generalization as well as high computational and memory costs. This paper presents a novel and general framework, termed channel distillation, to facilitate deep trackers. To validate its effectiveness, we take the discriminative correlation filter (DCF) and ECO as examples. We demonstrate that an integrated formulation can turn feature compression, response map generation, and model update into a unified energy minimization problem that adaptively selects informative feature channels to improve the efficacy of tracking moving objects on the fly. Channel distillation accurately extracts good channels, alleviating the influence of noisy channels and generally reducing the number of channels, while adaptively generalizing to different channels and networks. The resulting deep tracker is accurate, fast, and has low memory requirements. Extensive experimental evaluations on popular benchmarks clearly demonstrate the effectiveness and generalizability of our framework.
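As a crude stand-in for the paper's energy-minimisation channel selection, the sketch below ranks feature channels by activation energy and keeps only the top fraction, which is the compression effect channel distillation aims for (the ranking criterion and ratio are illustrative assumptions):

```python
import numpy as np

def distill_channels(features, keep_ratio=0.25):
    """Rank channels of a (num_channels, H, W) feature tensor by their
    L2 activation energy and keep the top fraction, discarding silent or
    noisy channels to shrink the representation."""
    energy = np.linalg.norm(features.reshape(len(features), -1), axis=1)
    keep = max(1, int(len(features) * keep_ratio))
    top = np.sort(np.argsort(energy)[::-1][:keep])  # indices of kept channels
    return top, features[top]

feats = np.zeros((8, 5, 5))
feats[2] = 3.0   # two strong channels among mostly silent ones
feats[5] = 2.0
idx, compact = distill_channels(feats, keep_ratio=0.25)
```

Keeping 2 of 8 channels here reduces memory and compute fourfold while retaining all of the non-zero activation energy; the real framework selects channels jointly with response generation and model update.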

10.
Article in English | MEDLINE | ID: mdl-31613767

ABSTRACT

The performance of video saliency estimation techniques has achieved significant advances along with the rapid development of Convolutional Neural Networks (CNNs). However, devices like cameras and drones may have limited computational capability and storage space so that the direct deployment of complex deep saliency models becomes infeasible. To address this problem, this paper proposes a dynamic saliency estimation approach for aerial videos via spatiotemporal knowledge distillation. In this approach, five components are involved, including two teachers, two students and the desired spatiotemporal model. The knowledge of spatial and temporal saliency is first separately transferred from the two complex and redundant teachers to their simple and compact students, while the input scenes are also degraded from high-resolution to low-resolution to remove the probable data redundancy so as to greatly speed up the feature extraction process. After that, the desired spatiotemporal model is further trained by distilling and encoding the spatial and temporal saliency knowledge of two students into a unified network. In this manner, the inter-model redundancy can be removed for the effective estimation of dynamic saliency on aerial videos. Experimental results show that the proposed approach is comparable to 11 state-of-the-art models in estimating visual saliency on aerial videos, while its speed reaches up to 28,738 FPS and 1,490.5 FPS on the GPU and CPU platforms, respectively.
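Distilling one student toward two teachers can be sketched as a weighted sum of per-teacher regression losses; the MSE form, equal weighting, and tiny maps below are illustrative simplifications of the paper's spatiotemporal encoding network:

```python
import numpy as np

def spatiotemporal_kd_loss(student_map, spatial_teacher, temporal_teacher,
                           alpha=0.5):
    """Distill a single student saliency map toward both a spatial and a
    temporal teacher map via MSE, mixed with weight alpha."""
    mse = lambda a, b: float(((a - b) ** 2).mean())
    return (alpha * mse(student_map, spatial_teacher)
            + (1 - alpha) * mse(student_map, temporal_teacher))

spatial = np.array([[0.9, 0.1], [0.1, 0.0]])   # spatial teacher's saliency
temporal = np.array([[0.8, 0.2], [0.0, 0.1]])  # temporal teacher's saliency
student = 0.5 * (spatial + temporal)           # midpoint fits both teachers

loss_mid = spatiotemporal_kd_loss(student, spatial, temporal)
loss_far = spatiotemporal_kd_loss(np.zeros((2, 2)), spatial, temporal)
```

A student near both teachers incurs a much smaller loss than one that ignores them, which is the gradient signal that fuses the two streams into one compact spatiotemporal model.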

11.
Article in English | MEDLINE | ID: mdl-30507531

ABSTRACT

Typically, the deployment of face recognition models in the wild needs to identify low-resolution faces with extremely low computational cost. A feasible solution is to compress a complex face model, achieving higher speed and lower memory at the cost of a minimal performance drop. Inspired by this, this paper proposes a learning approach to recognize low-resolution faces via selective knowledge distillation. In this approach, a two-stream convolutional neural network (CNN) is first initialized to recognize high-resolution faces and resolution-degraded faces with a teacher stream and a student stream, respectively. The teacher stream is a complex CNN for high-accuracy recognition, and the student stream is a much simpler CNN for low-complexity recognition. To avoid a significant performance drop at the student stream, we then selectively distill the most informative facial features from the teacher stream by solving a sparse graph optimization problem; these features are used to regularize the fine-tuning of the student stream. In this way, the student stream is trained to simultaneously handle two tasks with limited computational resources: approximating the most informative facial cues via feature regression, and recovering the missing facial cues via low-resolution face classification. Experimental results show that the student stream performs impressively in recognizing low-resolution faces while costing only 0.15 MB of memory and running at 418 faces per second on a CPU and 9,433 faces per second on a GPU.
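The selective feature-regression term can be sketched as MSE computed only over a chosen subset of teacher feature dimensions. The boolean mask below is given by hand for illustration; in the paper it comes from the sparse graph optimization:

```python
import numpy as np

def selective_distillation_loss(student_feats, teacher_feats, selected):
    """Regress student features toward the teacher only on the dimensions
    marked informative by the boolean mask `selected`; other dimensions
    are left unconstrained."""
    diff = (student_feats - teacher_feats)[:, selected]
    return float((diff ** 2).mean())

teacher = np.array([[1.0, 2.0, 3.0, 4.0]])
student = np.array([[1.0, 0.0, 3.0, 0.0]])
mask_informative = np.array([True, False, True, False])  # student matches these
mask_all = np.array([True, True, True, True])

loss_sel = selective_distillation_loss(student, teacher, mask_informative)
loss_all = selective_distillation_loss(student, teacher, mask_all)
```

Selecting only the informative dimensions lets a small student match what matters (zero loss here) instead of being penalised for every teacher dimension it cannot afford to reproduce.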

12.
Funct Plant Biol ; 43(5): 393-402, 2016 May.
Article in English | MEDLINE | ID: mdl-32480470

ABSTRACT

By analysing a cDNA microarray of the salt-tolerant wheat mutant RH8706-49 under salinity stress, we identified an expressed sequence tag fragment and obtained an unknown gene (designated TaBAG) containing a BAG conserved domain through in silico cloning and RT-PCR. The gene was registered in GenBank (No. FJ599765). After homologous alignment analysis, in silico cloning, and RT-PCR amplification, a second gene with a BAG conserved domain, TaBAG2, was obtained and registered in GenBank (No. GU471210). Quantitative PCR analysis demonstrated that TaBAG2 expression was induced by both salinity and heat stress, whereas TaBAG expression increased markedly under salinity stress but showed an insignificant response to heat stress. Stress tolerance assays showed that Arabidopsis overexpressing TaBAG or TaBAG2 exhibited a clear increase in salt tolerance. Under heat stress, Arabidopsis overexpressing TaBAG2 showed increased heat tolerance, whereas the heat tolerance of Arabidopsis overexpressing TaBAG did not vary significantly. Subcellular localisation showed that the TaBAGs were mainly located in the cytoplasm and the cell nucleus. Using fluorescence complementation and the yeast two-hybrid technique, we showed that TaBAG2 clearly binds TaHsp70 and TaCaMs. After separately mutating the highly conserved aspartic acid (D) and arginine (R) residues in the BAG domain of TaBAG2, the interaction between TaBAG2 and TaHsp70 disappeared, indicating that these two amino acids are key loci for the TaBAG2-TaHsp70 interaction. Heat tolerance assays demonstrated that Arabidopsis co-transformed with TaBAG2 and TaHsp70 was much more heat tolerant than Arabidopsis overexpressing TaBAG2 or TaHSP70 alone. This finding implies that the synergistic use of TaBAG2 and TaHSP70 can improve the heat tolerance of plants.
