1.
IEEE Trans Image Process ; 32: 2468-2480, 2023.
Article in English | MEDLINE | ID: mdl-37115831

ABSTRACT

Human-object relationship detection reveals the fine-grained relationships between humans and objects, supporting comprehensive video understanding. Previous human-object relationship detection approaches are mainly built on object features and relation features, without exploring human-specific information. In this paper, we propose a novel Relation-Pose Transformer (RPT) for human-object relationship detection. Inspired by the coordination of eye-head-body movements in cognitive science, we employ the head pose to find the crucial objects that humans focus on, and use the body pose, with skeleton information, to represent multiple actions. We then utilize a spatial encoder to capture spatially contextualized information about the relation pair, integrating the relation features and pose features. Next, a temporal decoder models the temporal dependency of the relationship. Finally, we adopt multiple classifiers to predict different types of relationships. Extensive experiments on the Action Genome benchmark validate the effectiveness of our proposed method and show state-of-the-art performance compared with related methods.


Subject(s)
Cognition , Object Attachment , Humans , Benchmarking , Head Movements , Skeleton
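
To make the pipeline above concrete, here is a minimal PyTorch sketch of that kind of architecture: relation and pose features are fused, a spatial (transformer) encoder contextualizes each frame, a temporal decoder models cross-frame dependency, and separate classifiers predict each relationship type. Module sizes, the fusion step, and the per-type classifier split are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn

    FEAT_DIM = 256  # assumed shared feature width

    class RelationPoseSketch(nn.Module):
        def __init__(self, num_classes_per_type=(3, 6, 17)):  # assumed relationship-type sizes
            super().__init__()
            # fuse relation features with head-pose and body-pose features
            self.fuse = nn.Linear(3 * FEAT_DIM, FEAT_DIM)
            # spatial encoder: contextualizes the relation pair within each frame
            self.spatial_encoder = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model=FEAT_DIM, nhead=8, batch_first=True),
                num_layers=2)
            # temporal decoder: models how the relationship evolves across frames
            self.temporal_decoder = nn.TransformerDecoder(
                nn.TransformerDecoderLayer(d_model=FEAT_DIM, nhead=8, batch_first=True),
                num_layers=2)
            # one classifier per relationship type
            self.heads = nn.ModuleList(nn.Linear(FEAT_DIM, n) for n in num_classes_per_type)

        def forward(self, relation_feats, head_pose_feats, body_pose_feats):
            # each input: (batch, num_frames, FEAT_DIM)
            x = self.fuse(torch.cat([relation_feats, head_pose_feats, body_pose_feats], dim=-1))
            spatial = self.spatial_encoder(x)                   # per-frame context
            temporal = self.temporal_decoder(spatial, spatial)  # cross-frame dependency
            return [head(temporal) for head in self.heads]      # per-type relation logits

    # usage with dummy tensors: 2 clips, 8 frames each
    model = RelationPoseSketch()
    logits = model(*(torch.randn(2, 8, FEAT_DIM) for _ in range(3)))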
2.
IEEE Trans Image Process ; 31: 4076-4089, 2022.
Article in English | MEDLINE | ID: mdl-35446767

ABSTRACT

Objects with different orientations are ubiquitous in the real world (e.g., text and hands in scene images, objects in aerial images), and the widely used axis-aligned bounding box does not compactly enclose oriented objects. Arbitrarily-oriented object detection has therefore attracted increasing attention in recent years. In this paper, we propose a novel and effective model for detecting arbitrarily-oriented objects. Instead of directly predicting the angles of oriented bounding boxes, as most existing methods do, we evolve the axis-aligned bounding box into an oriented quadrilateral box with the assistance of dynamically gathered contour information. More specifically, we first obtain the axis-aligned bounding box in an anchor-free manner. We then set key points based on sampled contour points of the axis-aligned bounding box. To improve localization performance, we enrich the feature representations of these key points with a dynamic information gathering mechanism. This technique propagates geometrical and semantic information along the sampled contour points and fuses information from the semantic neighbors of each sampled point, which vary across locations. Finally, we estimate the offsets between the axis-aligned bounding box key points and the oriented quadrilateral box corner points. Extensive experiments on two frequently used aerial image benchmarks, HRSC2016 and DOTA, as well as the scene text/hand datasets ICDAR2015, TD500, and Oxford-Hand, demonstrate the effectiveness and advantages of our proposed model.
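
A minimal PyTorch sketch of the box-evolution idea, under simplifying assumptions: points are sampled along the axis-aligned contour, features are bilinearly sampled at those points, a single self-attention layer stands in for the dynamic information gathering, and a linear layer regresses offsets toward the four quadrilateral corners. The point count, feature width, and attention choice are illustrative, not the paper's exact design.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def sample_contour_points(box, num_points=16):
        # box: tensor [x1, y1, x2, y2]; returns (num_points, 2) points spaced
        # evenly along the axis-aligned contour, starting at the top-left corner.
        x1, y1, x2, y2 = box
        t = torch.linspace(0, 4, num_points + 1)[:-1]  # parameter over the 4 edges
        pts = []
        for s in t:
            e, f = int(s), s - int(s)  # edge index, fraction along that edge
            if e == 0:   pts.append(torch.stack([x1 + f * (x2 - x1), y1]))  # top edge
            elif e == 1: pts.append(torch.stack([x2, y1 + f * (y2 - y1)]))  # right edge
            elif e == 2: pts.append(torch.stack([x2 - f * (x2 - x1), y2]))  # bottom edge
            else:        pts.append(torch.stack([x1, y2 - f * (y2 - y1)]))  # left edge
        return torch.stack(pts)

    class ContourRefiner(nn.Module):
        def __init__(self, feat_dim=64, num_points=16):
            super().__init__()
            # single self-attention layer as a stand-in for dynamic information gathering
            self.attn = nn.MultiheadAttention(feat_dim, num_heads=4, batch_first=True)
            # regress (dx, dy) offsets toward the 4 quadrilateral corners
            self.corner_offsets = nn.Linear(num_points * feat_dim, 8)

        def forward(self, feature_map, points):
            # feature_map: (1, C, H, W); points: (num_points, 2) in pixel coordinates
            _, _, h, w = feature_map.shape
            grid = torch.stack([points[:, 0] / (w - 1) * 2 - 1,  # normalize to [-1, 1]
                                points[:, 1] / (h - 1) * 2 - 1], dim=-1)
            sampled = F.grid_sample(feature_map, grid.view(1, 1, -1, 2),
                                    align_corners=True)      # (1, C, 1, num_points)
            tokens = sampled.squeeze(2).transpose(1, 2)       # (1, num_points, C)
            gathered, _ = self.attn(tokens, tokens, tokens)   # exchange info among contour points
            return self.corner_offsets(gathered.flatten(1))   # (1, 8) corner offsets

    # usage with a dummy feature map and one axis-aligned box
    box = torch.tensor([10.0, 20.0, 90.0, 60.0])
    offsets = ContourRefiner()(torch.randn(1, 64, 128, 128), sample_contour_points(box))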

3.
IEEE Trans Image Process ; 30: 1687-1701, 2021.
Article in English | MEDLINE | ID: mdl-33360990

ABSTRACT

Scene text recognition, the final step of a scene text reading system, has made impressive progress based on deep neural networks. However, existing recognition methods are devoted to handling geometrically regular or irregular scene text and remain limited for scene text in arbitrary orientations. Meanwhile, previous scene text recognizers usually learn single-scale feature representations for characters of varying scales, which cannot model effective contexts for different characters. In this paper, we propose a novel scale-adaptive orientation attention network for arbitrary-orientation scene text recognition, which consists of a dynamic log-polar transformer and a sequence recognition network. Specifically, the dynamic log-polar transformer learns the log-polar origin to adaptively convert arbitrary rotations and scales of scene text into shifts in log-polar space, which helps generate rotation-aware and scale-aware visual representations. The sequence recognition network is an encoder-decoder model that incorporates a novel character-level receptive field attention module to encode more valid contexts for characters of varying scales. The whole architecture can be trained end-to-end, requiring only the word image and its corresponding ground-truth text. Extensive experiments on several public datasets demonstrate the effectiveness and superiority of our proposed method.
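
The core of the log-polar idea can be sketched in a few lines of PyTorch: once an origin is chosen, the image is resampled onto (angle, log-radius) axes, so a rotation or rescaling of the text around that origin becomes a shift in the warped image. The learned origin predictor and the downstream encoder-decoder recognizer are omitted, and the output size is an illustrative assumption.

    import math
    import torch
    import torch.nn.functional as F

    def log_polar_warp(image, origin, out_h=32, out_w=100):
        # image: (1, C, H, W); origin: (x0, y0) in pixel coordinates
        _, _, h, w = image.shape
        x0, y0 = float(origin[0]), float(origin[1])
        max_r = math.hypot(max(x0, w - 1 - x0), max(y0, h - 1 - y0))
        theta = torch.linspace(-math.pi, math.pi, out_h)     # angular axis (rows)
        log_r = torch.linspace(0.0, math.log(max_r), out_w)  # log-radius axis (columns)
        rr, tt = torch.meshgrid(log_r, theta, indexing="xy") # both (out_h, out_w)
        xs = x0 + torch.exp(rr) * torch.cos(tt)              # back to Cartesian coordinates
        ys = y0 + torch.exp(rr) * torch.sin(tt)
        grid = torch.stack([xs / (w - 1) * 2 - 1,            # normalize to [-1, 1] for grid_sample
                            ys / (h - 1) * 2 - 1], dim=-1).unsqueeze(0)
        return F.grid_sample(image, grid, align_corners=True)  # (1, C, out_h, out_w)

    # usage: warp a dummy word image around its center
    warped = log_polar_warp(torch.randn(1, 3, 48, 160), (80.0, 24.0))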
