Search | VHL Regional Portal

1.

CVTrack: Combined Convolutional Neural Network and Vision Transformer Fusion Model for Visual Tracking.

Wang, Jian; Song, Yueming; Song, Ce; Tian, Haonan; Zhang, Shuai; Sun, Jinghui.

Sensors (Basel) ; 24(1)2024 Jan 03.

Article in English | MEDLINE | ID: mdl-38203136

ABSTRACT

Most single-object trackers currently employ either a convolutional neural network (CNN) or a vision transformer as the backbone for object tracking. In CNNs, convolutional operations excel at extracting local features but struggle to capture global representations. On the other hand, vision transformers utilize cascaded self-attention modules to capture long-range feature dependencies but may overlook local feature details. To address these limitations, we propose a target-tracking algorithm called CVTrack, which leverages a parallel dual-branch backbone network combining CNN and Transformer for feature extraction and fusion. Firstly, CVTrack utilizes a parallel dual-branch feature extraction network with CNN and transformer branches to extract local and global features from the input image. Through bidirectional information interaction channels, the local features from the CNN branch and the global features from the transformer branch are able to interact and fuse information effectively. Secondly, deep cross-correlation operations and transformer-based methods are employed to fuse the template and search region features, enabling comprehensive interaction between them. Subsequently, the fused features are fed into the prediction module to accomplish the object-tracking task. Our tracker achieves state-of-the-art performance on five benchmark datasets while maintaining real-time execution speed. Finally, we conduct ablation studies to demonstrate the efficacy of each module in the parallel dual-branch feature extraction backbone network.
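The cross-correlation step that fuses template and search-region features can be illustrated with a minimal pure-Python sketch. It is not the CVTrack implementation: the function names (`cross_correlate`, `depthwise_cross_correlate`, `peak_location`) are hypothetical, the feature maps are assumed to be plain nested lists with one 2D map per channel, and no padding or stride is applied.

```python
def cross_correlate(search, template):
    """Slide `template` over `search` (2D lists, one channel) and return the response map."""
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    response = []
    for y in range(sh - th + 1):
        row = []
        for x in range(sw - tw + 1):
            # Inner product between the template and the window at (y, x).
            score = sum(
                search[y + i][x + j] * template[i][j]
                for i in range(th)
                for j in range(tw)
            )
            row.append(score)
        response.append(row)
    return response


def depthwise_cross_correlate(search_feats, template_feats):
    """Correlate channel by channel, without mixing information across channels."""
    return [cross_correlate(s, t) for s, t in zip(search_feats, template_feats)]


def peak_location(response):
    """Return the (row, col) position of the maximum response."""
    positions = (
        (y, x) for y in range(len(response)) for x in range(len(response[0]))
    )
    return max(positions, key=lambda p: response[p[0]][p[1]])
```

In a tracker of this family, the peak of the response map indicates where the search region most resembles the template; a real system would compute this on learned feature tensors rather than nested lists.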

2.

SPT: Single Pedestrian Tracking Framework with Re-Identification-Based Learning Using the Siamese Model.

Manzoor, Sumaira; An, Ye-Chan; In, Gun-Gyo; Zhang, Yueyuan; Kim, Sangmin; Kuc, Tae-Yong.

Sensors (Basel) ; 23(10)2023 May 19.

Article in English | MEDLINE | ID: mdl-37430819

ABSTRACT

Pedestrian tracking is a challenging task in the area of visual object tracking research and it is a vital component of various vision-based applications such as surveillance systems, human-following robots, and autonomous vehicles. In this paper, we propose a single pedestrian tracking (SPT) framework for identifying each instance of a person across all video frames through a tracking-by-detection paradigm that combines deep learning and metric learning-based approaches. The SPT framework comprises three main modules: detection, re-identification, and tracking. Our contribution is a significant improvement in the results, achieved by designing two compact metric learning-based models using a Siamese architecture in the pedestrian re-identification module and by combining one of the most robust re-identification models for data association with the pedestrian detector in the tracking module. We carried out several analyses to evaluate the performance of our SPT framework for single pedestrian tracking in videos. The results of the re-identification module validate that our two proposed re-identification models surpass existing state-of-the-art models with increased accuracies of 79.2% and 83.9% on the large dataset and 92% and 96% on the small dataset. Moreover, the proposed SPT tracker, along with six state-of-the-art (SOTA) tracking models, was tested on various indoor and outdoor video sequences. A qualitative analysis considering six major environmental factors verifies the effectiveness of our SPT tracker under illumination changes, appearance variations due to pose changes, changes in target position, and partial occlusions. In addition, quantitative analysis based on experimental results also demonstrates that our proposed SPT tracker outperforms the GOTURN, CSRT, KCF, and SiamFC trackers with a success rate of 79.7% while beating the DiamSiamRPN, SiamFC, CSRT, GOTURN, and SiamMask trackers with an average of 18 tracking frames per second.
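A metric learning-based re-identification step of the kind described above can be sketched as a nearest-neighbor match in embedding space. This is only an illustration: the embeddings are assumed to come from some already-trained Siamese branch, and the names `embedding_distance` and `reidentify` as well as the `max_distance` threshold are hypothetical choices, not values from the paper.

```python
import math


def embedding_distance(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def reidentify(target_embedding, candidate_embeddings, max_distance=0.5):
    """Return the index of the closest candidate within `max_distance`, else None.

    `max_distance` is an assumed threshold; in practice it would be tuned
    on validation data.
    """
    best_idx, best_dist = None, max_distance
    for idx, emb in enumerate(candidate_embeddings):
        d = embedding_distance(target_embedding, emb)
        if d <= best_dist:
            best_idx, best_dist = idx, d
    return best_idx
```

Returning `None` when no candidate is close enough lets the tracking module treat the frame as a missed detection instead of forcing a wrong association.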

3.

Visual Object Tracking in First Person Vision.

Dunnhofer, Matteo; Furnari, Antonino; Farinella, Giovanni Maria; Micheloni, Christian.

Int J Comput Vis ; 131(1): 259-283, 2023.

Article in English | MEDLINE | ID: mdl-36624862

ABSTRACT

The understanding of human-object interactions is fundamental in First Person Vision (FPV). Visual tracking algorithms that follow the objects manipulated by the camera wearer can provide useful information to effectively model such interactions. In recent years, the computer vision community has significantly improved the performance of tracking algorithms for a large variety of target objects and scenarios. Despite a few previous attempts to exploit trackers in the FPV domain, a methodical analysis of the performance of state-of-the-art trackers is still missing. This research gap raises the question of whether current solutions can be used "off-the-shelf" or whether more domain-specific investigations should be carried out. This paper aims to provide answers to such questions. We present the first systematic investigation of single object tracking in FPV. Our study extensively analyses the performance of 42 algorithms including generic object trackers and baseline FPV-specific trackers. The analysis is carried out by focusing on different aspects of the FPV setting, introducing new performance measures, and in relation to FPV-specific tasks. The study is made possible through the introduction of TREK-150, a novel benchmark dataset composed of 150 densely annotated video sequences. Our results show that object tracking in FPV poses new challenges to current visual trackers. We highlight the factors causing such behavior and point out possible research directions. Despite their difficulties, we prove that trackers bring benefits to FPV downstream tasks requiring short-term object tracking. We expect that generic object tracking will gain popularity in FPV as new and FPV-specific methodologies are investigated. Supplementary Information: The online version contains supplementary material available at 10.1007/s11263-022-01694-6.

4.

Object Relocation Visual Tracking Based on Histogram Filter and Siamese Network in Intelligent Transportation.

Zhang, Jianlong; Liu, Yifan; Li, Qiao; He, Ci; Wang, Bin; Wang, Tianhong.

Sensors (Basel) ; 22(22)2022 Nov 08.

Article in English | MEDLINE | ID: mdl-36433211

ABSTRACT

Target detection and tracking algorithms are among the key technologies in the field of autonomous driving in intelligent transportation, providing important sensing capabilities for vehicle localization and path planning. Siamese network-based trackers formulate the visual tracking task as an image-matching process with regression and classification branches, which simplifies the network structure and improves the tracking accuracy. However, many problems remain, as described below. (1) Lightweight neural networks reduce the feature representation ability. The tracker easily fails in the presence of distractors (e.g., deformation and similar objects) or large changes in the viewing angle. (2) The tracker cannot adapt to variations of the object. (3) The tracker cannot relocate an object once tracking has failed. To address these issues, we first propose a novel match filter arbiter based on the Euclidean distance histogram between the centers of multiple candidate objects to automatically determine whether the tracker fails. Secondly, the Hopcroft-Karp algorithm is introduced to select the winners from the dynamic template set through the backtracking process, and object relocation is achieved by comparing the Gradient Magnitude Similarity Deviation between the template and the winners. The experiments show that our method obtains better performance on several tracking benchmarks, i.e., OTB100, VOT2018, GOT-10k, and LaSOT, compared with state-of-the-art methods.
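The idea of judging tracker failure from a histogram of Euclidean distances between candidate centers can be sketched as follows. The abstract does not give the actual decision rule, so everything specific here is an assumption: the bin width, the number of bins, the `min_close_fraction` threshold, and the rule that widely scattered candidates signal failure are all hypothetical, as are the function names.

```python
import math
from itertools import combinations


def distance_histogram(centers, bin_width=10.0, num_bins=5):
    """Histogram of pairwise Euclidean distances between candidate centers."""
    hist = [0] * num_bins
    for (x1, y1), (x2, y2) in combinations(centers, 2):
        d = math.hypot(x1 - x2, y1 - y2)
        # Distances beyond the last bin edge are clamped into the last bin.
        idx = min(int(d // bin_width), num_bins - 1)
        hist[idx] += 1
    return hist


def tracker_seems_failed(centers, bin_width=10.0, num_bins=5, min_close_fraction=0.5):
    """Assumed rule: flag failure when too few candidate pairs lie in the nearest bin."""
    hist = distance_histogram(centers, bin_width, num_bins)
    total = sum(hist)
    if total == 0:
        return False  # fewer than two candidates, nothing to compare
    return hist[0] / total < min_close_fraction
```

Tightly clustered candidates put most pairs in the first bin and are treated as a consistent prediction; scattered candidates push mass into the far bins and trigger the failure flag under this assumed rule.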

Subject(s)

Algorithms; Neural Networks, Computer; Transportation

5.

Deep learning methods for inverse problems.

Kamyab, Shima; Azimifar, Zohreh; Sabzi, Rasool; Fieguth, Paul.

PeerJ Comput Sci ; 8: e951, 2022.

Article in English | MEDLINE | ID: mdl-35634121

ABSTRACT

In this paper we investigate a variety of deep learning strategies for solving inverse problems. We classify existing deep learning solutions for inverse problems into three categories: Direct Mapping, Data Consistency Optimizer, and Deep Regularizer. We choose a sample of each inverse problem type, so as to compare the robustness of the three categories, and report a statistical analysis of their differences. We perform extensive experiments on the classic problem of linear regression and on three well-known inverse problems in computer vision, namely image denoising, 3D human face inverse rendering, and object tracking, in the presence of noise and outliers; these are selected as representative prototypes for each class of inverse problems. The overall results and the statistical analyses show that the robustness of the solution categories depends on the type of inverse problem domain, and specifically on whether or not the problem includes measurement outliers. Based on our experimental results, we conclude by proposing the most robust solution category for each inverse problem class.

6.

Optimal Training Configurations of a CNN-LSTM-Based Tracker for a Fall Frame Detection System.

Mohamed, Nur Ayuni; Zulkifley, Mohd Asyraf; Ibrahim, Ahmad Asrul; Aouache, Mustapha.

Sensors (Basel) ; 21(19)2021 Sep 28.

Article in English | MEDLINE | ID: mdl-34640803

ABSTRACT

In recent years, there has been an immense amount of research into fall event detection. Generally, a fall event is defined as a situation in which a person unintentionally drops down onto a lower surface. It is crucial to detect the occurrence of fall events as early as possible so that any severe fall consequences can be minimized. Nonetheless, a fall event is a sporadic incident that occurs seldom and is often falsely detected due to the wide range of fall conditions and situations. Therefore, an automated fall frame detection system, referred to as SmartConvFall, is proposed to detect the exact fall frame in a video sequence. It is crucial to know the exact fall frame, as it dictates the response time of the system to administer early treatment and reduce the fall's negative consequences and related injuries. Hence, searching for the optimal training configurations is imperative to ensure that the main goal of SmartConvFall is achieved. The proposed SmartConvFall consists of two parts, object tracking and instantaneous fall frame detection modules, both of which rely on deep learning representations. The first stage tracks the object of interest using a fully convolutional neural network (CNN) tracker. Various training configurations such as optimizer, learning rate, mini-batch size, number of training samples, and region of interest are individually evaluated to determine the configuration that produces the best tracker model. Meanwhile, the goal of the second module is to determine the exact instantaneous fall frame by modeling the continuous object trajectories using a Long Short-Term Memory (LSTM) network. Similarly, the LSTM model undergoes various training configurations that cover different types of feature selection and numbers of stacked layers. The exact instantaneous fall frame is determined under the assumption that a large movement difference with respect to the ground level along the vertical axis can be observed when a fall incident happens. SmartConvFall is a novel technique, as most existing methods still rely on detection rather than a tracking module. SmartConvFall outperforms the state-of-the-art trackers, namely the TCNN and MDNET-N trackers, with the highest expected average overlap, robustness, and reliability metrics of 0.1619, 0.6323, and 0.7958, respectively. SmartConvFall also produces the lowest number of tracking failures, with only 43 occasions. Moreover, a three-stack LSTM delivers the lowest mean error, with a delay of approximately one second in locating the exact instantaneous fall frame. Therefore, the proposed SmartConvFall has demonstrated its potential and suitability for real-time applications that could help avoid critical fall consequences such as death and internal bleeding, provided that early treatment can be administered.
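The assumption that a fall shows up as a large vertical movement of the tracked object can be illustrated with a deliberately simple heuristic. This is not the LSTM-based module from the paper: it only picks the frame with the largest downward displacement of the tracked center. The function name `locate_fall_frame`, the `min_drop` parameter, and the convention that y grows toward the ground (image coordinates) are assumptions.

```python
def locate_fall_frame(y_positions, min_drop=0.0):
    """Return the index of the frame with the largest downward move, or None.

    y_positions: per-frame vertical coordinate of the tracked object center,
    assumed to be in image coordinates where y increases toward the ground.
    min_drop: assumed minimum displacement (in pixels) that counts as a fall.
    """
    best_frame, best_drop = None, min_drop
    for frame in range(1, len(y_positions)):
        # Positive values mean the object moved toward the ground.
        drop = y_positions[frame] - y_positions[frame - 1]
        if drop > best_drop:
            best_frame, best_drop = frame, drop
    return best_frame
```

For example, `locate_fall_frame([100, 101, 100, 102, 160, 170])` selects frame 4, where the center drops by 58 pixels. A learned sequence model would replace this rule when trajectories are noisy.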

Subject(s)

Movement; Neural Networks, Computer; Humans; Reproducibility of Results

7.

Real-Time Multiobject Tracking Based on Multiway Concurrency.

Gong, Xuan; Le, Zichun; Wu, Yukun; Wang, Hui.

Sensors (Basel) ; 21(3)2021 Jan 20.

Article in English | MEDLINE | ID: mdl-33498327

ABSTRACT

This paper explores a pragmatic approach to studying the real-time performance of a multiway concurrent multiobject tracking (MOT) system. At present, most research has focused on the tracking of single-image sequences, but in practical applications, multiway video streams need to be processed in parallel by MOT systems. There have been few studies on the real-time performance of multiway concurrent MOT systems. In this paper, we propose a new MOT framework to handle multiway concurrency scenarios based on a tracking-by-detection (TBD) model. The new framework mainly focuses on concurrency and real-time operation under limited computing and storage resources, while considering the algorithm performance. For the former, three aspects were studied: (1) Expanded width and depth of the tracking-by-detection model. In terms of width, the MOT system can process multiway video sequences at the same time; in terms of depth, image collectors and bounding box collectors were introduced to support batch processing. (2) Considering the real-time performance and multiway concurrency ability, we propose a real-time MOT algorithm based on directly driven detection. (3) System-level optimization: we also used the inference optimization features of NVIDIA TensorRT to accelerate the deep neural network (DNN) in the tracking algorithm. To maintain algorithm performance, a negative sample (false detection sample) filter was designed to ensure tracking accuracy. Meanwhile, the factors that affect the system's real-time performance and concurrency were studied. The experimental results showed that our method performs well in processing multiple concurrent real-time video streams.
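The role of an image collector that supports batch processing across several video streams can be sketched as below. This is a hedged illustration, not the authors' framework: the class `ImageCollector`, the helper `split_by_stream`, and the fixed `batch_size` policy are hypothetical, and the detector that would consume each batch is left out.

```python
from collections import deque


class ImageCollector:
    """Gathers frames from several streams into fixed-size batches (hypothetical helper)."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.pending = deque()

    def add(self, stream_id, frame):
        """Queue one frame; return a full batch when ready, otherwise None."""
        self.pending.append((stream_id, frame))
        if len(self.pending) >= self.batch_size:
            return [self.pending.popleft() for _ in range(self.batch_size)]
        return None


def split_by_stream(batch_results):
    """Route batched (stream_id, result) pairs back to per-stream lists, keeping order."""
    routed = {}
    for stream_id, result in batch_results:
        routed.setdefault(stream_id, []).append(result)
    return routed
```

Tagging every frame with its stream identifier is what allows a single batched inference call to serve several streams while each per-stream tracker still receives only its own detections.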
