Avoiding duplications in person detection across video frames
Date
2025
Authors
Lohanathen, S.P.
Abstract
Person re-identification (Re-ID) is a cornerstone of modern video surveillance and smart-city applications. It requires reliably matching pedestrian images across non-overlapping cameras despite variations in pose, lighting, background clutter, and occlusion. This work presents a Re-ID system built around a ResNet-50 backbone augmented with multi-level attention and part-aware Transformer encoding. The network first extracts deep feature maps from pedestrian images and then refines them with a channel-wise squeeze-and-excitation block and a spatial attention module. By adaptively weighting feature channels and spatial locations, these attention layers suppress background clutter and highlight discriminative cues such as clothing textures and carried objects. To capture structural dependencies across body regions, the attention-refined feature map is partitioned into horizontal strips corresponding to semantic parts (head, torso, legs). Each strip is passed to a lightweight Transformer encoder that dynamically models inter-part relationships, making identification robust to pose variation and partial occlusion. Training is stabilized and accelerated by mixed-precision optimization with automatic gradient scaling and gradient clipping, together with a label-smoothed cross-entropy loss that mitigates overconfident predictions. A two-stage learning-rate schedule, a brief linear warm-up followed by cosine-annealing decay, provides rapid initial convergence while avoiding divergence. At inference, global descriptors are extracted and pairwise distances between query and gallery descriptors are computed to evaluate mean average precision (mAP) and Rank-1 accuracy on the Market-1501 benchmark. Empirical results show that the architecture achieves competitive retrieval performance, regularly exceeding 0.74 mAP and 0.90 Rank-1 accuracy, while remaining computationally efficient and easy to extend.
All data-processing pipelines, training scripts, and evaluation code are fully open-source, providing a reproducible framework for future advances in attention-driven person Re-ID.
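The channel and spatial attention steps described in the abstract can be illustrated with a minimal pure-Python sketch. It assumes a feature map stored as a list of channels, each flattened to H*W values, and the names `squeeze_excite` and `spatial_attention` are hypothetical. The spatial module is a simplified, CBAM-style assumption: the abstract does not specify its exact design, so a per-location linear mix of the channel-mean and channel-max maps stands in for a learned convolution.

```python
import math
from typing import List

# Hypothetical representation: C channels, each a flat list of H*W values.
FeatureMap = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def squeeze_excite(fmap: FeatureMap, w1: List[List[float]], w2: List[List[float]]) -> FeatureMap:
    """Channel attention: rescale each channel by a gate in (0, 1).

    w1 has shape (C/r) x C and w2 has shape C x (C/r); biases are omitted.
    """
    # Squeeze: global average pooling yields one descriptor per channel.
    z = [sum(ch) / len(ch) for ch in fmap]
    # Excitation: bottleneck layer, ReLU, expansion layer, sigmoid.
    hidden = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Scale: multiply every spatial value of channel c by gate c.
    return [[g * v for v in ch] for g, ch in zip(gates, fmap)]


def spatial_attention(fmap: FeatureMap, a: float, b: float, bias: float = 0.0) -> FeatureMap:
    """Spatial attention: rescale each location by a gate built from channel statistics.

    Simplification (assumption): a per-location linear mix of channel-mean and
    channel-max replaces the small convolution usually applied to these two maps.
    """
    n = len(fmap[0])
    avg = [sum(ch[i] for ch in fmap) / len(fmap) for i in range(n)]
    mx = [max(ch[i] for ch in fmap) for i in range(n)]
    gates = [sigmoid(a * p + b * q + bias) for p, q in zip(avg, mx)]
    # The same gate is shared by all channels at a given location.
    return [[g * v for g, v in zip(gates, ch)] for ch in fmap]
```

Applied in the order the abstract gives, `spatial_attention(squeeze_excite(fmap, w1, w2), a, b)` yields the attention-refined map; in a real network the weights would be learned tensors rather than fixed lists.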
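The part-aware stage can be sketched in the same style. The refined map is average-pooled over horizontal strips, giving one token per body region, and the tokens then exchange information through self-attention. This is a simplified assumption rather than the thesis's encoder: `strip_pool` and `self_attention` are hypothetical names, three strips are assumed by default, and the attention has a single head with no learned query, key, or value projections, feed-forward layer, or normalization.

```python
import math
from typing import List


def strip_pool(fmap: List[List[List[float]]], num_parts: int = 3) -> List[List[float]]:
    """Average-pool a C x H x W feature map over num_parts horizontal strips.

    Returns num_parts tokens of length C, ordered top to bottom (e.g. head, torso, legs).
    """
    height = len(fmap[0])
    if not 1 <= num_parts <= height:
        raise ValueError("num_parts must lie between 1 and the feature-map height")
    tokens = []
    for k in range(num_parts):
        # Integer boundaries split the H rows as evenly as possible.
        top, bottom = k * height // num_parts, (k + 1) * height // num_parts
        token = []
        for channel in fmap:
            values = [v for row in channel[top:bottom] for v in row]
            token.append(sum(values) / len(values))
        tokens.append(token)
    return tokens


def self_attention(tokens: List[List[float]]) -> List[List[float]]:
    """Single-head scaled dot-product self-attention without learned projections."""
    d = len(tokens[0])
    outputs = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in tokens]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Each output token is a weighted mix of all part tokens.
        outputs.append([sum(w * tok[j] for w, tok in zip(weights, tokens)) for j in range(d)])
    return outputs
```

Because every part token attends to every other one, an occluded strip can borrow context from the visible ones, which is the intuition behind the robustness claim above.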
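The mixed-precision safeguards mentioned in the abstract combine two mechanisms: dynamic loss scaling, which skips an update and lowers the scale when gradients overflow, and clipping of the global gradient norm. The pure-Python sketch below shows that control flow on a flat list of parameters. The class and function names, the plain SGD update, and `max_norm=5.0` are hypothetical; the scaler defaults mirror those of PyTorch's `GradScaler`, and the clipping coefficient is similar to what `torch.nn.utils.clip_grad_norm_` uses. In PyTorch itself, gradients are unscaled with `scaler.unscale_(optimizer)` before clipping and the update is applied with `scaler.step(optimizer)` followed by `scaler.update()`.

```python
import math
from typing import List, Tuple


class DynamicLossScaler:
    """Pure-Python sketch of dynamic loss scaling (defaults mirror torch GradScaler)."""

    def __init__(self, init_scale: float = 2.0 ** 16, growth_factor: float = 2.0,
                 backoff_factor: float = 0.5, growth_interval: int = 2000) -> None:
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def unscale(self, scaled_grads: List[float]) -> List[float]:
        # Gradients were computed from loss * scale, so divide the scale back out.
        return [g / self.scale for g in scaled_grads]

    def update(self, found_inf: bool) -> None:
        if found_inf:
            # Overflow: shrink the scale and restart the growth counter.
            self.scale *= self.backoff_factor
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps == self.growth_interval:
                # A long run of finite steps: try a larger scale.
                self.scale *= self.growth_factor
                self._good_steps = 0


def clip_by_global_norm(grads: List[float], max_norm: float) -> Tuple[List[float], float]:
    """Rescale grads so their L2 norm is at most max_norm; also return the original norm."""
    total = math.sqrt(sum(g * g for g in grads))
    if total > max_norm:
        coef = max_norm / (total + 1e-6)
        return [g * coef for g in grads], total
    return list(grads), total


def sgd_step(params: List[float], scaled_grads: List[float], scaler: DynamicLossScaler,
             lr: float, max_norm: float = 5.0) -> List[float]:
    """One hypothetical update: unscale, skip on overflow, clip, then apply plain SGD."""
    grads = scaler.unscale(scaled_grads)
    found_inf = any(not math.isfinite(g) for g in grads)
    if not found_inf:
        grads, _ = clip_by_global_norm(grads, max_norm)
        params = [p - lr * g for p, g in zip(params, grads)]
    scaler.update(found_inf)
    return params
```

Skipping the whole step on overflow, instead of clipping an infinite gradient, is what keeps a single bad half-precision batch from corrupting the weights.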
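Label smoothing replaces the one-hot target with a mixture of the one-hot vector and a uniform distribution over the K identities, so the loss becomes (1 - ε) times the usual cross-entropy plus ε times the cross-entropy against the uniform distribution. The sketch below computes this for a single sample; `label_smoothed_ce` is a hypothetical name and ε = 0.1 is an assumed default, since the abstract does not state the value used. PyTorch's `torch.nn.CrossEntropyLoss(label_smoothing=...)` applies the same mixture.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    # Log-sum-exp with max subtraction for numerical stability.
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


def label_smoothed_ce(logits: List[float], target: int, epsilon: float = 0.1) -> float:
    """Cross-entropy against the smoothed target (1 - eps) * one_hot + eps / K."""
    k = len(logits)
    log_probs = log_softmax(logits)
    nll = -log_probs[target]        # standard cross-entropy term
    uniform = -sum(log_probs) / k   # cross-entropy against the uniform distribution
    return (1.0 - epsilon) * nll + epsilon * uniform
```

With ε > 0, even a very confident correct prediction keeps a non-zero loss, which discourages the network from pushing logits toward extreme values.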
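The two-stage learning-rate schedule can be written as a single function of the optimizer step: a linear ramp up to the base rate during warm-up, then cosine decay toward a floor. The function name, its parameters, and the zero default floor are assumptions for illustration; the abstract gives neither the warm-up length nor the base rate.

```python
import math


def warmup_cosine_lr(step: int, total_steps: int, warmup_steps: int,
                     base_lr: float, min_lr: float = 0.0) -> float:
    """Learning rate at a given optimizer step (0-indexed)."""
    if warmup_steps > 0 and step < warmup_steps:
        # Stage 1: linear ramp reaching base_lr at the last warm-up step.
        return base_lr * (step + 1) / warmup_steps
    # Stage 2: cosine decay from base_lr to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

In PyTorch, one way to apply this is to divide the result by `base_lr` and pass it as the `lr_lambda` of `torch.optim.lr_scheduler.LambdaLR`, stepping the scheduler once per optimizer step.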
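The evaluation step can be sketched as follows: compute a query-by-gallery distance matrix, rank the gallery for each query, and derive average precision and a Rank-1 hit from the ranked list. The sketch assumes Euclidean distance and the common Market-1501 convention of discarding gallery images that share both the query's identity and its camera; the official protocol also ignores designated "junk" images, which this simplified version does not handle. Function names are hypothetical, and AP is the plain, non-interpolated average of precision at each correct match.

```python
import math
from typing import List, Sequence, Tuple


def pairwise_distances(query: List[List[float]], gallery: List[List[float]]) -> List[List[float]]:
    # Euclidean distance between every query and gallery descriptor.
    return [[math.dist(q, g) for g in gallery] for q in query]


def evaluate(dist: List[List[float]], q_pids: Sequence[int], q_cams: Sequence[int],
             g_pids: Sequence[int], g_cams: Sequence[int]) -> Tuple[float, float]:
    """Return (mAP, Rank-1) for a query x gallery distance matrix."""
    aps: List[float] = []
    rank1_hits = 0
    for i, row in enumerate(dist):
        order = sorted(range(len(row)), key=lambda j: row[j])
        # Discard gallery images of the same identity seen by the same camera.
        kept = [j for j in order if not (g_pids[j] == q_pids[i] and g_cams[j] == q_cams[i])]
        matches = [g_pids[j] == q_pids[i] for j in kept]
        num_relevant = sum(matches)
        if num_relevant == 0:
            continue  # no valid true match: excluded from both metrics
        rank1_hits += int(matches[0])
        hits, precision_sum = 0, 0.0
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precision_sum += hits / rank  # precision at each correct match
        aps.append(precision_sum / num_relevant)
    if not aps:
        raise ValueError("no query has a valid match in the gallery")
    return sum(aps) / len(aps), rank1_hits / len(aps)
```

For a query whose ranked gallery reads wrong, correct, correct, AP is (1/2 + 2/3) / 2 = 7/12 while Rank-1 is 0, which shows why the two metrics are reported together.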
Citation
Lohanathen, S.P. (2025). Avoiding duplications in person detection across video frames [Master’s theses, University of Moratuwa]. Institutional Repository University of Moratuwa. https://dl.lib.uom.lk/handle/123/24583
