Avoiding duplications in person detection across video frames

Lohanathen, SP

Files

TH5961-1.pdf (793.3 KB)

TH5961-2.pdf (87.54 KB)

TH5961.pdf (3.47 MB)

Date

2025

Authors

Lohanathen, SP

Abstract

Person re-identification (Re-ID) is a cornerstone of modern video surveillance and smart-city applications. It requires reliably matching pedestrian images across non-overlapping cameras despite variations in pose, lighting, background clutter, and occlusion. This work presents a Re-ID system built around a ResNet-50 backbone augmented with multi-level attention and part-aware Transformer encoding. The network first extracts deep feature maps from pedestrian images, which are then refined by a channel-wise squeeze-and-excitation block and a spatial attention module. Together, these attention layers suppress background clutter and highlight discriminative cues, such as clothing textures and carried objects, by adaptively weighting feature dimensions and spatial locations. To capture structural dependencies across body regions, the attention-refined feature map is partitioned into horizontal strips corresponding to semantic parts (head, torso, legs). Each strip is fed into a lightweight Transformer encoder that dynamically models inter-part relationships, enabling robust identification under pose variation and partial occlusion. Training is stabilized and accelerated by mixed-precision optimization with automatic gradient scaling and gradient clipping, together with a label-smoothed cross-entropy loss that mitigates overconfidence. A two-stage learning-rate schedule, consisting of a brief linear warm-up followed by cosine-annealing decay, ensures rapid initial convergence without catastrophic divergence. At inference, global descriptors are extracted efficiently and pairwise distances are computed to evaluate mean average precision (mAP) and Rank-1 accuracy on the Market-1501 benchmark. Empirical results demonstrate that this architecture achieves competitive retrieval performance, regularly exceeding 0.74 mAP and 0.90 Rank-1 accuracy, while maintaining computational efficiency and ease of extension.
All data-processing pipelines, training scripts, and evaluation code are fully open-source, providing a reproducible framework for future advances in attention-driven person Re-ID.
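
The two-stage learning-rate schedule mentioned in the abstract (linear warm-up followed by cosine-annealing decay) can be illustrated with a minimal sketch. This is not the thesis code: the function name `lr_at_step`, its parameters, and all numeric values below are assumptions chosen for illustration, since the abstract does not state the warm-up length, base learning rate, or total number of steps.

```python
import math


def lr_at_step(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Return the learning rate for a 0-indexed optimizer step.

    Hypothetical helper: all parameter names and defaults are
    illustrative assumptions, not values taken from the thesis.
    """
    if total_steps <= 0 or not 0 <= warmup_steps <= total_steps:
        raise ValueError("need total_steps > 0 and 0 <= warmup_steps <= total_steps")
    if step < warmup_steps:
        # Stage 1, linear warm-up: ramp from base_lr / warmup_steps to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Stage 2, cosine annealing: decay from base_lr down to min_lr.
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Assumed example settings: 100 steps in total, 10 of them warm-up.
    for s in (0, 5, 9, 10, 55, 100):
        print(s, round(lr_at_step(s, total_steps=100, warmup_steps=10, base_lr=3e-4), 8))
```

In this sketch the warm-up ends exactly at `base_lr`, so the two stages join without a jump; a brief warm-up of this kind is what lets training start at small step sizes before the cosine decay takes over.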

Keywords

VIDEO SURVEILLANCE-Person Re-identification (Re-ID), SMART CITIES-Person Re-identification (Re-ID), UNIQUE PERSON COUNTING, VIDEO SURVEILLANCE-Applications, VIDEO PROCESSING, MSC (MAJOR COMPONENT RESEARCH)-Dissertation, COMPUTER SCIENCE AND ENGINEERING-Dissertation, MSc (Major Component Research)

Citation

Lohanathen, S.P. (2025). Avoiding duplications in person detection across video frames [Master’s thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. https://dl.lib.uom.lk/handle/123/24583

URI

https://dl.lib.uom.lk/handle/123/24583

Collections

Master of Science By Research
