Avoiding duplications in person detection across video frames

dc.contributor.advisor: Gamage, C
dc.contributor.advisor: Sooriyaarachchi, S
dc.contributor.author: Lohanathen, SP
dc.date.accept: 2025
dc.date.accessioned: 2025-12-11T10:00:20Z
dc.date.issued: 2025
dc.description.abstract: Person re-identification (Re-ID) is a cornerstone of modern video surveillance and smart-city applications. It requires reliable matching of pedestrian images across non-overlapping cameras despite variations in pose, lighting, background clutter, and occlusion. This thesis presents a Re-ID system built around a ResNet-50 backbone augmented with multi-level attention and part-aware Transformer encoding. The network first extracts deep feature maps from pedestrian images, which are then refined by a channel-wise squeeze-and-excitation block and a spatial attention module. Together, these attention layers suppress background clutter and highlight discriminative cues, such as clothing textures and carried objects, by adaptively weighting feature dimensions and spatial locations. To capture structural dependencies across body regions, the attention-refined feature map is partitioned into horizontal strips corresponding to semantic parts (head, torso, legs), each of which is fed into a lightweight Transformer encoder that models inter-part relationships, enabling robust identification under pose variation and partial occlusion. Training is stabilized and accelerated by mixed-precision optimization with automatic gradient scaling and gradient clipping, together with a label-smoothed cross-entropy loss that mitigates overconfidence. A two-stage learning-rate schedule, a brief linear warm-up followed by cosine-annealing decay, gives rapid initial convergence without divergence. At inference, global descriptors are extracted and pairwise distances are computed to evaluate mean average precision (mAP) and Rank-1 accuracy on the Market-1501 benchmark. Empirical results show that the architecture achieves competitive retrieval performance, regularly exceeding 0.74 mAP and 0.90 Rank-1 accuracy, while maintaining computational efficiency and ease of extension.
All data-processing pipelines, training scripts, and evaluation code are open-source, providing a reproducible framework for future work on attention-driven person Re-ID.
dc.identifier.accno: TH5961
dc.identifier.citation: Lohanathen, S.P. (2025). Avoiding duplications in person detection across video frames [Master’s thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. https://dl.lib.uom.lk/handle/123/24583
dc.identifier.degree: MSc (Major Component Research)
dc.identifier.department: Department of Computer Science & Engineering
dc.identifier.faculty: Engineering
dc.identifier.uri: https://dl.lib.uom.lk/handle/123/24583
dc.language.iso: en
dc.subject: VIDEO SURVEILLANCE-Person Re-identification (Re-ID)
dc.subject: SMART CITIES-Person Re-identification (Re-ID)
dc.subject: UNIQUE PERSON COUNTING
dc.subject: VIDEO SURVEILLANCE-Applications
dc.subject: VIDEO PROCESSING
dc.subject: MSC (MAJOR COMPONENT RESEARCH)-Dissertation
dc.subject: COMPUTER SCIENCE AND ENGINEERING-Dissertation
dc.subject: MSc (Major Component Research)
dc.title: Avoiding duplications in person detection across video frames
dc.type: Thesis-Full-text
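
The abstract reports Rank-1 accuracy and mean average precision (mAP) computed from pairwise distances between query and gallery descriptors. The sketch below illustrates how these two metrics are commonly derived from a distance matrix. It is not the thesis code: the function name `rank1_and_map`, the toy identities, and the toy distances are hypothetical, and the sketch omits the same-camera and junk-image filtering that the usual Market-1501 protocol applies.

```python
def rank1_and_map(dist, query_ids, gallery_ids):
    """Return (Rank-1 accuracy, mAP) for a query-by-gallery distance matrix.

    dist[i][j] is the distance between query i and gallery item j.
    Hypothetical helper for illustration; no camera or junk filtering.
    """
    rank1_hits = 0
    average_precisions = []
    for q, row in enumerate(dist):
        # Gallery indices sorted from closest to farthest.
        order = sorted(range(len(row)), key=lambda j: row[j])
        matches = [gallery_ids[j] == query_ids[q] for j in order]
        if not any(matches):
            continue  # no true match in the gallery, so the query is skipped
        rank1_hits += int(matches[0])
        hits = 0
        precision_sum = 0.0
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precision_sum += hits / rank  # precision at this recall point
        average_precisions.append(precision_sum / hits)
    if not average_precisions:
        raise ValueError("no query has a matching gallery identity")
    n = len(average_precisions)
    return rank1_hits / n, sum(average_precisions) / n


# Toy data (assumed values): two queries, three gallery images.
query_ids = [1, 2]
gallery_ids = [1, 2, 1]
dist = [
    [0.1, 0.5, 0.3],  # query 0: both identity-1 images rank first
    [0.2, 0.4, 0.1],  # query 1: its only match ranks last
]
rank1, mean_ap = rank1_and_map(dist, query_ids, gallery_ids)
print(f"Rank-1 = {rank1:.2f}, mAP = {mean_ap:.4f}")
```

In this toy case the first query is retrieved perfectly and the second finds its match only at rank 3, so Rank-1 is 0.5 and mAP is the mean of 1 and 1/3.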

Files

Original bundle

Name: TH5961-1.pdf
Size: 793.3 KB
Format: Adobe Portable Document Format
Description: Pre-text

Name: TH5961-2.pdf
Size: 87.54 KB
Format: Adobe Portable Document Format
Description: Post-text

Name: TH5961.pdf
Size: 3.47 MB
Format: Adobe Portable Document Format
Description: Full-thesis

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission