Segment-driven corporate report summarization using positional-aware clustering and hybrid summarization
| dc.contributor.author | Weerasinghe, CDRM | |
| dc.contributor.author | Gunasinghe, MRAAK | |
| dc.contributor.author | Siriwardana, HBKS | |
| dc.contributor.author | Perera, I | |
| dc.date.accessioned | 2026-01-19T06:35:58Z | |
| dc.date.issued | 2025 | |
| dc.description.abstract | Summarizing lengthy corporate reports is challenging due to their complex structure and varied formatting. Existing summarization methods often fail to preserve coherence or extract relevant content in these large-scale documents. This study introduces a novel divide-and-conquer hybrid summarization framework that combines semantically aware text segmentation, extractive summarization using GUSUM with positionally encoded embeddings, and transformer-based abstractive summarization. The proposed framework achieves strong results on benchmark datasets: 87.74 BERTScore and 54.65 ROUGE-1 on the Gov-Report dataset, and 79.7 BERTScore on the FindSum dataset. A qualitative question-answering-based evaluation further reveals that summaries generated using our framework with LLaMA-3.1 (8B) covered 53% of answerable content compared to GPT-4-generated reference summaries, a notable result considering the latter’s scale and cost. | |
| dc.identifier.conference | Moratuwa Engineering Research Conference 2025 | |
| dc.identifier.department | Engineering Research Unit, University of Moratuwa | |
| dc.identifier.email | ravindi.20@cse.mrt.ac.lk | |
| dc.identifier.email | amilag.20@cse.mrt.ac.lk | |
| dc.identifier.email | sandaruth.20@cse.mrt.ac.lk | |
| dc.identifier.email | indika@cse.mrt.ac.lk | |
| dc.identifier.faculty | Engineering | |
| dc.identifier.isbn | 979-8-3315-6724-8 | |
| dc.identifier.pgnos | pp. 203-208 | |
| dc.identifier.proceeding | Proceedings of Moratuwa Engineering Research Conference 2025 | |
| dc.identifier.uri | https://dl.lib.uom.lk/handle/123/24743 | |
| dc.language.iso | en | |
| dc.publisher | IEEE | |
| dc.subject | Summarization | |
| dc.subject | Divide and Conquer | |
| dc.subject | Hybrid | |
| dc.subject | Transformers | |
| dc.title | Segment-driven corporate report summarization using positional-aware clustering and hybrid summarization | |
| dc.type | Conference-Full-text | |
