Segment-driven corporate report summarization using positional-aware clustering and hybrid summarization
Date
2025
Publisher
IEEE
Abstract
Summarizing lengthy corporate reports is challenging because of their complex structure and varied formatting. Existing summarization methods often fail to preserve coherence or to extract the relevant content from such long documents. This study introduces a novel divide-and-conquer hybrid summarization framework that combines semantically aware text segmentation, extractive summarization using GUSUM with positionally encoded embeddings, and transformer-based abstractive summarization. The proposed framework achieves strong results on benchmark datasets: 87.74 BERTScore and 54.65 ROUGE-1 on the Gov-Report dataset, and 79.7 BERTScore on the FindSum dataset. A qualitative question-answering-based evaluation further shows that summaries generated by the framework with LLaMA-3.1 (8B) covered 53% of answerable content compared to GPT-4-generated reference summaries, a notable result considering the latter's scale and cost.
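
The divide-and-conquer flow described in the abstract (segment, extract per segment, then rewrite) can be illustrated with a toy sketch. This is not the authors' implementation, and every name in it is hypothetical. It makes several simplifying assumptions: bag-of-words vectors stand in for sentence embeddings, a plain centrality score stands in for GUSUM, a `1 / (1 + i)` bonus stands in for positional encoding, and the abstractive model is an optional callable supplied by the caller.

```python
import math
import re
from collections import Counter


def bow(sentence):
    """Bag-of-words vector (assumption: a stand-in for a real sentence embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def segment(sentences, threshold=0.1):
    """Start a new segment whenever adjacent sentences fall below the similarity threshold."""
    if not sentences:
        return []
    segments = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(bow(prev), bow(cur)) < threshold:
            segments.append([cur])
        else:
            segments[-1].append(cur)
    return segments


def extract(seg, k=1, position_weight=0.5):
    """Keep the k sentences with the highest centrality-plus-position score, in original order."""
    vecs = [bow(s) for s in seg]
    scored = []
    for i, v in enumerate(vecs):
        centrality = sum(cosine(v, w) for j, w in enumerate(vecs) if j != i)
        position = 1.0 / (1 + i)  # assumption: earlier sentences in a segment matter more
        scored.append((centrality + position_weight * position, i))
    top = sorted(scored, reverse=True)[:k]
    return [seg[i] for _, i in sorted(top, key=lambda t: t[1])]


def summarize(sentences, k=1, abstractive=None):
    """Divide (segment), conquer (extract per segment), then optionally rewrite the draft."""
    picked = [s for seg in segment(sentences) for s in extract(seg, k)]
    draft = " ".join(picked)
    return abstractive(draft) if abstractive else draft
```

In a realistic setting, `abstractive` would wrap a transformer summarizer and `bow` would be replaced by a sentence-embedding model; the threshold and position weight here are arbitrary illustrative values, not tuned settings from the paper.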
