Advancing retrieval-augmented generation for financial question answering

Jasmin, AA; Perera, I; Mohamed, M; Mushraf, M

Files

1571154315.pdf (2.03 MB)

Date

2025

Publisher

IEEE

Abstract

Retrieval-augmented generation (RAG) systems show promise for financial question answering (QA), yet achieving high accuracy on benchmarks such as FinanceBench (19% baseline, 32% updated) remains challenging [1] [8]. This paper presents a systematic, multistage approach to improving the performance of the RAG pipeline for financial QA. We first established a curated baseline using Gemini-2.0, the Docling parser, Google’s text-embedding-004, and a vector database, achieving an initial accuracy of 43%. Architectural and component-wise optimizations were then implemented iteratively. First, a metadata filtering strategy, which uses a fine-tuned NER model to extract company names and years from queries, improved accuracy to 72%, demonstrating that targeted retrieval can simulate the benefits of a single-store per-filing approach [1]. Second, a hybrid chunking technique, which preserves the structure of the document and applies tokenization-sensitive refinements, increased accuracy to 80%. Third, a hybrid search mechanism combining dense and sparse retrieval methods raised accuracy to 84%. Finally, LLM-based query expansion, which transforms user queries into answer formats, yielded a final accuracy of 88%. This research demonstrates that a carefully designed RAG pipeline, incorporating intelligent metadata filtering, layout-aware chunking, advanced similarity search, and query semantics enhancement, substantially improves financial QA and significantly outperforms existing baselines.
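
The metadata filtering and hybrid search stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Chunk` type and all function names are hypothetical, a simple string and regex match stands in for the fine-tuned NER model, and reciprocal rank fusion is assumed as the way to combine dense and sparse rankings, since the abstract does not say which fusion method was used.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical chunk record carrying the metadata used for filtering."""
    chunk_id: str
    company: str
    year: int
    text: str


def extract_metadata(query, known_companies):
    """Stand-in for the NER step: find a known company name and a 4-digit year."""
    lowered = query.lower()
    company = next((c for c in known_companies if c.lower() in lowered), None)
    year_match = re.search(r"\b(19|20)\d{2}\b", query)
    year = int(year_match.group(0)) if year_match else None
    return company, year


def filter_chunks(chunks, company, year):
    """Keep chunks matching the extracted fields; a missing field matches all."""
    return [
        c for c in chunks
        if (company is None or c.company == company)
        and (year is None or c.year == year)
    ]


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of chunk ids (assumed fusion rule, not from the paper).

    Each list contributes 1 / (k + rank) per id; ties are broken by id.
    """
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda cid: (-scores[cid], cid))
```

In this sketch, a query is first reduced to a company and year, the chunk store is narrowed to matching chunks, and only then are the dense and sparse rankings over that subset fused. Filtering before ranking is what lets a shared vector store approximate a per-filing store.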

Keywords

Financial insight engine, transformer-based models, Retrieval-Augmented Generation, annual reports, regulatory filings, hybrid chunking, metadata filtering, query reformulation, real-time analysis, financial decision-making

URI

https://dl.lib.uom.lk/handle/123/24543

Collections

MERCon - 2025
