Conversation prediction for vishing attack detection: a comparative analysis of pre-trained transformer models
Date
2025
Publisher
IEEE
Abstract
Voice phishing (Vishing) is a sophisticated form of social engineering in which attackers manipulate victims through phone conversations to extract sensitive information or induce harmful actions. Traditional detection methods, such as keyword spotting or post-call classification, are often ineffective against contextually adaptive scam dialogues. To address this challenge, we propose a novel approach that frames Vishing detection as a conversation prediction task, modeling the expected dialogue flow to detect suspicious conversation turns in real time. In this study, we explore the use of pre-trained Transformer models (GPT-2, T5, and BART) to predict the continuation of partially observed conversations. By fine-tuning these models on a curated dataset of both legitimate and fraudulent call transcripts, we evaluate their capacity to model coherent dialogue and identify malicious deviations. Our experiments use standard text generation metrics alongside qualitative assessments to evaluate performance. Results show that BART outperforms the other models in generating contextually aligned continuations, suggesting its suitability for early Vishing detection. We further examine how prediction quality supports the detection of social engineering and propose real-time deployment strategies. Finally, we aim to run the application on the user's device to protect the user's safety and privacy.
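
The idea of flagging a turn that deviates from the predicted dialogue flow can be illustrated with a minimal sketch. The abstract does not state how deviation is scored, so everything below is an assumption made for illustration: the predicted continuation is taken as a plain string (in practice it would come from a fine-tuned model such as BART), similarity is a simple unigram-overlap F1 in the spirit of standard text generation metrics, and the names `unigram_f1`, `is_suspicious`, and the threshold of 0.3 are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def unigram_f1(predicted, observed):
    """Unigram-overlap F1 between a predicted and an observed turn.

    Illustrative stand-in for a text generation metric; the paper's
    actual metrics are not specified in the abstract.
    """
    pred = Counter(tokenize(predicted))
    obs = Counter(tokenize(observed))
    overlap = sum((pred & obs).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(obs.values())
    return 2 * precision * recall / (precision + recall)


def is_suspicious(predicted, observed, threshold=0.3):
    """Flag the observed turn when it strays far from the prediction.

    The threshold is a hypothetical value, not taken from the paper.
    """
    return unigram_f1(predicted, observed) < threshold


# Hypothetical example: the model expected a routine delivery update,
# but the caller asks for a one-time code instead.
predicted_turn = "Your order will arrive on Monday"
observed_turn = "Please read me the code we just sent to your phone"
print(is_suspicious(predicted_turn, observed_turn))
```

Because the scoring step uses only the standard library, a check of this kind could plausibly run on the device itself once the predicted continuation is available, which matches the on-device goal stated in the abstract.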
