Adversarial learning to improve question image embedding in medical visual question answering

Date

2022-07

Publisher

IEEE

Abstract

Visual Question Answering (VQA) is a computer vision task in which a system produces an accurate answer to a question about a given image. Medical VQA is a subfield of general VQA that focuses on images and questions in the medical domain. A VQA model's most crucial task is to learn a question-image joint representation that reflects the information needed for the correct answer. Although recent research on general VQA models has made significant progress, medical VQA remains difficult because question-image embeddings are often ineffective. To address this problem, we propose a new method for training VQA models that uses adversarial learning to improve the question-image embedding, and we show how a question-answer embedding can serve as the ideal embedding for answer inference. For adversarial learning, we use two embedding generators (a question-image embedding generator and a question-answer embedding generator) and a discriminator that differentiates the two embeddings. The question-answer embedding serves as the ideal embedding, and the question-image embedding is improved with reference to it. Experimental results indicate that pre-training the question-image embedding generation module with adversarial learning improves overall performance, demonstrating the effectiveness of the proposed method.
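The adversarial setup described in the abstract can be illustrated with a minimal NumPy sketch. This is a hypothetical toy reconstruction, not the paper's implementation: both generators are reduced to linear maps, the discriminator to logistic regression, and all dimensions and learning rates are made up for illustration. The frozen question-answer map plays the role of the "ideal" embedding, while the question-image map is updated to fool the discriminator.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    # Clip to avoid overflow in exp for large scores.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

# Toy dimensions (hypothetical; the paper's generators are neural encoders).
d_in, d_emb, n = 8, 4, 256

# "Ideal" question-answer embedding generator: a fixed linear map.
W_qa = rng.normal(size=(d_in, d_emb))
# Question-image embedding generator, trained adversarially.
W_qi = rng.normal(size=(d_in, d_emb))
# Discriminator: logistic regression separating the two embedding sources.
w_d = rng.normal(size=d_emb)
b_d = 0.0

X = rng.normal(size=(n, d_in))  # stand-in for fused question/image features
lr_d, lr_g = 0.05, 0.05

def disc_accuracy():
    p_qa = sigmoid((X @ W_qa) @ w_d + b_d)  # label 1: question-answer embedding
    p_qi = sigmoid((X @ W_qi) @ w_d + b_d)  # label 0: question-image embedding
    return 0.5 * ((p_qa > 0.5).mean() + (p_qi <= 0.5).mean())

for step in range(500):
    e_qa = X @ W_qa  # "ideal" embeddings (generator frozen)
    e_qi = X @ W_qi  # embeddings to be improved
    # Discriminator step: minimize binary cross-entropy for real/fake labels.
    p_qa = sigmoid(e_qa @ w_d + b_d)
    p_qi = sigmoid(e_qi @ w_d + b_d)
    grad_w = (e_qa.T @ (p_qa - 1.0) + e_qi.T @ p_qi) / n
    grad_b = ((p_qa - 1.0).sum() + p_qi.sum()) / n
    w_d -= lr_d * grad_w
    b_d -= lr_d * grad_b
    # Generator step: update W_qi so the discriminator labels e_qi as "ideal".
    p_qi = sigmoid((X @ W_qi) @ w_d + b_d)
    grad_W_qi = X.T @ np.outer(p_qi - 1.0, w_d) / n
    W_qi -= lr_g * grad_W_qi

acc_final = disc_accuracy()
```

Once the discriminator can no longer reliably tell the two embeddings apart, the question-image embedding has moved toward the distribution of the ideal question-answer embedding, which is the intuition behind the pre-training stage described above.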

Keywords

Medical visual question answering, Adversarial learning

Citation

K. Silva, T. Maheepala, K. Tharaka and T. D. Ambegoda, "Adversarial Learning to Improve Question Image Embedding in Medical Visual Question Answering," 2022 Moratuwa Engineering Research Conference (MERCon), 2022, pp. 1-6, doi: 10.1109/MERCon55799.2022.9906168.
