Jing Li (Associate)
- sc232jl@leeds.ac.uk
- School of Computing
Medical Vision-Language Models for Visual Question Answering
Supervisors: Dr Duygu Sarikaya, Dr Nishant Ravikumar
Vision-language models (VLMs) are among the state-of-the-art research topics in computer vision (CV). They aim to build models that can process both visual information and natural language, and learn associations between the two simultaneously. Such models are highly useful for visual question answering (VQA) tasks, which involve understanding and answering questions about given images. Applying these models in the biomedical domain can improve the efficiency of healthcare services by reducing the workload of clinicians. This research proposes to develop a well-structured VLM for medical VQA and to validate its usability in clinical settings.
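To make the idea concrete, the sketch below shows the typical skeleton of a VQA model: an image embedding and a question embedding are projected into a shared space, fused, and classified over a fixed answer set. This is a toy illustration only; the dimensions, random weights, and the three-answer vocabulary are hypothetical stand-ins for a trained vision encoder, text encoder, and classifier, not the method proposed in this project.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative embedding sizes and a hypothetical closed answer set.
IMG_DIM, TXT_DIM, FUSED_DIM = 16, 8, 12
ANSWERS = ["yes", "no", "unsure"]

# Random projections stand in for learned parameters.
W_img = rng.normal(size=(IMG_DIM, FUSED_DIM))
W_txt = rng.normal(size=(TXT_DIM, FUSED_DIM))
W_cls = rng.normal(size=(FUSED_DIM, len(ANSWERS)))

def answer_question(image_feat, question_feat):
    """Fuse image and question features, then classify over the answer set."""
    fused = np.tanh(image_feat @ W_img + question_feat @ W_txt)  # joint embedding
    logits = fused @ W_cls
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return ANSWERS[int(np.argmax(probs))], probs

# Stand-ins for the outputs of a vision encoder and a text encoder.
image_feat = rng.normal(size=IMG_DIM)
question_feat = rng.normal(size=TXT_DIM)
answer, probs = answer_question(image_feat, question_feat)
```

In a real medical VQA system the two input vectors would come from pretrained encoders (e.g. a CNN or vision transformer for the image, a language model for the question), and the fusion and classification weights would be learned jointly on paired image-question-answer data.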