Jing Li (Associate)
- sc232jl@leeds.ac.uk
- School of Computing
Medical Vision-Language Models for Visual Question Answering
Supervisors: Dr Duygu Sarikaya, Dr Nishant Ravikumar
Vision-language models (VLMs) are among the state-of-the-art research topics in computer vision (CV). They aim to build models that can process both visual information and natural language, and learn associations between the two simultaneously. Such models are highly useful for visual question answering (VQA) tasks, which involve understanding and answering questions about given images. Applying these models in the biomedical domain can improve the efficiency of healthcare services by reducing the workload of clinicians. This research proposes to develop a well-structured VLM for medical VQA and to validate its usability in clinical settings.
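To make the idea concrete, the sketch below shows the typical skeleton of a VQA model: an image embedding and a question embedding are projected into a shared space, fused, and classified over a fixed answer set. This is a toy illustration only; the dimensions, random weights, and the three-answer vocabulary are hypothetical stand-ins for a trained vision encoder, text encoder, and classifier, not the method proposed in this project.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative embedding sizes and a hypothetical closed answer set.
IMG_DIM, TXT_DIM, FUSED_DIM = 16, 8, 12
ANSWERS = ["yes", "no", "unsure"]

# Random projections stand in for learned parameters.
W_img = rng.normal(size=(IMG_DIM, FUSED_DIM))
W_txt = rng.normal(size=(TXT_DIM, FUSED_DIM))
W_cls = rng.normal(size=(FUSED_DIM, len(ANSWERS)))

def answer_question(image_feat, question_feat):
    """Fuse image and question features, then classify over the answer set."""
    fused = np.tanh(image_feat @ W_img + question_feat @ W_txt)  # joint embedding
    logits = fused @ W_cls
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return ANSWERS[int(np.argmax(probs))], probs

# Stand-ins for the outputs of a vision encoder and a text encoder.
image_feat = rng.normal(size=IMG_DIM)
question_feat = rng.normal(size=TXT_DIM)
answer, probs = answer_question(image_feat, question_feat)
```

In a real medical VQA system the two input vectors would come from pretrained encoders (e.g. a CNN or vision transformer for the image, a language model for the question), and the fusion and classification weights would be learned jointly on paired image-question-answer data.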