Kvasir-VQA

A Text-Image Pair GI Tract Dataset

The Kvasir-VQA dataset is an extended dataset derived from the HyperKvasir and Kvasir-Instrument datasets, augmented with question-and-answer annotations. It is designed to facilitate advanced machine learning tasks in gastrointestinal (GI) diagnostics, including image captioning, Visual Question Answering (VQA), and text-based generation of synthetic medical images.

Download

You can download the dataset from Hugging Face: https://huggingface.co/datasets/SimulaMet-HOST/Kvasir-VQA

You can also load Kvasir-VQA directly from the Hugging Face Hub with the datasets library.

🔥 See Jupyter Notebook Demo. You can open the notebook on Google Colab.

from datasets import load_dataset
ds = load_dataset("SimulaMet-HOST/Kvasir-VQA")

Downloading the Dataset as an Image Folder and CSV Metadata

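import os

d_path = "./"  # existing folder where images/ and metadata.csv will be saved

# Export the question-answer metadata (one row per QA pair) to CSV
df = ds['raw'].select_columns(['source', 'question', 'answer', 'img_id']).to_pandas()
df.to_csv(os.path.join(d_path, "metadata.csv"), index=False)

# Save each unique image once; drop_duplicates keeps the first row per img_id
# and preserves the original index, which is then used to look up ds['raw']
os.makedirs(os.path.join(d_path, "images"), exist_ok=True)
for i, row in df.drop_duplicates(subset='img_id').iterrows():
    ds['raw'][i]['image'].save(os.path.join(d_path, "images", f"{row['img_id']}.jpg"))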

The images total around 1.5 GB, and metadata.csv contains 58,849 rows, one per question-answer pair.
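To sanity-check an export, the sketch below counts the rows in metadata.csv, the distinct img_id values, and the .jpg files in images/. It is a minimal illustration that assumes the folder layout and column names produced by the export script above; summarize_export and check_export are hypothetical helper names, not part of any official dataset tooling.

```python
import csv
import os
from collections.abc import Iterable


def summarize_export(rows: Iterable[dict], image_files: Iterable[str]) -> dict:
    """Count QA rows, unique img_id values, and .jpg file names."""
    rows = list(rows)
    return {
        "qa_rows": len(rows),
        "unique_img_ids": len({row["img_id"] for row in rows}),
        "jpg_files": sum(1 for name in image_files if name.endswith(".jpg")),
    }


def check_export(d_path: str) -> dict:
    """Read metadata.csv and list images/ (hypothetical helper; assumes the export layout)."""
    with open(os.path.join(d_path, "metadata.csv"), newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    return summarize_export(rows, os.listdir(os.path.join(d_path, "images")))
```

For example, print(check_export("./")) after a complete export should report 58,849 QA rows, and the two image counts should agree with each other; with the full dataset both are expected to equal the 6,500 annotated images.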

Key Features

  • Total Images: 6,500 annotated images
  • Annotations: Includes question-and-answer pairs for each image
  • Question Types: Yes/No, single-choice, multiple-choice, color-related, location-related, numerical count
  • Applications: Image captioning, VQA, synthetic medical image generation, object detection, etc.
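Because each image carries several question-answer pairs, VQA and captioning pipelines often need the annotations grouped per image. The following is a minimal sketch using only the Python standard library, assuming the metadata.csv layout (source, question, answer, img_id) written by the export script; group_qa_by_image and load_grouped are hypothetical helper names chosen for illustration.

```python
import csv
from collections import defaultdict


def group_qa_by_image(rows):
    """Map each img_id to its list of (question, answer) pairs, in row order."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["img_id"]].append((row["question"], row["answer"]))
    return dict(grouped)


def load_grouped(csv_path):
    """Hypothetical loader: read metadata.csv and group its rows by img_id."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return group_qa_by_image(csv.DictReader(f))
```

The resulting dictionary can be joined with the images/ folder on img_id, since the export script names each image file f"{img_id}.jpg".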

Dataset Details

Image Categories

The dataset includes images from various GI tract conditions and medical instruments used in GI procedures:

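| Image Category     | Number of Samples | Source Dataset    |
|--------------------|-------------------|-------------------|
| Normal             | 2,500             | HyperKvasir       |
| Polyps             | 1,000             | HyperKvasir       |
| Esophagitis        | 1,000             | HyperKvasir       |
| Ulcerative Colitis | 1,000             | HyperKvasir       |
| Instrument         | 1,000             | Kvasir-Instrument |
| TOTAL              | 6,500             |                   |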
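The category counts are imbalanced: Normal images account for 2,500 / 6,500 ≈ 38.5% of the dataset and each remaining category for 1,000 / 6,500 ≈ 15.4%, which may matter when training classifiers or sampling evaluation sets. The short sketch below recomputes these shares; CATEGORY_COUNTS is transcribed by hand from the table, not read from the dataset, and category_shares is a hypothetical helper.

```python
# Image counts per category, transcribed from the Image Categories table
CATEGORY_COUNTS = {
    "Normal": 2500,
    "Polyps": 1000,
    "Esophagitis": 1000,
    "Ulcerative Colitis": 1000,
    "Instrument": 1000,
}


def category_shares(counts):
    """Return (total, {category: fraction of total})."""
    total = sum(counts.values())
    return total, {name: n / total for name, n in counts.items()}


total, shares = category_shares(CATEGORY_COUNTS)
for name, share in shares.items():
    print(f"{name:<20} {share:6.1%}")
```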

Annotation Process

Annotations were developed with input from medical professionals and include six types of questions:

  • Yes/No Questions
  • Single-Choice Questions
  • Multiple-Choice Questions
  • Color-Related Questions
  • Location-Related Questions
  • Numerical Count Questions

Annotations cover a range of GI aspects, including findings, abnormalities, anatomical landmarks, and medical instruments.

When using the Kvasir-VQA dataset, comply with the usage terms below; in particular, cite the dataset as shown whenever you refer to it in documents or papers:

Terms of Use

The data is released fully open for research and educational purposes under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. Using the dataset for purposes such as competitions or commercial use requires prior written permission. All documents and papers that use or refer to the dataset, or that report experimental results based on Kvasir-VQA, must cite the related article:

@inproceedings{gautam2024kvasirvqa,
  title={Kvasir-VQA: A Text-Image Pair GI Tract Dataset},
  author={Gautam, Sushant and Storås, Andrea and Midoglu, Cise and Hicks, Steven A. and Thambawita, Vajira and Halvorsen, Pål and Riegler, Michael A.},
  booktitle={Proceedings of the First International Workshop on Vision-Language Models for Biomedical Applications (VLM4Bio '24)},
  year={2024},
  location={Melbourne, VIC, Australia},
  pages={10 pages},
  publisher={ACM},
  doi={10.1145/3689096.3689458}
}

Contact

Please contact michael@simula.no, vajira@simula.no, steven@simula.no or paalh@simula.no for any questions regarding the dataset.