This page describes the dataset associated with the paper "Enhancing Testicular Ultrasound Image Classification through Synthetic Data and Pretraining Strategies", presented at the 23rd International Conference on Image Analysis and Processing. The dataset was synthetically generated using Denoising Diffusion Probabilistic Models (DDPMs), a state-of-the-art approach for producing high-fidelity medical images. It comprises 9,289 testicular ultrasound images, each generated without explicit guidance to ensure a diverse and unbiased representation of anatomical variability. The synthetic images closely mimic real clinical data in visual appearance, resolution, and noise characteristics. The dataset contains only images, with no labels or metadata, and is intended for pretraining. It aims to address the scarcity of testicular ultrasound data by providing a large set of realistic synthetic images for research and model development.
The table below summarizes the core information about the synthetic dataset used in our paper.
The synthetic images were generated using a custom implementation based on the OpenAI guided-diffusion repository. After generation, a dedicated filtering algorithm was applied to ensure the quality and diversity of the dataset. This algorithm evaluates each image using a precision metric computed on the real data distribution, allowing us to select only those synthetic images that closely match the characteristics of real testicular ultrasound scans. This process helps to minimize artifacts and outliers, resulting in a high-quality dataset that is well-suited for pretraining and research applications.
| Field | Value |
|---|---|
| Organ | Testicle |
| Number of images | 9,289 |
| File format | PNG |
| Image shape (pixels) | 256 × 256 |
| Image modality | Ultrasound (B-mode) |
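As a quick sanity check after downloading, the properties listed in the table can be verified without any imaging library. The sketch below reads only the PNG header of each file and reports files whose dimensions differ from 256 × 256. The folder name `synthetic_testicular_us` and the helper names are hypothetical, chosen for illustration; adapt them to wherever you unpacked the dataset.

```python
import struct
from pathlib import Path
from typing import List, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(header: bytes) -> Tuple[int, int]:
    """Return (width, height) read from the first 24 bytes of a PNG file."""
    # Bytes 0-7: signature, 8-11: IHDR length, 12-15: b"IHDR",
    # 16-19: width, 20-23: height (both big-endian unsigned ints).
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a valid PNG header")
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def find_unexpected(folder: str, expected: Tuple[int, int] = (256, 256)) -> List[Path]:
    """List PNG files under `folder` that are unreadable or not `expected` in size."""
    bad = []
    for path in sorted(Path(folder).rglob("*.png")):
        with path.open("rb") as fh:
            try:
                size = png_size(fh.read(24))
            except ValueError:
                size = None
        if size != expected:
            bad.append(path)
    return bad


if __name__ == "__main__":
    # Hypothetical folder name; replace with your local path.
    problems = find_unexpected("synthetic_testicular_us")
    print(f"{len(problems)} file(s) with unexpected size or header")
```

An empty result means every PNG found has the expected dimensions. The script does not check the file count, so compare that against the table separately.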
Alternatively, you can generate your own synthetic testicular ultrasound images by following the instructions in our GitHub repository. The repository contains the code and guidelines to train a DDPM and use it for image generation. It also includes our custom filtering algorithm, which selects the generated images that most closely match real clinical data. This lets researchers create datasets tailored to their specific needs while preserving quality and diversity.
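This page does not spell out the filtering algorithm, so the following is only a minimal sketch of one common way to compute a per-image precision criterion, not the implementation from our repository. It assumes a k-nearest-neighbor formulation: a synthetic sample is kept if its feature vector falls inside the k-th-neighbor radius of at least one real sample. The feature vectors are assumed to come from some embedding network that is not shown here, and the function names and the value of `k` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def kth_neighbor_radii(real: List[Vector], k: int) -> List[float]:
    """Distance from each real vector to its k-th nearest real neighbor."""
    if not 1 <= k < len(real):
        raise ValueError("k must satisfy 1 <= k < number of real samples")
    radii = []
    for i, r in enumerate(real):
        # Distances to every other real sample, excluding the sample itself.
        dists = sorted(math.dist(r, o) for j, o in enumerate(real) if j != i)
        radii.append(dists[k - 1])
    return radii


def precision_filter(real: List[Vector], synthetic: List[Vector], k: int = 3) -> List[bool]:
    """Return True for each synthetic vector lying inside the estimated real manifold."""
    radii = kth_neighbor_radii(real, k)
    return [
        any(math.dist(s, r) <= radius for r, radius in zip(real, radii))
        for s in synthetic
    ]


# Toy 2-D example: real features on a unit square, one plausible and one outlying sample.
real_feats = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
synth_feats = [(0.5, 0.5), (10.0, 10.0)]
keep = precision_filter(real_feats, synth_feats, k=1)  # -> [True, False]
precision = sum(keep) / len(keep)                      # fraction of samples kept
```

The pure-Python loops are quadratic and meant only to show the logic. For thousands of images, a vectorized or tree-based nearest-neighbor search would be the practical choice.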