Testicular Ultrasound Synthetic Dataset

This page describes the dataset associated with the paper "Enhancing Testicular Ultrasound Image Classification through Synthetic Data and Pretraining Strategies", presented at the 23rd International Conference on Image Analysis and Processing. The dataset was generated synthetically using Denoising Diffusion Probabilistic Models (DDPMs), a state-of-the-art approach for producing high-fidelity medical images. It comprises approximately 9,300 testicular ultrasound images, each generated without explicit guidance to encourage a diverse and unbiased representation of anatomical variability. The synthetic images closely mimic real clinical data in visual appearance, resolution, and noise characteristics, making them suitable for a wide range of research and development purposes. The original goal of the dataset was to address the scarcity of testicular ultrasound data by providing a large set of realistic synthetic images for research and model development. In the work "Too Big to Fail? Not Quite: FiLM-UNet Beats Foundation Models in Cross-Domain Ultrasound Segmentation", currently under review at ISBI 2026, we extend the dataset with segmentation masks of the testicles for a subset of 810 synthetic images.

Features

The table below summarizes the core information about the synthetic dataset used in our paper.

The synthetic images were generated using a custom implementation based on the OpenAI guided-diffusion repository. After generation, a dedicated filtering algorithm was applied to ensure the quality and diversity of the dataset. This algorithm evaluates each image using a precision metric computed on the real data distribution, allowing us to select only those synthetic images that closely match the characteristics of real testicular ultrasound scans. This process helps to minimize artifacts and outliers, resulting in a high-quality dataset that is well-suited for pretraining and research applications.
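The filtering idea described above can be illustrated with a minimal sketch. The exact precision metric used by the authors is not specified on this page, so the code below assumes a k-nearest-neighbour formulation: a synthetic image is kept if its feature vector falls inside the k-NN radius of at least one real feature vector. The function names, the parameter `k`, and the use of pre-extracted feature vectors are all assumptions made for illustration, not the repository's actual API.

```python
import numpy as np


def knn_radii(real_feats, k=3):
    """Distance from each real sample to its k-th nearest real neighbour."""
    # Pairwise Euclidean distances between real feature vectors.
    d = np.linalg.norm(real_feats[:, None, :] - real_feats[None, :, :], axis=-1)
    # After sorting, column 0 is the self-distance (0), so column k is the
    # k-th nearest neighbour.
    return np.sort(d, axis=1)[:, k]


def precision_filter(real_feats, synth_feats, k=3):
    """Boolean mask: True for synthetic samples lying on the real manifold.

    Hypothetical stand-in for the dataset's filtering step: a synthetic
    sample is kept if it lies within the k-NN radius of any real sample.
    """
    radii = knn_radii(real_feats, k)
    # Distances from every synthetic sample to every real sample.
    d = np.linalg.norm(synth_feats[:, None, :] - real_feats[None, :, :], axis=-1)
    return (d <= radii[None, :]).any(axis=1)


# Toy 2-D features; in practice they would come from an image embedding network.
real = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synth = np.array([[0.5, 0.5], [10.0, 10.0]])
keep = precision_filter(real, synth, k=1)
precision = keep.mean()  # fraction of synthetic samples accepted
```

In this toy setup the first synthetic point sits among the real points and is kept, while the distant second point is rejected. The pairwise distance matrices grow quadratically with the number of samples, so a real pipeline would likely compute them in batches.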

Field	Value
Organ	Testicle
# images	9289
File format	PNG
Image shape	[256, 256]
Image modality	Ultrasound (B-mode)
# segmentation masks	810
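
Because only 810 of the 9289 images have a segmentation mask, code that consumes the dataset has to handle images without one. The sketch below pairs each image with its mask when a mask exists. The directory names and the convention that a mask shares its image's file name are assumptions made for illustration; check the actual archive layout before relying on them.

```python
from pathlib import Path


def pair_images_with_masks(image_dir, mask_dir):
    """Return (image_path, mask_path_or_None) tuples, sorted by image name.

    Assumes (hypothetically) that a mask has the same file stem as its image.
    """
    image_dir, mask_dir = Path(image_dir), Path(mask_dir)
    # Index the masks by file stem for constant-time lookup.
    masks = {p.stem: p for p in mask_dir.glob("*.png")}
    return [(img, masks.get(img.stem)) for img in sorted(image_dir.glob("*.png"))]


def annotated_only(pairs):
    """Keep only the pairs that actually have a segmentation mask."""
    return [(img, mask) for img, mask in pairs if mask is not None]
```

A classification or pretraining pipeline can iterate over all pairs and ignore the masks, while a segmentation pipeline would use `annotated_only` to restrict itself to the annotated subset.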

Generate your images!

You can also generate your own synthetic testicular ultrasound images by following the instructions provided in our GitHub repository. The repository contains the code and guidelines to train and use a Denoising Diffusion Probabilistic Model (DDPM) for image generation. It also includes our custom filtering algorithm, which lets you select high-quality images that closely match real clinical data. This enables researchers to create datasets tailored to their specific needs while preserving quality and diversity. In addition, the code of our segmentation algorithm, trained on a large set of ultrasound images and fine-tuned on the annotated synthetic testicular ultrasound images, is available in this GitHub repository, and you can run inference with our checkpoints from Hugging Face.
