LAION-fMRI Dataset

A densely sampled 7T fMRI dataset spanning four image distributions (LAION-natural, MSCOCO, THINGS, and out-of-distribution images), designed to broadly cover the image space and enable robust replication and generalization of visual neuroscience findings.

What's included

Dataset Contents

Data modalities

All data follows BIDS format, with raw volumes in subject directories and processed outputs in derivatives/. The primary starting point for most analyses is the GLMsingle beta estimates.

fMRI beta estimates

GLMsingle single-trial BOLD responses: (n_trials × n_voxels) per session

Noise ceiling maps

Per-session and cross-session estimates of maximum explainable variance

Anatomical (T1w)

High-resolution structural MRI for cortical surface reconstruction via FreeSurfer

Diffusion MRI

DTI data in sub-XX/dwi/ for white matter characterization

Retinotopic mapping

Phase-encoded retinotopy for delineating early visual areas

Functional localizers

Category-selective region localizers (faces, scenes, bodies, objects)

ROI masks

Pre-computed region-of-interest masks ready for analysis

Stimulus images

~25,000 images presented during scanning (research-only license)

Python package

How to Access

Installation

bash

pip install laion_fmri

Installs the laion_fmri Python package. The S3 bucket is publicly accessible, so no AWS credentials are required.

Discover

python

from laion_fmri.config import dataset_initialize
from laion_fmri.discovery import describe, get_subjects

DATASET_PATH = "./laion_fmri_data"        # example local data directory; adjust as needed
dataset_initialize(DATASET_PATH)          # one-time setup
print(get_subjects())                     # ['sub-01', 'sub-03', ...]
describe()                                # human-readable bucket summary

Call dataset_initialize once to register your local data directory. Use get_subjects() to list participants and describe() for a summary of bucket contents.

Download

python

from laion_fmri.download import download

# Download one session (BIDS-aware, idempotent)
download(subject="sub-03", ses="ses-01", n_jobs=4)

# Download only subject-level aggregate maps
download(subject="sub-03", ses="averages")

# Download everything for all subjects
download(subject="all", n_jobs=4)

Fetches data from S3 to your local directory. Downloads are idempotent, so re-running skips files already present. Use ses="averages" to retrieve only cross-session aggregate maps.

Load betas & trial info

python

from laion_fmri.subject import load_subject

sub   = load_subject("sub-03")
betas = sub.get_betas(session="ses-01")           # (n_trials, n_voxels), float32

# Filter to a region of interest
betas_ffa = sub.get_betas(session="ses-01", roi="FFA1")

# Only keep well-driven voxels
betas_nc  = sub.get_betas(session="ses-01", nc_threshold=0.2)

# Get trial metadata as a DataFrame
trials = sub.get_trial_info(session="ses-01")     # image IDs, conditions, ...

Returns single-trial beta estimates as a float32 array of shape (n_trials × n_voxels). Restrict to a brain region with roi=, filter for reliably driven voxels with nc_threshold=, or retrieve trial metadata as a DataFrame with get_trial_info().
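A common next step with single-trial betas is to average the trials that show the same image. The sketch below illustrates this with NumPy on synthetic data. The helper average_by_image is hypothetical (it is not part of laion_fmri), and it assumes the per-trial image IDs from get_trial_info() have already been extracted into a 1-D array; the actual column name is not specified here.

```python
import numpy as np

def average_by_image(betas, image_ids):
    """Average single-trial betas over repeated presentations of each image.

    betas:     (n_trials, n_voxels) array
    image_ids: (n_trials,) array of per-trial image identifiers
    Returns (unique_ids, mean_betas), mean_betas shaped (n_images, n_voxels).
    """
    image_ids = np.asarray(image_ids)
    unique_ids, inverse = np.unique(image_ids, return_inverse=True)
    sums = np.zeros((unique_ids.size, betas.shape[1]), dtype=np.float64)
    np.add.at(sums, inverse, betas)                      # accumulate trials per image
    counts = np.bincount(inverse, minlength=unique_ids.size)
    return unique_ids, (sums / counts[:, None]).astype(betas.dtype)

# Synthetic stand-ins for sub.get_betas(...) and an image-ID column (assumed)
rng = np.random.default_rng(0)
betas = rng.standard_normal((6, 4)).astype(np.float32)
image_ids = np.array(["img_b", "img_a", "img_b", "img_c", "img_a", "img_b"])

ids, mean_betas = average_by_image(betas, image_ids)
print(ids)
print(mean_betas.shape)
```

Averaging trades trial count for a cleaner response estimate per image; whether it applies depends on how often images repeat within the sessions you load.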

Train / test splits

python

from laion_fmri.splits import get_split_masks
from laion_fmri.subject import load_subject
import numpy as np

sub    = load_subject("sub-03")
betas  = sub.get_betas(session="ses-01")
trials = sub.get_trial_info(session="ses-01")

# Random split — one of five seeded 80/20 baselines (random_0 … random_4)
train_mask, test_mask = get_split_masks(trials, "random_0", pool="sub-03")

# Within-distribution split (tau) — balanced by image-space coverage
train_mask, test_mask = get_split_masks(trials, "tau", pool="sub-03")

# OOD cluster split — 5-fold cross-validation across semantic clusters
for k in range(5):
    train_mask, test_mask = get_split_masks(trials, f"cluster_k5_{k}", pool="sub-03")

# OOD images — train on shared pool, test on held-out OOD images
train_mask, test_mask = get_split_masks(trials, "ood", pool="sub-03")

# Apply any mask to betas
X_train, X_test = betas[train_mask], betas[test_mask]

Each split corresponds to a generalization method: tau for Method 1 (within-distribution), cluster_k5_{k} for Method 2 (OOD clusters), and ood for Method 3 (OOD images). The random_* splits are simple baselines for replication analyses.
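To show how a train/test mask pair might feed an encoding model, here is a minimal sketch using closed-form ridge regression in NumPy. Everything in it is synthetic: the features array stands in for whatever image features you compute yourself, the masks are assumed to be boolean arrays like those applied to betas above, and fit_ridge and voxelwise_correlation are illustrative helpers, not laion_fmri functions.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    gram = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ Y)

def voxelwise_correlation(Y_true, Y_pred):
    """Pearson correlation between measured and predicted responses, per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / denom

# Synthetic stand-ins: image features (hypothetical) and betas driven by them
rng = np.random.default_rng(0)
n_trials, n_features, n_voxels = 200, 16, 10
features = rng.standard_normal((n_trials, n_features))
true_weights = rng.standard_normal((n_features, n_voxels))
betas = features @ true_weights + 0.5 * rng.standard_normal((n_trials, n_voxels))

# Stand-in boolean masks; in practice these would come from get_split_masks
train_mask = np.zeros(n_trials, dtype=bool)
train_mask[:160] = True
test_mask = ~train_mask

W = fit_ridge(features[train_mask], betas[train_mask], alpha=1.0)
r = voxelwise_correlation(betas[test_mask], features[test_mask] @ W)
print(r.shape)   # one held-out correlation per voxel
```

The same fit-and-score code can be reused unchanged across the random, tau, cluster, and ood splits, which makes it easy to compare held-out accuracy between them.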

Noise ceiling & inspection

python

# Noise ceiling: max explainable variance per voxel (0-100)
nc = sub.get_noise_ceiling(session="ses-01")      # (n_voxels,)

# Inspect available sessions and ROIs
print(sub.get_sessions())
print(sub.get_available_rois())
print(f"Brain-mask voxels: {sub.get_n_voxels()}")

Noise ceiling scores express the theoretical maximum variance explainable by any model, per voxel. The subject object also exposes methods to list available sessions and ROI masks.
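One common use of a noise ceiling is to report model performance as a fraction of the explainable variance. The sketch below assumes the 0-100 scale stated in the comment above and uses synthetic numbers; noise_normalized_r2 and its min_nc_percent cutoff are illustrative choices, not part of the package.

```python
import numpy as np

def noise_normalized_r2(r2, nc_percent, min_nc_percent=5.0):
    """Express per-voxel R^2 as a fraction of the noise ceiling.

    r2:         (n_voxels,) model R^2 on held-out data, on a 0-1 scale
    nc_percent: (n_voxels,) noise ceiling on a 0-100 scale (assumed)
    Voxels whose ceiling falls below min_nc_percent are returned as NaN,
    because dividing by a near-zero ceiling gives unstable values.
    """
    r2 = np.asarray(r2, dtype=float)
    nc_fraction = np.asarray(nc_percent, dtype=float) / 100.0
    out = np.full(r2.shape, np.nan)
    keep = nc_fraction >= min_nc_percent / 100.0
    out[keep] = r2[keep] / nc_fraction[keep]
    return out

# Synthetic example: three voxels, the last one with a very low ceiling
r2 = np.array([0.10, 0.30, 0.02])
nc = np.array([20.0, 60.0, 1.0])
scores = noise_normalized_r2(r2, nc)
print(scores)
```

Masking low-ceiling voxels rather than clipping them keeps summary statistics such as np.nanmean from being dominated by unreliable voxels.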

CLI alternative

bash

laion-fmri config --data-dir ./laion_fmri_data
laion-fmri info
laion-fmri download --subject sub-03

Licence

Neuroimaging data: CC0 1.0

Stimulus images: Research only

Documentation

Full technical docs

Complete API reference, data format specifications, preprocessing pipeline details, example notebooks, and the interactive brain viewer are available at the official documentation site.
