
A densely sampled 7T fMRI dataset spanning four image distributions (LAION-natural, MSCOCO, THINGS, and out-of-distribution images), designed to broadly cover the image space and enable robust replication and generalization of visual neuroscience findings.
Dataset Contents
Data modalities
All data follows BIDS format, with raw volumes in subject directories and processed outputs in derivatives/. The primary starting point for most analyses is the GLMsingle beta estimates.
GLMsingle single-trial BOLD responses: (n_trials × n_voxels) per session
Per-session and cross-session estimates of maximum explainable variance
High-resolution structural MRI for cortical surface reconstruction via FreeSurfer
DTI data in sub-XX/dwi/ for white matter characterization
Phase-encoded retinotopy for delineating early visual areas
Category-selective region localizers (faces, scenes, bodies, objects)
Pre-computed region-of-interest masks ready for analysis
~25,000 images presented during scanning (research-only license)
How to Access
Installation
pip install laion_fmri
Installs the laion_fmri Python package. The S3 bucket is publicly accessible; no AWS credentials are required.
Discover
from laion_fmri.config import dataset_initialize
from laion_fmri.discovery import describe, get_subjects
DATASET_PATH = "./laion_fmri_data" # example local data directory; adjust as needed
dataset_initialize(DATASET_PATH) # one-time setup
print(get_subjects()) # ['sub-01', 'sub-03', ...]
describe() # human-readable bucket summary
Call dataset_initialize once to register your local data directory. Use get_subjects() to list participants and describe() for a summary of bucket contents.
Download
from laion_fmri.download import download
# Download one session (BIDS-aware, idempotent)
download(subject="sub-03", ses="ses-01", n_jobs=4)
# Download only subject-level aggregate maps
download(subject="sub-03", ses="averages")
# Download everything for all subjects
download(subject="all", n_jobs=4)
Fetches data from S3 to your local directory. Downloads are idempotent, so re-running skips files already present. Use ses="averages" to retrieve only cross-session aggregate maps.
Load betas & trial info
from laion_fmri.subject import load_subject
sub = load_subject("sub-03")
betas = sub.get_betas(session="ses-01") # (n_trials, n_voxels), float32
# Filter to a region of interest
betas_ffa = sub.get_betas(session="ses-01", roi="FFA1")
# Only keep well-driven voxels
betas_nc = sub.get_betas(session="ses-01", nc_threshold=0.2)
# Get trial metadata as a DataFrame
trials = sub.get_trial_info(session="ses-01") # image IDs, conditions, ...
Returns single-trial beta estimates as a float32 array of shape (n_trials × n_voxels). Restrict to a brain region with roi=, filter for reliably driven voxels with nc_threshold=, or retrieve trial metadata as a DataFrame with get_trial_info().
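Because the betas are single-trial estimates, a common next step is to average the responses to repeated presentations of the same image. The sketch below shows one way to do that with NumPy alone. It runs on synthetic data, and it assumes the trial DataFrame exposes an image identifier per trial; the name image_ids is hypothetical, so check the actual column names returned by get_trial_info() before adapting it.

```python
import numpy as np


def average_repeats(betas, image_ids):
    """Average single-trial betas over repeated presentations of each image.

    betas:     (n_trials, n_voxels) array
    image_ids: (n_trials,) array with one image identifier per trial
    Returns (unique_ids, mean_betas), where mean_betas is (n_images, n_voxels).
    """
    unique_ids, inverse = np.unique(image_ids, return_inverse=True)
    sums = np.zeros((unique_ids.size, betas.shape[1]), dtype=np.float64)
    np.add.at(sums, inverse, betas)  # accumulate trial rows per image
    counts = np.bincount(inverse, minlength=unique_ids.size)
    return unique_ids, (sums / counts[:, None]).astype(betas.dtype)


# Synthetic stand-ins for sub.get_betas(...) and a per-trial image-ID column
# (hypothetical; the real metadata column name may differ).
rng = np.random.default_rng(0)
betas = rng.standard_normal((6, 4)).astype(np.float32)
image_ids = np.array(["img_a", "img_b", "img_a", "img_c", "img_b", "img_a"])

ids, mean_betas = average_repeats(betas, image_ids)
print(ids, mean_betas.shape)
```

Averaging trades trial count for signal-to-noise, so whether to do it depends on the analysis; single-trial betas remain the better input when trial-level variability matters.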
Train / test splits
from laion_fmri.splits import get_split_masks
from laion_fmri.subject import load_subject
import numpy as np
sub = load_subject("sub-03")
betas = sub.get_betas(session="ses-01")
trials = sub.get_trial_info(session="ses-01")
# Random split — one of five seeded 80/20 baselines (random_0 … random_4)
train_mask, test_mask = get_split_masks(trials, "random_0", pool="sub-03")
# Within-distribution split (tau) — balanced by image-space coverage
train_mask, test_mask = get_split_masks(trials, "tau", pool="sub-03")
# OOD cluster split — 5-fold cross-validation across semantic clusters
for k in range(5):
    train_mask, test_mask = get_split_masks(trials, f"cluster_k5_{k}", pool="sub-03")
# OOD images — train on shared pool, test on held-out OOD images
train_mask, test_mask = get_split_masks(trials, "ood", pool="sub-03")
# Apply any mask to betas
X_train, X_test = betas[train_mask], betas[test_mask]
Each split corresponds to a generalization method: tau for Method 1 (within-distribution), cluster_k5_{k} for Method 2 (OOD clusters), and ood for Method 3 (OOD images). The random_* splits are simple baselines for replication analyses.
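To illustrate how a train/test mask pair might be used, here is a minimal sketch of a voxelwise encoding model: closed-form ridge regression from image features to betas, scored with R² on the held-out trials. Everything in it is synthetic. The feature matrix, the helper names fit_ridge and voxelwise_r2, and the randomly drawn boolean masks are illustrative assumptions standing in for real image features and for the masks returned by get_split_masks, which are assumed here to be boolean arrays of length n_trials.

```python
import numpy as np


def fit_ridge(features, responses, alpha=1.0):
    """Closed-form ridge regression: W = (F^T F + alpha * I)^-1 F^T Y."""
    n_feat = features.shape[1]
    gram = features.T @ features + alpha * np.eye(n_feat)
    return np.linalg.solve(gram, features.T @ responses)


def voxelwise_r2(y_true, y_pred):
    """Coefficient of determination, computed separately for each voxel."""
    ss_res = ((y_true - y_pred) ** 2).sum(axis=0)
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot


# Synthetic data: betas are a noisy linear function of the image features.
rng = np.random.default_rng(0)
n_trials, n_feat, n_vox = 200, 8, 5
features = rng.standard_normal((n_trials, n_feat))
true_w = rng.standard_normal((n_feat, n_vox))
betas = features @ true_w + 0.1 * rng.standard_normal((n_trials, n_vox))

# Stand-in for get_split_masks(...): a random 80/20 boolean split.
test_mask = np.zeros(n_trials, dtype=bool)
test_mask[rng.choice(n_trials, size=n_trials // 5, replace=False)] = True
train_mask = ~test_mask

# Fit on training trials only, then score on the held-out trials.
weights = fit_ridge(features[train_mask], betas[train_mask], alpha=1.0)
r2 = voxelwise_r2(betas[test_mask], features[test_mask] @ weights)
print(r2.round(3))
```

With real data, the same model would be refit for each split name so that the random baselines, the within-distribution split, and the OOD splits can be compared on equal terms.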
Noise ceiling & inspection
# Noise ceiling: max explainable variance per voxel (0-100)
nc = sub.get_noise_ceiling(session="ses-01") # (n_voxels,)
# Inspect available sessions and ROIs
print(sub.get_sessions())
print(sub.get_available_rois())
print(f"Brain-mask voxels: {sub.get_n_voxels()}")
Noise ceiling scores express the theoretical maximum variance explainable by any model, per voxel. The subject object also exposes methods to list available sessions and ROI masks.
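One typical use of the noise ceiling is to express model performance as a fraction of the explainable variance. The sketch below assumes the noise ceiling is given in percent (0-100), as the comment above indicates, and that model performance is a per-voxel R² in the range 0-1. The function name normalize_by_noise_ceiling and the min_nc cutoff are illustrative choices, not part of the laion_fmri package.

```python
import numpy as np


def normalize_by_noise_ceiling(r2, nc_percent, min_nc=5.0):
    """Divide per-voxel R^2 by the noise ceiling expressed as a fraction.

    Voxels whose noise ceiling is below min_nc (in percent) are set to NaN,
    because dividing by a near-zero ceiling gives unstable values.
    """
    r2 = np.asarray(r2, dtype=np.float64)
    nc = np.asarray(nc_percent, dtype=np.float64)
    normalized = np.full(r2.shape, np.nan)
    keep = nc >= min_nc
    normalized[keep] = r2[keep] / (nc[keep] / 100.0)
    return normalized


# Synthetic stand-ins for model R^2 values and sub.get_noise_ceiling(...)
r2 = np.array([0.10, 0.30, 0.02, 0.25])
nc = np.array([20.0, 60.0, 1.0, 50.0])

normalized = normalize_by_noise_ceiling(r2, nc)
print(normalized)
```

Masking low-ceiling voxels rather than clipping them keeps unreliable voxels out of summary statistics; np.nanmean can then be used to average over the remaining voxels.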
CLI alternative
laion-fmri config --data-dir ./laion_fmri_data
laion-fmri info
laion-fmri download --subject sub-03
Licence
Documentation
Full technical docs
Complete API reference, data format specifications, preprocessing pipeline details, example notebooks, and the interactive brain viewer are available at the official documentation site.