EVKit is a toolkit for storing, loading, and converting event-camera data into
the tensor representations used to train deep models. It pairs a compact
on-disk format (PODCAST) with CPU C++ kernels that turn raw events into voxel
grids, count frames, and time surfaces, served straight from a PyTorch
DataLoader.
- Ready-to-use datasets. DSECDataset, M3EDDataset, and MVSECDataset stream PODCAST files from the Hugging Face Hub and download on first access.
- Three representations. Voxel grid, count frame, and time surface, each backed by a single-pass C++ kernel and a configurable spec.
- Flexible binning. Four voxel-grid temporal conventions (centered/left/right/spanned) covering the common variants in the literature.
- Drops into PyTorch. Datasets are torch.utils.data.Datasets; collate functions assemble batches on CPU workers with no per-sample Python overhead.

EVKit builds a small set of C++ extensions at install time, so a C++ compiler is required. The recommended workflow uses pixi:

```bash
pixi run install-dev   # builds the C++ kernels and installs evkit (editable)
pixi run -e dev test   # run the test suite
```

Or install into an existing environment with a working compiler:

```bash
pip install .
```

```python
from torch.utils.data import DataLoader
from evkit import DSECDataset, VoxelGridSpec, VoxelGridCollate

# Describe the representation you want.
spec = VoxelGridSpec(n_bins=5, separate_polarity=True, bin_weighting="centered")

# Sequences download from Hugging Face on first access and cache in cache_dir.
ds = DSECDataset(
    spec=spec,
    cache_dir="~/.cache/evkit/dsec",
    split="train",
    camera="left",
    snippet_ms=500,
)

collate = VoxelGridCollate.from_dataset(ds.datasets[0])
loader = DataLoader(ds, batch_size=8, num_workers=4, collate_fn=collate)

for voxels in loader:  # (B, n_bins, 2, H, W) with separate_polarity=True
    ...
```

A runnable version with throughput reporting is in examples/load_dsec.py.

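
For readers new to voxel grids, the sketch below shows one common way to build a per-sample tensor of shape (n_bins, 2, H, W) in plain NumPy. It is illustrative only: the function name `voxel_grid` and its arguments are hypothetical, it is not EVKit's C++ kernel, and the linear-in-time weighting used here is a widespread formulation from the literature that may not match any of the four bin_weighting conventions exactly.

```python
import numpy as np


def voxel_grid(x, y, t, p, n_bins, height, width):
    """Accumulate events into a (n_bins, 2, H, W) grid with linear temporal weights.

    x, y: integer pixel coordinates; t: timestamps (any unit); p: polarity in {0, 1}.
    Hypothetical helper for illustration; not part of EVKit.
    """
    grid = np.zeros((n_bins, 2, height, width), dtype=np.float32)
    if len(t) == 0:
        return grid
    # Map timestamps to continuous bin coordinates in [0, n_bins - 1].
    span = float(t.max() - t.min()) or 1.0
    tn = (t - t.min()) / span * (n_bins - 1)
    lo = np.floor(tn).astype(np.int64)
    hi = np.minimum(lo + 1, n_bins - 1)
    # Each event splits its weight between the two neighbouring bins.
    w_hi = (tn - lo).astype(np.float32)
    w_lo = 1.0 - w_hi
    # np.add.at accumulates correctly when several events hit the same cell.
    np.add.at(grid, (lo, p, y, x), w_lo)
    np.add.at(grid, (hi, p, y, x), w_hi)
    return grid


# Three toy events on a 2x2 sensor.
x = np.array([0, 1, 1])
y = np.array([0, 0, 1])
t = np.array([0, 250, 1000])
p = np.array([1, 0, 1])
grid = voxel_grid(x, y, t, p, n_bins=3, height=2, width=2)
```

In this sketch a signed (n_bins, H, W) grid could be obtained as `grid[:, 1] - grid[:, 0]`, assuming index 1 holds positive events.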
Each representation is selected by passing the matching spec to the dataset and
the matching collate function (or collate_for(spec, ...)) to the DataLoader.

| Spec | Output | Notes |
|---|---|---|
| VoxelGridSpec | (n_bins, H, W) signed, or (n_bins, 2, H, W) split | bin_weighting: centered (default), left, right, spanned |
| CountFrameSpec | (2, H, W) / (H, W) | polarity: sep, sum, diff; counts saturate to dtype |
| TimeSurfaceSpec | (H, W) or (2, H, W) | exponential decay with time constant tau_us |

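
To make the count-frame and time-surface rows concrete, here is a minimal NumPy sketch of both ideas. The helper names and signatures are hypothetical and do not reflect EVKit's API or its C++ kernels. The reading of sep/sum/diff (separate channels, positive plus negative, positive minus negative) is an assumption, as is the choice to evaluate the decay at a caller-supplied reference time.

```python
import numpy as np


def count_frame(x, y, p, height, width, polarity="sep", dtype=np.uint8):
    """Per-pixel event counts, saturated to an integer dtype (illustrative only)."""
    # Count in a wide integer type first so the accumulation itself cannot wrap.
    counts = np.zeros((2, height, width), dtype=np.int64)
    np.add.at(counts, (p, y, x), 1)
    if polarity == "sep":      # assumed: one channel per polarity -> (2, H, W)
        out = counts
    elif polarity == "sum":    # assumed: all events together -> (H, W)
        out = counts[1] + counts[0]
    elif polarity == "diff":   # assumed: positive minus negative -> (H, W)
        out = counts[1] - counts[0]
    else:
        raise ValueError(f"unknown polarity mode: {polarity!r}")
    # Saturate instead of wrapping around when casting down.
    info = np.iinfo(dtype)
    return np.clip(out, info.min, info.max).astype(dtype)


def time_surface(x, y, t_us, t_ref_us, tau_us, height, width):
    """Exponentially decayed most-recent-event map of shape (H, W) (illustrative only)."""
    last = np.full((height, width), -np.inf)
    # Keep the latest timestamp per pixel, even if events arrive unsorted.
    np.maximum.at(last, (y, x), np.asarray(t_us, dtype=np.float64))
    # Pixels with no events stay at -inf and therefore decay to exactly 0.
    return np.exp((last - t_ref_us) / tau_us).astype(np.float32)
```

With an unsigned dtype, the diff mode in this sketch clips negative values to zero, so a signed dtype such as np.int8 is the safer choice there.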
Snippets are sampled from a recording by duration (snippet_ms) with an
optional stride_ms for sliding windows.
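
As a rough illustration of how duration and stride interact, the sketch below enumerates window boundaries for a recording. The function `snippet_windows` is hypothetical, and two behaviours are assumptions rather than documented EVKit semantics: the stride defaults to the snippet length (non-overlapping windows), and a trailing partial window is dropped.

```python
def snippet_windows(duration_ms, snippet_ms, stride_ms=None):
    """Return [start, end) windows in milliseconds that fit inside the recording."""
    if stride_ms is None:
        stride_ms = snippet_ms  # assumption: no stride means back-to-back windows
    if snippet_ms <= 0 or stride_ms <= 0:
        raise ValueError("snippet_ms and stride_ms must be positive")
    # The last valid start is duration_ms - snippet_ms; later windows would overrun.
    last_start = duration_ms - snippet_ms
    return [(s, s + snippet_ms) for s in range(0, last_start + 1, stride_ms)]


# A 2-second recording cut into 500 ms snippets.
tiled = snippet_windows(2000, snippet_ms=500)                   # back-to-back
sliding = snippet_windows(2000, snippet_ms=500, stride_ms=250)  # 50% overlap
```

Under these assumptions the tiled call gives 4 windows and the sliding call gives 7, which shows how a smaller stride increases the number of samples drawn from the same recording.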

| Class | Source |
|---|---|
| DSECDataset | mavlab-tudelft/dsec_podcast (Hugging Face) |
| M3EDDataset | mavlab-tudelft/m3ed_podcast |
| MVSECDataset | mavlab-tudelft/mvsec_podcast |

All three share the same constructor shape: a representation spec, a
cache_dir, an official split (or an explicit sequences list), a camera,
and snippet windowing options.
If you use EVKit, please cite the accompanying paper:

```bibtex
@inproceedings{wu2026evkit,
  title     = {EVKit: An Open-source Flexible Toolkit for Efficient Event Camera
               Data Storage and Loading},
  author    = {Wu, Yilun and de Croon, Guido C. H. E.},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2026},
}
```

MIT. See LICENSE.