Sample Training Format v1

Note

This page documents our on-disk storage format for per-sample training datasets. If you are looking to export Noisebase-compatible data from your renderer or write a Noisebase-compatible data loader for your framework, you’re in the right place.

If you want to use a dataset, our corresponding data loader manual should be more helpful. If you are looking for datasets using the format, check out our datasets page.

We store each training sequence in a separate Zarr ZipStore. We found that Zarr provides an excellent balance between compression and read speed. Each file contains several arrays, which we describe below. The first dimension of every array is the frame counter F, and the last three are the height H, width W, and sample count S of the sequence wherever applicable.

We compress each array with level 9 LZ4HC in FCHWS layout, chunked per frame and per 4 samples. We found that a channels-last (HWC) layout would increase file sizes by about 30%.
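As a concrete sketch, the snippet below writes and reads back one array with this layout using the zarr-python 2 API. Whether the format uses Blosc's lz4hc codec (rather than another LZ4HC binding) is an assumption on our part, as are the example shapes; the array keys mirror the tables below.

import numpy as np
import zarr
from numcodecs import Blosc

# One sequence: F frames, C channels, an HxW crop, S samples per pixel
F, C, H, W, S = 64, 3, 256, 256, 32

# Level 9 LZ4HC; chunks cover one frame, all channels, the full crop,
# and 4 samples, matching the chunking described above
compressor = Blosc(cname='lz4hc', clevel=9)

store = zarr.ZipStore('scene0000.zip', mode='w')
g = zarr.group(store)
position = g.create_dataset(
    'position',
    shape=(F, C, H, W, S),
    chunks=(1, C, H, W, 4),
    dtype='float32',
    compressor=compressor,
)
position[0] = np.zeros((C, H, W, S), dtype='float32')  # write frame 0
store.close()

# Reading back: whole-frame reads line up with the chunking
store = zarr.ZipStore('scene0000.zip', mode='r')
g = zarr.open_group(store, mode='r')
frame = g['position'][0]  # [C, H, W, S]
store.close()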

Supported loaders

  • PyTorch: noisebase.loaders.torch.TrainingSampleLoader_v1

  • PyTorch Lightning: noisebase.loaders.lightning.TrainingSampleLoader_v1

The default format definition is in conf/noisebase/format/training_sample_v1.yaml.

A YAML file describing a dataset that uses this format needs to give a name and the following source parameters:

name: "Sample Set v1 - Training Dataset"
src:
  sequences: 1024
  files: sampleset_training_v1/scene{index:04d}.zip
  frames_per_sequence: 64
  crop: 256
  samples: 32
  rendering_height: 1080
  rendering_width: 1920

Check conf/noisebase/sampleset_v1.yaml for a complete example.
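The files entry looks like a Python format string; presumably (an assumption based on the {index:04d} placeholder) it is expanded once per sequence index:

files = 'sampleset_training_v1/scene{index:04d}.zip'
paths = [files.format(index=i) for i in range(1024)]  # sequences: 1024
print(paths[0])  # sampleset_training_v1/scene0000.zip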

Radiance

| Key       | Description                                               | Dimensions      | DType   |
|-----------|-----------------------------------------------------------|-----------------|---------|
| color     | RGBE encoded sample radiance                              | [F, 4, H, W, S] | uint8   |
| exposure  | Minimum and maximum exposure per frame for RGBE decoding  | [F, 2]          | float32 |
| reference | Clean radiance                                            | [F, 3, H, W]    | float32 |

You can find more information in the technical description of our RGBE compression.
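As an illustration only, here is one plausible shape for such a decoder. The exponent mapping below (linear in log2 space between the per-frame minimum and maximum exposure) is an assumption; the technical description linked above is authoritative.

import numpy as np

def decode_rgbe(color, exposure):
    # color:    [F, 4, H, W, S] uint8  (RGB mantissas + shared exponent byte)
    # exposure: [F, 2] float32         (per-frame min/max exposure)
    # ASSUMPTION: the exponent byte interpolates exposure in log2 space
    # between the stored per-frame minimum and maximum.
    rgb = color[:, :3].astype(np.float32) / 255.0    # [F, 3, H, W, S]
    e = color[:, 3].astype(np.float32) / 255.0       # [F, H, W, S]
    lo = exposure[:, 0][:, None, None, None]         # [F, 1, 1, 1]
    hi = exposure[:, 1][:, None, None, None]
    scale = np.exp2(lo + e * (hi - lo))              # per-sample exposure
    return rgb * scale[:, None]                      # [F, 3, H, W, S]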

Sample geometry

We store world-space positions, motion, normals, and diffuse colours for every sample. Why store world-space data when most denoisers operate in screen space? Because most world-space motion is zero, world-space data compresses much better, and we can still calculate all kinds of screen-space data using the camera data described below (see the sketch at the end of this page).

| Key      | Description                                                | Dimensions      | DType   |
|----------|------------------------------------------------------------|-----------------|---------|
| position | Sample position in world-space                             | [F, 3, H, W, S] | float32 |
| motion   | Change of world-space sample position from the last frame  | [F, 3, H, W, S] | float32 |
| normal   | Sample normal in world-space                               | [F, 3, H, W, S] | float16 |
| diffuse  | Diffuse colour of the sample’s material                    | [F, 3, H, W, S] | float16 |

Camera data

We store camera data to encode the projection between world-space sample positions and screen-space pixel coordinates. Generally, you can ignore these and use our helper functions to convert between coordinate spaces and to compute motion, depth, etc., as we describe in our getting started guide.

| Key             | Description                                                                                  | Dimensions | DType   |
|-----------------|----------------------------------------------------------------------------------------------|------------|---------|
| camera_position | Position of the camera in world-space                                                        | [F, 3]     | float32 |
| camera_target   | A world-space point at the center of the image (where the camera is looking)                | [F, 3]     | float32 |
| camera_up       | World-space vector that points straight upwards in screen-space (what’s upwards for the camera) | [F, 3]  | float32 |
| view_proj_mat   | Matrix mapping from world-space to screen-space                                              | [F, 4, 4]  | float32 |
| proj_mat        | Matrix mapping from camera-space to screen-space                                             | [F, 4, 4]  | float32 |
| crop_offset     | Offset of the image crop from (0, 0) in pixel coordinates                                    | [F, 2]     | int32   |

You can find more information about crop_offset in the technical description of temporal cropping.
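To make the world-space-to-screen-space story concrete, here is a minimal sketch of projecting sample positions to crop-local pixel coordinates with view_proj_mat and crop_offset. The row/column-vector convention, the NDC range, the y direction, and the sign of motion (previous position = position - motion) are all assumptions; our helper functions are the authoritative implementation.

import numpy as np

def to_screen(position, view_proj_mat, crop_offset, width=1920, height=1080):
    # position:      [F, 3, H, W, S] world-space sample positions
    # view_proj_mat: [F, 4, 4]       world-space -> screen-space
    # crop_offset:   [F, 2]          crop origin in pixel coordinates
    F, _, H, W, S = position.shape
    ones = np.ones((F, 1, H, W, S), dtype=position.dtype)
    hom = np.concatenate([position, ones], axis=1)            # homogeneous coords
    clip = np.einsum('fij,fjhws->fihws', view_proj_mat, hom)  # assumed column vectors
    ndc = clip[:, :2] / clip[:, 3:4]                          # perspective divide
    size = np.array([width, height]).reshape(1, 2, 1, 1, 1)
    pix = (ndc * 0.5 + 0.5) * size                            # assumed [-1, 1] NDC
    return pix - crop_offset.reshape(F, 2, 1, 1, 1)           # crop-local pixels

# Screen-space motion then falls out of two projections, reconstructing the
# previous frame's positions from the stored world-space motion:
#   prev = position - motion                                  # sign convention assumed
#   motion_2d = to_screen(position[1:], vp[1:], off[1:]) \
#             - to_screen(prev[1:], vp[:-1], off[:-1])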