User manual#
Loading datasets#
Simple approach#
noisebase.Noisebase
provides a convenient function for instantiating datasets:
Keep in mind this still uses Hydra under the hood but without any side effects. If you are already using Hydra globally (e.g. with hydra.main
), this won’t work. Instead, you can use the Noisebase Hydra configs directly, as shown below.
Data loader configuration and use#
Editable installs#
Noisebase datasets are large. You might want to keep them organized in a dedicated folder, separate from your denoiser implementations. A separate, editable install will also make adding your own datasets and managing configurations easier. You can simply clone this repository to replicate this:
git clone https://github.com/balintio/noisebase
Say you want to reproduce the denoising results of NPPD; your setup will look somewhat like this:
/ # Root directory of your denoising projects
noisebase/
... # Contents of this repo
data/ # Terabytes of downloaded training and testing data
nppd/
... # Contents of balintio/nppd
env/ # Conda environment for NPPD
outputs/ # Checkpoints and inference outputs for NPPD
conf/config.yaml # The Hydra configuration file shown below
After setting up the environments for your denoisers you can always add Noisebase as an editable package:
(nppd/env) ~/denoisers/nppd$ pip install -e ../noisebase
This should work in most environments. Just make sure to install the requirements of the denoiser BEFORE the Nosiebase package. (You might end up with the wrong version of Pytorch otherwise.)
Hydra#
Hydra configs for all datasets are available under noisebase/dataset_name
. Noisebase automatically adds them to your Hydra path, you only need to import noisebase
in the file where your hydra.main
is.
import noisebase
import hydra
@hydra.main()
def main(cfg):
loader = hydra.utils.instantiate(cfg['training_data'])
...
For example, NPPD configures its datasets like this:
defaults:
# Import the test and training datasets:
# I like to do this with default lists
# overriding the locations to something sensible
- noisebase/sampleset_v1@training_data
- noisebase/sampleset_test8_v1@test_data
# I prefer to disable Hydra logging
# If you like it, remove these
- override hydra/job_logging: stdout
- override hydra/hydra_logging: none
# Set composition order
- _self_
# Data loader configuration for NPPD
training_data:
samples: 8
batch_size: 8
buffers:
- normal
- motion
- depth
- diffuse
- color
- reference
framework: lightning
# Test data loaders usually use very similar arguments
# so you can just interpolate most things
test_data:
buffers: ${training_data.buffers}
samples: ${training_data.samples}
# For instantiating the Lightning NPPD model
model:
_target_: model.Model
# Disable Hydra working directory shenanigans
# Like logging, this is personal preference
hydra:
run:
dir: ''
output_subdir: null
Scripts#
nb-download#
Downloads a dataset.
# Example:
nb-download sampleset_v1
# Usage:
nb-download --data_path DATA_PATH dataset
The default data path (where the dataset is downloaded) is either a data directory within the current working directory if Noisebase was installed from PyPI or a data folder in the repository’s root if Noisebase was cloned from Git. This should work seamlessly in most cases.
If you decide to use a custom data path, you must pass it as the data_path
parameter to data loaders or the --data_path
argument to scripts.
Note
Downloading is deliberately limited to a single connection. Our data formats don’t require small files, so this does not waste bandwidth. If you get faster downloads by modifying the script, that will come at others’ expense. Please be patient and mindful of others.
nb-save-reference#
Exports reference images from test datasets for metrics computation. Our data formats don’t encode images as PNGs, so these need to be unpacked.
# Example:
nb-save-reference sampleset_test8_v1
# Usage:
nb-save-reference --data_path DATA_PATH dataset
nb-compute-metrics#
Computes metrics of the output of a method on a test dataset. You need to run inference and nb-save-reference
beforehand.
# Example:
nb-compute-metrics sampleset_test8_v1 outputs/large_8_spp
# Usage:
nb-save-reference --data_path DATA_PATH dataset outputs
nb-result-table#
Produces tables from metrics computed by nb-compute-metrics
.
# Example:
nb-result-table sampleset_test8_v1 outputs/large_8_spp,outputs/small_4_spp psnr,ssim,msssim,fvvdp,flip --sep=" | "
# Usage:
nb-result-table --digits DIGITS --sep SEP --data_path DATA_PATH dataset outputs metrics
Output folders and metrics can be comma-separated lists. Formatting can be controlled by setting the number of floating point digits and column separator.