Toolbox Infrastructure
This document explains how to run experiments with the mle-toolbox. The mle-toolbox allows you to run different types of experiments locally or on an SGE cluster. You have to provide three inputs:
- An experiment/meta configuration `.yaml` file.
- A job configuration `.json` file.
- A Python `.py` script that runs your training loop.
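To make the first two inputs concrete, here is a hypothetical sketch of what a meta configuration could look like. All keys, filenames, and values below are illustrative assumptions, not the toolbox's actual schema:

```yaml
# experiment_config.yaml -- hypothetical example, keys are assumptions
meta_job_args:
    job_fname: "train.py"          # Python script with the training loop
    job_config: "job_config.json"  # per-job configuration file
    experiment_dir: "experiments/" # where results and logs are stored
```

The `.json` job configuration would then typically hold the resource requests and the inputs passed on to the training function.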
The only thing you have to do is specify your desired experiment. The toolbox automatically detects whether you are starting an experiment with access to multiple compute nodes.
`train.py` takes three arguments: `-config`, `-seed`, `-exp_dir`. These cover the standard inputs to the training function (`model_config`, `train_config`, `log_config`) but can otherwise be generalised to your application.
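A minimal sketch of a compatible `train.py` entry point, assuming the three flags named above. The body of `main` is a hypothetical placeholder for your own training loop:

```python
# Minimal train.py sketch. The flag names (-config, -seed, -exp_dir)
# follow the text above; everything inside main() is a placeholder.
import argparse


def get_train_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("-config", type=str,
                        help="Path to the job configuration file")
    parser.add_argument("-seed", type=int,
                        help="Random seed for this job")
    parser.add_argument("-exp_dir", type=str,
                        help="Directory in which to store results/logs")
    return parser


def main(config_path, seed, exp_dir):
    # Hypothetical: load model_config/train_config/log_config from
    # config_path, seed your RNGs, run the training loop, and write
    # checkpoints/logs into exp_dir.
    pass


if __name__ == "__main__":
    args = get_train_parser().parse_args()
    main(args.config, args.seed, args.exp_dir)
```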
Jobs, Evals and Experiments
Throughout the toolbox we refer to different granularities of compute loads. It helps to be familiar with what these refer to (from lowest to highest level of specification):
- job: A single submission process on a resource (e.g. one seed for one configuration)
- eval: A single parameter configuration which can be executed/trained for multiple seeds (individual jobs!)
- experiment: An entire sequence of jobs to be executed (e.g. grid search with pre/post-processing)
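The hierarchy above can be illustrated with a small sketch: an experiment spawns one eval per parameter configuration, and each eval spawns one job per random seed. The grid values and the helper function are illustrative assumptions, not toolbox API:

```python
# Hypothetical illustration of the job/eval/experiment hierarchy.
from itertools import product


def enumerate_jobs(param_grid: dict, seeds: list):
    """Return one (configuration, seed) pair per job."""
    keys = list(param_grid)
    # One eval per point in the parameter grid.
    evals = [dict(zip(keys, values)) for values in product(*param_grid.values())]
    # One job per (eval, seed) combination.
    return [(cfg, seed) for cfg in evals for seed in seeds]


# 2 configurations (evals) x 3 seeds -> 6 jobs in the experiment.
jobs = enumerate_jobs({"lr": [0.1, 0.01], "batch_size": [32]}, seeds=[0, 1, 2])
```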
Protocol DB Logging
TBC
Rich Dashboard Monitoring
TBC
Experiment Report Summarization
TBC

