Experiment Configuration
Each experiment you want to execute via mle run
has to be specified via a .yaml
file. This file has to contain a set of basic ingredients such as meta arguments (which script to execute, etc.) - meta_job_args
, the experiment type-specific arguments and resource/single job-specific arguments - single_job_args
.
Basic Configuration
The meta_job_args
specify the project name, experiment type, directory and base job script/configuration. For a hyperparameter search experiment it should look as follows:
# Meta Arguments: What job? What train .py file? Base config? Where to store?
meta_job_args:
project_name: "<project_name>" # Project name for protocol db
experiment_type: "<experiment_type>" # ['hyperparameter-search', 'multiple-configs', 'single-config', 'population-based-training']
experiment_dir: "<experiment_dir>" # Directory to store results in
base_train_fname: "<train>.py" # Base .py/.sh job execution script
base_train_config: "<train_config>.yaml" # Base .json/.yaml job configuration
The job_spec_args
, on the other hand, specifies environment settings and the resources required per individual job. E.g.:
# Parameters specific to an individual job
single_job_args:
job_name: "<jname>" # Job name prefix used for qsub/sbatch
num_logical_cores: 2 # Number of logical cpu cores
num_gpus: 1 # Number of gpus required per job
env_name: "<env_name>" # Name of virtual environment to activat at startup
use_conda_venv: true # Whether to use a conda environment
time_per_job: "<dd:hh:mm>" # Time limits of a job
Given a fixed training configuration file and the Python training script, the only thing that has to be adapted for the different types of experiments are the experiment type-specific arguments. These include:
multi_config_args
: For multi-configuration & multi-seed experimentsparam_search_args
: For hyperparameter search experimentspbt_args
: For Population-Based Training
Multiple Configurations & Seeds
multi_config_args:
config_fnames: # List of configuration files to execute
- "config_1.json"
- "config_2.json"
num_seeds: 2 # Number of seeds to schedule
Hyperparameter Search
# Parameters specific to the hyperparameter search
param_search_args:
search_logging: # Logging-specific settings
reload_log: False # Load a previous hyper log
verbose_log: True # Print intermediate batch results
max_objective: False # Whether to max or min objective
aggregate_seeds: "p50" # Aggregation metric across seeds
problem_type: "final" # How to score configs ('final', 'best', 'mean')
eval_metrics: "test_loss" # Metric to look up for scoring
search_resources: # Resource scheduling settings
# num_search_batches: 4
# num_evals_per_batch: 4
num_total_evals: 4 # Number of total evaluations
max_running_jobs: 4 # Number of max jobs running at one time
num_seeds_per_eval: 1 # Number of seeds per evaluation
random_seeds: [2] # Explicitly fix seeds to run
search_config: # Search strategy-specific settings
search_type: "grid"
# search_schedule: "sync"
search_schedule: "async" # "sync" or "async" job/batch scheduling
search_params:
real:
lrate:
begin: 0.1
end: 0.5
bins: 2
integer:
batch_size:
begin: 1
end: 5
bins: 2
Population-Based Training
Note: Population-Based Training is still experimental and will most likely change. More updates to follow. Most likely the toolbox will only support a synchronous version.
# Parameters specific to the population-based training
pbt_args:
pbt_logging:
max_objective: False
eval_metric: "test_loss"
pbt_resources:
num_population_members: 10
num_total_update_steps: 2000
num_steps_until_ready: 500
num_steps_until_eval: 100
pbt_config:
pbt_params:
real:
l_rate:
begin: 1e-5
end: 1e-2
exploration:
strategy: "perturb"
selection:
strategy: "truncation"
Pre-/Post-Processing
You can also add pre_processing_args
and post_processing_args
if you would like to either launch a job before or after the main experiment. This can be useful if you need to prepare some data that is afterwards used by all main experiment jobs or if you want to run some analysis using the networks trained in the main experiment.
# Parameters for the pre-processing job
pre_processing_args:
processing_fname: "<run_preprocessing>.py"
processing_job_args:
num_logical_cores: 2
time_per_job: "00:01:00"
extra_cmd_line_input:
figures_dir: "experiments/data"
# Parameters for the post-processing job
post_processing_args:
processing_fname: "<run_postprocessing>.py"
processing_job_args:
num_logical_cores: 2
time_per_job: "00:01:00"
extra_cmd_line_input:
figures_dir: "experiments/figures"