Welcome to the MLE-Infrastructure 🔬

The MLE-Infrastructure provides a reproducible workflow for distributed Machine Learning experimentation (MLE) with minimal engineering overhead. The core consists of five packages:

  • mle-logging: Experiment logging with easy multi-seed and configuration aggregation.
  • mle-hyperopt: Hyperparameter Optimization with config export, refinement & reloading.
  • mle-monitor: Monitor cluster/cloud VM resource utilization & keep a protocol of experiments.
  • mle-scheduler: Schedule & monitor jobs on Slurm, GridEngine clusters & GCP VMs.
  • mle-toolbox: Glues everything together to manage & post-process experiments.

Note I: A template repository for an infrastructure-based project is available as mle-project. You can inspect your experiment stack in an interactive web UI, mle-laboratory.

Note II: mle-logging, mle-hyperopt, mle-monitor and mle-scheduler are standalone packages and can be used independently of the utilities provided by the mle-toolbox.