A Comprehensive Step-by-Step Guide to Running Katib for Hyperparameter Optimization Using Docker and Kubernetes (2024)

Machine learning (ML) workflows often involve complex models and the need to tune numerous hyperparameters to optimize performance. Katib, an open-source hyperparameter optimization tool, simplifies this process by automating the search for the best hyperparameters. With Docker and Kubernetes, deploying Katib for distributed hyperparameter optimization (HPO) becomes easier and more efficient.

In this blog, we’ll walk you through the steps for installing and running Katib locally using Docker and Kubernetes, making it easier for you to get started with hyperparameter optimization in your machine learning workflows.

What is Katib?

Katib is an open-source automated machine learning (AutoML) tool designed to optimize machine learning models by searching for the best hyperparameters. It integrates with Kubernetes and Kubeflow, allowing for scalable and efficient hyperparameter tuning, even in complex ML pipelines.

Katib can optimize hyperparameters using different algorithms like Random Search, Grid Search, and Bayesian Optimization, making it a versatile tool for various ML tasks.
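To build intuition for what random search does before running it at cluster scale, here is a tiny pure-Python sketch. Everything in it is illustrative: the search space mirrors the `lr` and `num-layers` parameters from the Katib example used later in this guide, and the scoring function is a made-up stand-in for a real training run.

```python
import random

# Toy version of what Katib automates: sample hyperparameters at random,
# score each trial, and keep the best one.
search_space = {
    "lr": (0.01, 0.03),        # learning-rate range
    "num_layers": (2, 5),      # number-of-layers range
}

def score(lr, num_layers):
    # Hypothetical objective: pretend accuracy peaks at lr=0.02, 4 layers.
    return 1.0 - abs(lr - 0.02) * 10 - abs(num_layers - 4) * 0.05

random.seed(0)
best = None
for _ in range(12):  # analogous to Katib's maxTrialCount
    lr = random.uniform(*search_space["lr"])
    layers = random.randint(*search_space["num_layers"])
    trial = (score(lr, layers), {"lr": round(lr, 4), "num_layers": layers})
    if best is None or trial[0] > best[0]:
        best = trial

print(best[1])  # the best hyperparameters found across the 12 trials
```

Katib does exactly this loop, except each "trial" is a containerized training job on Kubernetes and the metrics are collected from the job's output.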

Prerequisites for Installing Katib Locally

Before installing and running Katib on Docker and Kubernetes, make sure your system meets the following prerequisites:

  1. Install Python (>= 3.7): Python is required for the Katib Python SDK used to create and run experiments.
  2. Install Docker Desktop: Docker is required to run containerized applications; you can download Docker Desktop from the Docker website.
  3. Install Minikube: Minikube runs a local Kubernetes cluster, which is how we will run Katib in this guide; follow the official Minikube installation guide.
  4. Kubernetes (>= 1.27): make sure your cluster runs a compatible Kubernetes version so that Katib works smoothly.
  5. Install kubectl: kubectl is the command-line tool for interacting with Kubernetes clusters; install it by following the official Kubernetes documentation.

Step 1: Install Katib Control Plane

The first step is to install the Katib control plane, which will allow you to run hyperparameter optimization experiments. You can install the stable release or the latest changes from the GitHub repository.

To install the stable version of Katib:

```bash
kubectl apply -k "github.com/kubeflow/katib.git/manifests/v1beta1/installs/katib-standalone?ref=v0.16.0"
```

Alternatively, you can install the latest version:

```bash
kubectl apply -k "github.com/kubeflow/katib.git/manifests/v1beta1/installs/katib-standalone?ref=master"
```

Once installed, verify that all the Katib control plane components are running:

```bash
kubectl get pods -n kubeflow
```

You should see the following components running:

```
NAME                                READY   STATUS      RESTARTS   AGE
katib-controller-566595bdd8-8w7sx   1/1     Running     0          82s
katib-db-manager-57cd769cdb-vt7zs   1/1     Running     0          82s
katib-mysql-7894994f88-djp7m        1/1     Running     0          81s
katib-ui-5767cfccdc-v9fcs           1/1     Running     0          80s
```

Step 2: Install the Katib Python SDK

To interact with Katib and create experiments using Python, you need to install the Katib Python SDK. Install it via pip:

```bash
pip install -U kubeflow-katib
```

Alternatively, you can install the SDK using a specific GitHub commit:

```bash
pip install git+https://github.com/kubeflow/katib.git@ea46a7f2b73b2d316b6b7619f99eb440ede1909b#subdirectory=sdk/python/v1beta1
```
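With the SDK installed, experiments can also be launched straight from Python instead of YAML. The sketch below uses the SDK's `KatibClient.tune()` helper; it assumes a running cluster with the Katib control plane installed, and the experiment name, parameter ranges, and trial counts are illustrative. The objective function runs inside the trial container, and Katib's default metrics collector parses `name=value` pairs from its stdout.

```python
def objective(parameters):
    # Runs inside each trial's container. Here the "training" is a random
    # stand-in; a real objective would train a model with these parameters.
    import random
    lr = float(parameters["lr"])
    num_layers = int(parameters["num_layers"])
    accuracy = random.uniform(0.5, 1.0)  # placeholder for real training
    # Katib's stdout metrics collector picks this line up as the metric.
    print(f"accuracy={accuracy}")

def main():
    # Imported here so the sketch can be read without the SDK installed.
    import kubeflow.katib as katib

    client = katib.KatibClient(namespace="kubeflow-user-example-com")
    client.tune(
        name="sdk-random-search",          # illustrative experiment name
        objective=objective,
        parameters={
            "lr": katib.search.double(min=0.01, max=0.03),
            "num_layers": katib.search.int(min=2, max=5),
        },
        objective_metric_name="accuracy",
        algorithm_name="random",
        max_trial_count=12,
        parallel_trial_count=3,
    )

# main()  # uncomment on a machine with kubeconfig access to the cluster
```

The YAML-based workflow in the next steps is equivalent; the SDK route is convenient when your experiment definition lives alongside your training code.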

Step 3: Set Up the Katib UI Service

To easily monitor and manage your experiments, Katib provides a web-based UI. To access it, set up port-forwarding for the Katib UI service:

```bash
kubectl port-forward svc/katib-ui -n kubeflow 8080:80
```

Once the service is running, you can access the UI at http://localhost:8080/katib/.

Step 4: Setting Up and Running a Katib Experiment

Now that the Katib control plane and UI are set up, you can create and run a Katib experiment to optimize hyperparameters for a model. Here’s how you can run an experiment using the Random Search algorithm to tune hyperparameters for an MXNet neural network.

  1. Download the random search example YAML file:

```bash
curl https://raw.githubusercontent.com/kubeflow/katib/master/examples/v1beta1/hp-tuning/random.yaml --output random.yaml
```

  2. Edit the YAML file to use your Kubeflow user profile namespace (e.g., kubeflow-user-example-com):

```yaml
namespace: kubeflow-user-example-com
```

  3. Deploy the experiment:

```bash
kubectl apply -f random.yaml
```

This experiment uses random search to generate hyperparameters like learning rate (lr), number of layers (num-layers), and optimizer type. You can monitor the experiment’s progress using the command:

```bash
kubectl -n kubeflow-user-example-com get experiment random -o yaml
```
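For reference, the core of the random-search Experiment spec looks roughly like this. This is an abridged sketch of the downloaded `random.yaml`; exact values and fields may differ in the current version in the repository, so treat the file itself as authoritative:

```yaml
apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
  namespace: kubeflow-user-example-com
  name: random
spec:
  algorithm:
    algorithmName: random
  objective:
    type: maximize
    objectiveMetricName: Validation-accuracy
  parallelTrialCount: 3
  maxTrialCount: 12
  maxFailedTrialCount: 3
  parameters:
    - name: lr
      parameterType: double
      feasibleSpace:
        min: "0.01"
        max: "0.03"
    - name: num-layers
      parameterType: int
      feasibleSpace:
        min: "2"
        max: "5"
    - name: optimizer
      parameterType: categorical
      feasibleSpace:
        list:
          - sgd
          - adam
```

The `feasibleSpace` blocks define the ranges the random-search algorithm samples from, and `maxTrialCount` caps how many trials the experiment runs.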

Step 5: Viewing Experiment Results

Once the experiment completes, you can view the results in the Katib UI. It will display graphs showing validation and training accuracy for various combinations of hyperparameters (learning rate, number of layers, and optimizer).

Each trial within the experiment will have its own set of hyperparameters and metrics. You can view detailed metrics for each trial by clicking on the trial name in the UI.

Conclusion

Running Katib locally on Docker and Kubernetes makes it easier than ever to perform hyperparameter optimization (HPO) for machine learning models. With the powerful Katib control plane, Python SDK, and the Katib UI, you can easily set up and monitor experiments to find the best hyperparameters for your models.

By using tools like Docker and Kubernetes, you can scale Katib to run distributed experiments in the cloud, maximizing computational resources for efficient hyperparameter tuning. Whether you’re tuning deep learning models or traditional ML algorithms, Katib provides a flexible, powerful framework for automating HPO and improving model performance.
