Track Your Experiments With ClearML

Wasilios Goutas
Published in The Startup
9 min read · Jan 17, 2021

Do you remember the history of your ML experiments?

Would you like to be able to do so with only 2 lines of code?

Put simply, the work of a data scientist includes implementing, running, analyzing, comparing, and optimizing tasks that process large amounts of data in order to produce models usable for classification, clustering, or prediction.

Doing this generates output in the form of logs, graphics, metrics, and so on, in different formats. The code itself changes continuously as the processing pipeline improves and different ideas are implemented. Every change in the code results in a new experiment with new results, which will be better or worse than before. If you are continuously improving your process along a single path, without branching off to try, for example, different preprocessing steps or modelling frameworks, then it might be sufficient to keep your changes under source control. Otherwise, you might want to compare experiments, rerun an older experiment with changed parameters, retrieve all output generated by any past experiment, and more. If your experiments are implemented in Python, read on.

In this article I want to introduce you to the ClearML framework, formerly known as Allegro Trains. It takes care of storing everything related to your experiments once you add just 2 lines of code.

ClearML-server

The ClearML-server is contacted when you run a Python script that has been extended by those 2 lines of code. This server can be the service hosted by the Allegro team, a local installation (Docker image), or a remote installation (Kubernetes). In this article I will show the first option; upcoming articles are planned to cover the latter two, including the Kubernetes deployment of the ClearML-server.

Use the free service

To enable the client you are going to install and configure later, you need to register an account on a ClearML-server. Do so at https://app.community.clear.ml/ and open your profile to generate an access key, which allows the ClearML-client to publish your experiments to this server.

my profile on the ClearML-server

Create new credentials and copy them to the clipboard, as you will need them on your local machine when configuring the ClearML-client.

generated credentials needed by the ClearML-client

Install the ClearML-client

On Python 2 I ran into issues with the following commands, so I recommend using Python 3.

As usual in Python, a single command gets you what you need. To install the ClearML-client, run

pip install clearml

and run the configuration step with

clearml-init

Paste the credentials you copied earlier from the ClearML-server and confirm the defaults for the remaining settings.

configure the ClearML-client

This writes the clearml.conf file to your home directory, which enables the client to store everything related to your experiments in your workspace on the server.

Making use of it

As mentioned, you only need to add 2 lines to your Python code: the import statement and the initialization of your client instance. To demonstrate this, I added just a print statement and ran the script as shown here

first step with ClearML
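Since the original script is only available as a screenshot, here is a minimal sketch of what such a first step might look like. The project name, task name, and printed message are assumptions made for illustration, and the two ClearML lines are wrapped in a function purely so the sketch can be loaded without contacting a server; in a real script they would simply sit at the top of the file.

```python
def run_first_experiment():
    # Line 1 of 2: import the ClearML client. Normally this import is placed
    # at the top of the script; it lives inside the function only for this sketch.
    from clearml import Task

    # Line 2 of 2: create the task on the configured ClearML-server.
    # Both names below are hypothetical and can be chosen freely.
    task = Task.init(project_name="clearml-intro", task_name="first step")

    # The "experiment" itself: a single print statement.
    print("Hello ClearML")
    return task
```

Calling run_first_experiment() on a machine that has been set up with clearml-init should register a new task in your workspace.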

Running this script prints not only the output of the script's print statement but also the ID of the task generated on the ClearML server and a link to the experiment. You do not need to remember this link. Just open the dashboard on the ClearML server and you will notice that a new project has been added and that the topmost experiment is the one you started last.

dashboard of my account on the freely available server

For some reason the experiment is marked as ‘Running’ even though it has finished. Besides your own projects, your dashboard shows an additional project called ‘ClearML Examples’, which demonstrates different features provided by ClearML.

ClearML provided the server with everything needed to re-run the example by, in this case, copying the complete script to the server. If I had used git as a source repository, the path to the repository, the commit ID, and the uncommitted changes would have been stored.

Select an experiment to review all of its attributes.

uncommitted code is shown here

Properties of experiments

Having a history of your experiments is nice, but ClearML supports a lot more. I will show you shortly how to take an experiment stored on the server and re-run it. You might ask what re-running the same code without modifying any parameter is good for, and you would be right. The point is that you can re-execute an experiment with changed parameters, as long as you have defined them. Out of the box, ClearML lets you change settings that your code handles as command-line parameters parsed by the argparse library. Arguments known to the parser are shown in the configuration tab and can be adapted for a re-execution. I created an example to demonstrate this, which supports the argument

--symbol

parser.add_argument('--symbol', help='symbol used for regression', default='AAPL')

list of arguments that can be adapted when re-executing the experiment
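To put the single add_argument line into context, the following sketch shows a complete parser around it. Only the --symbol argument and its default come from the example above; the parser description and the 'MSFT' override are made up for illustration. In the tracked script, the ClearML import and task initialization described earlier would accompany this parser.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Create the command-line parser for the experiment script."""
    # The description text is an illustrative assumption.
    parser = argparse.ArgumentParser(description="regression experiment")
    # The argument from the article; it falls back to 'AAPL' when omitted.
    parser.add_argument('--symbol', help='symbol used for regression', default='AAPL')
    return parser


# Explicit argument lists keep the sketch reproducible; a real script would
# call parse_args() without arguments so that sys.argv is read.
default_args = build_parser().parse_args([])
override_args = build_parser().parse_args(['--symbol', 'MSFT'])  # hypothetical override

print(default_args.symbol)   # AAPL
print(override_args.symbol)  # MSFT
```

Changing the value in the configuration tab of a cloned experiment plays the same role as passing a different --symbol on the command line.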

ClearML also supports dictionaries as an additional way to define experiment parameters. To use one, you need to make it known to ClearML by calling a framework function, as in the following example code

parameters = task.connect_configuration(
    configuration=parameters,
    name='regressor selection',
    description='set which regressor to run')

Such a dictionary appears in the configuration section on the server as well.

dictionary parameters of the experiment
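As a rough sketch of how such a dictionary could drive the script, the snippet below wraps the connect_configuration call shown above in a helper and then selects the regressors that are switched on. The dictionary keys and both helper functions are hypothetical, since the actual entries of my dictionary are not listed here; the task object is passed in so that the sketch does not need a server connection by itself.

```python
from typing import Any, Dict, List

# Hypothetical switches, one per regressor type (names are assumptions).
DEFAULT_PARAMETERS: Dict[str, bool] = {"linear": True, "ridge": False, "forest": False}


def connect_parameters(task: Any, parameters: Dict[str, bool]) -> Dict[str, bool]:
    """Make the dictionary known to ClearML and return the values to use."""
    return task.connect_configuration(
        configuration=parameters,
        name='regressor selection',
        description='set which regressor to run')


def enabled_regressors(parameters: Dict[str, bool]) -> List[str]:
    """Return the names of all regressors whose switch is set to True."""
    return [name for name, enabled in parameters.items() if enabled]
```

With a real task, connect_parameters(task, dict(DEFAULT_PARAMETERS)) would be called once after the task has been initialized, and enabled_regressors would then decide which models to build.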

Dashboard

Your dashboard on the ClearML-server gives you an overview of the projects you are working on and shows a list of the most recently run experiments. The ‘ClearML Examples’ project includes the examples you will find in the GitHub repository of ClearML.

author's dashboard

Cloning & queuing

As the setup provides a lot of example experiments, you can jump directly into one of them and gain some experience with the possibilities of the ClearML-server. You can also get a copy of the code I’m referring to from my GitHub repository.

From my project I open the ‘finance’ task and choose ‘Clone’ from the menu on the upper right side.

cloning an experiment

A window pops up where you can edit the name of the experiment and add a description. Once the clone has been created, it becomes part of the experiment list in the state ‘draft’, and you can edit the configuration parameters I showed you earlier. I’m going to set all entries of my configuration dictionary to ‘True’, which should result in the creation of 3 types of regressors and the visualization of their prediction results.

adapted configuration parameters

Now it is time to add the experiment to a worker queue (I will come to what this is shortly) so it can be processed by an available agent that checks this queue periodically. Enqueuing is available in the same menu from which you called the clone function.

enqueue an experiment

You will be asked which queue you want this task to be added to. At the beginning there is only the ‘default’ queue, as we haven’t created any other queues, so I used it.

The state of the experiment switches to ‘pending’ and will stay there until you add an agent to handle the ‘default’ queue.

Agent registration & queues

To run an experiment you need a CPU. The ClearML-server does not provide CPU resources to run the Python code of your cloned experiment, just as it didn’t for the original experiment you ran on your local machine.

This is where the “Agent” comes into play. The agent needs to be installed

pip install clearml-agent

and configured on the machine you want it to run on. The agent needs a clearml.conf file in your home directory, which contains configuration sections different from those used during development.

In case you are going to use a different user account from the one you used during development, you just need to generate additional credentials on the ClearML-server and use them when running the command

clearml-agent init

You will also be asked to enter your git credentials so the agent is able to clone the repositories it is going to run.

The clearml.conf file can be used on any machine and under any user account you want to act as an agent. The more machines you add, the more tasks (not only cloned ones) can be executed in parallel.

In case you want to run the agent from an account that already has a clearml.conf file, you will need to edit that configuration and add an agent section to it, as described in the Allegro.ai documentation.

Finally, you need to run the agent tool and pass it parameters such as the queue it should handle, in my case the ‘default’ queue. The agent can also be called without listening on a specific queue, to run a single experiment from the server instead. The supported parameters can be found on the documentation page of Allegro.ai. This is the command I use to dequeue from the ‘default’ queue, which also enables all GPUs of my system even though they are not used in this example:

clearml-agent daemon --queue default --gpus all

running agent dequeuing a task

A few seconds after the agent starts, the enqueued task is processed. The repository is fetched, the agent tries to satisfy all dependencies by installing the required Python packages into a virtual environment, and the script is executed.

As Plotly tries to open a browser, which fails for the user I’m using to run the agent, the experiment failed.

failing dequeued experiment due to unsatisfied Plotly dependency

After I reset the failed experiment, adapted the script so that it does not show the plot unless the task is running locally, ran it, and cloned it again, the task finally got processed :)

cloned experiment re-executed by an agent
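The adaptation of the script is not shown in detail here, so the following is only a sketch of how such a guard might look. The helper name show_plot_if_local is hypothetical, and the sketch assumes that the local/remote check is done with ClearML's Task.running_locally() and that the figure object offers a show() method, as Plotly figures do.

```python
def show_plot_if_local(fig, running_locally: bool) -> bool:
    """Open the figure only when the script runs on the developer's machine.

    Returns True if the figure was shown, False if showing was skipped.
    """
    if running_locally:
        # On a local machine a browser is available, so showing is safe.
        fig.show()
        return True
    # Under an agent there may be no browser, so the call is skipped.
    return False


# Assumed usage inside the experiment script (requires clearml and a figure):
#   from clearml import Task
#   show_plot_if_local(fig, Task.running_locally())
```

Passing the flag in as an argument keeps the helper independent of ClearML, so the same script still works when tracking is switched off.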

Summary

In this story I hope I was able to give you a short introduction to ClearML.

I have only touched the tip of the iceberg; there is a lot more to discover, such as comparing experiments.

Thank you for reading.

My name is Wasilios Goutas. I hold a Dipl. Ing. techn. Informatik (FH) degree, which I earned in my home town of Hamburg, Germany. I’m an experienced software project manager in the automotive and semiconductor industries. In my private life I use AI libraries like Keras & TensorFlow to learn more about the technology that will play a significant role in tomorrow’s world.

Feel free to contact me on LinkedIn.
