Track your experiments with ClearML — in Docker

Do you remember the history of your ML experiments?

Would you like to be able to do so with only 2 lines of code?

I already asked this in my earlier post. Now the question is:

Do you want your experiments to be hosted on your own system?

Photo by Antoine Petitteville on Unsplash

In my last post I showed you how to use a freely accessible ClearML-server, provided by the company that maintains ClearML, to host your Machine Learning projects and experiments.

Once you start using this platform to manage your ML/DL work, you will notice that it is more than just an experiment store that lets you re-execute and compare results without any effort. ClearML can do much more, such as the ‘Orchestration of Pipelines’, where your script might enqueue different tasks that are processed through queues by agents running somewhere in the cloud or locally, scaling horizontally with the number of available agents, or letting you use the MongoDB instance that is part of the platform, …

Sooner or later you might decide that it would be a good idea to self-host a ClearML-server. In my opinion, the most important reason is that the free service on clear.ml might delete your experiments once your account has reached its storage limit.

Also, if you are not working on Open Source ML/DL projects, you might feel safer having your intellectual property hosted on systems controlled by you or your company.

This article shows you how to set up the ClearML-server on a Linux box.

Docker

Since I’m going to show how to run the ClearML-server in Docker, you have to install Docker and docker-compose first.

I assume that the required binaries are already on your system and that calls to “docker --version” and “docker-compose --version” print their version information. If not, and you plan to run the ClearML-server in Docker, I think you know what you have to do first ;) If you need help installing the tools on an Ubuntu machine, you might try the script I provide in the GitHub repository for this series.
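If you prefer to script this check, here is a minimal bash sketch. The function name check_tools is my own, hypothetical choice; it simply runs each tool’s --version, just as you would by hand, and reports tools that are not on the PATH.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): print the version of each required tool,
# or "not found" if it is not on the PATH. Returns 1 if any tool is missing.
check_tools() {
  local tool status=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      # Same as calling "<tool> --version" manually; keep only the first line.
      echo "${tool}: $("$tool" --version 2>&1 | head -n 1)"
    else
      echo "${tool}: not found"
      status=1
    fi
  done
  return "$status"
}
```

Calling check_tools docker docker-compose prints one line per tool and returns a non-zero status if either of them is missing.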

compose it

As the ClearML-server uses ports 8080, 8081 and 8008, you should check whether they are already in use and free them if necessary. You can check this with, e.g.:

sudo lsof -Pn -i4 | grep -E ":8080|:8081|:8008"

If any of these ports is in use, you need to release it; otherwise the ClearML-server will fail to start.
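As an alternative to lsof, which needs sudo to see other users’ processes, the following bash sketch tries to open a TCP connection to each port on 127.0.0.1 through bash’s /dev/tcp pseudo-device; a successful connection means something is listening. It is a shortcut under assumptions: it only detects listeners reachable on 127.0.0.1, it requires bash, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report whether the ports used by the ClearML-server are free.
# Relies on bash's /dev/tcp, so it only sees listeners on 127.0.0.1.

# Returns 0 if a process accepts connections on 127.0.0.1:$1, 1 otherwise.
port_in_use() {
  local port="$1"
  # The subshell exits non-zero if the connection is refused.
  (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null
}

# Prints one status line per port; returns 1 if at least one port is busy.
check_clearml_ports() {
  local port busy=0
  for port in "$@"; do
    if port_in_use "$port"; then
      echo "port ${port}: in use"
      busy=1
    else
      echo "port ${port}: free"
    fi
  done
  return "$busy"
}
```

For the server’s ports, call check_clearml_ports 8080 8081 8008.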

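Next, you need to prepare the Linux environment: Elasticsearch, one of the server’s components, requires vm.max_map_count to be at least 262144, and the server keeps its data, logs and configuration in directories below /opt/clearml. Be aware that the line “sudo rm -R /opt/clearml/” deletes any existing installation, including your stored experiments, so skip it if you already have data there.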

echo "vm.max_map_count=262144" > /tmp/99-clearml.conf
sudo mv /tmp/99-clearml.conf /etc/sysctl.d/99-clearml.conf
sudo sysctl -w vm.max_map_count=262144
sudo service docker restart

sudo rm -R /opt/clearml/
sudo mkdir -p /opt/clearml/data/elastic_7
sudo mkdir -p /opt/clearml/data/mongo/db
sudo mkdir -p /opt/clearml/data/mongo/configdb
sudo mkdir -p /opt/clearml/data/redis
sudo mkdir -p /opt/clearml/logs
sudo mkdir -p /opt/clearml/config
sudo mkdir -p /opt/clearml/data/fileserver

sudo chown -R 1000:1000 /opt/clearml
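The mkdir calls above can also be written as a loop, and the kernel setting can be verified afterwards. In this sketch, create_clearml_dirs and max_map_count_ok are hypothetical helper names; the base directory is passed as an argument so the layout can be tried in a scratch directory first, and max_map_count_ok reads /proc/sys/vm/max_map_count (or a file you pass) and succeeds if the value is at least 262144.

```shell
#!/usr/bin/env bash
# Sketch: the directory set-up from above as a loop, plus a check of the
# vm.max_map_count value. Both helper names are hypothetical.

# Same sub-directories as the individual mkdir calls above.
CLEARML_SUBDIRS=(
  data/elastic_7
  data/mongo/db
  data/mongo/configdb
  data/redis
  data/fileserver
  logs
  config
)

# Creates every sub-directory below the given base directory.
create_clearml_dirs() {
  local base="$1" dir
  for dir in "${CLEARML_SUBDIRS[@]}"; do
    mkdir -p "${base}/${dir}" || return 1
  done
}

# Succeeds if the value in the given file (default: the live kernel setting)
# is at least 262144, the value set via sysctl above.
max_map_count_ok() {
  local file="${1:-/proc/sys/vm/max_map_count}"
  local value
  value="$(cat "$file")" || return 1
  [ "$value" -ge 262144 ]
}
```

Run as root, create_clearml_dirs /opt/clearml followed by the chown shown above gives the same result as the individual mkdir calls.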

ClearML is an open source product hosted on GitHub. The ClearML-server lives in its own repository, which contains the docker-compose.yml file we need. Clone the repository or download just this file, as you will edit and use it in the next steps.

git clone https://github.com/allegroai/clearml-server.git
#or
curl -O https://raw.githubusercontent.com/allegroai/clearml-server/master/docker/docker-compose.yml

The docker-compose.yml, which you will find in the sub-folder ‘docker’ of the cloned repository or wherever you downloaded it to, contains everything needed to instantiate the ClearML-server and all its dependencies, such as MongoDB, Elasticsearch, Redis, a web server and others. Copy it to /opt/clearml/, as the following commands expect it there.
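To see which environment variables the downloaded compose file expects, and therefore which ones the next step has to provide, a simple grep is enough. This is a rough sketch: compose_env_vars is a hypothetical helper, and it only finds references written as ${VAR} or ${VAR:-default}, not bare $VAR. The official way to inspect the final result is docker-compose -f /opt/clearml/docker-compose.yml config, which prints the file with all variables substituted.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): list the variable names a compose file
# references as ${VAR} or ${VAR:-default}, sorted and de-duplicated.
compose_env_vars() {
  local file="${1:-/opt/clearml/docker-compose.yml}"
  # -o prints only the matched "${NAME" part; sed strips the leading "${".
  grep -oE '[$][{][A-Za-z_][A-Za-z0-9_]*' "$file" | sed 's/^[$][{]//' | sort -u
}
```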

Before you proceed, set the following environment variables used by ClearML. Replace the Git values with your own credentials to grant the server access to the repositories you use. Note that these exports only apply to the current shell session, so start docker-compose from the same shell.

export CLEARML_HOST_IP=127.0.0.1
export CLEARML_AGENT_GIT_USER=<git_username_here>
export CLEARML_AGENT_GIT_PASS=<git_password_here>

If the host IP is not set, requests will be sent to the public ClearML-server. If the Git credentials are missing or incorrect, your experiments will not run. docker-compose prints a warning in your shell for every variable that is not set.

warnings of unset variables
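To catch missing values before docker-compose does, you could check the variables yourself. A minimal bash sketch, assuming the variable names from above; check_clearml_env is a hypothetical helper that uses bash’s indirect expansion ${!name} to look up each variable by its name.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): print every variable given by name that is
# unset or empty, and return 1 if at least one of them is missing.
check_clearml_env() {
  local var missing=0
  for var in "$@"; do
    # ${!var:-} expands to the value of the variable whose name is in $var.
    if [ -z "${!var:-}" ]; then
      echo "missing: ${var}"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_clearml_env CLEARML_HOST_IP CLEARML_AGENT_GIT_USER CLEARML_AGENT_GIT_PASS && docker-compose -f /opt/clearml/docker-compose.yml up -d starts the server only if all three variables are set.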

Now it is time to start the server:

docker-compose -f /opt/clearml/docker-compose.yml up -d

Open a browser and point it to http://localhost:8080.
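The containers need a little while before the web UI answers. If you script the start-up, a small polling loop saves you from refreshing the browser. This sketch assumes curl is installed; wait_for_http is a hypothetical helper name, and the timeout only counts the sleep intervals, so it is approximate.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): poll a URL until it answers with a non-error
# HTTP status, giving up after roughly the given number of seconds (default 120).
wait_for_http() {
  local url="$1" timeout="${2:-120}" waited=0
  # -f makes curl fail on HTTP errors (status >= 400); -s keeps it quiet.
  until curl -fs --max-time 5 -o /dev/null "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for ${url}" >&2
      return 1
    fi
    sleep 2
    waited=$((waited + 2))
  done
  echo "${url} is up"
}
```

For example: docker-compose -f /opt/clearml/docker-compose.yml up -d && wait_for_http http://localhost:8080 180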

Done :) Quite easy isn’t it?

To have your experiments handled by your local ClearML-server, you need to create credentials and configure your user account to use them. You will find how to do this in the first article of my ClearML series.

To shut down your instance, call:

docker-compose -f /opt/clearml/docker-compose.yml down

agents

How to configure and run a ClearML agent is covered in the first article of this series. Here I will show you how to run an agent inside the Docker composition and how to start one manually.

starting agents within the composition

The current ClearML-server set-up already contains a ‘services’ agent, which I assume is used internally by ClearML. It is configured in its own section of the docker-compose.yml file we already use.

I used this existing service configuration as a template to add an agent that handles the ‘default’ queue.

In the copy, I adapted the service name, the image used and the worker ID to reflect that this agent handles the default queue.

The Docker image I use is also provided by the AllegroAI team; the Dockerfile used to build it is part of the clearml-agent GitHub repository.

Running docker-compose with a yml file adapted like this automatically creates an agent instance listening on the ‘default’ queue, without the need to start one manually or even to set up credentials :) (I expect this is because it uses the system credentials the server provides.)

starting agents manually

As I have two virtual private servers (VPSs) hosted by a service provider, I want them to take part in my ClearML set-up, so I installed the clearml-agent on them and configured them with the credentials I created on my ClearML-server instance. Since my ClearML-server runs in my home network, I configured my router to forward requests from outside on ports 8080, 8081 and 8008 to the local PC running the dockerized ClearML-server. As my IP address is dynamic, I need a DNS entry that is updated automatically on a daily basis so that the clearml-agent can reach my installation. Fortunately, my router (FRITZ!Box) can be reached through a DNS service of its manufacturer; otherwise I would need to open an account with a Dynamic DNS provider like no-ip.com.
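Instead of editing the agent’s configuration file, the server endpoints can, to my knowledge, also be supplied through ClearML’s environment variables CLEARML_API_HOST, CLEARML_WEB_HOST and CLEARML_FILES_HOST, with CLEARML_API_ACCESS_KEY and CLEARML_API_SECRET_KEY for the credentials. A minimal sketch, assuming the usual ClearML port roles (8080 web UI, 8008 API server, 8081 file server); the helper name and the host name myhome.example.net are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): export the ClearML endpoint variables for a
# self-hosted server reachable under the given (dynamic DNS) host name.
# Port roles assumed: 8080 web UI, 8008 API server, 8081 file server.
clearml_remote_env() {
  local host="$1"
  export CLEARML_WEB_HOST="http://${host}:8080"
  export CLEARML_API_HOST="http://${host}:8008"
  export CLEARML_FILES_HOST="http://${host}:8081"
}
```

For example, clearml_remote_env myhome.example.net, followed by exporting CLEARML_API_ACCESS_KEY and CLEARML_API_SECRET_KEY with the credentials created in the web UI, should let the clearml-agent on a VPS reach the home server through the forwarded ports.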

Once the VPSs are configured with the credentials of my server, I run the following command on each of them:

clearml-agent daemon --cpu-only --queue cpu_work_queue --create-queue -d

This lets me enqueue experiments into the queue named ‘cpu_work_queue’, which --create-queue creates if it does not exist yet. As the VPSs do not have a CUDA-enabled GPU, I added the parameter ‘--cpu-only’; otherwise I would add ‘--gpus all’ and maybe also a Docker container to process the queued work. For all configuration options, check the official documentation and/or run the agent with --help ;)

multiple workers serving different queues
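The choice between a CPU-only and a GPU agent described above can be wrapped in a small helper when you start agents on several machines. This is a sketch, not part of clearml-agent: start_agent and the DRY_RUN switch (print the command instead of running it) are hypothetical, and the flags are exactly those mentioned in the text.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): start a detached clearml-agent for one queue.
# With a second argument (e.g. "all") the agent gets --gpus <value>,
# otherwise --cpu-only. Set DRY_RUN=1 to print the command instead.
start_agent() {
  local queue="$1" gpus="${2:-}"
  local cmd=(clearml-agent daemon --queue "$queue" --create-queue -d)
  if [ -n "$gpus" ]; then
    cmd+=(--gpus "$gpus")
  else
    cmd+=(--cpu-only)
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

On each VPS, start_agent cpu_work_queue runs the same command as shown above; DRY_RUN=1 start_agent cpu_work_queue lets you inspect it first.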

getting support

If you need support, there is, besides commercial support, also free support available on the ClearML Slack channel, where I had good experiences with the very responsive team.

Thank you guys for answering all my questions :)

Summary

In this story, I hope I was able to show you how to get ClearML running in a Docker environment, how to run agents within the same Docker composition and how to start agents manually.

Thank you for reading.

My name is Wasilios Goutas. I’m a Dipl. Ing. techn. Informatik (FH), having studied in my home town of Hamburg, Germany, and an experienced software project manager in the automotive and semiconductor industries. In my private life I use AI libraries like Keras & TensorFlow to learn more about a technology that will play a significant role in tomorrow’s world.

Feel free to contact me on LinkedIn.

Dipl. Ing. Technische Informatik, SW Project Manager, Data Scientist
