Anyone who’s ever set up a data science work environment knows one thing to be true: it’s a painful process. If you’ve ever set up TensorFlow or CNTK with GPU support locally, you know exactly what I mean—the installation instructions alone are overwhelming, and there are countless combinations of required components and versions. Worst of all, mistakes typically won’t rear their ugly heads until after you’ve finished the last step of the install.
Now, let’s assume you overcome these installation hurdles and get your work environment set up correctly: you’ll likely find that sharing your work is still a hassle, because anyone who wants to run your code first needs to replicate your exact working environment on their own machine.
However, thanks to an underutilized free tool called Docker, the days of tedious environment setups are coming to an end. From installation to sharing work, Docker makes a data scientist’s life much easier, so they can focus on doing what they do best—data science. Here’s how:
At the highest level, Docker allows you to package up not only your code, but also the environment used to run it. It does this via a hierarchy of elements: a Dockerfile is a plain-text recipe describing an environment; building a Dockerfile produces an image, a frozen snapshot of that environment; and running an image produces a container, the live instance you actually work inside.
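To make that concrete, here’s a minimal Dockerfile sketch. The base image (continuumio/anaconda3) is a real published image, but the copied file and script name are purely illustrative:

```dockerfile
# Start from a published base image with Anaconda preinstalled
FROM continuumio/anaconda3

# Copy your code into the image (paths here are hypothetical)
COPY . /app
WORKDIR /app

# Default command a container runs when started from this image
CMD ["python", "analysis.py"]
```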
Whether you’re using a base install of Anaconda or building your own data science super machine from Ubuntu, Docker is the tool you need for a quick, easy start. Getting your environment set up and primed for data science can be done with two simple lines of code:
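Those two lines might look something like this—the image name is just one example (a real Jupyter data science image on Docker Hub); any published image works the same way:

```shell
# Download the image from Docker Hub to your machine
docker pull jupyter/datascience-notebook

# Start a container from it, exposing Jupyter's port to your browser
docker run -p 8888:8888 jupyter/datascience-notebook
```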
The first line pulls a published Docker image of whichever environment you want to use from Docker Hub to your local machine. The second line runs that image, producing a lightweight, virtual-machine-like container to conduct your experiments in. Once the image is running as a container, you can open your favorite IDE inside it and get to work immediately. You can also build your own images, but keep this in mind: for most things you want to do, there’s likely a published image out there already—covering everything from TensorFlow to Tesseract OCR.
Data science is rarely done (well) by a lone genius. To solve tough problems, it often takes a team of people contributing ideas, insights, and ingenuity to produce an elegant solution. And with Docker, this collaboration is easier than ever.
There are three main ways to share your Dockerized work: commit your Dockerfile alongside your code so collaborators can build the image themselves; push the built image to Docker Hub (or a private registry) so they can pull it directly; or export the image to an archive file and hand it off like any other file.
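In command form, the last two options look roughly like this—the image name, username, and file name below are placeholders, not anything from a real project:

```shell
# Option 2: tag the image and push it to Docker Hub
docker tag my-experiment yourusername/my-experiment
docker push yourusername/my-experiment

# Option 3: export the image to a tar archive...
docker save -o my-experiment.tar my-experiment

# ...which a collaborator loads on their own machine
docker load -i my-experiment.tar
```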
How to Get Started With Docker
To get started, you’ll need to install Docker (anyone you share experiments with will also need to install it). Docker provides built-in examples to try out, but I recommend grabbing a Dockerfile from your favorite repo, or from Docker Hub, and trying that out instead; I personally test-drove Docker with a tool I happened to be using at the time named glyphminer. Some quick commands you should know:
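A quick reference for the everyday commands (shown with minimal flags; names in angle brackets are placeholders):

```shell
docker images               # list the images on your machine
docker ps                   # list running containers (-a includes stopped ones)
docker build -t <name> .    # build an image from the Dockerfile in the current directory
docker stop <container>     # stop a running container
docker rm <container>       # remove a stopped container
docker rmi <image>          # remove an image you no longer need
```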
You can also read Docker’s tutorial guides for more detailed, step-by-step instructions on getting started. Trust me—once you realize how many headaches Docker can spare you, you’ll be glad you did.
Contact us to share the challenges you’re facing and learn more about the solutions we offer to help you achieve your goals. Our job is to solve your problems with expertly crafted software solutions and real-world training.