If you’re like most developers, you probably have plenty of images available that you can use as starting points for your own projects. But how do you go about using an image as the basis for a new Dockerfile? Here’s how to use an existing image as a starting point for your own Dockerfile.

First, find the image that you want to use as a base. You can find images for all sorts of different projects on the Docker Hub website. Once you’ve found the image that you want to use, open up your favorite text editor and create a new file called Dockerfile in your project’s directory. The image itself isn’t stored in your project; Docker pulls the base image for you when you build.

The first thing that we need to do is specify the base image that we’re going to be using. To do this, we use the FROM instruction: FROM <image>. For example, if we wanted to use the nginx image as our base, we would type in FROM nginx.

Next, we need to tell Docker which files and folders from our project should be added to the image. We do this by using the COPY instruction: COPY <src> <dest>. The source is a path in the build context (typically your project directory), and the destination is a path inside the image. For example, if we wanted to copy our project’s src directory into the folder that nginx serves content from, we would type in COPY src /usr/share/nginx/html.

To run commands while the image is being built, we use the RUN instruction: RUN <command>. For example, if we wanted to run a simple command during the build to check that it was working, we would type in RUN echo "Hello world!". Finally, it’s time to build our project! To do this, we run docker build from the directory containing the Dockerfile.
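Putting the FROM, COPY, and RUN instructions together, a minimal sketch of such a Dockerfile can be written straight from the shell. It assumes your project has a src directory of static files next to the Dockerfile; /usr/share/nginx/html is the default web root of the official nginx image.

```shell
# Write a minimal Dockerfile that extends the official nginx image.
# Assumes a local "src" directory of static files (needed only at build time).
cat > Dockerfile <<'EOF'
# Start from the nginx image on Docker Hub
FROM nginx
# Copy the local src directory into nginx's default web root
COPY src /usr/share/nginx/html
# Run a command at build time; its output appears in the build log
RUN echo "Hello world!"
EOF
```

Running docker build -t my-nginx . in that directory would then build the image; my-nginx is an arbitrary example tag.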


Docker images are created by building Dockerfiles. The build process executes the instructions in the Dockerfile to create the filesystem layers that form the final image.

What if you already have an image? Can you retrieve the Dockerfile it was built from? In this article, we’ll look at two methods that can achieve this.

The Objective

When you’re building your own Docker images, you should store your Dockerfiles as version-controlled files in your source repository. This practice ensures you can always retrieve the instructions used to assemble your images.

Sometimes you won’t have access to a Dockerfile though. Perhaps you’re using an image that’s in a public registry but has an inaccessible source repository. Or you could be working with image snapshots which don’t directly correspond to a versioned Dockerfile. In these cases, you need a technique that can create a Dockerfile from an image on your machine.

Docker doesn’t offer any built-in functionality for achieving this. Built images lack an association with the Dockerfile they were created from. However, you can reverse engineer the build process to produce a good approximation of an image’s Dockerfile on-demand.

The Docker History Command

The docker history command reveals the layer history of an image. It shows the command used to build each successive filesystem layer, making it a good starting point when reproducing a Dockerfile.

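Here’s a simple example: a Dockerfile for a Node.js application that uses the node:16 base image, copies in the application’s code, and sets the command that runs it.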
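A minimal sketch of such a Dockerfile, created from the shell, is shown below. The app.js script and its one-line contents are hypothetical placeholders; only the node:16 base image and the COPY and CMD instructions come from the example described here.

```shell
# Hypothetical application: a single script that prints a message.
cat > app.js <<'EOF'
console.log("Hello from Node.js");
EOF

# Dockerfile: start from node:16, copy the script in, and run it by default.
cat > Dockerfile <<'EOF'
FROM node:16
COPY app.js .
CMD ["node", "app.js"]
EOF
```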

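Build the image using docker build, running it from the directory that contains the Dockerfile. Supply a tag with the -t flag, such as node-app:latest (an example name), so you can refer to the image in later commands.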

Now inspect the image’s layer history by passing the same tag to docker history.

The history includes the complete list of layers in the image, including those inherited from the node:16 base image. Layers are ordered so the most recent one is first. You can spot where the layers created by the sample Dockerfile begin based on the creation time. These show Docker’s internal representation of the COPY and CMD instructions used in the Dockerfile.

The docker history output is more useful when the table is limited to showing each layer’s command, which you can do by passing --format "{{.CreatedBy}}". Adding the --no-trunc flag disables truncation, so you can view the full command associated with each layer.
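If you run this often, a small shell function can save retyping the flags. The function name layer_commands is our own invention; --no-trunc and --format with the {{.CreatedBy}} placeholder are standard docker history options.

```shell
# Hypothetical helper: print the full command behind each layer of an image,
# newest layer first, one per line.
# Usage: layer_commands <image>
layer_commands() {
  docker history --no-trunc --format '{{.CreatedBy}}' "$1"
}
```

Calling layer_commands followed by your image’s tag prints the same list as the full docker history command.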

From this list of commands, you can gain an overview of the steps taken to assemble the image. For simple images like this one, this can be sufficient information to accurately reproduce a Dockerfile.

Automating Layer Extraction with Whaler and Dfimage

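Copying commands out of docker history by hand is a laborious process. You also need to clean up the prefix at the start of each line. With Docker’s classic builder, RUN commands are recorded as /bin/sh -c followed by the command, while other instructions appear as /bin/sh -c #(nop) followed by the instruction: a shell comment that marks the step as a no-op.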
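As a rough sketch of that cleanup, the function below rewrites lines in the classic builder’s history format: metadata instructions recorded as /bin/sh -c #(nop) INSTRUCTION lose their prefix, and plain /bin/sh -c commands become RUN instructions. The function name is hypothetical, and images built with BuildKit record their history in a different format that this sketch doesn’t handle.

```shell
# Hypothetical helper: turn classic-builder history lines (read from stdin)
# into Dockerfile-style instructions.
strip_history_prefix() {
  sed -E \
    -e 's|^/bin/sh -c #\(nop\)[[:space:]]+||' \
    -e 's|^/bin/sh -c |RUN |'
}
```

Because docker history lists the newest layer first, reverse the lines before reading them as a Dockerfile, for example by piping docker history --no-trunc --format '{{.CreatedBy}}' <image> through tac and then strip_history_prefix. On macOS, tail -r can stand in for tac.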

Fortunately, there are community tools that can automate Dockerfile creation from an image’s layer history. For the purposes of this article, we’ll focus on Whaler, which is packaged into the alpine/dfimage (Dockerfile-from-Image) image published under the alpine namespace on Docker Hub.

Running the dfimage image and supplying an image tag will output a Dockerfile that can be used to reproduce the referenced image. You must bind-mount your host’s Docker socket (usually /var/run/docker.sock) into the dfimage container so it can access your image list and pull the tag if needed.
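A minimal sketch of that invocation, wrapped in a shell function so the socket mount doesn’t need retyping, is shown below. The function name dfimage is our choice, /var/run/docker.sock is the default socket path on Linux hosts, and depending on your Docker version the image’s documentation may call for extra Whaler flags.

```shell
# Hypothetical wrapper: run alpine/dfimage with the host's Docker socket
# mounted, passing through any arguments (usually an image tag).
dfimage() {
  docker run --rm -v /var/run/docker.sock:/var/run/docker.sock alpine/dfimage "$@"
}
```

Calling dfimage with your image’s tag then prints the reconstructed Dockerfile to your terminal.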

The created Dockerfile contains everything you need to go from scratch (an empty filesystem) to the final layer of the specified image, including all the layers that come from the base image. For the Node.js example, these include the ENTRYPOINT and CMD instructions inherited from node:16, which appear before the instructions from our own Dockerfile.

With the exception of COPY, the instructions specific to our image match what was written in the original Dockerfile. You can now copy these instructions into a new Dockerfile, either using the whole dfimage output or by taking just the part that pertains to the final image. The latter option is only possible if you know the original base image’s identity, so you can add a FROM instruction to the top of the file.

The Limitations

In many cases, dfimage will be able to assemble a usable Dockerfile. Nonetheless, it’s not perfect, and an exact match isn’t guaranteed. The extent of the discrepancies compared to the image’s original Dockerfile will vary depending on the instructions that were used.

Not all instructions are captured in the layer history. Anything that isn’t recorded is lost, and there’s no way to determine what it was; for example, the instructions in earlier stages of a multi-stage build don’t appear in the final image’s history. The best accuracy is obtained with command and metadata instructions like RUN, ENV, WORKDIR, ENTRYPOINT, and CMD. RUN instructions could still be missing if their command didn’t result in filesystem changes, meaning no new image layer was created.

COPY and ADD instructions present unique challenges. The history doesn’t contain the host file path that was copied into the image. You can see that a copy occurred, but the source is shown as a hash of the content copied in from the build context rather than its original path.

Because the destination path is preserved, this can be enough to help you work out what’s been copied and why. You can then use this information to interpolate a new source path into the Dockerfile for future builds. In other cases, inspecting the file inside the image might help reveal the copy’s purpose so you can choose a meaningful filename for the host path.
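If you’ve already converted the history into instruction lines, one way to flag the sources you need to fill in is to swap each hash for an obvious placeholder while keeping the destination. This sketch assumes the classic builder’s COPY file:<hash> in <dest> form (dir:<hash> for directories); the function name and the ./REPLACE_ME placeholder are hypothetical.

```shell
# Hypothetical helper: replace the hashed source of COPY/ADD lines (stdin)
# with a placeholder path, keeping the destination and trimming trailing spaces.
mark_copy_sources() {
  sed -E 's#^(COPY|ADD) (file|dir):[0-9a-f]+ in (.*[^[:space:]])[[:space:]]*$#\1 ./REPLACE_ME \3#'
}
```

Lines that don’t match the pattern, such as RUN instructions, pass through unchanged, so you can run the whole reconstructed file through it and then search for REPLACE_ME.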

Summary

Docker images don’t include a direct way to work backwards to the Dockerfile they were built from. It’s still possible to piece together the build process though. For simple images with few instructions, you can often work out the instructions manually by looking at the CREATED BY column in the docker history command’s output.

Larger images with more complex build processes are best analyzed by tools like dfimage. These tools do the hard work of parsing the verbose docker history output for you, producing a new Dockerfile that’s a best-effort match for the likely original.

Reverse engineering efforts aren’t perfect, and some Dockerfile instructions are lost or mangled during the build process. Consequently, you shouldn’t assume Dockerfiles created in this way are an accurate representation of the original. You might also have to make manual adjustments to ADD and COPY instructions, restoring host file paths that were replaced by build context references.