
Docker

Docker is a tool that packages software into containers: isolated, portable environments that include everything a program needs to run (code, runtime, libraries, and configuration). A container behaves the same way on any machine that runs Docker and has a compatible CPU architecture, which eliminates most cases of the "it works on my machine" problem.

Containers vs. Virtual Machines

Both containers and virtual machines (VMs) provide isolated environments, but they work differently:

| | Containers | Virtual Machines |
|---|---|---|
| Size | Megabytes | Gigabytes |
| Startup | Seconds | Minutes |
| Isolation | Process-level | Full OS |
| Overhead | Very low | High |

Containers share the host's OS kernel but isolate their filesystem, networking, and processes. VMs virtualize an entire computer and run a complete guest OS, including its own kernel.
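One way to see the shared kernel for yourself is to print the kernel release from Python on the host and then inside a container. The script below is a minimal sketch; the file name `kernel.py` is hypothetical, and it relies only on the standard `platform` module.

```python
# kernel.py (hypothetical name): report the kernel this process runs on.
import platform


def kernel_info() -> str:
    """Return the OS name and kernel release as seen by the current process."""
    return f"{platform.system()} {platform.release()}"


if __name__ == "__main__":
    print(kernel_info())
```

Run `python kernel.py` on a Linux host, then `docker run --rm -v "$(pwd)":/app -w /app python:3.12 python kernel.py`. On a Linux host both commands should report the same kernel release, because the container has no kernel of its own. With Docker Desktop on macOS or Windows, the container instead reports the kernel of the lightweight Linux VM that Docker Desktop runs behind the scenes.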

Key Concepts

Image

A Docker image is a read-only template that defines a container. Think of it as a snapshot of an environment. Images are built from a Dockerfile and can be shared via registries like Docker Hub.

Container

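A container is an instance of an image that you can start, stop, and remove. You can start many containers from the same image. Changes a container makes to its filesystem survive a stop and restart, but they are deleted when the container is removed, and each new docker run starts from the unchanged image. Use volumes for data that must outlive a container.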

Dockerfile

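A Dockerfile is a text file with instructions for building an image. Instructions that change the filesystem, such as RUN and COPY, each add a layer. Docker caches layers and reuses them on later builds until an instruction or the files it copies change, after which every following step is rebuilt.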

Registry

A registry stores and distributes images. Docker Hub is the default public registry, similar to how GitHub hosts code.

Installation

Download Docker Desktop from docker.com. It includes Docker Engine, the CLI, and a GUI.

Verify installation:

docker --version
docker run hello-world

Basic Docker Commands

Working with Images

# Download an image from Docker Hub
docker pull python:3.12

# List downloaded images
docker images

# Remove an image
docker rmi python:3.12

Running Containers

# Run a container (downloads the image if needed); without -it, the Python REPL gets no input and exits immediately
docker run python:3.12

# Run interactively with a shell
docker run -it python:3.12 bash

# Run in the background (detached)
docker run -d nginx

# Run and remove the container when it exits
docker run --rm python:3.12 python --version

# Map a host port to a container port
docker run -p 8080:80 nginx
# Now visit http://localhost:8080

Managing Containers

# List running containers
docker ps

# List all containers (including stopped)
docker ps -a

# Stop a running container
docker stop <container-id>

# Remove a stopped container
docker rm <container-id>

# View logs from a container
docker logs <container-id>

# Open a shell inside a running container
docker exec -it <container-id> bash

Writing a Dockerfile

Here's a Dockerfile for a simple Python application:

# Start from an official Python image
FROM python:3.12-slim

# Set the working directory inside the container
WORKDIR /app

# Copy the requirements file and install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the application code
COPY . .

# The command to run when the container starts
CMD ["python", "app.py"]

Build and Run

# Build an image from the Dockerfile in the current directory
docker build -t my-app .

# Run it
docker run my-app

# Run with a port mapping (the app must listen on 0.0.0.0 inside the container, not 127.0.0.1)
docker run -p 5000:5000 my-app

Dockerfile Instructions

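| Instruction | Purpose |
|---|---|
| FROM | Base image to start from |
| WORKDIR | Set the working directory |
| COPY | Copy files from the build context into the image |
| RUN | Execute a command during the build |
| ENV | Set environment variables |
| EXPOSE | Document which port the app listens on (does not publish it; use -p for that) |
| CMD | Default command to run at startup |
| ENTRYPOINT | Fixed command (CMD becomes its default arguments) |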
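The difference between CMD and ENTRYPOINT shows up in what happens to arguments you pass after the image name. A minimal sketch, assuming a hypothetical script `show_args.py` copied into the image and a Dockerfile that ends with `ENTRYPOINT ["python", "show_args.py"]` followed by `CMD ["--greeting", "hi", "world"]`:

```python
# show_args.py (hypothetical): echo the arguments Docker passes to the script.
import argparse


def parse(argv=None):
    """Parse argv (defaults to sys.argv[1:] when None)."""
    parser = argparse.ArgumentParser(description="Echo the arguments Docker passed in.")
    parser.add_argument("--greeting", default="hello")
    parser.add_argument("names", nargs="*")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse()
    print(args.greeting, *args.names)
```

With that Dockerfile, `docker run my-app` runs `python show_args.py --greeting hi world` and prints `hi world`, while `docker run my-app Ada Grace` replaces only the CMD part and prints `hello Ada Grace`. If the Dockerfile used `CMD ["python", "show_args.py"]` alone, `docker run my-app Ada Grace` would replace the whole command and try to execute a program named `Ada`.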

Volumes

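Files a container writes live in that container's own writable layer: they are deleted when the container is removed, and each new docker run starts again from the unchanged image. Volumes persist data between runs: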

# Mount a host directory into the container
docker run -v /path/on/host:/path/in/container my-app

# Mount the current directory (quoted so paths with spaces work)
docker run -v "$(pwd)":/app my-app

# Use a named volume managed by Docker (created automatically if it doesn't exist)
docker run -v mydata:/app/data my-app
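To see a named volume at work, a small script can keep a run counter in a file. This is a hypothetical `counter.py`; it writes to a `data` directory relative to the working directory, which is `/app/data` inside the image built from the Dockerfile above (because of `WORKDIR /app`). The `DATA_DIR` environment variable is an illustrative override, not a Docker feature.

```python
# counter.py (hypothetical): count how many times it has been run.
import os
from pathlib import Path

# Relative to the working directory, so /app/data under WORKDIR /app.
# DATA_DIR is a hypothetical override for running outside the container.
DATA_DIR = Path(os.environ.get("DATA_DIR", "data"))


def bump_counter(data_dir: Path) -> int:
    """Increment the counter stored in data_dir/runs.txt and return the new value."""
    data_dir.mkdir(parents=True, exist_ok=True)
    counter_file = data_dir / "runs.txt"
    count = int(counter_file.read_text()) if counter_file.exists() else 0
    count += 1
    counter_file.write_text(str(count))
    return count


if __name__ == "__main__":
    print(f"This is run number {bump_counter(DATA_DIR)}")
```

After rebuilding my-app with this file in the project directory, `docker run --rm -v mydata:/app/data my-app python counter.py` should print run number 1, then 2, 3, and so on, even though each run is a brand-new container. Without `-v`, every new container starts from the image and prints run number 1.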

Docker Compose

For projects with multiple services (e.g., a web app + a database), Docker Compose lets you define and run them together with a single docker-compose.yml file.

services:
  web:
    build: .
    ports:
      - "5000:5000"
    depends_on:
      - db
    environment:
      - DATABASE_URL=postgresql://user:pass@db:5432/mydb

  db:
    image: postgres:16
    environment:
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
      - POSTGRES_DB=mydb
    volumes:
      - postgres_data:/var/lib/postgresql/data

volumes:
  postgres_data:
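Inside the Compose network, each service is reachable under its service name, which is why the web service's `DATABASE_URL` points at host `db` rather than `localhost`. A minimal sketch of how the app might read that variable with the standard library; `db_settings` is a hypothetical helper, and a real app would usually hand the URL straight to its database driver.

```python
import os
from urllib.parse import urlparse


def db_settings(url: str) -> dict:
    """Split a postgresql:// URL into the pieces a database driver needs."""
    parts = urlparse(url)
    return {
        "host": parts.hostname,
        "port": parts.port or 5432,  # PostgreSQL's default port
        "user": parts.username,
        "password": parts.password,
        "database": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    # Compose sets DATABASE_URL; fall back to a local default outside Compose.
    url = os.environ.get("DATABASE_URL", "postgresql://user:pass@localhost:5432/mydb")
    settings = db_settings(url)
    print(f"Connecting to {settings['database']} on {settings['host']}:{settings['port']}")
```

Run inside the web container, the value from the Compose file yields host `db`, port 5432, and database `mydb`.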

Commands:

# Start all services
docker compose up

# Start in the background
docker compose up -d

# Stop and remove the containers (named volumes such as postgres_data are kept unless you add -v)
docker compose down

# View logs
docker compose logs

# Rebuild images
docker compose build
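In the Compose file above, `depends_on` controls start order only: Compose starts `db` before `web` but does not wait for PostgreSQL to finish initializing, so the web app can fail on its first connection attempt. Compose can wait for a service that reports healthy (a `healthcheck` plus `condition: service_healthy`); a simpler app-side option is to retry until the database port accepts connections. A minimal sketch with a hypothetical `wait_for_port` helper; an open TCP port is a good sign but not a full guarantee that the database is ready for queries.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll host:port until a TCP connection succeeds; return False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Opening and immediately closing a connection is enough to test reachability.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Covers refused connections, timeouts, and DNS errors while `db` is not up yet.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

In the web service, you might call `wait_for_port("db", 5432)` at startup and exit with an error if it returns False, so a failed start is visible in `docker compose logs`.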

.dockerignore

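Like .gitignore, a .dockerignore file tells Docker which files to leave out of the build context, so COPY . . cannot pick them up. This keeps images small and prevents secrets from being included by accident. One difference from .gitignore: patterns are matched from the root of the build context, so *.pyc only matches files at the top level; prefix a pattern with **/ to match it at any depth:

.git
.env
venv/
**/__pycache__
**/*.pyc
node_modules/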

Practical Use Cases for Researchers

  • Reproducible environments: share a Dockerfile alongside your code so others can rebuild the same environment later (pin the base image tag and package versions, or a rebuild may pull newer ones)
  • Running tools without installing them: docker run -it --rm ubuntu bash gives you a throwaway Linux environment that is deleted when you exit
  • Consistent development environments: the whole team uses the same containers, so setup instructions shrink to a single docker compose up
  • Running services locally: start a local PostgreSQL or Redis instance with one command, without installing the database itself
# Start a PostgreSQL database locally for development
docker run -d \
-e POSTGRES_USER=dev \
-e POSTGRES_PASSWORD=dev \
-e POSTGRES_DB=myapp \
-p 5432:5432 \
postgres:16
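For the reproducible-environments use case, a Dockerfile only reproduces an environment as well as its inputs are pinned: `pip install -r requirements.txt` with unpinned package names can install different versions a year from now. `pip freeze > requirements.txt` records the exact versions from a working environment. The sketch below shows roughly what that captures using the standard `importlib.metadata` module; `pinned_requirements` is a hypothetical helper, and unlike `pip freeze` it does not handle editable or URL-based installs.

```python
from importlib.metadata import distributions


def pinned_requirements() -> list[str]:
    """Return 'name==version' lines for every distribution installed in this environment."""
    pins = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that have no name in their metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    print("\n".join(pinned_requirements()))
```

Running it (or `pip freeze`) inside the container where your code works and saving the output as requirements.txt pins the Python packages; pinning a full base image version tag such as `python:3.12.3-slim`, or an image digest, covers the rest of the environment.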
