Docker Security Cheat Sheet

Sun, May 30, 2021

source: https://news.ycombinator.com/item?id=26446337 March 14, 2020
https://cheatsheetseries.owasp.org/cheatsheets/Docker_Security_Cheat_Sheet.html

My biggest takeaways:

A big 👍 for running Docker containers with --read-only, which forces you to use explicit writable volumes or bind mounts for all writable data. It’s not just the security benefits; you also avoid entire classes of problems, such as:

behavior differences between docker restart (which preserves overlayfs changes) and container re-creates (which reset the overlayfs back to the image state)

surprise data loss on container redeployments because data was unexpectedly being written to the overlayfs instead of a volume

unexpectedly running out of disk space in /var/lib/docker because data was being written outside of a volume

performance issues caused by excessive overlayfs writes (the storage drivers and /var/lib/docker are not necessarily designed for I/O performance)
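A minimal sketch of what this looks like on the command line, assuming a hypothetical image that keeps persistent data under /var/lib/app and scratch files under /tmp. The wrapper name run_readonly and the volume name app-data are made up for illustration. With the root filesystem read-only, a write to any path not backed by a volume or tmpfs fails with an error instead of silently landing in the overlayfs.

```shell
#!/bin/sh
# Sketch only: run_readonly, app-data and /var/lib/app are hypothetical names.
run_readonly() {
  # --read-only              container root filesystem is mounted read-only
  # --tmpfs /tmp             in-memory scratch space, lost when the container stops
  # -v app-data:/var/lib/app named volume for data that must survive re-creates
  docker run --read-only \
    --tmpfs /tmp \
    -v app-data:/var/lib/app \
    "$@"
}
```

Usage would look like `run_readonly -d --name app example/app`; options go before the image name because everything is passed straight through to `docker run`.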

Rule #2 (“Set a user”) needs a lot of nuance stacked on top of it.

Host users and guest users must be explicitly mapped by whoever starts the container, so this issue is not a security threat outside the container. That said, if a guest OS is running as root and someone compromises it, they have all the powers of root in that container at their disposal.

“Don’t run as root in the container” definitely gets parroted without the nuance you describe. Gaining root in a container does not generally mean anything on the host has been compromised. Unless you do something odd (like mounting the host Docker socket, doing weird setuid stuff, or running as privileged), it should be fine to run the container processes as root. It’s kind of the main point of running a process in a Docker container.
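The “something odd” cases can be spot-checked on a running container. A rough sketch, assuming the Docker CLI is available: it reads the container’s HostConfig.Privileged flag and the source paths of its mounts via docker inspect --format, and flags --privileged or a mounted Docker socket. The function name risky_container is hypothetical, and it does not look for setuid binaries or added capabilities.

```shell
#!/bin/sh
# Sketch only: risky_container is a hypothetical helper name.
# Returns 1 if the container runs --privileged or has a Docker socket mounted,
# 2 if docker inspect fails. It does NOT check setuid binaries or --cap-add.
risky_container() {
  info=$(docker inspect \
    --format '{{.HostConfig.Privileged}} {{range .Mounts}}{{.Source}} {{end}}' \
    "$1") || return 2
  case "$info" in
    true*)
      echo "$1: runs with --privileged"
      return 1 ;;
    *docker.sock*)
      echo "$1: has a Docker socket mounted"
      return 1 ;;
  esac
  echo "$1: not privileged, no Docker socket mount found"
}
```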

I understand the motivation for marking scripts and binaries as non-writable inside the container as an extra layer of assurance (along with a non-root user that can only execute them). But it’s a disservice to developers if you don’t explain why. A lot of people walk away from this thinking they’re protecting the host OS, and wind up cargo-cult-creating a container user with full write/execute permissions.
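Marking scripts and binaries as non-writable usually comes down to a build step like this sketch, run as root during the image build (for example from a Dockerfile RUN instruction). The helper name lock_down and the /app path are hypothetical. Files end up owned by root and writable by nobody, so an unprivileged runtime user can read and execute them but cannot modify them. This guards the container’s own code; it does nothing extra for the host.

```shell
#!/bin/sh
# Sketch only: lock_down is a hypothetical helper meant to run as root at
# image build time, e.g. in a Dockerfile:  RUN lock_down /app
lock_down() {
  dir="$1"
  # root owns everything, so the runtime user cannot chmod it back
  chown -R 0:0 "$dir"
  # remove write for everyone; keep read, and execute on dirs and
  # files that were already executable (capital X)
  chmod -R a-w,a+rX "$dir"
}
```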

An option not given in the article is to start as root, then step down to an unprivileged user using gosu. The official Postgres image does this: it starts as root so it can set appropriate permissions on the data directory, then becomes an unprivileged user for normal operation.
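A minimal sketch of that start-as-root-then-step-down pattern as an entrypoint script, assuming gosu is installed in the image. This is not the Postgres image’s actual script; APP_USER, DATA_DIR and the entrypoint function are hypothetical names. While still root it fixes ownership of the data directory, then re-runs itself as the unprivileged user; on that second pass it simply execs the real command.

```shell
#!/bin/sh
# Sketch only: APP_USER, DATA_DIR and entrypoint are hypothetical names,
# and gosu is assumed to be installed in the image.
APP_USER="${APP_USER:-app}"
DATA_DIR="${DATA_DIR:-/var/lib/app}"

entrypoint() {
  if [ "$(id -u)" = "0" ]; then
    # Still root: make sure the data dir exists and belongs to the app user...
    mkdir -p "$DATA_DIR"
    chown -R "$APP_USER" "$DATA_DIR"
    # ...then re-run this script as the unprivileged user. exec replaces the
    # root shell, so no root process is left running.
    exec gosu "$APP_USER" "$0" "$@"
  fi
  # Unprivileged: replace the shell with the actual application process.
  exec "$@"
}

# The real script would end with:  entrypoint "$@"
```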

I keep hearing people recommend systemd to achieve something similar to Docker. Someone posted this detailed command and setup:

All of these isolation techniques (and many more) can be used inside systemd units. Write your .service as usual, then run this command on it:

$ systemd-analyze security service-name

It prints out a long list of hardening flags that can be applied to your service, like so:

  NAME                                          DESCRIPTION                                         EXPOSURE
  PrivateNetwork=                               Service has access to the host's network            0.5
  User=/DynamicUser=                            Service runs as root user                           0.4
  CapabilityBoundingSet=~CAP_SET(UID|GID|PCAP)  Service may change UID/GID identities/capabilities  0.3
  CapabilityBoundingSet=~CAP_SYS_ADMIN          Service has administrator privileges                0.3
  ...

Here’s what I typically use for a .NET 5 application:

  [Service]
  WorkingDirectory        = /opt/appname/app
  ReadWritePaths          = /opt/appname/data
  UMask                   = 0077
  LockPersonality         = yes
  NoNewPrivileges         = yes
  PrivateDevices          = yes
  PrivateMounts           = yes
  PrivateTmp              = yes
  PrivateUsers            = yes
  ProtectClock            = yes
  ProtectControlGroups    = yes
  ProtectHome             = yes
  ProtectHostname         = yes
  ProtectKernelLogs       = yes
  ProtectKernelModules    = yes
  ProtectKernelTunables   = yes
  ProtectSystem           = strict
  RemoveIPC               = yes
  RestrictAddressFamilies = AF_UNIX AF_INET AF_INET6
  RestrictNamespaces      = yes
  RestrictRealtime        = yes
  RestrictSUIDSGID        = yes
  SystemCallArchitectures = native
  ProtectProc             = invisible
  CapabilityBoundingSet   =
  SystemCallFilter        = ~@clock @module @mount @raw-io @reboot @swap @privileged @cpu-emulation @obsolete

ReadWritePaths should really be replaced by DynamicUser combined with StateDirectory, so that local persistent data is written to $STATE_DIRECTORY, but I’m too lazy to do that yet.
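For reference, a sketch of what that switch might look like as a drop-in, written from the shell. The unit name appname.service follows the placeholder paths above, and write_dropin is a hypothetical helper that takes the target directory as an argument so it can be tried outside /etc. StateDirectory=appname asks systemd to create a persistent per-service directory under /var/lib owned by the dynamic user and to export its path as $STATE_DIRECTORY; the empty ReadWritePaths= resets the list inherited from the main unit. The application itself would still need to read $STATE_DIRECTORY instead of its hard-coded data path.

```shell
#!/bin/sh
# Sketch only: write_dropin and appname.service are placeholders.
# Usage: write_dropin /etc/systemd/system   (then: systemctl daemon-reload)
write_dropin() {
  dir="$1/appname.service.d"
  mkdir -p "$dir"
  cat > "$dir/dynamic-user.conf" <<'EOF'
[Service]
# Allocate a transient UID/GID for this service instead of a fixed user
DynamicUser=yes
# Persistent data lives in a systemd-managed dir exposed as $STATE_DIRECTORY
StateDirectory=appname
# Empty assignment resets the ReadWritePaths= list from the main unit
ReadWritePaths=
EOF
}
```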

See systemd.exec(5) for more.