If you run Docker inside LXC containers on Proxmox you probably woke up this week to a fun surprise. Your containers won't start anymore. The error looks like this:
Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: open sysctl net.ipv4.ip_unprivileged_port_start file: reopen fd 8: permission denied: unknown

This isn't a Proxmox bug. It's not even really a Docker bug. It's a security patch that landed in containerd.io version 1.7.28-2 around November 5th, fixing CVE-2025-52881, a critical container escape vulnerability. The fix involves reopening file descriptors for procfs operations, which triggers AppArmor permission errors when running Docker inside nested LXC containers.
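To get a feel for the mechanism, here is a minimal Go sketch of the reopen-through-procfs pattern in general terms. This is not runc's actual code, just an illustration of the technique: a file the runtime already holds open is reopened through its /proc/self/fd/<n> entry, so it keeps operating on a handle it trusts instead of re-resolving a path an attacker could swap out. That reopen is a brand new open from the kernel's point of view, so AppArmor gets to mediate it, and that is the operation being denied in the error above.

```go
// Minimal sketch of the fd-reopen pattern (illustration only, not runc's code).
package main

import (
	"fmt"
	"os"
)

// reopenFd opens /proc/self/fd/<n> for an already-open file, producing a fresh
// file description for the same underlying file. Because this is a new open,
// a path-based LSM like AppArmor mediates it and can deny it.
func reopenFd(f *os.File, flags int) (*os.File, error) {
	return os.OpenFile(fmt.Sprintf("/proc/self/fd/%d", f.Fd()), flags, 0)
}

func main() {
	// The sysctl file named in the error message above.
	f, err := os.Open("/proc/sys/net/ipv4/ip_unprivileged_port_start")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	reopened, err := reopenFd(f, os.O_RDONLY)
	if err != nil {
		panic(err) // in a confined nested container, this is the open that fails
	}
	defer reopened.Close()
	fmt.Println("reopened as:", reopened.Name())
}
```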
The technical details are actually kind of fascinating. Bear with me, I'll make it as simple as I can.
A quick detached mounts sidebar
A detached mount is a filesystem mount that exists in the kernel but isn't attached to any path in the filesystem tree. Think of it like a mounted filesystem that's floating in memory without a mountpoint.
Normally when you mount something, it gets attached to a specific path thus:
mount /dev/sda1 /mnt/data   # attached to /mnt/data

A detached mount exists but has no path. You can only access it through file descriptors. Runc uses detached mounts as a security feature to avoid race conditions where an attacker could swap out mountpoints while runc is trying to access them.
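If you want to poke at this yourself, here is a minimal sketch of creating a detached mount with the open_tree(2) syscall. It assumes a Linux 5.2+ kernel, root privileges, and the golang.org/x/sys/unix package; the mount and file chosen are just examples.

```go
// Minimal sketch of a detached mount: clone the mount at /proc with
// open_tree(2), then reach a file inside it purely through file descriptors.
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

func main() {
	// OPEN_TREE_CLONE returns a private copy of the /proc mount that is not
	// attached to any path in the mount namespace -- a detached mount.
	mntFd, err := unix.OpenTree(unix.AT_FDCWD, "/proc",
		unix.OPEN_TREE_CLONE|unix.OPEN_TREE_CLOEXEC)
	if err != nil {
		panic(err)
	}
	defer unix.Close(mntFd)

	// No path under / leads into the clone; the only way in is openat(2)
	// relative to the mount fd.
	fileFd, err := unix.Openat(mntFd, "sys/net/ipv4/ip_unprivileged_port_start",
		unix.O_RDONLY|unix.O_CLOEXEC, 0)
	if err != nil {
		panic(err)
	}
	defer unix.Close(fileFd)

	buf := make([]byte, 16)
	n, err := unix.Read(fileFd, buf)
	if err != nil {
		panic(err)
	}
	fmt.Printf("read %q through the detached mount\n", buf[:n])
}
```

Nothing in the filesystem tree points at that clone, which is exactly the property that trips up AppArmor, as described next.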
The problem is that when the kernel tries to generate a pathname for files inside a detached mount (which AppArmor needs since it's path-based), it can only see the relative path from the mount root. So /proc/sys/net/ipv4/ip_unprivileged_port_start inside a detached procfs mount just looks like /sys/net/ipv4/ip_unprivileged_port_start to AppArmor because the /proc part doesn't exist in the filesystem tree.
It's basically a mismatch between two security features: runc's use of detached mounts to prevent path-based attacks, and AppArmor's path-based access control system that needs actual paths to make decisions.
In case you were curious, SELinux wouldn't have this problem because it's label-based rather than path-based.
SELinux assigns security labels (contexts) directly to files, processes, and other objects. When you access a file, SELinux checks if your process label is allowed to perform that action on the file's label.
How do we fix this?
For now, the only fix I could find is to disable AppArmor entirely on a per-container basis by adding these lines to the LXC config on the Proxmox host (/etc/pve/lxc/<CTID>.conf). Not exactly the ideal long-term solution, but for now it's the only one available.
lxc.apparmor.profile: unconfined
lxc.mount.entry: /dev/null sys/module/apparmor/parameters/enabled none bind 0 0

OR
As per this comment you can trick Docker into thinking AppArmor is disabled. Also yuck.
% mount --bind /dev/null /sys/module/apparmor/parameters/enabled
% systemctl restart docker

AppArmor, you are making this hard
Look, I get it. AppArmor exists for good reasons. It provides mandatory access control and helps contain potential security issues. But for homelab users and small deployments this stuff is beginning to get exhausting. This is not my first frustration with AppArmor since adopting Proxmox 9 a few months ago. The enterprise folks have teams to deal with this crap, the rest of us are just trying to run some containers.
Unfortunately, I just do not believe AppArmor is fit for purpose in this role, nor do I believe Proxmox was diligent or rigorous in including it in Proxmox 9. I won't belabor the point any further here, but I did write about it recently.
Proxmox is genuinely great software, and I have made countless videos about it, both at work and personally. The team does excellent work and the platform is rock solid for virtualization. But the AppArmor integration continues to be a source of friction that makes recommending it harder than it should be. I'm going to have to start looking for alternatives soon, not that there really are any. Ugh.