The clone system call needs a better wrapper, at least on Linux. See, you might have heard about this neat containers thing. Run processes with some actual separation and (the start of) security! Let yourself feel the freedom! And they’re great… as long as you’re not calling the libc functions yourself.
Don’t get me wrong, there’s unshare(), and I’m sure it works for some people (in fact, reading its documentation explained so much about Docker’s design). It just doesn’t match what I need. As part of something I’m building, I’m trying to write an abstraction over Linux’s tools to sanely create containers for arbitrary processes, and changing the current process’s namespaces isn’t what I’m after, even before getting into all the weird caveats unshare() comes with. So I could fork twice (fork, call unshare() in the child, then fork again, since a new PID namespace only applies to the caller’s subsequent children), but that screws up the process parenting: the caller ends up with a middleman as its child instead of the process it actually wanted.
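For comparison, here’s roughly what that fork-twice dance looks like. This is a minimal sketch, not code from my project: spawn_via_unshare() and example_payload() are hypothetical names, and any namespace flags other than 0 will generally need CAP_SYS_ADMIN (or CLONE_NEWUSER) to succeed. Note that the PID handed back to the caller belongs to the helper in the middle, not to the process running the payload.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Example payload standing in for the containerized process. */
static int example_payload(void)
{
    return 42;
}

/* Hypothetical helper: run fn() inside the namespaces in `flags` via unshare().
 * Returns the PID of an intermediate helper process, NOT of fn()'s process. */
static pid_t spawn_via_unshare(int flags, int (*fn)(void))
{
    pid_t helper = fork();
    if (helper != 0)
        return helper;               /* original parent (or -1 on error) */

    /* First child: switch namespaces. For CLONE_NEWPID this only affects
       children created afterwards, hence the second fork(). */
    if (unshare(flags) == -1)
        _exit(127);

    pid_t inner = fork();
    if (inner == -1)
        _exit(127);
    if (inner == 0)
        _exit(fn());                 /* the process we actually wanted */

    /* The helper has to stick around to reap `inner` and relay its status,
       because the original caller is only its grandparent. */
    int status;
    if (waitpid(inner, &status, 0) == -1)
        _exit(127);
    _exit(WIFEXITED(status) ? WEXITSTATUS(status) : 128 + WTERMSIG(status));
}
```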
The answer looks like it’s clone(). The system call does almost precisely what I want, except… the libc wrapper doesn’t. libc exposes it in one of two ways: the good old-fashioned fork(), which is easy to work with but takes no arguments, or clone(), which seems to be built specifically for threads and actively gets in the way of anything else. If you’re trying to fork a process, well, too bad, you still need to allocate a new stack and pass in a function. What’s that? Linux has all those neat copy-on-write facilities around the clone syscall? Doesn’t matter, you don’t get half the benefits.
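To make the complaint concrete, this is roughly what a fork-style call through glibc’s clone() wrapper looks like. It’s a sketch under a few assumptions: spawn_with_clone() and exit_with_code() are hypothetical names, the 1 MiB stack size is arbitrary, and the stack is assumed to grow downward (true on x86-64 and arm64), so the wrapper is handed the top of the allocation.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (1024 * 1024)  /* arbitrary */

/* Example payload: the int it returns becomes the child's exit status. */
static int exit_with_code(void *arg)
{
    return *(int *)arg;
}

/* Hypothetical helper: start fn(arg) in a new process with extra clone flags. */
static pid_t spawn_with_clone(int flags, int (*fn)(void *), void *arg)
{
    if (flags & CLONE_VM) {  /* we free the stack below, so no shared memory */
        errno = EINVAL;
        return -1;
    }

    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The wrapper wants the top of the stack, since it grows downward here.
       SIGCHLD makes the child reapable with waitpid() like a fork()ed one. */
    pid_t pid = clone(fn, stack + CHILD_STACK_SIZE, flags | SIGCHLD, arg);

    /* Without CLONE_VM the child has its own copy-on-write copy of this
       memory, so freeing the parent's copy is safe. Preserve clone's errno. */
    int saved_errno = errno;
    free(stack);
    errno = saved_errno;
    return pid;
}
```

The irony is that without CLONE_VM the child already gets its own copy-on-write copy of the parent’s memory, stack included, so the freshly allocated stack exists only to satisfy the wrapper; the raw syscall wouldn’t need it.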
This is honestly just a rant, and I know I can just call syscall() directly (and I will soon enough), but this really wound up irritating me. I want something as simple as calling fork(), but with the ability to pass clone flags, and it just doesn’t exist. In this day and age it really should, but then I suppose Docker wouldn’t seem quite so magic.
Oh well.