Extended Attributes
30 Dec 2015
While working on support for FreeBSD’s extended attributes in Python, I tried to stay conscious of how other operating systems implement extended attributes. That way I wouldn’t inadvertently cause the same problem that I was trying to fix: reliance on a particular API’s semantics.
What are extended attributes?
To put it very simply, extended attributes are metadata that are attached to files. Typically, they’re key/value pairs that the filesystem associates with a particular file on the filesystem, though that doesn’t always have to be the case.
How they’re implemented depends on both the filesystem and the operating system. This means that implementations on the same filesystem (UFS, for example) can be completely incompatible across operating systems (Solaris and FreeBSD).
Extended attributes are not mandated by any standard. The tooling and APIs are quite different across operating systems and some operating systems (OpenBSD, HP-UX) don’t implement them at all. Because support is non-standard and spotty, it’s rare to see them used in cross-platform software. I’d be super-interested in seeing some counter-examples to this.
Extended attributes are sometimes namespaced. That is to say, there exists some top-level grouping of attributes. Other than the top-level namespace, there usually isn’t any hierarchy to attributes beyond whatever arbitrary hierarchy users define themselves. Namespaces are usually user and system, though this isn’t necessarily consistent, as we’ll see. Extended attributes under the system namespace are only modifiable by root (and sometimes only queryable by root).
The Linux API is actually a fairly nice one, and for the rest of this post I’m going to use it as my point of comparison. The return values of the getxattr and listxattr functions are the total size of the attribute value or name list, not a count of however many bytes happened to fit. If the buffer is too small, the call fails with ERANGE instead of handing back partial data, so checking whether truncation would have occurred is a simple matter of checking for that error.
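If you pass NULL and a size of zero in a call to getxattr or listxattr, the size of the buffer required to hold the contents of the EA will be returned. This allows you to query the amount of space required to hold the return value, allocate it, and then call the function again to populate it. That’s unfortunately racy, since the attribute can change between the two calls, so it’s preferable to call with a reasonably sized buffer and then retry with a larger one if it turns out to be too small.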
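Here is a minimal sketch of that retry pattern in Python. It assumes a hypothetical callable, fetch(size), that behaves the way getxattr(2) is documented on Linux: it returns the value when it fits in size bytes and raises OSError with ERANGE when it doesn’t. The names read_growing and fetch are made up for illustration. Python’s own os.getxattr already handles this internally.

```python
import errno


def read_growing(fetch, initial_size=64, max_size=1 << 20):
    """Call fetch(size) with a growing buffer size until the value fits.

    fetch is a hypothetical stand-in for a getxattr-style call: it returns
    the value as bytes, or raises OSError(ERANGE) if size is too small.
    """
    size = initial_size
    while size <= max_size:
        try:
            return fetch(size)
        except OSError as exc:
            if exc.errno != errno.ERANGE:
                raise  # a real error, not "buffer too small"
            size *= 2  # value did not fit (or grew); retry with more room
    raise OSError(errno.E2BIG, "attribute value larger than max_size")
```

Because the loop never trusts a size it measured earlier, it doesn’t matter if the attribute changes between attempts.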
Linux extended attributes are namespaced, and the namespace is specified as part of the attribute name. Namespaces are separated from attribute names by a dot (.). Currently, Linux supports the common system and user namespaces, as well as security and trusted.
It’s important to note that listxattr will never return EPERM. If there are EAs that the current user cannot access, they just won’t be returned.
The attribute list returned by listxattr is NUL-delimited, and all of the attribute names it returns are fully-qualified.
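To make the list format concrete, here is a small sketch that splits a raw listxattr-style buffer into (namespace, name) pairs. The function name parse_name_list is an assumption for illustration. Python’s os.listxattr already hands back a list of names, so this only matters if you are calling the C function yourself.

```python
import os


def parse_name_list(buf):
    """Split a NUL-delimited listxattr-style buffer into (namespace, name) pairs."""
    pairs = []
    for raw in buf.split(b"\0"):
        if not raw:
            continue  # skip the empty piece after the final NUL
        # Only the first dot separates the namespace; the rest is the name.
        namespace, _, attr = os.fsdecode(raw).partition(".")
        pairs.append((namespace, attr))
    return pairs
```

For example, the buffer b"user.mime_type\0security.selinux\0" yields the pairs ("user", "mime_type") and ("security", "selinux").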
It seems funny that I’m going to talk about AIX’s interface right after Linux’s, but that’s largely because it’s... almost exactly the same.
Just like Linux, getea and listea return the size of the actual attribute value, which makes checking for truncation super easy. They also support getting called with a zero size, which will just return the size of the list or attribute value without writing any data to the buffer.
The only key difference is the character they use to separate the namespace from the attribute name, so querying for system attributes means querying a differently formatted name.
There’s also the
statea family of functions, which will fill in a
struct stat64x, but that’s of little consequence to us here.
FreeBSD / NetBSD
FreeBSD and NetBSD both use the same functions for the extended attribute calls. The most obvious difference is that the attribute namespace is no longer part of the attribute name. Each namespace is defined as a constant, and must be passed separately.
Seemingly as a result of this difference, the extattr_list functions can now fail with EPERM, rather than hiding the attribute names that the caller doesn’t have access to.
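One way to paper over that difference is sketched below. It assumes a hypothetical wrapper, list_namespace(path, namespace_id), around extattr_list_file that returns bare attribute names for one namespace. The numeric constants are the values from FreeBSD’s sys/extattr.h. Swallowing EPERM for the system namespace only is a design choice meant to mimic the Linux behaviour, not something the OS does for you.

```python
import errno

# Values from FreeBSD's <sys/extattr.h>.
EXTATTR_NAMESPACE_USER = 1
EXTATTR_NAMESPACE_SYSTEM = 2

NAMESPACES = {"user": EXTATTR_NAMESPACE_USER, "system": EXTATTR_NAMESPACE_SYSTEM}


def list_all(list_namespace, path):
    """Return fully-qualified names, skipping a system namespace we may not read.

    list_namespace(path, namespace_id) is a hypothetical wrapper around
    extattr_list_file that returns a list of bare attribute names.
    """
    names = []
    for prefix, namespace_id in NAMESPACES.items():
        try:
            found = list_namespace(path, namespace_id)
        except OSError as exc:
            hidden = (exc.errno == errno.EPERM
                      and namespace_id == EXTATTR_NAMESPACE_SYSTEM)
            if hidden:
                continue  # behave like Linux: hide what we cannot access
            raise
        names.extend(f"{prefix}.{name}" for name in found)
    return names
```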
The other, more annoying difference is the return value of the extattr_get and extattr_list functions. Rather than behaving like Linux, AIX or OS X, they instead return the number of bytes written, making truncation detection harder. This basically requires that you make two calls if you want to ensure that no truncation will occur.
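A sketch of one way to make the two-call approach robust: ask for the size, then read with one spare byte of room. If the read fills the spare byte, the value grew in between and we try again. The two callables are hypothetical stand-ins for extattr_get_file called with a NULL buffer (which returns the size) and with a real buffer (which returns at most that many bytes). The name read_exact is made up.

```python
def read_exact(query_size, read_up_to, max_tries=5):
    """Read a value whose API silently truncates to the buffer size.

    query_size() returns the current size of the value.
    read_up_to(nbytes) returns at most nbytes bytes of the value.
    Both are hypothetical wrappers around an extattr_get_file-style call.
    """
    for _ in range(max_tries):
        size = query_size()
        data = read_up_to(size + 1)  # one spare byte to notice growth
        if len(data) <= size:
            return data  # the spare byte stayed empty: nothing was cut off
    raise RuntimeError("attribute kept changing size while being read")
```

This costs an extra byte per read, but it turns a silent truncation into something the caller can detect.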
OS X differs in a few ways. Notably, its functions all take an options argument. Rather than calling an entirely different function to avoid following symlinks, you can pass the XATTR_NOFOLLOW option.
Another, fairly curious difference is the
position argument that’s part of
the prototype for
getxattr. To really get a handle on this, we’re going to
dive into the wonderful world of forks.
Forks are kind of like having multiple datastreams for the same file. The data that we typically think of being stored in a file is dumped into one fork (in the case of Mac OS, the data fork) and metadata, resources or any other type of data could exist in other forks, wholly independent.
On Mac OS filesystems (MFS, HFS, HFS+), each file could have at least a resource fork for the purpose of storing resources about a given file. This was used for things like splitting up icons that Finder would use to represent a file, or for separating presentation and content of text documents.
HFS+ (maybe HFS too? I’m not sure) allowed for any number of named forks.
Back to OS X
Extended attributes on OS X are actually just named forks. The extended attribute API wholly supplanted the old resource manager API. To ensure that applications could seek to arbitrary points in a fork, OS X’s extended attribute API includes a position argument.
getxattr is similar to Linux, in that it returns the size of the attribute’s data, not just the number of bytes read. This makes truncation detection easy.
It is worth noting that extended attribute names in OS X are not namespaced in any special way.
Solaris gets weird. Solaris is probably closest to OS X in its implementation of extended attributes, in that extended attributes are just named forks. However, Solaris includes only one specialized function call to deal with extended attributes.
But even this isn’t required, since you can get the same results from using a combination of open and openat.
From there, all of the *at functions can be used to operate on extended attributes with some restrictions:
- no links between attribute space and non-attribute space
- no renames between attribute space and non-attribute space
- only regular files are allowed - no dirs, symlinks or devices
Otherwise, extended attributes are treated like regular files.
This is awful when trying to expose a generic, cross-platform API for extended attributes; the only one that I’ve found is written for Perl. I had to add support for FreeBSD in Go, Python and Rust, and none of these deal with Solaris or AIX! Adding FreeBSD support was pretty rough, largely because implementors assume that every OS has a Linux-compatible API.
No OS has a Linux-compatible extended attribute API
- Are attributes namespaced? Are namespaces strings? Are they constants?
- Are they named forks? What happens if I need to seek?
- How big can the data be? How do we check for truncation?
- Error conditions differ radically
Honestly, I wonder if this contributes to the lack of cross-platform apps that use extended attributes. They’re super useful in any case where it’s necessary to track metadata about files without having to keep track of it in a separate database. That’s honestly fraught with peril anyway, since you’re dependent on the name of the file (or whatever identifier you use in your db) staying constant across renames, deletes, etc.
Where to go from here?
A C wrapper lib around all of these implementations would be nice, but there are some obvious trade-offs that need to be made.
The way that I’ve done this in Python and Rust has been to:
- Assume linux-like namespaces, and translate accordingly. If there aren’t namespaces in your OS’s implementation, then just make the namespace part of the attribute name.
- Make two calls to get the size of the extended attribute. This works across AIX, Linux and OS X. Solaris will have to use statat to get the size. Unfortunately, race conditions abound.
- When listing extended attributes, ignore EPERM for system-level attributes.
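The first rule in that list might look something like the sketch below, which translates a Linux-style name into the (namespace constant, bare name) pair that the FreeBSD calls expect. The function name to_freebsd is hypothetical, and the constants are the values from FreeBSD’s sys/extattr.h. On an OS without namespaces, the fully-qualified name would simply be passed through unchanged.

```python
# Values from FreeBSD's <sys/extattr.h>.
EXTATTR_NAMESPACE_USER = 1
EXTATTR_NAMESPACE_SYSTEM = 2

_FREEBSD_NAMESPACES = {
    "user": EXTATTR_NAMESPACE_USER,
    "system": EXTATTR_NAMESPACE_SYSTEM,
}


def to_freebsd(name):
    """Translate 'user.comment' into (EXTATTR_NAMESPACE_USER, 'comment')."""
    # Only the first dot is the separator; attribute names may contain dots.
    prefix, sep, attr = name.partition(".")
    if not sep or not attr or prefix not in _FREEBSD_NAMESPACES:
        raise ValueError(f"unsupported attribute name: {name!r}")
    return _FREEBSD_NAMESPACES[prefix], attr
```

Rejecting unknown prefixes loudly, rather than guessing a namespace, keeps Linux-only names such as trusted.* from silently landing in the wrong place.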
Maybe when I get some time, I’ll start working on one.
Finally: please, please stop assuming that the whole world is Linux.