Linux/VM: Safely Updating Critical Files on a Linux System

Last updated on:
Sunday, July 06, 2008

This information was originally posted to the Linux-390 mailing list on April 4, 2002, by Malcolm Beattie.

Linux (like any UNIX) presents a file system namespace with reference counting and atomic operations, so you can do this stuff without any race conditions or windows at all. Behold; I have nothing up my sleeves:

ln foo foo.bak
Creates a new link to the same file that foo refers to: both foo and foo.bak now name exactly the same underlying file.
cp foo newfoo
Creates a copy, newfoo, which you can then edit and modify however you like in order to prepare your new version.
mv newfoo foo
Atomically replaces the directory entry foo: before the command (specifically: before the system call "rename" that mv does for you), opening "foo" refers to the old file; after it, opening "foo" refers to the new file. At no time is there a window where no file named "foo" exists, and at no time is there a window where both exist or get mixed up in any way. (This relies on newfoo and foo being on the same filesystem; across filesystems, mv has to fall back to copying, which is not atomic.)
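The three steps can be tried safely in a scratch directory. What follows is only a sketch: the directory from mktemp and the "version 1"/"version 2" contents are made up for illustration, and the echo into newfoo stands in for whatever editing you would really do.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing real is touched.
dir=$(mktemp -d) && cd "$dir" || exit 1
echo "version 1" > foo          # illustrative starting contents

ln foo foo.bak                  # second name for the same underlying file
cp foo newfoo                   # independent copy to work on
echo "version 2" > newfoo       # stand-in for your real edits
mv newfoo foo                   # rename(2): atomically repoint "foo"

current=$(cat foo)
backup=$(cat foo.bak)
echo "foo:     $current"
echo "foo.bak: $backup"
```

After the mv, the name newfoo is gone, foo holds the new contents and foo.bak still holds the old ones.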

Processes which already have the old foo open continue happily onwards with the old underlying file: they have reference counts on the file in the same way that the directory entries do and they can modify the underlying file however they like.

That initial "ln foo foo.bak" you did also means you can access the old file under the name "foo.bak". Note that creating that foo.bak link is not a necessary part of letting those existing processes continue to access the file. If you don't create that extra foo.bak link, those processes still have a reference count on the underlying file and can access/modify/map it. The only difference is that there is no longer any name in the filesystem by which that file can be newly opened.
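To see this without a foo.bak link, a shell can play the part of the "existing process" by holding the old file open on a file descriptor. This is a rough sketch under that assumption; the file contents and the choice of descriptor 3 are arbitrary.

```shell
#!/bin/sh
dir=$(mktemp -d) && cd "$dir" || exit 1
echo "old contents" > foo
echo "new contents" > newfoo

exec 3< foo           # this shell now holds the old file open on descriptor 3
mv newfoo foo         # no foo.bak was made: the old file now has no name at all

read -r via_fd <&3    # still reads from the old, now nameless, file
via_name=$(cat foo)   # a fresh open by name gets the new file
exec 3<&-             # dropping the last reference lets the old file be freed

echo "descriptor 3: $via_fd"
echo "name foo:     $via_name"
```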

However, creating that hard link is useful so that you still have a name for that old file (in case your new one turns out not to have been such a good idea after all). It also saves making a complete second copy, which would have meant having three copies of the data at one point (old one, backup of old one, new one); that might be inconvenient with big files.
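One possible way to back out, sketched here as an assumption rather than as part of the recipe above, is simply another rename: "mv foo.bak foo" puts the old file back under its original name (and uses up the backup name in the process). The link count reported by ls -l also shows that the backup is a second name, not a second copy of the data.

```shell
#!/bin/sh
dir=$(mktemp -d) && cd "$dir" || exit 1
echo "good version" > foo                # illustrative contents

ln foo foo.bak
links=$(ls -l foo | awk '{print $2}')    # link count: two names, one copy of the data

cp foo newfoo
echo "bad version" > newfoo              # stand-in for an edit that went wrong
mv newfoo foo                            # install the new version

mv foo.bak foo                           # roll back: another atomic rename
restored=$(cat foo)
echo "links=$links restored=$restored"
```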

Note that, after the rename, foo.bak is a link to a completely different file from foo. To be more concrete, and to separate out the idea of "filename" from "underlying file/inode," let's call the original file "inode 1234." (Even if you're using some modern filesystem that doesn't think of itself as really using inodes, it's got to pretend it does from the point of view of these operations.) You start with a directory entry "foo -> 1234," do a "ln foo foo.bak" which adds an entry "foo.bak -> 1234," do a "cp foo newfoo" which creates a new file with an entry "newfoo -> 5678," and then do "mv newfoo foo" which atomically replaces the "foo -> 1234" entry with "foo -> 5678" and removes the "newfoo -> 5678" entry. If you follow all the links, it all works out nicely.
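The same bookkeeping can be watched with real inode numbers using "ls -i". In this sketch the inode helper function is a made-up convenience, and the actual numbers will be whatever your filesystem hands out rather than 1234 and 5678.

```shell
#!/bin/sh
dir=$(mktemp -d) && cd "$dir" || exit 1

# Hypothetical helper: print the inode number a name currently refers to.
inode() { ls -i "$1" | awk '{print $1}'; }

echo "old" > foo
old=$(inode foo)        # plays the role of "1234"
ln foo foo.bak          # adds foo.bak -> old
cp foo newfoo
new=$(inode newfoo)     # plays the role of "5678"
mv newfoo foo           # foo -> new, and the newfoo entry goes away

foo_now=$(inode foo)
bak_now=$(inode foo.bak)
echo "foo     -> $foo_now"
echo "foo.bak -> $bak_now"
```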

The guaranteed semantics of rename(2) are documented in SUSv2 at http://www.opengroup.org/onlinepubs/007908799/xsh/rename.html

One thing to note is that the Linux man page I have documents that there is allowed to be a window in which "newfoo" and "foo" both refer to the same file (the file being renamed, i.e. the one with inode 5678 in the above example). I'm not quite sure how one could observe such an event (does it imply getdents(2) is allowed to return both the "newfoo -> 5678" entry and the "foo -> 5678" entry in the returned buffer of a single call?) since there is no way of issuing both opens "at the same time" in any observable way.

Nevertheless, this "both newfoo and foo may exist at the same time" part of the semantics does not affect the issue that you were asking about: nothing done with this renamed entry can affect the contents of inode 1234. Note, of course, that any process which had inode 1234 open (the old "foo") before the rename can continue to access/modify it if it wishes. If you access "foo.bak" (which refers to the same inode) then you are accessing the very same file, and your accesses or modifications will be seen by those processes that opened it under its other name (the old "foo") and vice versa.
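That sharing can be checked with a small experiment. As before this is only a sketch: descriptor 3 in the shell stands in for a process that opened the old "foo" before the rename, and the "line 1"/"line 2" contents are invented.

```shell
#!/bin/sh
dir=$(mktemp -d) && cd "$dir" || exit 1
echo "line 1" > foo
ln foo foo.bak
cp foo newfoo

exec 3< foo                 # stand-in for a process holding the old "foo" open
mv newfoo foo               # "foo" now names the new file

echo "line 2" >> foo.bak    # modify the old file through its other name
seen=$(cat <&3)             # the old descriptor sees the appended line
exec 3<&-
now=$(cat foo)              # the new "foo" is untouched by that append

echo "old file via descriptor 3:"
echo "$seen"
echo "new foo: $now"
```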
