cache-size implementation downsides
John Keeping
john at keeping.me.uk
Wed Jun 20 01:02:59 CEST 2018
On Sat, Jun 16, 2018 at 04:46:21PM +0100, John Keeping wrote:
> On Wed, Jun 13, 2018 at 03:02:42PM -0400, Konstantin Ryabitsev wrote:
> > 2. I have witnessed cache corruption due to collisions (which is
> > a bug in itself). One of our frontends was hit by a lot of aggressive
> > crawling of snapshots that raised the load to 60+ (many, many gzip
> > processes). After we blackholed the bot, some of the cache objects for
> > non-snapshot URLs had trailing gzip junk in them, meaning that either
> > two instances were writing to the same file, or something else resulted
> > in cache corruption. This is probably a race condition somewhere in the
> > locking code.
>
> I've had a look at this, and I think we might end up dropping our lock
> too early thanks to this code (in fill_slot()):
>
> /* Restore stdout */
> if (dup2(tmp, STDOUT_FILENO) == -1) {
>
> Before this line, STDOUT_FILENO refers to lock_fd which has a POSIX
> advisory record lock on the entire file. However, the documentation for
> that says:
>
> * If a process closes any file descriptor referring to a file, then all
> of the process's locks on that file are released, regardless of the
> file descriptor(s) on which the locks were obtained. This is
> bad: it means that a process can lose its locks on a file such as
> /etc/passwd or /etc/mtab when for some reason a library function
> decides to open, read, and close the same file.
>
> I haven't verified this, but I suspect that dup'ing the original stdout
> over STDOUT_FILENO is equivalent to closing a file descriptor referring
> to our lock file. And thus the lock is released at this point, which is
> before we rename the lock file over the cache file.
>
> If that is correct, then there is a window during which a new process
> can open the lock file to write new content and successfully acquire the
> lock on that file even though it is still being used by another process.
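The suspicion above can also be checked without trace-cmd. What follows is a hedged sketch, not cgit code. The function names `held_by_other_process()` and `lock_after_redirect()`, the temporary path, and the use of a `/dev/null` descriptor as a stand-in for `STDOUT_FILENO` are all hypothetical. The sketch takes a whole-file `F_WRLCK` on a file and replays the preserve/redirect/restore sequence from the old `fill_slot()`. It then asks a forked child whether the lock is still visible, because `F_GETLK` never reports the calling process's own locks.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* F_GETLK never reports the caller's own locks, so a forked child does the
 * query (fcntl locks are not inherited across fork()). Returns 1 if the child
 * sees a write lock on path, 0 if the file is unlocked, -1 on error. */
static int held_by_other_process(const char *path)
{
	pid_t pid = fork();
	if (pid == -1)
		return -1;
	if (pid == 0) {
		struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
		int fd = open(path, O_RDWR);
		if (fd == -1 || fcntl(fd, F_GETLK, &fl) == -1)
			_exit(2);
		_exit(fl.l_type == F_UNLCK ? 0 : 1);
	}
	int status;
	if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status) ||
	    WEXITSTATUS(status) == 2)
		return -1;
	return WEXITSTATUS(status);
}

/* Hypothetical replay of the old fill_slot() descriptor juggling. out_fd
 * stands in for STDOUT_FILENO so the real stdout is left alone. Returns
 * whether the lock taken on lock_fd is still held afterwards (1/0), or -1. */
static int lock_after_redirect(const char *path, int restore)
{
	struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
	int lock_fd = open(path, O_RDWR | O_CREAT, 0600);
	int out_fd = open("/dev/null", O_WRONLY);
	int saved = -1, result = -1;

	if (lock_fd == -1 || out_fd == -1 || fcntl(lock_fd, F_SETLK, &fl) == -1)
		goto out;
	/* "Preserve stdout", then "Redirect stdout to lockfile". */
	saved = dup(out_fd);
	if (saved == -1 || dup2(lock_fd, out_fd) == -1)
		goto out;
	/* "Restore stdout": dup2() first closes out_fd, which refers to the
	 * lock file, and POSIX says that drops all of our locks on it. */
	if (restore && dup2(saved, out_fd) == -1)
		goto out;
	/* lock_fd is still open here; only the restore can have dropped the lock. */
	result = held_by_other_process(path);
out:
	if (saved != -1)
		close(saved);
	if (out_fd != -1)
		close(out_fd);
	if (lock_fd != -1)
		close(lock_fd);
	unlink(path);
	return result;
}
```

Without the restore step the child should see the lock (result 1). With it, the child should see the file as unlocked (result 0) even though `lock_fd` is still open. This corresponds to the early `F_UNLCK` in the "Before" trace.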
I confirmed this behaviour with trace-cmd.
Before the patch, the lock is released (F_UNLCK with FL_CLOSE, from locks_remove_posix) before the rename:
cgit-7291 : posix_lock_inode: fl=0x0xffff88020faa6258 dev=0x8:0x14 ino=0x4e3e13 fl_next=0x(nil) fl_owner=0x0xffff88007a8f2680 fl_pid=7291 fl_flags=FL_POSIX fl_type=F_WRLCK fl_start=0 fl_end=9223372036854775807 ret=0
cgit-7291 : fcntl_setlk: fl=0x0xffff88020faa6258 dev=0x8:0x14 ino=0x4e3e13 fl_next=0x(nil) fl_owner=0x0xffff88007a8f2680 fl_pid=7291 fl_flags=FL_POSIX fl_type=F_WRLCK fl_start=0 fl_end=9223372036854775807 ret=0
cgit-7291 : locks_get_lock_context: dev=0x8:0x14 ino=0x4e3e13 type=F_UNLCK ctx=0xffff8801beeb7930
cgit-7291 : posix_lock_inode: fl=0x0xffffc90003627da8 dev=0x8:0x14 ino=0x4e3e13 fl_next=0x(nil) fl_owner=0x0xffff88007a8f2680 fl_pid=7291 fl_flags=FL_POSIX|FL_CLOSE fl_type=F_UNLCK fl_start=0 fl_end=9223372036854775807 ret=0
cgit-7291 : locks_remove_posix: fl=0x0xffffc90003627da8 dev=0x8:0x14 ino=0x4e3e13 fl_next=0x(nil) fl_owner=0x0xffff88007a8f2680 fl_pid=7291 fl_flags=FL_POSIX|FL_CLOSE fl_type=F_UNLCK fl_start=0 fl_end=9223372036854775807 ret=0
cgit-7291 : sys_enter_rename: oldname: 0x559bd0bd5830, newname: 0x559bd0bd57d0
cgit-7291 : sys_exit_rename: 0x0
After the patch, the rename happens first and the lock is released afterwards:
cgit-7488 : posix_lock_inode: fl=0x0xffff8802122c43e8 dev=0x8:0x14 ino=0x4e3e13 fl_next=0x(nil) fl_owner=0x0xffff8800310958c0 fl_pid=7488 fl_flags=FL_POSIX fl_type=F_WRLCK fl_start=0 fl_end=9223372036854775807 ret=0
cgit-7488 : fcntl_setlk: fl=0x0xffff8802122c43e8 dev=0x8:0x14 ino=0x4e3e13 fl_next=0x(nil) fl_owner=0x0xffff8800310958c0 fl_pid=7488 fl_flags=FL_POSIX fl_type=F_WRLCK fl_start=0 fl_end=9223372036854775807 ret=0
cgit-7488 : sys_enter_rename: oldname: 0x56512cd7f830, newname: 0x56512cd7f7d0
cgit-7488 : sys_exit_rename: 0x0
cgit-7488 : locks_get_lock_context: dev=0x8:0x14 ino=0x4e3e13 type=F_UNLCK ctx=0xffff8802144daa10
cgit-7488 : posix_lock_inode: fl=0x0xffffc900038d7da8 dev=0x8:0x14 ino=0x4e3e13 fl_next=0x0xffff880006f5b780 fl_owner=0x0xffff8800310958c0 fl_pid=7488 fl_flags=FL_POSIX|FL_CLOSE fl_type=F_UNLCK fl_start=0 fl_end=9223372036854775807 ret=0
cgit-7488 : locks_remove_posix: fl=0x0xffffc900038d7da8 dev=0x8:0x14 ino=0x4e3e13 fl_next=0x0xffff880006f5b780 fl_owner=0x0xffff8800310958c0 fl_pid=7488 fl_flags=FL_POSIX|FL_CLOSE fl_type=F_UNLCK fl_start=0 fl_end=9223372036854775807 ret=0
I'm planning to queue the patch below on jk/for-jason and send a PR in
the next day or two, but it would be nice to get a reviewed-by before I
do that.
> -- >8 --
> Subject: [PATCH] cache: close race window when unlocking slots
>
> We use POSIX advisory record locks to control access to cache slots, but
> these have an unhelpful behaviour in that they are released when any
> file descriptor referencing the file is closed by this process.
>
> Mostly this is okay, since we know we won't be opening the lock file
> anywhere else, but there is one place that it does matter: when we
> restore stdout we dup2() over a file descriptor referring to the file,
> thus closing that descriptor.
>
> Since we restore stdout before unlocking the slot, this creates a window
> during which the slot content can be overwritten. The fix is reasonably
> straightforward: simply restore stdout after unlocking the slot, but the
> diff is a bit bigger because this requires us to move the temporary
> stdout FD into struct cache_slot.
>
> Signed-off-by: John Keeping <john at keeping.me.uk>
> ---
> cache.c | 37 ++++++++++++++-----------------------
> 1 file changed, 14 insertions(+), 23 deletions(-)
>
> diff --git a/cache.c b/cache.c
> index 0901e6e..2c70be7 100644
> --- a/cache.c
> +++ b/cache.c
> @@ -29,6 +29,7 @@ struct cache_slot {
> cache_fill_fn fn;
> int cache_fd;
> int lock_fd;
> + int stdout_fd;
> const char *cache_name;
> const char *lock_name;
> int match;
> @@ -197,6 +198,13 @@ static int unlock_slot(struct cache_slot *slot, int replace_old_slot)
> else
> err = unlink(slot->lock_name);
>
> + /* Restore stdout and close the temporary FD. */
> + if (slot->stdout_fd >= 0) {
> + dup2(slot->stdout_fd, STDOUT_FILENO);
> + close(slot->stdout_fd);
> + slot->stdout_fd = -1;
> + }
> +
> if (err)
> return errno;
>
> @@ -208,42 +216,24 @@ static int unlock_slot(struct cache_slot *slot, int replace_old_slot)
> */
> static int fill_slot(struct cache_slot *slot)
> {
> - int tmp;
> -
> /* Preserve stdout */
> - tmp = dup(STDOUT_FILENO);
> - if (tmp == -1)
> + slot->stdout_fd = dup(STDOUT_FILENO);
> + if (slot->stdout_fd == -1)
> return errno;
>
> /* Redirect stdout to lockfile */
> - if (dup2(slot->lock_fd, STDOUT_FILENO) == -1) {
> - close(tmp);
> + if (dup2(slot->lock_fd, STDOUT_FILENO) == -1)
> return errno;
> - }
>
> /* Generate cache content */
> slot->fn();
>
> /* Make sure any buffered data is flushed to the file */
> - if (fflush(stdout)) {
> - close(tmp);
> + if (fflush(stdout))
> return errno;
> - }
>
> /* update stat info */
> - if (fstat(slot->lock_fd, &slot->cache_st)) {
> - close(tmp);
> - return errno;
> - }
> -
> - /* Restore stdout */
> - if (dup2(tmp, STDOUT_FILENO) == -1) {
> - close(tmp);
> - return errno;
> - }
> -
> - /* Close the temporary filedescriptor */
> - if (close(tmp))
> + if (fstat(slot->lock_fd, &slot->cache_st))
> return errno;
>
> return 0;
> @@ -393,6 +383,7 @@ int cache_process(int size, const char *path, const char *key, int ttl,
> strbuf_addstr(&lockname, ".lock");
> slot.fn = fn;
> slot.ttl = ttl;
> + slot.stdout_fd = -1;
> slot.cache_name = filename.buf;
> slot.lock_name = lockname.buf;
> slot.key = key;
> --
> 2.17.1
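The reordering in the patch can be modelled in isolation. This is a simplified sketch under stated assumptions, not cgit's implementation. `struct demo_slot`, the `demo_*` functions and `held_by_other_process()` are hypothetical. A scratch descriptor (`out_fd`) stands in for `STDOUT_FILENO`, and a plain `write()` replaces the content callback and `fflush()`. As in the patch, the saved descriptor lives in the slot, the fill step no longer restores it, and the unlock step renames (or unlinks) the lock file before restoring. As a result, the lock is only dropped once the lock file is no longer reachable under its lock name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical, trimmed-down stand-in for cgit's struct cache_slot. */
struct demo_slot {
	int out_fd;     /* stands in for STDOUT_FILENO */
	int lock_fd;    /* lock file, carrying a whole-file write lock */
	int stdout_fd;  /* saved original target of out_fd, or -1 */
	const char *lock_name;
	const char *cache_name;
};

/* Returns 1 if a forked child sees a write lock on path, 0 if not, -1 on error. */
static int held_by_other_process(const char *path)
{
	pid_t pid = fork();
	if (pid == -1)
		return -1;
	if (pid == 0) {
		struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
		int fd = open(path, O_RDWR);
		if (fd == -1 || fcntl(fd, F_GETLK, &fl) == -1)
			_exit(2);
		_exit(fl.l_type == F_UNLCK ? 0 : 1);
	}
	int status;
	if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status) ||
	    WEXITSTATUS(status) == 2)
		return -1;
	return WEXITSTATUS(status);
}

static int demo_lock_slot(struct demo_slot *slot)
{
	struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
	int saved_errno;

	slot->lock_fd = open(slot->lock_name, O_RDWR | O_CREAT, 0600);
	if (slot->lock_fd == -1)
		return errno;
	if (fcntl(slot->lock_fd, F_SETLK, &fl) == -1) {
		saved_errno = errno;
		close(slot->lock_fd);
		slot->lock_fd = -1;
		return saved_errno;
	}
	return 0;
}

static int demo_fill_slot(struct demo_slot *slot, const char *content)
{
	/* Preserve out_fd in the slot; on failure stdout_fd is left set so that
	 * demo_unlock_slot() can still restore out_fd. */
	slot->stdout_fd = dup(slot->out_fd);
	if (slot->stdout_fd == -1)
		return errno;
	if (dup2(slot->lock_fd, slot->out_fd) == -1)
		return errno;
	if (write(slot->out_fd, content, strlen(content)) == -1)
		return errno;
	/* No restore here: out_fd keeps referring to the lock file. */
	return 0;
}

static int demo_unlock_slot(struct demo_slot *slot, int replace_old_slot)
{
	int err, saved_errno;

	/* Publish (or discard) the content while the lock is still held. */
	if (replace_old_slot)
		err = rename(slot->lock_name, slot->cache_name);
	else
		err = unlink(slot->lock_name);
	saved_errno = errno;

	/* Restoring out_fd closes a lock-file descriptor and so drops the lock,
	 * but the lock file has already been renamed or unlinked, so the next
	 * writer creates and locks a fresh one. */
	if (slot->stdout_fd >= 0) {
		dup2(slot->stdout_fd, slot->out_fd);
		close(slot->stdout_fd);
		slot->stdout_fd = -1;
	}
	return err ? saved_errno : 0;
}
```

One small design choice differs from the patch. The sketch captures `errno` from `rename()`/`unlink()` before restoring `out_fd`, because POSIX leaves `errno` unspecified after successful calls such as `dup2()` and `close()`.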