diff options
author | Kirk McKusick <mckusick@FreeBSD.org> | 2012-02-07 20:43:28 +0000 |
---|---|---|
committer | Kirk McKusick <mckusick@FreeBSD.org> | 2012-02-07 20:43:28 +0000 |
commit | 19c87af0fd797dd71e9ade676d20f4641e837905 (patch) | |
tree | bc91bb302c502bbccb492f8d484b961f67b0b921 /sys/ufs/ffs | |
parent | 8c91aec273f3f9458dbd283c10020a7360da847e (diff) | |
download | src-19c87af0fd797dd71e9ade676d20f4641e837905.tar.gz src-19c87af0fd797dd71e9ade676d20f4641e837905.zip |
In the original days of BSD, a sync was issued on every filesystem
every 30 seconds. This spike in I/O caused the system to pause every
30 seconds which was quite annoying. So, the way that sync worked
was changed so that when a vnode was first dirtied, it was put on
a 30-second cleaning queue (see the syncer_workitem_pending queues
in kern/vfs_subr.c). If the file has not been written or deleted
after 30 seconds, the syncer pushes it out. As the syncer runs once
per second, dirty files are trickled out slowly over the 30-second
period instead of all at once by a call to sync(2).
The one drawback to this is that it does not cover the filesystem
metadata. To handle the metadata, vfs_allocate_syncvnode() is called
to create a "filesystem syncer vnode" at mount time which cycles
around the cleaning queue being sync'ed every 30 seconds. In the
original design, the only things it would sync for UFS were the
filesystem metadata: inode blocks, cylinder group bitmaps, and the
superblock (e.g., by VOP_FSYNC'ing devvp, the device vnode from
which the filesystem is mounted).
Somewhere in its path to integration with FreeBSD the flushing of
the filesystem syncer vnode got changed to sync every vnode associated
with the filesystem. The result of this change is to return to the
old filesystem-wide flush every 30-seconds behavior and makes the
whole 30-second delay per vnode useless.
This change goes back to the originally intended trickle out sync
behavior. Key to ensuring that all the intended semantics are
preserved (e.g., that all inode updates get flushed within a bounded
period of time) is that all inode modifications get pushed to their
corresponding inode blocks so that the metadata flush by the
filesystem syncer vnode gets them to the disk in a timely way.
Thanks to Konstantin Belousov (kib@) for doing the audit and commit
-r231122 which ensures that all of these updates are being made.
Reviewed by: kib
Tested by: scottl
MFC after: 2 weeks
Notes
Notes:
svn path=/head/; revision=231160
Diffstat (limited to 'sys/ufs/ffs')
-rw-r--r-- | sys/ufs/ffs/ffs_vfsops.c | 20 |
1 files changed, 15 insertions, 5 deletions
diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c index a97d23a07ec0..a12fd83b0a64 100644 --- a/sys/ufs/ffs/ffs_vfsops.c +++ b/sys/ufs/ffs/ffs_vfsops.c @@ -1436,17 +1436,26 @@ ffs_sync(mp, waitfor) int softdep_accdeps; struct bufobj *bo; + wait = 0; + suspend = 0; + suspended = 0; td = curthread; fs = ump->um_fs; if (fs->fs_fmod != 0 && fs->fs_ronly != 0 && ump->um_fsckpid == 0) panic("%s: ffs_sync: modification on read-only filesystem", fs->fs_fsmnt); /* + * For a lazy sync, we just care about the filesystem metadata. + */ + if (waitfor == MNT_LAZY) { + secondary_accwrites = 0; + secondary_writes = 0; + lockreq = 0; + goto metasync; + } + /* * Write back each (modified) inode. */ - wait = 0; - suspend = 0; - suspended = 0; lockreq = LK_EXCLUSIVE | LK_NOWAIT; if (waitfor == MNT_SUSPEND) { suspend = 1; @@ -1517,11 +1526,12 @@ loop: #ifdef QUOTA qsync(mp); #endif + +metasync: devvp = ump->um_devvp; bo = &devvp->v_bufobj; BO_LOCK(bo); - if (waitfor != MNT_LAZY && - (bo->bo_numoutput > 0 || bo->bo_dirty.bv_cnt > 0)) { + if (bo->bo_numoutput > 0 || bo->bo_dirty.bv_cnt > 0) { BO_UNLOCK(bo); vn_lock(devvp, LK_EXCLUSIVE | LK_RETRY); if ((error = VOP_FSYNC(devvp, waitfor, td)) != 0) |