author | Josh Paetzel <jpaetzel@FreeBSD.org> | 2017-01-20 15:01:04 +0000 |
---|---|---|
committer | Josh Paetzel <jpaetzel@FreeBSD.org> | 2017-01-20 15:01:04 +0000 |
commit | f2be81e92cf6c3b09442837158cefca2e5a2a1f1 (patch) | |
tree | 9552a72fca2b7c483e03fbc1bcb1ba0f2d01c4e2 /sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c | |
parent | 039644eca9c00d03c823893ce01bf8af11e484ea (diff) | |
parent | 1e37d7e5558aefc7d3f5e7321fec393cf25f8dcb (diff) | |
MFV 312436
6569 large file delete can starve out write ops
illumos/illumos-gate@ff5177ee8bf9a355131ce2cc61ae2da6a5a6fdd6
https://github.com/illumos/illumos-gate/commit/ff5177ee8bf9a355131ce2cc61ae2da6a5a6fdd6
https://www.illumos.org/issues/6569
The core issue I've found is that there is no throttle for how many
deletes get assigned to one TXG. As a result, when deleting large files
we end up filling consecutive TXGs with deletes/frees, and the write
throttle then holds back other (more important) ops.
There is an easy test case for this problem. Try deleting several
large files (at least 1/2 TB) while you do write ops on the same
pool. What we've seen is that the performance of these write ops
(let's call it sideload I/O) drops to zero.
More specifically the problem is that dmu_free_long_range_impl()
can/will fill up all of the dirty data in the pool "instantly",
before many of the sideload ops can get in. So sideload
performance will be impacted until all the files are freed.
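A toy model of the starvation described above, assuming each TXG admits at most a fixed dirty-data budget (`DIRTY_MAX`, in arbitrary units) and that pending free chunks are admitted before sideload writes. `DIRTY_MAX` and `admit_unthrottled()` are hypothetical names for illustration, not ZFS code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-TXG dirty-data budget, in arbitrary units. */
#define DIRTY_MAX 100u

/*
 * Unthrottled model of one TXG: the pending free is admitted first and
 * may consume the whole budget; sideload writes only get what is left.
 * Updates *free_left and returns the write units admitted in this TXG.
 */
static uint64_t
admit_unthrottled(uint64_t *free_left, uint64_t write_want)
{
	uint64_t room = DIRTY_MAX;
	uint64_t f = *free_left < room ? *free_left : room;

	*free_left -= f;
	room -= f;
	return (write_want < room ? write_want : room);
}
```

With 350 units of frees pending and 20 units of writes wanted per TXG, the first three TXGs admit no writes at all; only the fourth, once the free is nearly done, lets the 20 units through.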
The solution we have tested at Nexenta (with positive results)
creates a relatively simple throttle for how many "free" ops we let
into one TXG.
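The same toy model with a throttle, assuming frees may use at most a fixed share (`FREES_PCT`, hypothetical) of each TXG's dirty budget. This only sketches the idea of capping frees per TXG; the throttle the patch actually adds lives outside the diffstat shown here and may work differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-TXG dirty-data budget, in arbitrary units. */
#define DIRTY_MAX 100u
/* Hypothetical tunable: percent of a TXG's budget that frees may use. */
#define FREES_PCT 30u

/*
 * Throttled model of one TXG: frees are capped at FREES_PCT percent of
 * the budget, so the rest stays available for sideload writes.
 * Updates *free_left and returns the write units admitted in this TXG.
 */
static uint64_t
admit_throttled(uint64_t *free_left, uint64_t write_want)
{
	uint64_t room = DIRTY_MAX;
	uint64_t free_cap = DIRTY_MAX * FREES_PCT / 100;
	uint64_t f = *free_left < free_cap ? *free_left : free_cap;

	*free_left -= f;
	room -= f;
	return (write_want < room ? write_want : room);
}
```

Writes now get their 20 units in every TXG, but the same 350-unit free is spread over 12 TXGs instead of 4, which is the slower delete the following paragraph is concerned with.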
However, this solution exposes other problems that should also be
addressed. If we slow down the freeing of data, one has to wait even
longer (assuming a vnode reference count of 1) to get the shell back
after an rm, or for an NFS thread to finish the freeing op.
To avoid this, the proposed solution is to call zfs_inactive()
asynchronously for "large" files. Async freeing in turn requires the
space being reclaimed to be accounted for in the zpool's "freeing" prop.
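A sketch of the asynchronous path, assuming a block-count threshold (`ASYNC_FREE_MIN_BLOCKS`, hypothetical value) decides whether a file is freed inline or queued for a background worker, and that queued space is added to a pool-wide `freeing` counter as soon as it is queued. The structure and names are illustrative only; the real zfs_inactive() changes are not part of this diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold: files larger than this are freed async. */
#define ASYNC_FREE_MIN_BLOCKS 20480u
#define QUEUE_LEN 16

struct pool {
	uint64_t freeing;          /* space queued for async free ("freeing") */
	uint64_t queue[QUEUE_LEN]; /* pending async frees, in blocks */
	unsigned head, tail;       /* ring-buffer indices */
};

/* Returns true if the free was deferred to the background worker. */
static bool
release_file(struct pool *p, uint64_t nblocks)
{
	/* Small file, or queue full: free inline as before. */
	if (nblocks <= ASYNC_FREE_MIN_BLOCKS ||
	    p->tail - p->head == QUEUE_LEN)
		return (false);
	p->queue[p->tail++ % QUEUE_LEN] = nblocks;
	p->freeing += nblocks;     /* accounted for before it is reclaimed */
	return (true);
}

/* One background worker step: free a queued file, update accounting. */
static void
drain_one(struct pool *p)
{
	assert(p->head != p->tail);
	uint64_t n = p->queue[p->head++ % QUEUE_LEN];
	p->freeing -= n;
}
```

The caller (the rm or the NFS thread) returns as soon as `release_file()` queues the work, while `freeing` tells the user how much space is still on its way back.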
The other issue with having a longer delete is the inability to
export/unmount for a longer period of time. The proposed solution
is to interrupt the freeing of blocks when a file system is unmounted.
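A sketch of an interruptible free loop, assuming the unmount path sets a per-file-system flag (`unmounting`, hypothetical) that the freeing loop checks between chunks, returning EINTR and leaving the remaining blocks untouched. The flag is atomic because a real unmount would set it from another thread; `struct fs` and `free_range_interruptible()` are illustrative names, not ZFS code.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct fs {
	atomic_bool unmounting;    /* set by the (hypothetical) unmount path */
};

/*
 * Frees *remaining blocks in chunks of at most `chunk`. Checks the
 * unmount flag before each chunk and stops early with EINTR, so an
 * export/unmount does not wait for the whole delete to finish.
 */
static int
free_range_interruptible(struct fs *fs, uint64_t *remaining, uint64_t chunk)
{
	while (*remaining > 0) {
		if (atomic_load(&fs->unmounting))
			return (EINTR);    /* leave the rest for later */
		uint64_t n = *remaining < chunk ? *remaining : chunk;
		*remaining -= n;   /* stands in for freeing one chunk */
	}
	return (0);
}
```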
Author: Alek Pinchuk <alek@nexenta.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Reviewed by: avg
Differential Revision: D9008
Notes:
svn path=/head/; revision=312535
Diffstat (limited to 'sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c')
-rw-r--r-- | sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c | 11 |
1 files changed, 11 insertions, 0 deletions
diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c
index 9b3b79bfb517..0ac2f70cf3b6 100644
--- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c
+++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c
@@ -24,6 +24,7 @@
  * Copyright (c) 2013 Steven Hartland. All rights reserved.
  * Copyright (c) 2014 Spectra Logic Corporation, All rights reserved.
  * Copyright (c) 2014 Integros [integros.com]
+ * Copyright 2016 Nexenta Systems, Inc. All rights reserved.
  */
 
 #include <sys/dsl_pool.h>
@@ -593,6 +594,16 @@ dsl_pool_sync(dsl_pool_t *dp, uint64_t txg)
 	dsl_pool_undirty_space(dp, dp->dp_dirty_pertxg[txg & TXG_MASK], txg);
 
 	/*
+	 * Update the long range free counter after
+	 * we're done syncing user data
+	 */
+	mutex_enter(&dp->dp_lock);
+	ASSERT(spa_sync_pass(dp->dp_spa) == 1 ||
+	    dp->dp_long_free_dirty_pertxg[txg & TXG_MASK] == 0);
+	dp->dp_long_free_dirty_pertxg[txg & TXG_MASK] = 0;
+	mutex_exit(&dp->dp_lock);
+
+	/*
 	 * After the data blocks have been written (ensured by the zio_wait()
 	 * above), update the user/group space accounting.
 	 */
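The hunk above clears the syncing TXG's slot of `dp_long_free_dirty_pertxg` once user data has been synced. Below is a userspace model of that bookkeeping, assuming the counter is charged in open context under `dp_lock` and summed across all in-flight TXGs by whatever throttle consumes it (that code is outside this diffstat). `struct pool`, `charge_long_free()`, `sum_long_free_dirty()` and `sync_reset_long_free()` are hypothetical stand-ins; `TXG_SIZE` and `TXG_MASK` mirror the ZFS definitions (4 slots, mask 3), and a pthread mutex stands in for `mutex_enter()`/`mutex_exit()`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define TXG_SIZE 4                 /* in-flight TXG slots, as in ZFS */
#define TXG_MASK (TXG_SIZE - 1)

/* Minimal stand-in for dsl_pool_t: only the fields this sketch needs. */
struct pool {
	pthread_mutex_t lock;                      /* models dp_lock */
	uint64_t long_free_dirty_pertxg[TXG_SIZE]; /* models dp_long_free_dirty_pertxg */
};

/* Open context: charge freed bytes to the slot of the TXG they joined. */
static void
charge_long_free(struct pool *p, uint64_t txg, uint64_t bytes)
{
	pthread_mutex_lock(&p->lock);
	p->long_free_dirty_pertxg[txg & TXG_MASK] += bytes;
	pthread_mutex_unlock(&p->lock);
}

/* Total across all in-flight TXGs; a throttle could compare this to a cap. */
static uint64_t
sum_long_free_dirty(struct pool *p)
{
	uint64_t sum = 0;

	pthread_mutex_lock(&p->lock);
	for (int i = 0; i < TXG_SIZE; i++)
		sum += p->long_free_dirty_pertxg[i];
	pthread_mutex_unlock(&p->lock);
	return (sum);
}

/* Syncing context, after user data is written: mirrors the hunk above. */
static void
sync_reset_long_free(struct pool *p, uint64_t txg, int sync_pass)
{
	pthread_mutex_lock(&p->lock);
	assert(sync_pass == 1 ||
	    p->long_free_dirty_pertxg[txg & TXG_MASK] == 0);
	p->long_free_dirty_pertxg[txg & TXG_MASK] = 0;
	pthread_mutex_unlock(&p->lock);
}
```

The `assert` plays the role of the patch's `ASSERT`: the slot may be non-zero only in sync pass 1. By any later pass it has already been cleared, and since open-context frees are charged only to the open TXG, nothing refills the slot of a TXG that is syncing.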