path: root/sys/ufs
Commit message | Author | Age | Files | Lines
* Allow stacked filesystems to be recursively unmounted
  Jason A. Harmening | 2021-07-24 | 1 file | -1/+1

  In certain emergency cases such as media failure or removal, UFS will initiate a forced unmount in order to prevent dirty buffers from accumulating against the no-longer-usable filesystem. The presence of a stacked filesystem such as nullfs or unionfs above the UFS mount will prevent this forced unmount from succeeding.

  This change addresses the situation by allowing stacked filesystems to be recursively unmounted on a taskqueue thread when the MNT_RECURSE flag is specified to dounmount(). This call will block until all upper mounts have been removed, unless the caller specifies the MNT_DEFERRED flag to indicate that the base filesystem should also be unmounted from the taskqueue.

  To achieve this, the recently-added vfs_pin_from_vp()/vfs_unpin() KPIs have been combined with the existing 'mnt_uppers' list used by nullfs and renamed to vfs_register_upper_from_vp()/vfs_unregister_upper(). The format of the mnt_uppers list has also been changed to accommodate filesystems such as unionfs, in which a given mount may be stacked atop more than one lower mount. Additionally, management of lower FS reclaim/unlink notifications has been split into a separate list managed by a separate set of KPIs, as registration of an upper FS no longer implies interest in these notifications.

  Reviewed by: kib, mckusick
  Tested by: pho
  Differential Revision: https://reviews.freebsd.org/D31016
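The ordering described above (upper mounts torn down before the base) can be modelled in user space. This is a hedged sketch: `struct hyp_stack_mount`, `hyp_register_upper()` and `hyp_unmount_recursive()` are hypothetical, the fixed-size array stands in for the kernel's mnt_uppers list, and taskqueues, locking, MNT_DEFERRED and unregistration are all left out.

```c
#include <stdbool.h>
#include <stddef.h>

#define HYP_MAX_UPPERS	4	/* hypothetical fixed capacity */

/* Hypothetical model of a mount and the mounts stacked above it. */
struct hyp_stack_mount {
	struct hyp_stack_mount	*uppers[HYP_MAX_UPPERS];
	size_t			 nuppers;
	bool			 mounted;
};

/* Record that `upper` is stacked on `lower`; false if the list is full. */
static bool
hyp_register_upper(struct hyp_stack_mount *lower, struct hyp_stack_mount *upper)
{
	if (lower->nuppers == HYP_MAX_UPPERS)
		return (false);
	lower->uppers[lower->nuppers++] = upper;
	return (true);
}

/*
 * Unmount every mount stacked above mp (depth first), then mp itself.
 * An upper stacked on several lowers (the unionfs case) is unmounted
 * only once because already-unmounted entries are skipped.
 */
static void
hyp_unmount_recursive(struct hyp_stack_mount *mp)
{
	for (size_t i = 0; i < mp->nuppers; i++) {
		if (mp->uppers[i]->mounted)
			hyp_unmount_recursive(mp->uppers[i]);
	}
	mp->mounted = false;
}
```

Recursing before clearing `mounted` mirrors the requirement that the base filesystem can only go away once nothing is stacked on it. A unionfs-like upper registered on a second, unrelated lower is unmounted, but that other lower stays mounted.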
* Use an ANSI C function declaration for journal_check_space.
  John Baldwin | 2021-07-23 | 1 file | -2/+1

  GCC6 fails to compile this due to a -Wstrict-prototypes error.

  Sponsored by: Chelsio Communications
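A minimal illustration of the declaration styles that -Wstrict-prototypes distinguishes. The function `hyp_space_is_low()` and its 90% threshold are hypothetical. This is not the real journal_check_space(), whose arguments and logic this log does not show.

```c
#include <stdbool.h>

/*
 * Old-style (K&R) definition, shown only as a comment.  The parameter
 * types are declared after the parameter list, so the definition does
 * not supply a prototype, and GCC's -Wstrict-prototypes warns about it
 * (an error when warnings are treated as errors):
 *
 *	static bool
 *	hyp_space_is_low(used, limit)
 *		int used, limit;
 *	{ ... }
 *
 * An empty parameter list such as "static bool hyp_space_is_low();"
 * draws the same warning; a function without parameters should be
 * declared with (void).
 */

/* ANSI C definition: the parameter types are part of the declarator. */
static bool
hyp_space_is_low(int used, int limit)
{
	/* Hypothetical policy: report "low" once 90% of limit is used. */
	return (used * 10 >= limit * 9);
}
```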
* ffs_softdep: force sync if journal is low in journal_check_space
  Konstantin Belousov | 2021-06-23 | 1 file | -0/+7

  This effectively causes syncing of the mount point from softdep_prealloc(), softdep_prerename(), and softdep_prelink(). Typically it avoids the need for journal suspension at this point altogether.

  Suggested and reviewed by: mckusick
  Discussed with: markj
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Differential revision: https://reviews.freebsd.org/D30041
* ffs_softdep.c: add journal_check_space() helper
  Konstantin Belousov | 2021-06-23 | 1 file | -15/+16

  Reviewed by: mckusick
  Discussed with: markj
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Differential revision: https://reviews.freebsd.org/D30041
* softdep_prelink(): only do sync if other thread changed the vnode metadata since previous prelink
  Konstantin Belousov | 2021-06-23 | 3 files | -12/+35

  We call into softdep_prerename() and softdep_prelink() when free space in the journal is low. These functions sync all vnodes participating in the VOP in the hope that this reduces journal utilization. But if the vnodes are already synced, syncing them again only spends writes: the journal is not being filled by records from modifications of our vnodes.

  Remember the original seqc numbers for the vnodes, and only initiate syncs when a seqc has changed.

  Reviewed by: mckusick
  Discussed with: markj
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Differential revision: https://reviews.freebsd.org/D30041
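The "remember the seqc, sync only if it moved" idea can be sketched in user space. This is a model under stated assumptions: `struct hyp_vnode` and `hyp_sync_if_changed()` are hypothetical, the counter is a plain integer rather than the kernel's per-vnode seqc, and the `syncs` counter stands in for a call such as ffs_syncvnode().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a vnode and its modification counter. */
struct hyp_vnode {
	uint32_t seqc;		/* bumped on every metadata write */
	int	 syncs;		/* how many times we "synced" it */
};

/*
 * Sync the vnode only if its seqc moved since *saved was recorded,
 * then remember the current value.  Returns true if a sync was done.
 */
static bool
hyp_sync_if_changed(struct hyp_vnode *vp, uint32_t *saved)
{
	if (vp->seqc == *saved)
		return (false);	/* our vnode added nothing new since last time */
	vp->syncs++;		/* stand-in for the real vnode sync */
	*saved = vp->seqc;
	return (true);
}
```

A second call with no intervening modification is a no-op, which is exactly the wasted write the commit avoids.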
* ufs_rename(): only do softdep_prerename() when other thread changed a vnode
  Konstantin Belousov | 2021-06-23 | 1 file | -1/+14

  Reviewed by: mckusick
  Discussed with: markj
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Differential revision: https://reviews.freebsd.org/D30041
* ffs: mark block (re-)allocations as seqc writes
  Konstantin Belousov | 2021-06-23 | 2 files | -50/+75

  Reviewed by: mckusick
  Discussed with: markj
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Differential revision: https://reviews.freebsd.org/D30041
* ufs_rename(): softdep_prerename() does something only for SU+J
  Konstantin Belousov | 2021-06-23 | 1 file | -1/+1

  so call it only in the SU+J case.

  Reviewed by: mckusick
  Discussed with: markj
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Differential revision: https://reviews.freebsd.org/D30041
* ffs: reduce number of dvp relocks in softdep_prelink()
  Konstantin Belousov | 2021-06-23 | 1 file | -8/+6

  If vp == NULL, we unlocked and then immediately relocked dvp there.

  Reviewed by: mckusick
  Discussed with: markj
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Differential revision: https://reviews.freebsd.org/D30041
* ufs_vnops.c: style
  Konstantin Belousov | 2021-06-23 | 1 file | -2/+4

  Wrap overly long function declarations.

  Reviewed by: mckusick
  Discussed with: markj
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Differential revision: https://reviews.freebsd.org/D30041
* ffs: Correct the input size check in sysctl_ffs_fsck()
  Mark Johnston | 2021-05-31 | 1 file | -2/+2

  Make sure we return an error if no input was specified, since SYSCTL_IN() will report success in that case.

  Reported by: KMSAN
  Reviewed by: mckusick
  MFC after: 1 week
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D30586
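Why the explicit size check matters can be shown with a self-contained model. The real SYSCTL_IN() and struct sysctl_req are kernel interfaces. `struct hyp_req`, `hyp_sysctl_in()` and `hyp_fsck_handler()` are hypothetical and mimic only the behaviour the commit describes: a copy-in with no input still reports success. EINVAL is used throughout; this is not necessarily the error code sysctl_ffs_fsck() returns.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a sysctl request carrying "new" (input) data. */
struct hyp_req {
	const void	*newptr;	/* NULL when the caller supplied no input */
	size_t		 newlen;
};

/* Model of SYSCTL_IN(): with no input at all it still reports success. */
static int
hyp_sysctl_in(struct hyp_req *req, void *buf, size_t len)
{
	if (req->newptr == NULL)
		return (0);
	if (req->newlen < len)
		return (EINVAL);
	memcpy(buf, req->newptr, len);
	return (0);
}

/* A handler that needs exactly one command value as input. */
static int
hyp_fsck_handler(struct hyp_req *req, long *cmd)
{
	/* Check the input size first; hyp_sysctl_in() alone would accept none. */
	if (req->newptr == NULL || req->newlen != sizeof(*cmd))
		return (EINVAL);
	return (hyp_sysctl_in(req, cmd, sizeof(*cmd)));
}
```

Without the first check in `hyp_fsck_handler()`, an empty request would "succeed" and the handler would go on to act on an uninitialized command.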
* VFS_QUOTACTL(9): allow implementation to indicate busy state changes
  Jason A. Harmening | 2021-05-30 | 3 files | -26/+16

  Instead of requiring all implementations of vfs_quotactl to unbusy the mount for Q_QUOTAON and Q_QUOTAOFF, add an "mp_busy" in/out param to VFS_QUOTACTL(9). The implementation may then indicate to the caller whether it needed to unbusy the mount.

  Also, add stdbool.h to libprocstat modules which #define _KERNEL before including sys/mount.h. Otherwise they'll pull in sys/types.h before defining _KERNEL and therefore won't have the bool definition they need for mp_busy.

  Reviewed By: kib, markj
  Differential Revision: https://reviews.freebsd.org/D30556
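The in/out `mp_busy` convention can be sketched as follows. All names are hypothetical. The integer counter stands in for vfs_busy()/vfs_unbusy(), and which commands unbusy (here only a Q_QUOTAON-like one) is a modelling choice, not the real VFS_QUOTACTL(9) signature.

```c
#include <stdbool.h>

/* Hypothetical model: a mount with a busy-reference count. */
struct hyp_busy_mount {
	int	busy;
};

enum { HYP_Q_QUOTAON, HYP_Q_GETQUOTA };	/* hypothetical commands */

/*
 * Implementation side.  On entry *mp_busy is true because the caller
 * holds a busy reference.  If the implementation drops that reference,
 * it says so by clearing *mp_busy.
 */
static int
hyp_quotactl(struct hyp_busy_mount *mp, int cmd, bool *mp_busy)
{
	if (cmd == HYP_Q_QUOTAON && *mp_busy) {
		mp->busy--;		/* unbusy on behalf of the caller */
		*mp_busy = false;
	}
	return (0);
}

/* Caller side: unbusy only if the implementation did not already. */
static int
hyp_quotactl_caller(struct hyp_busy_mount *mp, int cmd)
{
	bool mp_busy = true;
	int error;

	mp->busy++;			/* stand-in for vfs_busy() */
	error = hyp_quotactl(mp, cmd, &mp_busy);
	if (mp_busy)
		mp->busy--;		/* stand-in for vfs_unbusy() */
	return (error);
}
```

Either way the reference is released exactly once, whichever side ends up doing it.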
* Revert commits 6d3e78ad6c11 and 54256e7954d7
  Jason A. Harmening | 2021-05-30 | 3 files | -16/+26

  Parts of libprocstat like to pretend they're kernel components for the sake of including mount.h, and including sys/types.h in the _KERNEL case doesn't fix the build for some reason. Revert both the VFS_QUOTACTL() change and the follow-up "fix" for now.
* VFS_QUOTACTL(9): allow implementation to indicate busy state changes
  Jason A. Harmening | 2021-05-29 | 3 files | -26/+16

  Instead of requiring all implementations of vfs_quotactl to unbusy the mount for Q_QUOTAON and Q_QUOTAOFF, add an "mp_busy" in/out param to VFS_QUOTACTL(9). The implementation may then indicate to the caller whether it needed to unbusy the mount.

  Reviewed By: kib, markj
  Differential Revision: https://reviews.freebsd.org/D30218
* Move mnt_maxsymlinklen into appropriate fs mount data structures
  Konstantin Belousov | 2021-05-22 | 7 files | -18/+18

  Reviewed by: mckusick
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  X-MFC-Note: struct mount layout
  Differential revision: https://reviews.freebsd.org/D30325
* ufs: Avoid M_WAITOK allocations when building a dirhash
  Don Morris | 2021-05-20 | 1 file | -2/+2

  At this point the directory's vnode lock is held, so blocking while waiting for free pages makes the system more susceptible to deadlock in low memory conditions. This is particularly problematic on NUMA systems as UMA currently implements a strict first-touch policy.

  ufsdirhash_build() already uses M_NOWAIT for other allocations and already handled failures for the block array allocation, so just convert to M_NOWAIT.

  PR: 253992
  Reviewed by: markj, mckusick, vangyzen
  MFC after: 1 week
  Differential Revision: https://reviews.freebsd.org/D29045
* Fix handling of embedded symbolic links (and history lesson).
  Kirk McKusick | 2021-05-17 | 2 files | -6/+2

  The original filesystem release (4.2BSD) had no embedded symlinks. Historically symbolic links were just a different type of file, so the content of the symbolic link was contained in a single disk block fragment. We observed that most symbolic links were short enough that they could fit in the area of the inode that normally holds the block pointers. So we created embedded symlinks where the content of the link was held in the inode's pointer area, thus avoiding the need to seek and read a data fragment and reducing the pressure on the block cache. At the time we had only UFS1 with 32-bit block pointers, so the test for a fastlink was:

      di_size < (NDADDR + NIADDR) * sizeof(daddr_t)

  (where daddr_t would be ufs1_daddr_t today).

  When embedded symlinks were added, a spare field in the superblock with a known zero value became fs_maxsymlinklen. New filesystems set this field to (NDADDR + NIADDR) * sizeof(daddr_t). Embedded symlinks were assumed when di_size < fs->fs_maxsymlinklen. Thus filesystems that preceded this change always read from blocks (since fs->fs_maxsymlinklen == 0) and newer ones used embedded symlinks if they fit. Similarly, symlinks created on pre-embedded-symlink filesystems always spill into blocks, while newer ones will embed if they fit.

  At the same time that the embedded symbolic links were added, the on-disk directory structure was changed, splitting the former u_int16_t d_namlen into u_int8_t d_type and u_int8_t d_namlen. Thus fs_maxsymlinklen <= 0 (as used by the OFSFMT() macro) can be used to distinguish old directory formats. In retrospect that should have just been an added flag, but we did not realize we needed to know about that change until it was already in production.

  Code was split into ufs/ffs so that the log structured filesystem could use ufs functionality while doing its own disk layout. This meant that no ffs superblock fields could be used in the ufs code. Thus ffs superblock fields that were needed in ufs code had to be copied to fields in the mount structure. Since ufs_readlink needed to know if a link was embedded, fs_maxsymlinklen gets copied to mnt_maxsymlinklen.

  The kernel panic that led to this fix was triggered when a disk error created an inode of type symlink with no allocated data blocks but a large size. When readlink was called, the uiomove was attempted, which caused a segmentation fault.

      static int
      ufs_readlink(ap)
          struct vop_readlink_args /* {
              struct vnode *a_vp;
              struct uio *a_uio;
              struct ucred *a_cred;
          } */ *ap;
      {
          struct vnode *vp = ap->a_vp;
          struct inode *ip = VTOI(vp);
          doff_t isize;

          isize = ip->i_size;
          if ((isize < vp->v_mount->mnt_maxsymlinklen) ||
              DIP(ip, i_blocks) == 0) { /* XXX - for old fastlink support */
              return (uiomove(SHORTLINK(ip), isize, ap->a_uio));
          }
          return (VOP_READ(vp, ap->a_uio, 0, ap->a_cred));
      }

  The second part of the "if" statement, which adds

      DIP(ip, i_blocks) == 0) { /* XXX - for old fastlink support */

  is problematic. It never appeared in BSD as released by Berkeley because, as noted above, mnt_maxsymlinklen is 0 for old-format filesystems, so they will always fall through to the VOP_READ as they should. I had to dig back through `git blame' to find that Rodney Grimes added it as part of ``The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.'' He must have brought it across from an earlier FreeBSD. Unfortunately the source-control logs for FreeBSD up to the merger with the AT&T-blessed 4.4BSD-Lite conversion were destroyed as part of the agreement to let FreeBSD remain unencumbered, so I cannot pinpoint where that line got added on the FreeBSD side.

  The one change needed here is that mnt_maxsymlinklen is declared as an `int' and should be changed to be `u_int64_t'.

  This discovery led us to check out the code that deletes symbolic links. Specifically:

      if (vp->v_type == VLNK &&
          (ip->i_size < vp->v_mount->mnt_maxsymlinklen ||
           datablocks == 0)) {
          if (length != 0)
              panic("ffs_truncate: partial truncate of symlink");
          bzero(SHORTLINK(ip), (u_int)ip->i_size);
          ip->i_size = 0;
          DIP_SET(ip, i_size, 0);
          UFS_INODE_SET_FLAG(ip, IN_SIZEMOD | IN_CHANGE | IN_UPDATE);
          if (needextclean)
              goto extclean;
          return (ffs_update(vp, waitforupdate));
      }

  Here too our broken symlink inode with no data blocks allocated and a large size will cause a segmentation fault. We incorrectly use the fact that it has no data blocks to decide that it is an embedded symbolic link, and then attempt to bzero past the end of the inode. The test for datablocks == 0 is unnecessary, as the test for ip->i_size < vp->v_mount->mnt_maxsymlinklen will do the right thing in all cases.

  The test for datablocks == 0 was added by David Greenman in this commit:

      Author: David Greenman <dg@FreeBSD.org>
      Date:   Tue Aug 2 13:51:05 1994 +0000

          Completed (hopefully) the kernel support for old style "fastlinks".

      Notes:
          svn path=/head/; revision=1821

  I am guessing that he likely earlier added the incorrect test in the ufs_readlink code.

  I asked David if he had any recollection of why he made this change. Amazingly, he still had a recollection of why he had made a one-line change more than twenty years ago. And, unsurprisingly, it was because he had been stuck between a rock and a hard place.

  FreeBSD was up to 1.1.5 before the switch to the 4.4BSD-Lite code base. Prior to that, there were three years of development in all areas of the kernel, including the filesystem code, from the combined set of people including Bill Jolitz, Patchkit contributors, and FreeBSD Project members. The compatibility issue at hand was caused by the FASTLINKS patches from Curt Mayer. In merging in the 4.4BSD-Lite changes, David had to find a way to provide compatibility with both the changes that had been made in FreeBSD 1.1.5 and with 4.4BSD-Lite. He felt that these changes would provide compatibility with both systems.

  In his words:

  ``My recollection is that the 'FASTLINKS' symlinks support in FreeBSD-1.x, as implemented by Curt Mayer, worked differently than 4.4BSD. He used a spare field in the inode to duplicately store the length. When the 4.4BSD-Lite merge was done, the optimized symlinks support for existing filesystems (those that were initialized in FreeBSD-1.x) were broken due to the FFS on-disk structure of 4.4BSD-Lite differing from FreeBSD-1.x. My commit was needed to restore the backward compatibility with FreeBSD-1.x filesystems. I think it was the best that could be done in the somewhat urgent circumstances of the post Berkeley-USL settlement.

  Also, regarding Rod's massive commit with little explanation, some context: John Dyson and I did the initial re-port of the 4.4BSD-Lite kernel to the 386 platform in just 10 days. It was by far the most intense hacking effort of my life. In addition to the porting of tons of FreeBSD-1 code, I think we wrote more than 30,000 lines of new code in that time to deal with the missing pieces and architectural changes of 4.4BSD-Lite. We didn't make many notes along the way. There was a lot of pressure to get something out to the rest of the developer community as fast as possible, so detailed discrete commits didn't happen - it all came as a giant wad, which is why Rod's commit message was worded the way it was.''

  Reported by: Chuck Silvers
  Tested by: Chuck Silvers
  History by: David Greenman Lawrence
  MFC after: 1 week
  Sponsored by: Netflix
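The fastlink threshold and the two versions of the "is this an embedded link?" test can be worked through in a small user-space model. The constants are the FFS values of UFS_NDADDR (12) and UFS_NIADDR (3). The `hyp_` functions are hypothetical stand-ins, not the kernel code. With 32-bit UFS1 pointers the limit is 15 * 4 = 60 bytes, and with 64-bit UFS2 pointers the same expression evaluates to 120 bytes. Sizes are 64-bit here, in the spirit of the note above about widening mnt_maxsymlinklen.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HYP_NDADDR	12	/* direct block pointers in an FFS inode */
#define HYP_NIADDR	3	/* indirect block pointers in an FFS inode */

/* Bytes of block-pointer space available for an embedded symlink. */
static size_t
hyp_maxsymlinklen(size_t ptrsize)
{
	return ((HYP_NDADDR + HYP_NIADDR) * ptrsize);
}

/* Pre-fix test: also trusts "no data blocks" as a sign of a short link. */
static bool
hyp_is_short_link_old(uint64_t size, uint64_t maxlen, uint64_t blocks)
{
	return (size < maxlen || blocks == 0);
}

/* Post-fix test: the size alone decides; maxlen is 0 on old filesystems. */
static bool
hyp_is_short_link_new(uint64_t size, uint64_t maxlen)
{
	return (size < maxlen);
}
```

For the corrupted inode in the panic report (large size, no blocks), the old test claims an embedded link and would copy far past the inode's pointer area. The new test correctly sends it to the regular read path.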
* b_vflags update requires bufobj lock
  Konstantin Belousov | 2021-04-15 | 2 files | -1/+5

  The trunc_dependencies() issue was reported by Alexander Lochmann <alexander.lochmann@tu-dortmund.de>, who found the problem by performing lock analysis using LockDoc, see https://doi.org/10.1145/3302424.3303948.

  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
* Ensure that the mount command shows "with quotas" when quotas are enabled.
  Kirk McKusick | 2021-04-14 | 1 file | -0/+2

  When quotas are enabled with the quotaon(8) command, it sets the MNT_QUOTA flag in the mount structure mnt_flag field. The mount structure holds a cached copy of the filesystem statfs structure in mnt_stat that includes a copy of the mnt_flag field in mnt_stat.f_flags. The mnt_stat structure may not be updated for hours. Since the mount command requests mount details using the MNT_NOWAIT option, it gets the mount's mnt_stat statfs structure whose f_flags field does not yet show the MNT_QUOTA flag being set in mnt_flag.

  The fix is to have quotaon(8) set the MNT_QUOTA flag in both mnt_flag and in mnt_stat.f_flags so that it will be immediately visible to callers of statfs(2).

  Reported by: Christos Chatzaras
  Tested by: Christos Chatzaras
  PR: 254682
  MFC after: 3 days
  Sponsored by: Netflix
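The stale-cache problem and its fix can be modelled with two copies of the flags. This is a sketch only: the struct names and `HYP_MNT_QUOTA` are hypothetical stand-ins for struct mount, struct statfs and MNT_QUOTA. Locking (for example the mount interlock) is omitted.

```c
#include <stdint.h>

#define HYP_MNT_QUOTA	0x1u	/* hypothetical stand-in for MNT_QUOTA */

/* Hypothetical model of struct mount and its cached statfs copy. */
struct hyp_statfs {
	uint64_t	f_flags;
};
struct hyp_quota_mount {
	uint64_t		mnt_flag;	/* live flags */
	struct hyp_statfs	mnt_stat;	/* cached, refreshed lazily */
};

/* Before the fix: only the live flags change. */
static void
hyp_quotaon_old(struct hyp_quota_mount *mp)
{
	mp->mnt_flag |= HYP_MNT_QUOTA;
}

/* After the fix: the cached statfs copy is updated too. */
static void
hyp_quotaon_new(struct hyp_quota_mount *mp)
{
	mp->mnt_flag |= HYP_MNT_QUOTA;
	mp->mnt_stat.f_flags |= HYP_MNT_QUOTA;
}

/* A MNT_NOWAIT-style query returns the cached copy, not mnt_flag. */
static uint64_t
hyp_nowait_flags(const struct hyp_quota_mount *mp)
{
	return (mp->mnt_stat.f_flags);
}
```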
* softdep_unmount: assert that no dangling dependencies are left
  Konstantin Belousov | 2021-03-12 | 1 file | -0/+7

  Reviewed by: mckusick
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Differential revision: https://reviews.freebsd.org/D29178
* FFS: assign fully initialized struct mount_softdeps to um_softdep
  Konstantin Belousov | 2021-03-12 | 1 file | -33/+35

  Other threads observing the non-NULL um_softdep can assume that it is safe to use it. This is important for ro->rw remounts where the change from read-only to read-write status cannot be made atomic.

  Reviewed by: mckusick
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Differential revision: https://reviews.freebsd.org/D29178
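The "initialize fully, then publish the pointer" pattern can be illustrated with a C11 user-space analogue. It uses a release store and a matching acquire load so that a reader who sees the non-NULL pointer also sees the initialized fields. The commit does not say which primitives the kernel uses to get this guarantee, and it may achieve it differently (for example under a lock). `struct hyp_softdeps` and the `hyp_` functions are hypothetical.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mount_softdeps. */
struct hyp_softdeps {
	int	 nhash;
	int	*hash;
};

/* Readers observe this pointer; NULL means "no softdep state yet". */
static _Atomic(struct hyp_softdeps *) hyp_um_softdep;

/* Build the structure privately, then publish it with a release store. */
static int
hyp_publish_softdeps(int nhash)
{
	struct hyp_softdeps *sd;

	sd = malloc(sizeof(*sd));
	if (sd == NULL)
		return (-1);
	sd->nhash = nhash;
	sd->hash = calloc((size_t)nhash, sizeof(int));
	if (sd->hash == NULL) {
		free(sd);
		return (-1);
	}
	/* Only now do other threads get to see it. */
	atomic_store_explicit(&hyp_um_softdep, sd, memory_order_release);
	return (0);
}

/* Reader side: the acquire load pairs with the release store above. */
static struct hyp_softdeps *
hyp_observe_softdeps(void)
{
	return (atomic_load_explicit(&hyp_um_softdep, memory_order_acquire));
}
```

Assigning the pointer first and filling in fields afterwards would let a concurrent reader see a non-NULL but half-built structure, which is the hazard the commit removes.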
* Assert that um_softdep is NULL on free(ump), i.e. softdep_unmount() was called
  Konstantin Belousov | 2021-03-12 | 1 file | -0/+2

  Reviewed by: mckusick
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Differential revision: https://reviews.freebsd.org/D29178
* ffs_mount: when remounting ro->rw and sbupdate failed, cleanup softdeps
  Konstantin Belousov | 2021-03-12 | 1 file | -0/+2

  Reviewed by: mckusick
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Differential revision: https://reviews.freebsd.org/D29178
* softdep_unmount: handle spurious wakeups
  Konstantin Belousov | 2021-03-12 | 1 file | -2/+5

  Reviewed by: mckusick
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Differential revision: https://reviews.freebsd.org/D29178
* softdep_flush(): do not access ump after we acked FLUSH_EXIT and unlocked SU lock
  Konstantin Belousov | 2021-03-12 | 1 file | -2/+7

  Otherwise we might follow a pointer into freed memory.

  Reviewed by: mckusick
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Differential revision: https://reviews.freebsd.org/D29178
* ffs: clear MNT_SOFTDEP earlier when remounting rw to ro
  Konstantin Belousov | 2021-03-12 | 2 files | -12/+37

  Suppose that we remount rw->ro and in parallel some reader tries to instantiate a vnode, e.g. during lookup. Suppose that softdep_unmount() has already started, but we have not yet cleared the MNT_SOFTDEP flag. Then ffs_vgetf() calls into softdep_load_inodeblock(), which accesses destroyed hashes and freed memory.

  Set/clear fs_ronly simultaneously with MNT_SOFTDEP (with respect to the file flush). It might be reasonable to move the change of fs_ronly under MNT_ILOCK, but no readers take it.

  Reported and tested by: pho
  Reviewed by: mckusick
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Differential revision: https://reviews.freebsd.org/D29178
* Rework MOUNTED/DOING SOFTDEP/SUJ macros
  Konstantin Belousov | 2021-03-12 | 2 files | -6/+5

  Now MNT_SOFTDEP indicates that SU is active in either variant (with or without journaling), and SU+J is indicated by the MNT_SOFTDEP | MNT_SUJ combination.

  The reason is that unmount will be able to easily hide SU from other operations by clearing MNT_SOFTDEP while keeping the record of the active journal.

  Reviewed by: mckusick
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Differential revision: https://reviews.freebsd.org/D29178
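The new flag encoding can be sketched with predicates in the spirit of the DOINGSOFTDEP()/DOINGSUJ() macros. The bit values and `hyp_` names are hypothetical, and the real definitions are not reproduced here. The point is that SU+J requires both bits, so clearing MNT_SOFTDEP alone hides SU while the SUJ bit still records that a journal was active.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values; the real MNT_* constants differ. */
#define HYP_MNT_SOFTDEP	0x1u
#define HYP_MNT_SUJ	0x2u

/* SU active, with or without the journal. */
static bool
hyp_doing_softdep(uint64_t flags)
{
	return ((flags & HYP_MNT_SOFTDEP) != 0);
}

/* SU+J requires both bits to be set. */
static bool
hyp_doing_suj(uint64_t flags)
{
	return ((flags & (HYP_MNT_SOFTDEP | HYP_MNT_SUJ)) ==
	    (HYP_MNT_SOFTDEP | HYP_MNT_SUJ));
}
```

Usage: for a journaled mount both predicates are true. After unmount clears only HYP_MNT_SOFTDEP, both become false, yet `flags & HYP_MNT_SUJ` still shows the journal.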
* ffs softdep: clear ump->um_softdep on softdep_unmount()
  Konstantin Belousov | 2021-03-12 | 1 file | -21/+26

  Reviewed by: mckusick
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Differential revision: https://reviews.freebsd.org/D29178
* ffs_extern.h: Add comments for ffs_vgetf() flags
  Konstantin Belousov | 2021-03-12 | 1 file | -4/+6

  Requested and reviewed by: mckusick
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Differential revision: https://reviews.freebsd.org/D29178
* Add FFSV_FORCEINODEDEP flag for ffs_vgetf()
  Konstantin Belousov | 2021-03-12 | 3 files | -7/+10

  It will be used to allow SU flush code to sync the volume while external consumers see that SU is already disabled on the filesystem.

  Use it where ffs_vgetf() is called by SU code to process dependencies.

  Reviewed by: mckusick
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Differential revision: https://reviews.freebsd.org/D29178
* simplify journal_mount: move the out label after success block
  Konstantin Belousov | 2021-03-12 | 1 file | -19/+19

  This removes the need to check for error == 0.

  Reviewed by: mckusick
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Differential revision: https://reviews.freebsd.org/D29178
* FFS extattr: fix handling of the tail
  Konstantin Belousov | 2021-03-02 | 1 file | -6/+16

  There are three issues with the change that stopped truncating the EA area before write, which resulted in a possible zero tail in the EA area:
  - Truncate to zero checked i_ea_len after the reference was dropped, making the last drop effectively always truncate to zero length.
  - The loop that fills the uio for zeroing specified too large a length, which triggered an assert in the normal situation.
  - The integrity check could trip over the tail; instead, we must allow a partial header or a header with zero length, and clamp the EA image in memory at it.

  Reported by: arichardson
  Tested by: arichardson, pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 3 days
  Fixup: 5e198e7646a27412c0541719f7bf1bbc0bd89223
  Differential Revision: https://reviews.freebsd.org/D28999
* Call softdep_prealloc() before taking ffs_lock_ea(), if unlock is committing
  Konstantin Belousov | 2021-02-24 | 1 file | -0/+20

  softdep_prealloc() must be called to ensure enough journal space is available, before ffs_extwrite(). Also it must be done before taking ffs_lock_ea(), because it calls ffs_syncvnode(), potentially dropping the vnode lock.

  Reviewed by: mckusick
  Tested by: pho
  MFC after: 1 week
  Sponsored by: The FreeBSD Foundation
* ffs_close_ea: do not relock vnode under lock_ea
  Konstantin Belousov | 2021-02-24 | 1 file | -10/+27

  ffs_lock_ea is ordered after the vnode lock, so the vnode must not be relocked under lock_ea. Move the ffs_truncate() call in ffs_close_ea() to after lock_ea is dropped, and only truncate to length zero, since this is the only mode supported by ffs_truncate() for EAs. Previously the code truncated and then wrote.

  Zero the unused part of the ext area if truncation is due but not done because the EA area is not zero-length.

  Reviewed by: mckusick
  Tested by: pho
  MFC after: 1 week
  Sponsored by: The FreeBSD Foundation
* ffs_vnops.c: style
  Konstantin Belousov | 2021-02-24 | 1 file | -23/+25

  Use local var to shorten ap->a_vp expression.

  Reviewed by: mckusick
  Tested by: pho
  MFC after: 1 week
  Sponsored by: The FreeBSD Foundation
* ffs: do not call softdep_prealloc() from UFS_BALLOC()
  Konstantin Belousov | 2021-02-24 | 2 files | -5/+5

  Do it in ffs_write(), where we can gracefully handle the relock and its consequences. In particular, recheck v_data to see whether vnode reclamation ended, and return EBADF when we cannot proceed with the write.

  Reviewed by: mckusick
  Reported by: pho
  MFC after: 1 week
  Sponsored by: The FreeBSD Foundation
* ffs_reallocblks: change the guard for softdep_prealloc() call to DOINGSUJ()
  Konstantin Belousov | 2021-02-24 | 1 file | -1/+1

  instead of DOINGSOFTDEP(). The softdep_prealloc() function does nothing in the SU case.

  Note that the call should be safe with regard to the vnode relock, because it is called with MNT_NOWAIT, which does not descend into fsync.

  Reviewed by: mckusick
  Tested by: pho
  MFC after: 1 week
  Sponsored by: The FreeBSD Foundation
* vnode: move write cluster support data to inodes.
  Konstantin Belousov | 2021-02-21 | 4 files | -3/+7

  The data is only needed by filesystems that
  1. use buffer cache
  2. utilize clustering write support.

  Requested by: mjg
  Reviewed by: asomers (previous version), fsu (ext2 parts), mckusick
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D28679
* Remove #define _KERNEL hacks from libprocstat
  Konstantin Belousov | 2021-02-21 | 3 files | -9/+14

  Make sys/buf.h, sys/pipe.h, sys/fs/devfs/devfs*.h headers usable in userspace, assuming that the consumer has an idea what it is for. Unhide more material from sys/mount.h and sys/ufs/ufs/inode.h, sys/ufs/ufs/ufsmount.h for consumption of userspace tools, with the same caveat.

  Remove unacceptable hack from usr.sbin/makefs which relied on sys/buf.h being unusable in userspace, where it overrode struct buf with its own definition. Instead, provide struct m_buf and struct m_vnode and adapt code to use local variants.

  Reviewed by: mckusick
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  Differential revision: https://reviews.freebsd.org/D28679
* UFS snapshots: properly set the vm object size.
  Konstantin Belousov | 2021-02-16 | 1 file | -0/+4

  Citing Kirk:

      The previous code [before 8563de2f2799b2cb -- kib] did not call vnode_pager_setsize() but worked because later in ffs_snapshot() it does a UFS_WRITE() to output the snaplist. Previously the UFS_WRITE() allocated the extra block at the end of the file which caused it to do the needed vnode_pager_setsize(). But the new code had already allocated the extra block, so UFS_WRITE() did not extend the size and thus did not do the vnode_pager_setsize().

  PR: 253158
  Reported by: Harald Schmalzbauer <bugzilla.freebsd@omnilan.de>
  Reviewed by: mckusick
  Tested by: cy
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
* Fix bug 253158 - Panic: snapacct_ufs2: bad block - mksnap_ffs(8) crash
  Kirk McKusick | 2021-02-12 | 1 file | -67/+70

  The panic reported in 253158 arises because the /mnt/.snap/.factory snapshot allocated the last block in the filesystem. The snapshot code allocates the last block in the filesystem as a way of setting its length to be the size of the filesystem. Part of taking a snapshot is to remove all the earlier snapshots from the image of the newest snapshot so that newer snapshots will not claim the blocks of the earlier snapshots. The panic occurs when the new snapshot finds that both it and an earlier snapshot claim the same block.

  The fix is to set the size of the snapshot to be one block after the last block in the filesystem. This block can never be allocated since it is not a valid block in the filesystem. This extra block is used as a place to store the initial list of blocks that the snapshot has already copied and is used to avoid a deadlock in and speed up the ffs_copyonwrite() function.

  Reported by: Harald Schmalzbauer
  Tested by: Peter Holm
  PR: 253158
  Sponsored by: Netflix
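The off-by-one at the heart of this fix can be shown with block arithmetic. This is a simplified model under stated assumptions: a filesystem of `nblocks` valid blocks numbered 0 through nblocks - 1, with snapshot logical blocks mapping one-to-one onto them. The real code works in FFS fragments and logical block numbers. All `hyp_` names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Old scheme: the snapshot's last logical block is a real fs block. */
static uint64_t
hyp_snap_nblocks_old(uint64_t nblocks)
{
	return (nblocks);
}

/* New scheme: one extra logical block past the end of the filesystem. */
static uint64_t
hyp_snap_nblocks_new(uint64_t nblocks)
{
	return (nblocks + 1);
}

/* Could some other file (e.g. an earlier snapshot) ever own this block? */
static bool
hyp_is_allocatable(uint64_t blkno, uint64_t nblocks)
{
	return (blkno < nblocks);
}
```

Under the old scheme the snapshot's final block is a valid, allocatable block, so two snapshots can both claim it. Under the new scheme it lies past the end of the filesystem and no other file can own it.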
* fifo: minor comment and assert improvements.
  Konstantin Belousov | 2021-02-12 | 1 file | -3/+4

  In particular, replace a note that reload through vget() is obsolete with an explanation of why this code is required.

  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* ffs_unlock: assert that IN_ENDOFF is not leaked past locked scope
  Konstantin Belousov | 2021-02-12 | 1 file | -0/+3

  This catches both missed processing of IN_ENDOFF and missed application of VOP_VPUT_PAIR() after a VOP that created an entry in the directory.

  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* ffs softdep: Force processing of VI_OWEINACT vnodes when there is inode shortage
  Konstantin Belousov | 2021-02-12 | 2 files | -0/+63

  Such vnodes prevent inode reuse, and should be force-cleared when ffs_valloc() is unable to find a free inode.

  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* softdep_request_cleanup: wait for softdep_request_clean_flush() to pass
  Konstantin Belousov | 2021-02-12 | 1 file | -0/+6

  if we noted a parallel request is active and declined to overflow the system with parallel redundant sync of the vnodes. But we need to wait for the flush to finish to see if there are any freed resources.

  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* ufs_inactive(): stop hiding ERELOOKUP from ffs_truncate(), return it.
  Konstantin Belousov | 2021-02-12 | 2 files | -6/+5

  VFS should retry inactivation when possible, then. This should provide timely removal of unlinked unreferenced inodes.

  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* Stop ignoring ERELOOKUP from VOP_INACTIVE()
  Konstantin Belousov | 2021-02-12 | 1 file | -1/+7

  When possible, relock the vnode and retry inactivation. Only vunref() is required not to drop the vnode lock, so handle it specially by not retrying.

  This is part of the effort to ensure that an unlinked, unreferenced vnode does not prevent its inode from being reused.

  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
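The retry rule can be sketched as a loop. `HYP_ERELOOKUP`, `struct hyp_vnode` and both functions are hypothetical stand-ins. The value chosen for HYP_ERELOOKUP is arbitrary, and the actual relock step is only marked by a comment. The `may_relock` flag models the commit's exception for vunref(), which must not drop the vnode lock and therefore does not retry.

```c
#include <stdbool.h>

#define HYP_ERELOOKUP	(-3)	/* hypothetical stand-in for ERELOOKUP */

/* Hypothetical vnode whose inactivation asks for a retry a few times. */
struct hyp_vnode {
	int	retries_wanted;	/* how many times to return ERELOOKUP */
	int	inactive_calls;	/* how many times inactivation ran */
};

static int
hyp_inactive(struct hyp_vnode *vp)
{
	vp->inactive_calls++;
	if (vp->retries_wanted > 0) {
		vp->retries_wanted--;
		return (HYP_ERELOOKUP);
	}
	return (0);
}

/*
 * Retry inactivation while it reports ERELOOKUP, unless the caller may
 * not have the vnode relocked (the vunref() case), in which case the
 * first result is returned as-is.
 */
static int
hyp_inactivate(struct hyp_vnode *vp, bool may_relock)
{
	int error;

	for (;;) {
		error = hyp_inactive(vp);
		if (error != HYP_ERELOOKUP || !may_relock)
			return (error);
		/* The real code would relock the vnode here before retrying. */
	}
}
```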
* ufs vnops: brace softdep_prelink() with DOINGSUJ instead of DOINGSOFTDEP
  Konstantin Belousov | 2021-02-12 | 1 file | -6/+6

  because softdep_prelink() is reverted to a NOP for the non-J case. There is no need to do anything before ufs_direnter() in the SU/non-J case; everything required to sync the directory is done in VOP_VPUT_PAIR().

  Suggested by: mckusick
  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* ffs softdep: remove will_direnter argument of softdep_prelink()
  Konstantin Belousov | 2021-02-12 | 3 files | -45/+15

  Originally this was done in 8a1509e442bc9a075 to forcibly cover cases where a hole in the directory could be created by extending into an indirect block, since the dependency on writing out the indirect block is not tracked. This resulted in an excessive amount of directory fsyncing, since every creation of a new entry forced an fsync before it. This is not needed: it is enough to fsync when IN_NEEDSYNC is set, and VOP_VPUT_PAIR() provides the hook to perform only the required syncing.

  The series of changes culminating in this commit puts the performance of metadata-intensive loads back to that before 8a1509e442bc9a075.

  Analyzed by: mckusick
  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* ufs_direnter: directory truncation does not need special case for rename
  Konstantin Belousov | 2021-02-12 | 4 files | -26/+23

  In the ufs_rename case, tdvp is locked from the place where ufs_direnter() is done until VOP_VPUT_PAIR(), which means that we no longer need to specially handle rename in ufs_direnter(). Truncation, if possible, is done in the same way in ffs_vput_pair() both for rename and for other VOPs calling ufs_direnter(). Remove the isrename argument and set IN_ENDOFF if ufs_direnter() succeeded and the directory needs truncation.

  In ffs_vput_pair(), stop verifying the condition that the directory needs truncation when IN_ENDOFF is set; instead, assert that the condition is true.

  Suggested by: mckusick
  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation