linux_old1

Commit Graph

Author	SHA1	Message	Date
Jan Kara	fabf7f29b3	fanotify: Use interruptible wait when waiting for permission events When waiting for response to fanotify permission events, we currently use uninterruptible waits. That makes code simple however it can cause lots of processes to end up in uninterruptible sleep with hard reboot being the only alternative in case fanotify listener process stops responding (e.g. due to a bug in its implementation). Uninterruptible sleep also makes system hibernation fail if the listener gets frozen before the process generating fanotify permission event. Fix these problems by using interruptible sleep for waiting for response to fanotify event. This is slightly tricky though - we have to detect when the event got already reported to userspace as in that case we must not free the event. Instead we push the responsibility for freeing the event to the process that will write response to the event. Reported-by: Orion Poplawski <orion@nwra.com> Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-18 12:41:16 +01:00
Jan Kara	40873284d7	fanotify: Track permission event state Track whether permission event got already reported to userspace and whether userspace already answered to the permission event. Protect stores to this field together with updates to ->response field by group->notification_lock. This will allow aborting wait for reply to permission event from userspace. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-18 12:41:16 +01:00
Jan Kara	ca6f86998d	fanotify: Simplify cleaning of access_list Simplify iteration cleaning access_list in fanotify_release(). That will make following changes more obvious. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-18 12:41:16 +01:00
Jan Kara	f7db89accc	fsnotify: Create function to remove event from notification list Create function to remove event from the notification list. Later it will be used from more places. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-18 12:41:16 +01:00
Jan Kara	8c5544666c	fanotify: Move locking inside get_one_event() get_one_event() has a single caller and that just locks notification_lock around the call. Move locking inside get_one_event() as that will make using ->response field for permission event state easier. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-18 12:40:46 +01:00
Jan Kara	af6a511306	fanotify: Fold dequeue_event() into process_access_response() Fold dequeue_event() into process_access_response(). This will make changes to use of ->response field easier. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-18 11:49:36 +01:00
Jan Kara	53136b393c	fanotify: Select EXPORTFS Fanotify now uses exportfs_encode_inode_fh() so it needs to select EXPORTFS. Fixes: `e9e0c89030` "fanotify: encode file identifier for FAN_REPORT_FID" Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-14 17:46:33 +01:00
Amir Goldstein	e7fce6d94c	fanotify: report FAN_ONDIR to listener with FAN_REPORT_FID dirent modification events (create/delete/move) do not carry the child entry name/inode information. Instead, we report FAN_ONDIR for mkdir/rmdir so user can differentiate them from creat/unlink. This is consistent with inotify reporting IN_ISDIR with dirent events and is useful for implementing recursive directory tree watcher. We avoid merging dirent events referring to subdirs with dirent events referring to non subdirs, otherwise, user won't be able to tell from a mask FAN_CREATE\|FAN_DELETE\|FAN_ONDIR if it describes mkdir+unlink pair or rmdir+create pair of events. For backward compatibility and consistency, do not report FAN_ONDIR to user in legacy fanotify mode (reporting fd) and report FAN_ONDIR to user in FAN_REPORT_FID mode for all event types. Cc: <linux-api@vger.kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-07 16:47:32 +01:00
Amir Goldstein	235328d1fa	fanotify: add support for create/attrib/move/delete events Add support for events with data type FSNOTIFY_EVENT_INODE (e.g. create/attrib/move/delete) for inode and filesystem mark types. The "inode" events do not carry enough information (i.e. path) to report event->fd, so we do not allow setting a mask for those events unless group supports reporting fid. The "inode" events are not supported on a mount mark, because they do not carry enough information (i.e. path) to be filtered by mount point. The "dirent" events (create/move/delete) report the fid of the parent directory where events took place without specifying the filename of the child. In the future, fanotify may get support for reporting filename information for those events. Cc: <linux-api@vger.kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-07 16:43:23 +01:00
Amir Goldstein	83b535d289	fanotify: support events with data type FSNOTIFY_EVENT_INODE When event data type is FSNOTIFY_EVENT_INODE, we don't have a refernece to the mount, so we will not be able to open a file descriptor when user reads the event. However, if the listener has enabled reporting file identifier with the FAN_REPORT_FID init flag, we allow reporting those events and we use an identifier inode to encode fid. The inode to use as identifier when reporting fid depends on the event. For dirent modification events, we report the modified directory inode and we report the "victim" inode otherwise. For example: FS_ATTRIB reports the child inode even if reported on a watched parent. FS_CREATE reports the modified dir inode and not the created inode. [JK: Fixup condition in fanotify_group_event_mask()] Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-07 16:38:36 +01:00
Amir Goldstein	0321e03cb4	fanotify: check FS_ISDIR flag instead of d_is_dir() All fsnotify hooks set the FS_ISDIR flag for events that happen on directory victim inodes except for fsnotify_perm(). Add the missing FS_ISDIR flag in fsnotify_perm() hook and let fanotify_group_event_mask() check the FS_ISDIR flag instead of checking if path argument is a directory. This is needed for fanotify support for event types that do not carry path information. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-07 16:38:36 +01:00
Amir Goldstein	0a20df7ed3	fsnotify: report FS_ISDIR flag with MOVE_SELF and DELETE_SELF events We need to report FS_ISDIR flag with MOVE_SELF and DELETE_SELF events for fanotify, because fanotify API requires the user to explicitly request events on directories by FAN_ONDIR flag. inotify never reported IN_ISDIR with those events. It looks like an oversight, but to avoid the risk of breaking existing inotify programs, mask the FS_ISDIR flag out when reprting those events to inotify backend. We also add the FS_ISDIR flag with FS_ATTRIB event in the case of rename over an empty target directory. inotify did not report IN_ISDIR in this case, but it normally does report IN_ISDIR along with IN_ATTRIB event, so in this case, we do not mask out the FS_ISDIR flag. [JK: Simplify the checks in fsnotify_move()] Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-07 16:38:35 +01:00
Amir Goldstein	73072283a2	fanotify: use vfs_get_fsid() helper instead of vfs_statfs() This is a cleanup that doesn't change any logic. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-07 16:38:35 +01:00
Amir Goldstein	ec86ff5689	vfs: add vfs_get_fsid() helper Wrapper around statfs() interface. Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-07 16:38:35 +01:00
Amir Goldstein	77115225ac	fanotify: cache fsid in fsnotify_mark_connector For FAN_REPORT_FID, we need to encode fid with fsid of the filesystem on every event. To avoid having to call vfs_statfs() on every event to get fsid, we store the fsid in fsnotify_mark_connector on the first time we add a mark and on handle event we use the cached fsid. Subsequent calls to add mark on the same object are expected to pass the same fsid, so the call will fail on cached fsid mismatch. If an event is reported on several mark types (inode, mount, filesystem), all connectors should already have the same fsid, so we use the cached fsid from the first connector. [JK: Simplify code flow around fanotify_get_fid() make fsid argument of fsnotify_add_mark_locked() unconditional] Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-07 16:38:35 +01:00
Amir Goldstein	a8b13aa20a	fanotify: enable FAN_REPORT_FID init flag When setting up an fanotify listener, user may request to get fid information in event instead of an open file descriptor. The fid obtained with event on a watched object contains the file handle returned by name_to_handle_at(2) and fsid returned by statfs(2). Restrict FAN_REPORT_FID to class FAN_CLASS_NOTIF, because we have have no good reason to support reporting fid on permission events. When setting a mark, we need to make sure that the filesystem supports encoding file handles with name_to_handle_at(2) and that statfs(2) encodes a non-zero fsid. Cc: <linux-api@vger.kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-07 16:38:35 +01:00
Amir Goldstein	5e469c830f	fanotify: copy event fid info to user If group requested FAN_REPORT_FID and event has file identifier, copy that information to user reading the event after event metadata. fid information is formatted as struct fanotify_event_info_fid that includes a generic header struct fanotify_event_info_header, so that other info types could be defined in the future using the same header. metadata->event_len includes the length of the fid information. The fid information includes the filesystem's fsid (see statfs(2)) followed by an NFS file handle of the file that could be passed as an argument to open_by_handle_at(2). Cc: <linux-api@vger.kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-07 16:38:35 +01:00
Amir Goldstein	e9e0c89030	fanotify: encode file identifier for FAN_REPORT_FID When user requests the flag FAN_REPORT_FID in fanotify_init(), a unique file identifier of the event target object will be reported with the event. The file identifier includes the filesystem's fsid (i.e. from statfs(2)) and an NFS file handle of the file (i.e. from name_to_handle_at(2)). The file identifier makes holding the path reference and passing a file descriptor to user redundant, so those are disabled in a group with FAN_REPORT_FID. Encode fid and store it in event for a group with FAN_REPORT_FID. Up to 12 bytes of file handle on 32bit arch (16 bytes on 64bit arch) are stored inline in fanotify_event struct. Larger file handles are stored in an external allocated buffer. On failure to encode fid, we print a warning and queue the event without the fid information. [JK: Fold part of later patched into this one to use exportfs_encode_inode_fh() right away] Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-07 16:38:34 +01:00
Amir Goldstein	bb2f7b4542	fanotify: open code fill_event_metadata() The helper is quite trivial and open coding it will make it easier to implement copying event fid info to user. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-07 16:37:01 +01:00
Amir Goldstein	33913997d5	fanotify: rename struct fanotify_{,perm_}event_info struct fanotify_event_info "inherits" from struct fsnotify_event and therefore a more appropriate (and short) name for it is fanotify_event. Same for struct fanotify_perm_event_info, which now "inherits" from struct fanotify_event. We plan to reuse the name struct fanotify_event_info for user visible event info record format. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-06 15:25:45 +01:00
Amir Goldstein	a0a92d261f	fsnotify: move mask out of struct fsnotify_event Common fsnotify_event helpers have no need for the mask field. It is only used by backend code, so move the field out of the abstract fsnotify_event struct and into the concrete backend event structs. This change packs struct inotify_event_info better on 64bit machine and will allow us to cram some more fields into struct fanotify_event_info. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-06 15:25:11 +01:00
Amir Goldstein	45a9fb3725	fsnotify: send all event types to super block marks So far, existence of super block marks was checked only on events with data type FSNOTIFY_EVENT_PATH. Use the super block of the "to_tell" inode to report the events of all event types to super block marks. This change has no effect on current backends. Soon, this will allow fanotify backend to receive all event types on a super block mark. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-06 15:20:30 +01:00
Linus Torvalds	7c2614bf7a	a set of small smb3 fixes, some fixing various crediting issues discovered during xfstest runs, five for stable -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAlxLV8IACgkQiiy9cAdy T1HRIgv/fieDI6HBXAoK9mXzVLJXCZ1IxVZTygC3Xn/bCaxHerS+QSe88w0r1QTZ KLCoT7pBBy06uTowX4/cAF0IYTStsYWngBsH3+HL4EEchkZAVe02wAJ8QOCXs9YU UjoMvNLmcyrzwNt8EduNM2jUUpOL8EHiZOQKUvLt1Vdo3xBtyB3Nkttji92YLOQb Cxs0RJdKNaalMphOQwAtbdbZwlL79KGSA0gMm9W2VD744YKMjQ3uGXaTuoe6+w19 voYKVTSelgyZg51yzF20x1Pl3OV/8L6tWgN1BIcKZbfGBshJhhcD93Pef7bmhxNS Y2yF3SPr/zcdgEkwigVlNs1uBCBBeU04/yQr+YvJwNcjsRTcTKOMuquxq4gXM1P8 7f8gA1u0n1H70YKK29sd3sl7Iu8rHVacwlxjfoAcCjMe0VQvOJtR+V/zwP2uWr8T IjX05unBe/B1v+NXqNdkOpE6ZaiJtt2ny5NDZhfeubOouBLsey7/g/gSx48ICrq+ U20kKsb6 =CCH2 -----END PGP SIGNATURE----- Merge tag '5.0-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull smb3 fixes from Steve French: "A set of small smb3 fixes, some fixing various crediting issues discovered during xfstest runs, five for stable" * tag '5.0-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: print CIFSMaxBufSize as part of /proc/fs/cifs/DebugData smb3: add credits we receive from oplock/break PDUs CIFS: Fix mounts if the client is low on credits CIFS: Do not assume one credit for async responses CIFS: Fix credit calculations in compound mid callback CIFS: Fix credit calculation for encrypted reads with errors CIFS: Fix credits calculations for reads with errors CIFS: Do not reconnect TCP session in add_credits() smb3: Cleanup license mess CIFS: Fix possible hang during async MTU reads and writes cifs: fix memory leak of an allocated cifs_ntsd structure	2019-01-26 15:38:22 -08:00
Linus Torvalds	6b8f915916	for-linus-20190125 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlxLdgsQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpoVGD/4sGYQqfiXogQIJYbPH2RRPrJuLIIITjiAv lPXX1wx/tvz/ktwKJE2OiWTES0JjdH1HlC+7/0L/fLb8DXiKBmFuUHlwhureoL9Y o//BIQKuaje35kyHITTy2UAJOOqnNtJTaAP2AfkL+eOcj/V/G5rJIfLGs9QtuAR7 sRJ+uhg1EbW/+uO0bULDmG6WUjxFu8mqcw3i6g0VVLVOnXB2EKcZTl3KPrdAXrUp XtmouERga6OfAUSJyZmTSV136mL+opRB2WFFVeIzjQfLmyItDGbSX/YPS8oJ2pow v7630F+CMrd4aKpqqtnAhfWpGqd0Xw7cYfZ9MKTJmZPmGzf9a1fQFpmgZosD4Dh3 7MrhboU4TUt9PdXESA7CmE7LkTp99ghfj5/ysKrSV5h3HsH2RbLxJk91Rx3vmAWD u1xWRYL+GYLH6ZwOLvM1iqBrrLN3mUyrx98SaMgoXuqNzmQmgz9LPeA0Gt09FJbo uj+ebg4dRwuThjni4xQhl3zL2RQy7nlTDFDdKOz/XoiYk2NUVksss+sxGjNarHj0 b5pCD4HOp57OreGExaOARpBRah5HSNdQtBRsIOnbyEq6f/e1LsIY23Z9nNF0deGO sZzgsbnsn+zg8bC6T/Gk4UY6XdCcgaS3SL04SVKAE3lO6A4C/Awo8DgD9Bl1zpC1 HQlNkl5fBg== =iucY -----END PGP SIGNATURE----- Merge tag 'for-linus-20190125' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "A collection of fixes for this release. This contains: - Silence sparse rightfully complaining about non-static wbt functions (Bart) - Fixes for the zoned comments/ioctl documentation (Damien) - direct-io fix that's been lingering for a while (Ernesto) - cgroup writeback fix (Tejun) - Set of NVMe patches for nvme-rdma/tcp (Sagi, Hannes, Raju) - Block recursion tracking fix (Ming) - Fix debugfs command flag naming for a few flags (Jianchao)" * tag 'for-linus-20190125' of git://git.kernel.dk/linux-block: block: Fix comment typo uapi: fix ioctl documentation blk-wbt: Declare local functions static blk-mq: fix the cmd_flag_name array nvme-multipath: drop optimization for static ANA group IDs nvmet-rdma: fix null dereference under heavy load nvme-rdma: rework queue maps handling nvme-tcp: fix timeout handler nvme-rdma: fix timeout handler writeback: synchronize sync(2) against cgroup writeback membership switches block: cover another queue enter recursion via BIO_QUEUE_ENTERED direct-io: allow direct writes to empty inodes	2019-01-26 12:42:41 -08:00
Ronnie Sahlberg	a5f1a81f70	cifs: print CIFSMaxBufSize as part of /proc/fs/cifs/DebugData Was helpful in debug for some recent problems. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2019-01-24 14:52:06 -06:00
Ronnie Sahlberg	2e5700bdde	smb3: add credits we receive from oplock/break PDUs Otherwise we gradually leak credits leading to potential hung session. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> CC: Stable <stable@vger.kernel.org> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2019-01-24 14:52:06 -06:00
Pavel Shilovsky	6a9cbdd1ce	CIFS: Fix mounts if the client is low on credits If the server doesn't grant us at least 3 credits during the mount we won't be able to complete it because query path info operation requires 3 credits. Use the cached file handle if possible to allow the mount to succeed. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2019-01-24 14:52:06 -06:00
Pavel Shilovsky	0fd1d37b05	CIFS: Do not assume one credit for async responses If we don't receive a response we can't assume that the server granted one credit. Assume zero credits in such cases. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2019-01-24 14:52:06 -06:00
Pavel Shilovsky	3d3003fce8	CIFS: Fix credit calculations in compound mid callback The current code doesn't do proper accounting for credits in SMB1 case: it adds one credit per response only if we get a complete response while it needs to return it unconditionally. Fix this and also include malformed responses for SMB2+ into accounting for credits because such responses have Credit Granted field, thus nothing prevents to get a proper credit value from them. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2019-01-24 14:52:06 -06:00
Pavel Shilovsky	ec678eae74	CIFS: Fix credit calculation for encrypted reads with errors We do need to account for credits received in error responses to read requests on encrypted sessions. Cc: <stable@vger.kernel.org> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2019-01-24 14:52:05 -06:00
Pavel Shilovsky	8004c78c68	CIFS: Fix credits calculations for reads with errors Currently we mark MID as malformed if we get an error from server in a read response. This leads to not properly processing credits in the readv callback. Fix this by marking such a response as normal received response and process it appropriately. Cc: <stable@vger.kernel.org> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2019-01-24 14:52:05 -06:00
Pavel Shilovsky	ef68e83184	CIFS: Do not reconnect TCP session in add_credits() When executing add_credits() we currently call cifs_reconnect() if the number of credits is zero and there are no requests in flight. In this case we may call cifs_reconnect() recursively twice and cause memory corruption given the following sequence of functions: mid1.callback() -> add_credits() -> cifs_reconnect() -> -> mid2.callback() -> add_credits() -> cifs_reconnect(). Fix this by avoiding to call cifs_reconnect() in add_credits() and checking for zero credits in the demultiplex thread. Cc: <stable@vger.kernel.org> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2019-01-24 14:50:57 -06:00
Linus Torvalds	c04e2a780c	\n -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAlxJ6LUACgkQnJ2qBz9k QNn75Qf/U60RPju54Kdyepo4iMFoQXS/VJg87bpRetUdN9vKU3CbnyYAx43ftPOk GH+UU2u+24DX2AWbzy4yx2CJP1YUUF1oa2QkSW2P9WWn1NYU3goWha1Il8bc96Kq TQU/ypl83RoaFtw4JwpL27ot39UH5WWafYiackqzOYIuAdpgei2aaZFNvVeb9Ije 9udt7ygcTn94DVdYAFI6NeH5KAA+kzpIyWWRCWlmESsNYnCUQntSLm3tQG8sTHZt sXk2Kp1r7XFl/zkg9iFSOhSIIyVizSC4cHrE0iJ/0kLgoJjfjbOa1jZ5Eeq8yTc3 UsXI5ICNS3RuZWwSWkh759JnhQ1R/A== =cB63 -----END PGP SIGNATURE----- Merge tag 'fsnotify_for_v5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull inotify fix from Jan Kara: "Fix a file refcount leak in an inotify error path" * tag 'fsnotify_for_v5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: inotify: Fix fd refcount leak in inotify_add_watch().	2019-01-25 06:03:24 +13:00
Thomas Gleixner	b0b2cac7e2	smb3: Cleanup license mess Precise and non-ambiguous license information is important. The recently added aegis header file has a SPDX license identifier, which is nice, but at the same time it has a contradictionary license boiler plate text. SPDX-License-Identifier: GPL-2.0 versus * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. Oh well. Assuming that the SPDX identifier is correct and according to x86/hyper-v contributions from Microsoft GPL V2 only is the usual license. Remove the boiler plate as it is wrong and even if correct it is redundant. Fixes: `eccb4422cf` ("smb3: Add ftrace tracepoints for improved SMB3 debugging") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Steve French <sfrench@samba.org> Cc: linux-cifs@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>	2019-01-24 09:37:33 -06:00
Pavel Shilovsky	acc58d0bab	CIFS: Fix possible hang during async MTU reads and writes When doing MTU i/o we need to leave some credits for possible reopen requests and other operations happening in parallel. Currently we leave 1 credit which is not enough even for reopen only: we need at least 2 credits if durable handle reconnect fails. Also there may be other operations at the same time including compounding ones which require 3 credits at a time each. Fix this by leaving 8 credits which is big enough to cover most scenarios. Was able to reproduce this when server was configured to give out fewer credits than usual. The proper fix would be to reconnect a file handle first and then obtain credits for an MTU request but this leads to bigger code changes and should happen in other patches. Cc: <stable@vger.kernel.org> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2019-01-24 09:37:33 -06:00
Colin Ian King	73aaf920cc	cifs: fix memory leak of an allocated cifs_ntsd structure The call to SMB2_queary_acl can allocate memory to pntsd and also return a failure via a call to SMB2_query_acl (and then query_info). This occurs when query_info allocates the structure and then in query_info the call to smb2_validate_and_copy_iov fails. Currently the failure just returns without kfree'ing pntsd hence causing a memory leak. Currently, data is allocated if it's not already pointing to a buffer, so it needs to be kfree'd only if was allocated in query_info, so the fix adds an allocated flag to track this. Also set dlen to zero on an error just to be safe since data is kfree'd. Also set errno to -ENOMEM if the allocation of data fails. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Dan Carpener <dan.carpenter@oracle.com>	2019-01-24 09:37:33 -06:00
Tejun Heo	7fc5854f8c	writeback: synchronize sync(2) against cgroup writeback membership switches sync_inodes_sb() can race against cgwb (cgroup writeback) membership switches and fail to writeback some inodes. For example, if an inode switches to another wb while sync_inodes_sb() is in progress, the new wb might not be visible to bdi_split_work_to_wbs() at all or the inode might jump from a wb which hasn't issued writebacks yet to one which already has. This patch adds backing_dev_info->wb_switch_rwsem to synchronize cgwb switch path against sync_inodes_sb() so that sync_inodes_sb() is guaranteed to see all the target wbs and inodes can't jump wbs to escape syncing. v2: Fixed misplaced rwsem init. Spotted by Jiufei. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Jiufei Xue <xuejiufei@gmail.com> Link: http://lkml.kernel.org/r/dc694ae2-f07f-61e1-7097-7c8411cee12d@gmail.com Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-01-22 14:39:38 -07:00
Ernesto A. Fernández	8b9433eb4d	direct-io: allow direct writes to empty inodes On a DIO_SKIP_HOLES filesystem, the ->get_block() method is currently not allowed to create blocks for an empty inode. This confusion comes from trying to bit shift a negative number, so check the size of the inode first. The problem is most visible for hfsplus, because the fallback to buffered I/O doesn't happen and the write fails with EIO. This is in part the fault of the module, because it gives a wrong return value on ->get_block(); that will be fixed in a separate patch. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-01-22 08:26:44 -07:00
Thomas Gleixner	74827ee295	ceph: quota: cleanup license mess Precise and non-ambiguous license information is important. The recently added quota.c file has a SPDX license identifier, which is nice, but at the same time it has a contradictionary license boiler plate text. SPDX-License-Identifier: GPL-2.0 versus * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License * as published by the Free Software Foundation; either version 2 * of the License, or (at your option) any later version. Oh well. As the other ceph related files are licensed under the GPL v2 only, it's assumed that the SPDX id is correct and the boiler plate was randomly copied into that patch. Remove the boiler plate as it is wrong and even if correct it is redundant. Fixes: `fb18a57568` ("ceph: quota: add initial infrastructure to support cephfs quotas") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Luis Henriques <lhenriques@suse.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: "Yan, Zheng" <zyan@redhat.com> Cc: Sage Weil <sage@redhat.com> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: ceph-devel@vger.kernel.org Acked-by: Luis Henriques <lhenriques@suse.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2019-01-21 14:53:23 +01:00
Yan, Zheng	d95e674c01	ceph: clear inode pointer when snap realm gets dropped by its inode snap realm and corresponding inode have pointers to each other. The two pointer should get clear at the same time. Otherwise, snap realm's pointer may reference freed inode. Cc: stable@vger.kernel.org # 4.17+ Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Luis Henriques <lhenriques@suse.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2019-01-21 14:52:41 +01:00
Linus Torvalds	1e556ba3b6	Fixes for pstore/ram - Fix console ramoops to show the previous boot logs (Sai Prakash Ranjan) - Avoid allocation and leak of platform data -----BEGIN PGP SIGNATURE----- Comment: Kees Cook <kees@outflux.net> iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAlxE/QYWHGtlZXNjb29r QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJsVXD/94pJsNHogOle3XnZJst/1cV4+R KTyQ6QQ1184lI5FPGJliV3veCuKnTY4Q7e5vCl2mzEvds+VRxu2yUm+LFxKVj04W yiRNv8Yn3+3eKxNufqLa9ySbsSZyIpoCKQ/20nbC5cNM3r+Gq12GUnDBixtGdl9V fB316emIaKR70Qvi9KlfwDTcjh+SWBR2u8L0YsmTaoJkjtaCJhekN3hfya6Wnjf5 1sj1fCDzteda/Ld0gXoImHPvt0UcHPJDp1+ZhY0ja64QIuoWwDxsisp4X4nS8iFD oak9BWRDhRwSzPnyCu8BI3Hc4ayxcBYR/vmBS12LTkIJVdR1CaqfeDYvZexyX04n TYZnmPeZ+2R3wzA9aCLjUDEAYNi403sLqIvR99jlrOoiMo+5Bf05TvBG8lND2dQg A13854vg9ssxfiuEmvxJTg8SLtcFiqEtrzZhqaLVfuf4cqaN3o0Ed+A0gjZadwwD TPxiMh4YEZuw+qszZwyzi7uECKOvJIJeLCL8CApPvihCNFy2vKF5wj58ZizXXmoT Orq6DJ1ITx4DTkYIgtiTC9HKDuvNDZ/ncdb5QoqlpoTe01r8GQOzffZ+tD3xRTNc 8+yAdK0/bTjLVNbFX/Q4dwagh0I8EQSrdw8CmmQs0lsCvfyYTfHz8uPeBQ7rrzeN 2r1H7a/OKpL/duTkEw== =m6sX -----END PGP SIGNATURE----- Merge tag 'pstore-v5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore fixes from Kees Cook: - Fix console ramoops to show the previous boot logs (Sai Prakash Ranjan) - Avoid allocation and leak of platform data * tag 'pstore-v5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: pstore/ram: Avoid allocation and leak of platform data pstore/ram: Fix console ramoops to show the previous boot logs	2019-01-21 13:12:03 +13:00
Kees Cook	5631e8576a	pstore/ram: Avoid allocation and leak of platform data Yue Hu noticed that when parsing device tree the allocated platform data was never freed. Since it's not used beyond the function scope, this switches to using a stack variable instead. Reported-by: Yue Hu <huyue2@yulong.com> Fixes: `35da60941e` ("pstore/ram: add Device Tree bindings") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>	2019-01-20 14:44:52 -08:00
Linus Torvalds	1be969f468	for-5.0-rc2-tag -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAlxElHsACgkQxWXV+ddt WDsF8Q/6A+l/Ku8D+xSmw+JGXGNroX1V62sxVYWqIgrkYNg6iSMjicqF3aN1bCbR LqvOQ5skerrKteNYIPbTTOD5Xp37ccinSeEWEF0ktFkeU1G6yJo7aRsnQlO1sduk 2PKUOA1/PdeTBiOj1bej/1ybhtIW+d0MaoPtnUCMC8DD/ihfmU332+KC8VmRUYGZ 4kT1DvKfBOSVz1UTl6OJdWo76crvjz0eGVnH1YG7DoFbpVzAbphHE7+aC/WkHQme X5Ux2NtYVMe/0IGAC7kJnj4jCZ1weAdlmvzzagjzGtWdhWgTxntiJ/FVJs9nO8Dm G/pVtD8RVjgMkPOWcfw5fdderrBJqjVGgl4VDrDLqjO9OTGNFJs+HgcPRi6Oli28 sA+HG+U34YzSfKY0L9eAmpkNxMjWywBuXTQIAlMhHNZCL0vZ2K5EYa3dTp9OswAW IIcOh/LfZxiomvMvUqQWcRCy5y/b+cYjOjbHwkrw+ewd3IWXVLG8YLMyZI3vnHKu /f1xn6KCap9a1cS4LwyK6gzstEugn0MYmnmD/Jx8I1BJFBt55Q31ES6tPHgTmh/d QjveRjMkxNCql4h5Hq0+LiXoSoocBmsO0wrs2QrWSx4PBnJsvjySnWr/8GfAOj79 BhnuQFxbr/BkNyBzvKrjoI+zZnrVm0cBU59lP6PzN75+kQTaIfs= =hGlM -----END PGP SIGNATURE----- Merge tag 'for-5.0-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A handful of fixes (some of them in testing for a long time): - fix some test failures regarding cleanup after transaction abort - revert of a patch that could cause a deadlock - delayed iput fixes, that can help in ENOSPC situation when there's low space and a lot data to write" * tag 'for-5.0-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: wakeup cleaner thread when adding delayed iput btrfs: run delayed iputs before committing btrfs: wait on ordered extents on abort cleanup btrfs: handle delayed ref head accounting cleanup in abort Revert "btrfs: balance dirty metadata pages in btrfs_finish_ordered_io"	2019-01-21 07:35:26 +13:00
Linus Torvalds	b0efca46b5	NFS client fixes for Linux 5.0 Stable bugfixes: - Fix TCP receive code on archs with flush_dcache_page() Other bugfixes: - Fix error code in rpcrdma_buffer_create() - Fix a double free in rpcrdma_send_ctxs_create() - Fix kernel BUG at kernel/cred.c:825 - Fix unnecessary retry in nfs42_proc_copy_file_range() - Ensure rq_bytes_sent is reset before request transmission - Ensure we respect the RPCSEC_GSS sequence number limit - Address Kerberos performance/behavior regression -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAlxCH1kACgkQ18tUv7Cl QOs4rBAAymqyhUzNgap1TX/KezFxqii7CVMVabrA5eGN+ZXbSVAZkwy7BMZWwVIp tEvD7lxWtF11x7bQDw7Xz+ruBCjLdD0RQIFnlBpVKqsRy9oSRA4PsgSbuIFaw+gX Bun4Z0xmOCPF7knRv6gQonArEZfHeokIIN8AtSBtWVByaOrnZwgDkNTIub8akpUl FQlzgq7lTydVzNcju2ImBeubU7KgFEu0F2Zub5z/iR+F2Mx/bAju8Q4YeVlPyD8U QJoIBlXAvgK8LK4bZCh40zPeEt0TMWXnW7o0JHgVQ0g6VbT+hp17I7fz91xEazye qbjpIJIjv5daEv0REM8t5ZCZB3tEatVjb4EQWXp0gJYb0l5E3I/O+7MO44n4uMYx s3UTxzM6NjwCtlgmn4tYUj+vEIExQHUUnwOl02e5iEa7bqNNY75ehAhj5Rh7iQBH H4b+OVuqc608q87rNePdK1LRyh0/u1cDI1kDAQoIP2omlb5hJQGk0Nuz9G2BodIj rP0x7nV+ykOXZtr6TR+RvaksL1W39PzVKYA0aL+e2gbcv4YO+Oq1phvNKwRWPM4a g08r/kvifS5h6/Jq8Wmn83f1vAOX7Sf23RtEoj+t9hc4S4JbsV2iYK3PY3eWbSYE Oz0Vt4gvBBJ+0rHJ10BsQ7686OQkyMKpIlvmx6O5mWVlthovbJM= =6Nzz -----END PGP SIGNATURE----- Merge tag 'nfs-for-5.0-2' of git://git.linux-nfs.org/projects/anna/linux-nfs Pull NFS client fixes from Anna Schumaker: "These are mostly fixes for SUNRPC bugs, with a single v4.2 copy_file_range() fix mixed in. Stable bugfixes: - Fix TCP receive code on archs with flush_dcache_page() Other bugfixes: - Fix error code in rpcrdma_buffer_create() - Fix a double free in rpcrdma_send_ctxs_create() - Fix kernel BUG at kernel/cred.c:825 - Fix unnecessary retry in nfs42_proc_copy_file_range() - Ensure rq_bytes_sent is reset before request transmission - Ensure we respect the RPCSEC_GSS sequence number limit - Address Kerberos performance/behavior regression" * tag 'nfs-for-5.0-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: SUNRPC: Address Kerberos performance/behavior regression SUNRPC: Ensure we respect the RPCSEC_GSS sequence number limit SUNRPC: Ensure rq_bytes_sent is reset before request transmission NFSv4.2 fix unnecessary retry in nfs4_copy_file_range sunrpc: kernel BUG at kernel/cred.c:825! SUNRPC: Fix TCP receive code on archs with flush_dcache_page() xprtrdma: Double free in rpcrdma_sendctxs_create() xprtrdma: Fix error code in rpcrdma_buffer_create()	2019-01-20 09:27:38 +12:00
Linus Torvalds	0facb89245	for-linus-20190118 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlxCLxcQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpmBID/0SpsLsnoeki37Wei0HzBZeRdFGIXFfg1XB +s66nJUL+igHIIwH+EO7Ct+vw8jfCWZfasI+ThGtkgRg88bqJO3HEbE1MQY8KlTh Dm9LRHH9Jp4OYbbag2RXammaD4XKi980fIyVTp9R0keszQroSUTXXS40fksszTtB K4LEBiAxGJgin60W9RgbbK93n/pw17U5Wu2WID6ZfVxZYVzpXl0iIQXUz1zmnpy5 3aP7ORqJGnH7WQiV57LMpq5/QFBoKrfUsjtxpnJ+i2694sLYX8NCqnH82SUliSU+ NN3Vxic1ozmZEADFKg4dfmgVqqyJaoVCRvsA5i5YjqG9m65G/S7/BmsaWbq5Ixf2 N1KQr/36vkYGYWqtb2rbYRVf9XfbxWKp1wuSaX3N1Vh1/Ee180qD68+wpa/0mtkh 2dXV6CQP9De4BXH6mxj3Yr7LI2a2r7KLYhULZCAvLMvDtsBrPyHAmRRgXxaLlKL8 QMFYYO/1wjlzpGnn4IxP8sDMH6N2shtZHAmC2n9TM2gmk5tA0OmajJJ4uxNbtyhw +ZlDco3aw/x0v2onjlbTiPij2cReRgwN+tVaEvn0dj3uxaisCru60iYdnr9FX4nL g3NxNUJbUhl6YMwvNMI0GKfz1IILzjPgJIhDqSHvYhtCA1w0DMVXs3Qv4HXoVh3m ffdby6/Yqw== =5yJp -----END PGP SIGNATURE----- Merge tag 'for-linus-20190118' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: - block size setting fixes for loop/nbd (Jan Kara) - md bio_alloc_mddev() cleanup (Marcos) - Ensure we don't lose the REQ_INTEGRITY flag (Ming) - Two NVMe fixes by way of Christoph: - Fix NVMe IRQ calculation (Ming) - Uninitialized variable in nvmet-tcp (Sagi) - BFQ comment fix (Paolo) - License cleanup for recently added blk-mq-debugfs-zoned (Thomas) * tag 'for-linus-20190118' of git://git.kernel.dk/linux-block: block: Cleanup license notice nvme-pci: fix nvme_setup_irqs() nvmet-tcp: fix uninitialized variable access block: don't lose track of REQ_INTEGRITY flag blockdev: Fix livelocks on loop device nbd: Use set_blocksize() to set device blocksize md: Make bio_alloc_mddev use bio_alloc_bioset block, bfq: fix comments on __bfq_deactivate_entity	2019-01-20 09:12:50 +12:00
Josef Bacik	fd340d0f68	btrfs: wakeup cleaner thread when adding delayed iput The cleaner thread usually takes care of delayed iputs, with the exception of the btrfs_end_transaction_throttle path. Delaying iputs means we are potentially delaying the eviction of an inode and it's respective space. The cleaner thread only gets woken up every 30 seconds, or when we require space. If there are a lot of inodes that need to be deleted we could induce a serious amount of latency while we wait for these inodes to be evicted. So instead wakeup the cleaner if it's not already awake to process any new delayed iputs we add to the list. If we suddenly need space we will less likely be backed up behind a bunch of inodes that are waiting to be deleted, and we could possibly free space before we need to get into the flushing logic which will save us some latency. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-01-18 17:27:23 +01:00
Josef Bacik	3ec9a4c81c	btrfs: run delayed iputs before committing Delayed iputs means we can have final iputs of deleted inodes in the queue, which could potentially generate a lot of pinned space that could be free'd. So before we decide to commit the transaction for ENOPSC reasons, run the delayed iputs so that any potential space is free'd up. If there is and we freed enough we can then commit the transaction and potentially be able to make our reservation. Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-01-18 17:27:21 +01:00
Josef Bacik	74d5d229b1	btrfs: wait on ordered extents on abort cleanup If we flip read-only before we initiate writeback on all dirty pages for ordered extents we've created then we'll have ordered extents left over on umount, which results in all sorts of bad things happening. Fix this by making sure we wait on ordered extents if we have to do the aborted transaction cleanup stuff. generic/475 can produce this warning: [ 8531.177332] WARNING: CPU: 2 PID: 11997 at fs/btrfs/disk-io.c:3856 btrfs_free_fs_root+0x95/0xa0 [btrfs] [ 8531.183282] CPU: 2 PID: 11997 Comm: umount Tainted: G W 5.0.0-rc1-default+ #394 [ 8531.185164] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014 [ 8531.187851] RIP: 0010:btrfs_free_fs_root+0x95/0xa0 [btrfs] [ 8531.193082] RSP: 0018:ffffb1ab86163d98 EFLAGS: 00010286 [ 8531.194198] RAX: ffff9f3449494d18 RBX: ffff9f34a2695000 RCX:0000000000000000 [ 8531.195629] RDX: 0000000000000002 RSI: 0000000000000001 RDI:0000000000000000 [ 8531.197315] RBP: ffff9f344e930000 R08: 0000000000000001 R09:0000000000000000 [ 8531.199095] R10: 0000000000000000 R11: ffff9f34494d4ff8 R12:ffffb1ab86163dc0 [ 8531.200870] R13: ffff9f344e9300b0 R14: ffffb1ab86163db8 R15:0000000000000000 [ 8531.202707] FS: 00007fc68e949fc0(0000) GS:ffff9f34bd800000(0000)knlGS:0000000000000000 [ 8531.204851] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8531.205942] CR2: 00007ffde8114dd8 CR3: 000000002dfbd000 CR4:00000000000006e0 [ 8531.207516] Call Trace: [ 8531.208175] btrfs_free_fs_roots+0xdb/0x170 [btrfs] [ 8531.210209] ? wait_for_completion+0x5b/0x190 [ 8531.211303] close_ctree+0x157/0x350 [btrfs] [ 8531.212412] generic_shutdown_super+0x64/0x100 [ 8531.213485] kill_anon_super+0x14/0x30 [ 8531.214430] btrfs_kill_super+0x12/0xa0 [btrfs] [ 8531.215539] deactivate_locked_super+0x29/0x60 [ 8531.216633] cleanup_mnt+0x3b/0x70 [ 8531.217497] task_work_run+0x98/0xc0 [ 8531.218397] exit_to_usermode_loop+0x83/0x90 [ 8531.219324] do_syscall_64+0x15b/0x180 [ 8531.220192] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 8531.221286] RIP: 0033:0x7fc68e5e4d07 [ 8531.225621] RSP: 002b:00007ffde8116608 EFLAGS: 00000246 ORIG_RAX:00000000000000a6 [ 8531.227512] RAX: 0000000000000000 RBX: 00005580c2175970 RCX:00007fc68e5e4d07 [ 8531.229098] RDX: 0000000000000001 RSI: 0000000000000000 RDI:00005580c2175b80 [ 8531.230730] RBP: 0000000000000000 R08: 00005580c2175ba0 R09:00007ffde8114e80 [ 8531.232269] R10: 0000000000000000 R11: 0000000000000246 R12:00005580c2175b80 [ 8531.233839] R13: 00007fc68eac61c4 R14: 00005580c2175a68 R15:0000000000000000 Leaving a tree in the rb-tree: 3853 void btrfs_free_fs_root(struct btrfs_root *root) 3854 { 3855 iput(root->ino_cache_inode); 3856 WARN_ON(!RB_EMPTY_ROOT(&root->inode_tree)); CC: stable@vger.kernel.org Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> [ add stacktrace ] Signed-off-by: David Sterba <dsterba@suse.com>	2019-01-18 17:24:19 +01:00
Josef Bacik	31890da0bf	btrfs: handle delayed ref head accounting cleanup in abort We weren't doing any of the accounting cleanup when we aborted transactions. Fix this by making cleanup_ref_head_accounting global and calling it from the abort code, this fixes the issue where our accounting was all wrong after the fs aborts. The test generic/475 on a 2G VM can trigger the problems eg.: [ 8502.136957] WARNING: CPU: 0 PID: 11064 at fs/btrfs/extent-tree.c:5986 btrfs_free_block_grou +ps+0x3dc/0x410 [btrfs] [ 8502.148372] CPU: 0 PID: 11064 Comm: umount Not tainted 5.0.0-rc1-default+ #394 [ 8502.150807] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626 +cc-prebuilt.qemu-project.org 04/01/2014 [ 8502.154317] RIP: 0010:btrfs_free_block_groups+0x3dc/0x410 [btrfs] [ 8502.160623] RSP: 0018:ffffb1ab84b93de8 EFLAGS: 00010206 [ 8502.161906] RAX: 0000000001000000 RBX: ffff9f34b1756400 RCX: 0000000000000000 [ 8502.163448] RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffff9f34b1755400 [ 8502.164906] RBP: ffff9f34b7e8c000 R08: 0000000000000001 R09: 0000000000000000 [ 8502.166716] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9f34b7e8c108 [ 8502.168498] R13: ffff9f34b7e8c158 R14: 0000000000000000 R15: dead000000000100 [ 8502.170296] FS: 00007fb1cf15ffc0(0000) GS:ffff9f34bd400000(0000) knlGS:0000000000000000 [ 8502.172439] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8502.173669] CR2: 00007fb1ced507b0 CR3: 000000002f7a6000 CR4: 00000000000006f0 [ 8502.175094] Call Trace: [ 8502.175759] close_ctree+0x17f/0x350 [btrfs] [ 8502.176721] generic_shutdown_super+0x64/0x100 [ 8502.177702] kill_anon_super+0x14/0x30 [ 8502.178607] btrfs_kill_super+0x12/0xa0 [btrfs] [ 8502.179602] deactivate_locked_super+0x29/0x60 [ 8502.180595] cleanup_mnt+0x3b/0x70 [ 8502.181406] task_work_run+0x98/0xc0 [ 8502.182255] exit_to_usermode_loop+0x83/0x90 [ 8502.183113] do_syscall_64+0x15b/0x180 [ 8502.183919] entry_SYSCALL_64_after_hwframe+0x49/0xbe Corresponding to release_global_block_rsv() { ... WARN_ON(fs_info->delayed_refs_rsv.reserved > 0); CC: stable@vger.kernel.org Signed-off-by: Josef Bacik <josef@toxicpanda.com> [ add log dump ] Signed-off-by: David Sterba <dsterba@suse.com>	2019-01-18 17:10:04 +01:00
David Sterba	77b7aad195	Revert "btrfs: balance dirty metadata pages in btrfs_finish_ordered_io" This reverts commit `e73e81b6d0`. This patch causes a few problems: - adds latency to btrfs_finish_ordered_io - as btrfs_finish_ordered_io is used for free space cache, generating more work from btrfs_btree_balance_dirty_nodelay could end up in the same workque, effectively deadlocking 12260 kworker/u96:16+btrfs-freespace-write D [<0>] balance_dirty_pages+0x6e6/0x7ad [<0>] balance_dirty_pages_ratelimited+0x6bb/0xa90 [<0>] btrfs_finish_ordered_io+0x3da/0x770 [<0>] normal_work_helper+0x1c5/0x5a0 [<0>] process_one_work+0x1ee/0x5a0 [<0>] worker_thread+0x46/0x3d0 [<0>] kthread+0xf5/0x130 [<0>] ret_from_fork+0x24/0x30 [<0>] 0xffffffffffffffff Transaction commit will wait on the freespace cache: 838 btrfs-transacti D [<0>] btrfs_start_ordered_extent+0x154/0x1e0 [<0>] btrfs_wait_ordered_range+0xbd/0x110 [<0>] __btrfs_wait_cache_io+0x49/0x1a0 [<0>] btrfs_write_dirty_block_groups+0x10b/0x3b0 [<0>] commit_cowonly_roots+0x215/0x2b0 [<0>] btrfs_commit_transaction+0x37e/0x910 [<0>] transaction_kthread+0x14d/0x180 [<0>] kthread+0xf5/0x130 [<0>] ret_from_fork+0x24/0x30 [<0>] 0xffffffffffffffff And then writepages ends up waiting on transaction commit: 9520 kworker/u96:13+flush-btrfs-1 D [<0>] wait_current_trans+0xac/0xe0 [<0>] start_transaction+0x21b/0x4b0 [<0>] cow_file_range_inline+0x10b/0x6b0 [<0>] cow_file_range.isra.69+0x329/0x4a0 [<0>] run_delalloc_range+0x105/0x3c0 [<0>] writepage_delalloc+0x119/0x180 [<0>] __extent_writepage+0x10c/0x390 [<0>] extent_write_cache_pages+0x26f/0x3d0 [<0>] extent_writepages+0x4f/0x80 [<0>] do_writepages+0x17/0x60 [<0>] __writeback_single_inode+0x59/0x690 [<0>] writeback_sb_inodes+0x291/0x4e0 [<0>] __writeback_inodes_wb+0x87/0xb0 [<0>] wb_writeback+0x3bb/0x500 [<0>] wb_workfn+0x40d/0x610 [<0>] process_one_work+0x1ee/0x5a0 [<0>] worker_thread+0x1e0/0x3d0 [<0>] kthread+0xf5/0x130 [<0>] ret_from_fork+0x24/0x30 [<0>] 0xffffffffffffffff Eventually, we have every process in the system waiting on balance_dirty_pages(), and nobody is able to make progress on page writeback. The original patch tried to fix an OOM condition, that happened on 4.4 but no success reproducing that on later kernels (4.19 and 4.20). This is more likely a problem in OOM itself. Link: https://lore.kernel.org/linux-btrfs/20180528054821.9092-1-ethanlien@synology.com/ Reported-by: Chris Mason <clm@fb.com> CC: stable@vger.kernel.org # 4.18+ CC: ethanlien <ethanlien@synology.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-01-18 17:09:55 +01:00

1 2 3 4 5 ...

57133 Commits