cifs: fix wait_for_response to time out sleeping processes correctly
The current scheme that CIFS uses to sleep and wait for a response is
not quite what we want. After sending a request, wait_for_response puts
the task to sleep with wait_event(). One of the conditions for
wait_event is a timeout (using time_after()).
The problem with this is that there is no guarantee that the process
will ever be woken back up. If the server stops sending data, then
cifs_demultiplex_thread will leave its response queue sleeping.
I think the only thing that saves us here is the fact that
cifs_dnotify_thread periodically (every 15s) wakes up sleeping processes
on all response_q's that have calls in flight. This makes for
unnecessary wakeups of some processes. It also means large variability
in the timeouts since they're all woken up at once.
Instead of this, put the tasks to sleep with wait_event_timeout. This
makes them wake up on their own if they time out. With this change,
cifs_dnotify_thread should no longer be needed.
I've been testing this in conjunction with some other patches that I'm
working on. It doesn't seem to affect performance at all with with heavy
I/O. Identical iozone -ac runs complete in almost exactly the same time
(<1% difference in times).
Thanks to Wasrshi Nimara for initially pointing this out. Wasrshi, it
would be nice to know whether this patch also helps your testcase.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Wasrshi Nimara <warshinimara@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Windows allows you to deny access to the top of a share, but permit access to
a directory lower in the path. With the prefixpath feature of cifs
(ie mounting \\server\share\directory\subdirectory\etc.) this should have
worked if the user specified a prefixpath which put the root of the mount
at a directory to which he had access, but we still were doing a lookup
on the root of the share (null path) when we should have been doing it on
the prefixpath subdirectory.
This fixes Samba bug # 5925
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Some applications/subsystems require mandatory byte range locks
(as is used for Windows/DOS/OS2 etc). Sending advisory (posix style)
byte range lock requests (instead of mandatory byte range locks) can
lead to problems for these applications (which expect that other
clients be prevented from writing to portions of the file which
they have locked and are updating). This mount option allows
mounting cifs with the new mount option "forcemand" (or
"forcemandatorylock") in order to have the cifs client use mandatory
byte range locks (ie SMB/CIFS/Windows/NTFS style locks) rather than
posix byte range lock requests, even if the server would support
posix byte range lock requests. This has no effect if the server
does not support the CIFS Unix Extensions (since posix style locks
require support for the CIFS Unix Extensions), but for mounts
to Samba servers this can be helpful for Wine and applications
that require mandatory byte range locks.
Acked-by: Jeff Layton <jlayton@redhat.com>
CC: Alexander Bokovoy <ab@samba.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
In order to unify the smb_send routines, we need to reorganize the
routines that connect the sockets. Have ipv4_connect take a
TCP_Server_Info pointer and get the necessary fields from that.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
struct smb_vol is fairly large, it's probably best to kzalloc it...
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Clean up cifs_mount a bit by moving the code that creates new TCP
sessions into a separate function. Have that function search for an
existing socket and then create a new one if one isn't found.
Also reorganize the initializion of TCP_Server_Info a bit to prepare
for cleanup of the socket connection code.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
The current code for setting the session serverName is IPv4-specific.
Allow it to be an IPv6 address as well. Use NIP* macros to set the
format.
This also entails increasing the length of the serverName field, so
declare a new macro for RFC1001 name length and use it in the
appropriate places.
Finally, drop the unicode_server_Name field from TCP_Server_Info since
it's not used. We can add it back later if needed, but for now it just
wastes memory.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Now that tasks sleeping in wait_for_response will time out on their own,
we're not reliant on the dnotify thread to do this. Mark it as
experimental code for now.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
cifsd can outlive the last cifs mount. We need to hold a module
reference until it exits to prevent someone from unplugging
the module until we're ready.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Have cifs_show_options display the addr and prefixpath options in
/proc/mounts. Reduce struct dereferencing by adding some local
variables.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
While testing a kernel with memory poisoning enabled, I saw some warnings
about the redzone getting clobbered when chasing DFS referrals. The
buffer allocation for the unicode converted version of the searchName is
too small and needs to take null termination into account.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The conversion to write_begin/write_end interfaces had a bug where we
were passing a bad parameter to cifs_readpage_worker. Rather than
passing the page offset of the start of the write, we needed to pass the
offset of the beginning of the page. This was reliably showing up as
data corruption in the fsx-linux test from LTP.
It also became evident that this code was occasionally doing unnecessary
read calls. Optimize those away by using the PG_checked flag to indicate
that the unwritten part of the page has been initialized.
CC: Nick Piggin <npiggin@suse.de>
Acked-by: Dave Kleikamp <shaggy@us.ibm.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
If a connection with open file handles has gone down
and come back up and reconnected without reopening
the file handle yet, do not attempt to send an SMB close
request for this handle in cifs_close. We were
checking for the connection being invalid in cifs_close
but since the connection may have been reconnected
we also need to check whether the file handle
was marked invalid (otherwise we could close the
wrong file handle by accident).
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
This was recently changed to check for need_reconnect, but should
actually be a check for a tidStatus of CifsExiting.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Fixes a data corruption under heavy stress in which pages could be left
dirty after all open instances of a inode have been closed.
In order to write contiguous pages whenever possible, cifs_writepages()
asks pagevec_lookup_tag() for more pages than it may write at one time.
Normally, it then resets index just past the last page written before calling
pagevec_lookup_tag() again.
If cifs_writepages() can't write the first page returned, it wasn't resetting
index, and the next call to pagevec_lookup_tag() resulted in skipping all of
the pages it previously returned, even though cifs_writepages() did nothing
with them. This can result in data loss when the file descriptor is about
to be closed.
This patch ensures that index gets set back to the next returned page so
that none get skipped.
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Cc: Shirish S Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Since these hit the same routines, and are relatively small, it is easier to review
them as one patch.
Fixed incorrect handling of the last option in some cases
Fixed prefixpath handling convert path_consumed into host depended string length (in bytes)
Use non default separator if it is provided in the original mount options
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Igor Mammedov <niallain@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
set tcon->ses earlier
If the inital tree connect fails, we'll end up calling cifs_put_smb_ses
with a NULL pointer. Fix it by setting the tcon->ses earlier.
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Use a similar approach to the SMB session sharing. Add a list of tcons
attached to each SMB session. Move the refcount to non-atomic. Protect
all of the above with the cifs_tcp_ses_lock. Add functions to
properly find and put references to the tcons.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
We do this by abandoning the global list of SMB sessions and instead
moving to a per-server list. This entails adding a new list head to the
TCP_Server_Info struct. The refcounting for the cifsSesInfo is moved to
a non-atomic variable. We have to protect it by a lock anyway, so there's
no benefit to making it an atomic. The list and refcount are protected
by the global cifs_tcp_ses_lock.
The patch also adds a new routines to find and put SMB sessions and
that properly take and put references under the lock.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
The code that allows these structs to be shared is extremely racy.
Disable the sharing of SMB and tcon structs for now until we can
come up with a way to do this that's race free.
We want to continue to share TCP sessions, however since they are
required for multiuser mounts. For that, implement a new (hopefully
race-free) scheme. Add a new global list of TCP sessions, and take
care to get a reference to it whenever we're dealing with one.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
We're currently declaring both a sockaddr_in and sockaddr6_in on the
stack, but we really only need storage for one of them. Declare a
sockaddr struct and cast it to the proper type. Also, eliminate the
protocolType field in the TCP_Server_Info struct. It's redundant since
we have a sa_family field in the sockaddr anyway.
We may need to revisit this if SCTP is ever implemented, but for now
this will simplify the code.
CIFS over IPv6 also has a number of problems currently. This fixes all
of them that I found. Eventually, it would be nice to move more of the
code to be protocol independent, but this is a start.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Also adds two lines missing from the previous patch (for the need reconnect flag in the
/proc/fs/cifs/DebugData handling)
The new global_cifs_sock_list is added, and initialized in init_cifs but not used yet.
Jeff Layton will be adding code in to use that and to remove the GlobalTcon and GlobalSMBSession
lists.
CC: Jeff Layton <jlayton@redhat.com>
CC: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
In preparation for Jeff's big umount/mount fixes to remove the possibility of
various races in cifs mount and linked list handling of sessions, sockets and
tree connections, this patch cleans up some repetitive code in cifs_mount,
and addresses a problem with ses->status and tcon->tidStatus in which we
were overloading the "need_reconnect" state with other status in that
field. So the "need_reconnect" flag has been broken out from those
two state fields (need reconnect was not mutually exclusive from some of the
other possible tid and ses states). In addition, a few exit cases in
cifs_mount were cleaned up, and a problem with a tcon flag (for lease support)
was not being set consistently for the 2nd mount of the same share
CC: Jeff Layton <jlayton@redhat.com>
CC: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
When retrying kernel_recvmsg, reset iov_base and iov_len.
Note comment from Sridhar: "In the normal path, iov.iov_len is clearly set to 4. But i think you are
running into a case where kernel_recvmsg() is called via 'goto incomplete_rcv'
It happens if the previous call fails with EAGAIN.
If you want to call recvmsg() after EAGAIN failure, you need to reset iov."
Signed-off-by: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
cifs: fix renaming one hardlink on top of another
POSIX says that renaming one hardlink on top of another to the same
inode is a no-op. We had the logic mostly right, but forgot to clear
the return code.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
smb_send2 exit logic was strange, and with the previous change
could cause us to fail large
smb writes when all of the smb was not sent as one chunk.
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
CIFS in some heavy stress conditions cifs could get EAGAIN
repeatedly in smb_send2 which led to repeated retries and eventually
failure of large writes which could lead to data corruption.
There are three changes that were suggested by various network
developers:
1) convert cifs from non-blocking to blocking tcp sendmsg
(we left in the retry on failure)
2) change cifs to not set sendbuf and rcvbuf size for the socket
(let tcp autotune the buffer sizes since that works much better
in the TCP stack now)
3) if we have a partial frame sent in smb_send2, mark the tcp
session as invalid (close the socket and reconnect) so we do
not corrupt the remaining part of the SMB with the beginning
of the next SMB.
This does not appear to hurt performance measurably and has
been run in various scenarios, but it definately removes
a corruption that we were seeing in some high stress
test cases.
Acked-by: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
cifs: handle the TCP_Server_Info->tsk field more carefully
cifs: fix unlinking of rename target when server doesn't support open file renames
[CIFS] improve setlease handling
[CIFS] fix saving of resume key before CIFSFindNext
cifs: make cifs_rename handle -EACCES errors
[CIFS] fix build error
[CIFS] undo changes in cifs_rename_pending_delete if it errors out
cifs: track DeletePending flag in cifsInodeInfo
cifs: don't use CREATE_DELETE_ON_CLOSE in cifs_rename_pending_delete
[CIFS] eliminate usage of kthread_stop for cifsd
[CIFS] Add nodfs mount option
For execute permission on a regular files we need to check if file has
any execute bits at all, regardless of capabilites.
This check is normally performed by generic_permission() but was also
added to the case when the filesystem defines its own ->permission()
method. In the latter case the filesystem should be responsible for
performing this check.
Move the check from inode_permission() inside filesystems which are
not calling generic_permission().
Create a helper function execute_ok() that returns true if the inode
is a directory or if any execute bits are present in i_mode.
Also fix up the following code:
- coda control file is never executable
- sysctl files are never executable
- hfs_permission seems broken on MAY_EXEC, remove
- hfsplus_permission is eqivalent to generic_permission(), remove
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
With this patch all directory fops instances that have a readdir
that doesn't take the BKL are switched to generic_file_llseek.
Signed-off-by: Christoph Hellwig <hch@lst.de>
cifs: handle the TCP_Server_Info->tsk field more carefully
We currently handle the TCP_Server_Info->tsk field without any locking,
but with some half-measures to try and prevent races. These aren't
really sufficient though. When taking down cifsd, use xchg() to swap
the contents of the tsk field with NULL so we don't end up trying
to send it more than one signal. Also, don't allow cifsd to exit until
the signal is received if we expect one.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
cifs: fix unlinking of rename target when server doesn't support open file renames
The patch to make cifs_rename undoable broke renaming one file on top of
another when the server doesn't support busy file renames. Remove the
code that uses busy file renames to unlink the target file, and just
have it call cifs_unlink. If the rename of the source file fails, then
the unlink won't be undoable, but hopefully that's rare enough that it
won't be a problem.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
fcntl(F_SETLEASE) currently is not exported by cifs (nor by local file
systems) so cifs grants leases based on how other local processes have
opened the file not by whether the file is cacheable (oplocked). This
adds the check to make sure that the file is cacheable on the client
before checking whether we can grant the lease locally
(generic_setlease). It also adds a mount option for cifs (locallease)
if the user wants to override this and try to grant leases even
if the server did not grant oplock.
Signed-off-by: Steve French <sfrench@us.ibm.com>
We recently fixed the cifs readdir code so that it saves the resume key
before calling CIFSFindNext. Unfortunately, this assumes that we have
just done a CIFSFindFirst (or FindNext) and have resume info to save.
This isn't necessarily the case. Fix the code to save resume info if we
had to reinitiate the search, and after a FindNext.
This fixes connectathon basic test6 against NetApp filers.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
CC: Stable <stable@kernel.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
cifs: make cifs_rename handle -EACCES errors
Some servers seem to return -EACCES when attempting to rename one
open file on top of another. Refactor the cifs_rename logic to
attempt to rename the target file out of the way in this situation.
This also fixes the "unlink_target" logic to be undoable if the
subsequent rename fails.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Split the LRU lists in two, one set for pages that are backed by real file
systems ("file") and one for pages that are backed by memory and swap
("anon"). The latter includes tmpfs.
The advantage of doing this is that the VM will not have to scan over lots
of anonymous pages (which we generally do not want to swap out), just to
find the page cache pages that it should evict.
This patch has the infrastructure and a basic policy to balance how much
we scan the anon lists and how much we scan the file lists. The big
policy changes are in separate patches.
[lee.schermerhorn@hp.com: collect lru meminfo statistics from correct offset]
[kosaki.motohiro@jp.fujitsu.com: prevent incorrect oom under split_lru]
[kosaki.motohiro@jp.fujitsu.com: fix pagevec_move_tail() doesn't treat unevictable page]
[hugh@veritas.com: memcg swapbacked pages active]
[hugh@veritas.com: splitlru: BDI_CAP_SWAP_BACKED]
[akpm@linux-foundation.org: fix /proc/vmstat units]
[nishimura@mxp.nes.nec.co.jp: memcg: fix handling of shmem migration]
[kosaki.motohiro@jp.fujitsu.com: adjust Quicklists field of /proc/meminfo]
[kosaki.motohiro@jp.fujitsu.com: fix style issue of get_scan_ratio()]
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The cifs_rename_pending_delete process involves multiple steps. If it
fails and we're going to return error, we don't want to leave things in
a half-finished state. Add code to the function to undo changes if
a call fails.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
cifs: track DeletePending flag in cifsInodeInfo
The QPathInfo call returns a flag that indicates whether DELETE_ON_CLOSE
is set. Track it in the cifsInodeInfo.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
cifs: don't use CREATE_DELETE_ON_CLOSE in cifs_rename_pending_delete
CREATE_DELETE_ON_CLOSE apparently has different semantics than when you
set the DELETE_ON_CLOSE bit after opening the file. Setting it in the
open says "delete this file as soon as this filehandle is closed". That's
not what we want for cifs_rename_pending_delete.
Don't set this bit in the CreateFlags. Experimentation shows that
setting this flag in the SET_FILE_INFO call has no effect.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
When cifs_demultiplex_thread was converted to a kthread based kernel
thread, great pains were taken to make it so that kthread_stop would be
used to bring it down. This just added unnecessary complexity since we
needed to use a signal anyway to break out of kernel_recvmsg.
Also, cifs_demultiplex_thread does a bit of cleanup as it's exiting, and
we need to be certain that this gets done. It's possible for a kthread
to exit before its main function is ever run if kthread_stop is called
soon after its creation. While I'm not sure that this is a real problem
with cifsd now, it could be at some point in the future if cifs_mount is
ever changed to bring down the thread quickly.
The upshot here is that using kthread_stop to bring down the thread just
adds extra complexity with no real benefit. This patch changes the code
to use the original method to bring down the thread, but still leaves it
so that the thread is actually started with kthread_run.
This seems to fix the deadlock caused by the reproducer in this bug
report:
https://bugzilla.samba.org/show_bug.cgi?id=5720
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>