osd_request_timeout specifies how many seconds to wait for a response
from OSDs before returning -ETIMEDOUT from an OSD request. 0 (default)
means no limit.
osd_request_timeout is osdkeepalive-precise -- in-flight requests are
swept through every osdkeepalive seconds. With ack vs commit behaviour
gone, abort_request() is really simple.
This is based on a patch from Artur Molchanov <artur.molchanov@synesis.ru>.
Tested-by: Artur Molchanov <artur.molchanov@synesis.ru>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
... so that userspace can generate meaningful error messages and spell
out unsupported features that need to be disabled.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Since ceph.git commit 4e28f9e63644 ("osd/OSDMap: clear osd_info,
osd_xinfo on osd deletion"), weight is set to IN when OSD is deleted.
This changes the result of applying an incremental for clients, not
just OSDs. Because CRUSH computations are obviously affected,
pre-4e28f9e63644 servers disagree with post-4e28f9e63644 clients on
object placement, resulting in misdirected requests.
Mirrors ceph.git commit a6009d1039a55e2c77f431662b3d6cc5a8e8e63f.
Fixes: 930c532869 ("libceph: apply new_state before new_up_client on incrementals")
Link: http://tracker.ceph.com/issues/19122
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Older (shorter) CRUSH maps too need to be finalized.
Fixes: 66a0e2d579 ("crush: remove mutable part of CRUSH map")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
commit 93825f2ec7 converted NSEC_PER_SEC to TICK_NSEC because the author
confused NSEC_PER_JIFFY with NSEC_PER_SEC.
As a result, the calculation of refined jiffies got broken, triggering
lockups.
Fixes: 93825f2ec7 ("jiffies: Reuse TICK_NSEC instead of NSEC_PER_JIFFY")
Reported-and-tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1488880534-3777-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Arnd Bergmann reported a (false positive) objtool warning:
drivers/infiniband/sw/rxe/rxe_resp.o: warning: objtool: rxe_responder()+0xfe: sibling call from callable instruction with changed frame pointer
The issue is in find_switch_table(). It tries to find a switch
statement's jump table by walking backwards from an indirect jump
instruction, looking for a relocation to the .rodata section. In this
case it stopped walking prematurely: the first .rodata relocation it
encountered was for a variable (resp_state_name) instead of a jump
table, so it just assumed there wasn't a jump table.
The fix is to ignore any .rodata relocation which refers to an ELF
object symbol. This works because the jump tables are anonymous and
have no symbols associated with them.
Reported-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 3732710ff6 ("objtool: Improve rare switch jump table pattern detection")
Link: http://lkml.kernel.org/r/20170302225723.3ndbsnl4hkqbne7a@treble
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fix:
drivers/char/nwbutton.c: In function 'button_sequence_finished':
drivers/char/nwbutton.c:134:3: error: implicit declaration of function 'kill_cad_pid'
The declaration has been moved from one include file to another.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: c3edc4010e ("sched/headers: Move task_struct::signal and ...")
Link: http://lkml.kernel.org/r/1488762811-9022-1-git-send-email-linux@roeck-us.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Revise lpfc version number to 11.2.0.10
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch addresses the smatch issues identified by Dan Carpenter
in http://www.spinics.net/lists/linux-scsi/msg105665.html
The issues are:
drivers/scsi/lpfc/lpfc_ct.c:943 lpfc_cmpl_ct_cmd_gft_id()
error: we previously assumed 'ndlp' could be null (see line 928)
Action: moved under if check
drivers/scsi/lpfc/lpfc_nvmet.c:1694 lpfc_nvmet_unsol_issue_abort()
error: we previously assumed 'ndlp' could be null (see line 1690)
Action: conditionalized arg in printf stmt
drivers/scsi/lpfc/lpfc_nvmet.c:1792 lpfc_nvmet_sol_fcp_issue_abort()
error: we previously assumed 'ndlp' could be null (see line 1788)
Action: conditionalized arg in printf stmt
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch addresses the smatch issues identified by Dan Carpenter
in http://www.spinics.net/lists/linux-scsi/msg105663.html
The issues are:
drivers/scsi/lpfc/lpfc_hbadisc.c:316 lpfc_dev_loss_tmo_handler()
warn: we tested 'vport->load_flag & 2' before and it was 'false'
Action: removed item from test
drivers/scsi/lpfc/lpfc_hbadisc.c:701 lpfc_work_done()
warn: test_bit() takes a bit number
Action: changed definition so bit number
drivers/scsi/lpfc/lpfc_hbadisc.c:2206 lpfc_mbx_cmpl_fcf_scan_read_fcf_rec()
error: uninitialized symbol 'vlan_id'.
drivers/scsi/lpfc/lpfc_hbadisc.c:2582 lpfc_mbx_cmpl_fcf_rr_read_fcf_rec()
error: uninitialized symbol 'vlan_id'.
drivers/scsi/lpfc/lpfc_hbadisc.c:2683 lpfc_mbx_cmpl_read_fcf_rec() error:
uninitialized symbol 'vlan_id'.
Action: initilized value
drivers/scsi/lpfc/lpfc_hbadisc.c:4025 lpfc_register_remote_port()
error: we previously assumed 'rdata' could be null (see line 4023)
Action: refactored check block
drivers/scsi/lpfc/lpfc_hbadisc.c:4613 lpfc_sli4_dequeue_nport_iocbs()
error: double unlock 'irq:'
Action: removed inner irq reference
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
NVME merge reverted diag port names to the physical port.
They should be the vport.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Remove nvme teardown calls that should not be there on sli3 devices
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Correct a merge error that had debug data printed twice for the
same element
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Without apriori understanding of what the define is, the name gives
a very different impression of what it is (a max delay value
for an EQ). Rename the define so it reflects what it is: the number
of EQ IDs that can be set in one instance of the MODIFY_EQ_DELAY
mbx command.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reworked Kconfig so that lfpc only requires the scsi stack.
NVME Initiator and NVME Target support can be enabled if
the other NVMe subsystems have been enabled.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Christoph's prior patch missed the template for the sli3 adapters,
which is now the "no host reset" template. Add the transport
eh_timed_out handler to the no host reset template
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A previous change unilaterally removed the hba reset entry point
from the sli3 host template. This was done to allow tape devices
being used for back up from being removed. Why was this done ?
When there was non-responding device on the fabric, the error
escalation policy would escalate to the reset handler. When the
reset handler was called, it would reset the adapter, dropping
link, thus logging out and terminating all i/o's - on any target.
If there was a tape device on the same adapter that wasn't in
error, it would kill the tape i/o's, effectively killing the
tape device state. With the reset point removed, the adapter
reset avoided the fabric logout, allowing the other devices to
continue to operate unaffected. A hack - yes. Hint: we really
need a transport I_T nexus reset callback added to the eh process
(in between the SCSI target reset and hba reset points), so a
fc logout could occur to the one bad target only and stop the error
escalation process.
This patch commonizes the approach so it can be used for sli3 and sli4
adapters, but mandates the admin, via module parameter, specifically
identify which adapters the resets are to be removed for. Additionally,
bus_reset, which sends Target Reset TMFs to all targets, is also removed
from the template as it too has the same effect as the adapter reset.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
previous code did little more than log a message.
This patch adds abort path support, modeled after the SCSI code paths.
Currently addresses only the initiator path. Target path under
development, but stubbed out.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
nvme bufs get allocated even when the registration fails.
Move allocation into the rsgistration success path.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For both initiator and target: if WQ is full, return -EBUSY.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Word 1 in NVME CMD IU appears byte swapped from value placed in WQE
Should be Big Endian value in WQE word 16
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
NVME LS requests and responses had wrong R_CTL values.
Use the FC4 ELS Request and Response defines (defines badly
named, they are FC4 LS's) instead of the base ELS values.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
if REG_VPI fails, the driver was incorrectly issuing INIT_VFI
(a SLI4 command) on a SLI3 adapter.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
From: Colin Ian King <colin.king@canonical.com>
In the case where sglq is null, the current code just returns without
unlocking the spinlock sql_list_lock. Fix this by breaking out of the
while loop and the exit path will then unlock and return NULL as was
the original intention.
Detected by CoverityScan, CID#1411635 ("Missing unlock")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
From: Colin Ian King <colin.king@canonical.com>
dma_buf->iocbq is being dereferenced immediately before it is
being null checked, so we have a potential null pointer dereference
bug. Fix this by only dereferencing it only once we have passed
a null check on the pointer.
Detected by CoverityScan, CID#1411652 ("Dereference before null check")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
From: Colin Ian King <colin.king@canonical.com>
The sanity check for hrq should be moved to before the deference
of hrq to ensure we don't perform a null pointer deference.
Detected by CoverityScan, CID#1411650 ("Dereference before null check")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
From: Colin Ian King <colin.king@canonical.com>
In the NVMET_FCOP_RSP case, sgel is assigned but never used and
hence is redundant and can be removed.
Detected by CoverityScan, CID#1411658 ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch shortens every init_timer in lpfc module followed by function
and data assignment using setup_timer. This is purely cleanup patch, it
does not add new functionality nor remove any existing functionality.
An init_timer call in this form:
init_timer(&vport->fc_disctmo);
vport->fc_disctmo.function = lpfc_disc_timeout;
vport->fc_disctmo.data = vport;
is shortened to:
setup_timer(&vport->fc_disctmo, lpfc_disc_timeout, vport);
It increases readability and reduces chances of mistakes done by
developers.
Signed-off-by: Tomas Jasek <tomsik68@gmail.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: James Smart <james.smart@broadcom.com>
Cc: Dick Kennedy <dick.kennedy@broadcom.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: <linux-scsi@vger.kernel.org>
Acked-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Using %llx for a dma_addr_t can lead to format/argument mismatches. Use
%pad and the address of the dma_addr_t instead.
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add __printf compiler verification of format and arguments. Fix
fallout.
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
It returns the number of vectors allocated when successful, so check for
a negative error only.
Fixes: 2e48e349 ("scsi: vmw_pvscsi: switch to pci_alloc_irq_vectors")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Loïc Yhuel <loic.yhuel@gmail.com>
Tested-by: Loïc Yhuel <loic.yhuel@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Recent printk changes for KERN_CONT cause this logging to be defectively
emitted on multiple lines. Fix it.
Also reduces object size a trivial amount.
$ size drivers/scsi/qla2xxx/qla_dbg.o*
text data bss dec hex filename
39125 0 0 39125 98d5 drivers/scsi/qla2xxx/qla_dbg.o.new
39164 0 0 39164 98fc drivers/scsi/qla2xxx/qla_dbg.o.old
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The return status of the adapter check on KERNEL_PANIC is supposed to be
the upper 16 bits of the OMR status register.
Fixes: c421530bf8 (scsi: aacraid: Reorder Adpater status check)
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Without CONFIG_DEBUG_FS, we run into a link error:
drivers/scsi/qedi/qedi_iscsi.o: In function `qedi_ep_poll':
qedi_iscsi.c:(.text.qedi_ep_poll+0x134): undefined reference to `do_not_recover'
drivers/scsi/qedi/qedi_iscsi.o: In function `qedi_ep_disconnect':
qedi_iscsi.c:(.text.qedi_ep_disconnect+0x36c): undefined reference to `do_not_recover'
drivers/scsi/qedi/qedi_iscsi.o: In function `qedi_ep_connect':
qedi_iscsi.c:(.text.qedi_ep_connect+0x350): undefined reference to `do_not_recover'
drivers/scsi/qedi/qedi_fw.o: In function `qedi_tmf_work':
qedi_fw.c:(.text.qedi_tmf_work+0x3b4): undefined reference to `do_not_recover'
This defines the symbol as a constant in this case, as there is no way to
set it to anything other than zero without DEBUG_FS. In addition, I'm renaming
it to qedi_do_not_recover in order to put it into a driver specific namespace,
as "do_not_recover" is a really bad name for a kernel-wide global identifier
when it is used only in one driver.
Fixes: ace7f46ba5 ("scsi: qedi: Add QLogic FastLinQ offload iSCSI driver framework.")
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Manish Rangankar <Manish.Rangankar@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Always increment/decrement ucount->count under the ucounts_lock. The
increments are there already and moving the decrements there means the
locking logic of the code is simpler. This simplification in the
locking logic fixes a race between put_ucounts and get_ucounts that
could result in a use-after-free because the count could go zero then
be found by get_ucounts and then be freed by put_ucounts.
A bug presumably this one was found by a combination of syzkaller and
KASAN. JongWhan Kim reported the syzkaller failure and Dmitry Vyukov
spotted the race in the code.
Cc: stable@vger.kernel.org
Fixes: f6b2db1a3e ("userns: Make the count of user namespaces per user")
Reported-by: JongHwan Kim <zzoru007@gmail.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
pcpu_get_pages() doesn't use chunk_alloc parameter, remove it.
Fixes: fbbb7f4e14 ("percpu: remove the usage of separate populated bitmap in percpu-vm")
Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Update to pcpu_nr_empty_pop_pages in pcpu_alloc() is currently done
without holding pcpu_lock. This can lead to bad updates to the variable.
Add missing lock calls.
Fixes: b539b87fed ("percpu: implmeent pcpu_nr_empty_pop_pages and chunk->nr_populated")
Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org # v3.18+
If queue_delayed_work() gets called with NULL @wq, the kernel will
oops asynchronuosly on timer expiration which isn't too helpful in
tracking down the offender. This actually happened with smc.
__queue_delayed_work() already does several input sanity checks
synchronously. Add NULL @wq check.
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Link: http://lkml.kernel.org/r/20170227171439.jshx3qplflyrgcv7@codemonkey.org.uk
Signed-off-by: Tejun Heo <tj@kernel.org>
ata_sff_qc_issue() expects upper layers to never issue commands on a
command protocol that it doesn't implement. While the assumption
holds fine with the usual IO path, nothing filters based on the
command protocol in the passthrough path (which was added later),
allowing the warning to be tripped with a passthrough command with the
right (well, wrong) protocol.
Failing with AC_ERR_SYSTEM is the right thing to do anyway. Remove
the unnecessary WARN.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Link: http://lkml.kernel.org/r/CACT4Y+bXkvevNZU8uP6X0QVqsj6wNoUA_1exfTSOzc+SmUtMOA@mail.gmail.com
Signed-off-by: Tejun Heo <tj@kernel.org>
Without this patch, failed probe would not free resources like irq.
ata port tdev object currently hold a reference to the ata port
object. Therefore the ata port object release function will not get
called until the ata_tport_release is called. But that would never
happen, releasing the last reference of ata port dev is done by
scsi_host_release, which is called by ata_host_release when the ata
port object is released.
The ata device objects actually do not need to explicitly hold a
reference to their real counterpart, given the transport objects are
the children of these objects and device_add() is call for each child.
We know the parent will not be deleted until we call the child's
device_del().
Reported-by: Matthew Whitehead <tedheadster@gmail.com>
Tested-by: Matthew Whitehead <tedheadster@gmail.com>
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Gwendal Grignou <gwendal@chromium.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
As found in grsecurity, this avoids exposing a kernel pointer through
the cgroup debug entries.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
pids_can_fork() is special in that the css association is guaranteed
to be stable throughout the function and thus doesn't need RCU
protection around task_css access. When determining the css to charge
the pid, task_css_check() is used to override the RCU sanity check.
While adding a warning message on fork rejection from pids limit,
135b8b37bd ("cgroup: Add pids controller event when fork fails
because of pid limit") incorrectly added a task_css access which is
neither RCU protected or explicitly annotated. This triggers the
following suspicious RCU usage warning when RCU debugging is enabled.
cgroup: fork rejected by pids controller in
===============================
[ ERR: suspicious RCU usage. ]
4.10.0-work+ #1 Not tainted
-------------------------------
./include/linux/cgroup.h:435 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 0
1 lock held by bash/1748:
#0: (&cgroup_threadgroup_rwsem){+++++.}, at: [<ffffffff81052c96>] _do_fork+0xe6/0x6e0
stack backtrace:
CPU: 3 PID: 1748 Comm: bash Not tainted 4.10.0-work+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014
Call Trace:
dump_stack+0x68/0x93
lockdep_rcu_suspicious+0xd7/0x110
pids_can_fork+0x1c7/0x1d0
cgroup_can_fork+0x67/0xc0
copy_process.part.58+0x1709/0x1e90
_do_fork+0xe6/0x6e0
SyS_clone+0x19/0x20
do_syscall_64+0x5c/0x140
entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7f7853fab93a
RSP: 002b:00007ffc12d05c90 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7853fab93a
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
RBP: 00007ffc12d05cc0 R08: 0000000000000000 R09: 00007f78548db700
R10: 00007f78548db9d0 R11: 0000000000000246 R12: 00000000000006d4
R13: 0000000000000001 R14: 0000000000000000 R15: 000055e3ebe2c04d
/asdf
There's no reason to dereference task_css again here when the
associated css is already available. Fix it by replacing the
task_cgroup() call with css->cgroup.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Mike Galbraith <efault@gmx.de>
Fixes: 135b8b37bd ("cgroup: Add pids controller event when fork fails because of pid limit")
Cc: Kenny Yu <kennyyu@fb.com>
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Tejun Heo <tj@kernel.org>
Follow the common documentation style in the file and indent the
interface file description by a tab instead of just a space.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Tejun Heo <tj@kernel.org>
After XFS switching to iomap based DIO (commit acdda3aae1 ("xfs:
use iomap_dio_rw")), I started to notice dio29/dio30 tests failures
from LTP run on ppc64 hosts, and they can be reproduced on x86_64
hosts with 512B/1k block size XFS too.
dio29 diotest3 -b 65536 -n 100 -i 1000 -o 1024000
dio30 diotest6 -b 65536 -n 100 -i 1000 -o 1024000
The failure message is like:
bufcmp: offset 0: Expected: 0x62, got 0x0
diotest03 1 TPASS : Read with Direct IO, Write without
diotest03 2 TFAIL : diotest3.c:142: comparsion failed; child=98 offset=1425408
diotest03 3 TFAIL : diotest3.c:194: Write Direct-child 98 failed
Direct write wrote 0x62 but buffer read got zero. This is because,
when doing direct write to a hole or preallocated file, we
invalidate the page caches before converting the extent from
unwritten state to normal state, which is done by
iomap_dio_complete(), thus leave a window for other buffer reader to
cache the unwritten state extent.
Consider this case, with sub-page blocksize XFS, two processes are
direct writing to different blocksize-aligned regions (say 512B) of
the same preallocated file, and reading the region back via buffered
I/O to compare contents.
process A, region [0,512] process B, region [512,1024]
xfs_file_write_iter
xfs_file_aio_dio_write
iomap_dio_rw
iomap_apply
invalidate_inode_pages2_range
xfs_file_write_iter
xfs_file_aio_dio_write
iomap_dio_rw
iomap_apply
invalidate_inode_pages2_range
iomap_dio_complete
xfs_file_read_iter
xfs_file_buffered_aio_read
generic_file_read_iter
do_generic_file_read
<readahead fills pagecache with 0>
iomap_dio_complete
xfs_file_read_iter
<read gets 0 from pagecache>
Process A first invalidates page caches, at this point the
underlying extent is still in unwritten state (iomap_dio_complete
not called yet), and process B finishs direct write and populates
page caches via readahead, which caches zeros in page for region A,
then process A reads zeros from page cache, instead of the actual
data.
Fix it by invalidating page caches after converting unwritten extent
to make sure we read content from disk after extent state changed,
as what we did before switching to iomap based dio.
Also introduce a new 'start' variable to save the original write
offset (iomap_dio_complete() updates iocb->ki_pos), and a 'err'
variable for invalidating caches result, cause we can't reuse 'ret'
anymore.
Signed-off-by: Eryu Guan <eguan@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
VMCLEAR should silently ignore a failure to clear the launch state of
the VMCS referenced by the operand.
Signed-off-by: Jim Mattson <jmattson@google.com>
[Changed "kvm_write_guest(vcpu->kvm" to "kvm_vcpu_write_guest(vcpu".]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
In the function scan_dma_completions() there is a reusage of tmp
variable. That coused a wrong value being used in some case when
reading a short packet terminated transaction from an endpoint,
in 2 concecutive reads.
This was my logic for the patch:
The req->td->dmadesc equals to 0 iff:
-- There was a transaction ending with a short packet, and
-- The read() to read it was shorter than the transaction length, and
-- The read() to complete it is longer than the residue.
I believe this is true from the printouts of various cases,
but I can't be positive it is correct.
Entering this if, there should be no more data in the endpoint
(a short packet terminated the transaction).
If there is, the transaction wasn't really done and we should exit and
wait for it to finish entirely. That is the inner if.
That inner if should never happen, but it is there to be on the safe
side. That is why it is marked with the comment /* paranoia */.
The size of the data available in the endpoint is ep->dma->dmacount
and it is read to tmp.
This entire clause is based on my own educated guesses.
If we passed that inner if without breaking in the original code,
than tmp & DMA_BYTE_MASK_COUNT== 0.
That means we will always pass dma bytes count of 0 to dma_done(),
meaning all the requested bytes were read.
dma_done() reports back to the upper layer that the request (read())
was done and how many bytes were read.
In the original code that would always be the request size,
regardless of the actual size of the data.
That did not make sense to me at all.
However, the original value of tmp is req->td->dmacount,
which is the dmacount value when the request's dma transaction was
finished. And that is a much more reasonable value to report back to
the caller.
To recreate the problem:
Read from a bulk out endpoint in a loop, 1024 * n bytes in each
iteration.
Connect the PLX to a host you can control.
Send to that endpoint 1024 * n + x bytes,
such that 0 < x < 1024 * n and (x % 1024) != 0
You would expect the first read() to return 1024 * n
and the second read() to return x.
But you will get the first read to return 1024 * n
and the second one to return 1024 * n.
That is true for every positive integer n.
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-usb@vger.kernel.org
Signed-off-by: Raz Manor <Raz.Manor@valens.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
A call usb_put_phy(udc->transceiver) must be tested for a valid pointer.
Use an already existing test for usb_unregister_notifier call.
Acked-by: Robert Jarzmik <robert.jarzmik@free.fr>
Reported-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Petr Cvek <petr.cvek@tul.cz>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
We need to break from all cases if we want to treat
each one of them separately.
Reported-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Fixes: d2728fb3e0 ("usb: dwc3: omap: Pass VBUS and ID events transparently")
Cc: <stable@vger.kernel.org> #v4.8+
Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>