currently cbq works incorrectly for limits > 10% real link bandwidth,
and practically does not work for limits > 50% real link bandwidth.
Below are results of experiments taken on 1 Gbit link
In shaper | Actual Result
-----------+---------------
100M | 108 Mbps
200M | 244 Mbps
300M | 412 Mbps
500M | 893 Mbps
This happen because of q->now changes incorrectly in cbq_dequeue():
when it is called before real end of packet transmitting,
L2T is greater than real time delay, q_now gets an extra boost
but never compensate it.
To fix this problem we prevent change of q->now until its synchronization
with real time.
Signed-off-by: Vasily Averin <vvs@openvz.org>
Reviewed-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
warning: 'pf_info.max_tx_ques' may be used uninitialized in this function
[-Wmaybe-uninitialized]
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
o Fix for smatch tool reported error
drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c:2029
qlcnic_probe() error: potential NULL dereference 'adapter'.
o While returning from an error path in probe, adapter is not
initialized. So do not access adapter in cleanup path.
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[net-next:master 301/302] drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.c:563
qlcnic_set_multi() error: potential null dereference 'cur'. (kzalloc returns null)
o Break out of the loop after memory allocation failure. Program all the
MAC addresses that were cached in the return path.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove a declaration left over from the TCPCT-ectomy. This sysctl is
no longer referenced anywhere since 1a2c6181c4 ("tcp: Remove TCPCT").
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure that msg pointer is set back to error value in case of
MSG_COPY flag is set and desired message to copy wasn't found. This
garantees that msg is either a error pointer or a copy address.
Otherwise the last message in queue will be freed without unlinking from
the queue (which leads to memory corruption) and the dummy allocated
copy won't be released.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
commit 00cfec3748 (net: add a synchronize_net() in
netdev_rx_handler_unregister())
allows us to remove the synchronized_net() call from del_nbp()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 6bbb6d9 "net/mlx4_en: Optimize Rx fast path filter checks" introduced a regression
under which the MAC address read from the card was not converted correctly
(the most significant byte was not handled), fix that.
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that netdev_rx_handler_unregister contains synchronize_net(), we need
to call it outside of bond->lock, cause it might sleep. Also, remove the
already unneded synchronize_net().
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After a quiet set of fixes for 3.9-rc4, a lot of people woke up and sent
urgent fixes for 3.9. I pushed back on a number of them that got
deferred to 3.10, but these are the ones that seemed important.
Regression in 3.9:
- Multiple regressions in OMAP2+ clock cleanup
- SH-Mobile frame buffer bug fix that merged here because of maintainer MIA
- ux500 prcmu changes broke DT booting
- MMCI duplicated regulator setup on ux500
- New ux500 clock driver broke ethernet on snowball
- Local interrupt driver for mvebu broke ethernet
- MVEBU GPIO driver did not get set up right on Orion DT
- incorrect interrupt number on Orion crypto for DT
Long-standing bugs, including candidates for stable:
- Kirkwood MMC needs to disable invalid card detect pins
- MV SDIO pinmux was wrong on Mirabox
- GoFlex Net board file needs to set NAND chip delay
- MSM timer restart race
- ep93xx early debug code broke in 3.7
- i.MX CPU hotplug race
- Incorrect clock setup for OMAP1 USB
- Workaround for bad clock setup by some old OMAP4 boot loaders
- Static I/O mappings on cns3xxx since 3.2
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIVAwUAUVrxWGCrR//JCVInAQLsoQ/+IQKk3v3xOhsvLaPxYjpO1dZqwxmWHwCz
ujmpnsUabYwuVfVL982k3RTSry5brgOB75CdztOyYsnckF1ZJ3zPjBN+TQM7G/aF
1lpfUTdCEQFgWFb69G6Lr5ZIDk7co3nRJk1GFS/xi/EAlnmUY/tC1Epco2Y0z0g4
dWz34sor22lxwWkUTdgXKwynoxmmjBzZIJOhtSOeednVPxN2qUe9IDAy9qk43U3a
Xg8j4OQT59TTmAZgAB7DLlJ3BGpFvTFAeZ4sDwrCCnibmB5E9LiaYwS9vrk9SQB1
D8CYIUqcP+cGKnftCCIzgjXHYvJw8fa7NKBUw9CzusIuk+c5AbE28KZRIL4D24Oq
ImlFV4Neec3Iab6IWfD0+PQK6PkwqnvPd5IBSFO4zUv2adafl7sTASlMnNPZtWbo
gV+GNVlCyab3l1KYBPo+CQGup3UpIAs5trQoCUh7BRf4HEsL+HILr/SFRQk3GQ6H
B+3HgSleiipT8n81VDFiWY1o5KuXmjUd2qpbc0a45VtM6EFBONwqBaKew93NkDYa
oIhI6yS8aIMYPXC6ZP5R2OvKUuL+mypKKXlt9BMCnDG9mrGMks8BLumcHco+Jmkt
9p6DChibxsaH6QArAi16shFPm0VqFUI6cidgTUcY024bZSiXpBMF52mIu6SEf0/S
agIwGxpuXFg=
=oQbL
-----END PGP SIGNATURE-----
Merge tag 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC bug fixes from Arnd Bergmann:
"After a quiet set of fixes for 3.9-rc4, a lot of people woke up and
sent urgent fixes for 3.9. I pushed back on a number of them that got
deferred to 3.10, but these are the ones that seemed important.
Regression in 3.9:
- Multiple regressions in OMAP2+ clock cleanup
- SH-Mobile frame buffer bug fix that merged here because of
maintainer MIA
- ux500 prcmu changes broke DT booting
- MMCI duplicated regulator setup on ux500
- New ux500 clock driver broke ethernet on snowball
- Local interrupt driver for mvebu broke ethernet
- MVEBU GPIO driver did not get set up right on Orion DT
- incorrect interrupt number on Orion crypto for DT
Long-standing bugs, including candidates for stable:
- Kirkwood MMC needs to disable invalid card detect pins
- MV SDIO pinmux was wrong on Mirabox
- GoFlex Net board file needs to set NAND chip delay
- MSM timer restart race
- ep93xx early debug code broke in 3.7
- i.MX CPU hotplug race
- Incorrect clock setup for OMAP1 USB
- Workaround for bad clock setup by some old OMAP4 boot loaders
- Static I/O mappings on cns3xxx since 3.2"
* tag 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: cns3xxx: fix mapping of private memory region
arm: mvebu: Fix pinctrl for Armada 370 Mirabox SDIO port.
arm: orion5x: correct IRQ used in dtsi for mv_cesa
arm: orion5x: fix orion5x.dtsi gpio parameters
ARM: Kirkwood: fix unused mvsdio gpio pins
arm: mvebu: Use local interrupt only for the timer 0
ARM: kirkwood: Fix chip-delay for GoFlex Net
ARM: ux500: Enable the clock controlling Ethernet on Snowball
ARM: ux500: Stop passing ios_handler() as an MMCI power controlling call-back
ARM: ux500: Apply the TCPM and TCDM locations and sizes to dbx5x0 DT
fbdev: sh_mobile_lcdc: fixup B side hsync adjust settings
ARM: OMAP: clocks: Delay clk inits atleast until slab is initialized
ARM: imx: fix sync issue between imx_cpu_die and imx_cpu_kill
ARM: msm: Stop counting before reprogramming clockevent
ARM: ep93xx: Fix wait for UART FIFO to be empty
ARM: OMAP4: PM: fix PM regression introduced by recent clock cleanup
ARM: OMAP3: hwmod data: keep MIDLEMODE in force-standby for musb
ARM: OMAP4: clock data: lock USB DPLL on boot
ARM: OMAP1: fix USB host on 1710
From Anton Vorontsov <anton@enomsg.org>:
This tag includes Mac Lin's work to revive CNS3xxx booting:
"Since commit 0536bdf33f (ARM: move iotable mappings within the vmalloc
region), [...] the pre-defined iotable mappings is not in the vmalloc
region. [...] move the iotable mappings into the vmalloc region, and
merge the MPCore private memory region (containing the SCU, the GIC and
the TWD) as a single region."
Plus there is a small cosmetic fix, also from Mac Lin.
* tag 'v3.9-rc1_cns3xxx_fixes' of git://git.infradead.org/users/cbou/linux-cns3xxx:
ARM: cns3xxx: fix mapping of private memory region
[arnd: dropped the cosmetic fix from the merge as it is not needed for 3.9]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Commit ac3ebafa81 "ACPI / idle: remove usage of the statedata"
changed the percpu processor cstate to a unified cstate in ACPI idle.
That caused all our NHM boxes to boot hang or panic.
2178751 Task dump for CPU 1:
2178752 swapper/1 R running task 6736 0 1 0x00000000
2178753 ffff8801e8029dc8 ffffffff8101cf96 ffff8801e8029e28 ffffffff813d294b
2178754 0000000000000f99 0000000000000003 00000000003cf654 0000000025c17d03
2178755 ffff8801e8029e38 ffff8801e74fc000 00000002590dc5c4 ffffffff8163cdb0
2178756 Call Trace:
2178757 [<ffffffff8101cf96>] ? acpi_processor_ffh_cstate_enter+0x2d/0x2f
2178758 [<ffffffff813d294b>] acpi_idle_enter_bm+0x1b1/0x236
2178759 [<ffffffff8163cdb0>] ? disable_cpuidle+0x10/0x10
2178760 [<ffffffff8163cdc2>] cpuidle_enter+0x12/0x14
2178761 [<ffffffff8163d286>] cpuidle_wrap_enter+0x2f/0x6d
2178762 [<ffffffff8163d2d4>] cpuidle_enter_tk+0x10/0x12
2178763 [<ffffffff8163cdd6>] cpuidle_enter_state+0x12/0x3a
2178764 [<ffffffff8163d4a7>] cpuidle_idle_call+0xe8/0x161
2178765 [<ffffffff81008d99>] cpu_idle+0x5e/0xa4
2178766 [<ffffffff8174c6c1>] start_secondary+0x1a9/0x1ad
2178767 Task dump for CPU 2:
In fact, the ACPI idle is based on the assumption of difference percpu
cstate structures that are necessary for the implementation to work
cprrectly. A unique acpi_processor_cx is not sifficient by far.
This patch is just a quick fix re-introducing the percpu cstates.
If someone really wants to unify the ACPI cstates, please make sure
that the whole software infrastructure is changed and take hardware
as well as many different kinds of BIOS settings into account.
[rjw: Changelog]
Reported-by: LKP project <lkp@linux.intel.com>
Reported-by: Xie ChanglongX <changlongx.xie@intel.com>
Tested-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Alex Shi <alex.shi@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The ACPI handle of struct i2c_adapter's dev member should not be
set, because this causes that struct i2c_adapter to be associated
with the ACPI device node corresponding to its parent as the
second "physical_device", which is incorrect (this happens during
the registration of struct i2c_adapter). Consequently,
acpi_i2c_register_devices() should use the ACPI handle of the
parent of the struct i2c_adapter it is called for rather than the
struct i2c_adapter's ACPI handle (which should be NULL).
Make that happen and modify the i2c-designware-platdrv driver,
which currently is the only driver for ACPI-enumerated I2C
controller chips, not to set the ACPI handle for the
struct i2c_adapter it creates.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Aaron Lu <aaron.lu@intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Wolfram Sang <wsa@the-dreams.de>
It should be "governor".
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
platform_device_alloc could failed and return NULL,
we should check this before call platform_device_put.
Signed-off-by: Wang YanQing <udknight@gmail.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
VESA_DMT_VSYNC_HIGH should be used instead of VESA_DMT_HSYNC_HIGH,
because FB_SYNC_VERT_HIGH_ACT is related to vsync, not to hsync.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Acked-by: Steffen Trumtrar <s.trumtrar@pengutronix.de>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
This patch let ELD debug message show 'pin_eld->monitor_present' which reflects
the real pin response to verb GET_PIN_SENSE.
'eld->monitor_present' should not be used here because 'eld' is a temp
structure now and so its "monitor_present" is not set.
Signed-off-by: Mengdong Lin <mengdong.lin@intel.com>
Acked-by: David Henningsson <david.henningsson@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
In function snd_hdmi_get_eld(), the variable 'ret' should be initialized to 0.
Otherwise it will be returned uninitialized as non-zero after ELD info is got
successfully. Thus hdmi_present_sense() will always assume ELD info is invalid
by mistake, and /proc file system cannot show the proper ELD info.
Signed-off-by: Mengdong Lin <mengdong.lin@intel.com>
Cc: stable@vger.kernel.org
Acked-by: David Henningsson <david.henningsson@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
A few more fixes here and there, including quite a few nasty driver
specific ones, but nothing that has a major general impact.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJRUawIAAoJELSic+t+oim9RqsP/RGgUqjX68ewq4Ub9BbhsYGa
Xz4dCN6LACQVXvgNlMhR/TypaJ+vJ3xjlgb/WXNDRYPnEoffoqzDhZOwIaJH8rPR
wGvZGs+UZNuPeIwZ3TF3o0Ct3XYPzzPZPmLnZKL+ZWVgggbHgqGJmkK7smB1koNU
ppnqhPaid6DIlTcSP3jHsUCxcahtCUDUMNPqIzQx7k1QquucUXvs7HWWXaboqyh3
aci3NtDtaMJwM/qheBz0rxv0bk7FrtrukSdN0muSKmcd2Fy8VuwZWIXdB467gXUs
I46ZEOxV2XM+/Tq7HfzLiTu35cgCwXv3z7ayr6Z5ZXjn8B1FnDmqanh3XaawwKbr
Bx53x1X1ga83ms61z+S9QOZ8K74EcJcL6wE77QxsnuaOkgHZYdoZNC7u8i5+cHhN
Lr9ymozK52dH1uJo0/kWvYmK/3SFeHDOMVGiBVVeRQw81PKEPrPZid7veL10UUoG
WX224U1HeuH+0+3scFkaomd4mzhPoj//seIM+806OF4dzsiTLHC7SFU4i3HY23Fx
OH9LJUGvr3FygNqgbdsuO3jEu5N5lXn6zY/Dumi7dLga75vtn0llhXqgM6ec1CLF
jvlHN5UUN8hnxam4SBme5AcX2+rUbEXjD8VA82Y2NOtRWbkiYI5KYINj+SKR0lt+
uRfo30zqYhhaUwmCfXcI
=rovv
-----END PGP SIGNATURE-----
Merge tag 'asoc-fix-v3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v3.9
A few more fixes here and there, including quite a few nasty driver
specific ones, but nothing that has a major general impact.
All architectures need to provide a check_pgt_cache() function. The s390 one
got lost somewhere.
So reintroduce it to prevent future compile errors e.g. if Thomas Gleixner's
idle loop rework patches get merged.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When translating user space addresses to kernel addresses the follow_table()
function had two bugs:
- PROT_NONE mappings could be read accessed via the kernel mapping. That is
e.g. putting a filename into a user page, then protecting the page with
PROT_NONE and afterwards issuing the "open" syscall with a pointer to
the filename would incorrectly succeed.
- when walking the page tables it used the pgd/pud/pmd/pte primitives which
with dynamic page tables give no indication which real level of page tables
is being walked (region2, region3, segment or page table). So in case of an
exception the translation exception code passed to __handle_fault() is not
necessarily correct.
This is not really an issue since __handle_fault() doesn't evaluate the code.
Only in case of e.g. a SIGBUS this code gets passed to user space. If user
space can do something sane with the value is a different question though.
To fix these issues don't use any Linux primitives. Only walk the page tables
like the hardware would do it, however we leave quite some checks away since
we know that we only have full size page tables and each index is within bounds.
In theory this should fix all issues...
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The ACPI handle of struct spi_master's dev member should not be
set, because this causes that struct spi_master to be associated
with the ACPI device node corresponding to its parent as the
second "physical_device", which is incorrect (this happens during
the registration of struct spi_master). Consequently,
acpi_register_spi_devices() should use the ACPI handle of the
parent of the struct spi_master it is called for rather than that
struct spi_master's ACPI handle (which should be NULL).
Make that happen and modify the spi-pxa2xx driver, which currently is
the only driver for ACPI-enumerated SPI controller chips, not to set
the ACPI handle for the struct spi_master it creates.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Fix compiler warnings generated when devfreq is not enabled
(CONFIG_PM_DEVFREQ is not set).
Signed-off-by: Rajagopal Venkat <rajagopal.venkat@linaro.org>
Acked-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Because rev1 and rev3 of the target share the same hashing
generalize it by introduing nfqueue_hash().
Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Current NFQUEUE target uses a hash, computed over source and
destination address (and other parameters), for steering the packet
to the actual NFQUEUE. This, however forgets about the fact that the
packet eventually is handled by a particular CPU on user request.
If E. g.
1) IRQ affinity is used to handle packets on a particular CPU already
(both single-queue or multi-queue case)
and/or
2) RPS is used to steer packets to a specific softirq
the target easily chooses an NFQUEUE which is not handled by a process
pinned to the same CPU.
The idea is therefore to use the CPU index for determining the
NFQUEUE handling the packet.
E. g. when having a system with 4 CPUs, 4 MQ queues and 4 NFQUEUEs it
looks like this:
+-----+ +-----+ +-----+ +-----+
|NFQ#0| |NFQ#1| |NFQ#2| |NFQ#3|
+-----+ +-----+ +-----+ +-----+
^ ^ ^ ^
| |NFQUEUE | |
+ + + +
+-----+ +-----+ +-----+ +-----+
|rx-0 | |rx-1 | |rx-2 | |rx-3 |
+-----+ +-----+ +-----+ +-----+
The NFQUEUEs not necessarily have to start with number 0, setups with
less NFQUEUEs than packet-handling CPUs are not a problem as well.
This patch extends the NFQUEUE target to accept a new
NFQ_FLAG_CPU_FANOUT flag. If this is specified the target uses the
CPU index for determining the NFQUEUE being used. I have to introduce
rev3 for this. The 'flags' are folded into _v2 'bypass'.
By changing the way which queue is assigned, I'm able to improve the
performance if the processes reading on the NFQUEUs are pinned
correctly.
Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Commit b81ea1b (PM / QoS: Fix concurrency issues and memory leaks in
device PM QoS) put calls to pm_qos_sysfs_add_latency(),
pm_qos_sysfs_add_flags(), pm_qos_sysfs_remove_latency(), and
pm_qos_sysfs_remove_flags() under dev_pm_qos_mtx, which was a
mistake, because it may lead to deadlocks in some situations.
For example, if pm_qos_remote_wakeup_store() is run in parallel
with dev_pm_qos_constraints_destroy(), they may deadlock in the
following way:
======================================================
[ INFO: possible circular locking dependency detected ]
3.9.0-rc4-next-20130328-sasha-00014-g91a3267 #319 Tainted: G W
-------------------------------------------------------
trinity-child6/12371 is trying to acquire lock:
(s_active#54){++++.+}, at: [<ffffffff81301631>] sysfs_addrm_finish+0x31/0x60
but task is already holding lock:
(dev_pm_qos_mtx){+.+.+.}, at: [<ffffffff81f07cc3>] dev_pm_qos_constraints_destroy+0x23/0x250
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (dev_pm_qos_mtx){+.+.+.}:
[<ffffffff811811da>] lock_acquire+0x1aa/0x240
[<ffffffff83dab809>] __mutex_lock_common+0x59/0x5e0
[<ffffffff83dabebf>] mutex_lock_nested+0x3f/0x50
[<ffffffff81f07f2f>] dev_pm_qos_update_flags+0x3f/0xc0
[<ffffffff81f05f4f>] pm_qos_remote_wakeup_store+0x3f/0x70
[<ffffffff81efbb43>] dev_attr_store+0x13/0x20
[<ffffffff812ffdaa>] sysfs_write_file+0xfa/0x150
[<ffffffff8127f2c1>] __kernel_write+0x81/0x150
[<ffffffff812afc2d>] write_pipe_buf+0x4d/0x80
[<ffffffff812af57c>] splice_from_pipe_feed+0x7c/0x120
[<ffffffff812afa25>] __splice_from_pipe+0x45/0x80
[<ffffffff812b14fc>] splice_from_pipe+0x4c/0x70
[<ffffffff812b1538>] default_file_splice_write+0x18/0x30
[<ffffffff812afae3>] do_splice_from+0x83/0xb0
[<ffffffff812afb2e>] direct_splice_actor+0x1e/0x20
[<ffffffff812b0277>] splice_direct_to_actor+0xe7/0x200
[<ffffffff812b15bc>] do_splice_direct+0x4c/0x70
[<ffffffff8127eda9>] do_sendfile+0x169/0x300
[<ffffffff8127ff94>] SyS_sendfile64+0x64/0xb0
[<ffffffff83db7d18>] tracesys+0xe1/0xe6
-> #0 (s_active#54){++++.+}:
[<ffffffff811800cf>] __lock_acquire+0x15bf/0x1e50
[<ffffffff811811da>] lock_acquire+0x1aa/0x240
[<ffffffff81300aa2>] sysfs_deactivate+0x122/0x1a0
[<ffffffff81301631>] sysfs_addrm_finish+0x31/0x60
[<ffffffff812ff77f>] sysfs_hash_and_remove+0x7f/0xb0
[<ffffffff813035a1>] sysfs_unmerge_group+0x51/0x70
[<ffffffff81f068f4>] pm_qos_sysfs_remove_flags+0x14/0x20
[<ffffffff81f07490>] __dev_pm_qos_hide_flags+0x30/0x70
[<ffffffff81f07cd5>] dev_pm_qos_constraints_destroy+0x35/0x250
[<ffffffff81f06931>] dpm_sysfs_remove+0x11/0x50
[<ffffffff81efcf6f>] device_del+0x3f/0x1b0
[<ffffffff81efd128>] device_unregister+0x48/0x60
[<ffffffff82d4083c>] usb_hub_remove_port_device+0x1c/0x20
[<ffffffff82d2a9cd>] hub_disconnect+0xdd/0x160
[<ffffffff82d36ab7>] usb_unbind_interface+0x67/0x170
[<ffffffff81f001a7>] __device_release_driver+0x87/0xe0
[<ffffffff81f00559>] device_release_driver+0x29/0x40
[<ffffffff81effc58>] bus_remove_device+0x148/0x160
[<ffffffff81efd07f>] device_del+0x14f/0x1b0
[<ffffffff82d344f9>] usb_disable_device+0xf9/0x280
[<ffffffff82d34ff8>] usb_set_configuration+0x268/0x840
[<ffffffff82d3a7fc>] usb_remove_store+0x4c/0x80
[<ffffffff81efbb43>] dev_attr_store+0x13/0x20
[<ffffffff812ffdaa>] sysfs_write_file+0xfa/0x150
[<ffffffff8127f71d>] do_loop_readv_writev+0x4d/0x90
[<ffffffff8127f999>] do_readv_writev+0xf9/0x1e0
[<ffffffff8127faba>] vfs_writev+0x3a/0x60
[<ffffffff8127fc60>] SyS_writev+0x50/0xd0
[<ffffffff83db7d18>] tracesys+0xe1/0xe6
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(dev_pm_qos_mtx);
lock(s_active#54);
lock(dev_pm_qos_mtx);
lock(s_active#54);
*** DEADLOCK ***
To avoid that, remove the calls to functions mentioned above from
under dev_pm_qos_mtx and introduce a separate lock to prevent races
between functions that add or remove device PM QoS sysfs attributes
from happening.
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Remove the call to dev_pm_qos_hide_flags(), added by commit 6e30d7cb
"usb: Add driver/usb/core/(port.c,hub.h) files", from
usb_port_device_release(), because (1) it is completely unnecessary
(the flags have been removed already by the PM core during the
unregistration of the device object) and (2) it triggers a NULL
pointer dereference in sysfs_find_dirent() (dev->kobj.sd is NULL at
this point).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
struct block_device lifecycle is defined by its inode (see fs/block_dev.c) -
block_device allocated first time we access /dev/loopXX and deallocated on
bdev_destroy_inode. When we create the device "losetup /dev/loopXX afile"
we want that block_device stay alive until we destroy the loop device
with "losetup -d".
But because we do not hold /dev/loopXX inode its counter goes 0, and
inode/bdev can be destroyed at any moment. Usually it happens at memory
pressure or when user drops inode cache (like in the test below). When later in
loop_clr_fd() we want to use bdev we have use-after-free error with following
stack:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000280
bd_set_size+0x10/0xa0
loop_clr_fd+0x1f8/0x420 [loop]
lo_ioctl+0x200/0x7e0 [loop]
lo_compat_ioctl+0x47/0xe0 [loop]
compat_blkdev_ioctl+0x341/0x1290
do_filp_open+0x42/0xa0
compat_sys_ioctl+0xc1/0xf20
do_sys_open+0x16e/0x1d0
sysenter_dispatch+0x7/0x1a
To prevent use-after-free we need to grab the device in loop_set_fd()
and put it later in loop_clr_fd().
The issue is reprodusible on current Linus head and v3.3. Here is the test:
dd if=/dev/zero of=loop.file bs=1M count=1
while [ true ]; do
losetup /dev/loop0 loop.file
echo 2 > /proc/sys/vm/drop_caches
losetup -d /dev/loop0
done
[ Doing bdgrab/bput in loop_set_fd/loop_clr_fd is safe, because every
time we call loop_set_fd() we check that loop_device->lo_state is
Lo_unbound and set it to Lo_bound If somebody will try to set_fd again
it will get EBUSY. And if we try to loop_clr_fd() on unbound loop
device we'll get ENXIO.
loop_set_fd/loop_clr_fd (and any other loop ioctl) is called under
loop_device->lo_ctl_mutex. ]
Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We used a global BH disable in LOCAL_OUT hook.
Add _bh suffix to all places that need it and remove
the disabling from LOCAL_OUT and sync code.
Functions like ip_defrag need protection from
BH, so add it. As for nf_nat_mangle_tcp_packet, it needs
RCU lock.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
This is the final step in RCU conversion.
Things that are removed:
- svc->usecnt: now svc is accessed under RCU read lock
- svc->inc: and some unused code
- ip_vs_bind_pe and ip_vs_unbind_pe: no ability to replace PE
- __ip_vs_svc_lock: replaced with RCU
- IP_VS_WAIT_WHILE: now readers lookup svcs and dests under
RCU and work in parallel with configuration
Other changes:
- before now, a RCU read-side critical section included the
calling of the schedule method, now it is extended to include
service lookup
- ip_vs_svc_table and ip_vs_svc_fwm_table are now using hlist
- svc->pe and svc->scheduler remain to the end (of grace period),
the schedulers are prepared for such RCU readers
even after done_service is called but they need
to use synchronize_rcu because last ip_vs_scheduler_put
can happen while RCU read-side critical sections
use an outdated svc->scheduler pointer
- as planned, update_service is removed
- empty services can be freed immediately after grace period.
If dests were present, the services are freed from
the dest trash code
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
In previous commits the schedulers started to access
svc->destinations with _rcu list traversal primitives
because the IP_VS_WAIT_WHILE macro still plays the role of
grace period. Now it is time to finish the updating part,
i.e. adding and deleting of dests with _rcu suffix before
removing the IP_VS_WAIT_WHILE in next commit.
We use the same rule for conns as for the
schedulers: dests can be searched in RCU read-side critical
section where ip_vs_dest_hold can be called by ip_vs_bind_dest.
Some things are not perfect, for example, calling
functions like ip_vs_lookup_dest from updating code under
RCU, just because we use some function both from reader
and from updater.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
This method releases the scheduler state,
it can not fail. Such change will help to properly
replace the scheduler in following patch.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
All dests will go to trash, no exceptions.
But we have to use new list node t_list for this, due
to RCU changes in following patches. Dests will wait there
initial grace period and later all conns and schedulers to
put their reference. The dests don't get reference for
staying in dest trash as before.
As result, we do not load ip_vs_dest_put with
extra checks for last refcnt and the schedulers do not
need to play games with atomic_inc_not_zero while
selecting best destination.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
The schedule method now needs _rcu list-traversal
primitive for svc->destinations. As the weight for some
dest can be reduced during dest selection, change the
algorithm to check weights by using minimum weights in the
1 .. max_weight-(di-1) range, with the same step (di). By this
way we ensure that there will be always a weight >= 1 check
before claiming that all destinations are overloaded.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
The schedule method now needs _rcu list-traversal
primitive for svc->destinations.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
The schedule method now needs _rcu list-traversal
primitive for svc->destinations.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
The schedule method now needs _rcu list-traversal
primitive for svc->destinations. As the previous entry
could be unlinked, limit the list traversals to 2 when
lookup started from previous entry.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
The schedule method now needs _rcu list-traversal
primitive for svc->destinations.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
The schedule method now needs _rcu list-traversal
primitive for svc->destinations.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
The schedule method now needs _rcu list-traversal
primitive for svc->destinations. The read_lock for sched_lock is
removed. The set.lock is removed because now it is used in
rare cases, mostly under sched_lock.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
The schedule method now needs _rcu list-traversal
primitive for svc->destinations. The read_lock for sched_lock is
removed. Use a dead flag to prevent new entries to be created
while scheduler is reclaimed. Use hlist for the hash table.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Use the new add_dest and del_dest methods
to reassign dests.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>