These new callbacks notify the dlm user about lock recovery.
GFS2, and possibly others, need to be aware of when the dlm
will be doing lock recovery for a failed lockspace member.
In the past, this coordination has been done between dlm and
file system daemons in userspace, which then direct their
kernel counterparts. These callbacks allow the same
coordination directly, and more simply.
Signed-off-by: David Teigland <teigland@redhat.com>
Pointer coding style changes
: add space between return type and function pointer
ex) u8(*get_batt_present) (void)
-> u8 (*get_batt_present) (void)
Signed-off-by: Woogyom Kim <milo.kim@ti.com>
Signed-off-by: Anton Vorontsov <cbouatmailru@gmail.com>
For the default value of power supply type, "unknown" is added.
With default prop value, supply type property can be displayed
as default - "Unknown".
Signed-off-by: Milo(Woogyom) Kim <milo.kim@ti.com>
Signed-off-by: Anton Vorontsov <cbouatmailru@gmail.com>
Charger Manager provides power-supply-class aggregating
information from multiple chargers and a fuel-gauge.
Signed-off-by: Donggeun Kim <dg77.kim@samsung.com>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Anton Vorontsov <cbouatmailru@gmail.com>
Because battery health monitoring should be done even when suspended,
it needs to wake up and suspend periodically. Thus, userspace battery
monitoring may incur too much overhead; every device and task is woken
up periodically. Charger Manager uses suspend-again to provide
in-suspend monitoring.
This patch allows to monitor battery health in-suspend state.
Signed-off-by: Donggeun Kim <dg77.kim@samsung.com>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Anton Vorontsov <cbouatmailru@gmail.com>
both callers of device_get_devnode() are only interested in lower 16bits
and nobody tries to return anything wider than 16bit anyway.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
vfs_create() ignores everything outside of 16bit subset of its
mode argument; switching it to umode_t is obviously equivalent
and it's the only caller of the method
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
vfs_mkdir() gets int, but immediately drops everything that might not
fit into umode_t and that's the only caller of ->mkdir()...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Move invalidate_bdev, block_sync_page into fs/block_dev.c. Export
kill_bdev as well, so brd doesn't have to open code it. Reduce
buffer_head.h requirement accordingly.
Removed a rather large comment from invalidate_bdev, as it looked a bit
obsolete to bother moving. The small comment replacing it says enough.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
No need to duplicate them in both callers; make it return
ERR_PTR(-ENOMEM) on allocation failure instead of NULL and
it'll be able to report rpc_lookup_cred() failures just
fine. Callers are much happier that way...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
SMSC generation 4 LAN chips integrate an IEEE 802.3 ethernet physical layer.
The ethernet driver for this family of devices needs to access the SMSC PHY
registers and bit-fields.
So, this patch moves these constants to a place where it can be used for both
the PHY and LAN drivers.
Signed-off-by: Javier Martinez Canillas <javier@dowhile0.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 1e39f384bb ("evm: fix build problems") makes the stub version
of security_old_inode_init_security() return 0 when CONFIG_SECURITY is
not set.
But that makes callers such as reiserfs_security_init() assume that
security_old_inode_init_security() has set name, value, and len
arguments properly - but security_old_inode_init_security() left them
uninitialized which then results in interesting failures.
Revert security_old_inode_init_security() to the old behavior of
returning EOPNOTSUPP since both callers (reiserfs and ocfs2) handle this
just fine.
[ Also fixed the S_PRIVATE(inode) case of the actual non-stub
security_old_inode_init_security() function to return EOPNOTSUPP
for the same reason, as pointed out by Mimi Zohar.
It got incorrectly changed to match the new function in commit
fb88c2b6cbb1: "evm: fix security/security_old_init_security return
code". - Linus ]
Reported-by: Jorge Bastos <mysql.jorge@decimal.pt>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch allows consumers to forcibly disable multiple regulator
clients in a single API call.
Signed-off-by: Donggeun Kim <dg77.kim@samsung.com>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
This patch removes maxpin member in the pin control descriptor
because we don't need this value as we enumerate a pin space
using offset.
Signed-off-by: Chanho Park <chanho61.park@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Obtaining a "struct pinctrl_dev *" is difficult for code not directly
related to the pinctrl subsystem. However, the device name of the pinctrl
device is fairly well known. So, modify pin_config_*() to take the device
name instead of the "struct pinctrl_dev *".
Signed-off-by: Stephen Warren <swarren@nvidia.com>
[rebased on top of refactoring code]
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
This allows one to include pinconf.h without having to include other
headers first.
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
To create elegant tables for pinmux hogs on the PXA MMP platform,
we need this hog macro that can specify both function and group in
one go.
Acked-by: Stephen Warren <swarren@nvidia.com>
Acked-by: Haojian Zhuang <haojian.zhuang@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Pin controllers should already be instantiated as a device, so there's
no need for the pinctrl core to create a new struct device for each
controller.
This allows the controller's real name to be used in the mux mapping
table, rather than e.g. "pinctrl.0", "pinctrl.1", etc.
This necessitates removal of the PINMUX_MAP_PRIMARY*() macros, since
their sole purpose was to hard-code the .ctrl_dev_name field to be
"pinctrl.0".
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
This is the same as PINMUX_MAP_PRIMARY_SYS_HOG, except that it allows
you to specify a particular control device.
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
This add per-pin and per-group pin config interfaces for biasing,
driving and other such electronic properties. The details of passed
configurations are passed in an opaque unsigned long which may be
dereferences to integer types, structs or lists on either side
of the configuration interface.
ChangeLog v1->v2:
- Clear split of terminology: we now have pin controllers, and
those may support two interfaces using vtables: pin
multiplexing and pin configuration.
- Break out pin configuration to its own C file, controllers may
implement only config without mux, and vice versa, so keep each
sub-functionality of pin controllers separate. Introduce
CONFIG_PINCONF in Kconfig.
- Implement some core logic around pin configuration in the
pinconf.c file.
- Remove UNKNOWN config states, these were just surplus baggage.
- Remove FLOAT config state - HIGH_IMPEDANCE should be enough for
everyone.
- PIN_CONFIG_POWER_SOURCE added to handle switching the power
supply for the pin logic between different sources
- Explicit DISABLE config enums to turn schmitt-trigger,
wakeup etc OFF.
- Update documentation to reflect all the recent reasoning.
ChangeLog v2->v3:
- Twist API around to pass around arrays of config tuples instead
of (param, value) pairs everywhere.
- Explicit drive strength semantics for push/pull and similar
drive modes, this shall be the number of drive stages vs
nominal load impedance, which should match the actual
electronics used in push/pull CMOS or TTY totempoles.
- Drop load capacitance configuration - I probably don't know
what I'm doing here so leave it out.
- Drop PIN_CONFIG_INPUT_SCHMITT_OFF, instead the argument zero to
PIN_CONFIG_INPUT_SCHMITT turns schmitt trigger off.
- Drop PIN_CONFIG_NORMAL_POWER_MODE and have a well defined
argument to PIN_CONFIG_LOW_POWER_MODE to get out of it instead.
- Drop PIN_CONFIG_WAKEUP_ENABLE/DISABLE and just use
PIN_CONFIG_WAKEUP with defined value zero to turn wakeup off.
- Add PIN_CONFIG_INPUT_DEBOUNCE for configuring debounce time
on input lines.
- Fix a bug when we tried to configure pins for pin controllers
without pinconf support.
- Initialized debugfs properly so it works.
- Initialize the mutex properly and lock around config tampering
sections.
- Check the return value from get_initial_config() properly.
ChangeLog v3->v4:
- Export the pin_config_get(), pin_config_set() and
pin_config_group() functions.
- Drop the entire concept of just getting initial config and
keeping track of pin states internally, instead ask the pins
what state they are in. Previous idea was plain wrong, if the
device cannot keep track of its state, the driver should do
it.
- Drop the generic configuration layout, it seems this impose
too much restriction on some pin controllers, so let them do
things the way they want and split off support for generic
config as an optional add-on.
ChangeLog v4->v5:
- Introduce two symmetric driver calls for group configuration,
.pin_config_group_[get|set] and corresponding external calls.
- Remove generic semantic meanings of return values from config
calls, these belong in the generic config patch. Just pass the
return value through instead.
- Add a debugfs entry "pinconf-groups" to read status from group
configuration only, also slam in a per-group debug callback in
the pinconf_ops so custom drivers can display something
meaningful for their pins.
- Fix some dangling newline.
- Drop dangling #else clause.
- Update documentation to match the above.
ChangeLog v5->v6:
- Change to using a pin name as parameter for the
[get|set]_config() functions, as suggested by Stephen Warren.
This is more natural as names will be what a developer has
access to in written documentation etc.
ChangeLog v6->v7:
- Refactor out by-pin and by-name get/set functions, only expose
the by-name functions externally, expose the by-pin functions
internally.
- Show supported pin control functionality in the debugfs
pinctrl-devices file.
Acked-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
This makes a deep copy of the pinmux function map instead of
keeping the copy supplied from the platform around. This makes
it possible to tag the platforms map with __initdata as is also
done as part of this patch.
Rationale: a certain target platform (PXA) has numerous
pinmux maps, many of which will be lying around unused after
boot in a multi-platform binary. Instead, deep-copy the one
we're going to use and tag them all __initdata so they go away
after boot.
ChangeLog v1->v2:
- Fixup the deep copy, missed a few items on the struct,
plus mark bool member non-const since we're making runtime
copies if this stuff now.
ChangeLog v2->v3:
- Make a shallow copy (just copy the array of map structs)
as Arnd noticed, string constants never get discarded by the
kernel anyway, so these pointers may be safely copied over.
Reviewed-by: Arnd Bergmann <arnd.bergmann@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
When requesting a single GPIO pin to be muxed in, some controllers
will need to poke a different value into the control register
depending on whether the pin will be used for GPIO output or GPIO
input. So create pinmux counterparts to gpio_direction_[input|output]
in the pinctrl framework.
ChangeLog v1->v2:
- This also amends the documentation to make it clear the this
function and associated machinery is *ONLY* intended as a backend
to gpiolib machinery, not for everyone and his dog to start playing
around with pins.
ChangeLog v2->v3:
- Don't pass an argument to the common request function, instead
provide pinmux_* counterparts to the gpio_direction_[input|output]
calls, simpler and anyone can understand it.
ChangeLog v3->v4:
- Fix numerous spelling mistakes and dangling text in documentation.
Add Ack and Rewewed-by.
Cc: Igor Grinberg <grinberg@compulab.co.il>
Acked-by: Stephen Warren <swarren@nvidia.com>
Reviewed-by: Thomas Abraham <thomas.abraham@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
This patch enables mapping a base offset of gpio ranges with
a pin offset even if does'nt matched. A base of pinctrl_gpio_range
means a base offset of gpio. However, we cannot convert gpio to pin
number for sparse gpio ranges just only using a gpio base offset.
We can convert a gpio to real pin number(even if not matched) using
a new pin_base which means a base pin offset of requested gpio range.
Now, the pin control subsystem passes the pin base offset to the
pinmux driver.
For example, let's assume below two gpio ranges in the system.
static struct pinctrl_gpio_range gpio_range_a = {
.name = "chip a",
.id = 0,
.base = 32,
.pin_base = 32,
.npins = 16,
.gc = &chip_a;
};
static struct pinctrl_gpio_range gpio_range_b = {
.name = "chip b",
.id = 0,
.base = 48,
.pin_base = 64,
.npins = 8,
.gc = &chip_b;
};
We can calucalate a exact pin ranges even if doesn't matched with gpio ranges.
chip a:
gpio-range : [32 .. 47]
pin-range : [32 .. 47]
chip b:
gpio-range : [48 .. 55]
pin-range : [64 .. 71]
Signed-off-by: Chanho Park <chanho61.park@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Some pinctrl drivers (Tegra at least) program a pin to be a GPIO in a
completely different manner than they select which function to mux out of
that pin. In order to support a single "free" pinmux_op, the driver would
need to maintain a per-pin state of requested-for-gpio vs. requested-for-
function. However, that's a lot of work when the core already has explicit
separate paths for gpio request/free and function request/free.
So, add a gpio_disable_free op to struct pinmux_ops, and make pin_free()
call it when appropriate.
When doing this, I noticed that when calling pin_request():
!!gpio == (gpio_range != NULL)
... and so I collapsed those two parameters in both pin_request(), and
when adding writing the new code in pin_free().
Also, for pin_free():
!!free_func == (gpio_range != NULL)
However, I didn't want pin_free() to know about the GPIO function naming
special case, so instead, I reworked pin_free() to always return the pin's
previously requested function, and now pinmux_free_gpio() calls
kfree(function). This is much more balanced with the allocation having
been performed in pinmux_request_gpio().
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
regulator_force_disable() was omitted in consumer.h for
!CONFIG_REGULATOR case.
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
While it's not too late fix the recently added RQLEN diag extension
to report rqlen and wqlen in the same way as TCP does.
I.e. for listening sockets the ack backlog length (which is the input
queue length for socket) in rqlen and the max ack backlog length in
wqlen, and what the CINQ/OUTQ ioctls do for established.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Fix indentation of sock_diag*() calls. -DaveM ]
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a routine that dumps memory-related values of a socket.
It's made as an array to make it possible to add more stuff
here later without breaking compatibility.
Since v1: The SK_MEMINFO_ constants are in userspace
visible part of sock_diag.h, the rest is under __KERNEL__.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The headers check complains it should include the linux/types.h
withing, thus add this one.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Properly toss existing components around the ifdef __KERNEL__
and include the header into the header-y target.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, we clear revoked flag only when a block is reused. However,
this can tigger a false journal error. Consider a situation when a block
is used as a meta block and is deleted(revoked) in ordered mode, then the
block is allocated as a data block to a file. At this moment, user changes
the file's journal mode from ordered to journaled and truncates the file.
The block will be considered re-revoked by journal because it has revoked
flag still pending from the last transaction and an assertion triggers.
We fix the problem by keeping the revoked status more uptodate - we clear
revoked flag when switching revoke tables to reflect there is no revoked
buffers in current transaction any more.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Add empty of_get_node/of_put_node functions for !CONFIG_OF builds.
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
It's for migrating to generic clk framework API.
The helper functions help cases clk_enable/clk_disable is used
in non-atomic context.
For example, Call clk_enable in probe and clk_disable in remove.
Signed-off-by: Richard Zhao <richard.zhao@linaro.org>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Cc: Russell King <linux@arm.linux.org.uk>
Acked-by: Marek Vasut <marek.vasut@gmail.com>
Some displays from AUO have a so called in-cell touchscreen, meaning it
is built directly into the display unit.
Touchdata is gathered through PIXCIR Tango-ICs and processed in an
Atmel ATmega168P with custom firmware. Communication between the host
system and ATmega is done via I2C.
Devices using this touch solution include the Dell Streak5 and the family
of Qisda ebook readers.
The driver reports single- and multi-touch events including touch area
values.
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Use the new macro and struct names in xt_ecn.h, and put the old
definitions into a definition-forwarding ipt_ecn.h.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Prepare the ECN match for augmentation by an IPv6 counterpart. Since
no symbol dependencies to ipv6.ko are added, having a single ecn match
module is the more so welcome.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Make this macro easier to use(do not need to pass properties, a node is
enough), also change to a more sensible name as for_each_child_of_node.
Signed-off-by: Dong Aisheng <dong.aisheng@linaro.org>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
This allows dt_compat to point to a constant list of compatible strings.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
When syncing KVM headers with QEMU I (or whoever applies the
diff) end up automatically fixing whitespaces. One of them
is in kvm_para.h.
It's a lot more consistent for people who don't do the whitespace
fixups automatically to already have fixed headers in Linux. So
remove the sparse empty line at the end of kvm_para.h and everyone's
happy.
Reported-by: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Use perf_events to emulate an architectural PMU, version 2.
Based on PMU version 1 emulation by Avi Kivity.
[avi: adjust for cpuid.c]
[jan: fix anonymous field initialization for older gcc]
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Drop bsp_vcpu pointer from kvm struct since its only use is incorrect
anyway.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
The operation of getting dirty log is frequent when framebuffer-based
displays are used(for example, Xwindow), so, we introduce a mapping table
to speed up id_to_memslot()
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Sort memslots base on its size and use line search to find it, so that the
larger memslots have better fit
The idea is from Avi
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Introduce id_to_memslot to get memslot by slot id
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Introduce kvm_for_each_memslot to walk all valid memslot
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Introduce update_memslots to update slot which will be update to
kvm->memslots
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Needed for the next patch which uses this number to decide how to write
protect a slot.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
The kvm_host struct can include an mmu_notifier struct but mmu_notifier.h is
not included directly.
Signed-off-by: Eric B Munson <emunson@mgebm.net>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch adds a new vcpu->requests bit, KVM_REQ_IMMEDIATE_EXIT.
This bit requests that when next entering the guest, we should run it only
for as little as possible, and exit again.
We use this new option in nested VMX: When L1 launches L2, but L0 wishes L1
to continue running so it can inject an event to it, we unfortunately cannot
just pretend to have run L2 for a little while - We must really launch L2,
otherwise certain one-off vmcs12 parameters (namely, L1 injection into L2)
will be lost. So the existing code runs L2 in this case.
But L2 could potentially run for a long time until it exits, and the
injection into L1 will be delayed. The new KVM_REQ_IMMEDIATE_EXIT allows us
to request that L2 will be entered, as necessary, but will exit as soon as
possible after entry.
Our implementation of this request uses smp_send_reschedule() to send a
self-IPI, with interrupts disabled. The interrupts remain disabled until the
guest is entered, and then, after the entry is complete (often including
processing an injection and jumping to the relevant handler), the physical
interrupt is noticed and causes an exit.
On recent Intel processors, we could have achieved the same goal by using
MTF instead of a self-IPI. Another technique worth considering in the future
is to use VM_EXIT_ACK_INTR_ON_EXIT and a highest-priority vector IPI - to
slightly improve performance by avoiding the useless interrupt handler
which ends up being called when smp_send_reschedule() is used.
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
...it should return the return code from schedule_timeout_killable(),
not the one from freezer_count().
All of the current callers ignore the return code so the bug is
harmless but it's worth fixing.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Unlike all of the other cpuid bits, the TSC deadline timer bit is set
unconditionally, regardless of what userspace wants.
This is broken in several ways:
- if userspace doesn't use KVM_CREATE_IRQCHIP, and doesn't emulate the TSC
deadline timer feature, a guest that uses the feature will break
- live migration to older host kernels that don't support the TSC deadline
timer will cause the feature to be pulled from under the guest's feet;
breaking it
- guests that are broken wrt the feature will fail.
Fix by not enabling the feature automatically; instead report it to userspace.
Because the feature depends on KVM_CREATE_IRQCHIP, which we cannot guarantee
will be called, we expose it via a KVM_CAP_TSC_DEADLINE_TIMER and not
KVM_GET_SUPPORTED_CPUID.
Fixes the Illumos guest kernel, which uses the TSC deadline timer feature.
[avi: add the KVM_CAP + documentation]
Reported-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
Tested-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
* pm-domains:
PM / shmobile: Allow the A4R domain to be turned off at run time
PM / input / touchscreen: Make st1232 use device PM QoS constraints
PM / QoS: Introduce dev_pm_qos_add_ancestor_request()
PM / shmobile: Remove the stay_on flag from SH7372's PM domains
PM / shmobile: Don't include SH7372's INTCS in syscore suspend/resume
PM / shmobile: Add support for the sh7372 A4S power domain / sleep mode
ARM: S3C64XX: Implement basic power domain support
PM / shmobile: Use common always on power domain governor
PM / Domains: Provide an always on power domain governor
PM / Domains: Fix default system suspend/resume operations
PM / Domains: Make it possible to assign names to generic PM domains
PM / Domains: fix compilation failure for CONFIG_PM_GENERIC_DOMAINS unset
PM / Domains: Automatically update overoptimistic latency information
PM / Domains: Add default power off governor function (v4)
PM / Domains: Add device stop governor function (v4)
PM / Domains: Rework system suspend callback routines (v2)
PM / Domains: Introduce "save/restore state" device callbacks
PM / Domains: Make it possible to use per-device domain callbacks
* pm-sleep: (51 commits)
PM: Drop generic_subsys_pm_ops
PM / Sleep: Remove forward-only callbacks from AMBA bus type
PM / Sleep: Remove forward-only callbacks from platform bus type
PM: Run the driver callback directly if the subsystem one is not there
PM / Sleep: Make pm_op() and pm_noirq_op() return callback pointers
PM / Sleep: Merge internal functions in generic_ops.c
PM / Sleep: Simplify generic system suspend callbacks
PM / Hibernate: Remove deprecated hibernation snapshot ioctls
PM / Sleep: Fix freezer failures due to racy usermodehelper_is_disabled()
PM / Sleep: Recommend [un]lock_system_sleep() over using pm_mutex directly
PM / Sleep: Replace mutex_[un]lock(&pm_mutex) with [un]lock_system_sleep()
PM / Sleep: Make [un]lock_system_sleep() generic
PM / Sleep: Use the freezer_count() functions in [un]lock_system_sleep() APIs
PM / Freezer: Remove the "userspace only" constraint from freezer[_do_not]_count()
PM / Hibernate: Replace unintuitive 'if' condition in kernel/power/user.c with 'else'
Freezer / sunrpc / NFS: don't allow TASK_KILLABLE sleeps to block the freezer
PM / Sleep: Unify diagnostic messages from device suspend/resume
ACPI / PM: Do not save/restore NVS on Asus K54C/K54HR
PM / Hibernate: Remove deprecated hibernation test modes
PM / Hibernate: Thaw processes in SNAPSHOT_CREATE_IMAGE ioctl test path
...
Conflicts:
kernel/kmod.c
Some devices, like the I2C controller on SH7372, are not
necessary for providing power to their children or forwarding
wakeup signals (and generally interrupts) from them. They are
only needed by their children when there's some data to transfer,
so they may be suspended for the majority of time and resumed
on demand, when the children have data to send or receive. For this
purpose, however, their power.ignore_children flags have to be set,
or the PM core wouldn't allow them to be suspended while their
children were active.
Unfortunately, in some situations it may take too much time to
resume such devices so that they can assist their children in
transferring data. For example, if such a device belongs to a PM
domain which goes to the "power off" state when that device is
suspended, it may take too much time to restore power to the
domain in response to the request from one of the device's
children. In that case, if the parent's resume time is critical,
the domain should stay in the "power on" state, although it still may
be desirable to power manage the parent itself (e.g. by manipulating
its clock).
In general, device PM QoS may be used to address this problem.
Namely, if the device's children added PM QoS latency constraints
for it, they would be able to prevent it from being put into an
overly deep low-power state. However, in some cases the devices
needing to be serviced are not the immediate children of a
"children-ignoring" device, but its grandchildren or even less
direct descendants. In those cases, the entity wanting to add a
PM QoS request for a given device's ancestor that ignores its
children will have to find it in the first place, so introduce a new
helper function that may be used to achieve that. This function,
dev_pm_qos_add_ancestor_request(), will search for the first
ancestor of the given device whose power.ignore_children flag is
set and will add a device PM QoS latency request for that ancestor
on behalf of the caller. The request added this way may be removed
with the help of dev_pm_qos_remove_request() in the future, like
any other device PM QoS latency request.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Since the SH7372's INTCS in included into syscore suspend/resume,
which causes the chip to be accessed when PM domains have been
turned off during system suspend, the A4R domain containing the
INTCS has to stay on during system sleep, which is suboptimal
from the power consumption point of view.
For this reason, add a new INTC flag, skip_syscore_suspend, to mark
the INTCS for intc_suspend() and intc_resume(), so that they don't
touch it. This allows the A4R domain to be turned off during
system suspend and the INTCS state is resrored during system
resume by the A4R's "power on" code.
Suggested-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Magnus Damm <damm@opensource.se>
We currently have two ways to account traffic in netfilter:
- iptables chain and rule counters:
# iptables -L -n -v
Chain INPUT (policy DROP 3 packets, 867 bytes)
pkts bytes target prot opt in out source destination
8 1104 ACCEPT all -- lo * 0.0.0.0/0 0.0.0.0/0
- use flow-based accounting provided by ctnetlink:
# conntrack -L
tcp 6 431999 ESTABLISHED src=192.168.1.130 dst=212.106.219.168 sport=58152 dport=80 packets=47 bytes=7654 src=212.106.219.168 dst=192.168.1.130 sport=80 dport=58152 packets=49 bytes=66340 [ASSURED] mark=0 use=1
While trying to display real-time accounting statistics, we require
to pool the kernel periodically to obtain this information. This is
OK if the number of flows is relatively low. However, in case that
the number of flows is huge, we can spend a considerable amount of
cycles to iterate over the list of flows that have been obtained.
Moreover, if we want to obtain the sum of the flow accounting results
that match some criteria, we have to iterate over the whole list of
existing flows, look for matchings and update the counters.
This patch adds the extended accounting infrastructure for
nfnetlink which aims to allow displaying real-time traffic accounting
without the need of complicated and resource-consuming implementation
in user-space. Basically, this new infrastructure allows you to create
accounting objects. One accounting object is composed of packet and
byte counters.
In order to manipulate create accounting objects, you require the
new libnetfilter_acct library. It contains several examples of use:
libnetfilter_acct/examples# ./nfacct-add http-traffic
libnetfilter_acct/examples# ./nfacct-get
http-traffic = { pkts = 000000000000, bytes = 000000000000 };
Then, you can use one of this accounting objects in several iptables
rules using the new nfacct match (which comes in a follow-up patch):
# iptables -I INPUT -p tcp --sport 80 -m nfacct --nfacct-name http-traffic
# iptables -I OUTPUT -p tcp --dport 80 -m nfacct --nfacct-name http-traffic
The idea is simple: if one packet matches the rule, the nfacct match
updates the counters.
Thanks to Patrick McHardy, Eric Dumazet, Changli Gao for reviewing and
providing feedback for this contribution.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Aim of this patch is to provide full range of rps_flow_cnt on 64bit arches.
Theorical limit on number of flows is 2^32
Fix some buggy RPS/RFS macros as well.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Tom Herbert <therbert@google.com>
CC: Xi Wang <xi.wang@gmail.com>
CC: Laurent Chavey <chavey@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
irqdomain support is used in interrupt controller drivers that may not
have device tree support but only need the basic HW->Linux irq
translation. Rather than having each of these implement their own IRQ
domain, allow them to use the simple ops.
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rob Herring <robherring2@gmail.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Jamie Iles <jamie@jamieiles.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Conflicts:
net/bluetooth/l2cap_core.c
Just two overlapping changes, one added an initialization of
a local variable, and another change added a new local variable.
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to accommodate a 64K buffer we need 64K/PAGE_SIZE plus one more page
in order to allow for a buffer which does not start on a page boundary.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Export the NAT definitions to userspace. So far userspace (specifically,
iptables) has been copying the headers files from include/net. Also
rename some structures and definitions in preparation for IPv6 NAT.
Since these have never been officially exported, this doesn't affect
existing userspace code.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This partially reworks bc01befdcf
which added userspace expectation support.
This patch removes the nf_ct_userspace_expect_list since now we
force to use the new iptables CT target feature to add the helper
extension for conntracks that have attached expectations from
userspace.
A new version of the proof-of-concept code to implement userspace
helpers from userspace is available at:
http://people.netfilter.org/pablo/userspace-conntrack-helpers/nf-ftp-helper-POC.tar.bz2
This patch also modifies the CT target to allow to set the
conntrack's userspace helper status flags. This flag is used
to tell the conntrack system to explicitly allocate the helper
extension.
This helper extension is useful to link the userspace expectations
with the master conntrack that is being tracked from one userspace
helper.
This feature fixes a problem in the current approach of the
userspace helper support. Basically, if the master conntrack that
has got a userspace expectation vanishes, the expectations point to
one invalid memory address. Thus, triggering an oops in the
expectation deletion event path.
I decided not to add a new revision of the CT target because
I only needed to add a new flag for it. I'll document in this
issue in the iptables manpage. I have also changed the return
value from EINVAL to EOPNOTSUPP if one flag not supported is
specified. Thus, in the future adding new features that only
require a new flag can be added without a new revision.
There is no official code using this in userspace (apart from
the proof-of-concept) that uses this infrastructure but there
will be some by beginning 2012.
Reported-by: Sam Roberts <vieuxtech@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The transfer direction for a channel can be inferred from the transfer
request and the need for specifying transfer direction in platfrom data
can be eliminated. So the structure definition 'struct dma_pl330_peri'
is no longer required.
The channel's private data is set to point to a channel id specified in
the platform data (instead of an instance of type 'struct dma_pl330_peri').
The filter function is correspondingly modified to match the channel id.
With the 'struct dma_pl330_peri' removed from platform data, the dma
controller transfer capabilities cannot be inferred any more. Hence,
the dma controller capabilities is specified using platform data.
Acked-by: Jassi Brar <jassisinghbrar@gmail.com>
Acked-by: Boojin Kim <boojin.kim@samsung.com>
Signed-off-by: Thomas Abraham <thomas.abraham@linaro.org>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
The dma channel selection filter function is moved from plat-samsung
into the pl330 driver. In additon to that, a check is added in the
filter function to ensure that the channel on which the filter has
been invoked is pl330 channel instance (and avoid any incorrect
access of chan->private in a system with multiple types of DMA
drivers).
Suggested-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Thomas Abraham <thomas.abraham@linaro.org>
Acked-by: Jassi Brar <jassisinghbrar@gmail.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
hot-replace is a feature being added to md which will allow a
device to be replaced without removing it from the array first.
With hot-replace a spare can be activated and recovery can start while
the original device is still in place, thus allowing a transition from
an unreliable device to a reliable device without leaving the array
degraded during the transition. It can also be use when the original
device is still reliable but it not wanted for some reason.
This will eventually be supported in RAID4/5/6 and RAID10.
This patch adds a super-block flag to distinguish the replacement
device. If an old kernel sees this flag it will reject the device.
It also adds two per-device flags which are viewable and settable via
sysfs.
"want_replacement" can be set to request that a device be replaced.
"replacement" is set to show that this device is replacing another
device.
The "rd%d" links in /sys/block/mdXx/md only apply to the original
device, not the replacement. We currently don't make links for the
replacement - there doesn't seem to be a need.
Signed-off-by: NeilBrown <neilb@suse.de>
While using etags to find free_pages(), I stumbled across this debug
definition of free_pages() that is to be used while debugging some raid
code in userspace. The __get_free_pages() allocates the correct size,
but the free_pages() does not match. free_pages(), like
__get_free_pages(), takes an order and not a size.
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: NeilBrown <neilb@suse.de>
We simply say that regular this_cpu use must be safe regardless of
preemption and interrupt state. That has no material change for x86
and s390 implementations of this_cpu operations. However, arches that
do not provide their own implementation for this_cpu operations will
now get code generated that disables interrupts instead of preemption.
-tj: This is part of on-going percpu API cleanup. For detailed
discussion of the subject, please refer to the following thread.
http://thread.gmane.org/gmane.linux.kernel/1222078
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
LKML-Reference: <alpine.DEB.2.00.1112221154380.11787@router.home>
It adds device tree probe support for mc13892-regulator driver.
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Currently, the *_global_[un]lock_online() routines are not at all synchronized
with CPU hotplug. Soft-lockups detected as a consequence of this race was
reported earlier at https://lkml.org/lkml/2011/8/24/185. (Thanks to Cong Meng
for finding out that the root-cause of this issue is the race condition
between br_write_[un]lock() and CPU hotplug, which results in the lock states
getting messed up).
Fixing this race by just adding {get,put}_online_cpus() at appropriate places
in *_global_[un]lock_online() is not a good option, because, then suddenly
br_write_[un]lock() would become blocking, whereas they have been kept as
non-blocking all this time, and we would want to keep them that way.
So, overall, we want to ensure 3 things:
1. br_write_lock() and br_write_unlock() must remain as non-blocking.
2. The corresponding lock and unlock of the per-cpu spinlocks must not happen
for different sets of CPUs.
3. Either prevent any new CPU online operation in between this lock-unlock, or
ensure that the newly onlined CPU does not proceed with its corresponding
per-cpu spinlock unlocked.
To achieve all this:
(a) We introduce a new spinlock that is taken by the *_global_lock_online()
routine and released by the *_global_unlock_online() routine.
(b) We register a callback for CPU hotplug notifications, and this callback
takes the same spinlock as above.
(c) We maintain a bitmap which is close to the cpu_online_mask, and once it is
initialized in the lock_init() code, all future updates to it are done in
the callback, under the above spinlock.
(d) The above bitmap is used (instead of cpu_online_mask) while locking and
unlocking the per-cpu locks.
The callback takes the spinlock upon the CPU_UP_PREPARE event. So, if the
br_write_lock-unlock sequence is in progress, the callback keeps spinning,
thus preventing the CPU online operation till the lock-unlock sequence is
complete. This takes care of requirement (3).
The bitmap that we maintain remains unmodified throughout the lock-unlock
sequence, since all updates to it are managed by the callback, which takes
the same spinlock as the one taken by the lock code and released only by the
unlock routine. Combining this with (d) above, satisfies requirement (2).
Overall, since we use a spinlock (mentioned in (a)) to prevent CPU hotplug
operations from racing with br_write_lock-unlock, requirement (1) is also
taken care of.
By the way, it is to be noted that a CPU offline operation can actually run
in parallel with our lock-unlock sequence, because our callback doesn't react
to notifications earlier than CPU_DEAD (in order to maintain our bitmap
properly). And this means, since we use our own bitmap (which is stale, on
purpose) during the lock-unlock sequence, we could end up unlocking the
per-cpu lock of an offline CPU (because we had locked it earlier, when the
CPU was online), in order to satisfy requirement (2). But this is harmless,
though it looks a bit awkward.
Debugged-by: Cong Meng <mc@linux.vnet.ibm.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
Now that there are no in-kernel users of this function, remove it as it
is no longer needed.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This moves the 'memory sysdev_class' over to a regular 'memory' subsystem
and converts the devices to regular devices. The sysdev drivers are
implemented as subsystem interfaces now.
After all sysdev classes are ported to regular driver core entities, the
sysdev implementation will be entirely removed from the kernel.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem
and converts the devices to regular devices. The sysdev drivers are
implemented as subsystem interfaces now.
After all sysdev classes are ported to regular driver core entities, the
sysdev implementation will be entirely removed from the kernel.
Userspace relies on events and generic sysfs subsystem infrastructure
from sysdev devices, which are made available with this conversion.
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Borislav Petkov <bp@amd64.org>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Cc: Len Brown <lenb@kernel.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Since the PM core is now going to execute driver callbacks directly
if the corresponding subsystem callbacks are not present,
forward-only subsystem callbacks (i.e. such that only execute the
corresponding driver callbacks) are not necessary any more. Thus
it is possible to remove generic_subsys_pm_ops, because the only
callback in there that is not forward-only, .runtime_idle, is not
really used by the only user of generic_subsys_pm_ops, which is
vio_bus_type.
However, the generic callback routines themselves cannot be removed
from generic_ops.c, because they are used individually by a number
of subsystems.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
The forward-only PM callbacks provided by the platform bus type are
not necessary any more, because the PM core executes driver callbacks
when the corresponding subsystem callbacks are not present, so drop
them.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* master: (848 commits)
SELinux: Fix RCU deref check warning in sel_netport_insert()
binary_sysctl(): fix memory leak
mm/vmalloc.c: remove static declaration of va from __get_vm_area_node
ipmi_watchdog: restore settings when BMC reset
oom: fix integer overflow of points in oom_badness
memcg: keep root group unchanged if creation fails
nilfs2: potential integer overflow in nilfs_ioctl_clean_segments()
nilfs2: unbreak compat ioctl
cpusets: stall when updating mems_allowed for mempolicy or disjoint nodemask
evm: prevent racing during tfm allocation
evm: key must be set once during initialization
mmc: vub300: fix type of firmware_rom_wait_states module parameter
Revert "mmc: enable runtime PM by default"
mmc: sdhci: remove "state" argument from sdhci_suspend_host
x86, dumpstack: Fix code bytes breakage due to missing KERN_CONT
IB/qib: Correct sense on freectxts increment and decrement
RDMA/cma: Verify private data length
cgroups: fix a css_set not found bug in cgroup_attach_proc
oprofile: Fix uninitialized memory access when writing to writing to oprofilefs
Revert "xen/pv-on-hvm kexec: add xs_reset_watches to shutdown watches from old kernel"
...
Conflicts:
kernel/cgroup_freezer.c
Some controllers support scatter/gather transfers
and that might be very useful for some gadget drivers.
This means that we can make use of larger buffer
allocations which means we will have less completion
IRQs overtime, thus improving the perceived performance.
Signed-off-by: Felipe Balbi <balbi@ti.com>
Gadget drivers should be compilable on all architectures.
This patch removes one dependency on architecture-specific code.
Acked-by: Kukjin Kim <kgene.kim@samsung.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Defer LED setting action to a workqueue.
This is more likely to send all LED change events in a single URB.
Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
Acked-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
This event counts the number of reference core cpu cycles.
Reference means that the event increments at a constant rate which
is not subject to core CPU frequency adjustments. The event may
not count when the processor is in halted (low power) state.
As such, it may not be equivalent to wall clock time. However,
when the processor is not halted state, the event keeps
a constant correlation with wall clock time.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1323559734-3488-3-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-> #2 (&tty->write_wait){-.-...}:
is a lot more informative than:
-> #2 (key#19){-.....}:
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-8zpopbny51023rdb0qq67eye@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
to record the state of SACK/FACK and DSACK for better readability and maintenance.
Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
time/clocksource: Fix kernel-doc warnings
rtc: m41t80: Workaround broken alarm functionality
rtc: Expire alarms after the time is set.
Merge in the upstream tree to bring in the mainline fixes.
Conflicts:
drivers/gpu/drm/exynos/exynos_drm_fbdev.c
drivers/gpu/drm/nouveau/nouveau_sgdma.c
This patch adds support for EHCI compliant HSUSB Host controller found
on Marvell Socs.
It fits both OTG and SPH controller on marvell Socs, including
PXA9xx/MMP2/MMP3/MGx.
Signed-off-by: Neil Zhang <zhangwm@marvell.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
This driver is for ChipIdea USB OTG controller on Marvell Socs.
PXA9xx/MMP2/MMP3/MGx all have this USB OTG controller.
Signed-off-by: Neil Zhang <zhangwm@marvell.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
This patch do the following things:
1. Change the Kconfig information.
2. Rename the driver name.
3. Don't do any type cast to io memory.
4. Add dummy stub for clk framework.
Signed-off-by: Neil Zhang <zhangwm@marvell.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
This documents the fields added to security_file_mmap() that were
introduced in ed03218951.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <jmorris@namei.org>
This reverts commit 5c3ddec73d.
S390 qeth driver actually still uses the setup ops.
Reported-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
New FW can give clues to driver regarding default port type
and whether or not we should default to link sensing on the port.
2 bits are added to QUERY_PORT command:
1. suggested_type: This bit gives a hint whether the default port type should be
IB or Ethernet.
The driver will use this hint in case the user didn't specify explicitly the link layer
type he wants to set.
2. default_sense: If this bit is set, we would sense the port type on start-up
and default the port to link sensing
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
For ConnectX3 devices, we allow link sensing only if FW explicitly
reported it supports the feature.
For older versions (ConnectX1 and 2), if the card supports both link layer types
(Ethenet and Infiniband), link sensing is supported.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
If station info contains a beacon loss count, return
it to userspace.
Signed-off-by: Paul Stewart <pstew@chromium.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
For super speed bulk transfer, the max burst size
is 16, so that 4 bits of maxburst cannot store it.
Signed-off-by: Yu Xu <yuxu@marvell.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Compensate the task's think time when computing the final pause time,
so that ->dirty_ratelimit can be executed accurately.
think time := time spend outside of balance_dirty_pages()
In the rare case that the task slept longer than the 200ms period time
(result in negative pause time), the sleep time will be compensated in
the following periods, too, if it's less than 1 second.
Accumulated errors are carefully avoided as long as the max pause area
is not hitted.
Pseudo code:
period = pages_dirtied / task_ratelimit;
think = jiffies - dirty_paused_when;
pause = period - think;
1) normal case: period > think
pause = period - think
dirty_paused_when = jiffies + pause
nr_dirtied = 0
period time
|===============================>|
think time pause time
|===============>|==============>|
------|----------------|---------------|------------------------
dirty_paused_when jiffies
2) no pause case: period <= think
don't pause; reduce future pause time by:
dirty_paused_when += period
nr_dirtied = 0
period time
|===============================>|
think time
|===================================================>|
------|--------------------------------+-------------------|----
dirty_paused_when jiffies
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
De-account the accumulative dirty counters on page redirty.
Page redirties (very common in ext4) will introduce mismatch between
counters (a) and (b)
a) NR_DIRTIED, BDI_DIRTIED, tsk->nr_dirtied
b) NR_WRITTEN, BDI_WRITTEN
This will introduce systematic errors in balanced_rate and result in
dirty page position errors (ie. the dirty pages are no longer balanced
around the global/bdi setpoints).
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
It's a years long problem that a large number of short-lived dirtiers
(eg. gcc instances in a fast kernel build) may starve long-run dirtiers
(eg. dd) as well as pushing the dirty pages to the global hard limit.
The solution is to charge the pages dirtied by the exited gcc to the
other random dirtying tasks. It sounds not perfect, however should
behave good enough in practice, seeing as that throttled tasks aren't
actually running so those that are running are more likely to pick it up
and get throttled, therefore promoting an equal spread.
Randy: fix compile error: 'dirty_throttle_leaks' undeclared in exit.c
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
* 'drm-intel-fixes' of git://people.freedesktop.org/~keithp/linux:
drm/i915/dp: Dither down to 6bpc if it makes the mode fit
drm/i915: enable semaphores on per-device defaults
drm/i915: don't set unpin_work if vblank_get fails
drm/i915: By default, enable RC6 on IVB and SNB when reasonable
iommu: Export intel_iommu_enabled to signal when iommu is in use
drm/i915/sdvo: Include LVDS panels for the IS_DIGITAL check
drm/i915: prevent division by zero when asking for chipset power
drm/i915: add PCH info to i915_capabilities
drm/i915: set the right SDVO transcoder for CPT
drm/i915: no-lvds quirk for ASUS AT5NM10T-I
drm/i915: Treat pre-gen4 backlight duty cycle value consistently
drm/i915: Hook up Ivybridge eDP
drm/i915: add multi-threaded forcewake support
All drivers that support modification of the RX flow hash indirection
table initialise it in the same way: RX rings are assigned to table
entries in rotation. Make that default policy explicit by having them
call a ethtool_rxfh_indir_default() function.
In the ethtool core, add support for a zero size value for
ETHTOOL_SRXFHINDIR, which resets the table to this default.
Partly-suggested-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Shreyas N Bhatewara <sbhatewara@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new ethtool operation (get_rxfh_indir_size) to get the
indirectional table size. Use this to validate the user buffer size
before calling get_rxfh_indir or set_rxfh_indir. Use get_rxnfc to get
the number of RX rings, and validate the contents of the new
indirection table before calling set_rxfh_indir. Remove this
validation from drivers.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Dimitris Michailidis <dm@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to find out the device's RX flow hash table size, ethtool
initially uses ETHTOOL_GRXFHINDIR with a buffer size of zero. This
must be supported, but it is not necessary to support any other user
buffer size less than the device table size.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When establishing a unix connection on stream sockets the
server end receives an skb with socket in its receive queue.
Report who is waiting for these ends to be accepted for
listening sockets via NLA.
There's a lokcing issue with this -- the unix sk state lock is
required to access the peer, and it is taken under the listening
sk's queue lock. Strictly speaking the queue lock should be taken
inside the state lock, but since in this case these two sockets
are different it shouldn't lead to deadlock.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Report the peer socket inode ID as NLA. With this it's finally
possible to find out the other end of an interesting unix connection.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Actually, the socket path if it's not anonymous doesn't give
a clue to which file the socket is bound to. Even if the path
is absolute, it can be unlinked and then new socket can be
bound to it.
With this NLA it's possible to check which file a particular
socket is really bound to.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Report the sun_path when requested as NLA. With leading '\0' if
present but without the leading AF_UNIX bits.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Includes basic module_init/_exit functionality, dump/get_exact stubs
and declares the basic API structures for request and response.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sk address is used as a cookie between dump/get_exact calls.
It will be required for unix socket sdumping, so move it from
inet_diag to sock_diag.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It should belong to sock_diag, not inet_diag.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In i915 driver, we do not enable either rc6 or semaphores on SNB when dmar
is enabled. The new 'intel_iommu_enabled' variable signals when the
iommu code is in operation.
Cc: Ted Phelps <phelps@gnusto.com>
Cc: Peter <pab1612@gmail.com>
Cc: Lukas Hejtmanek <xhejtman@fi.muni.cz>
Cc: Andrew Lutomirski <luto@mit.edu>
CC: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
The nl80211 station handling code is a bit messy
and doesn't do a lot of validation. It seems like
this could be an issue for drivers that don't use
mac80211 to validate everything.
As cfg80211 doesn't keep station state, move the
validation of allowing supported_rates to change
for TDLS only in station mode to mac80211.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
They need to be available for other protocols as well, since
they are used in sock.c openly
Signed-off-by: Glauber Costa <glommer@parallels.com>
CC: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com>
CC: David S. Miller <davem@davemloft.net>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make cputime_t and cputime64_t nocast to enable sparse checking to
detect incorrect use of cputime. Drop the cputime macros for simple
scalar operations. The conversion macros are still needed.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This call-back is invoked when the task that is bound to a
pasid is about to exit. The driver can use it to shutdown
all context related to that context in a safe way.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This function can be used to find out which features
necessary for IOMMUv2 usage are available on a given device.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The parent and real_parent pointers should be considered __rcu,
since they should be held under either tasklist_lock or
rcu_read_lock.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: http://lkml.kernel.org/r/20111214223925.GA27578@www.outflux.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
After all sysdev classes are ported to regular driver core entities, the
sysdev implementation will be entirely removed from the kernel.
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
All sysdev classes and sysdev devices will converted to regular devices
and buses to properly hook userspace into the event processing.
There is no interesting difference between a 'sysdev' and 'device' which
would justify to roll an entire own subsystem with different userspace
export semantics. Userspace relies on events and generic sysfs subsystem
infrastructure from sysdev devices, which are currently not properly
available.
Every converted sysdev class will create a regular device with the class
name in /sys/devices/system and all registered devices will becom a children
of theses devices.
For compatibility reasons, the sysdev class-wide attributes are created
at this parent device. (Do not copy that logic for anything new, subsystem-
wide properties belong to the subsystem, not to some fake parent device
created in /sys/devices.)
Every sysdev driver is implemented as a simple subsystem interface now,
and no longer called a driver.
After all sysdev classes are ported to regular driver core entities, the
sysdev implementation will be entirely removed from the kernel.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch is an initial implementation for the NFC Logical Link Control
protocol. It's also known as NFC peer to peer mode.
This is a basic implementation as it lacks SDP (services Discovery
Protocol), frames aggregation support, and frame rejecion parsing.
Follow up patches will implement those missing features.
This code has been tested against a Nexus S phone implementing LLCP 1.0.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
NFC-DEP (Data Exchange Protocol) is an NFC MAC layer.
This command allows to enable and disable the DEP link on to which e.g.
LLCP can run.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
It turns out that some memory allocators use kobjects, which use krefs,
and kref.h was wanting to figure out the address of kfree(), which ended
up in a loop.
kfree was only being needed for a warning to tell the caller that they
were doing something stupid. Now we just move that warning into the
comments for the functions, which results in a bit more fun as everyone
enjoys digging for people to mock at times of boredom.
So, remove the dependancy of slab.h on kref.h, and fix up the other
include file as well (we really only need bug.h and atomic.h, not
types.h).
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The DA9052/53 is a highly integrated PMIC subsystem with supply domain
flexibility to support wide range of high performance application.
It provides voltage regulators, GPIO controller, Touch Screen, RTC, Battery
control and other functionality.
This patch is functionally tested on Samsung SMDKV6410.
Signed-off-by: David Dajun Chen <dchen@diasemi.com>
Signed-off-by: Ashish Jangam <ashish.jangam@kpitcummins.com>
Acked-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Michael Walle <michael@walle.cc>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Use an getter function in plat-orion/addr-map.c to get the address map
structure, rather than pass it to drivers in the platform_data
structures. When the drivers are built for none orion platforms, a
dummy function is provided instead which returns NULL.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Michael Walle <michael@walle.cc>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
It's simpler to just keep these things out until there is a real user
of them, so we can see what the needs actually are, rather than keep
these things around as useless overhead.
Signed-off-by: David S. Miller <davem@davemloft.net>
Just scratching an itch here, but it makes more sense to use the
static keyword if you think about how the compiler treats inline
functions.
Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com>
Reviewed-by: Alwin Beukers <alwin@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Franky Lin <frankyl@broadcom.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The BCMA header only had definitions for 32-bit register access. Used
those as a template for the 16-bit flavour. Also changed them to inline
functions to be on the safe side. As offset parameter is used twice there
would be a problem when used like this: bcma_set32(core, offset++, val);
Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com>
Reviewed-by: Alwin Beukers <alwin@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Franky Lin <frankyl@broadcom.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
1. Added module parameters sr_iov and probe_vf for controlling enablement of
SRIOV mode.
2. Increased default max num-qps, num-mpts and log_num_macs to accomodate
SRIOV mode
3. Added port_type_array as a module parameter to allow driver startup with
ports configured as desired.
In SRIOV mode, only ETH is supported, and this array is ignored; otherwise,
for the case where the FW supports both port types (ETH and IB), the
port_type_array parameter is used.
By default, the port_type_array is set to configure both ports as IB.
4. When running in sriov mode, the master needs to initialize the ICM eq table
to hold the eq's for itself and also for all the slaves.
5. mlx4_set_port_mask() now invoked from mlx4_init_hca, instead of in mlx4_dev_cap.
6. Introduced sriov VF (slave) device startup/teardown logic (mainly procedures
mlx4_init_slave, mlx4_slave_exit, mlx4_slave_cap, mlx4_slave_exit and flow
modifications in __mlx4_init_one, mlx4_init_hca, and mlx4_setup_hca).
VFs obtain their startup information from the PF (master) device via the
comm channel.
7. In SRIOV mode (both PF and VF), MSI_X must be enabled, or the driver
aborts loading the device.
8. Do not allow setting port type via sysfs when running in SRIOV mode.
9. mlx4_get_ownership: Currently, only one PF is supported by the driver.
If the HCA is burned with FW which enables more than one PF, only one
of the PFs is allowed to run. The first one up grabs a FW ownership
semaphone -- all other PFs will find that semaphore taken, and the
driver will not allow them to run.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Liran Liss <liranl@mellanox.co.il>
Signed-off-by: Marcel Apfelbaum <marcela@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the previous implementation mtts are managed by:
1. order - log(mtt segments), 'mtt segment' groups several mtts together.
2. first_seg - segment location relative to mtt table.
In the current implementation:
1. order - log(mtts) rather than segments
2. offset - mtt index in mtt table
Note: The actual mtt allocation is made in segments but it is
transparent to callers.
Rational: The mtt resource holders are not interested on how the allocation
of mtt is done, but rather on how they will use it.
Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
The physical port is now common to the PF and VFs.
The port resources and configuration is managed by the PF, VFs can
only influence the MTU of the port, it is set as max among all functions,
Each function allocates RX buffers of required size to meet it's MTU enforcement.
Port management code was moved to mlx4_core, as the mlx4_en module is
virtualization unaware
Move handling qp functionality to mlx4_get_eth_qp/mlx4_put_eth_qp
including reserve/release range and add/release unicast steering.
Let mlx4_register/unregister_mac deal only with MAC (un)registration.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
When SRIOV is enabled on the chip (at FW burning time),
the HCA uses only 17 bits for the PD. The remaining 7 high-order bits
are ignored.
Change the allocator to return only 17 bits for the PD. The MSB 7
bits will be used to encode the slave number for consistency
checking later on in the resource tracker.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
For SRIOV, some Hypervisor commands can be executed directly (native = 1).
Others should go through the command wrapper flow (for tracking resource
usage, for example, or for changing some HCA configurations that slaves
need to be notified of).
This patch sets the groundwork for this capability -- adding the correct
value of "native" in each case.
Note that if SRIOV is not activated, this parameter has no effect.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Port mask now has additional state.
Port can be set as "none". In this case neither the mlx4_en or mlx4_ib
drivers take ownership of the port.
In multifunction mode there is an option to set the vfs as single ported devices.
(in single function mode, both physical ports belong to same function)
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
These changes will not affect module operation as yet. They
are only to get some structs and enums in place for use by
subsequent patches (making those smaller).
Added here:
* sriov state structs and inlines (mlx4_is_master/slave/mfunc)
* comm-channel and vhcr support structures
* enum values for new FW and comm-channel virtual commands
(i.e., commands, passed via the comm channel to the PF-driver).
* prototypes for many command wrapper functions (used by the
PF context for processing FW commands passed to it by the VFs).
* struct mlx4_eqe is moved from eq.c to mlx4.h (it will be used
by other mlx4_core source files).
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 1b0b3b9980 ("kref: fix CPU ordering with respect to krefs")
wrongly adds memory barriers to kref.
It states:
some atomic operations are only atomic, not ordered. Thus a CPU is allowed
to reorder memory references to an object to before the reference is
obtained. This fixes it.
While true, it fails to show why this is a problem. I say it is not a
problem because if there is a race with kref_put() such that we could
end up referencing a free'd object without this memory barrier, we
would still have that race with the memory barrier.
The kref_put() in question could complete (and free the object) before
the atomic_inc() and we'd still be up shit creek.
The kref_init() case is even worse, if your object is published at this
time you're so wrong the memory barrier won't make a difference what
so ever. If its not published, the act of publishing should include
the needed barriers/locks to make sure all writes prior to the act of
publishing are complete such that others will only observe a complete
object.
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
These are tiny functions, there's no point in having them out-of-line.
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-8eccvi2ur2fzgi00xdjlbf5z@git.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Exactly like roundup_pow_of_two(1), the rounddown version was buggy for
the case of a compile-time constant '1' argument. Probably because it
originated from the same code, sharing history with the roundup version
from before the bugfix (for that one, see commit 1a06a52ee1b0: "Fix
roundup_pow_of_two(1)").
However, unlike the roundup version, the fix for rounddown is to just
remove the broken special case entirely. It's simply not needed - the
generic code
1UL << ilog2(n)
does the right thing for the constant '1' argment too. The only reason
roundup needed that special case was because rounding up does so by
subtracting one from the argument (and then adding one to the result)
causing the obvious problems with "ilog2(0)".
But rounddown doesn't do any of that, since ilog2() naturally truncates
(ie "rounds down") to the right rounded down value. And without the
ilog2(0) case, there's no reason for the special case that had the wrong
value.
tl;dr: rounddown_pow_of_two(1) should be 1, not 0.
Acked-by: Dmitry Torokhov <dtor@vmware.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc:
mmc: core: Fix deadlock when the CONFIG_MMC_UNSAFE_RESUME is not defined
mmc: sdhci-s3c: Remove old and misprototyped suspend operations
mmc: tmio: fix clock gating on platforms with a .set_pwr() method
mmc: sh_mmcif: fix clock gating on platforms with a .down_pwr() method
mmc: core: Fix typo at mmc_card_sleep
mmc: core: Fix power_off_notify during suspend
mmc: core: Fix setting power notify state variable for non-eMMC
mmc: core: Add quirk for long data read time
mmc: Add module.h include to sdhci-cns3xxx.c
mmc: mxcmmc: fix falling back to PIO
mmc: omap_hsmmc: DMA unmap only once in case of MMC error
These three methods are no longer used. Kill them.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul Menage <paul@paulmenage.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Currently, there's no way to pass multiple tasks to cgroup_subsys
methods necessitating the need for separate per-process and per-task
methods. This patch introduces cgroup_taskset which can be used to
pass multiple tasks and their associated cgroups to cgroup_subsys
methods.
Three methods - can_attach(), cancel_attach() and attach() - are
converted to use cgroup_taskset. This unifies passed parameters so
that all methods have access to all information. Conversions in this
patchset are identical and don't introduce any behavior change.
-v2: documentation updated as per Paul Menage's suggestion.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul Menage <paul@paulmenage.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: James Morris <jmorris@namei.org>
threadgroup_lock() protected only protected against new addition to
the threadgroup, which was inherently somewhat incomplete and
problematic for its only user cgroup. On-going migration could race
against exec and exit leading to interesting problems - the symmetry
between various attach methods, task exiting during method execution,
->exit() racing against attach methods, migrating task switching basic
properties during exec and so on.
This patch extends threadgroup_lock() such that it protects against
all three threadgroup altering operations - fork, exit and exec. For
exit, threadgroup_change_begin/end() calls are added to exit_signals
around assertion of PF_EXITING. For exec, threadgroup_[un]lock() are
updated to also grab and release cred_guard_mutex.
With this change, threadgroup_lock() guarantees that the target
threadgroup will remain stable - no new task will be added, no new
PF_EXITING will be set and exec won't happen.
The next patch will update cgroup so that it can take full advantage
of this change.
-v2: beefed up comment as suggested by Frederic.
-v3: narrowed scope of protection in exit path as suggested by
Frederic.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Make the following renames to prepare for extension of threadgroup
locking.
* s/signal->threadgroup_fork_lock/signal->group_rwsem/
* s/threadgroup_fork_read_lock()/threadgroup_change_begin()/
* s/threadgroup_fork_read_unlock()/threadgroup_change_end()/
* s/threadgroup_fork_write_lock()/threadgroup_lock()/
* s/threadgroup_fork_write_unlock()/threadgroup_unlock()/
This patch doesn't cause any behavior change.
-v2: Rename threadgroup_change_done() to threadgroup_change_end() per
KAMEZAWA's suggestion.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Menage <paul@paulmenage.org>
This extension can be used to simulate special link layer
characteristics. Simulate because packet data is not modified, only the
calculation base is changed to delay a packet based on the original
packet size and artificial cell information.
packet_overhead can be used to simulate a link layer header compression
scheme (e.g. set packet_overhead to -20) or with a positive
packet_overhead value an additional MAC header can be simulated. It is
also possible to "replace" the 14 byte Ethernet header with something
else.
cell_size and cell_overhead can be used to simulate link layer schemes,
based on cells, like some TDMA schemes. Another application area are MAC
schemes using a link layer fragmentation with a (small) header each.
Cell size is the maximum amount of data bytes within one cell. Cell
overhead is an additional variable to change the per-cell-overhead
(e.g. 5 byte header per fragment).
Example (5 kbit/s, 20 byte per packet overhead, cell-size 100 byte, per
cell overhead 5 byte):
tc qdisc add dev eth0 root netem rate 5kbit 20 100 5
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces memory pressure controls for the tcp
protocol. It uses the generic socket memory pressure code
introduced in earlier patches, and fills in the
necessary data in cg_proto struct.
Signed-off-by: Glauber Costa <glommer@parallels.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtisu.com>
CC: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The goal of this work is to move the memory pressure tcp
controls to a cgroup, instead of just relying on global
conditions.
To avoid excessive overhead in the network fast paths,
the code that accounts allocated memory to a cgroup is
hidden inside a static_branch(). This branch is patched out
until the first non-root cgroup is created. So when nobody
is using cgroups, even if it is mounted, no significant performance
penalty should be seen.
This patch handles the generic part of the code, and has nothing
tcp-specific.
Signed-off-by: Glauber Costa <glommer@parallels.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtsu.com>
CC: Kirill A. Shutemov <kirill@shutemov.name>
CC: David S. Miller <davem@davemloft.net>
CC: Eric W. Biederman <ebiederm@xmission.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The macro HUB_SET_DEPTH is defined twice in ch11.h (introduced by
commit 0eadcc0 "usb: USB3.0 ch11 definitions" and dbe79bb "USB 3.0
Hub Changes"), so remove the duplicate one in the USB 2.0 part.
Signed-off-by: Qinglin Ye <yestyle@gmail.com>
Cc: John Youn <John.Youn@synopsys.com>
Cc: Sergei Shtylyov <sshtylyov@mvista.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
of_parse_phandle_with_args() needs to return quite a bit of data. Rather
than making each datum a separate **out_ argument, this patch creates
struct of_phandle_args to contain all the returned data and reworks the
user of the function. This patch also enables of_parse_phandle_with_args()
to return the device node pointer for the phandle node.
This patch also ends up being fairly major surgery to
of_parse_handle_with_args(). The existing structure didn't work well
when extending to use of_phandle_args, and I discovered bugs during testing.
I also took the opportunity to rename the function to be like the
existing of_parse_phandle().
v2: - moved declaration of of_phandle_args to fix compile on non-DT builds
- fixed incorrect index in example usage
- fixed incorrect return code handling for empty entries
Reviewed-by: Shawn Guo <shawn.guo@freescale.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
This patch adds the amd_iommu_init_device() and
amd_iommu_free_device() functions which make a device and
the IOMMU ready for IOMMUv2 usage.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This commit renames the “speed” field of the usb_gadget_driver
structure to “max_speed”. This is so that to make it more
apparent that the field represents the maximum speed gadget
driver can support.
This also make the field look more like fields with the same
name in usb_gadget and usb_composite_driver structures. All
of those represent the *maximal* speed given entity supports.
After this commit, there are the following fields in various
structures:
* usb_gadget::speed - the current connection speed,
* usb_gadget::max_speed - maximal speed UDC supports,
* usb_gadget_driver::max_speed - maximal speed gadget driver
supports, and
* usb_composite_driver::max_speed - maximal speed composite
gadget supports.
Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
This commit replaces usb_gadget's is_dualspeed field with
a max_speed field.
[ balbi@ti.com : Fixed DWC3 driver ]
Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
This driver adds support for Sharp's GP2AP002A00F proximity sensor. The
proximity is measured as a binary switch, i.e. an object is either
detected or not detected. Hence, this driver is implemented as a switch
that reports SW_FRONT_PROXIMITY.
Reviewed-by: Datta Shubhrajyoti <shubhrajyoti@ti.com>
Signed-off-by: Courtney Cavin <courtney.cavin@sonyericsson.com>
Signed-off-by: Oskar Andero <oskar.andero@sonyericsson.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Instead of testing defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When architectures register CPUs, they indicate whether the CPU allows
hotplugging; notably, x86 and ARM don't allow hotplugging CPU 0.
Userspace can easily query the hotpluggability of a CPU via sysfs;
however, the kernel has no convenient way of accessing that property in
an architecture-independent way. While the kernel can simply try it and
see, some code needs to distinguish between "hotplug failed" and
"hotplug has no hope of working on this CPU"; for example, rcutorture's
CPU hotplug tests want to avoid drowning out real hotplug failures with
expected failures.
Expose this property via a new cpu_is_hotpluggable function, so that the
rest of the kernel can access it in an architecture-independent way.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The intent is that a given RCU read-side critical section be confined
to a single context. For example, it is illegal to invoke rcu_read_lock()
in an exception handler and then invoke rcu_read_unlock() from the
context of the task that received the exception.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Those two APIs were provided to optimize the calls of
tick_nohz_idle_enter() and rcu_idle_enter() into a single
irq disabled section. This way no interrupt happening in-between would
needlessly process any RCU job.
Now we are talking about an optimization for which benefits
have yet to be measured. Let's start simple and completely decouple
idle rcu and dyntick idle logics to simplify.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Commit 908a3283 (Fix idle_cpu()) invalidated some uses of idle_cpu(),
which used to say whether or not the CPU was running the idle task,
but now instead says whether or not the CPU is running the idle task
in the absence of pending wakeups. Although this new implementation
gives a better answer to the question "is this CPU idle?", it also
invalidates other uses that were made of idle_cpu().
This commit therefore introduces a new is_idle_task() API member
that determines whether or not the specified task is one of the
idle tasks, allowing open-coded "->pid == 0" sequences to be replaced
by something more meaningful.
Suggested-by: Josh Triplett <josh@joshtriplett.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The RCU implementations, including SRCU, are designed to be used in a
lock-like fashion, so that the read-side lock and unlock primitives must
execute in the same context for any given read-side critical section.
This constraint is enforced by lockdep-RCU. However, there is a need
to enter an SRCU read-side critical section within the context of an
exception and then exit in the context of the task that encountered the
exception. The cost of this capability is that the read-side operations
incur the overhead of disabling interrupts.
Note that although the current implementation allows a given read-side
critical section to be entered by one task and then exited by another, all
known possible implementations that allow this have scalability problems.
Therefore, a given read-side critical section must be exited by the same
task that entered it, though perhaps from an interrupt or exception
handler running within that task's context. But if you are thinking
in terms of interrupt handlers, make sure that you have considered the
possibility of threaded interrupt handlers.
Credit goes to Peter Zijlstra for suggesting use of the existing _raw
suffix to indicate disabling lockdep over the earlier "bulkref" names.
Requested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>