This patch reverts commit
80b45261a0
which was implementing a 'cancelled' functionality to notify that
a cancelled request will not be replied.
This implementation was not used anywhere and therefore removed.
Signed-off-by: Andi Shyti <andi@etezian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit adds new firmware command and new firmware event. The firmware
raises the MLX4_EVENT_TYPE_OP_REQUIRED event in order to signal the driver it
needs to perform an administrative operation throughout the MLX4_CMD_GET_OP_REQ
command. At the moment the supported operation is adding/removing multicast
entries which are used by the firmware for handling NCSI traffic in B0
steering mode.
Also, had to swap the order of mlx4_init_mcg_table() and
mlx4_init_eq_table() to make sure that driver will get events only after
resources are initialized to handle it.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
UDP checksums are optional, hence pktgen has been omitting them in
favour of performance. The optional flag UDPCSUM enables UDP
checksumming. If the output device supports hardware checksumming
the skb is prepared and marked CHECKSUM_PARTIAL, otherwise the
checksum is generated in software.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is useful for other VSOCK transport implemented outside the
net/vmw_vsock/ directory to use these headers.
Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Andy King <acking@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch centralizes computing of max rx/tx qlen, because:
- RX_QLEN()/TX_QLEN() is called in hot path
- computing depends on device's usb speed, now we have ls/fs, hs, ss,
so more checks need to be involved
- in fact, max rx/tx qlen should not only depend on device USB
speed, but also depend on ethernet link speed, so we need to
consider that in future.
- if SG support is done, max tx qlen may need change too
Generally, hard_mtu and rx_urb_size are changed in bind(), reset()
and link_reset() callback, and change mtu network operation, this
patches introduces the API of usbnet_update_max_qlen(), and calls
it in above path.
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch consolidates the SCTP checksum calculation code from various
places to a single new function, sctp_compute_cksum(skb, offset).
Signed-off-by: Joe Stringer <joe@wand.net.nz>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
For small packets we can simplify xmit processing
by linearizing buffers with the header:
most packets seem to have enough head room
we can use for this purpose.
Since existing hypervisors require that header
is the first s/g element, we need a feature bit
for this.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
No users outside net/core/dev.c.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Idea of this patch is to add optional limitation of number of
unsent bytes in TCP sockets, to reduce usage of kernel memory.
TCP receiver might announce a big window, and TCP sender autotuning
might allow a large amount of bytes in write queue, but this has little
performance impact if a large part of this buffering is wasted :
Write queue needs to be large only to deal with large BDP, not
necessarily to cope with scheduling delays (incoming ACKS make room
for the application to queue more bytes)
For most workloads, using a value of 128 KB or less is OK to give
applications enough time to react to POLLOUT events in time
(or being awaken in a blocking sendmsg())
This patch adds two ways to set the limit :
1) Per socket option TCP_NOTSENT_LOWAT
2) A sysctl (/proc/sys/net/ipv4/tcp_notsent_lowat) for sockets
not using TCP_NOTSENT_LOWAT socket option (or setting a zero value)
Default value being UINT_MAX (0xFFFFFFFF), meaning this has no effect.
This changes poll()/select()/epoll() to report POLLOUT
only if number of unsent bytes is below tp->nosent_lowat
Note this might increase number of sendmsg()/sendfile() calls
when using non blocking sockets,
and increase number of context switches for blocking sockets.
Note this is not related to SO_SNDLOWAT (as SO_SNDLOWAT is
defined as :
Specify the minimum number of bytes in the buffer until
the socket layer will pass the data to the protocol)
Tested:
netperf sessions, and watching /proc/net/protocols "memory" column for TCP
With 200 concurrent netperf -t TCP_STREAM sessions, amount of kernel memory
used by TCP buffers shrinks by ~55 % (20567 pages instead of 45458)
lpq83:~# echo -1 >/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# (super_netperf 200 -t TCP_STREAM -H remote -l 90 &); sleep 60 ; grep TCP /proc/net/protocols
TCPv6 1880 2 45458 no 208 yes ipv6 y y y y y y y y y y y y y n y y y y y
TCP 1696 508 45458 no 208 yes kernel y y y y y y y y y y y y y n y y y y y
lpq83:~# echo 131072 >/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# (super_netperf 200 -t TCP_STREAM -H remote -l 90 &); sleep 60 ; grep TCP /proc/net/protocols
TCPv6 1880 2 20567 no 208 yes ipv6 y y y y y y y y y y y y y n y y y y y
TCP 1696 508 20567 no 208 yes kernel y y y y y y y y y y y y y n y y y y y
Using 128KB has no bad effect on the throughput or cpu usage
of a single flow, although there is an increase of context switches.
A bonus is that we hold socket lock for a shorter amount
of time and should improve latencies of ACK processing.
lpq83:~# echo -1 >/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# perf stat -e context-switches ./netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : +/-2.500% @ 99% conf.
Local Remote Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Recv Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
1651584 6291456 16384 20.00 17447.90 10^6bits/s 3.13 S -1.00 U 0.353 -1.000 usec/KB
Performance counter stats for './netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3':
412,514 context-switches
200.034645535 seconds time elapsed
lpq83:~# echo 131072 >/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# perf stat -e context-switches ./netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : +/-2.500% @ 99% conf.
Local Remote Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Recv Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
1593240 6291456 16384 20.00 17321.16 10^6bits/s 3.35 S -1.00 U 0.381 -1.000 usec/KB
Performance counter stats for './netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3':
2,675,818 context-switches
200.029651391 seconds time elapsed
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-By: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Several call sites use the hardcoded following condition :
sk_stream_wspace(sk) >= sk_stream_min_wspace(sk)
Lets use a helper because TCP_NOTSENT_LOWAT support will change this
condition for TCP sockets.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SCTP mailing list address to send patches or questions
to is linux-sctp@vger.kernel.org and not
lksctp-developers@lists.sourceforge.net anymore. Therefore,
update all occurences.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to what is implemented in bonding. User is able to ask team
driver to send IGMP rejoins in case port is enabled or disabled. Using
previously introduced netdev notifier.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Until now, bond_resend_igmp_join_requests() looks for vlans attached to
bonding device, bridge where bonding act as port manually. It does not
care of other scenarios, like stacked bonds or team device above. Make
this more generic and use netdev notifier to propagate the event to
upper devices and to actually call ip_mc_rejoin_groups().
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When port is enabled or disabled, allow to notify peers by unsolicitated
NAs or gratuitous ARPs. Disabled by default.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prefer packet timings to TS-ecr for RTT measurements when both
sources are available. That's because broken middle-boxes and remote
peer can return packets with corrupted TS ECR fields. Similarly most
congestion controls that require RTT signals favor timing-based
sources as well. Also check for bad TS ECR values to avoid RTT
blow-ups. It has happened on production Web servers.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The first patch consolidates SYNACK and other RTT measurement to use a
central function tcp_ack_update_rtt(). A (small) bonus is now SYNACK
RTT measurement happens after PAWS check, potentially reducing the
impact of RTO seeding on bad TCP timestamps values.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds transmit time stamping to the tun/tap driver. Similar
support already exists for UDP, can, and raw packets.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves the private error queue delivery function from the
af_packet code to the core socket method. In this way, network layers
only needing the error queue for transmit time stamping can share common
code.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
"A couple interesting SKB fragment handling fixes, plus the usual small
bits here and there:
1) Fix 64-bit divide build failure on 32-bit platforms in mlx5, from
Tim Gardner.
2) Get rid of a stupid reimplementation on "%*phC" in our sysfs MAC
address printing helper.
3) Fix NETIF_F_SG capability advertisement in hyperv driver, if the
device can't do checksumming offloads then it shouldn't say it can
do SG either. From Haiyang Zhang.
4) bgmac needs to depend on PHYLIB, from Hauke Mehrtens.
5) Don't leak DMA mappings on mapping failures, from Neil Horman.
6) We need to reset the transport header of SKBs in ipv4 before we
attempt to perform early socket demux, just like ipv6 does. From
Eric Dumazet.
7) Add missing locking on vxlan device removal, from Stephen
Hemminger.
8) xen-netfront has to make two passes over an SKB to prepare it for
transfer. One pass calculates the number of slots needed, the
second massages the SKB and fills the slots. Unfortunately, the
first pass doesn't calculate the number of slots properly so we
can end up trying to build a MAX_SKB_FRAGS + 1 SKB which doesn't
work out so well. Fix from Jan Beulich with help and discussion
with several others.
9) Fix a similar problem in tun and macvtap, which have to split up
scatter-gather elements at PAGE_SIZE boundaries. Don't do
zerocopy if it would result in a > MAX_SKB_FRAGS skb. Fixes from
Jason Wang.
10) On receive, once we've decoded the VLAN state completely, clear
skb->vlan_tci. Otherwise demuxed tunnels underneath can trigger
the VLAN code again, corrupting the packet. Fix from Eric
Dumazet"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
vlan: fix a race in egress prio management
vlan: mask vlan prio bits
macvtap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS
tuntap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS
pkt_sched: sch_qfq: remove a source of high packet delay/jitter
xen-netfront: pull on receive skb may need to happen earlier
vxlan: add necessary locking on device removal
hyperv: Fix the NETIF_F_SG flag setting in netvsc
net: Fix sysfs_format_mac() code duplication.
be2net: Fix to avoid hardware workaround when not needed
macvtap: do not assume 802.1Q when send vlan packets
macvtap: fix the missing ret value of TUNSETQUEUE
ipv4: set transport header earlier
mlx5 core: Fix __udivdi3 when compiling for 32 bit arches
bgmac: add dependency to phylib
net/irda: fixed style issues in irlan_eth
ethtool: fixed trailing statements in ethtool
ndisc: bool initializations should use true and false
atl1e: unmap partially mapped skb on dma error and free skb
In commit 48cc32d38a
("vlan: don't deliver frames for unknown vlans to protocols")
Florian made sure we set pkt_type to PACKET_OTHERHOST
if the vlan id is set and we could find a vlan device for this
particular id.
But we also have a problem if prio bits are set.
Steinar reported an issue on a router receiving IPv6 frames with a
vlan tag of 4000 (id 0, prio 2), and tunneled into a sit device,
because skb->vlan_tci is set.
Forwarded frame is completely corrupted : We can see (8100:4000)
being inserted in the middle of IPv6 source address :
16:48:00.780413 IP6 2001:16d8:8100:4000:ee1c:0:9d9:bc87 >
9f94:4d95:2001:67c:29f4::: ICMP6, unknown icmp6 type (0), length 64
0x0000: 0000 0029 8000 c7c3 7103 0001 a0ae e651
0x0010: 0000 0000 ccce 0b00 0000 0000 1011 1213
0x0020: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223
0x0030: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233
It seems we are not really ready to properly cope with this right now.
We can probably do better in future kernels :
vlan_get_ingress_priority() should be a netdev property instead of
a per vlan_dev one.
For stable kernels, lets clear vlan_tci to fix the bugs.
Reported-by: Steinar H. Gunderson <sesse@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here are some driver core patches for 3.11-rc2. They aren't really
bugfixes, but a bunch of new helper macros for drivers to properly
create attribute groups, which drivers and subsystems need to fix up a
ton of race issues with incorrectly creating sysfs files (binary and
normal) after userspace has been told that the device is present.
Also here is the ability to create binary files as attribute groups, to
solve that race condition, which was impossible to do before this, so
that's my fault the drivers were broken.
The majority of the .c changes is indenting and moving code around a
bit. It affects no existing code, but allows the large backlog of 70+
patches that I already have created to start flowing into the different
subtrees, instead of having to live in my driver-core tree, causing
merge nightmares in linux-next for the next few months.
These were finalized too late for the -rc1 merge window, which is why
they were didn't make that pull request, testing and review from others
didn't happen until a few weeks ago, and then there's the whole
distraction of the past few days, which prevented these from getting to
you sooner, sorry about that.
Oh, and there's a bugfix for the documentation build warning in here as
well. All of these have been in linux-next this week, with no reported
problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.20 (GNU/Linux)
iEYEABECAAYFAlHoRUUACgkQMUfUDdst+ymkNACdHAjEXZZmXohDuCb2SqyMeQsz
AZcAn3qqJa/NoPEgTCgOkDlAQZM6BnC5
=+Gqk
-----END PGP SIGNATURE-----
Merge tag 'driver-core-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core patches from Greg KH:
"Here are some driver core patches for 3.11-rc2. They aren't really
bugfixes, but a bunch of new helper macros for drivers to properly
create attribute groups, which drivers and subsystems need to fix up a
ton of race issues with incorrectly creating sysfs files (binary and
normal) after userspace has been told that the device is present.
Also here is the ability to create binary files as attribute groups,
to solve that race condition, which was impossible to do before this,
so that's my fault the drivers were broken.
The majority of the .c changes is indenting and moving code around a
bit. It affects no existing code, but allows the large backlog of 70+
patches that I already have created to start flowing into the
different subtrees, instead of having to live in my driver-core tree,
causing merge nightmares in linux-next for the next few months.
These were finalized too late for the -rc1 merge window, which is why
they were didn't make that pull request, testing and review from
others didn't happen until a few weeks ago, and then there's the whole
distraction of the past few days, which prevented these from getting
to you sooner, sorry about that.
Oh, and there's a bugfix for the documentation build warning in here
as well. All of these have been in linux-next this week, with no
reported problems"
* tag 'driver-core-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
driver-core: fix new kernel-doc warning in base/platform.c
sysfs: use file mode defines from stat.h
sysfs: add more helper macro's for (bin_)attribute(_groups)
driver core: add default groups to struct class
driver core: Introduce device_create_groups
sysfs: prevent warning when only using binary attributes
sysfs: add support for binary attributes in groups
driver core: device.h: add RW and RO attribute macros
sysfs.h: add BIN_ATTR macro
sysfs.h: add ATTRIBUTE_GROUPS() macro
sysfs.h: add __ATTR_RW() macro
Pull phase two of __cpuinit removal from Paul Gortmaker:
"With the __cpuinit infrastructure removed earlier, this group of
commits only removes the function/data tagging that was done with the
various (now no-op) __cpuinit related prefixes.
Now that the dust has settled with yesterday's v3.11-rc1, there
hopefully shouldn't be any new users leaking back in tree, but I think
we can leave the harmless no-op stubs there for a release as a
courtesy to those who still have out of tree stuff and weren't paying
attention.
Although the commits are against the recent tag to allow for minor
context refreshes for things like yesterday's v3.11-rc1~ slab content,
the patches have been largely unchanged for weeks, aside from such
trivial updates.
For detail junkies, the largely boring and mostly irrelevant history
of the patches can be viewed at:
http://git.kernel.org/cgit/linux/kernel/git/paulg/cpuinit-delete.git
If nothing else, I guess it does at least demonstrate the level of
involvement required to shepherd such a treewide change to completion.
This is the same repository of patches that has been applied to the
end of the daily linux-next branches for the past several weeks"
* 'cpuinit_phase2' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (28 commits)
block: delete __cpuinit usage from all block files
drivers: delete __cpuinit usage from all remaining drivers files
kernel: delete __cpuinit usage from all core kernel files
rcu: delete __cpuinit usage from all rcu files
net: delete __cpuinit usage from all net files
acpi: delete __cpuinit usage from all acpi files
hwmon: delete __cpuinit usage from all hwmon files
cpufreq: delete __cpuinit usage from all cpufreq files
clocksource+irqchip: delete __cpuinit usage from all related files
x86: delete __cpuinit usage from all x86 files
score: delete __cpuinit usage from all score files
xtensa: delete __cpuinit usage from all xtensa files
openrisc: delete __cpuinit usage from all openrisc files
m32r: delete __cpuinit usage from all m32r files
hexagon: delete __cpuinit usage from all hexagon files
frv: delete __cpuinit usage from all frv files
cris: delete __cpuinit usage from all cris files
metag: delete __cpuinit usage from all metag files
tile: delete __cpuinit usage from all tile files
sh: delete __cpuinit usage from all sh files
...
__list_for_each used to be the non prefetch() aware list walking
primitive. When we removed the prefetch macros from the list routines,
it became redundant. Given it does exactly the same thing as
list_for_each now, we might as well remove it and call list_for_each
directly.
All users of __list_for_each have been converted to list_for_each calls
in the current merge window.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With the last patches stat.h was included to the header, and thus those
permission defines should be used.
Signed-off-by: Oliver Schinagl <oliver@schinagl.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With the recent changes to sysfs there's various helper macro's.
However there's no RW, RO BIN_ helper macro's. This patch adds them.
Signed-off-by: Oliver Schinagl <oliver@schinagl.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We should be using groups, not attribute lists, for classes to allow
subdirectories, and soon, binary files. Groups are just more flexible
overall, so add them.
The dev_attrs list will go away after all in-kernel users are converted
to use dev_groups.
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
device_create_groups lets callers create devices as well as associated
sysfs attributes with a single call. This avoids race conditions seen
if sysfs attributes on new devices are created later.
[fixed up comment block placement and add checks for printk buffer
formats - gregkh]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
groups should be able to support binary attributes, just like it
supports "normal" attributes. This lets us only handle one type of
structure, groups, throughout the driver core and subsystems, making
binary attributes a "full fledged" part of the driver model, and not
something just "tacked on".
Reported-by: Oliver Schinagl <oliver@schinagl.nl>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Make it easier to create attributes without having to always audit the
mode settings.
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This makes it easier to create static binary attributes, which is needed
in a number of drivers, instead of "open coding" them.
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To make it easier for driver subsystems to work with attribute groups,
create the ATTRIBUTE_GROUPS macro to remove some of the repetitive
typing for the most common use for attribute groups.
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A number of parts of the kernel created their own version of this, might
as well have the sysfs core provide it instead.
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications. For example, the fix in
commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bugs that can be created
with improper use of the various __init prefixes.
After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out. Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.
This removes all the uses of the __cpuinit macros from C files in
the core kernel directories (kernel, init, lib, mm, and include)
that don't really have a specific maintainer.
[1] https://lkml.org/lkml/2013/5/20/589
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Pull slab update from Pekka Enberg:
"Highlights:
- Fix for boot-time problems on some architectures due to
init_lock_keys() not respecting kmalloc_caches boundaries
(Christoph Lameter)
- CONFIG_SLUB_CPU_PARTIAL requested by RT folks (Joonsoo Kim)
- Fix for excessive slab freelist draining (Wanpeng Li)
- SLUB and SLOB cleanups and fixes (various people)"
I ended up editing the branch, and this avoids two commits at the end
that were immediately reverted, and I instead just applied the oneliner
fix in between myself.
* 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux
slub: Check for page NULL before doing the node_match check
mm/slab: Give s_next and s_stop slab-specific names
slob: Check for NULL pointer before calling ctor()
slub: Make cpu partial slab support configurable
slab: add kmalloc() to kernel API documentation
slab: fix init_lock_keys
slob: use DIV_ROUND_UP where possible
slub: do not put a slab to cpu partial list when cpu_partial is 0
mm/slub: Use node_nr_slabs and node_nr_objs in get_slabinfo
mm/slub: Drop unnecessary nr_partials
mm/slab: Fix /proc/slabinfo unwriteable for slab
mm/slab: Sharing s_next and s_stop between slab and slub
mm/slab: Fix drain freelist excessively
slob: Rework #ifdeffery in slab.h
mm, slab: moved kmem_cache_alloc_node comment to correct place
Pull more vfs stuff from Al Viro:
"O_TMPFILE ABI changes, Oleg's fput() series, misc cleanups, including
making simple_lookup() usable for filesystems with non-NULL s_d_op,
which allows us to get rid of quite a bit of ugliness"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
sunrpc: now we can just set ->s_d_op
cgroup: we can use simple_lookup() now
efivarfs: we can use simple_lookup() now
make simple_lookup() usable for filesystems that set ->s_d_op
configfs: don't open-code d_alloc_name()
__rpc_lookup_create_exclusive: pass string instead of qstr
rpc_create_*_dir: don't bother with qstr
llist: llist_add() can use llist_add_batch()
llist: fix/simplify llist_add() and llist_add_batch()
fput: turn "list_head delayed_fput_list" into llist_head
fs/file_table.c:fput(): add comment
Safer ABI for O_TMPFILE
Pull networking fixes from David Miller:
"Just a bunch of small fixes and tidy ups:
1) Finish the "busy_poll" renames, from Eliezer Tamir.
2) Fix RCU stalls in IFB driver, from Ding Tianhong.
3) Linearize buffers properly in tun/macvtap zerocopy code.
4) Don't crash on rmmod in vxlan, from Pravin B Shelar.
5) Spinlock used before init in alx driver, from Maarten Lankhorst.
6) A sparse warning fix in bnx2x broke TSO checksums, fix from Dmitry
Kravkov.
7) Dummy and ifb driver load failure paths can oops, fixes from Tan
Xiaojun and Ding Tianhong.
8) Correct MTU calculations in IP tunnels, from Alexander Duyck.
9) Account all TCP retransmits in SNMP stats properly, from Yuchung
Cheng.
10) atl1e and via-rhine do not handle DMA mapping failures properly,
from Neil Horman.
11) Various equal-cost multipath route fixes in ipv6 from Hannes
Frederic Sowa"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (36 commits)
ipv6: only static routes qualify for equal cost multipathing
via-rhine: fix dma mapping errors
atl1e: fix dma mapping warnings
tcp: account all retransmit failures
usb/net/r815x: fix cast to restricted __le32
usb/net/r8152: fix integer overflow in expression
net: access page->private by using page_private
net: strict_strtoul is obsolete, use kstrtoul instead
drivers/net/ieee802154: don't use devm_pinctrl_get_select_default() in probe
drivers/net/ethernet/cadence: don't use devm_pinctrl_get_select_default() in probe
drivers/net/can/c_can: don't use devm_pinctrl_get_select_default() in probe
net/usb: add relative mii functions for r815x
net/tipc: use %*phC to dump small buffers in hex form
qlcnic: Adding Maintainers.
gre: Fix MTU sizing check for gretap tunnels
pkt_sched: sch_qfq: remove forward declaration of qfq_update_agg_ts
pkt_sched: sch_qfq: improve efficiency of make_eligible
gso: Update tunnel segmentation to support Tx checksum offload
inet: fix spacing in assignment
ifb: fix oops when loading the ifb failed
...
Pull timer updates from Thomas Gleixner:
- watchdog fixes for full dynticks
- improved debug output for full dynticks
- remove an obsolete full dynticks check
- two ARM SoC clocksource drivers for sharing across SoCs
- tick broadcast fix for CPU hotplug
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick: broadcast: Check broadcast mode on CPU hotplug
clocksource: arm_global_timer: Add ARM global timer support
clocksource: Add Marvell Orion SoC timer
nohz: Remove obsolete check for full dynticks CPUs to be RCU nocbs
watchdog: Boot-disable by default on full dynticks
watchdog: Rename confusing state variable
watchdog: Register / unregister watchdog kthreads on sysctl control
nohz: Warn if the machine can not perform nohz_full
Pull core locking updates from Thomas Gleixner:
"Header cleanup as requested by Linus"
(This is the "don't include support for ww_mutex in a header file that
everybody wants, when almost nobody wants the ww part" change)
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
mutex: Move ww_mutex definitions to ww_mutex.h
Pull MIPS updates from Ralf Baechle:
"MIPS updates:
- All the things that didn't make 3.10.
- Removes the Windriver PPMC platform. Nobody will miss it.
- Remove a workaround from kernel/irq/irqdomain.c which was there
exclusivly for MIPS. Patch by Grant Likely.
- More small improvments for the SEAD 3 platform
- Improvments on the BMIPS / SMP support for the BCM63xx series.
- Various cleanups of dead leftovers.
- Platform support for the Cavium Octeon-based EdgeRouter Lite.
Two large KVM patchsets didn't make it for this pull request because
their respective authors are vacationing"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (124 commits)
MIPS: Kconfig: Add missing MODULES dependency to VPE_LOADER
MIPS: BCM63xx: CLK: Add dummy clk_{set,round}_rate() functions
MIPS: SEAD3: Disable L2 cache on SEAD-3.
MIPS: BCM63xx: Enable second core SMP on BCM6328 if available
MIPS: BCM63xx: Add SMP support to prom.c
MIPS: define write{b,w,l,q}_relaxed
MIPS: Expose missing pci_io{map,unmap} declarations
MIPS: Malta: Update GCMP detection.
Revert "MIPS: make CAC_ADDR and UNCAC_ADDR account for PHYS_OFFSET"
MIPS: APSP: Remove <asm/kspd.h>
SSB: Kconfig: Amend SSB_EMBEDDED dependencies
MIPS: microMIPS: Fix improper definition of ISA exception bit.
MIPS: Don't try to decode microMIPS branch instructions where they cannot exist.
MIPS: Declare emulate_load_store_microMIPS as a static function.
MIPS: Fix typos and cleanup comment
MIPS: Cleanup indentation and whitespace
MIPS: BMIPS: support booting from physical CPU other than 0
MIPS: Only set cpu_has_mmips if SYS_SUPPORTS_MICROMIPS
MIPS: GIC: Fix gic_set_affinity infinite loop
MIPS: Don't save/restore OCTEON wide multiplier state on syscalls.
...
- AF_IB (native IB addressing) for CMA from Sean Hefty
- New mlx5 driver for Mellanox Connect-IB adapters (including post merge request fixes)
- SRP fixes from Bart Van Assche (including fix to first merge request)
- qib HW driver updates
- Resurrection of ocrdma HW driver development
- uverbs conversion to create fds with O_CLOEXEC set
- Other small changes and fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABCAAGBQJR30TKAAoJEENa44ZhAt0h854P/jvAhK5u+XTM5VyjAi0DKJ7P
bWcsu+KxbOIFnjEdsYQl1mGP44gdO8GPZp7+JR5nDHDRpw9K76qy6QQiPbaF6Y8D
cZH8Xlq4hzBfElTWBkExEemPrVUUq77j03FE9TBatdLAtEyYkgrNyqr7Ys6zVwVK
ugR8nAahvnB7Jh1tsyZBBd9kfbWtXJnaGC8/Zk3Na4n4zXRAbr0DcnRF0sncTL38
VFnWbi33OQAxu5bsb2jGec/SNP3BbNwspFPjSCKqiiItRaCj13JiHhrKKvVk4RZe
hIRnPH47kjLRp2/PwBo6o+gTXZuRg48VGBx4CKUTwx1nCzPPN1iz9ZOfqUv9Qwcv
LX8mxC7QS/Yvud4KeEBsj6kotb80EkRF2KV5RkIKCxQiwetGD9127bZylC8ttxGw
2f6MzYtAGD4R4C10lO8N+59VugSg1xAvwsqz0a/jy2XyVHbI1ugQedzkB20x5WPY
51S08ABvtU9yIxIYrw2VEaa/5WN+XJ6+LpG9QBAGXdMLiCiiAe7n/YzyXI6AgwaW
Jl/uKr6H6/jEHUHKwkyqsmbpVGPhtGWu8deyr1oYvOEP4i48gcDqMQsfMcCISrQV
MeQU3hS/obykUlNeqjmMI2CXrecqSsiq0hXd4DLaSoZ2Rb4Drx2Wj6sTQLIAgL2q
GBYjHWMUpZXIFHQaH7am
=nZh8
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull InfiniBand/RDMA changes from Roland Dreier:
- AF_IB (native IB addressing) for CMA from Sean Hefty
- new mlx5 driver for Mellanox Connect-IB adapters (including post
merge request fixes)
- SRP fixes from Bart Van Assche (including fix to first merge request)
- qib HW driver updates
- resurrection of ocrdma HW driver development
- uverbs conversion to create fds with O_CLOEXEC set
- other small changes and fixes
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (66 commits)
mlx5: Return -EFAULT instead of -EPERM
IB/qib: Log all SDMA errors unconditionally
IB/qib: Fix module-level leak
mlx5_core: Adjust hca_cap.uar_page_sz to conform to Connect-IB spec
IB/srp: Let srp_abort() return FAST_IO_FAIL if TL offline
IB/uverbs: Use get_unused_fd_flags(O_CLOEXEC) instead of get_unused_fd()
mlx5_core: Fixes for sparse warnings
IB/mlx5: Make profile[] static in main.c
mlx5: Fix parameter type of health_handler_t
mlx5: Add driver for Mellanox Connect-IB adapters
IB/core: Add reserved values to enums for low-level driver use
IB/srp: Bump driver version and release date
IB/srp: Make HCA completion vector configurable
IB/srp: Maintain a single connection per I_T nexus
IB/srp: Fail I/O fast if target offline
IB/srp: Skip host settle delay
IB/srp: Avoid skipping srp_reset_host() after a transport error
IB/srp: Fix remove_one crash due to resource exhaustion
IB/qib: New transmitter tunning settings for Dell 1.1 backplane
IB/core: Fix error return code in add_port()
...
Pull media updates from Mauro Carvalho Chehab:
"This series contain:
- new i2c video drivers: ml86v7667 (video decoder),
ths8200 (video encoder)
- a new video driver for EasyCap cards based on Fushicai USBTV007
- Improved support for OF and embedded systems, with V4L2 async
initialization and a better support for clocks
- API cleanups on the ioctls used by the v4l2 debug tool
- Lots of cleanups
- As usual, several driver improvements and new cards additions
- Revert two changesets that change the minimal symbol rate for
stv0399, as request by Manu
- Update MAINTAINERS and other files to point to my new e-mail"
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (378 commits)
MAINTAINERS & ABI: Update to point to my new email
[media] stb0899: restore minimal rate to 5Mbauds
[media] exynos4-is: Correct colorspace handling at FIMC-LITE
[media] exynos4-is: Set valid initial format on FIMC.n subdevs
[media] exynos4-is: Set valid initial format on FIMC-IS-ISP subdev pads
[media] exynos4-is: Fix format propagation on FIMC-IS-ISP subdev
[media] exynos4-is: Set valid initial format at FIMC-LITE
[media] exynos4-is: Fix format propagation on FIMC-LITE.n subdevs
[media] MAINTAINERS: Update S5P/Exynos FIMC driver entry
[media] Documentation: Update driver's directory in video4linux/fimc.txt
[media] exynos4-is: Change fimc-is firmware file names
[media] exynos4-is: Add support for Exynos5250 MIPI-CSIS
[media] exynos4-is: Add Exynos5250 SoC support to fimc-lite driver
[media] exynos4-is: Drop drvdata handling in fimc-lite for non-dt platforms
[media] media: i2c: tvp514x: remove manual setting of subdev name
[media] media: i2c: tvp7002: remove manual setting of subdev name
[media] mem2mem: set missing v4l2_dev pointer
[media] wl128x: add missing struct v4l2_device
[media] tvp514x: Fix init seqeunce
[media] saa7134: Fix sparse warnings by adding __user annotation
...
llist_add(new, head) can simply use llist_add_batch(new, new, head),
no need to duplicate the code.
This obviously uninlines llist_add() and to me this is a win. But we
can make llist_add_batch() inline if this is desirable, in this case
gcc can notice that new_first == new_last if the caller is llist_add().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
1. This is mostly theoretical, but llist_add*() need ACCESS_ONCE().
Otherwise it is not guaranteed that the first cmpxchg() uses the
same value for old_entry and new_last->next.
2. These helpers cache the result of cmpxchg() and read the initial
value of head->first before the main loop. I do not think this
makes sense. In the likely case cmpxchg() succeeds, otherwise
it doesn't hurt to reload head->first.
I think it would be better to simplify the code and simply read
->first before cmpxchg().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
fput() and delayed_fput() can use llist and avoid the locking.
This is unlikely path, it is not that this change can improve
the performance, but this way the code looks simpler.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[suggested by Rasmus Villemoes] make O_DIRECTORY | O_RDWR part of O_TMPFILE;
that will fail on old kernels in a lot more cases than what I came up with.
And make sure O_CREAT doesn't get there...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Move the definitions for wound/wait mutexes out to a separate
header, ww_mutex.h. This reduces clutter in mutex.h, and
increases readability.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: Dave Airlie <airlied@gmail.com>
Link: http://lkml.kernel.org/r/51D675DC.3000907@canonical.com
[ Tidied up the code a bit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Sparse reported an endianness bug in the assignment to hca_cap.uar_page_sz.
Fix the declaration of this field to be __be16 (which is what is in
the firmware spec), renaming the field to log_uar_pg_size to conform
to the spec, which fixes the endianness bug reported by sparse.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>