Add a generic implementation of the old mmap() syscall, which expects its
argument in a memory block and switch all architectures over to use it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Reviewed-by: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a generic implementation of the old select() syscall, which expects
its argument in a memory block and switch all architectures over to use
it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Reviewed-by: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: James Morris <jmorris@namei.org>
Acked-by: Andreas Schwab <schwab@linux-m68k.org>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Meaning receive multiple messages, reducing the number of syscalls and
net stack entry/exit operations.
Next patches will introduce mechanisms where protocols that want to
optimize this operation will provide an unlocked_recvmsg operation.
This takes into account comments made by:
. Paul Moore: sock_recvmsg is called only for the first datagram,
sock_recvmsg_nosec is used for the rest.
. Caitlin Bestler: recvmmsg now has a struct timespec timeout, that
works in the same fashion as the ppoll one.
If the underlying protocol returns a datagram with MSG_OOB set, this
will make recvmmsg return right away with as many datagrams (+ the OOB
one) it has received so far.
. Rémi Denis-Courmont & Steven Whitehouse: If we receive N < vlen
datagrams and then recvmsg returns an error, recvmmsg will return
the successfully received datagrams, store the error and return it
in the next call.
This paves the way for a subsequent optimization, sk_prot->unlocked_recvmsg,
where we will be able to acquire the lock only at batch start and end, not at
every underlying recvmsg call.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bye-bye Performance Counters, welcome Performance Events!
In the past few months the perfcounters subsystem has grown out its
initial role of counting hardware events, and has become (and is
becoming) a much broader generic event enumeration, reporting, logging,
monitoring, analysis facility.
Naming its core object 'perf_counter' and naming the subsystem
'perfcounters' has become more and more of a misnomer. With pending
code like hw-breakpoints support the 'counter' name is less and
less appropriate.
All in one, we've decided to rename the subsystem to 'performance
events' and to propagate this rename through all fields, variables
and API names. (in an ABI compatible fashion)
The word 'event' is also a bit shorter than 'counter' - which makes
it slightly more convenient to write/handle as well.
Thanks goes to Stephane Eranian who first observed this misnomer and
suggested a rename.
User-space tooling and ABI compatibility is not affected - this patch
should be function-invariant. (Also, defconfigs were not touched to
keep the size down.)
This patch has been generated via the following script:
FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
sed -i \
-e 's/PERF_EVENT_/PERF_RECORD_/g' \
-e 's/PERF_COUNTER/PERF_EVENT/g' \
-e 's/perf_counter/perf_event/g' \
-e 's/nb_counters/nb_events/g' \
-e 's/swcounter/swevent/g' \
-e 's/tpcounter_event/tp_event/g' \
$FILES
for N in $(find . -name perf_counter.[ch]); do
M=$(echo $N | sed 's/perf_counter/perf_event/g')
mv $N $M
done
FILES=$(find . -name perf_event.*)
sed -i \
-e 's/COUNTER_MASK/REG_MASK/g' \
-e 's/COUNTER/EVENT/g' \
-e 's/\<event\>/event_id/g' \
-e 's/counter/event/g' \
-e 's/Counter/Event/g' \
$FILES
... to keep it as correct as possible. This script can also be
used by anyone who has pending perfcounters patches - it converts
a Linux kernel tree over to the new naming. We tried to time this
change to the point in time where the amount of pending patches
is the smallest: the end of the merge window.
Namespace clashes were fixed up in a preparatory patch - and some
stylistic fallout will be fixed up in a subsequent patch.
( NOTE: 'counters' are still the proper terminology when we deal
with hardware registers - and these sed scripts are a bit
over-eager in renaming them. I've undone some of that, but
in case there's something left where 'counter' would be
better than 'event' we can undo that on an individual basis
instead of touching an otherwise nicely automated patch. )
Suggested-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds support for TIF_RESTORE_SIGMASK to ARM's
signal handling, which allows to hook up the pselect6, ppoll,
and epoll_pwait syscalls on ARM.
Tested here with eabi userspace and a test program with a
deliberate race between a child's exit and the parent's
sigprocmask/select sequence. Using sys_pselect6() instead
of sigprocmask/select reliably prevents the race.
The other arch's support for TIF_RESTORE_SIGMASK has evolved
over time:
In 2.6.16:
- add TIF_RESTORE_SIGMASK which parallels TIF_SIGPENDING
- test both when checking for pending signal [changed later]
- reimplement sys_sigsuspend() to use current->saved_sigmask,
TIF_RESTORE_SIGMASK [changed later], and -ERESTARTNOHAND;
ditto for sys_rt_sigsuspend(), but drop private code and
use common code via __ARCH_WANT_SYS_RT_SIGSUSPEND;
- there are now no "extra" calls to do_signal() so its oldset
parameter is always ¤t->blocked so need not be passed,
also its return value is changed to void
- change handle_signal() to return 0/-errno
- change do_signal() to honor TIF_RESTORE_SIGMASK:
+ get oldset from current->saved_sigmask if TIF_RESTORE_SIGMASK
is set
+ if handle_signal() was successful then clear TIF_RESTORE_SIGMASK
+ if no signal was delivered and TIF_RESTORE_SIGMASK is set then
clear it and restore the sigmask
- hook up sys_pselect6() and sys_ppoll()
In 2.6.19:
- hook up sys_epoll_pwait()
In 2.6.26:
- allow archs to override how TIF_RESTORE_SIGMASK is implemented;
default set_restore_sigmask() sets both TIF_RESTORE_SIGMASK and
TIF_SIGPENDING; archs need now just test TIF_SIGPENDING again
when checking for pending signal work; some archs now implement
TIF_RESTORE_SIGMASK as a secondary/non-atomic thread flag bit
- call set_restore_sigmask() in sys_sigsuspend() instead of setting
TIF_RESTORE_SIGMASK
In 2.6.29-rc:
- kill sys_pselect7() which no arch wanted
So for 2.6.31-rc6/ARM this patch does the following:
- Add TIF_RESTORE_SIGMASK. Use the generic set_restore_sigmask()
which sets both TIF_SIGPENDING and TIF_RESTORE_SIGMASK, so
TIF_RESTORE_SIGMASK need not claim one of the scarce low thread
flags, and existing TIF_SIGPENDING and _TIF_WORK_MASK tests need
not be extended for TIF_RESTORE_SIGMASK.
- sys_sigsuspend() is reimplemented to use current->saved_sigmask
and set_restore_sigmask(), making it identical to most other archs
- The private code for sys_rt_sigsuspend() is removed, instead
generic code supplies it via __ARCH_WANT_SYS_RT_SIGSUSPEND.
- sys_sigsuspend() and sys_rt_sigsuspend() no longer need a pt_regs
parameter, so their assembly code wrappers are removed.
- handle_signal() is changed to return 0 on success or -errno.
- The oldset parameter to do_signal() is now redundant and removed,
and the return value is now also redundant and changed to void.
- do_signal() is changed to honor TIF_RESTORE_SIGMASK:
+ get oldset from current->saved_sigmask if TIF_RESTORE_SIGMASK
is set
+ if handle_signal() was successful then clear TIF_RESTORE_SIGMASK
+ if no signal was delivered and TIF_RESTORE_SIGMASK is set then
clear it and restore the sigmask
- Hook up sys_pselect6, sys_ppoll, and sys_epoll_pwait.
Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Kernel 2.6.30-rc1 added sys_preadv and sys_pwritev to most archs
but not ARM, resulting in
<stdin>:1421:2: warning: #warning syscall preadv not implemented
<stdin>:1425:2: warning: #warning syscall pwritev not implemented
This patch adds sys_preadv and sys_pwritev to ARM.
These syscalls simply take five long-sized parameters, so they
should have no calling-convention/ABI issues in the kernel.
Tested on armv5tel eabi using a preadv/pwritev test program posted
on linuxppc-dev earlier this month.
It would be nice to get this into the kernel before 2.6.30 final,
so that glibc's kernel version feature test for these syscalls
doesn't have to special-case ARM.
Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Setup some missing syscall pointed out by the checksyscalls.sh script. Fix two
small whitespace issues while being there.
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Ccoreutils and other have started using fstatat64. Thus, we
need a shim for it if we want to support modern oldabi
userlands (such as Debian/arm/lenny) with EABI kernels.
See http://bugs.debian.org/462677
Acked-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Riku Voipio <riku.voipio@movial.fi>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This is the new timerfd API as it is implemented by the following patch:
int timerfd_create(int clockid, int flags);
int timerfd_settime(int ufd, int flags,
const struct itimerspec *utmr,
struct itimerspec *otmr);
int timerfd_gettime(int ufd, struct itimerspec *otmr);
The timerfd_create() API creates an un-programmed timerfd fd. The "clockid"
parameter can be either CLOCK_MONOTONIC or CLOCK_REALTIME.
The timerfd_settime() API give new settings by the timerfd fd, by optionally
retrieving the previous expiration time (in case the "otmr" parameter is not
NULL).
The time value specified in "utmr" is absolute, if the TFD_TIMER_ABSTIME bit
is set in the "flags" parameter. Otherwise it's a relative time.
The timerfd_gettime() API returns the next expiration time of the timer, or
{0, 0} if the timerfd has not been set yet.
Like the previous timerfd API implementation, read(2) and poll(2) are
supported (with the same interface). Here's a simple test program I used to
exercise the new timerfd APIs:
http://www.xmailserver.org/timerfd-test2.c
[akpm@linux-foundation.org: coding-style cleanups]
[akpm@linux-foundation.org: fix ia64 build]
[akpm@linux-foundation.org: fix m68k build]
[akpm@linux-foundation.org: fix mips build]
[akpm@linux-foundation.org: fix alpha, arm, blackfin, cris, m68k, s390, sparc and sparc64 builds]
[heiko.carstens@de.ibm.com: fix s390]
[akpm@linux-foundation.org: fix powerpc build]
[akpm@linux-foundation.org: fix sparc64 more]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Not all the world is an i386. Many architectures need 64-bit arguments to be
aligned in suitable pairs of registers, and the original
sys_sync_file_range(int, loff_t, loff_t, int) was therefore wasting an
argument register for padding after the first integer. Since we don't
normally have more than 6 arguments for system calls, that left no room for
the final argument on some architectures.
Fix this by introducing sys_sync_file_range2(int, int, loff_t, loff_t) which
all fits nicely. In fact, ARM already had that, but called it
sys_arm_sync_file_range. Move it to fs/sync.c and rename it, then implement
the needed compatibility routine. And stop the missing syscall check from
bitching about the absence of sys_sync_file_range() if we've implemented
sys_sync_file_range2() instead.
Tested on PPC32 and with 32-bit and 64-bit userspace on PPC64.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add utimensat, signalfd, timerfd, eventfd syscalls. Add ignore
defines for sync_file_range and fadvise64_64 which we implement
differently.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Add kexec support to ARM.
Improvements like commandline handling could be made but this patch gives
basic functional support. It uses the next available syscall number, 347.
Once the syscall number is known, userspace support will be
finalised/submitted to kexec-tools, various patches already exist.
Originally based on a patch by Maxim Syrchin but updated and forward
ported by various people.
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Add:
sys_unshare
sys_set_robust_list
sys_get_robust_list
sys_splice
sys_arm_sync_file_range
sys_tee
sys_vmsplice
sys_move_pages
sys_getcpu
Special note about sys_arm_sync_file_range(), which is implemented as:
asmlinkage long sys_arm_sync_file_range(int fd, unsigned int flags,
loff_t offset, loff_t nbytes)
{
return sys_sync_file_range(fd, offset, nbytes, flags);
}
We can't export sys_sync_file_range() directly on ARM because the
argument list someone picked does not fit in the available registers.
Would be nice if... there was an arch maintainer review mechanism for
new syscalls before they hit the kernel.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Nicolas Pitre
Commit 99595d0237 forgot to intercept
sys_socketcall as well.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Nicolas Pitre
struct sockaddr_un loses its padding with EABI. Since the size of the
structure is used as a validation test in unix_mkname(), we need to
change the length argument to 110 whenever it is 112.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
ARM entry-common.S needs to know syscall table size; in itself that would
not be a problem, but there's an additional constraint - some of the
instructions using it want a constant that would be a multiple of 4.
So we have to pad syscall table with sys_ni_syscall and that's where
the trouble begins. .rept pseudo-op wants a constant expression for
number of repetitions and subtraction of two labels (before and after
syscall table) doesn't always get simplified to constant early enough
for .rept. If labels end up in different frags, we lose. And while
the frag size is large enough (slightly below 4Kb), the syscall table
is about 1/3 of that. We used to get away with that, but the recent
changes had been enough to trigger the breakage.
Proper fix is simple: have a macro (CALL(x)) to populate the table
instead of using explicit .long x and the first time we include calls.S
have it defined to .equ NR_syscalls,NR_syscalls+1. Then we can find
the proper amount of padding on the first inclusion simply by looking
at NR_syscalls at that time. And that will be constant, no matter what.
Moreover, the same trick kills the need of having an estimate of padded
NR_syscalls - it will be calculated for free at the same time.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Nicolas Pitre
This patch adds the required code to support both user space ABIs at
the same time. A second syscall table is created to include legacy ABI
syscalls that need an ABI compat wrapper.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Nicolas Pitre
struct statfs64 has extra padding with EABI growing its size from 84 to
88. This struct is now __attribute__((packed,aligned(4))) with a small
assembly wrapper to force the sz argument to 84 if it is 88 to avoid
copying the extra padding over user space memory unexpecting it.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Rather than providing more wrappers for 6-arg syscalls, arrange for
them to be supported as standard. This just means that we always
store the 6th argument on the stack, rather than in the wrappers.
This means we eliminate the wrappers for:
* sys_futex
* sys_arm_fadvise64_64
* sys_mbind
* sys_ipc
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from George G. Davis
As pointed out be Matthew Klahn <MKLAHN@motorola.com>, some sys_ipc()
call options require six args, e.g. SEMTIMEDOP. This patch adds an ARM sys_ipc_wrapper to save the sys_ipc() 'fifth' arg on the stack.
Signed-off-by: George G. Davis <gdavis@mvista.com>
arch/arm/kernel/calls.S | 2 +-
arch/arm/kernel/entry-common.S | 5 +++++
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Nicolas Pitre
The prototype for sys_fadvise64_64() is:
long sys_fadvise64_64(int fd, loff_t offset, loff_t len, int advice)
The argument list is therefore as follows on legacy ABI:
fd: type int (r0)
offset: type long long (r1-r2)
len: type long long (r3-sp[0])
advice: type int (sp[4])
With EABI this becomes:
fd: type int (r0)
offset: type long long (r2-r3)
len: type long long (sp[0]-sp[4])
advice: type int (sp[8])
Not only do we have ABI differences here, but the EABI version requires
one additional word on the syscall stack.
To avoid the ABI mismatch and the extra stack space required with EABI
this syscall is now defined with a different argument ordering
on ARM as follows:
long sys_arm_fadvise64_64(int fd, int advice, loff_t offset, loff_t len)
This gives us the following ABI independent argument distribution:
fd: type int (r0)
advice: type int (r1)
offset: type long long (r2-r3)
len: type long long (sp[0]-sp[4])
Now, since the syscall entry code takes care of 5 registers only by
default including the store of r4 to the stack, we need a wrapper to
store r5 to the stack as well. Because that wrapper was missing and was
always required this means that sys_fadvise64_64 never worked on ARM and
therefore we can safely reuse its syscall number for our new
sys_arm_fadvise64_64 interface.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!