Hardware manufacturers group keys in the weirdest way possible. This may
cause a power-key to be grouped together with normal keyboard keys and
thus be reported on the same kernel interface.
However, user-space is often only interested in specific sets of events.
For instance, daemons dealing with system-reboot (like systemd-logind)
listen for KEY_POWER, but are not interested in any main keyboard keys.
Usually, power keys are reported via separate interfaces, however,
some i8042 boards report it in the AT matrix. To avoid waking up those
system daemons on each key-press, we had two ideas:
- split off KEY_POWER into a separate interface unconditionally
- allow filtering a specific set of events on evdev FDs
Splitting of KEY_POWER is a rather weird way to deal with this and may
break backwards-compatibility. It is also specific to KEY_POWER and might
be required for other stuff, too. Moreover, we might end up with a huge
set of input-devices just to have them properly split.
Hence, this patchset implements the second idea: An event-mask to specify
which events you're interested in. Two ioctls allow setting this mask for
each event-type. If not set, all events are reported. The type==0 entry is
used same as in EVIOCGBIT to set the actual EV_* mask of filtered events.
This way, you have a two-level filter.
We are heavily forward-compatible to new event-types and event-codes. So
new user-space will be able to run on an old kernel which doesn't know the
given event-codes or event-types.
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
We've got bug reports showing the old systemd-logind (at least
system-210) aborting unexpectedly, and this turned out to be because
of an invalid error code from close() call to evdev devices. close()
is supposed to return only either EINTR or EBADFD, while the device
returned ENODEV. logind was overreacting to it and decided to kill
itself when an unexpected error code was received. What a tragedy.
The bad error code comes from flush fops, and actually evdev_flush()
returns ENODEV when device is disconnected or client's access to it is
revoked. But in these cases the fact that flush did not actually happen is
not an error, but rather normal behavior. For non-disconnected devices
result of flush is also not that interesting as there is no potential of
data loss and even if it fails application has no way of handling the
error. Because of that we are better off always returning success from
evdev_flush().
Also returning EINTR from flush()/close() is discouraged (as it is not
clear how application should handle this error), so let's stop taking
evdev->mutex interruptibly.
Bugzilla: http://bugzilla.suse.com/show_bug.cgi?id=939834
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
There is no point in queueing EV_SYN/SYN_DROPPED on clock type change when
there are no events in the client's queue and doing so confuses tests in
libinput package, so let's not do that.
Reported-and-tested-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJUsuDQAAoJEHm+PkMAQRiGnecH/0RO9UKnEduOTRPaZGXAjGI8
N0FvNia8qn7f+XnvN62pG/YZZqi2uvuy37vwAXMtS6KXEgaDG9Wq4fVrhOaJ5VgL
QOmPdVGUa+1PuPcMYj/QLIFRfIHvIY/XVZWXrcIyYfQdBAAoJ2q23qx/yFmdyTwf
+enAv+PV4ZVNMEANyN9KS7xX5gPbSDl36AOhm6lXDvrlem4mbnhRuUtYez9R8KTK
VNfkKZQRDOgl4/ns0ndzpAUhaDj1JJGoLRgMXKna33XgtzSEL4XijvImdnoIXp5N
Z98Jc1N5Vg5OcUFeGJC3bRR27m39xoOHQk2ufY43uAIfB3Ez/C7m/r7b50ZVWfs=
=J7TO
-----END PGP SIGNATURE-----
Merge tag 'v3.19-rc4' into next
Merge with mainline to bring in the latest thermal and other changes.
When client changes the type of clock used for the time stamps in input
events flush pending events from the client's queue (since client would not
know which events have old time stamps and which ones have new ones) and
and queue SYN_DROPPED event.
Signed-off-by: Anshul Garg <anshul.g@samsung.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Pull input layer fixes from Dmitry Torokhov:
"Fixes for v7 protocol for ALPS devices and few other driver fixes.
Also users can request input events to be stamped with boot time
timestamps, in addition to real and monotonic timestamps"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: hil_kbd - fix incorrect use of init_completion
Input: alps - v7: document the v7 touchpad packet protocol
Input: alps - v7: fix finger counting for > 2 fingers on clickpads
Input: alps - v7: sometimes a single touch is reported in mt[1]
Input: alps - v7: ignore new packets
Input: evdev - add CLOCK_BOOTTIME support
Input: psmouse - expose drift duration for IBM trackpoints
Input: stmpe - bias keypad columns properly
Input: stmpe - enforce device tree only mode
mfd: stmpe: add pull up/down register offsets for STMPE
Input: optimize events_per_packet count calculation
Input: edt-ft5x06 - fixed a macro coding style issue
Input: gpio_keys - replace timer and workqueue with delayed workqueue
Input: gpio_keys - allow separating gpio and irq in device tree
This patch adds support for CLOCK_BOOTTIME for input event timestamp.
CLOCK_BOOTTIME includes suspend time, so it would allow aplications
to get correct time difference between two events even when system
resumes from suspend state.
Signed-off-by: Aniroop Mathur <a.mathur@samsung.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
If kzalloc() failed and then evdev_open_device() fails, evdev_open()
will pass a vmalloc'ed pointer to kfree.
This might fix https://bugzilla.kernel.org/show_bug.cgi?id=88401, where
there was a crash in kfree().
Reported-by: Christian Casteyde <casteyde.christian@free.fr>
Belatedly-Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Henrik Rydberg <rydberg@euromail.se>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The 'max' size passed into the function is measured in number of bits
(KEY_MAX, LED_MAX, etc) so we need to convert it accordingly before trying
to copy the data out, otherwise we will try copying too much and end up
with up with a page fault.
Reported-by: Pavel Machek <pavel@ucw.cz>
Reviewed-by: Pavel Machek <pavel@ucw.cz>
Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Convert the monotonic timestamp with ktime_mono_to_real() in
evdev_events().
In evdev_queue_syn_dropped() we can call either ktime_get() or
ktime_get_real() depending on the clkid. No point in having two calls
for CLOCK_REALTIME.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJTbTZoAAoJEHm+PkMAQRiGLssH/3Rv6e9rAkEw9Ey0ymwGsCvf
6DEAcSgaWEyYWEU+DmkEQ3hbiOyESHFOsWE4zA2F5WY0w2Xsr+wDCS+14WCN6NHT
6sxSYGj1hd+ni6GhlGxfRpUXtY59h7TKHaaHYKPIOsO4OWXVeD53trpq416kvqal
zWkFOWeiEyeJnNKv0z0+5QWTeFDjTd1YawWcK/8kFez1Y4BXBECTgKoJaEcvowfU
j7oQ0BJxtLlxFgCB84bZTUbGuyn1x9FiS7Z2w9JcqSkTLMabFjsbA15eZIV66N4s
boaxlRvyvelR09eiiYYqOLxUmeEi1wRqtAM3yln1Y5/MX+DAmf5/sylqKtC5eLg=
=ezkr
-----END PGP SIGNATURE-----
Merge tag 'v3.15-rc5' into next
Merge with Linux 3.15-rc5 to sync up Wacom and other changes.
We put this workaround in 2008 and the offending userspace has been fixed
up long time ago; the link in the message is no longer valid either, so it
is time to retire it.
Reviewed-by: Peter Hutterer <peter.hutterer@who-t.net>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
If a new (id == -1) ff effect was uploaded from userspace,
ff-core.c::input_ff_upload() will have assigned a positive number to the
new effect id. Currently, evdev.c::evdev_do_ioctl() will save this new id
to userspace, regardless of whether the upload succeeded or not.
On upload failure, this can be confusing because the dev->ff->effects[]
array will not contain an element at the index of that new effect id.
This patch fixes this by leaving the id unchanged after upload fails.
Note: Unfortunately applications should still expect changed effect id for
quite some time.
This has been discussed on:
http://www.mail-archive.com/linux-input@vger.kernel.org/msg08513.html
("ff-core effect id handling in case of a failed effect upload")
Suggested-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Elias Vanderstuyft <elias.vds@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
evdev always tries to allocate the event buffer for clients using
kzalloc rather than vmalloc, presumably to avoid mapping overhead where
possible. However, drivers like bcm5974, which claims support for
reporting 16 fingers simultaneously, can have an extraordinarily large
buffer. The resultant contiguous order-4 allocation attempt fails due
to fragmentation, and the device is thus unusable until reboot.
Try kzalloc if we can to avoid the mapping overhead, but if that fails,
fall back to vzalloc.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
If we have multiple sessions on a system, we normally don't want
background sessions to read input events. Otherwise, it could capture
passwords and more entered by the user on the foreground session. This is
a real world problem as the recent XMir development showed:
http://mjg59.dreamwidth.org/27327.html
We currently rely on sessions to release input devices when being
deactivated. This relies on trust across sessions. But that's not given on
usual systems. We therefore need a way to control which processes have
access to input devices.
With VTs the kernel simply routed them through the active /dev/ttyX. This
is not possible with evdev devices, though. Moreover, we want to avoid
routing input-devices through some dispatcher-daemon in userspace (which
would add some latency).
This patch introduces EVIOCREVOKE. If called on an evdev fd, this revokes
device-access irrecoverably for that *single* open-file. Hence, once you
call EVIOCREVOKE on any dup()ed fd, all fds for that open-file will be
rather useless now (but still valid compared to close()!). This allows us
to pass fds directly to session-processes from a trusted source. The
source keeps a dup()ed fd and revokes access once the session-process is
no longer active.
Compared to the EVIOCMUTE proposal, we can avoid the CAP_SYS_ADMIN
restriction now as there is no way to revive the fd again. Hence, a user
is free to call EVIOCREVOKE themself to kill the fd.
Additionally, this ioctl allows multi-layer access-control (again compared
to EVIOCMUTE which was limited to one layer via CAP_SYS_ADMIN). A middle
layer can simply request a new open-file from the layer above and pass it
to the layer below. Now each layer can call EVIOCREVOKE on the fds to
revoke access for all layers below, at the expense of one fd per layer.
There's already ongoing experimental user-space work which demonstrates
how it can be used:
http://lists.freedesktop.org/archives/systemd-devel/2013-August/012897.html
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
If userspace requests current KEY-state, they very likely assume that no
such events are pending in the output queue of the evdev device.
Otherwise, they will parse events which they already handled via
EVIOCGKEY(). For XKB applications this can cause irreversible keyboard
states if a modifier is locked multiple times because a CTRL-DOWN event is
handled once via EVIOCGKEY() and once from the queue via read(), even
though it should handle it only once.
Therefore, lets do the only logical thing and flush the evdev queue
atomically during this ioctl. We only flush events that are affected by
the given ioctl.
This only affects boolean events like KEY, SND, SW and LED. ABS, REL and
others are not affected as duplicate events can be handled gracefully by
user-space.
Note: This actually breaks semantics of the evdev ABI. However,
investigations showed that userspace already expects the new semantics and
we end up fixing at least all XKB applications.
All applications that are aware of this race-condition mirror the KEY
state for each open-file and detect/drop duplicate events. Hence, they do
not care whether duplicates are posted or not and work fine with this fix.
Also note that we need proper locking to guarantee atomicity and avoid
dead-locks. event_lock must be locked before queue_lock (see input-core).
However, we can safely release event_lock while flushing the queue. This
allows the input-core to proceed with pending events and only stop if it
needs our queue_lock to post new events.
This should guarantee that we don't block event-dispatching for too long
while flushing a single event queue.
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Peter Hutterer <peter.hutterer@who-t.net>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Commit 7f8d4cad1e ("Input: extend the number of event (and other)
devices") made evdev, joydev and mousedev to embed struct cdev into
their respective structures representing input devices.
Unfortunately character device structure may outlive the parent
structure unless we do not set it up as parent of character device so
that it will stay pinned until character device is freed.
Also, now that parent structure is pinned while character device exists
we do not need to pin and unpin it every time user opens or closes it.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Extend the amount of character devices, such as eventX, mouseX and jsX,
from a hard limit of 32 per input handler to about 1024 shared across
all handlers.
To be compatible with legacy installations input handlers will start
creating char devices with minors in their legacy range, however once
legacy range is exhausted they will start allocating minors from the
dynamic range 256-1024.
Reviewed-by: David Herrmann <dh.herrmann@googlemail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
By sending a full frame of events at the same time, the irqsoff
latency at heavy load is brought down from 200 us to 100 us.
Cc: Daniel Kurtz <djkurtz@chromium.org>
Tested-by: Benjamin Tissoires <benjamin.tissoires@enac.fr>
Tested-by: Ping Cheng <pingc@wacom.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
Move all MT-related things to a separate place. This saves some
bytes for non-mt input devices, and prepares for new MT features.
Reviewed-and-tested-by: Benjamin Tissoires <benjamin.tissoires@enac.fr>
Tested-by: Ping Cheng <pingc@wacom.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
According to the standard count 0 is special - no IO should happen but we
can check error conditions (device gone away, etc), and return 0 if there
are no errors. We used to return -EINVAL instead and we also could return 0
if an event was "stolen" by another thread.
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
We should use rcu_dereference_protected() when checking if given client
is the one that grabbed the device. This fixes warnings produced by
sparse.
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Commit 509f87c5f5 (evdev - do not block waiting for an event if fd
is nonblock) created a code path were it was possible to use retval
uninitialized.
This could lead to the xorg evdev input driver getting corrupt data
and refusing to work with log messages like
AUO-Pixcir touchscreen: Read error: Success
sg060_keys: Read error: Success
AUO-Pixcir touchscreen: Read error: Success
sg060_keys: Read error: Success
(for drivers auo-pixcir-ts and gpio-keys).
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Acked-by: Dima Zavin <dima@android.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
This patch adds the ability to extract MT slot data via a new ioctl,
EVIOCGMTSLOTS. The function returns an array of slot values for the
specified ABS_MT event type.
Example of user space usage:
struct { unsigned code; int values[64]; } req;
req.code = ABS_MT_POSITION_X;
if (ioctl(fd, EVIOCGMTSLOTS(sizeof(req)), &req) < 0)
return -1;
for (i = 0; i < 64; i++)
printf("slot %d: %d\n", i, req.values[i]);
Reviewed-by: Chase Douglas <chase.douglas@canonical.com>
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
As noted by Arve and others, since wall time can jump backwards, it is
difficult to use for input because one cannot determine if one event
occurred before another or for how long a key was pressed.
However, the timestamp field is part of the kernel ABI, and cannot be
changed without possibly breaking existing users.
This patch adds a new IOCTL that allows a clockid to be set in the
evdev_client struct that will specify which time base to use for event
timestamps (ie: CLOCK_MONOTONIC instead of CLOCK_REALTIME).
For now we only support CLOCK_MONOTONIC and CLOCK_REALTIME, but
in the future we could support other clockids if appropriate.
The default remains CLOCK_REALTIME, so we don't change the ABI.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Reviewed-by: Daniel Kurtz <djkurtz@google.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Commit 509f87c5f5 (evdev - do not block waiting for an event if fd
is nonblock) created a code path were it was possible to use retval
uninitialized.
This could lead to the xorg evdev input driver getting corrupt data
and refusing to work with log messages like
AUO-Pixcir touchscreen: Read error: Success
sg060_keys: Read error: Success
AUO-Pixcir touchscreen: Read error: Success
sg060_keys: Read error: Success
(for drivers auo-pixcir-ts and gpio-keys).
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Acked-by: Dima Zavin <dima@android.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
If there is a full packet in the buffer, and we overflow that buffer
right after checking for that condition, it would have been possible
for us to block indefinitely (rather, until the next full packet) even if
the file was marked as O_NONBLOCK.
Cc: Jeff Brown <jeffbrown@android.com>
Signed-off-by: Dima Zavin <dima@android.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Without this, it was possible for the reader to get ahead of packet_head.
If the input device generated a partial packet *right* after the reader
got ahead, then we can get into a situation where the device is marked
readable, but read always returns 0 until the next packet is finished
(i.e a SYN is generated by the input driver).
This situation can also happen if we overflow the buffer while a reader
is trying to read an event out.
Signed-off-by: Dima Zavin <dima@android.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
We should only wake waiters on the event device when we actually post
an EV_SYN/SYN_REPORT to the queue. Otherwise we end up making waiting
threads runnable only to go right back to sleep because the device
still isn't readable.
Reported-by: Jeffrey Brown <jeffbrown@android.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
There is no need to call synchronize_rcu() after a list insertion,
or a NULL->ptr assignment.
However, the reverse operations do need this call.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
This patch modifies evdev so that it only becomes readable when
the buffer contains an EV_SYN/SYN_REPORT event.
On SMP systems, it is possible for an evdev client blocked on poll()
to wake up and read events from the evdev ring buffer at the same
rate as they are enqueued. This can result in high CPU usage,
particularly for MT devices, because the client ends up reading
events one at a time instead of reading complete packets.
We eliminate this problem by making the device readable only when
the buffer contains at least one complete packet. This causes
clients to block until the entire packet is available.
Signed-off-by: Jeff Brown <jeffbrown@android.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Add a new EV_SYN code, SYN_DROPPED, to inform the client when input
events have been dropped from the evdev input buffer due to a
buffer overrun. The client should use this event as a hint to
reset its state or ignore all following events until the next
packet begins.
Signed-off-by: Jeff Brown <jeffbrown@android.com>
[dtor@mail.ru: Implement Henrik's suggestion and drop old events in
case of overflow.]
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
As was recently brought up on the busybox list
(http://lists.busybox.net/pipermail/busybox/2011-January/074565.html),
evdev_write doesn't properly check the count argument, which will
lead to a return value > count on partial writes if the remaining bytes
are accessible - causing userspace confusion.
Fix it by only handling each full input_event structure and return -EINVAL
if less than 1 struct was written, similar to how it is done in evdev_read.
Reported-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>
Acked-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Today, userspace sets up an input device based on the data it emits.
This is not always enough; a tablet and a touchscreen may emit exactly
the same data, for instance, but the former should be set up with a
pointer whereas the latter does not need to. Recently, a new type of
touchpad has emerged where the buttons are under the pad, which
changes logic without changing the emitted data. This patch introduces
a new ioctl, EVIOCGPROP, which enables user access to a set of device
properties useful during setup. The properties are given as a bitmap
in the same fashion as the event types, and are also made available
via sysfs, uevent and /proc/bus/input/devices.
Acked-by: Ping Cheng <pingc@wacom.com>
Acked-by: Chase Douglas <chase.douglas@canonical.com>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
The desire to keep old names for the EVIOCGKEYCODE/EVIOCSKEYCODE while
extending them to support large scancodes was a mistake. While we tried
to keep ABI intact (and we succeeded in doing that, programs compiled
on older kernels will work on newer ones) there is still a problem with
recompiling existing software with newer kernel headers.
New kernel headers will supply updated ioctl numbers and kernel will
expect that userspace will use struct input_keymap_entry to set and
retrieve keymap data. But since the names of ioctls are still the same
userspace will happily compile even if not adjusted to make use of the
new structure and will start miraculously fail in the field.
To avoid this issue let's revert EVIOCGKEYCODE/EVIOCSKEYCODE definitions
and add EVIOCGKEYCODE_V2/EVIOCSKEYCODE_V2 so that userspace can explicitly
select the style of ioctls it wants to employ.
Reviewed-by: Henrik Rydberg <rydberg@euromail.se>
Acked-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
* 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl:
vfs: make no_llseek the default
vfs: don't use BKL in default_llseek
llseek: automatically add .llseek fop
libfs: use generic_file_llseek for simple_attr
mac80211: disallow seeks in minstrel debug code
lirc: make chardev nonseekable
viotape: use noop_llseek
raw: use explicit llseek file operations
ibmasmfs: use generic_file_llseek
spufs: use llseek in all file operations
arm/omap: use generic_file_llseek in iommu_debug
lkdtm: use generic_file_llseek in debugfs
net/wireless: use generic_file_llseek in debugfs
drm: use noop_llseek
448cd16 ("Input: evdev - rearrange ioctl handling") broke EVIOCSABS by
checking for the wrong direction bit.
Signed-off-by: Daniel Mack <zonque@gmail.com>
Reported-by: Sven Neumann <s.neumann@raumfeld.com>
Tested-by: Sven Neumann <s.neumann@raumfeld.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
All file_operations should get a .llseek operation so we can make
nonseekable_open the default for future file operations without a
.llseek pointer.
The three cases that we can automatically detect are no_llseek, seq_lseek
and default_llseek. For cases where we can we can automatically prove that
the file offset is always ignored, we use noop_llseek, which maintains
the current behavior of not returning an error from a seek.
New drivers should normally not use noop_llseek but instead use no_llseek
and call nonseekable_open at open time. Existing drivers can be converted
to do the same when the maintainer knows for certain that no user code
relies on calling seek on the device file.
The generated code is often incorrectly indented and right now contains
comments that clarify for each added line why a specific variant was
chosen. In the version that gets submitted upstream, the comments will
be gone and I will manually fix the indentation, because there does not
seem to be a way to do that using coccinelle.
Some amount of new code is currently sitting in linux-next that should get
the same modifications, which I will do at the end of the merge window.
Many thanks to Julia Lawall for helping me learn to write a semantic
patch that does all this.
===== begin semantic patch =====
// This adds an llseek= method to all file operations,
// as a preparation for making no_llseek the default.
//
// The rules are
// - use no_llseek explicitly if we do nonseekable_open
// - use seq_lseek for sequential files
// - use default_llseek if we know we access f_pos
// - use noop_llseek if we know we don't access f_pos,
// but we still want to allow users to call lseek
//
@ open1 exists @
identifier nested_open;
@@
nested_open(...)
{
<+...
nonseekable_open(...)
...+>
}
@ open exists@
identifier open_f;
identifier i, f;
identifier open1.nested_open;
@@
int open_f(struct inode *i, struct file *f)
{
<+...
(
nonseekable_open(...)
|
nested_open(...)
)
...+>
}
@ read disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
<+...
(
*off = E
|
*off += E
|
func(..., off, ...)
|
E = *off
)
...+>
}
@ read_no_fpos disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
... when != off
}
@ write @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
<+...
(
*off = E
|
*off += E
|
func(..., off, ...)
|
E = *off
)
...+>
}
@ write_no_fpos @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
... when != off
}
@ fops0 @
identifier fops;
@@
struct file_operations fops = {
...
};
@ has_llseek depends on fops0 @
identifier fops0.fops;
identifier llseek_f;
@@
struct file_operations fops = {
...
.llseek = llseek_f,
...
};
@ has_read depends on fops0 @
identifier fops0.fops;
identifier read_f;
@@
struct file_operations fops = {
...
.read = read_f,
...
};
@ has_write depends on fops0 @
identifier fops0.fops;
identifier write_f;
@@
struct file_operations fops = {
...
.write = write_f,
...
};
@ has_open depends on fops0 @
identifier fops0.fops;
identifier open_f;
@@
struct file_operations fops = {
...
.open = open_f,
...
};
// use no_llseek if we call nonseekable_open
////////////////////////////////////////////
@ nonseekable1 depends on !has_llseek && has_open @
identifier fops0.fops;
identifier nso ~= "nonseekable_open";
@@
struct file_operations fops = {
... .open = nso, ...
+.llseek = no_llseek, /* nonseekable */
};
@ nonseekable2 depends on !has_llseek @
identifier fops0.fops;
identifier open.open_f;
@@
struct file_operations fops = {
... .open = open_f, ...
+.llseek = no_llseek, /* open uses nonseekable */
};
// use seq_lseek for sequential files
/////////////////////////////////////
@ seq depends on !has_llseek @
identifier fops0.fops;
identifier sr ~= "seq_read";
@@
struct file_operations fops = {
... .read = sr, ...
+.llseek = seq_lseek, /* we have seq_read */
};
// use default_llseek if there is a readdir
///////////////////////////////////////////
@ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier readdir_e;
@@
// any other fop is used that changes pos
struct file_operations fops = {
... .readdir = readdir_e, ...
+.llseek = default_llseek, /* readdir is present */
};
// use default_llseek if at least one of read/write touches f_pos
/////////////////////////////////////////////////////////////////
@ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read.read_f;
@@
// read fops use offset
struct file_operations fops = {
... .read = read_f, ...
+.llseek = default_llseek, /* read accesses f_pos */
};
@ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write.write_f;
@@
// write fops use offset
struct file_operations fops = {
... .write = write_f, ...
+ .llseek = default_llseek, /* write accesses f_pos */
};
// Use noop_llseek if neither read nor write accesses f_pos
///////////////////////////////////////////////////////////
@ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
identifier write_no_fpos.write_f;
@@
// write fops use offset
struct file_operations fops = {
...
.write = write_f,
.read = read_f,
...
+.llseek = noop_llseek, /* read and write both use no f_pos */
};
@ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write_no_fpos.write_f;
@@
struct file_operations fops = {
... .write = write_f, ...
+.llseek = noop_llseek, /* write uses no f_pos */
};
@ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
@@
struct file_operations fops = {
... .read = read_f, ...
+.llseek = noop_llseek, /* read uses no f_pos */
};
@ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
@@
struct file_operations fops = {
...
+.llseek = noop_llseek, /* no read or write fn */
};
===== End semantic patch =====
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Julia Lawall <julia@diku.dk>
Cc: Christoph Hellwig <hch@infradead.org>