If we exclude directories and symlinks from the set of sysfs
dirents where we need active references we are left with
sysfs attributes (binary or not).
- Tweak sysfs_deactivate to only do something on attributes
- Move lockdep initialization into sysfs_file_add_mode to
limit it to just attributes.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It turns out that holding an active reference on a directory is
pointless. The purpose of the active references are to allows us to
block when removing sysfs entries that have custom methods so we don't
remove modules while running modular code and to keep those custom
methods from accessing data structures after the files have been
removed. Further sysfs_remove_dir remove all elements in the
directory before removing the directory itself, so there is no chance
we will remove a directory with active children.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Constify struct sysfs_ops.
This is part of the ops structure constification
effort started by Arjan van de Ven et al.
Benefits of this constification:
* prevents modification of data that is shared
(referenced) by many other structure instances
at runtime
* detects/prevents accidental (but not intentional)
modification attempts on archs that enforce
read-only kernel data at runtime
* potentially better optimized code as the compiler
can assume that the const data cannot be changed
* the compiler/linker move const data into .rodata
and therefore exclude them from false sharing
Signed-off-by: Emese Revfy <re.emese@gmail.com>
Acked-by: David Teigland <teigland@redhat.com>
Acked-by: Matt Domsch <Matt_Domsch@dell.com>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Acked-by: Hans J. Koch <hjk@linutronix.de>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Adding/Removing a whole array of attributes is very common. Add a standard
utility function to do this with a simple function call, instead of
requiring drivers to open code this.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Now that sysfs_getattr and sysfs_permission refresh the vfs
inode there is no need to immediatly push the mode change
into the vfs cache. Reducing the amount of work needed and
simplifying the locking.
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently every caller of sysfs_chmod_file happens at either
file creation time to set a non-default mode or in response
to a specific user requested space change in policy. Making
timestamps of when the chmod happens and notification of
a file changing mode uninteresting.
Remove the unnecessary time stamp and filesystem change
notification, and removes the last of the explicit inotify
and donitfy support from sysfs.
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
sysfs_notify_dirent is a simple atomic operation that can be used to
alert user-space that new data can be read from a sysfs attribute.
Unfortunately it cannot currently be called from non-process context
because of its use of spin_lock which is sometimes taken with
interrupts enabled.
So change all lockers of sysfs_open_dirent_lock to disable interrupts,
thus making sysfs_notify_dirent safe to be called from non-process
context (as drivers/md does in md_safemode_timeout).
sysfs_get_open_dirent is (documented as being) only called from
process context, so it uses spin_lock_irq. Other places
use spin_lock_irqsave.
The usage for sysfs_notify_dirent in md_safemode_timeout was
introduced in 2.6.28, so this patch is suitable for that and more
recent kernels.
Reported-by: Joel Andres Granados <jgranado@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We don't need a kernel thread per CPU for this application.
Acked-by: Alex Chiang <achiang@hp.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently, following test programs don't finished.
% ruby -e '
Thread.new { sleep }
File.read("/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies")
'
strace expose the reason.
...
open("/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies", O_RDONLY|O_LARGEFILE) = 3
ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbf9fa6b8) = -1 ENOTTY (Inappropriate ioctl for device)
fstat64(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
_llseek(3, 0, [0], SEEK_CUR) = 0
select(4, [3], NULL, NULL, NULL) = 1 (in [3])
read(3, "1400000 1300000 1200000 1100000 1"..., 4096) = 62
select(4, [3], NULL, NULL, NULL
Because Ruby (the scripting language) VM assume select system-call
against regular file don't block. it because SUSv3 says "Regular files
shall always poll TRUE for reading and writing". see
http://www.opengroup.org/onlinepubs/009695399/functions/poll.html it
seems valid assumption.
But sysfs_poll() don't keep this rule although sysfs file can read and
write always.
This patch restore proper poll behavior to sysfs.
/sys/block/md*/md/sync_action polling application and another sysfs
updating sensitive application still can use POLLERR and POLLPRI.
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
A sysfs attribute using sysfs_schedule_callback() to commit suicide
may end up calling device_unregister(), which will eventually call
a driver's ->remove function.
Drivers may call flush_scheduled_work() in their shutdown routines,
in which case lockdep will complain with something like the following:
=============================================
[ INFO: possible recursive locking detected ]
2.6.29-rc8-kk #1
---------------------------------------------
events/4/56 is trying to acquire lock:
(events){--..}, at: [<ffffffff80257fc0>] flush_workqueue+0x0/0xa0
but task is already holding lock:
(events){--..}, at: [<ffffffff80257648>] run_workqueue+0x108/0x230
other info that might help us debug this:
3 locks held by events/4/56:
#0: (events){--..}, at: [<ffffffff80257648>] run_workqueue+0x108/0x230
#1: (&ss->work){--..}, at: [<ffffffff80257648>] run_workqueue+0x108/0x230
#2: (pci_remove_rescan_mutex){--..}, at: [<ffffffff803c10d1>] remove_callback+0x21/0x40
stack backtrace:
Pid: 56, comm: events/4 Not tainted 2.6.29-rc8-kk #1
Call Trace:
[<ffffffff8026dfcd>] validate_chain+0xb7d/0x1260
[<ffffffff8026eade>] __lock_acquire+0x42e/0xa40
[<ffffffff8026f148>] lock_acquire+0x58/0x80
[<ffffffff80257fc0>] ? flush_workqueue+0x0/0xa0
[<ffffffff8025800d>] flush_workqueue+0x4d/0xa0
[<ffffffff80257fc0>] ? flush_workqueue+0x0/0xa0
[<ffffffff80258070>] flush_scheduled_work+0x10/0x20
[<ffffffffa0144065>] e1000_remove+0x55/0xfe [e1000e]
[<ffffffff8033ee30>] ? sysfs_schedule_callback_work+0x0/0x50
[<ffffffff803bfeb2>] pci_device_remove+0x32/0x70
[<ffffffff80441da9>] __device_release_driver+0x59/0x90
[<ffffffff80441edb>] device_release_driver+0x2b/0x40
[<ffffffff804419d6>] bus_remove_device+0xa6/0x120
[<ffffffff8043e46b>] device_del+0x12b/0x190
[<ffffffff8043e4f6>] device_unregister+0x26/0x70
[<ffffffff803ba969>] pci_stop_dev+0x49/0x60
[<ffffffff803baab0>] pci_remove_bus_device+0x40/0xc0
[<ffffffff803c10d9>] remove_callback+0x29/0x40
[<ffffffff8033ee4f>] sysfs_schedule_callback_work+0x1f/0x50
[<ffffffff8025769a>] run_workqueue+0x15a/0x230
[<ffffffff80257648>] ? run_workqueue+0x108/0x230
[<ffffffff8025846f>] worker_thread+0x9f/0x100
[<ffffffff8025bce0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff802583d0>] ? worker_thread+0x0/0x100
[<ffffffff8025b89d>] kthread+0x4d/0x80
[<ffffffff8020d4ba>] child_rip+0xa/0x20
[<ffffffff8020cebc>] ? restore_args+0x0/0x30
[<ffffffff8025b850>] ? kthread+0x0/0x80
[<ffffffff8020d4b0>] ? child_rip+0x0/0x20
Although we know that the device_unregister path will never acquire
a lock that a driver might try to acquire in its ->remove, in general
we should never attempt to flush a workqueue from within the same
workqueue, and lockdep rightly complains.
So as long as sysfs attributes cannot commit suicide directly and we
are stuck with this callback mechanism, put the sysfs callbacks on
their own workqueue instead of the global one.
This has the side benefit that if a suicidal sysfs attribute kicks
off a long chain of ->remove callbacks, we no longer induce a long
delay on the global queue.
This also fixes a missing module_put in the error path introduced
by sysfs-only-allow-one-scheduled-removal-callback-per-kobj.patch.
We never destroy the workqueue, but I'm not sure that's a
problem.
Reported-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Tested-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The only way for a sysfs attribute to remove itself (without
deadlock) is to use the sysfs_schedule_callback() interface.
Vegard Nossum discovered that a poorly written sysfs ->store
callback can repeatedly schedule remove callbacks on the same
device over and over, e.g.
$ while true ; do echo 1 > /sys/devices/.../remove ; done
If the 'remove' attribute uses the sysfs_schedule_callback API
and also does not protect itself from concurrent accesses, its
callback handler will be called multiple times, and will
eventually attempt to perform operations on a freed kobject,
leading to many problems.
Instead of requiring all callers of sysfs_schedule_callback to
implement their own synchronization, provide the protection in
the infrastructure.
Now, sysfs_schedule_callback will only allow one scheduled
callback per kobject. On subsequent calls with the same kobject,
return -EAGAIN.
This is a short term fix. The long term fix is to allow sysfs
attributes to remove themselves directly, without any of this
callback hokey pokey.
[cornelia.huck@de.ibm.com: s390 ccwgroup bits]
Reported-by: vegard.nossum@gmail.com
Signed-off-by: Alex Chiang <achiang@hp.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Because they can be, and because code like this produces a warning if
they're not:
struct device_attribute dev_attr;
sysfs_notify(&kobj, NULL, dev_attr.attr.name);
Signed-off-by: Trent Piepho <tpiepho@freescale.com>
CC: Neil Brown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Support sysfs_notify from atomic context with new sysfs_notify_dirent
sysfs_notify currently takes sysfs_mutex.
This means that it cannot be called in atomic context.
sysfs_mutex is sometimes held over a malloc (sysfs_rename_dir)
so it can block on low memory.
In md I want to be able to notify on a sysfs attribute from
atomic context, and I don't want to block on low memory because I
could be in the writeout path for freeing memory.
So:
- export the "sysfs_dirent" structure along with sysfs_get, sysfs_put
and sysfs_get_dirent so I can get the sysfs_dirent that I want to
notify on and hold it in an md structure.
- split sysfs_notify_dirent out of sysfs_notify so the sysfs_dirent
can be notified on with no blocking (just a spinlock).
Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Print the name of the last-accessed sysfs file when we oops, to help track
down oopses which occur in sysfs store/read handlers. Because these oopses
tend to not leave any trace of the offending code in the stack traces.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Use WARN() instead of a printk+WARN_ON() pair; this way the message becomes
part of the warning section for better reporting/collection. Also, with this,
one fo the if() sections collapses entirely into the WARN().
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sysfs_chmod_file() calls notify_change() to change the permission bits
on a sysfs file. Replace with explicit call to sysfs_setattr() and
fsnotify_change().
This is equivalent, except that security_inode_setattr() is not
called. This function is called by drivers, so the security checks do
not make any sense.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have a problem in scsi_transport_spi in that we need to customise
not only the visibility of the attributes, but also their mode. Fix
this by making the is_visible() callback return a mode, with 0
indicating is not visible.
Also add a sysfs_update_group() API to allow us to change either the
visibility or mode of the files at any time on the fly.
Acked-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Requiring userspace to close and re-open sysfs attributes has been the
policy since before 2.6.12. It allows userspace to get a consistent
snapshot of kernel state and consume it with incremental reads and seeks.
Now, if the file position is zero the kernel assumes userspace wants to see
the new value. The application for this change is to allow a userspace
RAID metadata handler to check the state of an array without causing any
memory allocations. Thus not causing writeback to a raid array that might
be blocked waiting for userspace to take action.
Cc: Neil Brown <neilb@suse.de>
Acked-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
After an experimental deletion of the unnecessary inclusion of
<linux/slab.h> from the header file <linux/percpu.h>, the following
files under fs/sysfs were exposed as needing to explicitly include
<linux/slab.h>.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Try to find the culprit who caused
http://bugzilla.kernel.org/show_bug.cgi?id=10150
Cc: <balajirrao@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (200 commits)
[SCSI] usbstorage: use last_sector_bug flag universally
[SCSI] libsas: abstract STP task status into a function
[SCSI] ultrastor: clean up inline asm warnings
[SCSI] aic7xxx: fix firmware build
[SCSI] aacraid: fib context lock for management ioctls
[SCSI] ch: remove forward declarations
[SCSI] ch: fix device minor number management bug
[SCSI] ch: handle class_device_create failure properly
[SCSI] NCR5380: fix section mismatch
[SCSI] sg: fix /proc/scsi/sg/devices when no SCSI devices
[SCSI] IB/iSER: add logical unit reset support
[SCSI] don't use __GFP_DMA for sense buffers if not required
[SCSI] use dynamically allocated sense buffer
[SCSI] scsi.h: add macro for enclosure bit of inquiry data
[SCSI] sd: add fix for devices with last sector access problems
[SCSI] fix pcmcia compile problem
[SCSI] aacraid: add Voodoo Lite class of cards.
[SCSI] aacraid: add new driver features flags
[SCSI] qla2xxx: Update version number to 8.02.00-k7.
[SCSI] qla2xxx: Issue correct MBC_INITIALIZE_FIRMWARE command.
...
Remove the no longer needed subsys_attributes, they are all converted to
the more sensical kobj_attributes.
There is no longer a magic fallback in sysfs attribute operations, all
kobjects which create simple attributes need explicitely a ktype
assigned, which tells the core what was intended here.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We don't need a "default" ktype for a kset. We should set this
explicitly every time for each kset. This change is needed so that we
can make ksets dynamic, and cleans up one of the odd, undocumented
assumption that the kset/kobject/ktype model has.
This patch is based on a lot of help from Kay Sievers.
Nasty bug in the block code was found by Dave Young
<hidave.darkstar@gmail.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I can't see a reason why these shouldn't work on every group. However,
they only seem to work on named groups. This patch allows the group
functions to work on anonymous groups (those with NULL names).
Acked-by: Tejun Heo <htejun@gmail.com>
Acked-by: Kay Sievers <kay.sievers@vrfy.org>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
I found that there is a off-by-one problem in the following code.
Version: 2.6.24-rc2
File: fs/sysfs/file.c:118-122
Function: fill_read_buffer
--------------------------------------------------------------------
count = ops->show(kobj, attr_sd->s_attr.attr, buffer->page);
sysfs_put_active_two(attr_sd);
BUG_ON(count > (ssize_t)PAGE_SIZE);
--------------------------------------------------------------------
Because according to the specification of the sysfs and the implement of
the show methods, the show methods return the number of bytes which would
be generated for the given input, excluding the trailing null.So if the
return value of the show methods equals PAGE_SIZE - 1, the buffer is full
in fact. And if the return value equals PAGE_SIZE, the resulting string
was already truncated,or buffer overflow occurred.
This patch fixes an off-by-one error in fill_read_buffer.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Tejun Heo <teheo@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Sysfs file poll implementation is scattered over sysfs and kobject.
Event numbering is done in sysfs_dirent but wait itself is done on
kobject. This not only unecessarily bloats both kobject and
sysfs_dirent but is also buggy - if a sysfs_dirent is removed while
there still are pollers, the associaton betwen the kobject and
sysfs_dirent breaks and kobject may be freed with the pollers still
sleeping on it.
This patch moves whole poll implementation into sysfs_open_dirent.
Each time a sysfs_open_dirent is created, event number restarts from 1
and pollers sleep on sysfs_open_dirent. As event sequence number is
meaningless without any open file and pollers should have open file
and thus sysfs_open_dirent, this ephemeral event counting works and is
a saner implementation.
This patch fixes the dnagling sleepers bug and reduces the sizes of
kobject and sysfs_dirent by one pointer.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Implement sysfs_open_dirent which represents an open file (attribute)
sysfs_dirent. A file sysfs_dirent with one or more open files have
one sysfs_dirent and all sysfs_buffers (one for each open instance)
are linked to it.
sysfs_open_dirent doesn't actually do anything yet but will be used to
off-load things which are specific for open file sysfs_dirent from it.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Make s_elem an anonymous union. Prefixing with s_elem makes things
needlessly longer without any advantage.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In sysfs_release(), sysfs_buffer pointed to by filp->private_data is
guaranteed to exist. Kill the unnecessary NULL check. This also
makes the code more consistent with the counterpart in fs/sysfs/bin.c.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There's no reason to get an extra reference to sysfs_dirent for an
open file. Open file has a reference to the dentry which in turn has
a reference to sysfs_dirent. This is fairly obvious as otherwise open
itself won't be able to access the sysfs_dirent. Kill the extra
sysfs_get() and matching sysfs_put().
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
sysfs_update_file() depends on inode->i_mtime but sysfs iondes are now
reclaimable making the reported modification time unreliable. There's
only one user (pci hotplug) of this notification mechanism and it
reportedly isn't utilized from userland.
Kill sysfs_update_file().
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
sysfs_chmod_file() looked and updated only inode of the target file.
Dentry and inode are reclaimable and the update mode data will go away
when the inode is reclaimed. This patch makes sysfs_chmod_file()
update sd->s_mode too such that the change is permanent.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Looking carefully at the rename code we have a subtle dependency
that the structure of sysfs not change while we are performing
a rename. If the parent directory of the object we are renaming
changes while the rename is being performed nasty things could
happen when we go to release our locks.
So introduce a sysfs_rename_mutex to prevent this highly
unlikely theoretical issue.
In addition hold sysfs_rename_mutex over all calls to
sysfs_get_dentry. Allowing sysfs_get_dentry to be simplified
in the future.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Make sysfs_add_one() check for duplicate entry and return -EEXIST if
such entry exists. This simplifies node addition code a bit.
This patch doesn't introduce any noticeable behavior change.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When adding or removing a sysfs_dirent, the user used to be required
to call link/unlink separately. It was for two reasons - code looked
like that before sysfs_addrm_cxt conversion and to avoid looping
through parent_sd->children list twice during removal.
Performance optimization during removal just isn't worth it. Make
sysfs_add/remove_one() call sysfs_link/unlink_sibing() implicitly.
This makes code simpler albeit slightly less efficient. This change
doesn't introduce any noticeable behavior change.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Use mutex instead of semaphore in sysfs/file.c : sys_buffer.
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Node addition failure is detected by testing return value of
sysfs_addfm_finish() which returns the number of added and removed
nodes. As the function is called as the last step of addition right
on top of error handling block, the if blocks looked like the
following.
if (sysfs_addrm_finish(&acxt))
success handling, usually return;
/* fall through to error handling */
This is the opposite of usual convention in sysfs and makes the code
difficult to understand. This patch inverts the test and makes those
blocks look more like others.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Gabriel C <nix.or.die@googlemail.com>
Cc: Miles Lane <miles.lane@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch makes dentries and inodes for sysfs directories
reclaimable.
* sysfs_notify() is modified to walk sysfs_dirent tree instead of
dentry tree.
* sysfs_update_file() and sysfs_chmod_file() use sysfs_get_dentry() to
grab the victim dentry.
* sysfs_rename_dir() and sysfs_move_dir() grab all dentries using
sysfs_get_dentry() on startup.
* Dentries for all shadowed directories are pinned in memory to serve
as lookup start point.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The original add/remove code had the following problems.
* parent's timestamps are updated on dentry instantiation. this is
incorrect with reclaimable files.
* updating parent's timestamps isn't synchronized.
* parent nlink update assumes the inode is accessible which won't be
true once directory dentries are made reclaimable.
This patch restructures add/remove paths to resolve the above
problems. Add/removal are done in the following steps.
1. sysfs_addrm_start() : acquire locks including sysfs_mutex and other
resources.
2-a. sysfs_add_one() : add new sd. linking the new sd into the
children list is caller's responsibility.
2-b. sysfs_remove_one() : remove a sd. unlinking the sd from the
children list is caller's responsibility.
3. sysfs_addrm_finish() : release all resources and clean up.
Steps 2-a and/or 2-b can be repeated multiple times.
Parent's inode is looked up during sysfs_addrm_start(). If available
(always at the moment), it's pinned and nlink is updated as sd's are
added and removed. Timestamps are updated during finish if any sd has
been added or removed. If parent's inode is not available during
start, sysfs_mutex ensures that parent inode is not created till
add/remove is complete.
All the complexity is contained inside the helper functions.
Especially, dentry/inode handling is properly hidden from the rest of
sysfs which now mostly operate on sysfs_dirents. As an added bonus,
codes which use these helpers to add and remove sysfs_dirents are now
more structured and simpler.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
As kobj sysfs dentries and inodes are gonna be made reclaimable,
i_mutex can't be used to protect sysfs_dirent tree. Use sysfs_mutex
globally instead. As the whole tree is protected with sysfs_mutex,
there is no reason to keep sysfs_rename_sem. Drop it.
While at it, add docbook comments to functions which require
sysfs_mutex locking.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
As kobj sysfs dentries and inodes are gonna be made reclaimable,
dentry can't be used as naming token for sysfs file/directory, replace
kobj->dentry with kobj->sd. The only external interface change is
shadow directory handling. All other changes are contained in kobj
and sysfs.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Implement sysfs_find_dirent() and sysfs_get_dirent().
sysfs_dirent_exist() is replaced by sysfs_find_dirent(). These will
be used to make directory entries reclamiable.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
sysfs is now completely out of driver/module lifetime game. After
deletion, a sysfs node doesn't access anything outside sysfs proper,
so there's no reason to hold onto the attribute owners. Note that
often the wrong modules were accounted for as owners leading to
accessing removed modules.
This patch kills now unnecessary attribute->owner. Note that with
this change, userland holding a sysfs node does not prevent the
backing module from being unloaded.
For more info regarding lifetime rule cleanup, please read the
following message.
http://article.gmane.org/gmane.linux.kernel/510293
(tweaked by Greg to not delete the field just yet, to make it easier to
merge things properly.)
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Now that sysfs_dirent can be disconnected from kobject on deletion,
there is no need to orphan each attribute files. All [bin_]attribute
nodes are automatically orphaned when the parent node is deleted.
Kill attribute file orphaning.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
sysfs: implement sysfs_dirent active reference and immediate disconnect
Opening a sysfs node references its associated kobject, so userland
can arbitrarily prolong lifetime of a kobject which complicates
lifetime rules in drivers. This patch implements active reference and
makes the association between kobject and sysfs immediately breakable.
Now each sysfs_dirent has two reference counts - s_count and s_active.
s_count is a regular reference count which guarantees that the
containing sysfs_dirent is accessible. As long as s_count reference
is held, all sysfs internal fields in sysfs_dirent are accessible
including s_parent and s_name.
The newly added s_active is active reference count. This is acquired
by invoking sysfs_get_active() and it's the caller's responsibility to
ensure sysfs_dirent itself is accessible (should be holding s_count
one way or the other). Dereferencing sysfs_dirent to access objects
out of sysfs proper requires active reference. This includes access
to the associated kobjects, attributes and ops.
The active references can be drained and denied by calling
sysfs_deactivate(). All active sysfs_dirents must be deactivated
after deletion but before the default reference is dropped. This
enables immediate disconnect of sysfs nodes. Once a sysfs_dirent is
deleted, it won't access any entity external to sysfs proper.
Because attr/bin_attr ops access both the node itself and its parent
for kobject, they need to hold active references to both.
sysfs_get/put_active_two() helpers are provided to help grabbing both
references. Parent's is acquired first and released last.
Unlike other operations, mmapped area lingers on after mmap() is
finished and the module implement implementing it and kobj need to
stay referenced till all the mapped pages are gone. This is
accomplished by holding one set of active references to the bin_attr
and its parent if there have been any mmap during lifetime of an
openfile. The references are dropped when the openfile is released.
This change makes sysfs lifetime rules independent from both kobject's
and module's. It not only fixes several race conditions caused by
sysfs not holding onto the proper module when referencing kobject, but
also helps fixing and simplifying lifetime management in driver model
and drivers by taking sysfs out of the equation.
Please read the following message for more info.
http://article.gmane.org/gmane.linux.kernel/510293
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>