2014-01-16 23:16:47 +08:00
|
|
|
infiniband-$(CONFIG_INFINIBAND_ADDR_TRANS) := rdma_cm.o
|
2006-12-01 08:53:41 +08:00
|
|
|
user_access-$(CONFIG_INFINIBAND_ADDR_TRANS) := rdma_ucm.o
|
2006-06-18 11:37:28 +08:00
|
|
|
|
2016-05-19 22:12:33 +08:00
|
|
|
obj-$(CONFIG_INFINIBAND) += ib_core.o ib_cm.o iw_cm.o \
|
2014-01-16 23:16:47 +08:00
|
|
|
$(infiniband-y)
|
2005-09-08 03:43:08 +08:00
|
|
|
obj-$(CONFIG_INFINIBAND_USER_MAD) += ib_umad.o
|
2006-12-01 08:53:41 +08:00
|
|
|
obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o \
|
|
|
|
$(user_access-y)
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2016-05-04 00:01:09 +08:00
|
|
|
ib_core-y := packer.o ud_header.o verbs.o cq.o rw.o sysfs.o \
|
IB/core: Add RoCE GID table management
RoCE GIDs are based on IP addresses configured on Ethernet net-devices
which relate to the RDMA (RoCE) device port.
Currently, each of the low-level drivers that support RoCE (ocrdma,
mlx4) manages its own RoCE port GID table. As there's nothing which is
essentially vendor specific, we generalize that, and enhance the RDMA
core GID cache to do this job.
In order to populate the GID table, we listen for events:
(a) netdev up/down/change_addr events - if a netdev is built onto
our RoCE device, we need to add/delete its IPs. This involves
adding all GIDs related to this ndev, add default GIDs, etc.
(b) inet events - add new GIDs (according to the IP addresses)
to the table.
For programming the port RoCE GID table, providers must implement
the add_gid and del_gid callbacks.
RoCE GID management requires us to state the associated net_device
alongside the GID. This information is necessary in order to manage
the GID table. For example, when a net_device is removed, its
associated GIDs need to be removed as well.
RoCE mandates generating a default GID for each port, based on the
related net-device's IPv6 link local. In contrast to the GID based on
the regular IPv6 link-local (as we generate GID per IP address),
the default GID is also available when the net device is down (in
order to support loopback).
Locking is done as follows:
The patch modify the GID table code both for new RoCE drivers
implementing the add_gid/del_gid callbacks and for current RoCE and
IB drivers that do not. The flows for updating the table are
different, so the locking requirements are too.
While updating RoCE GID table, protection against multiple writers is
achieved via mutex_lock(&table->lock). Since writing to a table
requires us to find an entry (possible a free entry) in the table and
then modify it, this mutex protects both the find_gid and write_gid
ensuring the atomicity of the action.
Each entry in the GID cache is protected by rwlock. In RoCE, writing
(usually results from netdev notifier) involves invoking the vendor's
add_gid and del_gid callbacks, which could sleep.
Therefore, an invalid flag is added for each entry. Updates for RoCE are
done via a workqueue, thus sleeping is permitted.
In IB, updates are done in write_lock_irq(&device->cache.lock), thus
write_gid isn't allowed to sleep and add_gid/del_gid are not called.
When passing net-device into/out-of the GID cache, the device
is always passed held (dev_hold).
The code uses a single work item for updating all RDMA devices,
following a netdev or inet notifier.
The patch moves the cache from being a client (which was incorrect,
as the cache is part of the IB infrastructure) to being explicitly
initialized/freed when a device is registered/removed.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-07-30 23:33:26 +08:00
|
|
|
device.o fmr_pool.o cache.o netlink.o \
|
2016-05-19 22:12:33 +08:00
|
|
|
roce_gid_mgmt.o mr_pool.o addr.o sa_query.o \
|
IB/core: Enforce PKey security on QPs
Add new LSM hooks to allocate and free security contexts and check for
permission to access a PKey.
Allocate and free a security context when creating and destroying a QP.
This context is used for controlling access to PKeys.
When a request is made to modify a QP that changes the port, PKey index,
or alternate path, check that the QP has permission for the PKey in the
PKey table index on the subnet prefix of the port. If the QP is shared
make sure all handles to the QP also have access.
Store which port and PKey index a QP is using. After the reset to init
transition the user can modify the port, PKey index and alternate path
independently. So port and PKey settings changes can be a merge of the
previous settings and the new ones.
In order to maintain access control if there are PKey table or subnet
prefix change keep a list of all QPs are using each PKey index on
each port. If a change occurs all QPs using that device and port must
have access enforced for the new cache settings.
These changes add a transaction to the QP modify process. Association
with the old port and PKey index must be maintained if the modify fails,
and must be removed if it succeeds. Association with the new port and
PKey index must be established prior to the modify and removed if the
modify fails.
1. When a QP is modified to a particular Port, PKey index or alternate
path insert that QP into the appropriate lists.
2. Check permission to access the new settings.
3. If step 2 grants access attempt to modify the QP.
4a. If steps 2 and 3 succeed remove any prior associations.
4b. If ether fails remove the new setting associations.
If a PKey table or subnet prefix changes walk the list of QPs and
check that they have permission. If not send the QP to the error state
and raise a fatal error event. If it's a shared QP make sure all the
QPs that share the real_qp have permission as well. If the QP that
owns a security structure is denied access the security structure is
marked as such and the QP is added to an error_list. Once the moving
the QP to error is complete the security structure mark is cleared.
Maintaining the lists correctly turns QP destroy into a transaction.
The hardware driver for the device frees the ib_qp structure, so while
the destroy is in progress the ib_qp pointer in the ib_qp_security
struct is undefined. When the destroy process begins the ib_qp_security
structure is marked as destroying. This prevents any action from being
taken on the QP pointer. After the QP is destroyed successfully it
could still listed on an error_list wait for it to be processed by that
flow before cleaning up the structure.
If the destroy fails the QPs port and PKey settings are reinserted into
the appropriate lists, the destroying flag is cleared, and access control
is enforced, in case there were any cache changes during the destroy
flow.
To keep the security changes isolated a new file is used to hold security
related functionality.
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Acked-by: Doug Ledford <dledford@redhat.com>
[PM: merge fixup in ib_verbs.h and uverbs_cmd.c]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-05-19 20:48:52 +08:00
|
|
|
multicast.o mad.o smi.o agent.o mad_rmpp.o \
|
2017-06-20 14:14:15 +08:00
|
|
|
security.o nldev.o
|
|
|
|
|
2007-03-05 08:15:11 +08:00
|
|
|
ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
|
2014-12-11 23:04:18 +08:00
|
|
|
ib_core-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o umem_rbtree.o
|
2017-01-10 08:02:14 +08:00
|
|
|
ib_core-$(CONFIG_CGROUP_RDMA) += cgroup.o
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2005-07-28 02:45:40 +08:00
|
|
|
ib_cm-y := cm.o
|
|
|
|
|
2014-03-27 06:07:35 +08:00
|
|
|
iw_cm-y := iwcm.o iwpm_util.o iwpm_msg.o
|
2006-08-04 05:02:42 +08:00
|
|
|
|
2006-06-18 11:37:29 +08:00
|
|
|
rdma_cm-y := cma.o
|
|
|
|
|
2015-12-23 20:56:55 +08:00
|
|
|
rdma_cm-$(CONFIG_INFINIBAND_ADDR_TRANS_CONFIGFS) += cma_configfs.o
|
|
|
|
|
2006-12-01 08:53:41 +08:00
|
|
|
rdma_ucm-y := ucma.o
|
|
|
|
|
2005-04-17 06:20:36 +08:00
|
|
|
ib_umad-y := user_mad.o
|
2005-07-08 08:57:14 +08:00
|
|
|
|
2005-07-28 02:45:45 +08:00
|
|
|
ib_ucm-y := ucm.o
|
|
|
|
|
2017-04-04 18:31:42 +08:00
|
|
|
ib_uverbs-y := uverbs_main.o uverbs_cmd.o uverbs_marshall.o \
|
IB/core: Add new ioctl interface
In this ioctl interface, processing the command starts from
properties of the command and fetching the appropriate user objects
before calling the handler.
Parsing and validation is done according to a specifier declared by
the driver's code. In the driver, all supported objects are declared.
These objects are separated to different object namepsaces. Dividing
objects to namespaces is done at initialization by using the higher
bits of the object ids. This initialization can mix objects declared
in different places to one parsing tree using in this ioctl interface.
For each object we list all supported methods. Similarly to objects,
methods are separated to method namespaces too. Namespacing is done
similarly to the objects case. This could be used in order to add
methods to an existing object.
Each method has a specific handler, which could be either a default
handler or a driver specific handler.
Along with the handler, a bunch of attributes are specified as well.
Similarly to objects and method, attributes are namespaced and hashed
by their ids at initialization too. All supported attributes are
subject to automatic fetching and validation. These attributes include
the command, response and the method's related objects' ids.
When these entities (objects, methods and attributes) are used, the
high bits of the entities ids are used in order to calculate the hash
bucket index. Then, these high bits are masked out in order to have a
zero based index. Since we use these high bits for both bucketing and
namespacing, we get a compact representation and O(1) array access.
This is mandatory for efficient dispatching.
Each attribute has a type (PTR_IN, PTR_OUT, IDR and FD) and a length.
Attributes could be validated through some attributes, like:
(*) Minimum size / Exact size
(*) Fops for FD
(*) Object type for IDR
If an IDR/fd attribute is specified, the kernel also states the object
type and the required access (NEW, WRITE, READ or DESTROY).
All uobject/fd management is done automatically by the infrastructure,
meaning - the infrastructure will fail concurrent commands that at
least one of them requires concurrent access (WRITE/DESTROY),
synchronize actions with device removals (dissociate context events)
and take care of reference counting (increase/decrease) for concurrent
actions invocation. The reference counts on the actual kernel objects
shall be handled by the handlers.
objects
+--------+
| |
| | methods +--------+
| | ns method method_spec +-----+ |len |
+--------+ +------+[d]+-------+ +----------------+[d]+------------+ |attr1+-> |type |
| object +> |method+-> | spec +-> + attr_buckets +-> |default_chain+--> +-----+ |idr_type|
+--------+ +------+ |handler| | | +------------+ |attr2| |access |
| | | | +-------+ +----------------+ |driver chain| +-----+ +--------+
| | | | +------------+
| | +------+
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
+--------+
[d] = Hash ids to groups using the high order bits
The right types table is also chosen by using the high bits from
the ids. Currently we have either default or driver specific groups.
Once validation and object fetching (or creation) completed, we call
the handler:
int (*handler)(struct ib_device *ib_dev, struct ib_uverbs_file *ufile,
struct uverbs_attr_bundle *ctx);
ctx bundles attributes of different namespaces. Each element there
is an array of attributes which corresponds to one namespaces of
attributes. For example, in the usually used case:
ctx core
+----------------------------+ +------------+
| core: +---> | valid |
+----------------------------+ | cmd_attr |
| driver: | +------------+
|----------------------------+--+ | valid |
| | cmd_attr |
| +------------+
| | valid |
| | obj_attr |
| +------------+
|
| drivers
| +------------+
+> | valid |
| cmd_attr |
+------------+
| valid |
| cmd_attr |
+------------+
| valid |
| obj_attr |
+------------+
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-03 21:06:57 +08:00
|
|
|
rdma_core.o uverbs_std_types.o uverbs_ioctl.o
|