linux/include
Nirmoy Das 0cdea4455a drm/mm: optimize rb_hole_addr rbtree search
Userspace can severely fragment rb_hole_addr rbtree by manipulating
alignment while allocating buffers. Fragmented rb_hole_addr rbtree
would result in large delays while allocating buffer object for a
userspace application. It takes long time to find suitable hole
because if we fail to find a suitable hole in the first attempt
then we look for neighbouring nodes using rb_prev()/rb_next().
Traversing rbtree using rb_prev()/rb_next() can take really long
time if the tree is fragmented.

This patch improves searches in fragmented rb_hole_addr rbtree by
modifying it to an augmented rbtree which will store an extra field
in drm_mm_node, subtree_max_hole. Each drm_mm_node now stores maximum
hole size for its subtree in drm_mm_node->subtree_max_hole. Using
drm_mm_node->subtree_max_hole, it is possible to eliminate a complete
subtree if that subtree is unable to serve a request hence reducing
number of rb_prev()/rb_next() used.

With this patch applied, 1 million bo allocs on amdgpu took ~8 sec,
compared to 50k bo allocs which took 28 sec without it.

partial test code:
int test_fragmentation(void)
{

	int i = 0;
        uint32_t  minor_version;
        uint32_t  major_version;

        struct amdgpu_bo_alloc_request request = {};
        amdgpu_bo_handle vram_handle[MAX_ALLOC] = {};
        amdgpu_device_handle device_handle;

        request.alloc_size = 4096;
        request.phys_alignment = 8192;
        request.preferred_heap = AMDGPU_GEM_DOMAIN_VRAM;

        int fd = open("/dev/dri/card0", O_RDWR | O_CLOEXEC);
        amdgpu_device_initialize(fd, &major_version,  &minor_version,
				 &device_handle);

        for (i = 0; i < MAX_ALLOC; i++) {
                amdgpu_bo_alloc(device_handle, &request, &vram_handle[i]);
        }

        for (i = 0; i < MAX_ALLOC; i++)
                amdgpu_bo_free(vram_handle[i]);

        return 0;
}

v2:
Use RB_DECLARE_CALLBACKS_MAX to maintain subtree_max_hole
v3:
insert_hole_addr() should be static a function
fix return value of next_hole_high_addr()/next_hole_low_addr()
Reported-by: kbuild test robot <lkp@intel.com>
v4:
Fix commit message.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/364341/
Signed-off-by: Christian König <christian.koenig@amd.com>
2020-05-05 13:39:38 +02:00
..
acpi Additional ACPI updates for 5.7-rc1 2020-04-06 10:35:06 -07:00
asm-generic userfaultfd: wp: add pmd_swp_*uffd_wp() helpers 2020-04-07 10:43:39 -07:00
clocksource pwm: omap-dmtimer: Drop unused header file 2020-03-30 18:03:06 +02:00
crypto crypto: curve25519 - do not pollute dispatcher based on assembler 2020-04-09 00:01:59 +09:00
drm drm/mm: optimize rb_hole_addr rbtree search 2020-05-05 13:39:38 +02:00
dt-bindings RISC-V Patches for the 5.7 Merge Window, Part 1 2020-04-09 10:51:30 -07:00
keys KEYS: Don't write out to userspace while holding key semaphore 2020-03-29 12:40:41 +01:00
kunit kunit: subtests should be indented 4 spaces according to TAP 2020-03-26 14:08:41 -06:00
kvm KVM: arm64: GICv4.1: Allow SGIs to switch between HW and SW interrupts 2020-03-24 12:15:51 +00:00
linux Merge drm/drm-next into drm-misc-next 2020-04-17 08:12:22 +02:00
math-emu
media media: cec-notifier: make cec_notifier_get_conn() static 2020-03-20 09:02:45 +01:00
misc
net 9p pull request for inclusion in 5.7 2020-04-06 08:46:59 -07:00
pcmcia
ras
rdma IB/mlx5: Expose UAR object and its alloc/destroy commands 2020-03-27 12:59:04 -03:00
scsi SCSI misc on 20200402 2020-04-02 17:03:53 -07:00
soc ARM: driver updates 2020-04-03 15:05:35 -07:00
sound ASoC: Fixes for v5.7 2020-04-08 18:08:09 +02:00
target scsi: target: fix hang when multiple threads try to destroy the same iscsi session 2020-03-26 21:47:47 -04:00
trace Merge branch 'akpm' (patches from Andrew) 2020-04-07 14:11:54 -07:00
uapi IOMMU Updates for Linux v5.7 2020-04-08 11:00:00 -07:00
vdso vdso: Fix clocksource.h macro detection 2020-03-23 18:51:08 +01:00
video
xen xen: Use evtchn_type_t as a type for event channels 2020-04-07 12:12:54 +02:00