2018-12-13 17:07:38 +08:00
# SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
#
# system call numbers and entry vectors for mips
#
# The format is:
# <number> <abi> <name> <entry point> <compat entry point>
#
# The <abi> is always "n32" for this file.
#
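The row format described in the header can be illustrated with a small parser. This is only a sketch under stated assumptions: `struct syscall_row` and `parse_syscall_row` are hypothetical names chosen for illustration, not part of the kernel's table-processing scripts, and the field widths are arbitrary.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record type for one table row; not a kernel structure. */
struct syscall_row {
    int number;
    char abi[16];
    char name[64];
    char entry[64];
    char compat[64];   /* stays empty when the optional column is absent */
};

/*
 * Parse one line of the table. Returns 1 when a row was parsed and 0 for
 * comments, blank lines, or lines with fewer than four columns.
 */
static int parse_syscall_row(const char *line, struct syscall_row *row)
{
    assert(line != NULL && row != NULL);
    memset(row, 0, sizeof(*row));

    /* Skip leading whitespace, then ignore comments and empty lines. */
    while (*line == ' ' || *line == '\t')
        line++;
    if (*line == '#' || *line == '\n' || *line == '\0')
        return 0;

    int fields = sscanf(line, "%d %15s %63s %63s %63s",
                        &row->number, row->abi, row->name,
                        row->entry, row->compat);

    /* The first four columns are required; the compat column is optional. */
    return fields >= 4;
}
```

Rows in this file carry four columns, so `compat` is left empty for every entry shown here.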
0 n32 read sys_read
1 n32 write sys_write
2 n32 open sys_open
3 n32 close sys_close
4 n32 stat sys_newstat
5 n32 fstat sys_newfstat
6 n32 lstat sys_newlstat
7 n32 poll sys_poll
8 n32 lseek sys_lseek
9 n32 mmap sys_mips_mmap
10 n32 mprotect sys_mprotect
11 n32 munmap sys_munmap
12 n32 brk sys_brk
13 n32 rt_sigaction compat_sys_rt_sigaction
14 n32 rt_sigprocmask compat_sys_rt_sigprocmask
15 n32 ioctl compat_sys_ioctl
16 n32 pread64 sys_pread64
17 n32 pwrite64 sys_pwrite64
2020-09-25 12:51:43 +08:00
18 n32 readv sys_readv
19 n32 writev sys_writev
2018-12-13 17:07:38 +08:00
20 n32 access sys_access
21 n32 pipe sysm_pipe
22 n32 _newselect compat_sys_select
23 n32 sched_yield sys_sched_yield
24 n32 mremap sys_mremap
25 n32 msync sys_msync
26 n32 mincore sys_mincore
27 n32 madvise sys_madvise
28 n32 shmget sys_shmget
29 n32 shmat sys_shmat
ipc: rename old-style shmctl/semctl/msgctl syscalls
The behavior of these system calls is slightly different between
architectures, as determined by the CONFIG_ARCH_WANT_IPC_PARSE_VERSION
symbol. Most architectures that implement the split IPC syscalls don't set
that symbol and only get the modern version, but alpha, arm, microblaze,
mips-n32, mips-n64 and xtensa expect the caller to pass the IPC_64 flag.
For the architectures that so far only implement sys_ipc(), i.e. m68k,
mips-o32, powerpc, s390, sh, sparc, and x86-32, we want the new behavior
when adding the split syscalls, so we need to distinguish between the
two groups of architectures.
The method I picked for this distinction is to have a separate system call
entry point: sys_old_*ctl() now uses ipc_parse_version, while sys_*ctl()
does not. The system call tables of the five architectures are changed
accordingly.
As an additional benefit, we no longer need the configuration specific
definition for ipc_parse_version(), it always does the same thing now,
but simply won't get called on architectures with the modern interface.
A small downside is that on architectures that do set
ARCH_WANT_IPC_PARSE_VERSION, we now have an extra set of entry points
that are never called. They only add a few bytes of bloat, so it seems
better to keep them compared to adding yet another Kconfig symbol.
I considered adding new syscall numbers for the IPC_64 variants for
consistency, but decided against that for now.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-01-01 05:22:40 +08:00
30 n32 shmctl compat_sys_old_shmctl
2018-12-13 17:07:38 +08:00
31 n32 dup sys_dup
32 n32 dup2 sys_dup2
33 n32 pause sys_pause
2019-01-07 07:33:08 +08:00
34 n32 nanosleep sys_nanosleep_time32
2018-12-13 17:07:38 +08:00
35 n32 getitimer compat_sys_getitimer
36 n32 setitimer compat_sys_setitimer
37 n32 alarm sys_alarm
38 n32 getpid sys_getpid
39 n32 sendfile compat_sys_sendfile
40 n32 socket sys_socket
41 n32 connect sys_connect
42 n32 accept sys_accept
43 n32 sendto sys_sendto
44 n32 recvfrom compat_sys_recvfrom
45 n32 sendmsg compat_sys_sendmsg
46 n32 recvmsg compat_sys_recvmsg
47 n32 shutdown sys_shutdown
48 n32 bind sys_bind
49 n32 listen sys_listen
50 n32 getsockname sys_getsockname
51 n32 getpeername sys_getpeername
52 n32 socketpair sys_socketpair
2020-07-17 14:23:15 +08:00
53 n32 setsockopt sys_setsockopt
54 n32 getsockopt sys_getsockopt
2018-12-13 17:07:38 +08:00
55 n32 clone __sys_clone
56 n32 fork __sys_fork
57 n32 execve compat_sys_execve
58 n32 exit sys_exit
59 n32 wait4 compat_sys_wait4
60 n32 kill sys_kill
61 n32 uname sys_newuname
62 n32 semget sys_semget
63 n32 semop sys_semop
2019-01-01 05:22:40 +08:00
64 n32 semctl compat_sys_old_semctl
2018-12-13 17:07:38 +08:00
65 n32 shmdt sys_shmdt
66 n32 msgget sys_msgget
67 n32 msgsnd compat_sys_msgsnd
68 n32 msgrcv compat_sys_msgrcv
2019-01-01 05:22:40 +08:00
69 n32 msgctl compat_sys_old_msgctl
2018-12-13 17:07:38 +08:00
70 n32 fcntl compat_sys_fcntl
71 n32 flock sys_flock
72 n32 fsync sys_fsync
73 n32 fdatasync sys_fdatasync
74 n32 truncate sys_truncate
75 n32 ftruncate sys_ftruncate
76 n32 getdents compat_sys_getdents
77 n32 getcwd sys_getcwd
78 n32 chdir sys_chdir
79 n32 fchdir sys_fchdir
80 n32 rename sys_rename
81 n32 mkdir sys_mkdir
82 n32 rmdir sys_rmdir
83 n32 creat sys_creat
84 n32 link sys_link
85 n32 unlink sys_unlink
86 n32 symlink sys_symlink
87 n32 readlink sys_readlink
88 n32 chmod sys_chmod
89 n32 fchmod sys_fchmod
90 n32 chown sys_chown
91 n32 fchown sys_fchown
92 n32 lchown sys_lchown
93 n32 umask sys_umask
94 n32 gettimeofday compat_sys_gettimeofday
95 n32 getrlimit compat_sys_getrlimit
96 n32 getrusage compat_sys_getrusage
97 n32 sysinfo compat_sys_sysinfo
98 n32 times compat_sys_times
99 n32 ptrace compat_sys_ptrace
100 n32 getuid sys_getuid
101 n32 syslog sys_syslog
102 n32 getgid sys_getgid
103 n32 setuid sys_setuid
104 n32 setgid sys_setgid
105 n32 geteuid sys_geteuid
106 n32 getegid sys_getegid
107 n32 setpgid sys_setpgid
108 n32 getppid sys_getppid
109 n32 getpgrp sys_getpgrp
110 n32 setsid sys_setsid
111 n32 setreuid sys_setreuid
112 n32 setregid sys_setregid
113 n32 getgroups sys_getgroups
114 n32 setgroups sys_setgroups
115 n32 setresuid sys_setresuid
116 n32 getresuid sys_getresuid
117 n32 setresgid sys_setresgid
118 n32 getresgid sys_getresgid
119 n32 getpgid sys_getpgid
120 n32 setfsuid sys_setfsuid
121 n32 setfsgid sys_setfsgid
122 n32 getsid sys_getsid
123 n32 capget sys_capget
124 n32 capset sys_capset
125 n32 rt_sigpending compat_sys_rt_sigpending
2019-01-07 07:33:08 +08:00
126 n32 rt_sigtimedwait compat_sys_rt_sigtimedwait_time32
2018-12-13 17:07:38 +08:00
127 n32 rt_sigqueueinfo compat_sys_rt_sigqueueinfo
128 n32 rt_sigsuspend compat_sys_rt_sigsuspend
129 n32 sigaltstack compat_sys_sigaltstack
2019-01-07 07:33:08 +08:00
130 n32 utime sys_utime32
2018-12-13 17:07:38 +08:00
131 n32 mknod sys_mknod
132 n32 personality sys_32_personality
133 n32 ustat compat_sys_ustat
134 n32 statfs compat_sys_statfs
135 n32 fstatfs compat_sys_fstatfs
136 n32 sysfs sys_sysfs
137 n32 getpriority sys_getpriority
138 n32 setpriority sys_setpriority
139 n32 sched_setparam sys_sched_setparam
140 n32 sched_getparam sys_sched_getparam
141 n32 sched_setscheduler sys_sched_setscheduler
142 n32 sched_getscheduler sys_sched_getscheduler
143 n32 sched_get_priority_max sys_sched_get_priority_max
144 n32 sched_get_priority_min sys_sched_get_priority_min
2019-01-07 07:33:08 +08:00
145 n32 sched_rr_get_interval sys_sched_rr_get_interval_time32
2018-12-13 17:07:38 +08:00
146 n32 mlock sys_mlock
147 n32 munlock sys_munlock
148 n32 mlockall sys_mlockall
149 n32 munlockall sys_munlockall
150 n32 vhangup sys_vhangup
151 n32 pivot_root sys_pivot_root
2020-08-15 08:31:07 +08:00
152 n32 _sysctl sys_ni_syscall
2018-12-13 17:07:38 +08:00
153 n32 prctl sys_prctl
2019-01-07 07:33:08 +08:00
154 n32 adjtimex sys_adjtimex_time32
2018-12-13 17:07:38 +08:00
155 n32 setrlimit compat_sys_setrlimit
156 n32 chroot sys_chroot
157 n32 sync sys_sync
158 n32 acct sys_acct
159 n32 settimeofday compat_sys_settimeofday
2020-09-17 16:22:34 +08:00
160 n32 mount sys_mount
2018-12-13 17:07:38 +08:00
161 n32 umount2 sys_umount
162 n32 swapon sys_swapon
163 n32 swapoff sys_swapoff
164 n32 reboot sys_reboot
165 n32 sethostname sys_sethostname
166 n32 setdomainname sys_setdomainname
167 n32 create_module sys_ni_syscall
168 n32 init_module sys_init_module
169 n32 delete_module sys_delete_module
170 n32 get_kernel_syms sys_ni_syscall
171 n32 query_module sys_ni_syscall
172 n32 quotactl sys_quotactl
173 n32 nfsservctl sys_ni_syscall
174 n32 getpmsg sys_ni_syscall
175 n32 putpmsg sys_ni_syscall
176 n32 afs_syscall sys_ni_syscall
# 177 reserved for security
177 n32 reserved177 sys_ni_syscall
178 n32 gettid sys_gettid
179 n32 readahead sys_readahead
180 n32 setxattr sys_setxattr
181 n32 lsetxattr sys_lsetxattr
182 n32 fsetxattr sys_fsetxattr
183 n32 getxattr sys_getxattr
184 n32 lgetxattr sys_lgetxattr
185 n32 fgetxattr sys_fgetxattr
186 n32 listxattr sys_listxattr
187 n32 llistxattr sys_llistxattr
188 n32 flistxattr sys_flistxattr
189 n32 removexattr sys_removexattr
190 n32 lremovexattr sys_lremovexattr
191 n32 fremovexattr sys_fremovexattr
192 n32 tkill sys_tkill
193 n32 reserved193 sys_ni_syscall
2019-01-07 07:33:08 +08:00
194 n32 futex sys_futex_time32
2018-12-13 17:07:38 +08:00
195 n32 sched_setaffinity compat_sys_sched_setaffinity
196 n32 sched_getaffinity compat_sys_sched_getaffinity
197 n32 cacheflush sys_cacheflush
198 n32 cachectl sys_cachectl
199 n32 sysmips __sys_sysmips
200 n32 io_setup compat_sys_io_setup
201 n32 io_destroy sys_io_destroy
2019-01-07 07:33:08 +08:00
202 n32 io_getevents sys_io_getevents_time32
2018-12-13 17:07:38 +08:00
203 n32 io_submit compat_sys_io_submit
204 n32 io_cancel sys_io_cancel
205 n32 exit_group sys_exit_group
206 n32 lookup_dcookie sys_lookup_dcookie
207 n32 epoll_create sys_epoll_create
208 n32 epoll_ctl sys_epoll_ctl
209 n32 epoll_wait sys_epoll_wait
210 n32 remap_file_pages sys_remap_file_pages
211 n32 rt_sigreturn sysn32_rt_sigreturn
212 n32 fcntl64 compat_sys_fcntl64
213 n32 set_tid_address sys_set_tid_address
214 n32 restart_syscall sys_restart_syscall
2019-01-07 07:33:08 +08:00
215 n32 semtimedop sys_semtimedop_time32
2018-12-13 17:07:38 +08:00
216 n32 fadvise64 sys_fadvise64_64
217 n32 statfs64 compat_sys_statfs64
218 n32 fstatfs64 compat_sys_fstatfs64
219 n32 sendfile64 sys_sendfile64
220 n32 timer_create compat_sys_timer_create
2019-01-07 07:33:08 +08:00
221 n32 timer_settime sys_timer_settime32
222 n32 timer_gettime sys_timer_gettime32
2018-12-13 17:07:38 +08:00
223 n32 timer_getoverrun sys_timer_getoverrun
224 n32 timer_delete sys_timer_delete
2019-01-07 07:33:08 +08:00
225 n32 clock_settime sys_clock_settime32
226 n32 clock_gettime sys_clock_gettime32
227 n32 clock_getres sys_clock_getres_time32
228 n32 clock_nanosleep sys_clock_nanosleep_time32
2018-12-13 17:07:38 +08:00
229 n32 tgkill sys_tgkill
2019-01-07 07:33:08 +08:00
230 n32 utimes sys_utimes_time32
2018-12-13 17:07:38 +08:00
231 n32 mbind compat_sys_mbind
232 n32 get_mempolicy compat_sys_get_mempolicy
233 n32 set_mempolicy compat_sys_set_mempolicy
234 n32 mq_open compat_sys_mq_open
235 n32 mq_unlink sys_mq_unlink
2019-01-07 07:33:08 +08:00
236 n32 mq_timedsend sys_mq_timedsend_time32
237 n32 mq_timedreceive sys_mq_timedreceive_time32
2018-12-13 17:07:38 +08:00
238 n32 mq_notify compat_sys_mq_notify
239 n32 mq_getsetattr compat_sys_mq_getsetattr
240 n32 vserver sys_ni_syscall
241 n32 waitid compat_sys_waitid
# 242 was sys_setaltroot
243 n32 add_key sys_add_key
244 n32 request_key sys_request_key
245 n32 keyctl compat_sys_keyctl
246 n32 set_thread_area sys_set_thread_area
247 n32 inotify_init sys_inotify_init
248 n32 inotify_add_watch sys_inotify_add_watch
249 n32 inotify_rm_watch sys_inotify_rm_watch
250 n32 migrate_pages compat_sys_migrate_pages
251 n32 openat sys_openat
252 n32 mkdirat sys_mkdirat
253 n32 mknodat sys_mknodat
254 n32 fchownat sys_fchownat
2019-01-07 07:33:08 +08:00
255 n32 futimesat sys_futimesat_time32
2018-12-13 17:07:38 +08:00
256 n32 newfstatat sys_newfstatat
257 n32 unlinkat sys_unlinkat
258 n32 renameat sys_renameat
259 n32 linkat sys_linkat
260 n32 symlinkat sys_symlinkat
261 n32 readlinkat sys_readlinkat
262 n32 fchmodat sys_fchmodat
263 n32 faccessat sys_faccessat
2019-01-07 07:33:08 +08:00
264 n32 pselect6 compat_sys_pselect6_time32
265 n32 ppoll compat_sys_ppoll_time32
2018-12-13 17:07:38 +08:00
266 n32 unshare sys_unshare
267 n32 splice sys_splice
268 n32 sync_file_range sys_sync_file_range
269 n32 tee sys_tee
2020-09-25 12:51:44 +08:00
270 n32 vmsplice sys_vmsplice
2018-12-13 17:07:38 +08:00
271 n32 move_pages compat_sys_move_pages
272 n32 set_robust_list compat_sys_set_robust_list
273 n32 get_robust_list compat_sys_get_robust_list
274 n32 kexec_load compat_sys_kexec_load
275 n32 getcpu sys_getcpu
276 n32 epoll_pwait compat_sys_epoll_pwait
277 n32 ioprio_set sys_ioprio_set
278 n32 ioprio_get sys_ioprio_get
2019-01-07 07:33:08 +08:00
279 n32 utimensat sys_utimensat_time32
2018-12-13 17:07:38 +08:00
280 n32 signalfd compat_sys_signalfd
281 n32 timerfd sys_ni_syscall
282 n32 eventfd sys_eventfd
283 n32 fallocate sys_fallocate
284 n32 timerfd_create sys_timerfd_create
2019-01-07 07:33:08 +08:00
285 n32 timerfd_gettime sys_timerfd_gettime32
286 n32 timerfd_settime sys_timerfd_settime32
2018-12-13 17:07:38 +08:00
287 n32 signalfd4 compat_sys_signalfd4
288 n32 eventfd2 sys_eventfd2
289 n32 epoll_create1 sys_epoll_create1
290 n32 dup3 sys_dup3
291 n32 pipe2 sys_pipe2
292 n32 inotify_init1 sys_inotify_init1
293 n32 preadv compat_sys_preadv
294 n32 pwritev compat_sys_pwritev
295 n32 rt_tgsigqueueinfo compat_sys_rt_tgsigqueueinfo
296 n32 perf_event_open sys_perf_event_open
297 n32 accept4 sys_accept4
2019-01-07 07:33:08 +08:00
298 n32 recvmmsg compat_sys_recvmmsg_time32
2018-12-13 17:07:38 +08:00
299 n32 getdents64 sys_getdents64
300 n32 fanotify_init sys_fanotify_init
301 n32 fanotify_mark sys_fanotify_mark
302 n32 prlimit64 sys_prlimit64
303 n32 name_to_handle_at sys_name_to_handle_at
304 n32 open_by_handle_at sys_open_by_handle_at
2019-01-07 07:33:08 +08:00
305 n32 clock_adjtime sys_clock_adjtime32
2018-12-13 17:07:38 +08:00
306 n32 syncfs sys_syncfs
307 n32 sendmmsg compat_sys_sendmmsg
308 n32 setns sys_setns
2020-09-25 12:51:45 +08:00
309 n32 process_vm_readv sys_process_vm_readv
310 n32 process_vm_writev sys_process_vm_writev
2018-12-13 17:07:38 +08:00
311 n32 kcmp sys_kcmp
312 n32 finit_module sys_finit_module
313 n32 sched_setattr sys_sched_setattr
314 n32 sched_getattr sys_sched_getattr
315 n32 renameat2 sys_renameat2
316 n32 seccomp sys_seccomp
317 n32 getrandom sys_getrandom
318 n32 memfd_create sys_memfd_create
319 n32 bpf sys_bpf
320 n32 execveat compat_sys_execveat
321 n32 userfaultfd sys_userfaultfd
322 n32 membarrier sys_membarrier
323 n32 mlock2 sys_mlock2
324 n32 copy_file_range sys_copy_file_range
325 n32 preadv2 compat_sys_preadv2
326 n32 pwritev2 compat_sys_pwritev2
327 n32 pkey_mprotect sys_pkey_mprotect
328 n32 pkey_alloc sys_pkey_alloc
329 n32 pkey_free sys_pkey_free
330 n32 statx sys_statx
331 n32 rseq sys_rseq
332 n32 io_pgetevents compat_sys_io_pgetevents
2019-01-10 19:45:11 +08:00
# 333 through 402 are unassigned to sync up with generic numbers
403 n32 clock_gettime64 sys_clock_gettime
404 n32 clock_settime64 sys_clock_settime
405 n32 clock_adjtime64 sys_clock_adjtime
406 n32 clock_getres_time64 sys_clock_getres
407 n32 clock_nanosleep_time64 sys_clock_nanosleep
408 n32 timer_gettime64 sys_timer_gettime
409 n32 timer_settime64 sys_timer_settime
410 n32 timerfd_gettime64 sys_timerfd_gettime
411 n32 timerfd_settime64 sys_timerfd_settime
412 n32 utimensat_time64 sys_utimensat
413 n32 pselect6_time64 compat_sys_pselect6_time64
414 n32 ppoll_time64 compat_sys_ppoll_time64
416 n32 io_pgetevents_time64 sys_io_pgetevents
417 n32 recvmmsg_time64 compat_sys_recvmmsg_time64
418 n32 mq_timedsend_time64 sys_mq_timedsend
419 n32 mq_timedreceive_time64 sys_mq_timedreceive
420 n32 semtimedop_time64 sys_semtimedop
421 n32 rt_sigtimedwait_time64 compat_sys_rt_sigtimedwait_time64
422 n32 futex_time64 sys_futex
423 n32 sched_rr_get_interval_time64 sys_sched_rr_get_interval
2019-02-28 20:59:19 +08:00
424 n32 pidfd_send_signal sys_pidfd_send_signal
425 n32 io_uring_setup sys_io_uring_setup
426 n32 io_uring_enter sys_io_uring_enter
427 n32 io_uring_register sys_io_uring_register
2019-05-16 19:52:34 +08:00
428 n32 open_tree sys_open_tree
429 n32 move_mount sys_move_mount
430 n32 fsopen sys_fsopen
431 n32 fsconfig sys_fsconfig
432 n32 fsmount sys_fsmount
433 n32 fspick sys_fspick
2019-05-24 18:44:59 +08:00
434 n32 pidfd_open sys_pidfd_open
2019-10-03 02:59:49 +08:00
435 n32 clone3 __sys_clone3
2019-05-24 17:31:44 +08:00
436 n32 close_range sys_close_range
open: introduce openat2(2) syscall
/* Background. */
For a very long time, extending openat(2) with new features has been
incredibly frustrating. This stems from the fact that openat(2) is
possibly the most famous counter-example to the mantra "don't silently
accept garbage from userspace" -- it doesn't check whether unknown flags
are present[1].
This means that (generally) the addition of new flags to openat(2) has
been fraught with backwards-compatibility issues (O_TMPFILE has to be
defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old
kernels gave errors, since it's insecure to silently ignore the
flag[2]). All new security-related flags therefore have a tough road to
being added to openat(2).
Userspace also has a hard time figuring out whether a particular flag is
supported on a particular kernel. While it is now possible with
contemporary kernels (thanks to [3]), older kernels will expose unknown
flag bits through fcntl(F_GETFL). Giving a clear -EINVAL during
openat(2) time matches modern syscall designs and is far more
fool-proof.
In addition, the newly-added path resolution restriction LOOKUP flags
(which we would like to expose to user-space) don't feel related to the
pre-existing O_* flag set -- they affect all components of path lookup.
We'd therefore like to add a new flag argument.
Adding a new syscall allows us to finally fix the flag-ignoring problem,
and we can make it extensible enough so that we will hopefully never
need an openat3(2).
/* Syscall Prototype. */
/*
* open_how is an extensible structure (similar in interface to
* clone3(2) or sched_setattr(2)). The size parameter must be set to
* sizeof(struct open_how), to allow for future extensions. All future
* extensions will be appended to open_how, with their zero value
* acting as a no-op default.
*/
struct open_how { /* ... */ };
int openat2(int dfd, const char *pathname,
struct open_how *how, size_t size);
/* Description. */
The initial version of 'struct open_how' contains the following fields:
flags
Used to specify openat(2)-style flags. However, any unknown flag
bits or otherwise incorrect flag combinations (like O_PATH|O_RDWR)
will result in -EINVAL. In addition, this field is 64-bits wide to
allow for more O_ flags than currently permitted with openat(2).
mode
The file mode for O_CREAT or O_TMPFILE.
Must be set to zero if flags does not contain O_CREAT or O_TMPFILE.
resolve
Restrict path resolution (in contrast to O_* flags they affect all
path components). The current set of flags are as follows (at the
moment, all of the RESOLVE_ flags are implemented as just passing
the corresponding LOOKUP_ flag).
RESOLVE_NO_XDEV => LOOKUP_NO_XDEV
RESOLVE_NO_SYMLINKS => LOOKUP_NO_SYMLINKS
RESOLVE_NO_MAGICLINKS => LOOKUP_NO_MAGICLINKS
RESOLVE_BENEATH => LOOKUP_BENEATH
RESOLVE_IN_ROOT => LOOKUP_IN_ROOT
open_how does not contain an embedded size field, because it is of
little benefit (userspace can figure out the kernel open_how size at
runtime fairly easily without it). It also only contains u64s (even
though ->mode arguably should be a u16) to avoid having padding fields
which are never used in the future.
Note that as a result of the new how->flags handling, O_PATH|O_TMPFILE
is no longer permitted for openat(2). As far as I can tell, this has
always been a bug and appears to not be used by userspace (and I've not
seen any problems on my machines by disallowing it). If it turns out
this breaks something, we can special-case it and only permit it for
openat(2) but not openat2(2).
After input from Florian Weimer, the new open_how and flag definitions
are inside a separate header from uapi/linux/fcntl.h, to avoid problems
that glibc has with importing that header.
/* Testing. */
In a follow-up patch there are over 200 selftests which ensure that this
syscall has the correct semantics and will correctly handle several
attack scenarios.
In addition, I've written a userspace library[4] which provides
convenient wrappers around openat2(RESOLVE_IN_ROOT) (this is necessary
because no other syscalls support RESOLVE_IN_ROOT, and thus lots of care
must be taken when using RESOLVE_IN_ROOT'd file descriptors with other
syscalls). During the development of this patch, I've run numerous
verification tests using libpathrs (showing that the API is reasonably
usable by userspace).
/* Future Work. */
Additional RESOLVE_ flags have been suggested during the review period.
These can be easily implemented separately (such as blocking auto-mount
during resolution).
Furthermore, there are some other proposed changes to the openat(2)
interface (the most obvious example is magic-link hardening[5]) which
would be a good opportunity to add a way for userspace to restrict how
O_PATH file descriptors can be re-opened.
Another possible avenue of future work would be some kind of
CHECK_FIELDS[6] flag which causes the kernel to indicate to userspace
which openat2(2) flags and fields are supported by the current kernel
(to avoid userspace having to go through several guesses to figure it
out).
[1]: https://lwn.net/Articles/588444/
[2]: https://lore.kernel.org/lkml/CA+55aFyyxJL1LyXZeBsf2ypriraj5ut1XkNDsunRBqgVjZU_6Q@mail.gmail.com
[3]: commit 629e014bb834 ("fs: completely ignore unknown open flags")
[4]: https://sourceware.org/bugzilla/show_bug.cgi?id=17523
[5]: https://lore.kernel.org/lkml/20190930183316.10190-2-cyphar@cyphar.com/
[6]: https://youtu.be/ggD-eb3yPVs
Suggested-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-01-18 20:07:59 +08:00
437 n32 openat2 sys_openat2
2020-01-08 01:59:26 +08:00
438 n32 pidfd_getfd sys_pidfd_getfd
2020-05-14 22:44:25 +08:00
439 n32 faccessat2 sys_faccessat2
mm/madvise: introduce process_madvise() syscall: an external memory hinting API
There is a use case in which System Management Software (SMS) wants to
give a memory hint like MADV_[COLD|PAGEOUT] to other processes; in the
case of Android, it is the ActivityManagerService.
The information required to make the reclaim decision is not known to the
app. Instead, it is known to the centralized userspace
daemon (ActivityManagerService), and that daemon must be able to initiate
reclaim on its own without any app involvement.
To solve the issue, this patch introduces a new syscall
process_madvise(2). It uses the pidfd of an external process to give the
hint. It also supports a vector of address ranges, because an Android app
has thousands of vmas due to zygote, so it is a waste of CPU and power to
call the syscall once for each vma. (Testing a 2000-vma syscall vs. a
1-vector syscall showed a 15% performance improvement. I think it would
be bigger in real practice because the test ran in a very cache-friendly
environment.)
Another potential use case for the vector range is to amortize the cost
of TLB shootdowns for multiple ranges when using MADV_DONTNEED; this could
benefit users like TCP receive zerocopy and malloc implementations. In
future, we could find more use cases for other advice values, so let's
make it part of the API since we are introducing a new syscall at this
moment. With that, an existing madvise(2) user could replace it with
process_madvise(2) on their own pid if they want batched address range
support.
Since it could affect another process's address range, only a privileged
process (PTRACE_MODE_ATTACH_FSCREDS), or one that otherwise has the right
to ptrace the target (e.g., by having the same UID), can use it
successfully.
The flag argument is reserved for future use if we need to extend the API.
Supporting every hint that madvise supports, now or in the future, in
process_madvise is rather risky. We are not sure all hints make sense
from an external process, and the implementation of a hint may rely on
the caller being in the current context, so it could be error-prone. Thus,
I limited the hints to MADV_[COLD|PAGEOUT] in this patch.
If someone wants to add other hints, we can hear the use case and review
it for each hint. That is safer for maintenance than introducing a
buggy syscall that is hard to fix later.
So finally, the API is as follows,
ssize_t process_madvise(int pidfd, const struct iovec *iovec,
unsigned long vlen, int advice, unsigned int flags);
DESCRIPTION
The process_madvise() system call is used to give advice or directions
to the kernel about the address ranges from external process as well as
local process. It provides the advice to address ranges of process
described by iovec and vlen. The goal of such advice is to improve
system or application performance.
The pidfd selects the process referred to by the PID file descriptor
specified in pidfd. (See pidfd_open(2) for further information.)
The pointer iovec points to an array of iovec structures, defined in
<sys/uio.h> as:
struct iovec {
void *iov_base; /* starting address */
size_t iov_len; /* number of bytes to be advised */
};
Each iovec describes an address range beginning at address iov_base
and spanning iov_len bytes.
The vlen represents the number of elements in iovec.
The advice is indicated in the advice argument, which is one of the
following at this moment if the target process specified by pidfd is
external.
MADV_COLD
MADV_PAGEOUT
Permission to provide a hint to external process is governed by a
ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see ptrace(2).
The process_madvise supports every advice madvise(2) has if target
process is in same thread group with calling process so user could
use process_madvise(2) to extend existing madvise(2) to support
vector address ranges.
RETURN VALUE
On success, process_madvise() returns the number of bytes advised.
This return value may be less than the total number of requested
bytes, if an error occurred. The caller should check return value
to determine whether a partial advice occurred.
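The return-value rule above suggests a check that callers might perform after the call. The following is a minimal sketch, assuming the caller still holds the iovec array it passed in; `iovec_total_bytes` and `advice_status` are hypothetical helper names, and the sketch deliberately does not invoke the syscall itself.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Sum the byte lengths of all ranges in the vector. */
static size_t iovec_total_bytes(const struct iovec *iov, size_t vlen)
{
    size_t total = 0;
    for (size_t i = 0; i < vlen; i++)
        total += iov[i].iov_len;
    return total;
}

/*
 * Classify a process_madvise() return value against the request:
 *   -1  the call failed (ret is negative),
 *    1  fewer bytes were advised than requested (partial advice),
 *    0  every requested byte was advised.
 */
static int advice_status(const struct iovec *iov, size_t vlen, ssize_t ret)
{
    assert(iov != NULL || vlen == 0);
    if (ret < 0)
        return -1;
    return (size_t)ret < iovec_total_bytes(iov, vlen) ? 1 : 0;
}
```

A caller that sees the partial result could, for example, retry from the first range that was not covered; that policy is left to the caller and is not specified by the text above.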
FAQ:
Q.1 - Why does any external entity have better knowledge?
Quote from Sandeep
"For Android, every application (including the special SystemServer)
are forked from Zygote. The reason of course is to share as many
libraries and classes between the two as possible to benefit from the
preloading during boot.
After applications start, (almost) all of the APIs end up calling into
this SystemServer process over IPC (binder) and back to the
application.
In a fully running system, the SystemServer monitors every single
process periodically to calculate their PSS / RSS and also decides
which process is "important" to the user for interactivity.
So, because of how these processes start _and_ the fact that the
SystemServer is looping to monitor each process, it does tend to *know*
which address range of the application is not used / useful.
Besides, we can never rely on applications to clean things up
themselves. We've had the "hey app1, the system is low on memory,
please trim your memory usage down" notifications for a long time[1].
They rely on applications honoring the broadcasts and very few do.
So, if we want to avoid the inevitable killing of the application and
restarting it, some way to be able to tell the OS about unimportant
memory in these applications will be useful.
- ssp
Q.2 - How is the race (i.e., object validation) handled between an
external process giving a hint and the target process receiving the
hint?
process_madvise operates on the target process's address space as it
exists at the instant that process_madvise is called. If the target
process can run between the time the calling process inspects the
target process address space and the time that process_madvise is
actually called, process_madvise may operate on memory regions that
the calling process does not expect. It's the
responsibility of the process calling process_madvise to close this
race condition. For example, the calling process can suspend the
target process with ptrace, SIGSTOP, or the freezer cgroup so that it
doesn't have an opportunity to change its own address space before
process_madvise is called. Another option is to operate on memory
regions that the caller knows a priori will be unchanged in the target
process. Yet another option is to accept the race for certain
process_madvise calls after reasoning that mistargeting will do no
harm. The suggested API itself does not provide synchronization. The
same applies to other APIs such as move_pages and process_vm_writev.
The race isn't really a problem though. Why is it so wrong to require
that callers do their own synchronization in some manner? Nobody
objects to write(2) merely because it's possible for two processes to
open the same file and clobber each other's writes --- instead, we tell
people to use flock or something. Think about mmap. It never
guarantees that newly allocated address space is still valid when the
user tries to access it, because other threads could unmap the memory
right before. That's where we need synchronization through another API
or through userspace design. It shouldn't be part of the API itself. If
someone needs more fine-grained synchronization than process level,
two ideas were suggested - cookie[2] and anon-fd[3]. Both are
applicable via the last, reserved argument of the API, but I don't
think they are necessary right now since we already have ways to
prevent the race, so I don't want to add additional complexity with a
more fine-grained optimization model.
To keep the API extensible, the last argument (flags) is reserved so
we can support this in the future if someone really needs it.
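One of the caller-side options mentioned above, suspending the target with SIGSTOP around the hint, might look roughly like this. It is a sketch only: ex_with_target_stopped(), ex_hint_fn and ex_count_call() are hypothetical names, and the code does not wait for the stop to take effect, which a real caller would have to confirm (for example with waitpid(2) and WUNTRACED when the target is a child).

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical callback type: performs the hinting while the target is stopped. */
typedef int (*ex_hint_fn)(pid_t target, void *arg);

/*
 * Stop the target, run the hint callback, then resume the target.
 * Assumption: the caller is allowed to signal the target. SIGSTOP takes
 * effect asynchronously, so this sketch narrows the race window rather
 * than proving that the target is already stopped when fn() runs.
 */
static int ex_with_target_stopped(pid_t target, ex_hint_fn fn, void *arg)
{
    int rc;

    assert(fn != NULL);
    if (kill(target, SIGSTOP) != 0)
        return -1;
    rc = fn(target, arg);
    if (kill(target, SIGCONT) != 0)
        return -1;
    return rc;
}

/* Trivial example callback: counts how often it was invoked. */
static int ex_count_call(pid_t target, void *arg)
{
    (void)target;
    *(int *)arg += 1;
    return 0;
}
```

In practice the callback would inspect the target's address space and issue process_madvise() before the target is resumed.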
Q.3 - Why doesn't ptrace work?
Injecting an madvise in the target process using ptrace would not work
for us because such an injected madvise would have to be executed by the
target process, which means that process would have to be runnable, and
that creates the risk of the abovementioned race and of hinting a wrong
VMA. Furthermore, we want to act on the hint in the caller's context,
not the callee's, because the callee is usually limited by
cpuset/cgroups or even in a frozen state, so it can't act by itself
quickly enough, which causes more thrashing/kills. It also doesn't work
if the target process is already ptraced (e.g., strace, debugger,
minidump) because a process can have at most one ptracer.
[1] https://developer.android.com/topic/performance/memory"
[2] process_getinfo for getting the cookie which is updated whenever
vma of process address layout are changed - Daniel Colascione -
https://lore.kernel.org/lkml/20190520035254.57579-1-minchan@kernel.org/T/#m7694416fd179b2066a2c62b5b139b14e3894e224
[3] anonymous fd which is used for the object(i.e., address range)
validation - Michal Hocko -
https://lore.kernel.org/lkml/20200120112722.GY18451@dhcp22.suse.cz/
[minchan@kernel.org: fix process_madvise build break for arm64]
Link: http://lkml.kernel.org/r/20200303145756.GA219683@google.com
[minchan@kernel.org: fix build error for mips of process_madvise]
Link: http://lkml.kernel.org/r/20200508052517.GA197378@google.com
[akpm@linux-foundation.org: fix patch ordering issue]
[akpm@linux-foundation.org: fix arm64 whoops]
[minchan@kernel.org: make process_madvise() vlen arg have type size_t, per Florian]
[akpm@linux-foundation.org: fix i386 build]
[sfr@canb.auug.org.au: fix syscall numbering]
Link: https://lkml.kernel.org/r/20200905142639.49fc3f1a@canb.auug.org.au
[sfr@canb.auug.org.au: madvise.c needs compat.h]
Link: https://lkml.kernel.org/r/20200908204547.285646b4@canb.auug.org.au
[minchan@kernel.org: fix mips build]
Link: https://lkml.kernel.org/r/20200909173655.GC2435453@google.com
[yuehaibing@huawei.com: remove duplicate header which is included twice]
Link: https://lkml.kernel.org/r/20200915121550.30584-1-yuehaibing@huawei.com
[minchan@kernel.org: do not use helper functions for process_madvise]
Link: https://lkml.kernel.org/r/20200921175539.GB387368@google.com
[akpm@linux-foundation.org: pidfd_get_pid() gained an argument]
[sfr@canb.auug.org.au: fix up for "iov_iter: transparently handle compat iovecs in import_iovec"]
Link: https://lkml.kernel.org/r/20200928212542.468e1fef@canb.auug.org.au
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Christian Brauner <christian@brauner.io>
Cc: Daniel Colascione <dancol@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Dias <joaodias@google.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleksandr Natalenko <oleksandr@redhat.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: SeongJae Park <sj38.park@gmail.com>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Sonny Rao <sonnyrao@google.com>
Cc: Tim Murray <timmurray@google.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Florian Weimer <fw@deneb.enyo.de>
Cc: <linux-man@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200302193630.68771-3-minchan@kernel.org
Link: http://lkml.kernel.org/r/20200508183320.GA125527@google.com
Link: http://lkml.kernel.org/r/20200622192900.22757-4-minchan@kernel.org
Link: https://lkml.kernel.org/r/20200901000633.1920247-4-minchan@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18 07:14:59 +08:00
|
|
|
440 n32 process_madvise sys_process_madvise
|
2020-12-20 17:58:30 +08:00
|
|
|
441 n32 epoll_pwait2 compat_sys_epoll_pwait2
|
fs: add mount_setattr()
This implements the missing mount_setattr() syscall. While the new mount
api allows changing the properties of a superblock, there is currently
no way to change the properties of a mount or a mount tree using file
descriptors, which the new mount api is based on. In addition the old
mount api has the restriction that mount options cannot be applied
recursively. This hasn't changed since changing mount options on a
per-mount basis was implemented in [1] and has been a frequent request
not just for convenience but also for security reasons. The legacy
mount syscall is unable to accommodate this behavior without introducing
a whole new set of flags because MS_REC | MS_REMOUNT | MS_BIND |
MS_RDONLY | MS_NOEXEC | [...] only apply the mount option to the topmost
mount. Changing MS_REC to apply to the whole mount tree would mean
introducing a significant uapi change and would likely cause significant
regressions.
The new mount_setattr() syscall allows recursively clearing and setting
mount options in one shot. Multiple calls to change mount options
requesting the same changes are idempotent:
int mount_setattr(int dfd, const char *path, unsigned flags,
struct mount_attr *uattr, size_t usize);
Flags to modify path resolution behavior are specified in the @flags
argument. Currently, AT_EMPTY_PATH, AT_RECURSIVE, AT_SYMLINK_NOFOLLOW,
and AT_NO_AUTOMOUNT are supported. If useful, additional lookup flags to
restrict path resolution as introduced with openat2() might be supported
in the future.
The mount_setattr() syscall can be expected to grow over time and is
designed with extensibility in mind. It follows the extensible syscall
pattern we have used with other syscalls such as openat2(), clone3(),
sched_{set,get}attr(), and others.
The set of mount options is passed in the uapi struct mount_attr which
currently has the following layout:
struct mount_attr {
__u64 attr_set;
__u64 attr_clr;
__u64 propagation;
__u64 userns_fd;
};
The @attr_set and @attr_clr members are used to set and clear mount
options. This way a user can, e.g., request that one set of flags be
raised, such as turning mounts readonly by raising MOUNT_ATTR_RDONLY in
@attr_set, while at the same time requesting that another set of flags
be lowered, such as removing noexec from a mount tree by specifying
MOUNT_ATTR_NOEXEC in @attr_clr.
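The readonly/noexec example above can be sketched in C as follows. This is an illustration under stated assumptions: struct ex_mount_attr is a local mirror of the layout quoted above, the EX_* flag values and the AT_RECURSIVE fallback are assumed from the uapi headers, and the raw syscall is used only when the installed headers define SYS_mount_setattr.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef AT_RECURSIVE
#define AT_RECURSIVE 0x8000 /* assumption: value from the uapi fcntl header */
#endif

/* Local mirror of the uapi struct quoted above (four 64-bit members). */
struct ex_mount_attr {
    uint64_t attr_set;
    uint64_t attr_clr;
    uint64_t propagation;
    uint64_t userns_fd;
};
static_assert(sizeof(struct ex_mount_attr) == 32, "four __u64 members");

/* Assumed to match MOUNT_ATTR_RDONLY and MOUNT_ATTR_NOEXEC. */
#define EX_MOUNT_ATTR_RDONLY 0x00000001ULL
#define EX_MOUNT_ATTR_NOEXEC 0x00000008ULL

/* Build a request that raises `set` and lowers `clr`; other fields stay 0. */
static struct ex_mount_attr ex_make_attr(uint64_t set, uint64_t clr)
{
    struct ex_mount_attr attr = { .attr_set = set, .attr_clr = clr };

    return attr;
}

/* Hypothetical wrapper; fails with ENOSYS if the headers lack the number. */
static int ex_mount_setattr(int dfd, const char *path, unsigned int flags,
                            struct ex_mount_attr *attr)
{
#ifdef SYS_mount_setattr
    return (int)syscall(SYS_mount_setattr, dfd, path, flags, attr,
                        sizeof(*attr));
#else
    (void)dfd; (void)path; (void)flags; (void)attr;
    errno = ENOSYS;
    return -1;
#endif
}

/* Make the whole mount tree at `path` readonly and drop noexec from it. */
static int ex_readonly_tree(const char *path)
{
    struct ex_mount_attr attr =
        ex_make_attr(EX_MOUNT_ATTR_RDONLY, EX_MOUNT_ATTR_NOEXEC);

    return ex_mount_setattr(AT_FDCWD, path, AT_RECURSIVE, &attr);
}
```

Passing sizeof(*attr) as the last argument is what lets the struct grow later, in line with the extensible syscall pattern mentioned above.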
Note, since the MOUNT_ATTR_<atime> values are an enum starting from 0,
not a bitmap, users wanting to transition to a different atime setting
cannot simply specify the atime setting in @attr_set, but must also
specify MOUNT_ATTR__ATIME in the @attr_clr field. So we ensure that
MOUNT_ATTR__ATIME can't be partially set in @attr_clr and that @attr_set
can't have any atime bits set if MOUNT_ATTR__ATIME isn't set in
@attr_clr.
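The atime rule above can be expressed as a small predicate. This is a sketch of the described check, not the kernel's code: the function name is hypothetical and the EX_* constants are assumed to match the uapi MOUNT_ATTR_* atime values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the uapi atime values: an enum packed into a mask. */
#define EX_MOUNT_ATTR__ATIME      0x00000070ULL /* mask of all atime bits */
#define EX_MOUNT_ATTR_RELATIME    0x00000000ULL
#define EX_MOUNT_ATTR_NOATIME     0x00000010ULL
#define EX_MOUNT_ATTR_STRICTATIME 0x00000020ULL

static_assert((EX_MOUNT_ATTR_NOATIME & ~EX_MOUNT_ATTR__ATIME) == 0,
              "atime values must lie inside the atime mask");

/*
 * Returns true if the atime parts of a request follow the rule:
 *  - the atime mask in attr_clr is either fully set or fully clear;
 *  - attr_set may carry atime bits only if attr_clr has the full mask.
 */
static bool ex_atime_request_ok(uint64_t attr_set, uint64_t attr_clr)
{
    uint64_t clr_atime = attr_clr & EX_MOUNT_ATTR__ATIME;

    if (clr_atime != 0 && clr_atime != EX_MOUNT_ATTR__ATIME)
        return false; /* partially set mask in attr_clr */
    if ((attr_set & EX_MOUNT_ATTR__ATIME) != 0 &&
        clr_atime != EX_MOUNT_ATTR__ATIME)
        return false; /* new atime value without clearing the old one */
    return true;
}
```

Because relatime is the value 0, switching to it means clearing the full mask and setting no atime bits at all, which the predicate accepts.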
The @propagation field lets callers specify the propagation type of a
mount tree. Propagation is a single property that has four different
settings and as such is not really a flag argument but an enum.
Specifically, it would be unclear what setting and clearing propagation
settings in combination would amount to. The legacy mount() syscall thus
forbids the combination of multiple propagation settings too. The goal
is to keep the semantics of mount propagation somewhat simple as they
are overly complex as it is.
The @userns_fd field lets the user specify a user namespace whose
idmapping becomes the idmapping of the mount. This is implemented and
explained in detail in the next patch.
[1]: commit 2e4b7fcd9260 ("[PATCH] r/o bind mounts: honor mount writer counts at remount")
Link: https://lore.kernel.org/r/20210121131959.646623-35-christian.brauner@ubuntu.com
Cc: David Howells <dhowells@redhat.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-api@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-21 21:19:53 +08:00
|
|
|
442 n32 mount_setattr sys_mount_setattr
|