linux/net/sctp/sysctl.c

/* SCTP kernel implementation
* (C) Copyright IBM Corp. 2002, 2004
* Copyright (c) 2002 Intel Corp.
*
* This file is part of the SCTP kernel implementation
*
* Sysctl related interfaces for SCTP.
*
* This SCTP implementation is free software;
* you can redistribute it and/or modify it under the terms of
* the GNU General Public License as published by
* the Free Software Foundation; either version 2, or (at your option)
* any later version.
*
* This SCTP implementation is distributed in the hope that it
* will be useful, but WITHOUT ANY WARRANTY; without even the implied
* ************************
* warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
* See the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with GNU CC; see the file COPYING. If not, see
* <http://www.gnu.org/licenses/>.
*
* Please send any bug reports or fixes you make to the
* email address(es):
* lksctp developers <linux-sctp@vger.kernel.org>
*
* Written or modified by:
* Mingqin Liu <liuming@us.ibm.com>
* Jon Grimm <jgrimm@us.ibm.com>
* Ardelle Fan <ardelle.fan@intel.com>
* Ryan Layer <rmlayer@us.ibm.com>
* Sridhar Samudrala <sri@us.ibm.com>
*/
net: sctp: fix permissions for rto_alpha and rto_beta knobs

Commit 3fd091e73b81 ("[SCTP]: Remove multiple levels of msecs to jiffies conversions.") has silently changed permissions for rto_alpha and rto_beta knobs from 0644 to 0444. The purpose of this was to discourage users from tweaking rto_alpha and rto_beta knobs in production environments since they are key to correctly compute rtt/srtt.

RFC4960 under section 6.3.1. RTO Calculation says regarding rto_alpha and rto_beta under rule C3 and C4:

[...]
C3) When a new RTT measurement R' is made, set

RTTVAR <- (1 - RTO.Beta) * RTTVAR + RTO.Beta * |SRTT - R'|

and

SRTT <- (1 - RTO.Alpha) * SRTT + RTO.Alpha * R'

Note: The value of SRTT used in the update to RTTVAR is its value before updating SRTT itself using the second assignment. After the computation, update RTO <- SRTT + 4 * RTTVAR.

C4) When data is in flight and when allowed by rule C5 below, a new RTT measurement MUST be made each round trip. Furthermore, new RTT measurements SHOULD be made no more than once per round trip for a given destination transport address. There are two reasons for this recommendation: First, it appears that measuring more frequently often does not in practice yield any significant benefit [ALLMAN99]; second, if measurements are made more often, then the values of RTO.Alpha and RTO.Beta in rule C3 above should be adjusted so that SRTT and RTTVAR still adjust to changes at roughly the same rate (in terms of how many round trips it takes them to reflect new values) as they would if making only one measurement per round-trip and using RTO.Alpha and RTO.Beta as given in rule C3. However, the exact nature of these adjustments remains a research issue.
[...]

While it is discouraged to adjust rto_alpha and rto_beta and not further specified how to adjust them, the RFC also doesn't explicitly forbid it, but rather gives a RECOMMENDED default value (rto_alpha=3, rto_beta=2).

We have a couple of users relying on the old permissions before they got changed. That said, if someone really has the urge to adjust them, we could allow it with a warning in the log.

Fixes: 3fd091e73b81 ("[SCTP]: Remove multiple levels of msecs to jiffies conversions.")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-15 06:59:14 +08:00
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <net/sctp/structs.h>
#include <net/sctp/sctp.h>
#include <linux/sysctl.h>
static int zero = 0;
static int one = 1;
static int timer_max = 86400000; /* ms in one day */
static int int_max = INT_MAX;
static int sack_timer_min = 1;
static int sack_timer_max = 500;
static int addr_scope_max = 3; /* check sctp_scope_policy_t in include/net/sctp/constants.h for max entries */
static int rwnd_scale_max = 16;
static int rto_alpha_min = 0;
static int rto_beta_min = 0;
static int rto_alpha_max = 1000;
static int rto_beta_max = 1000;
static unsigned long max_autoclose_min = 0;
static unsigned long max_autoclose_max =
(MAX_SCHEDULE_TIMEOUT / HZ > UINT_MAX)
? UINT_MAX : MAX_SCHEDULE_TIMEOUT / HZ;
extern long sysctl_sctp_mem[3];
extern int sysctl_sctp_rmem[3];
extern int sysctl_sctp_wmem[3];
static int proc_sctp_do_hmac_alg(struct ctl_table *ctl, int write,
void __user *buffer, size_t *lenp,
loff_t *ppos);
static int proc_sctp_do_rto_min(struct ctl_table *ctl, int write,
void __user *buffer, size_t *lenp,
loff_t *ppos);
static int proc_sctp_do_rto_max(struct ctl_table *ctl, int write,
void __user *buffer, size_t *lenp,
loff_t *ppos);
static int proc_sctp_do_alpha_beta(struct ctl_table *ctl, int write,
void __user *buffer, size_t *lenp,
loff_t *ppos);
net: sctp: cache auth_enable per endpoint

Currently, it is possible to create an SCTP socket, then switch auth_enable via sysctl setting to 1 and crash the system on connect:

Oops[#1]:
CPU: 0 PID: 0 Comm: swapper Not tainted 3.14.1-mipsgit-20140415 #1
task: ffffffff8056ce80 ti: ffffffff8055c000 task.ti: ffffffff8055c000
[...]
Call Trace:
[<ffffffff8043c4e8>] sctp_auth_asoc_set_default_hmac+0x68/0x80
[<ffffffff8042b300>] sctp_process_init+0x5e0/0x8a4
[<ffffffff8042188c>] sctp_sf_do_5_1B_init+0x234/0x34c
[<ffffffff804228c8>] sctp_do_sm+0xb4/0x1e8
[<ffffffff80425a08>] sctp_endpoint_bh_rcv+0x1c4/0x214
[<ffffffff8043af68>] sctp_rcv+0x588/0x630
[<ffffffff8043e8e8>] sctp6_rcv+0x10/0x24
[<ffffffff803acb50>] ip6_input+0x2c0/0x440
[<ffffffff8030fc00>] __netif_receive_skb_core+0x4a8/0x564
[<ffffffff80310650>] process_backlog+0xb4/0x18c
[<ffffffff80313cbc>] net_rx_action+0x12c/0x210
[<ffffffff80034254>] __do_softirq+0x17c/0x2ac
[<ffffffff800345e0>] irq_exit+0x54/0xb0
[<ffffffff800075a4>] ret_from_irq+0x0/0x4
[<ffffffff800090ec>] rm7k_wait_irqoff+0x24/0x48
[<ffffffff8005e388>] cpu_startup_entry+0xc0/0x148
[<ffffffff805a88b0>] start_kernel+0x37c/0x398
Code: dd0900b8 000330f8 0126302d <dcc60000> 50c0fff1 0047182a a48306a0 03e00008 00000000
---[ end trace b530b0551467f2fd ]---
Kernel panic - not syncing: Fatal exception in interrupt

What happens while auth_enable=0 in that case is, that ep->auth_hmacs is initialized to NULL in sctp_auth_init_hmacs() when endpoint is being created.

After that point, if an admin switches over to auth_enable=1, the machine can crash due to NULL pointer dereference during reception of an INIT chunk. When we enter sctp_process_init() via sctp_sf_do_5_1B_init() in order to respond to an INIT chunk, the INIT verification succeeds and while we walk and process all INIT params via sctp_process_param() we find that net->sctp.auth_enable is set, therefore do not fall through, but invoke sctp_auth_asoc_set_default_hmac() instead, and thus, dereference what we have set to NULL during endpoint initialization phase.

The fix is to make auth_enable immutable by caching its value during endpoint initialization, so that its original value is being carried along until destruction. The bug seems to originate from the very first days.

Fix in joint work with Daniel Borkmann.

Reported-by: Joshua Kinard <kumba@gentoo.org>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Tested-by: Joshua Kinard <kumba@gentoo.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-17 23:26:50 +08:00
static int proc_sctp_do_auth(struct ctl_table *ctl, int write,
void __user *buffer, size_t *lenp,
loff_t *ppos);
static struct ctl_table sctp_table[] = {
{
.procname = "sctp_mem",
.data = &sysctl_sctp_mem,
.maxlen = sizeof(sysctl_sctp_mem),
.mode = 0644,
.proc_handler = proc_doulongvec_minmax
},
{
.procname = "sctp_rmem",
.data = &sysctl_sctp_rmem,
.maxlen = sizeof(sysctl_sctp_rmem),
.mode = 0644,
.proc_handler = proc_dointvec,
},
{
.procname = "sctp_wmem",
.data = &sysctl_sctp_wmem,
.maxlen = sizeof(sysctl_sctp_wmem),
.mode = 0644,
.proc_handler = proc_dointvec,
},
{ /* sentinel */ }
};
static struct ctl_table sctp_net_table[] = {
{
.procname = "rto_initial",
.data = &init_net.sctp.rto_initial,
.maxlen = sizeof(unsigned int),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.extra1 = &one,
.extra2 = &timer_max
},
{
.procname = "rto_min",
.data = &init_net.sctp.rto_min,
.maxlen = sizeof(unsigned int),
.mode = 0644,
.proc_handler = proc_sctp_do_rto_min,
.extra1 = &one,
.extra2 = &init_net.sctp.rto_max
},
{
.procname = "rto_max",
.data = &init_net.sctp.rto_max,
.maxlen = sizeof(unsigned int),
.mode = 0644,
.proc_handler = proc_sctp_do_rto_max,
.extra1 = &init_net.sctp.rto_min,
.extra2 = &timer_max
},
{
.procname = "rto_alpha_exp_divisor",
.data = &init_net.sctp.rto_alpha,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_sctp_do_alpha_beta,
.extra1 = &rto_alpha_min,
.extra2 = &rto_alpha_max,
},
{
.procname = "rto_beta_exp_divisor",
.data = &init_net.sctp.rto_beta,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_sctp_do_alpha_beta,
.extra1 = &rto_beta_min,
.extra2 = &rto_beta_max,
},
{
.procname = "max_burst",
.data = &init_net.sctp.max_burst,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.extra1 = &zero,
.extra2 = &int_max
},
{
.procname = "cookie_preserve_enable",
.data = &init_net.sctp.cookie_preserve_enable,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
},
{
.procname = "cookie_hmac_alg",
.data = &init_net.sctp.sctp_hmac_alg,
.maxlen = 8,
.mode = 0644,
.proc_handler = proc_sctp_do_hmac_alg,
},
{
.procname = "valid_cookie_life",
.data = &init_net.sctp.valid_cookie_life,
.maxlen = sizeof(unsigned int),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.extra1 = &one,
.extra2 = &timer_max
},
{
.procname = "sack_timeout",
.data = &init_net.sctp.sack_timeout,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.extra1 = &sack_timer_min,
.extra2 = &sack_timer_max,
},
{
.procname = "hb_interval",
.data = &init_net.sctp.hb_interval,
.maxlen = sizeof(unsigned int),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.extra1 = &one,
.extra2 = &timer_max
},
{
.procname = "association_max_retrans",
.data = &init_net.sctp.max_retrans_association,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.extra1 = &one,
.extra2 = &int_max
},
{
.procname = "path_max_retrans",
.data = &init_net.sctp.max_retrans_path,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.extra1 = &one,
.extra2 = &int_max
},
{
.procname = "max_init_retransmits",
.data = &init_net.sctp.max_retrans_init,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.extra1 = &one,
.extra2 = &int_max
},
{
.procname = "pf_retrans",
.data = &init_net.sctp.pf_retrans,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.extra1 = &zero,
.extra2 = &int_max
},
{
.procname = "sndbuf_policy",
.data = &init_net.sctp.sndbuf_policy,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
},
{
.procname = "rcvbuf_policy",
.data = &init_net.sctp.rcvbuf_policy,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
},
{
.procname = "default_auto_asconf",
.data = &init_net.sctp.default_auto_asconf,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
},
{
.procname = "addip_enable",
.data = &init_net.sctp.addip_enable,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
},
{
.procname = "addip_noauth_enable",
.data = &init_net.sctp.addip_noauth,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
},
{
.procname = "prsctp_enable",
.data = &init_net.sctp.prsctp_enable,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
},
{
.procname = "auth_enable",
.data = &init_net.sctp.auth_enable,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_sctp_do_auth,
},
{
.procname = "addr_scope_policy",
.data = &init_net.sctp.scope_policy,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.extra1 = &zero,
.extra2 = &addr_scope_max,
},
{
.procname = "rwnd_update_shift",
.data = &init_net.sctp.rwnd_upd_shift,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.extra1 = &one,
.extra2 = &rwnd_scale_max,
},
{
.procname = "max_autoclose",
.data = &init_net.sctp.max_autoclose,
.maxlen = sizeof(unsigned long),
.mode = 0644,
.proc_handler = proc_doulongvec_minmax,
.extra1 = &max_autoclose_min,
.extra2 = &max_autoclose_max,
},
{ /* sentinel */ }
};
static int proc_sctp_do_hmac_alg(struct ctl_table *ctl, int write,
void __user *buffer, size_t *lenp,
loff_t *ppos)
{
struct net *net = current->nsproxy->net_ns;
struct ctl_table tbl;
bool changed = false;
char *none = "none";
char tmp[8];
int ret;
memset(&tbl, 0, sizeof(struct ctl_table));
if (write) {
tbl.data = tmp;
tbl.maxlen = sizeof(tmp);
} else {
tbl.data = net->sctp.sctp_hmac_alg ? : none;
tbl.maxlen = strlen(tbl.data);
}
ret = proc_dostring(&tbl, write, buffer, lenp, ppos);
if (write && ret == 0) {
#ifdef CONFIG_CRYPTO_MD5
if (!strncmp(tmp, "md5", 3)) {
net->sctp.sctp_hmac_alg = "md5";
changed = true;
}
#endif
#ifdef CONFIG_CRYPTO_SHA1
if (!strncmp(tmp, "sha1", 4)) {
net->sctp.sctp_hmac_alg = "sha1";
changed = true;
}
#endif
if (!strncmp(tmp, "none", 4)) {
net->sctp.sctp_hmac_alg = NULL;
changed = true;
}
if (!changed)
ret = -EINVAL;
}
return ret;
}
static int proc_sctp_do_rto_min(struct ctl_table *ctl, int write,
void __user *buffer, size_t *lenp,
loff_t *ppos)
{
struct net *net = current->nsproxy->net_ns;
unsigned int min = *(unsigned int *) ctl->extra1;
unsigned int max = *(unsigned int *) ctl->extra2;
struct ctl_table tbl;
int ret, new_value;
memset(&tbl, 0, sizeof(struct ctl_table));
tbl.maxlen = sizeof(unsigned int);
if (write)
tbl.data = &new_value;
else
tbl.data = &net->sctp.rto_min;
ret = proc_dointvec(&tbl, write, buffer, lenp, ppos);
if (write && ret == 0) {
if (new_value > max || new_value < min)
return -EINVAL;
net->sctp.rto_min = new_value;
}
return ret;
}
static int proc_sctp_do_rto_max(struct ctl_table *ctl, int write,
void __user *buffer, size_t *lenp,
loff_t *ppos)
{
struct net *net = current->nsproxy->net_ns;
unsigned int min = *(unsigned int *) ctl->extra1;
unsigned int max = *(unsigned int *) ctl->extra2;
struct ctl_table tbl;
int ret, new_value;
memset(&tbl, 0, sizeof(struct ctl_table));
tbl.maxlen = sizeof(unsigned int);
if (write)
tbl.data = &new_value;
else
tbl.data = &net->sctp.rto_max;
ret = proc_dointvec(&tbl, write, buffer, lenp, ppos);
if (write && ret == 0) {
if (new_value > max || new_value < min)
return -EINVAL;
net->sctp.rto_max = new_value;
}
return ret;
}
static int proc_sctp_do_alpha_beta(struct ctl_table *ctl, int write,
void __user *buffer, size_t *lenp,
loff_t *ppos)
{
if (write)
pr_warn_once("Changing rto_alpha or rto_beta may lead to "
"suboptimal rtt/srtt estimations!\n");
return proc_dointvec_minmax(ctl, write, buffer, lenp, ppos);
}
static int proc_sctp_do_auth(struct ctl_table *ctl, int write,
			     void __user *buffer, size_t *lenp,
			     loff_t *ppos)
{
	struct net *net = current->nsproxy->net_ns;
	struct ctl_table tbl;
	int new_value, ret;

	memset(&tbl, 0, sizeof(struct ctl_table));
	tbl.maxlen = sizeof(unsigned int);

	if (write)
		tbl.data = &new_value;
	else
		tbl.data = &net->sctp.auth_enable;

	ret = proc_dointvec(&tbl, write, buffer, lenp, ppos);
	if (write && ret == 0) {
		struct sock *sk = net->sctp.ctl_sock;

		net->sctp.auth_enable = new_value;
		/* Update the value in the control socket */
		lock_sock(sk);
		sctp_sk(sk)->ep->auth_enable = new_value;
		release_sock(sk);
	}

	return ret;
}
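
The fix described in the "cache auth_enable per endpoint" commit message (take a snapshot of the setting when the endpoint is created, and consult only that snapshot afterwards) can be sketched in userspace C. This is an illustration under stated assumptions, not the kernel implementation: `demo_auth_enable`, `struct demo_ep`, and the `demo_ep_*` helpers are hypothetical stand-ins for `net->sctp.auth_enable`, the SCTP endpoint, and its setup and lookup paths.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the global knob net->sctp.auth_enable. */
static int demo_auth_enable;

/* Hypothetical, heavily simplified endpoint. */
struct demo_ep {
	int auth_enable;	/* snapshot of the knob taken at creation */
	int *auth_hmacs;	/* allocated only if auth was on at creation */
};

/* Create an endpoint and cache the current value of the knob in it. */
static struct demo_ep *demo_ep_new(void)
{
	struct demo_ep *ep = calloc(1, sizeof(*ep));

	if (!ep)
		return NULL;

	ep->auth_enable = demo_auth_enable;
	if (ep->auth_enable) {
		ep->auth_hmacs = calloc(1, sizeof(*ep->auth_hmacs));
		if (!ep->auth_hmacs) {
			free(ep);
			return NULL;
		}
		*ep->auth_hmacs = 1;	/* arbitrary, made-up HMAC id */
	}
	return ep;
}

/*
 * Return the default HMAC id, or 0 when auth is not in effect for this
 * endpoint. The decision uses the cached value, never the global knob,
 * so auth_hmacs is only dereferenced when it was actually allocated.
 */
static int demo_ep_default_hmac(const struct demo_ep *ep)
{
	assert(ep != NULL);
	if (!ep->auth_enable)
		return 0;
	return *ep->auth_hmacs;
}

static void demo_ep_free(struct demo_ep *ep)
{
	if (ep) {
		free(ep->auth_hmacs);
		free(ep);
	}
}
```

In this sketch, flipping `demo_auth_enable` after `demo_ep_new()` has no effect on an existing endpoint; only endpoints created afterwards see the new value. That is the same property the sysctl handler relies on when it updates the control socket's endpoint explicitly under `lock_sock()`.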
int sctp_sysctl_net_register(struct net *net)
{
Revert "sctp: optimize the sctp_sysctl_net_register"

This revert commit efb842c45("sctp: optimize the sctp_sysctl_net_register"), Since it doesn't kmemdup a sysctl_table for init_net, so the init_net->sctp.sysctl_header->ctl_table_arg points to sctp_net_table which is a static array pointer. So when doing sctp_sysctl_net_unregister, it will free sctp_net_table, then we will get a NULL pointer dereference like that:

[ 262.948220] BUG: unable to handle kernel NULL pointer dereference at 000000000000006c
[ 262.948232] IP: [<ffffffff81144b70>] kfree+0x80/0x420
[ 262.948260] PGD db80a067 PUD dae12067 PMD 0
[ 262.948268] Oops: 0000 [#1] SMP
[ 262.948273] Modules linked in: sctp(-) crc32c_generic libcrc32c
...
[ 262.948338] task: ffff8800db830190 ti: ffff8800dad00000 task.ti: ffff8800dad00000
[ 262.948344] RIP: 0010:[<ffffffff81144b70>] [<ffffffff81144b70>] kfree+0x80/0x420
[ 262.948353] RSP: 0018:ffff8800dad01d88 EFLAGS: 00010046
[ 262.948358] RAX: 0100000000000000 RBX: ffffffffa0227940 RCX: ffffea0000707888
[ 262.948363] RDX: ffffea0000707888 RSI: 0000000000000001 RDI: ffffffffa0227940
[ 262.948369] RBP: ffff8800dad01de8 R08: 0000000000000000 R09: ffff8800d9e983a9
[ 262.948374] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa0227940
[ 262.948380] R13: ffffffff8187cfc0 R14: 0000000000000000 R15: ffffffff8187da10
[ 262.948386] FS: 00007fa2a2658700(0000) GS:ffff880112800000(0000) knlGS:0000000000000000
[ 262.948394] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 262.948400] CR2: 000000000000006c CR3: 00000000cddc0000 CR4: 00000000000006e0
[ 262.948410] Stack:
[ 262.948413] ffff8800dad01da8 0000000000000286 0000000020227940 ffffffffa0227940
[ 262.948422] ffff8800dad01dd8 ffffffff811b7fa1 ffffffffa0227940 ffffffffa0227940
[ 262.948431] ffffffff8187d960 ffffffff8187cfc0 ffffffff8187d960 ffffffff8187da10
[ 262.948440] Call Trace:
[ 262.948457] [<ffffffff811b7fa1>] ? unregister_sysctl_table+0x51/0xa0
[ 262.948476] [<ffffffffa020d1a1>] sctp_sysctl_net_unregister+0x21/0x30 [sctp]
[ 262.948490] [<ffffffffa020ef6d>] sctp_net_exit+0x12d/0x150 [sctp]
[ 262.948512] [<ffffffff81394f49>] ops_exit_list+0x39/0x60
[ 262.948522] [<ffffffff813951ed>] unregister_pernet_operations+0x3d/0x70
[ 262.948530] [<ffffffff81395292>] unregister_pernet_subsys+0x22/0x40
[ 262.948544] [<ffffffffa020efcc>] sctp_exit+0x3c/0x12d [sctp]
[ 262.948562] [<ffffffff810c5e04>] SyS_delete_module+0x194/0x210
[ 262.948577] [<ffffffff81240fde>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 262.948587] [<ffffffff815217a2>] system_call_fastpath+0x16/0x1b

With this revert, it won't occur the Oops.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-08 20:55:01 +08:00
	struct ctl_table *table;
	int i;
	table = kmemdup(sctp_net_table, sizeof(sctp_net_table), GFP_KERNEL);
	if (!table)
		return -ENOMEM;
	for (i = 0; table[i].data; i++)
		table[i].data += (char *)(&net->sctp) - (char *)&init_net.sctp;

	net->sctp.sysctl_header = register_net_sysctl(net, "net/sctp", table);
	if (net->sctp.sysctl_header == NULL) {
		kfree(table);
		return -ENOMEM;
	}
	return 0;
}
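
The registration function above duplicates a template table whose `data` pointers refer to `init_net.sctp`, then shifts each pointer so it refers to the same field inside the target namespace's `net->sctp`. A minimal userspace sketch of that rebasing step follows. Every `demo_*` name is a hypothetical stand-in (for `struct net`, `struct ctl_table`, `init_net`, and `sctp_net_table`), and `malloc`/`memcpy` stand in for `kmemdup`. The kernel adds one delta between the two structures; the sketch instead computes each field's offset within the template structure, which yields the same addresses while staying within ISO C pointer arithmetic rules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the per-namespace SCTP settings. */
struct demo_sctp {
	int rto_initial;
	int auth_enable;
};

struct demo_net {
	struct demo_sctp sctp;
};

/* Hypothetical stand-in for struct ctl_table; only the fields used here. */
struct demo_ctl {
	const char *procname;
	void *data;
};

/* Plays the role of init_net. */
static struct demo_net demo_init_net = { .sctp = { 3000, 0 } };

/*
 * Template table: every data pointer refers to demo_init_net.
 * The entry with a NULL data pointer terminates the table.
 */
static struct demo_ctl demo_template[] = {
	{ "rto_initial", &demo_init_net.sctp.rto_initial },
	{ "auth_enable", &demo_init_net.sctp.auth_enable },
	{ NULL, NULL },
};

/*
 * Return a heap copy of the template whose data pointers refer to the
 * corresponding fields of net->sctp. The caller owns the copy and must
 * free() it. Returns NULL on allocation failure.
 */
static struct demo_ctl *demo_table_for(struct demo_net *net)
{
	struct demo_ctl *table;
	int i;

	assert(net != NULL);

	table = malloc(sizeof(demo_template));
	if (!table)
		return NULL;
	memcpy(table, demo_template, sizeof(demo_template));

	for (i = 0; table[i].data; i++) {
		/* Offset of this field inside the template's sctp struct. */
		size_t off = (size_t)((char *)table[i].data -
				      (char *)&demo_init_net.sctp);

		/* Same field, but inside the target namespace. */
		table[i].data = (char *)&net->sctp + off;
	}
	return table;
}
```

Because each namespace gets its own heap copy, the static template is never modified and is never handed to `free()`. That ownership rule is what the revert described in this function's history restores for `init_net`.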
void sctp_sysctl_net_unregister(struct net *net)
{
SCTP: Free the per-net sysctl table on net exit. v2

Per-net sysctl table needs to be explicitly freed at net exit. Otherwise we see the following with kmemleak:

unreferenced object 0xffff880402d08000 (size 2048):
  comm "chrome_sandbox", pid 18437, jiffies 4310887172 (age 9097.630s)
  hex dump (first 32 bytes):
    b2 68 89 81 ff ff ff ff 20 04 04 f8 01 88 ff ff .h...... .......
    04 00 00 00 a4 01 00 00 00 00 00 00 00 00 00 00 ................
  backtrace:
    [<ffffffff815b4aad>] kmemleak_alloc+0x21/0x3e
    [<ffffffff81110352>] slab_post_alloc_hook+0x28/0x2a
    [<ffffffff81113fad>] __kmalloc_track_caller+0xf1/0x104
    [<ffffffff810f10c2>] kmemdup+0x1b/0x30
    [<ffffffff81571e9f>] sctp_sysctl_net_register+0x1f/0x72
    [<ffffffff8155d305>] sctp_net_init+0x100/0x39f
    [<ffffffff814ad53c>] ops_init+0xc6/0xf5
    [<ffffffff814ad5b7>] setup_net+0x4c/0xd0
    [<ffffffff814ada5e>] copy_net_ns+0x6d/0xd6
    [<ffffffff810938b1>] create_new_namespaces+0xd7/0x147
    [<ffffffff810939f4>] copy_namespaces+0x63/0x99
    [<ffffffff81076733>] copy_process+0xa65/0x1233
    [<ffffffff81077030>] do_fork+0x10b/0x271
    [<ffffffff8100a0e9>] sys_clone+0x23/0x25
    [<ffffffff815dda73>] stub_clone+0x13/0x20
    [<ffffffffffffffff>] 0xffffffffffffffff

I fixed the spelling of sysctl_header so the code actually compiles. -- EWB.

Reported-by: Martin Mokrejs <mmokrejs@fold.natur.cuni.cz>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-25 00:02:47 +08:00
	struct ctl_table *table;

	table = net->sctp.sysctl_header->ctl_table_arg;
	unregister_net_sysctl_table(net->sctp.sysctl_header);
	kfree(table);
}

static struct ctl_table_header *sctp_sysctl_header;

/* Sysctl registration. */
void sctp_sysctl_register(void)
{
	sctp_sysctl_header = register_net_sysctl(&init_net, "net/sctp", sctp_table);
}

/* Sysctl deregistration. */
void sctp_sysctl_unregister(void)
{
	unregister_net_sysctl_table(sctp_sysctl_header);
}