Merge branch 'xen-netback-netfront-multiqueue'

Wei Liu says: ==================== This is rebased version of Andrew's V8 patch series. The original cover letter: -------------------- xen-net{back, front}: Multiple transmit and receive queues This patch series implements multiple transmit and receive queues (i.e. multiple shared rings) for the xen virtual network interfaces. The series is split up as follows: - Patch 1 brings the 'grant_copy_op' array back into struct xenvif, in preparation for multi-queue support. See the patch itself for more details. - Patches 2 and 4 factor out the queue-specific data for netback and netfront respectively, and modify the rest of the code to use these as appropriate. - Patches 3 and 5 introduce new XenStore keys to negotiate and use multiple shared rings and event channels, and code to connect these as appropriate. - Patch 6 documents the XenStore keys required for the new feature in include/xen/interface/io/netif.h All other transmit and receive processing remains unchanged, i.e. there is a kthread per queue and a NAPI context per queue. The performance of these patches has been analysed in detail, with results available at: http://wiki.xenproject.org/wiki/Xen-netback_and_xen-netfront_multi-queue_performance_testing To summarise: * Using multiple queues allows a VM to transmit at line rate on a 10 Gbit/s NIC, compared with a maximum aggregate throughput of 6 Gbit/s with a single queue. * For intra-host VM--VM traffic, eight queues provide 171% of the throughput of a single queue; almost 12 Gbit/s instead of 6 Gbit/s. * There is a corresponding increase in total CPU usage, i.e. this is a scaling out over available resources, not an efficiency improvement. * Results depend on the availability of sufficient CPUs, as well as the distribution of interrupts and the distribution of TCP streams across the queues. Queue selection is currently achieved via an L4 hash on the packet (i.e. TCP src/dst port, IP src/dst address) and is not negotiated between the frontend and backend, since only one option exists. Future patches to support other frontends (particularly Windows) will need to add some capability to negotiate not only the hash algorithm selection, but also allow the frontend to specify some parameters to this. Note that queue selection is a decision by the transmitting system about which queue to use for a particular packet. In general, the algorithm may differ between the frontend and the backend with no adverse effects. Queue-specific XenStore entries for ring references and event channels are stored hierarchically, i.e. under .../queue-N/... where N varies from 0 to one less than the requested number of queues (inclusive). If only one queue is requested, it falls back to the flat structure where the ring references and event channels are written at the same level as other vif information. V8: - Squash the queue error handling code into patch 3. - Update the documentation (patch 6) according to comments on the equivalent patch to Xen. V7: - Rebase on latest net-next, which includes the netback grant mapping patch series from Zoltan Kiss - Reduce QUEUE_NAME_SIZE by 1 to avoid double-counting the trailing '\0' - Simplify the queue hashing by using (hash % num_queues) instead of multiply & shift. - Add ratelimited warning for invalid queue selection. - Fix error handling to correctly tear down already setup queues. - Use dev->real_num_tx_queues instead of separately maintaining a count of the number of queues. V6: - Use 'max_queues' as the module param. name for both netback and netfront. V5: - Fix bug in xenvif_free() that could lead to an attempt to transmit an skb after the queue structures had been freed. - Improve the XenStore protocol documentation in netif.h. - Fix IRQ_NAME_SIZE double-accounting for null terminator. - Move rx_gso_checksum_fixup stat into struct xenvif_stats (per-queue). - Don't initialise a local variable that is set in both branches (xspath). V4: - Add MODULE_PARM_DESC() for the multi-queue parameters for netback and netfront modules. - Move del_timer_sync() in netfront to after unregister_netdev, which restores the order in which these functions were called before applying these patches. V3: - Further indentation and style fixups. V2: - Rebase onto net-next. - Change queue->number to queue->id. - Add atomic operations around the small number of stats variables that are not queue-specific or per-cpu. - Fixup formatting and style issues. - XenStore protocol changes documented in netif.h. - Default max. number of queues to num_online_cpus(). - Check requested number of queues does not exceed maximum. -------------------- I rebased this on top of net-next. No functional change is introduced. The patch that needed some extra care was "xen-netback: Factor queue-specific data into queue struct" because it clashed with a fix introduced in net. A simple test of creating guest, iperf, then shutting down guest worked as expected. The last patch fixes a minor problem that queue name is not initialised in xen-netfront, resulting in names like "-tx" "-rx" in /proc/interrupt. Changes since v9 (no functional change introduced): * include commit summary in the commit message of first patch * fold David Vrabel's Reviewed-by into last patch ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-04 14:48:30 -07:00 · 2014-06-04 14:48:30 -07:00 · 9ab89acc86
parent 9bcc14d239 8b715010aa
commit 9ab89acc86
6 changed files with 1695 additions and 1038 deletions
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@ -99,22 +99,43 @@ struct xenvif_rx_meta {
 */
 #define XEN_NETBK_LEGACY_SLOTS_MAX XEN_NETIF_NR_SLOTS_MIN

-struct xenvif {
-	/* Unique identifier for this interface. */
-	domid_t          domid;
-	unsigned int     handle;
+/* Queue name is interface name with "-qNNN" appended */
+#define QUEUE_NAME_SIZE (IFNAMSIZ + 5)

-	/* Is this interface disabled? True when backend discovers
-	 * frontend is rogue.
+/* IRQ name is queue name with "-tx" or "-rx" appended */
+#define IRQ_NAME_SIZE (QUEUE_NAME_SIZE + 3)
+
+struct xenvif;
+
+struct xenvif_stats {
+	/* Stats fields to be updated per-queue.
+	 * A subset of struct net_device_stats that contains only the
+	 * fields that are updated in netback.c for each queue.
 	 */
-	bool disabled;
+	unsigned int rx_bytes;
+	unsigned int rx_packets;
+	unsigned int tx_bytes;
+	unsigned int tx_packets;
+
+	/* Additional stats used by xenvif */
+	unsigned long rx_gso_checksum_fixup;
+	unsigned long tx_zerocopy_sent;
+	unsigned long tx_zerocopy_success;
+	unsigned long tx_zerocopy_fail;
+	unsigned long tx_frag_overflow;
+};
+
+struct xenvif_queue { /* Per-queue data for xenvif */
+	unsigned int id; /* Queue ID, 0-based */
+	char name[QUEUE_NAME_SIZE]; /* DEVNAME-qN */
+	struct xenvif *vif; /* Parent VIF */

 	/* Use NAPI for guest TX */
 	struct napi_struct napi;
 	/* When feature-split-event-channels = 0, tx_irq = rx_irq. */
 	unsigned int tx_irq;
 	/* Only used when feature-split-event-channels = 1 */
-	char tx_irq_name[IFNAMSIZ+4]; /* DEVNAME-tx */
+	char tx_irq_name[IRQ_NAME_SIZE]; /* DEVNAME-qN-tx */
 	struct xen_netif_tx_back_ring tx;
 	struct sk_buff_head tx_queue;
 	struct page *mmap_pages[MAX_PENDING_REQS];
@ -150,7 +171,7 @@ struct xenvif {
 	/* When feature-split-event-channels = 0, tx_irq = rx_irq. */
 	unsigned int rx_irq;
 	/* Only used when feature-split-event-channels = 1 */
-	char rx_irq_name[IFNAMSIZ+4]; /* DEVNAME-rx */
+	char rx_irq_name[IRQ_NAME_SIZE]; /* DEVNAME-qN-rx */
 	struct xen_netif_rx_back_ring rx;
 	struct sk_buff_head rx_queue;
 	RING_IDX rx_last_skb_slots;
@ -158,14 +179,29 @@ struct xenvif {

 	struct timer_list wake_queue;

-	/* This array is allocated seperately as it is large */
-	struct gnttab_copy *grant_copy_op;
+	struct gnttab_copy grant_copy_op[MAX_GRANT_COPY_OPS];

 	/* We create one meta structure per ring request we consume, so
 	 * the maximum number is the same as the ring size.
 	 */
 	struct xenvif_rx_meta meta[XEN_NETIF_RX_RING_SIZE];

+	/* Transmit shaping: allow 'credit_bytes' every 'credit_usec'. */
+	unsigned long   credit_bytes;
+	unsigned long   credit_usec;
+	unsigned long   remaining_credit;
+	struct timer_list credit_timeout;
+	u64 credit_window_start;
+
+	/* Statistics */
+	struct xenvif_stats stats;
+};
+
+struct xenvif {
+	/* Unique identifier for this interface. */
+	domid_t          domid;
+	unsigned int     handle;
+
 	u8               fe_dev_addr[6];

 	/* Frontend feature information. */
@ -179,19 +215,13 @@ struct xenvif {
 	/* Internal feature information. */
 	u8 can_queue:1;	    /* can queue packets for receiver? */

-	/* Transmit shaping: allow 'credit_bytes' every 'credit_usec'. */
-	unsigned long   credit_bytes;
-	unsigned long   credit_usec;
-	unsigned long   remaining_credit;
-	struct timer_list credit_timeout;
-	u64 credit_window_start;
+	/* Is this interface disabled? True when backend discovers
+	 * frontend is rogue.
+	 */
+	bool disabled;

-	/* Statistics */
-	unsigned long rx_gso_checksum_fixup;
-	unsigned long tx_zerocopy_sent;
-	unsigned long tx_zerocopy_success;
-	unsigned long tx_zerocopy_fail;
-	unsigned long tx_frag_overflow;
+	/* Queues */
+	struct xenvif_queue *queues;

 	/* Miscellaneous private stuff. */
 	struct net_device *dev;
@ -206,7 +236,10 @@ struct xenvif *xenvif_alloc(struct device *parent,
 			    domid_t domid,
 			    unsigned int handle);

-int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
+int xenvif_init_queue(struct xenvif_queue *queue);
+void xenvif_deinit_queue(struct xenvif_queue *queue);
+
+int xenvif_connect(struct xenvif_queue *queue, unsigned long tx_ring_ref,
 		   unsigned long rx_ring_ref, unsigned int tx_evtchn,
 		   unsigned int rx_evtchn);
 void xenvif_disconnect(struct xenvif *vif);
@ -217,44 +250,47 @@ void xenvif_xenbus_fini(void);

 int xenvif_schedulable(struct xenvif *vif);

-int xenvif_must_stop_queue(struct xenvif *vif);
+int xenvif_must_stop_queue(struct xenvif_queue *queue);
+
+int xenvif_queue_stopped(struct xenvif_queue *queue);
+void xenvif_wake_queue(struct xenvif_queue *queue);

 /* (Un)Map communication rings. */
-void xenvif_unmap_frontend_rings(struct xenvif *vif);
-int xenvif_map_frontend_rings(struct xenvif *vif,
+void xenvif_unmap_frontend_rings(struct xenvif_queue *queue);
+int xenvif_map_frontend_rings(struct xenvif_queue *queue,
 			      grant_ref_t tx_ring_ref,
 			      grant_ref_t rx_ring_ref);

 /* Check for SKBs from frontend and schedule backend processing */
-void xenvif_napi_schedule_or_enable_events(struct xenvif *vif);
+void xenvif_napi_schedule_or_enable_events(struct xenvif_queue *queue);

 /* Prevent the device from generating any further traffic. */
 void xenvif_carrier_off(struct xenvif *vif);

-int xenvif_tx_action(struct xenvif *vif, int budget);
+int xenvif_tx_action(struct xenvif_queue *queue, int budget);

 int xenvif_kthread_guest_rx(void *data);
-void xenvif_kick_thread(struct xenvif *vif);
+void xenvif_kick_thread(struct xenvif_queue *queue);

 int xenvif_dealloc_kthread(void *data);

 /* Determine whether the needed number of slots (req) are available,
 * and set req_event if not.
 */
-bool xenvif_rx_ring_slots_available(struct xenvif *vif, int needed);
+bool xenvif_rx_ring_slots_available(struct xenvif_queue *queue, int needed);

-void xenvif_stop_queue(struct xenvif *vif);
+void xenvif_carrier_on(struct xenvif *vif);

 /* Callback from stack when TX packet can be released */
 void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success);

 /* Unmap a pending page and release it back to the guest */
-void xenvif_idx_unmap(struct xenvif *vif, u16 pending_idx);
+void xenvif_idx_unmap(struct xenvif_queue *queue, u16 pending_idx);

-static inline pending_ring_idx_t nr_pending_reqs(struct xenvif *vif)
+static inline pending_ring_idx_t nr_pending_reqs(struct xenvif_queue *queue)
 {
 	return MAX_PENDING_REQS -
-		vif->pending_prod + vif->pending_cons;
+		queue->pending_prod + queue->pending_cons;
 }

 /* Callback from stack when TX packet can be released */
@ -264,5 +300,6 @@ extern bool separate_tx_rx_irq;

 extern unsigned int rx_drain_timeout_msecs;
 extern unsigned int rx_drain_timeout_jiffies;
+extern unsigned int xenvif_max_queues;

 #endif /* __XEN_NETBACK__COMMON_H__ */
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@ -34,7 +34,6 @@
 #include <linux/ethtool.h>
 #include <linux/rtnetlink.h>
 #include <linux/if_vlan.h>
-#include <linux/vmalloc.h>

 #include <xen/events.h>
 #include <asm/xen/hypercall.h>
@ -43,6 +42,16 @@
 #define XENVIF_QUEUE_LENGTH 32
 #define XENVIF_NAPI_WEIGHT  64

+static inline void xenvif_stop_queue(struct xenvif_queue *queue)
+{
+	struct net_device *dev = queue->vif->dev;
+
+	if (!queue->vif->can_queue)
+		return;
+
+	netif_tx_stop_queue(netdev_get_tx_queue(dev, queue->id));
+}
+
 int xenvif_schedulable(struct xenvif *vif)
 {
 	return netif_running(vif->dev) && netif_carrier_ok(vif->dev);
@ -50,33 +59,34 @@ int xenvif_schedulable(struct xenvif *vif)

 static irqreturn_t xenvif_tx_interrupt(int irq, void *dev_id)
 {
-	struct xenvif *vif = dev_id;
+	struct xenvif_queue *queue = dev_id;

-	if (RING_HAS_UNCONSUMED_REQUESTS(&vif->tx))
-		napi_schedule(&vif->napi);
+	if (RING_HAS_UNCONSUMED_REQUESTS(&queue->tx))
+		napi_schedule(&queue->napi);

 	return IRQ_HANDLED;
 }

-static int xenvif_poll(struct napi_struct *napi, int budget)
+int xenvif_poll(struct napi_struct *napi, int budget)
 {
-	struct xenvif *vif = container_of(napi, struct xenvif, napi);
+	struct xenvif_queue *queue =
+		container_of(napi, struct xenvif_queue, napi);
 	int work_done;

 	/* This vif is rogue, we pretend we've there is nothing to do
 	 * for this vif to deschedule it from NAPI. But this interface
 	 * will be turned off in thread context later.
 	 */
-	if (unlikely(vif->disabled)) {
+	if (unlikely(queue->vif->disabled)) {
 		napi_complete(napi);
 		return 0;
 	}

-	work_done = xenvif_tx_action(vif, budget);
+	work_done = xenvif_tx_action(queue, budget);

 	if (work_done < budget) {
 		napi_complete(napi);
-		xenvif_napi_schedule_or_enable_events(vif);
+		xenvif_napi_schedule_or_enable_events(queue);
 	}

 	return work_done;
@ -84,9 +94,9 @@ static int xenvif_poll(struct napi_struct *napi, int budget)

 static irqreturn_t xenvif_rx_interrupt(int irq, void *dev_id)
 {
-	struct xenvif *vif = dev_id;
+	struct xenvif_queue *queue = dev_id;

-	xenvif_kick_thread(vif);
+	xenvif_kick_thread(queue);

 	return IRQ_HANDLED;
 }
@ -99,28 +109,80 @@ static irqreturn_t xenvif_interrupt(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }

-static void xenvif_wake_queue(unsigned long data)
+int xenvif_queue_stopped(struct xenvif_queue *queue)
 {
-	struct xenvif *vif = (struct xenvif *)data;
+	struct net_device *dev = queue->vif->dev;
+	unsigned int id = queue->id;
+	return netif_tx_queue_stopped(netdev_get_tx_queue(dev, id));
+}

-	if (netif_queue_stopped(vif->dev)) {
-		netdev_err(vif->dev, "draining TX queue\n");
-		vif->rx_queue_purge = true;
-		xenvif_kick_thread(vif);
-		netif_wake_queue(vif->dev);
+void xenvif_wake_queue(struct xenvif_queue *queue)
+{
+	struct net_device *dev = queue->vif->dev;
+	unsigned int id = queue->id;
+	netif_tx_wake_queue(netdev_get_tx_queue(dev, id));
+}
+
+/* Callback to wake the queue and drain it on timeout */
+static void xenvif_wake_queue_callback(unsigned long data)
+{
+	struct xenvif_queue *queue = (struct xenvif_queue *)data;
+
+	if (xenvif_queue_stopped(queue)) {
+		netdev_err(queue->vif->dev, "draining TX queue\n");
+		queue->rx_queue_purge = true;
+		xenvif_kick_thread(queue);
+		xenvif_wake_queue(queue);
 	}
 }

+static u16 xenvif_select_queue(struct net_device *dev, struct sk_buff *skb,
+			       void *accel_priv, select_queue_fallback_t fallback)
+{
+	unsigned int num_queues = dev->real_num_tx_queues;
+	u32 hash;
+	u16 queue_index;
+
+	/* First, check if there is only one queue to optimise the
+	 * single-queue or old frontend scenario.
+	 */
+	if (num_queues == 1) {
+		queue_index = 0;
+	} else {
+		/* Use skb_get_hash to obtain an L4 hash if available */
+		hash = skb_get_hash(skb);
+		queue_index = hash % num_queues;
+	}
+
+	return queue_index;
+}
+
 static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct xenvif *vif = netdev_priv(dev);
+	struct xenvif_queue *queue = NULL;
+	unsigned int num_queues = dev->real_num_tx_queues;
+	u16 index;
 	int min_slots_needed;

 	BUG_ON(skb->dev != dev);

-	/* Drop the packet if vif is not ready */
-	if (vif->task == NULL ||
-	    vif->dealloc_task == NULL ||
+	/* Drop the packet if queues are not set up */
+	if (num_queues < 1)
+		goto drop;
+
+	/* Obtain the queue to be used to transmit this packet */
+	index = skb_get_queue_mapping(skb);
+	if (index >= num_queues) {
+		pr_warn_ratelimited("Invalid queue %hu for packet on interface %s\n.",
+				    index, vif->dev->name);
+		index %= num_queues;
+	}
+	queue = &vif->queues[index];
+
+	/* Drop the packet if queue is not ready */
+	if (queue->task == NULL ||
+	    queue->dealloc_task == NULL ||
 	    !xenvif_schedulable(vif))
 		goto drop;

@ -139,16 +201,16 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	 * then turn off the queue to give the ring a chance to
 	 * drain.
 	 */
-	if (!xenvif_rx_ring_slots_available(vif, min_slots_needed)) {
-		vif->wake_queue.function = xenvif_wake_queue;
-		vif->wake_queue.data = (unsigned long)vif;
-		xenvif_stop_queue(vif);
-		mod_timer(&vif->wake_queue,
+	if (!xenvif_rx_ring_slots_available(queue, min_slots_needed)) {
+		queue->wake_queue.function = xenvif_wake_queue_callback;
+		queue->wake_queue.data = (unsigned long)queue;
+		xenvif_stop_queue(queue);
+		mod_timer(&queue->wake_queue,
 			jiffies + rx_drain_timeout_jiffies);
 	}

-	skb_queue_tail(&vif->rx_queue, skb);
-	xenvif_kick_thread(vif);
+	skb_queue_tail(&queue->rx_queue, skb);
+	xenvif_kick_thread(queue);

 	return NETDEV_TX_OK;

@ -161,25 +223,65 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
 static struct net_device_stats *xenvif_get_stats(struct net_device *dev)
 {
 	struct xenvif *vif = netdev_priv(dev);
+	struct xenvif_queue *queue = NULL;
+	unsigned int num_queues = dev->real_num_tx_queues;
+	unsigned long rx_bytes = 0;
+	unsigned long rx_packets = 0;
+	unsigned long tx_bytes = 0;
+	unsigned long tx_packets = 0;
+	unsigned int index;
+
+	if (vif->queues == NULL)
+		goto out;
+
+	/* Aggregate tx and rx stats from each queue */
+	for (index = 0; index < num_queues; ++index) {
+		queue = &vif->queues[index];
+		rx_bytes += queue->stats.rx_bytes;
+		rx_packets += queue->stats.rx_packets;
+		tx_bytes += queue->stats.tx_bytes;
+		tx_packets += queue->stats.tx_packets;
+	}
+
+out:
+	vif->dev->stats.rx_bytes = rx_bytes;
+	vif->dev->stats.rx_packets = rx_packets;
+	vif->dev->stats.tx_bytes = tx_bytes;
+	vif->dev->stats.tx_packets = tx_packets;
+
 	return &vif->dev->stats;
 }

 static void xenvif_up(struct xenvif *vif)
 {
-	napi_enable(&vif->napi);
-	enable_irq(vif->tx_irq);
-	if (vif->tx_irq != vif->rx_irq)
-		enable_irq(vif->rx_irq);
-	xenvif_napi_schedule_or_enable_events(vif);
+	struct xenvif_queue *queue = NULL;
+	unsigned int num_queues = vif->dev->real_num_tx_queues;
+	unsigned int queue_index;
+
+	for (queue_index = 0; queue_index < num_queues; ++queue_index) {
+		queue = &vif->queues[queue_index];
+		napi_enable(&queue->napi);
+		enable_irq(queue->tx_irq);
+		if (queue->tx_irq != queue->rx_irq)
+			enable_irq(queue->rx_irq);
+		xenvif_napi_schedule_or_enable_events(queue);
+	}
 }

 static void xenvif_down(struct xenvif *vif)
 {
-	napi_disable(&vif->napi);
-	disable_irq(vif->tx_irq);
-	if (vif->tx_irq != vif->rx_irq)
-		disable_irq(vif->rx_irq);
-	del_timer_sync(&vif->credit_timeout);
+	struct xenvif_queue *queue = NULL;
+	unsigned int num_queues = vif->dev->real_num_tx_queues;
+	unsigned int queue_index;
+
+	for (queue_index = 0; queue_index < num_queues; ++queue_index) {
+		queue = &vif->queues[queue_index];
+		napi_disable(&queue->napi);
+		disable_irq(queue->tx_irq);
+		if (queue->tx_irq != queue->rx_irq)
+			disable_irq(queue->rx_irq);
+		del_timer_sync(&queue->credit_timeout);
+	}
 }

 static int xenvif_open(struct net_device *dev)
@ -187,7 +289,7 @@ static int xenvif_open(struct net_device *dev)
 	struct xenvif *vif = netdev_priv(dev);
 	if (netif_carrier_ok(dev))
 		xenvif_up(vif);
-	netif_start_queue(dev);
+	netif_tx_start_all_queues(dev);
 	return 0;
 }

@ -196,7 +298,7 @@ static int xenvif_close(struct net_device *dev)
 	struct xenvif *vif = netdev_priv(dev);
 	if (netif_carrier_ok(dev))
 		xenvif_down(vif);
-	netif_stop_queue(dev);
+	netif_tx_stop_all_queues(dev);
 	return 0;
 }

@ -236,29 +338,29 @@ static const struct xenvif_stat {
 } xenvif_stats[] = {
 	{
 		"rx_gso_checksum_fixup",
-		offsetof(struct xenvif, rx_gso_checksum_fixup)
+		offsetof(struct xenvif_stats, rx_gso_checksum_fixup)
 	},
 	/* If (sent != success + fail), there are probably packets never
 	 * freed up properly!
 	 */
 	{
 		"tx_zerocopy_sent",
-		offsetof(struct xenvif, tx_zerocopy_sent),
+		offsetof(struct xenvif_stats, tx_zerocopy_sent),
 	},
 	{
 		"tx_zerocopy_success",
-		offsetof(struct xenvif, tx_zerocopy_success),
+		offsetof(struct xenvif_stats, tx_zerocopy_success),
 	},
 	{
 		"tx_zerocopy_fail",
-		offsetof(struct xenvif, tx_zerocopy_fail)
+		offsetof(struct xenvif_stats, tx_zerocopy_fail)
 	},
 	/* Number of packets exceeding MAX_SKB_FRAG slots. You should use
 	 * a guest with the same MAX_SKB_FRAG
 	 */
 	{
 		"tx_frag_overflow",
-		offsetof(struct xenvif, tx_frag_overflow)
+		offsetof(struct xenvif_stats, tx_frag_overflow)
 	},
 };

@ -275,11 +377,20 @@ static int xenvif_get_sset_count(struct net_device *dev, int string_set)
 static void xenvif_get_ethtool_stats(struct net_device *dev,
 				     struct ethtool_stats *stats, u64 * data)
 {
-	void *vif = netdev_priv(dev);
+	struct xenvif *vif = netdev_priv(dev);
+	unsigned int num_queues = dev->real_num_tx_queues;
 	int i;
+	unsigned int queue_index;
+	struct xenvif_stats *vif_stats;

-	for (i = 0; i < ARRAY_SIZE(xenvif_stats); i++)
-		data[i] = *(unsigned long *)(vif + xenvif_stats[i].offset);
+	for (i = 0; i < ARRAY_SIZE(xenvif_stats); i++) {
+		unsigned long accum = 0;
+		for (queue_index = 0; queue_index < num_queues; ++queue_index) {
+			vif_stats = &vif->queues[queue_index].stats;
+			accum += *(unsigned long *)(vif_stats + xenvif_stats[i].offset);
+		}
+		data[i] = accum;
+	}
 }

 static void xenvif_get_strings(struct net_device *dev, u32 stringset, u8 * data)
@ -312,6 +423,7 @@ static const struct net_device_ops xenvif_netdev_ops = {
 	.ndo_fix_features = xenvif_fix_features,
 	.ndo_set_mac_address = eth_mac_addr,
 	.ndo_validate_addr   = eth_validate_addr,
+	.ndo_select_queue = xenvif_select_queue,
 };

 struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
@ -321,10 +433,14 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	struct net_device *dev;
 	struct xenvif *vif;
 	char name[IFNAMSIZ] = {};
-	int i;

 	snprintf(name, IFNAMSIZ - 1, "vif%u.%u", domid, handle);
-	dev = alloc_netdev(sizeof(struct xenvif), name, ether_setup);
+	/* Allocate a netdev with the max. supported number of queues.
+	 * When the guest selects the desired number, it will be updated
+	 * via netif_set_real_num_tx_queues().
+	 */
+	dev = alloc_netdev_mq(sizeof(struct xenvif), name, ether_setup,
+			      xenvif_max_queues);
 	if (dev == NULL) {
 		pr_warn("Could not allocate netdev for %s\n", name);
 		return ERR_PTR(-ENOMEM);
@ -334,28 +450,18 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,

 	vif = netdev_priv(dev);

-	vif->grant_copy_op = vmalloc(sizeof(struct gnttab_copy) *
-				     MAX_GRANT_COPY_OPS);
-	if (vif->grant_copy_op == NULL) {
-		pr_warn("Could not allocate grant copy space for %s\n", name);
-		free_netdev(dev);
-		return ERR_PTR(-ENOMEM);
-	}
-
 	vif->domid  = domid;
 	vif->handle = handle;
 	vif->can_sg = 1;
 	vif->ip_csum = 1;
 	vif->dev = dev;
-
 	vif->disabled = false;

-	vif->credit_bytes = vif->remaining_credit = ~0UL;
-	vif->credit_usec  = 0UL;
-	init_timer(&vif->credit_timeout);
-	vif->credit_window_start = get_jiffies_64();
-
-	init_timer(&vif->wake_queue);
+	/* Start out with no queues. The call below does not require
+	 * rtnl_lock() as it happens before register_netdev().
+	 */
+	vif->queues = NULL;
+	netif_set_real_num_tx_queues(dev, 0);

 	dev->netdev_ops	= &xenvif_netdev_ops;
 	dev->hw_features = NETIF_F_SG |
@ -366,34 +472,6 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,

 	dev->tx_queue_len = XENVIF_QUEUE_LENGTH;

-	skb_queue_head_init(&vif->rx_queue);
-	skb_queue_head_init(&vif->tx_queue);
-
-	vif->pending_cons = 0;
-	vif->pending_prod = MAX_PENDING_REQS;
-	for (i = 0; i < MAX_PENDING_REQS; i++)
-		vif->pending_ring[i] = i;
-	spin_lock_init(&vif->callback_lock);
-	spin_lock_init(&vif->response_lock);
-	/* If ballooning is disabled, this will consume real memory, so you
-	 * better enable it. The long term solution would be to use just a
-	 * bunch of valid page descriptors, without dependency on ballooning
-	 */
-	err = alloc_xenballooned_pages(MAX_PENDING_REQS,
-				       vif->mmap_pages,
-				       false);
-	if (err) {
-		netdev_err(dev, "Could not reserve mmap_pages\n");
-		return ERR_PTR(-ENOMEM);
-	}
-	for (i = 0; i < MAX_PENDING_REQS; i++) {
-		vif->pending_tx_info[i].callback_struct = (struct ubuf_info)
-			{ .callback = xenvif_zerocopy_callback,
-			  .ctx = NULL,
-			  .desc = i };
-		vif->grant_tx_handle[i] = NETBACK_INVALID_HANDLE;
-	}
-
 	/*
 	 * Initialise a dummy MAC address. We choose the numerically
 	 * largest non-broadcast address to prevent the address getting
@ -403,8 +481,6 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	memset(dev->dev_addr, 0xFF, ETH_ALEN);
 	dev->dev_addr[0] &= ~0x01;

-	netif_napi_add(dev, &vif->napi, xenvif_poll, XENVIF_NAPI_WEIGHT);
-
 	netif_carrier_off(dev);

 	err = register_netdev(dev);
@ -421,76 +497,56 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	return vif;
 }

-int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
-		   unsigned long rx_ring_ref, unsigned int tx_evtchn,
-		   unsigned int rx_evtchn)
+int xenvif_init_queue(struct xenvif_queue *queue)
 {
-	struct task_struct *task;
-	int err = -ENOMEM;
+	int err, i;

-	BUG_ON(vif->tx_irq);
-	BUG_ON(vif->task);
-	BUG_ON(vif->dealloc_task);
+	queue->credit_bytes = queue->remaining_credit = ~0UL;
+	queue->credit_usec  = 0UL;
+	init_timer(&queue->credit_timeout);
+	queue->credit_window_start = get_jiffies_64();

-	err = xenvif_map_frontend_rings(vif, tx_ring_ref, rx_ring_ref);
-	if (err < 0)
-		goto err;
+	skb_queue_head_init(&queue->rx_queue);
+	skb_queue_head_init(&queue->tx_queue);

-	init_waitqueue_head(&vif->wq);
-	init_waitqueue_head(&vif->dealloc_wq);
+	queue->pending_cons = 0;
+	queue->pending_prod = MAX_PENDING_REQS;
+	for (i = 0; i < MAX_PENDING_REQS; ++i)
+		queue->pending_ring[i] = i;

-	if (tx_evtchn == rx_evtchn) {
-		/* feature-split-event-channels == 0 */
-		err = bind_interdomain_evtchn_to_irqhandler(
-			vif->domid, tx_evtchn, xenvif_interrupt, 0,
-			vif->dev->name, vif);
-		if (err < 0)
-			goto err_unmap;
-		vif->tx_irq = vif->rx_irq = err;
-		disable_irq(vif->tx_irq);
-	} else {
-		/* feature-split-event-channels == 1 */
-		snprintf(vif->tx_irq_name, sizeof(vif->tx_irq_name),
-			 "%s-tx", vif->dev->name);
-		err = bind_interdomain_evtchn_to_irqhandler(
-			vif->domid, tx_evtchn, xenvif_tx_interrupt, 0,
-			vif->tx_irq_name, vif);
-		if (err < 0)
-			goto err_unmap;
-		vif->tx_irq = err;
-		disable_irq(vif->tx_irq);
+	spin_lock_init(&queue->callback_lock);
+	spin_lock_init(&queue->response_lock);

-		snprintf(vif->rx_irq_name, sizeof(vif->rx_irq_name),
-			 "%s-rx", vif->dev->name);
-		err = bind_interdomain_evtchn_to_irqhandler(
-			vif->domid, rx_evtchn, xenvif_rx_interrupt, 0,
-			vif->rx_irq_name, vif);
-		if (err < 0)
-			goto err_tx_unbind;
-		vif->rx_irq = err;
-		disable_irq(vif->rx_irq);
+	/* If ballooning is disabled, this will consume real memory, so you
+	 * better enable it. The long term solution would be to use just a
+	 * bunch of valid page descriptors, without dependency on ballooning
+	 */
+	err = alloc_xenballooned_pages(MAX_PENDING_REQS,
+				       queue->mmap_pages,
+				       false);
+	if (err) {
+		netdev_err(queue->vif->dev, "Could not reserve mmap_pages\n");
+		return -ENOMEM;
 	}

-	task = kthread_create(xenvif_kthread_guest_rx,
-			      (void *)vif, "%s-guest-rx", vif->dev->name);
-	if (IS_ERR(task)) {
-		pr_warn("Could not allocate kthread for %s\n", vif->dev->name);
-		err = PTR_ERR(task);
-		goto err_rx_unbind;
+	for (i = 0; i < MAX_PENDING_REQS; i++) {
+		queue->pending_tx_info[i].callback_struct = (struct ubuf_info)
+			{ .callback = xenvif_zerocopy_callback,
+			  .ctx = NULL,
+			  .desc = i };
+		queue->grant_tx_handle[i] = NETBACK_INVALID_HANDLE;
 	}

-	vif->task = task;
+	init_timer(&queue->wake_queue);

-	task = kthread_create(xenvif_dealloc_kthread,
-			      (void *)vif, "%s-dealloc", vif->dev->name);
-	if (IS_ERR(task)) {
-		pr_warn("Could not allocate kthread for %s\n", vif->dev->name);
-		err = PTR_ERR(task);
-		goto err_rx_unbind;
-	}
+	netif_napi_add(queue->vif->dev, &queue->napi, xenvif_poll,
+			XENVIF_NAPI_WEIGHT);

-	vif->dealloc_task = task;
+	return 0;
+}

+void xenvif_carrier_on(struct xenvif *vif)
+{
 	rtnl_lock();
 	if (!vif->can_sg && vif->dev->mtu > ETH_DATA_LEN)
 		dev_set_mtu(vif->dev, ETH_DATA_LEN);
@ -499,20 +555,89 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
 	if (netif_running(vif->dev))
 		xenvif_up(vif);
 	rtnl_unlock();
+}

-	wake_up_process(vif->task);
-	wake_up_process(vif->dealloc_task);
+int xenvif_connect(struct xenvif_queue *queue, unsigned long tx_ring_ref,
+		   unsigned long rx_ring_ref, unsigned int tx_evtchn,
+		   unsigned int rx_evtchn)
+{
+	struct task_struct *task;
+	int err = -ENOMEM;
+
+	BUG_ON(queue->tx_irq);
+	BUG_ON(queue->task);
+	BUG_ON(queue->dealloc_task);
+
+	err = xenvif_map_frontend_rings(queue, tx_ring_ref, rx_ring_ref);
+	if (err < 0)
+		goto err;
+
+	init_waitqueue_head(&queue->wq);
+	init_waitqueue_head(&queue->dealloc_wq);
+
+	if (tx_evtchn == rx_evtchn) {
+		/* feature-split-event-channels == 0 */
+		err = bind_interdomain_evtchn_to_irqhandler(
+			queue->vif->domid, tx_evtchn, xenvif_interrupt, 0,
+			queue->name, queue);
+		if (err < 0)
+			goto err_unmap;
+		queue->tx_irq = queue->rx_irq = err;
+		disable_irq(queue->tx_irq);
+	} else {
+		/* feature-split-event-channels == 1 */
+		snprintf(queue->tx_irq_name, sizeof(queue->tx_irq_name),
+			 "%s-tx", queue->name);
+		err = bind_interdomain_evtchn_to_irqhandler(
+			queue->vif->domid, tx_evtchn, xenvif_tx_interrupt, 0,
+			queue->tx_irq_name, queue);
+		if (err < 0)
+			goto err_unmap;
+		queue->tx_irq = err;
+		disable_irq(queue->tx_irq);
+
+		snprintf(queue->rx_irq_name, sizeof(queue->rx_irq_name),
+			 "%s-rx", queue->name);
+		err = bind_interdomain_evtchn_to_irqhandler(
+			queue->vif->domid, rx_evtchn, xenvif_rx_interrupt, 0,
+			queue->rx_irq_name, queue);
+		if (err < 0)
+			goto err_tx_unbind;
+		queue->rx_irq = err;
+		disable_irq(queue->rx_irq);
+	}
+
+	task = kthread_create(xenvif_kthread_guest_rx,
+			      (void *)queue, "%s-guest-rx", queue->name);
+	if (IS_ERR(task)) {
+		pr_warn("Could not allocate kthread for %s\n", queue->name);
+		err = PTR_ERR(task);
+		goto err_rx_unbind;
+	}
+	queue->task = task;
+
+	task = kthread_create(xenvif_dealloc_kthread,
+			      (void *)queue, "%s-dealloc", queue->name);
+	if (IS_ERR(task)) {
+		pr_warn("Could not allocate kthread for %s\n", queue->name);
+		err = PTR_ERR(task);
+		goto err_rx_unbind;
+	}
+	queue->dealloc_task = task;
+
+	wake_up_process(queue->task);
+	wake_up_process(queue->dealloc_task);

 	return 0;

 err_rx_unbind:
-	unbind_from_irqhandler(vif->rx_irq, vif);
-	vif->rx_irq = 0;
+	unbind_from_irqhandler(queue->rx_irq, queue);
+	queue->rx_irq = 0;
 err_tx_unbind:
-	unbind_from_irqhandler(vif->tx_irq, vif);
-	vif->tx_irq = 0;
+	unbind_from_irqhandler(queue->tx_irq, queue);
+	queue->tx_irq = 0;
 err_unmap:
-	xenvif_unmap_frontend_rings(vif);
+	xenvif_unmap_frontend_rings(queue);
 err:
 	module_put(THIS_MODULE);
 	return err;
@ -529,38 +654,77 @@ void xenvif_carrier_off(struct xenvif *vif)
 	rtnl_unlock();
 }

+static void xenvif_wait_unmap_timeout(struct xenvif_queue *queue,
+				      unsigned int worst_case_skb_lifetime)
+{
+	int i, unmap_timeout = 0;
+
+	for (i = 0; i < MAX_PENDING_REQS; ++i) {
+		if (queue->grant_tx_handle[i] != NETBACK_INVALID_HANDLE) {
+			unmap_timeout++;
+			schedule_timeout(msecs_to_jiffies(1000));
+			if (unmap_timeout > worst_case_skb_lifetime &&
+			    net_ratelimit())
+				netdev_err(queue->vif->dev,
+					   "Page still granted! Index: %x\n",
+					   i);
+			i = -1;
+		}
+	}
+}
+
 void xenvif_disconnect(struct xenvif *vif)
 {
+	struct xenvif_queue *queue = NULL;
+	unsigned int num_queues = vif->dev->real_num_tx_queues;
+	unsigned int queue_index;
+
 	if (netif_carrier_ok(vif->dev))
 		xenvif_carrier_off(vif);

-	if (vif->task) {
-		del_timer_sync(&vif->wake_queue);
-		kthread_stop(vif->task);
-		vif->task = NULL;
-	}
+	for (queue_index = 0; queue_index < num_queues; ++queue_index) {
+		queue = &vif->queues[queue_index];

-	if (vif->dealloc_task) {
-		kthread_stop(vif->dealloc_task);
-		vif->dealloc_task = NULL;
-	}
-
-	if (vif->tx_irq) {
-		if (vif->tx_irq == vif->rx_irq)
-			unbind_from_irqhandler(vif->tx_irq, vif);
-		else {
-			unbind_from_irqhandler(vif->tx_irq, vif);
-			unbind_from_irqhandler(vif->rx_irq, vif);
+		if (queue->task) {
+			del_timer_sync(&queue->wake_queue);
+			kthread_stop(queue->task);
+			queue->task = NULL;
 		}
-		vif->tx_irq = 0;
-	}

-	xenvif_unmap_frontend_rings(vif);
+		if (queue->dealloc_task) {
+			kthread_stop(queue->dealloc_task);
+			queue->dealloc_task = NULL;
+		}
+
+		if (queue->tx_irq) {
+			if (queue->tx_irq == queue->rx_irq)
+				unbind_from_irqhandler(queue->tx_irq, queue);
+			else {
+				unbind_from_irqhandler(queue->tx_irq, queue);
+				unbind_from_irqhandler(queue->rx_irq, queue);
+			}
+			queue->tx_irq = 0;
+		}
+
+		xenvif_unmap_frontend_rings(queue);
+	}
+}
+
+/* Reverse the relevant parts of xenvif_init_queue().
+ * Used for queue teardown from xenvif_free(), and on the
+ * error handling paths in xenbus.c:connect().
+ */
+void xenvif_deinit_queue(struct xenvif_queue *queue)
+{
+	free_xenballooned_pages(MAX_PENDING_REQS, queue->mmap_pages);
+	netif_napi_del(&queue->napi);
 }

 void xenvif_free(struct xenvif *vif)
 {
-	int i, unmap_timeout = 0;
+	struct xenvif_queue *queue = NULL;
+	unsigned int num_queues = vif->dev->real_num_tx_queues;
+	unsigned int queue_index;
 	/* Here we want to avoid timeout messages if an skb can be legitimately
 	 * stuck somewhere else. Realistically this could be an another vif's
 	 * internal or QDisc queue. That another vif also has this
@ -575,33 +739,21 @@ void xenvif_free(struct xenvif *vif)
 	unsigned int worst_case_skb_lifetime = (rx_drain_timeout_msecs/1000) *
 		DIV_ROUND_UP(XENVIF_QUEUE_LENGTH, (XEN_NETIF_RX_RING_SIZE / MAX_SKB_FRAGS));

-	for (i = 0; i < MAX_PENDING_REQS; ++i) {
-		if (vif->grant_tx_handle[i] != NETBACK_INVALID_HANDLE) {
-			unmap_timeout++;
-			schedule_timeout(msecs_to_jiffies(1000));
-			if (unmap_timeout > worst_case_skb_lifetime &&
-			    net_ratelimit())
-				netdev_err(vif->dev,
-					   "Page still granted! Index: %x\n",
-					   i);
-			/* If there are still unmapped pages, reset the loop to
-			 * start checking again. We shouldn't exit here until
-			 * dealloc thread and NAPI instance release all the
-			 * pages. If a kernel bug causes the skbs to stall
-			 * somewhere, the interface cannot be brought down
-			 * properly.
-			 */
-			i = -1;
-		}
-	}
-
-	free_xenballooned_pages(MAX_PENDING_REQS, vif->mmap_pages);
-
-	netif_napi_del(&vif->napi);
-
 	unregister_netdev(vif->dev);

-	vfree(vif->grant_copy_op);
+	for (queue_index = 0; queue_index < num_queues; ++queue_index) {
+		queue = &vif->queues[queue_index];
+		xenvif_wait_unmap_timeout(queue, worst_case_skb_lifetime);
+		xenvif_deinit_queue(queue);
+	}
+
+	/* Free the array of queues. The call below does not require
+	 * rtnl_lock() because it happens after unregister_netdev().
+	 */
+	netif_set_real_num_tx_queues(vif->dev, 0);
+	vfree(vif->queues);
+	vif->queues = NULL;
+
 	free_netdev(vif->dev);

 	module_put(THIS_MODULE);
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@ -19,6 +19,8 @@
 */

 #include "common.h"
+#include <linux/vmalloc.h>
+#include <linux/rtnetlink.h>

 struct backend_info {
 	struct xenbus_device *dev;
@ -34,8 +36,9 @@ struct backend_info {
 	u8 have_hotplug_status_watch:1;
 };

-static int connect_rings(struct backend_info *);
-static void connect(struct backend_info *);
+static int connect_rings(struct backend_info *be, struct xenvif_queue *queue);
+static void connect(struct backend_info *be);
+static int read_xenbus_vif_flags(struct backend_info *be);
 static void backend_create_xenvif(struct backend_info *be);
 static void unregister_hotplug_status_watch(struct backend_info *be);
 static void set_backend_state(struct backend_info *be,
@ -157,6 +160,12 @@ static int netback_probe(struct xenbus_device *dev,
 	if (err)
 		pr_debug("Error writing feature-split-event-channels\n");

+	/* Multi-queue support: This is an optional feature. */
+	err = xenbus_printf(XBT_NIL, dev->nodename,
+			    "multi-queue-max-queues", "%u", xenvif_max_queues);
+	if (err)
+		pr_debug("Error writing multi-queue-max-queues\n");
+
 	err = xenbus_switch_state(dev, XenbusStateInitWait);
 	if (err)
 		goto fail;
@ -485,10 +494,26 @@ static void connect(struct backend_info *be)
 {
 	int err;
 	struct xenbus_device *dev = be->dev;
+	unsigned long credit_bytes, credit_usec;
+	unsigned int queue_index;
+	unsigned int requested_num_queues;
+	struct xenvif_queue *queue;

-	err = connect_rings(be);
-	if (err)
+	/* Check whether the frontend requested multiple queues
+	 * and read the number requested.
+	 */
+	err = xenbus_scanf(XBT_NIL, dev->otherend,
+			   "multi-queue-num-queues",
+			   "%u", &requested_num_queues);
+	if (err < 0) {
+		requested_num_queues = 1; /* Fall back to single queue */
+	} else if (requested_num_queues > xenvif_max_queues) {
+		/* buggy or malicious guest */
+		xenbus_dev_fatal(dev, err,
+				 "guest requested %u queues, exceeding the maximum of %u.",
+				 requested_num_queues, xenvif_max_queues);
 		return;
+	}

 	err = xen_net_read_mac(dev, be->vif->fe_dev_addr);
 	if (err) {
@ -496,9 +521,54 @@ static void connect(struct backend_info *be)
 		return;
 	}

-	xen_net_read_rate(dev, &be->vif->credit_bytes,
-			  &be->vif->credit_usec);
-	be->vif->remaining_credit = be->vif->credit_bytes;
+	xen_net_read_rate(dev, &credit_bytes, &credit_usec);
+	read_xenbus_vif_flags(be);
+
+	/* Use the number of queues requested by the frontend */
+	be->vif->queues = vzalloc(requested_num_queues *
+				  sizeof(struct xenvif_queue));
+	rtnl_lock();
+	netif_set_real_num_tx_queues(be->vif->dev, requested_num_queues);
+	rtnl_unlock();
+
+	for (queue_index = 0; queue_index < requested_num_queues; ++queue_index) {
+		queue = &be->vif->queues[queue_index];
+		queue->vif = be->vif;
+		queue->id = queue_index;
+		snprintf(queue->name, sizeof(queue->name), "%s-q%u",
+				be->vif->dev->name, queue->id);
+
+		err = xenvif_init_queue(queue);
+		if (err) {
+			/* xenvif_init_queue() cleans up after itself on
+			 * failure, but we need to clean up any previously
+			 * initialised queues. Set num_queues to i so that
+			 * earlier queues can be destroyed using the regular
+			 * disconnect logic.
+			 */
+			rtnl_lock();
+			netif_set_real_num_tx_queues(be->vif->dev, queue_index);
+			rtnl_unlock();
+			goto err;
+		}
+
+		queue->remaining_credit = credit_bytes;
+
+		err = connect_rings(be, queue);
+		if (err) {
+			/* connect_rings() cleans up after itself on failure,
+			 * but we need to clean up after xenvif_init_queue() here,
+			 * and also clean up any previously initialised queues.
+			 */
+			xenvif_deinit_queue(queue);
+			rtnl_lock();
+			netif_set_real_num_tx_queues(be->vif->dev, queue_index);
+			rtnl_unlock();
+			goto err;
+		}
+	}
+
+	xenvif_carrier_on(be->vif);

 	unregister_hotplug_status_watch(be);
 	err = xenbus_watch_pathfmt(dev, &be->hotplug_status_watch,
@ -507,45 +577,109 @@ static void connect(struct backend_info *be)
 	if (!err)
 		be->have_hotplug_status_watch = 1;

-	netif_wake_queue(be->vif->dev);
+	netif_tx_wake_all_queues(be->vif->dev);
+
+	return;
+
+err:
+	if (be->vif->dev->real_num_tx_queues > 0)
+		xenvif_disconnect(be->vif); /* Clean up existing queues */
+	vfree(be->vif->queues);
+	be->vif->queues = NULL;
+	rtnl_lock();
+	netif_set_real_num_tx_queues(be->vif->dev, 0);
+	rtnl_unlock();
+	return;
 }


-static int connect_rings(struct backend_info *be)
+static int connect_rings(struct backend_info *be, struct xenvif_queue *queue)
 {
-	struct xenvif *vif = be->vif;
 	struct xenbus_device *dev = be->dev;
+	unsigned int num_queues = queue->vif->dev->real_num_tx_queues;
 	unsigned long tx_ring_ref, rx_ring_ref;
-	unsigned int tx_evtchn, rx_evtchn, rx_copy;
+	unsigned int tx_evtchn, rx_evtchn;
 	int err;
-	int val;
+	char *xspath;
+	size_t xspathsize;
+	const size_t xenstore_path_ext_size = 11; /* sufficient for "/queue-NNN" */

-	err = xenbus_gather(XBT_NIL, dev->otherend,
+	/* If the frontend requested 1 queue, or we have fallen back
+	 * to single queue due to lack of frontend support for multi-
+	 * queue, expect the remaining XenStore keys in the toplevel
+	 * directory. Otherwise, expect them in a subdirectory called
+	 * queue-N.
+	 */
+	if (num_queues == 1) {
+		xspath = kzalloc(strlen(dev->otherend) + 1, GFP_KERNEL);
+		if (!xspath) {
+			xenbus_dev_fatal(dev, -ENOMEM,
+					 "reading ring references");
+			return -ENOMEM;
+		}
+		strcpy(xspath, dev->otherend);
+	} else {
+		xspathsize = strlen(dev->otherend) + xenstore_path_ext_size;
+		xspath = kzalloc(xspathsize, GFP_KERNEL);
+		if (!xspath) {
+			xenbus_dev_fatal(dev, -ENOMEM,
+					 "reading ring references");
+			return -ENOMEM;
+		}
+		snprintf(xspath, xspathsize, "%s/queue-%u", dev->otherend,
+			 queue->id);
+	}
+
+	err = xenbus_gather(XBT_NIL, xspath,
 			    "tx-ring-ref", "%lu", &tx_ring_ref,
 			    "rx-ring-ref", "%lu", &rx_ring_ref, NULL);
 	if (err) {
 		xenbus_dev_fatal(dev, err,
 				 "reading %s/ring-ref",
-				 dev->otherend);
-		return err;
+				 xspath);
+		goto err;
 	}

 	/* Try split event channels first, then single event channel. */
-	err = xenbus_gather(XBT_NIL, dev->otherend,
+	err = xenbus_gather(XBT_NIL, xspath,
 			    "event-channel-tx", "%u", &tx_evtchn,
 			    "event-channel-rx", "%u", &rx_evtchn, NULL);
 	if (err < 0) {
-		err = xenbus_scanf(XBT_NIL, dev->otherend,
+		err = xenbus_scanf(XBT_NIL, xspath,
 				   "event-channel", "%u", &tx_evtchn);
 		if (err < 0) {
 			xenbus_dev_fatal(dev, err,
 					 "reading %s/event-channel(-tx/rx)",
-					 dev->otherend);
-			return err;
+					 xspath);
+			goto err;
 		}
 		rx_evtchn = tx_evtchn;
 	}

+	/* Map the shared frame, irq etc. */
+	err = xenvif_connect(queue, tx_ring_ref, rx_ring_ref,
+			     tx_evtchn, rx_evtchn);
+	if (err) {
+		xenbus_dev_fatal(dev, err,
+				 "mapping shared-frames %lu/%lu port tx %u rx %u",
+				 tx_ring_ref, rx_ring_ref,
+				 tx_evtchn, rx_evtchn);
+		goto err;
+	}
+
+	err = 0;
+err: /* Regular return falls through with err == 0 */
+	kfree(xspath);
+	return err;
+}
+
+static int read_xenbus_vif_flags(struct backend_info *be)
+{
+	struct xenvif *vif = be->vif;
+	struct xenbus_device *dev = be->dev;
+	unsigned int rx_copy;
+	int err, val;
+
 	err = xenbus_scanf(XBT_NIL, dev->otherend, "request-rx-copy", "%u",
 			   &rx_copy);
 	if (err == -ENOENT) {
@ -621,16 +755,6 @@ static int connect_rings(struct backend_info *be)
 		val = 0;
 	vif->ipv6_csum = !!val;

-	/* Map the shared frame, irq etc. */
-	err = xenvif_connect(vif, tx_ring_ref, rx_ring_ref,
-			     tx_evtchn, rx_evtchn);
-	if (err) {
-		xenbus_dev_fatal(dev, err,
-				 "mapping shared-frames %lu/%lu port tx %u rx %u",
-				 tx_ring_ref, rx_ring_ref,
-				 tx_evtchn, rx_evtchn);
-		return err;
-	}
 	return 0;
 }

--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
--- a/include/xen/interface/io/netif.h
+++ b/include/xen/interface/io/netif.h
@ -50,6 +50,59 @@
 * node as before.
 */

+/*
+ * Multiple transmit and receive queues:
+ * If supported, the backend will write the key "multi-queue-max-queues" to
+ * the directory for that vif, and set its value to the maximum supported
+ * number of queues.
+ * Frontends that are aware of this feature and wish to use it can write the
+ * key "multi-queue-num-queues", set to the number they wish to use, which
+ * must be greater than zero, and no more than the value reported by the backend
+ * in "multi-queue-max-queues".
+ *
+ * Queues replicate the shared rings and event channels.
+ * "feature-split-event-channels" may optionally be used when using
+ * multiple queues, but is not mandatory.
+ *
+ * Each queue consists of one shared ring pair, i.e. there must be the same
+ * number of tx and rx rings.
+ *
+ * For frontends requesting just one queue, the usual event-channel and
+ * ring-ref keys are written as before, simplifying the backend processing
+ * to avoid distinguishing between a frontend that doesn't understand the
+ * multi-queue feature, and one that does, but requested only one queue.
+ *
+ * Frontends requesting two or more queues must not write the toplevel
+ * event-channel (or event-channel-{tx,rx}) and {tx,rx}-ring-ref keys,
+ * instead writing those keys under sub-keys having the name "queue-N" where
+ * N is the integer ID of the queue for which those keys belong. Queues
+ * are indexed from zero. For example, a frontend with two queues and split
+ * event channels must write the following set of queue-related keys:
+ *
+ * /local/domain/1/device/vif/0/multi-queue-num-queues = "2"
+ * /local/domain/1/device/vif/0/queue-0 = ""
+ * /local/domain/1/device/vif/0/queue-0/tx-ring-ref = "<ring-ref-tx0>"
+ * /local/domain/1/device/vif/0/queue-0/rx-ring-ref = "<ring-ref-rx0>"
+ * /local/domain/1/device/vif/0/queue-0/event-channel-tx = "<evtchn-tx0>"
+ * /local/domain/1/device/vif/0/queue-0/event-channel-rx = "<evtchn-rx0>"
+ * /local/domain/1/device/vif/0/queue-1 = ""
+ * /local/domain/1/device/vif/0/queue-1/tx-ring-ref = "<ring-ref-tx1>"
+ * /local/domain/1/device/vif/0/queue-1/rx-ring-ref = "<ring-ref-rx1"
+ * /local/domain/1/device/vif/0/queue-1/event-channel-tx = "<evtchn-tx1>"
+ * /local/domain/1/device/vif/0/queue-1/event-channel-rx = "<evtchn-rx1>"
+ *
+ * If there is any inconsistency in the XenStore data, the backend may
+ * choose not to connect any queues, instead treating the request as an
+ * error. This includes scenarios where more (or fewer) queues were
+ * requested than the frontend provided details for.
+ *
+ * Mapping of packets to queues is considered to be a function of the
+ * transmitting system (backend or frontend) and is not negotiated
+ * between the two. Guests are free to transmit packets on any queue
+ * they choose, provided it has been set up correctly. Guests must be
+ * prepared to receive packets on any queue they have requested be set up.
+ */
+
 /*
 * "feature-no-csum-offload" should be used to turn IPv4 TCP/UDP checksum
 * offload off or on. If it is missing then the feature is assumed to be on.