diff --git a/docs/internals/rpc.html.in b/docs/internals/rpc.html.in new file mode 100644 index 0000000000..2c0c9836db --- /dev/null +++ b/docs/internals/rpc.html.in @@ -0,0 +1,875 @@ + +
++ libvirt includes a basic protocol and code to implement + an extensible, secure client/server RPC service. This was + originally designed for communication between the libvirt + client library and the libvirtd daemon, but the code is + now isolated to allow reuse in other areas of libvirt code. + This document provides an overview of the protocol and + structure / operation of the internal RPC library APIs. +
+ + +
+ libvirt uses a simple, variable length, packet based RPC protocol.
+ All structured data within packets is encoded using the
+ XDR standard
+ as currently defined by RFC 4506.
+ On any connection running the RPC protocol, there can be multiple
+ programs active, each supporting one or more versions. A program
+ defines a set of procedures that it supports. The procedures can
+ support call+reply method invocation, asynchronous events,
+ and generic data streams. Method invocations can be overlapped,
+ so waiting for a reply to one will not block the receipt of the
+ reply to another outstanding method. The protocol was loosely
+ inspired by the design of SunRPC. The definition of the RPC
+ protocol is in the file src/rpc/virnetprotocol.x
+ in the libvirt source tree.
+
+ On the wire, there is no explicit packet framing marker. Instead + each packet is preceded by an unsigned 32-bit integer giving + the total length of the packet in bytes. This length includes + the 4 bytes of the length word itself. Conceptually the framing + looks like this: +
+ ++|~~~ Packet 1 ~~~|~~~ Packet 2 ~~~|~~~ Packet 3 ~~~|~~~ + ++-------+------------+-------+------------+-------+------------+... +| n=U32 | (n-4) * U8 | n=U32 | (n-4) * U8 | n=U32 | (n-4) * U8 | ++-------+------------+-------+------------+-------+------------+... + +|~ Len ~|~ Data ~|~ Len ~|~ Data ~|~ Len ~|~ Data ~|~ + ++ +
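To make the framing rules concrete, here is a minimal sketch of reading one packet from a connected socket. This is not the actual libvirt implementation; the read_all helper and the max_packet parameter are introduced purely for this example, and the real code additionally has to cope with non-blocking sockets and any TLS/SASL layering:

    #include <stdint.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <arpa/inet.h>

    /* Read exactly 'want' bytes, failing on error or premature EOF */
    static int read_all(int fd, void *buf, size_t want)
    {
        uint8_t *p = buf;
        while (want > 0) {
            ssize_t got = read(fd, p, want);
            if (got <= 0)
                return -1;
            p += got;
            want -= got;
        }
        return 0;
    }

    /* Read one length-prefixed packet; returns the number of bytes
     * stored in *data (header + payload, excluding the length word)
     * or -1 on error */
    static ssize_t read_packet(int fd, uint8_t **data, uint32_t max_packet)
    {
        uint32_t len;

        if (read_all(fd, &len, 4) < 0)
            return -1;
        len = ntohl(len);            /* XDR integers are big-endian */

        /* The length word counts itself, so a well-formed packet is at
         * least 4 (length) + 24 (header) bytes and no bigger than the
         * size limit agreed with the peer */
        if (len < 4 + 24 || len > max_packet)
            return -1;

        if (!(*data = malloc(len - 4)))
            return -1;
        if (read_all(fd, *data, len - 4) < 0) {
            free(*data);
            return -1;
        }
        return len - 4;
    }

A real implementation would then decode the six header fields with an XDR decoder before trusting anything in the payload.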
+ The data in each packet is split into two parts, a short + fixed length header, followed by a variable length payload. + So a packet from the illustration above is more correctly + shown as +
+ ++ ++-------+-------------+---------------....---+ +| n=U32 | 6*U32 | (n-(7*4))*U8 | ++-------+-------------+---------------....---+ + +|~ Len ~|~ Header ~|~ Payload .... ~| ++ + +
+ The header contains 6 fields, encoded as signed/unsigned 32-bit + integers. +
+ +++---------------+ +| program=U32 | ++---------------+ +| version=U32 | ++---------------+ +| procedure=S32 | ++---------------+ +| type=S32 | ++---------------+ +| serial=U32 | ++---------------+ +| status=S32 | ++---------------+ ++ +
program
version
procedure
type
+ This can be one of the following enumeration values +
+serial
status
+ This can be one of the following enumeration values +
+
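The enumeration values referred to for the type and status fields, together with the header itself, are declared in src/rpc/virnetprotocol.x roughly as follows. This is an abridged paraphrase rather than a verbatim copy; consult that file for the authoritative definitions and any additional values:

    enum virNetMessageType {
        VIR_NET_CALL = 0,      /* client initiated method invocation */
        VIR_NET_REPLY = 1,     /* server response to a method invocation */
        VIR_NET_MESSAGE = 2,   /* server initiated asynchronous event */
        VIR_NET_STREAM = 3     /* data packet for a stream */
    };

    enum virNetMessageStatus {
        VIR_NET_OK = 0,        /* success; a reply carries return values */
        VIR_NET_ERROR = 1,     /* failure; the payload is an error object */
        VIR_NET_CONTINUE = 2   /* more data follows on a stream */
    };

    struct virNetMessageHeader {
        unsigned prog;               /* program number */
        unsigned vers;               /* program version */
        int proc;                    /* procedure number within the program */
        virNetMessageType type;
        unsigned serial;             /* matches replies and stream data to
                                        the original method call */
        virNetMessageStatus status;
    };

The six columns shown in the packet diagrams later in this document correspond to these fields, in this order.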
+ The payload of a packet will vary depending on the type
+ and status
fields from the header.
+
+ For the exact payload information for each procedure, consult the XDR protocol + definition for the program+version in question +
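Concretely: for a method call the payload is the XDR-encoded argument structure of the procedure, for a successful reply it is the XDR-encoded return structure, for an error reply it is an encoded error object, for an asynchronous event it is the encoded event parameters, and for stream packets it is the raw stream bytes. As a purely hypothetical illustration of how a program ties these together (none of the names below exist in a real protocol file; the real definitions live in files such as src/remote/remote_protocol.x), an args/ret pair for the example procedure used in the diagrams below might be declared like this:

    /* Hypothetical program/version/procedure matching the example
     * packet diagrams (program=8, version=1, procedure=3) */
    const EXAMPLE_PROGRAM = 8;
    const EXAMPLE_PROTOCOL_VERSION = 1;

    enum example_procedure {
        EXAMPLE_PROC_OBJECT_LOOKUP = 3
    };

    /* Payload of the call packet (type=call) */
    struct example_object_lookup_args {
        string name<256>;          /* variable length, bounded string */
    };

    /* Payload of the reply packet (type=reply, status=ok) */
    struct example_object_lookup_ret {
        opaque uuid[16];           /* fixed length binary value */
    };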
+ ++ The following diagrams illustrate some example packet exchanges + between a client and server +
+ ++ A single method call and successful + reply, for a program=8, version=1, procedure=3, which has 10 bytes worth + of input args, and 4 bytes worth of return values. The overall input + packet length is 4 + 24 + 10 == 38, and the output packet length is 4 + 24 + 4 == 32 +
+ ++ +--+-----------------------+-----------+ + C --> |38| 8 | 1 | 3 | 0 | 1 | 0 | .o.oOo.o. | --> S (call) + +--+-----------------------+-----------+ + + +--+-----------------------+--------+ + C <-- |32| 8 | 1 | 3 | 1 | 1 | 0 | .o.oOo | <-- S (reply) + +--+-----------------------+--------+ ++ +
+ An unsuccessful method call will instead return an error object +
+ ++ +--+-----------------------+-----------+ + C --> |38| 8 | 1 | 3 | 0 | 1 | 0 | .o.oOo.o. | --> S (call) + +--+-----------------------+-----------+ + + +--+-----------------------+--------------------------+ + C <-- |48| 8 | 1 | 3 | 1 | 1 | 1 | .o.oOo.o.oOo.o.oOo.o.oOo | <-- S (error) + +--+-----------------------+--------------------------+ ++ +
+ A method call which also involves uploading some data over + a stream will result in +
+ ++ +--+-----------------------+-----------+ + C --> |38| 8 | 1 | 3 | 0 | 1 | 0 | .o.oOo.o. | --> S (call) + +--+-----------------------+-----------+ + + +--+-----------------------+--------+ + C <-- |32| 8 | 1 | 3 | 1 | 1 | 0 | .o.oOo | <-- S (reply) + +--+-----------------------+--------+ + + +--+-----------------------+-------------....-------+ + C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream data up) + +--+-----------------------+-------------....-------+ + +--+-----------------------+-------------....-------+ + C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream data up) + +--+-----------------------+-------------....-------+ + +--+-----------------------+-------------....-------+ + C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream data up) + +--+-----------------------+-------------....-------+ + ... + +--+-----------------------+-------------....-------+ + C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream data up) + +--+-----------------------+-------------....-------+ + +--+-----------------------+ + C --> |24| 8 | 1 | 3 | 3 | 1 | 0 | --> S (stream finish) + +--+-----------------------+ + +--+-----------------------+ + C <-- |24| 8 | 1 | 3 | 3 | 1 | 0 | <-- S (stream finish) + +--+-----------------------+ ++ +
+ A method call which also involves a bi-directional stream will + result in +
+ ++ +--+-----------------------+-----------+ + C --> |38| 8 | 1 | 3 | 0 | 1 | 0 | .o.oOo.o. | --> S (call) + +--+-----------------------+-----------+ + + +--+-----------------------+--------+ + C <-- |32| 8 | 1 | 3 | 1 | 1 | 0 | .o.oOo | <-- S (reply) + +--+-----------------------+--------+ + + +--+-----------------------+-------------....-------+ + C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream data up) + +--+-----------------------+-------------....-------+ + +--+-----------------------+-------------....-------+ + C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream data up) + +--+-----------------------+-------------....-------+ + +--+-----------------------+-------------....-------+ + C <-- |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | <-- S (stream data down) + +--+-----------------------+-------------....-------+ + +--+-----------------------+-------------....-------+ + C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream data up) + +--+-----------------------+-------------....-------+ + +--+-----------------------+-------------....-------+ + C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream data up) + +--+-----------------------+-------------....-------+ + +--+-----------------------+-------------....-------+ + C <-- |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | <-- S (stream data down) + +--+-----------------------+-------------....-------+ + +--+-----------------------+-------------....-------+ + C <-- |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | <-- S (stream data down) + +--+-----------------------+-------------....-------+ + +--+-----------------------+-------------....-------+ + C <-- |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | <-- S (stream data down) + +--+-----------------------+-------------....-------+ + +--+-----------------------+-------------....-------+ + C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream data up) + +--+-----------------------+-------------....-------+ + .. + +--+-----------------------+-------------....-------+ + C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream data up) + +--+-----------------------+-------------....-------+ + +--+-----------------------+ + C --> |24| 8 | 1 | 3 | 3 | 1 | 0 | --> S (stream finish) + +--+-----------------------+ + +--+-----------------------+ + C <-- |24| 8 | 1 | 3 | 3 | 1 | 0 | <-- S (stream finish) + +--+-----------------------+ ++ + +
Finally, an example of several overlapping method calls being processed. Replies may be sent in a different order to that in which the calls were made; the serial field in the header is what allows each reply to be matched with its original call
+ +--+-----------------------+-----------+ + C --> |38| 8 | 1 | 3 | 0 | 1 | 0 | .o.oOo.o. | --> S (call 1) + +--+-----------------------+-----------+ + +--+-----------------------+-----------+ + C --> |38| 8 | 1 | 3 | 0 | 2 | 0 | .o.oOo.o. | --> S (call 2) + +--+-----------------------+-----------+ + +--+-----------------------+--------+ + C <-- |32| 8 | 1 | 3 | 1 | 2 | 0 | .o.oOo | <-- S (reply 2) + +--+-----------------------+--------+ + +--+-----------------------+-----------+ + C --> |38| 8 | 1 | 3 | 0 | 3 | 0 | .o.oOo.o. | --> S (call 3) + +--+-----------------------+-----------+ + +--+-----------------------+--------+ + C <-- |32| 8 | 1 | 3 | 1 | 3 | 0 | .o.oOo | <-- S (reply 3) + +--+-----------------------+--------+ + +--+-----------------------+-----------+ + C --> |38| 8 | 1 | 3 | 0 | 4 | 0 | .o.oOo.o. | --> S (call 4) + +--+-----------------------+-----------+ + +--+-----------------------+--------+ + C <-- |32| 8 | 1 | 3 | 1 | 1 | 0 | .o.oOo | <-- S (reply 1) + +--+-----------------------+--------+ + +--+-----------------------+--------+ + C <-- |32| 8 | 1 | 3 | 1 | 4 | 0 | .o.oOo | <-- S (reply 4) + +--+-----------------------+--------+ ++ + +
+ There are various things to consider to ensure an implementation + of the RPC protocol can be satisfactorily secured +
+ ++ The basic RPC protocol does not define or require any specific + authentication/encryption capabilities. A generic solution to + providing encryption for the protocol is to run the protocol + over a TLS encrypted data stream. x509 certificate checks can + be done to form a crude authentication mechanism. It is also + possible for an RPC program to negotiate an encryption / + authentication capability, such as SASL, which may then also + provide per-packet data encryption. Finally the protocol data + stream can of course be tunnelled over transports such as SSH. +
+ ++ Although the protocol itself defines many arbitrarily sized data values in the + payloads, to avoid denial of service attacks there are a number of size limit + checks prior to encoding or decoding data. There is a limit on the maximum + size of a single RPC message, a limit on the maximum string length, and limits + on any other parameter which uses a variable length array. These limits can + be raised, subject to agreement between client/server, without otherwise + breaking compatibility of the RPC data on the wire. +
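In the XDR protocol definitions these limits appear as explicit bounds on strings and variable length arrays. For example, the remote driver protocol uses a REMOTE_STRING_MAX constant for this purpose; the exact value shown below is illustrative rather than authoritative:

    const REMOTE_STRING_MAX = 65536;

    /* Strings on the wire are declared with an explicit bound, so the
     * XDR decoder rejects oversized values before allocating them */
    typedef string remote_nonnull_string<REMOTE_STRING_MAX>;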
+ ++ It is important that all data be fully validated before performing + any actions based on the data. When reading an RPC packet, the + first four bytes must be read and the max packet size limit validated, + before any attempt is made to read the variable length packet data. + After a complete packet has been read, the header must be decoded + and all 6 fields fully validated, before attempting to dispatch + the payload. Once dispatched, the payload can be decoded and passed + onto the appropriate API for execution. The RPC code must not take + any action based on the payload, since it has no way to validate + the semantics of the payload data. It must delegate this to the + execution API (e.g. corresponding libvirt public API). +
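The following sketch shows the kind of checks implied by this ordering, applied by a server to the six decoded header fields before any dispatch happens. The helper functions and the EX_* constants are placeholders for this example only; the constants mirror the type/status values used in the packet diagrams above:

    #include <stdbool.h>

    /* Values mirroring the type/status enumerations sketched earlier */
    enum { EX_TYPE_CALL = 0, EX_TYPE_STREAM = 3 };
    enum { EX_STATUS_OK = 0, EX_STATUS_CONTINUE = 2 };

    /* Placeholder helpers - not real libvirt functions */
    bool known_program(unsigned prog, unsigned vers);
    bool procedure_allowed_before_auth(unsigned prog, int proc);

    /* Checks applied to the six decoded header fields before the
     * payload is dispatched for decoding and execution */
    static bool server_header_is_valid(unsigned prog, unsigned vers, int proc,
                                       int type, int status,
                                       bool client_authenticated)
    {
        if (!known_program(prog, vers))
            return false;

        /* A server only expects method calls and stream data from
         * clients, never replies or asynchronous events */
        if (type != EX_TYPE_CALL && type != EX_TYPE_STREAM)
            return false;

        if (status != EX_STATUS_OK && status != EX_STATUS_CONTINUE)
            return false;

        /* Prior to authentication only whitelisted procedures are allowed */
        if (!client_authenticated &&
            !procedure_allowed_before_auth(prog, proc))
            return false;

        return true;
    }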
+ +
+ The generic internal RPC library code lives in the src/rpc/
+ directory of the libvirt source tree. Unless otherwise noted, the
+ objects are all threadsafe. The core object types and their
+ purposes are:
+
+ The following is a high level overview of the role of each + of the main RPC objects +
+ +virNetSASLContextPtr
(virnetsaslcontext.h)virNetSASLSessionPtr
(virnetsaslcontext.h)virNetTLSContextPtr
(virnettlscontext.h)virNetTLSSessionPtr
(virnettlscontext.h)virNetSocketPtr
(virnetsocket.h)virNetMessagePtr
(virnetmessage.h)virNetClientPtr
(virnetclient.h)virNetClientProgramPtr
(virnetclientprogram.h)virNetClientStreamPtr
(virnetclientstream.h)virNetServerPtr
(virnetserver.h)virNetServerMDNSPtr
(virnetservermdns.h)virNetServerClientPtr
(virnetserverclient.h)virNetServerProgramPtr
(virnetserverprogram.h)virNetServerServicePtr
(virnetserverservice.h)+ The client RPC code must allow for multiple overlapping RPC method + calls to be invoked, transmission and receipt of data for multiple + streams and receipt of asynchronous events. Understandably this + involves coordination of multiple threads. +
+ ++ The core requirement in the client dispatch code is that only + one thread is allowed to be performing I/O on the socket at + any time. This thread is said to be "holding the buck". When + any other thread comes along and needs to do I/O it must place + its packets on a queue and delegate processing of them to the + thread that has the buck. That thread will send out the queued method + call, and if it sees the corresponding reply it will pass it back to the waiting + thread. If the other thread's reply hasn't arrived by the time + the buck-holding thread has got its own reply, then it will transfer + responsibility for I/O to the thread that has been waiting the + longest. It is said to be "passing the buck" for I/O. +
+ ++ When no thread is performing any RPC method call, or sending + stream data there is still a need to monitor the socket for + incoming I/O related to asynchronous events, or stream data + receipt. For this task, a watch is registered with the event + loop which triggers whenever the socket is readable. This + watch is automatically disabled whenever any other thread + grabs the buck, and re-enabled when the buck is released. +
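The following pthreads-based sketch shows the shape of this pattern. It is a simplification for illustration, not the real virNetClient code: queueing of outgoing packets, matching of replies by serial number, and the stream/event cases are reduced to placeholder helpers (assumed to do their own locking of the queues), and instead of explicitly handing the buck to the longest waiting thread it simply wakes all waiters:

    #include <pthread.h>
    #include <stdbool.h>

    typedef struct {
        pthread_mutex_t lock;
        pthread_cond_t cond;   /* signalled when the buck is released
                                  or a reply arrives */
        bool buck_held;        /* is some thread currently doing socket I/O? */
        /* queues of outgoing calls and received replies omitted */
    } example_client;

    /* Placeholder helpers assumed to exist for this sketch */
    void queue_call(example_client *c, unsigned serial);
    bool reply_arrived(example_client *c, unsigned serial);
    void send_queued_calls(example_client *c);        /* writes pending packets */
    void read_one_incoming_packet(example_client *c); /* may complete any thread's reply */

    void example_call(example_client *c, unsigned serial)
    {
        pthread_mutex_lock(&c->lock);
        queue_call(c, serial);

        while (!reply_arrived(c, serial)) {
            if (!c->buck_held) {
                /* Nobody is doing I/O: grab the buck and service the
                 * socket on behalf of every thread with a queued call */
                c->buck_held = true;
                pthread_mutex_unlock(&c->lock);
                send_queued_calls(c);
                read_one_incoming_packet(c);
                pthread_mutex_lock(&c->lock);
                c->buck_held = false;
                /* The real code passes the buck to the longest waiter;
                 * here we wake everyone and let one of them grab it */
                pthread_cond_broadcast(&c->cond);
            } else {
                /* Another thread holds the buck: sleep until woken with
                 * our reply, or to take over the I/O duties ourselves */
                pthread_cond_wait(&c->cond, &c->lock);
            }
        }
        pthread_mutex_unlock(&c->lock);
    }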
+ ++ In the first example, a second thread issues an API call + while the first thread holds the buck. The reply to the + first call arrives first, so the buck is passed to the + second thread. +
+ ++ Thread-1 + | + V + Call API1() + | + V + Grab Buck + | Thread-2 + V | + Send method1 V + | Call API2() + V | + Wait I/O V + |<--------Queue method2 + V | + Send method2 V + | Wait for buck + V | + Wait I/O | + | | + V | + Recv reply1 | + | | + V | + Pass the buck----->| + | V + V Wait I/O + Return API1() | + V + Recv reply2 + | + V + Release the buck + | + V + Return API2() ++ +
+ ++ In this second example, a second thread issues an API call + which is sent and replied to before the first thread's + API call has completed. The first thread thus notifies + the second that its reply is ready, and there is no need + to pass the buck +
+ ++ Thread-1 + | + V + Call API1() + | + V + Grab Buck + | Thread-2 + V | + Send method1 V + | Call API2() + V | + Wait I/O V + |<--------Queue method2 + V | + Send method2 V + | Wait for buck + V | + Wait I/O | + | | + V | + Recv reply2 | + | | + V | + Notify reply2------>| + | V + V Return API2() + Wait I/O + | + V + Recv reply1 + | + V + Release the buck + | + V + Return API1() ++ +
+ In this example, only one thread is present and it has to + deal with some async events arriving. The events are actually + dispatched to the application from the event loop thread +
+ ++ Thread-1 + | + V + Call API1() + | + V + Grab Buck + | + V + Send method1 + | + V + Wait I/O + | Event thread + V ... + Recv event1 | + | V + V Wait for timer/fd + Queue event1 | + | V + V Timer fires + Wait I/O | + | V + V Emit event1 + Recv reply1 | + | V + V Wait for timer/fd + Return API1() | + ... ++ +
+ The RPC server code must support receipt of incoming RPC requests from + multiple client connections, and parallel processing of all RPC + requests, even many from a single client. This goal is achieved through + a combination of event driven I/O, and multiple processing threads. +
+ ++ The main libvirt event loop thread is responsible for performing all + socket I/O. It will read incoming packets from clients and will + transmit outgoing packets to clients. It will handle the I/O to/from + streams associated with client API calls. When doing client I/O it + will also pass the data through any applicable encryption layer + (through use of the virNetSocket / virNetTLSSession and virNetSASLSession + integration). What is paramount is that the event loop thread never + does any task that can take a non-trivial amount of time. +
+ ++ When reading packets, the event loop will first read the 4 byte length + word. This is validated to make sure it does not exceed the maximum + permissible packet size, and the client is set to allow receipt of the + rest of the packet data. Once a complete packet has been received, the + next step is to decode the RPC header. The header is validated to + ensure the request is sensible, i.e. the server should not receive a + method reply from a client. If the client has not yet authenticated, + a security check is also applied to make sure the procedure is on the + whitelist of those allowed prior to auth. If the packet is a method + call, it will be placed on a global processing queue. The event loop + thread is now done with the packet for the time being. +
+ ++ The server has a pool of worker threads, which wait for method call + packets to be queued. One of them will grab the new method call off + the queue for processing. The first step is to decode the payload of + the packet to extract the method call arguments. The worker does not + attempt to do any semantic validation of the arguments, except to make + sure the size of any variable length fields is below defined limits. +
+ ++ The worker now invokes the libvirt API call that corresponds to the + procedure number in the packet header. The worker is thus kept busy + until the API call completes. The implementation of the API call + is responsible for doing semantic validation of parameters and any + MAC security checks on the objects affected. +
+ ++ Once the API call has completed, the worker thread will take the + return value and output parameters, or error object and encode + them into a reply packet. Again it does not attempt to do any + semantic validation of output data, aside from variable length + field limit checks. The worker thread puts the reply packet onto + the transmission queue for the client. The worker is now finished + and goes back to wait for another incoming method call. +
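A skeleton of this worker loop might look as follows; all of the types and helpers here are placeholders standing in for the real virNetServer machinery, and error handling is heavily simplified:

    #include <stddef.h>

    /* Placeholder types and helpers for this sketch */
    typedef struct example_msg example_msg;        /* one received RPC packet */
    typedef struct example_client example_client;  /* the connection it came from */

    example_msg *dequeue_method_call(void);        /* blocks until a call is queued */
    example_client *msg_client(example_msg *msg);
    int decode_args(example_msg *msg);             /* XDR decode; size limit checks only */
    example_msg *run_api_call(example_msg *msg);   /* invokes the libvirt API, builds reply */
    example_msg *encode_error_reply(example_msg *msg);
    void queue_for_transmission(example_client *client, example_msg *reply);

    void *worker_thread(void *opaque)
    {
        (void)opaque;

        for (;;) {
            example_msg *msg = dequeue_method_call();
            example_msg *reply;

            /* Only size limits are checked while decoding; semantic
             * validation is left to the API implementation itself */
            if (decode_args(msg) < 0)
                reply = encode_error_reply(msg);
            else
                reply = run_api_call(msg); /* worker is busy until this returns */

            /* Hand the encoded reply (or error) back to the event loop
             * thread, which owns all of the socket I/O */
            queue_for_transmission(msg_client(msg), reply);
        }

        return NULL;
    }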
+ ++ The main event loop is back in charge and when the client socket + becomes writable, it will start sending the method reply packet + back to the client. +
+ ++ At any time the libvirt connection object can emit asynchronous + events. These are handled by callbacks in the main event thread. + The callback will simply encode the event parameters into a new + data packet and place the packet on the client transmission + queue. +
+ ++ Incoming and outgoing stream packets are also directly handled + by the main event thread. When an incoming stream packet is + received, instead of placing it in the global dispatch queue + for the worker threads, it is sidetracked into a per-stream + processing queue. When the stream becomes writable, queued + incoming stream packets will be processed, passing their data + payload onto the stream. Conversely when the stream becomes + readable, chunks of data will be read from it, encoded into + new outgoing packets, and placed on the client's transmit + queue +
+ ++ This example illustrates processing of two incoming methods with + overlapping execution +
+ ++ Event thread Worker 1 Worker 2 + | | | + V V V + Wait I/O Wait Job Wait Job + | | | + V | | + Recv method1 | | + | | | + V | | + Queue method1 V | + | Serve method1 | + V | | + Wait I/O V | + | Call API1() | + V | | + Recv method2 | | + | | | + V | | + Queue method2 | V + | | Serve method2 + V V | + Wait I/O Return API1() V + | | Call API2() + | V | + V Queue reply1 | + Send reply1 | | + | V V + V Wait Job Return API2() + Wait I/O | | + | ... V + V Queue reply2 + Send reply2 | + | V + V Wait Job + Wait I/O | + | ... + ... ++ +
+ This example illustrates processing of stream data +
+ ++ Event thread + | + V + Wait I/O + | + V + Recv stream1 + | + V + Queue stream1 + | + V + Wait I/O + | + V + Recv stream2 + | + V + Queue stream2 + | + V + Wait I/O + | + V + Write stream1 + | + V + Write stream2 + | + V + Wait I/O + | + ... ++ + + diff --git a/docs/remote.html.in b/docs/remote.html.in index b554950939..6a8e830236 100644 --- a/docs/remote.html.in +++ b/docs/remote.html.in @@ -53,9 +53,6 @@ machines through authenticated and encrypted connections.
Please come and discuss these issues and more on the mailing list. -
--The current implementation uses XDR-encoded packets with a -simple remote procedure call implementation which also supports -asynchronous messaging and asynchronous and out-of-order replies, -although these latter features are not used at the moment. -
--The implementation should be considered strictly internal to -libvirt and subject to change at any time without notice. If -you wish to talk to libvirtd, link to libvirt. If there is a problem -that means you think you need to use the protocol directly, please -first discuss this on the mailing list. -
-
-The messaging protocol is described in
-qemud/remote_protocol.x
.
-
-Authentication and encryption (for TLS) is done using GnuTLS and the RPC protocol is unaware of this layer. -
-
-Protocol messages are sent using a simple 32 bit length word (encoded
-XDR int) followed by the message header (XDR
-remote_message_header
) followed by the message body. The
-length count includes the length word itself, and is measured in
-bytes. Maximum message size is REMOTE_MESSAGE_MAX
and to
-avoid denial of services attacks on the XDR decoders strings are
-individually limited to REMOTE_STRING_MAX
bytes. In the
-TLS case, messages may be split over TLS records, but a TLS record
-cannot contain parts of more than one message. In the common RPC case
-a single REMOTE_CALL
message is sent from client to
-server, and the server then replies synchronously with a single
-REMOTE_REPLY
message, but other forms of messaging are
-also possible.
-
-The protocol contains support for multiple program types and protocol -versioning, modelled after SunRPC.