.. This document is formatted using reStructuredText, which is a Markup
   Syntax and Parser Component of Docutils for Python.  An html version
   of this document can be generated using the following command:

     rst2html.py doc/parallel-linked-images.txt >doc/parallel-linked-images.html

======================
Parallel Linked Images
======================

:Author: Edward Pilatowicz
:Version: 0.1


Problems
========

Currently, linked image recursion is done serially and in stages.  For
example, when we perform a "pkg update" on an image, then for each
child image we execute multiple pkg.1 cli operations.  The multiple
pkg.1 invocations on a single child image correspond to the following
sequential stages of pkg.1 execution:

1) publisher check: sanity check the child publisher configuration
   against the parent publisher configuration.

2) planning: plan fmri and action changes.

3) preparation: download content needed to execute planned changes.

4) execution: execute planned changes.

So to update an image with children, we invoke pkg.1 four times for
each child image.  This architecture is inefficient for multiple
reasons:

- we don't do any operations on child images in parallel

- when executing multiple pkg.1 invocations to perform a single
  operation on a child image, we are constantly throwing out and
  re-initializing lots of pkg.1 state

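To make the cost concrete, the current recursion behaves roughly like
the following loop (a schematic sketch only; the "--stage" flag shown
here is illustrative shorthand for however pkg.1 selects a stage, not
a documented interface)::

  import subprocess

  STAGES = ["pubcheck", "plan", "prepare", "execute"]

  def update_children_serially(child_paths):
      # One full pkg.1 invocation per child per stage: with four
      # stages, that is four process spawns (and four rounds of
      # state re-initialization) for every child image.
      for stage in STAGES:
          for path in child_paths:
              subprocess.check_call(
                  ["pkg", "-R", path, "update", "--stage=" + stage])
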
To make matters worse, as we execute stages 3 and 4 on a child image,
the pkg client also re-executes previous stages.  For example, when we
start stage 4 (execution) we re-execute stages 2 and 3.  So for each
child we update, we end up invoking stage 2 three times and stage 3
twice.  This leads to bugs like 18393 (where it seems that we download
packages twice).  It also means that we have caching code buried
within the packaging system that attempts to cache internal state to
disk in an effort to speed up subsequent re-runs of previous stages.


Solutions
=========


Eliminate duplicate work
------------------------

We want to eliminate much of the duplicate work done when executing
packaging operations on children in stages.  To do this we will update
the pkg client api to allow callers to:

- Save an image plan to disk.

- Load an image plan from disk.

- Execute a loaded plan from disk without first "preparing" it.  (This
  assumes that the caller has already "prepared" the plan in a
  previous invocation.)

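A caller driving the stages separately might then look something like
this sketch.  The save_plan()/load_plan() entry points are hypothetical
names for illustration; gen_plan_update(), prepare(), and
execute_plan() are existing api calls::

  # Stage 2: plan fmri and action changes, then save the plan.
  def stage_plan(api_inst, plan_file):
      for pd in api_inst.gen_plan_update():
          pass
      api_inst.save_plan(plan_file)      # hypothetical name

  # Stage 3: reload the saved plan and download needed content.
  def stage_prepare(api_inst, plan_file):
      api_inst.load_plan(plan_file)      # hypothetical name
      api_inst.prepare()

  # Stage 4: reload the saved, already-prepared plan and execute it
  # directly, without re-planning or re-preparing.
  def stage_execute(api_inst, plan_file):
      api_inst.load_plan(plan_file)      # hypothetical name
      api_inst.execute_plan()
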
In addition to eliminating duplicated work during staged execution,
this will also allow us to stop caching intermediate state internally
within the package system.  Instead, client.py will be enhanced to
cache the image plan, and it will be the only component that knows
about "staging".

To allow us to save and restore plans, all image plan data will be
saved within a PlanDescription object, and we will support serializing
this object into a json format.  The json format for saved image plans
is an internal, unstable, and unversioned private interface.  We will
not support saving an image plan to disk and then executing it later
with a different version of the packaging system on a different host.
Also, even though we will be adding data into the PlanDescription
object, we will not be exposing any new information about an image
plan to api consumers via the PlanDescription object.

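The intended pattern is simple round-tripping through json, roughly as
in this sketch (the attribute set and method names are invented for
illustration; the real PlanDescription carries far more state)::

  import json

  class PlanDescription(object):
      # Illustrative subset of plan state; the real object holds
      # much more (fmri changes, actions, licenses, etc.).
      def __init__(self, op=None, fmri_changes=None):
          self.op = op
          self.fmri_changes = fmri_changes or []

      def getstate(self):
          # The json format is a private interface: unstable and
          # unversioned.
          return json.dumps({
              "op": self.op,
              "fmri_changes": self.fmri_changes,
          })

      @staticmethod
      def fromstate(data):
          state = json.loads(data)
          return PlanDescription(state["op"], state["fmri_changes"])
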
An added advantage of allowing api consumers to save an image plan to
disk is that it should help with our plans to have the
api.gen_plan_*() functions return PlanDescription objects for child
images.  A file descriptor (or path) associated with a saved image
plan would be one way for child images to pass image plans back to
their parent (which could then load them and yield them as results to
api.gen_plan_*()).


Update children in parallel
---------------------------

We want to enhance the package client so that it can update child
images in parallel.

Due to potential resource constraints (cpu, memory, and disk io) we
cannot entirely remove the ability to operate on child images
serially.  Instead, we plan to allow for a concurrency setting that
specifies how many child images we are willing to update in parallel.
By default, when operating on child images we will use a concurrency
setting of 1; this maintains the current behavior of the packaging
system.  If a user wants to specify a higher concurrency setting, they
can use the "-C N" option to subcommands that recurse (like "install",
"update", etc.), or they can set the environment variable
"PKG_CONCURRENCY=N".  (In both cases N is an integer which specifies
the desired concurrency level.)

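Resolving the concurrency level then reduces to a small precedence
rule, sketched here assuming the command line option takes precedence
over the environment variable (option handling in client.py is more
involved)::

  import os

  def get_concurrency(cli_value=None):
      # The "-C N" command line option wins over the environment;
      # in the absence of both, default to 1, preserving today's
      # serial behavior.
      if cli_value is not None:
          return int(cli_value)
      env_value = os.environ.get("PKG_CONCURRENCY")
      if env_value is not None:
          return int(env_value)
      return 1
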
Currently, pkg.1 worker subprocesses are invoked via the pkg.1 cli
interfaces.  When switching to parallel execution, this will be
changed to use a json encoded rpc execution model.  This richer
interface is needed to allow worker processes to pause and resume
execution between stages, so that we can do multi-staged operations in
a single process.

Unfortunately, the current implementation does not yet retain child
processes across different stages of execution.  Instead, whenever we
start a new stage of execution, we spawn one process for each child
image, then we make remote procedure calls into N images at once
(where N is our concurrency level).  When an RPC returns, that child
process exits and we start a call for the next available child.

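Operationally this is a bounded worker pool, as in the following
sketch (run_stage_on_child() is a hypothetical stand-in for the
spawn-plus-RPC sequence described above)::

  from concurrent.futures import ThreadPoolExecutor

  def run_stage(children, concurrency, run_stage_on_child):
      # Start at most `concurrency` child operations at once; as
      # each RPC completes (and its child process exits), the next
      # available child is started.
      with ThreadPoolExecutor(max_workers=concurrency) as pool:
          return list(pool.map(run_stage_on_child, children))
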
Ultimately, we'd like to move to a model where we have a pool of N
worker processes, and those processes can operate on different images
as necessary.  These processes would be persistent across all stages
of execution, and ideally, when moving from one stage to another,
these processes could cache in memory the state for at least N child
images so that the processes could simply resume execution where they
last left off.

The client side of this rpc interface will live in a new module called
PkgRemote.  The linked image subsystem will use the PkgRemote module
to initiate operations on child images.  One PkgRemote instance will
be allocated for each child that we are operating on.  Currently, the
PkgRemote module will only support the sync and update operations used
within linked images, but in the future it could easily be expanded to
support other remote pkg.1 operations so that we can support recursive
linked image operations (see 7140357).  When PkgRemote invokes an
operation on a child image, it will fork off a new pkg.1 worker
process as follows::

  pkg -R /path/to/linked/image remote --ctlfd=5

This new pkg.1 worker process will function as an rpc server to which
the client will make requests.

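Setting up the control channel might look roughly like the following
sketch.  A Unix socket pair stands in for the pipe here, since it
portably supports the file descriptor passing described below; the
actual mechanism used by the client may differ::

  import socket
  import subprocess

  def spawn_worker(image_path):
      # Create the control channel in the client, then hand one end
      # to the pkg.1 worker across fork/exec via --ctlfd.
      parent_end, child_end = socket.socketpair(socket.AF_UNIX)
      proc = subprocess.Popen(
          ["pkg", "-R", image_path, "remote",
           "--ctlfd=%d" % child_end.fileno()],
          pass_fds=[child_end.fileno()])
      child_end.close()
      return proc, parent_end
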
Rpc communication between the client and server will be done via json
encoded rpc.  These requests will be sent between the client and
server via a pipe.  The communication pipe is created by the client,
and its file descriptor is passed to the server via fork/exec.  The
server is told about the pipe file descriptor via the --ctlfd
parameter.  To avoid issues with blocking IO, all communication via
this pipe will be done by passing file descriptors.  For example, if
the client wants to send an rpc request to the server, it will write
that rpc request into a temporary file and then send the fd associated
with the temporary file over the pipe.  Any reply from the server will
be similarly serialized and then sent via a file descriptor over the
pipe.  This should ensure that no matter the size of the request or
the response, we will not block when sending or receiving requests via
the pipe.  (Currently, the limit of fds that can be queued in a pipe
is around 700.  Given that our rpc model includes matched requests and
responses, it seems unlikely that we'd ever hit this limit.)

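A sketch of the request path, using socket.send_fds()/recv_fds()
(Python 3.9+) over the socket pair from the earlier sketch; the actual
client may pass fds over the pipe by a different mechanism, but the
write-to-a-temp-file-and-pass-the-fd pattern is the same::

  import json
  import os
  import socket
  import tempfile

  def send_request(ctl_sock, request):
      # Serialize the rpc request into a temporary file, then pass
      # only its descriptor over the control channel, so a large
      # request can never block the channel itself.
      tmp = tempfile.TemporaryFile(mode="w+")
      json.dump(request, tmp)
      tmp.seek(0)
      socket.send_fds(ctl_sock, [b"req"], [tmp.fileno()])

  def recv_request(ctl_sock):
      # Receive the descriptor and read the request back out.
      msg, fds, flags, addr = socket.recv_fds(ctl_sock, 1024, 1)
      with os.fdopen(fds[0], "r") as f:
          return json.load(f)
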
In the pkg.1 worker server process, we will have a simple json rpc
server that lives within client.py.  This server will listen for
requests from the client and invoke client.py subcommand interfaces
(like update()).  The client.py subcommand interfaces were chosen as
the target for remote rpc calls for the following reasons (a sketch of
the resulting dispatch loop follows the list):

- Least amount of encoding / decoding.  Since these interfaces are
  invoked just after parsing user arguments, they mostly involve
  simple arguments (strings, integers, etc.) which have a direct json
  encoding.  Additionally, the return values from these calls are
  simple return code integers, not objects, which means the results
  are also easy to encode.  This means that we don't need lots of
  extra serialization / de-serialization logic (for things like api
  exceptions, etc.).

- Output and exception handling.  The client.py interfaces already
  handle exceptions and output for the client.  This means that we
  don't have to create new output classes or build our own output and
  exception handling code; instead, we leverage the existing code.

- Future recursion support.  Currently, when recursing into child
  images we only execute "sync" and "update" operations.  Eventually
  we want to support pkg.1 subcommand recursion into linked images
  (see 7140357) for many more operations.  If we do this, the
  client.py interfaces provide a nice boundary since there will be an
  almost 1:1 mapping between parent and child subcommand operations.

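The dispatch loop itself can then stay small: a table from rpc method
names to client.py-style entry points that return integer exit codes.
A sketch (the real server must also handle staging and pause/resume)::

  def serve(recv_request, send_response, handlers):
      # `handlers` maps rpc method names (e.g. "update", "sync") to
      # client.py-style functions returning an integer exit code,
      # so almost nothing needs custom (de)serialization.
      while True:
          request = recv_request()
          if request is None:
              break
          func = handlers[request["method"]]
          ret = func(**request.get("params", {}))
          send_response({"id": request["id"], "result": ret})

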
Child process output and progress management
--------------------------------------------

Currently, since child execution happens serially, all child images
have direct access to standard out and display their progress directly
there.  Once we start updating child images in parallel, this will no
longer be possible.  Instead, all output from children will be logged
to temporary files and displayed by the parent when a child completes
a given stage of execution.

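The capture-and-replay behavior amounts to the following sketch (file
handling simplified; argv construction is whatever PkgRemote uses)::

  import subprocess
  import sys
  import tempfile

  def run_child_stage(argv):
      # The child's stdout/stderr go to a temporary file rather than
      # the terminal; the parent replays the captured output once
      # the child finishes the stage.
      with tempfile.TemporaryFile(mode="w+") as out:
          rc = subprocess.call(argv, stdout=out, stderr=out)
          out.seek(0)
          sys.stdout.write(out.read())
      return rc
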
Additionally, since child images will no longer have access to
standard out, we will need a new mechanism to indicate progress while
operating on child images.  To do this we will have a progress pipe
between each parent and child image.  The child image will write one
byte to this pipe whenever one of the ProgressTracker's *_progress()
interfaces is invoked.  The parent process can read from this pipe to
detect progress within children and update its user visible progress
tracker accordingly.

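A sketch of both ends of the progress pipe (the helper names are
illustrative; fd numbers are whatever the pipe endpoints happen to
be)::

  import os
  import select

  def child_progress_tick(progress_wfd):
      # Called in the child whenever a ProgressTracker *_progress()
      # interface fires: one byte per progress event.
      os.write(progress_wfd, b".")

  def parent_poll_progress(progress_rfds, tracker_tick):
      # The parent drains whichever progress pipes are readable and
      # advances its own user visible progress tracker once per
      # byte received.
      readable, _, _ = select.select(progress_rfds, [], [], 0.1)
      for rfd in readable:
          for _ in os.read(rfd, 4096):
              tracker_tick()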