Multiprocessing and Distributed Systems

Creamas has built-in basic support for agent environments running on multiple cores (see Multiprocessing functionality) and for agent environments set up on distributed systems, e.g. computing clusters (see Distributed Systems). In this section, we explain the basic architectural concepts required to build your own multiprocessing and distributed environments. You should be familiar with the basic overview of the library before venturing forth (see Overview).

Next, we go over the multiprocessing implementation of the basic environment, MultiEnvironment, and then the distributed system implementation, DistributedEnvironment.

Support for Multiple Cores

Multiprocessing support in Creamas is built around MultiEnvironment. This class spawns a set of Environment slaves in their own subprocesses and acts as a master for them. The master environment also has its own instance of Environment, which is used to communicate with the slaves; because it does not contain any agents (other than a possible manager agent, as we will see when dealing with distributed systems), we will not distinguish it from the MultiEnvironment for the time being.

Slave Environments and Managers

Each slave environment in the MultiEnvironment is executed in its own subprocess (see Figure 1). As the slave environments are outside the master environment’s process, the master cannot call their functions directly, so the slaves need another way to accept orders from the master. To this end, each slave environment is initialized with a manager agent, EnvManager or a subclass of it, which acts as a bridge between external sources and the environment instance itself; in most cases, the external source is the master environment.

_images/multiprocessing_architecture.svg

Figure 1. Basic architecture for MultiEnvironment. The environment in the main process connects to each slave environment’s manager and sends commands to them. The managers then forward the commands to their slave environments, which execute them.

Note

If an environment is a slave environment in some MultiEnvironment, then its first agent (the agent in path tcp://environment-address:port/0) is always expected to be an instance of EnvManager, or a subclass of it.

Managing Functions

The basic manager implementation exposes several managing functions for the environment’s functions, i.e. functions that call the underlying environment’s function with the same name. These managing functions allow the master to execute tasks on each of the slave environments, e.g., to collect the addresses of all the agents in all the environments or to trigger act() of each of these agents.
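This forwarding pattern can be sketched with plain Python stand-ins (illustrative classes, not the actual creamas implementations): a manager simply dispatches a named command to its environment’s method of the same name.

```python
class SlaveEnvironment:
    """Stand-in for a slave environment (not the real creamas class)."""

    def __init__(self):
        self.agents = ['tcp://localhost:5555/1', 'tcp://localhost:5555/2']

    def get_agents(self):
        return list(self.agents)


class EnvManager:
    """Stand-in manager: forwards commands to its environment."""

    def __init__(self, env):
        self.env = env

    def handle(self, method_name, *args, **kwargs):
        # A managing function calls the underlying environment's
        # method with the same name and returns its result.
        return getattr(self.env, method_name)(*args, **kwargs)


env = SlaveEnvironment()
manager = EnvManager(env)
print(manager.handle('get_agents'))
```

The real managers expose these functions over the network rather than as plain method calls, but the dispatch idea is the same.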

Communication Between Master and Slaves

The communication between the master and a slave environment happens over a TCP connection. In principle, it works as follows:

  1. The master environment connects to the slave’s manager.

  2. The master environment calls one of the slave manager’s exposed methods.

  3. The slave’s manager calls the method with the same name in its environment with the given arguments.

  4. The slave environment executes the method and returns its possible return value.

  5. The slave manager passes the return value back to the master environment.

  6. The master environment closes the connection.
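The round trip above can be sketched with asyncio streams. This is a simplified stand-in: the real master–manager communication uses an RPC protocol from aiomas, while here a bare method name is sent as JSON over TCP (all class and function names below are illustrative).

```python
import asyncio
import json


class SlaveEnvironment:
    """Stand-in slave environment (not the real creamas class)."""

    def get_agents(self):
        return ['tcp://127.0.0.1:8901/0', 'tcp://127.0.0.1:8901/1']


async def start_manager(env, host='127.0.0.1', port=8901):
    async def handle(reader, writer):
        # Steps 2-5: receive a method name, call the environment's
        # method with that name, and send the return value back.
        request = json.loads(await reader.readline())
        result = getattr(env, request['method'])()
        writer.write((json.dumps(result) + '\n').encode())
        await writer.drain()
        writer.close()

    return await asyncio.start_server(handle, host, port)


async def master_call(method, host='127.0.0.1', port=8901):
    # Step 1: connect to the slave's manager.
    reader, writer = await asyncio.open_connection(host, port)
    writer.write((json.dumps({'method': method}) + '\n').encode())
    await writer.drain()
    result = json.loads(await reader.readline())
    # Step 6: close the connection.
    writer.close()
    return result


async def main():
    server = await start_manager(SlaveEnvironment())
    result = await master_call('get_agents')
    server.close()
    await server.wait_closed()
    return result


addresses = asyncio.run(main())
print(addresses)
```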

Warning

Managers do not check who gives the execution orders by default. When deploying in open environments, e.g. environments exposed to the internet, it is important that you do not expose any unwanted functionality through them without adding safeguards to the exposed functions.

Creamas is mainly developed as a research tool to be used in closed environments, and therefore is not particularly designed to offer protection against attacks. However, aiomas has some built-in encryption support, e.g. for TLS. As Creamas’ Environment is just a subclass of aiomas’ Container, the TLS support from aiomas can be utilized in Creamas.
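As a minimal sketch of such a safeguard (illustrative names, not part of the creamas API), an exposed function can require a pre-shared token before executing an order:

```python
SECRET_TOKEN = 'example-token'  # shared out-of-band with trusted masters


class GuardedManager:
    """Stand-in manager that checks a token before executing orders."""

    def __init__(self, env):
        self.env = env

    def stop(self, token):
        # Refuse orders that do not carry the expected token.
        if token != SECRET_TOKEN:
            raise PermissionError('unauthorized caller')
        return 'stopping'


manager = GuardedManager(env=None)
print(manager.stop('example-token'))
```

A pre-shared token is no substitute for transport-level security such as TLS on an open network, but it prevents accidental misuse of exposed functionality.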

Developing for Multiple Cores

To utilize multiprocessing support in your own implementations, you can give the following initialization parameters to MultiEnvironment:

  • Address: Address for the manager/master environment.

  • Environment class: Class for the manager/master environment which is used to connect to each of the slave managers.

  • Manager class: Class for the master environment’s manager. This should not be needed if you are not using MultiEnvironment as a part of DistributedEnvironment.

After the master environment has been created, the slave environments can be spawned using spawn_slaves(). It accepts at least the following arguments.

  • Slave addresses: Addresses for the slave environments; the length of this list defines how many subprocesses are spawned.

  • Slave environment class: Class for each slave environment inside the multiprocessing environment.

  • Slave environment parameters: Initialization parameters for each slave environment.

  • Slave manager class: This is the manager agent class that is used for each slave environment.

You can, of course, also subclass MultiEnvironment itself (see GridMultiEnvironment for an example).
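The relationship between the slave addresses and the spawned subprocesses can be sketched with plain Python subprocesses. This is a simplified stand-in: spawn_slaves() in creamas also creates the environment and manager inside each subprocess, whereas each subprocess here only reports the address it would serve.

```python
import subprocess
import sys


def spawn_slaves(addresses):
    # One subprocess per address: each process stands in for a slave
    # Environment plus its manager agent. The length of the address
    # list defines how many subprocesses are spawned.
    procs = []
    for addr in addresses:
        p = subprocess.Popen(
            [sys.executable, '-c', f'print("slave ready at {addr}")'],
            stdout=subprocess.PIPE, text=True)
        procs.append(p)
    return [p.communicate()[0].strip() for p in procs]


slave_addrs = ['tcp://localhost:5560/0', 'tcp://localhost:5570/0']
print(spawn_slaves(slave_addrs))
```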

Support for Distributed Systems

Support for distributed systems in Creamas is built around DistributedEnvironment. A distributed environment is designed to be used with multiple (quite homogeneous) nodes which operate in a closed system where each node can make TCP connections to ports on other nodes. Furthermore, DistributedEnvironment requires that the machine it runs on is able to make SSH connections to the nodes.

The basic architecture of DistributedEnvironment can be seen in Figure 2. In short, DistributedEnvironment acts as the master for the whole environment, i.e. it does not hold “actual” simulation agents, but serves only as a manager for the simulation. Each of the other nodes in the environment contains an instance of MultiEnvironment with its own manager, which accepts orders from DistributedEnvironment. The slave environments inside each MultiEnvironment then hold the actual agents for the simulation (in addition to each slave environment’s manager).

_images/distributed_architecture.svg

Figure 2. Basic architecture for DistributedEnvironment. It manages a set of nodes each containing a MultiEnvironment. The main difference from the single-node implementation is that the main process environment on each node also holds a manager which accepts commands for that node.

Next, we look at how to set up and use DistributedEnvironment. In the following, node and MultiEnvironment are used interchangeably.

Using a Distributed Environment

Initialization of a distributed environment is done roughly in the following steps:

  1. Initialize DistributedEnvironment with a list of node locations.

  2. Create node spawning terminal commands for each node, i.e. commands which start a MultiEnvironment on each node.

  3. Spawn the nodes using spawn_nodes().

  4. Wait until all nodes are ready using wait_nodes() (see, e.g., is_ready()). A node is ready when it has finished its own initialization and is ready to execute orders.

  5. Make any additional preparations for the nodes using prepare_nodes().

After this sequence, the DistributedEnvironment should be ready to be used. The main usage for iterative simulations is to call trigger_all(), which triggers all agents in all the nodes (in all the slave environments) to act.
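The fan-out performed by trigger_all() can be sketched with plain stand-in classes (illustrative, not the creamas implementations): the master triggers each node, each node triggers its slave environments, and each slave environment triggers its agents.

```python
class Agent:
    def __init__(self, name):
        self.name = name

    def act(self):
        return f'{self.name} acted'


class SlaveEnvironment:
    def __init__(self, agents):
        self.agents = agents

    def trigger_all(self):
        return [a.act() for a in self.agents]


class Node:
    """Stands in for a MultiEnvironment on one machine."""

    def __init__(self, slaves):
        self.slaves = slaves

    def trigger_all(self):
        return [r for s in self.slaves for r in s.trigger_all()]


class Master:
    """Stands in for DistributedEnvironment."""

    def __init__(self, nodes):
        self.nodes = nodes

    def trigger_all(self):
        # Fan the trigger out through every node and slave environment.
        return [r for n in self.nodes for r in n.trigger_all()]


master = Master([
    Node([SlaveEnvironment([Agent('a0'), Agent('a1')])]),
    Node([SlaveEnvironment([Agent('b0')])]),
])
print(master.trigger_all())
```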

Spawning Nodes

When spawn_nodes() is called, DistributedEnvironment spawns a new process for each node in the list of node locations given at initialization time. For each process it does the following:

  1. it opens an SSH connection to one of the nodes, and

  2. executes a command line script on the node.

The command line script executed is assumed to spawn an instance of MultiEnvironment with a manager attached to it. This manager is then used to communicate any commands from DistributedEnvironment to the slave environments on that node. The command line script can also do other preparation for the node, e.g. populate its slave environments with agents.
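As a sketch, the spawn command for each node could be built as a string of shell commands. The script name, flags, and virtualenv path below are hypothetical; adapt them to your own node spawning script.

```python
def build_spawn_cmd(port, venv_activate='~/venv/bin/activate',
                    script='spawn_node.py'):
    # Hypothetical command: activate a Python environment on the node
    # and start a script that creates a MultiEnvironment with a manager
    # listening on the given port.
    return (f'source {venv_activate} && '
            f'python {script} --port {port}')


cmds = [build_spawn_cmd(5555) for _ in ['node-1', 'node-2']]
print(cmds[0])
```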

The command line script executed is assumed to wait until the MultiEnvironment is stopped, i.e. it should not exit right after the initialization (in the naive case this would delete the environment). To achieve this, you can, for example, add the following kind of function to your node spawning script and call it last in the script:

async def run_node(menv, log_folder):
    try:
        await menv.manager.stop_received
    except KeyboardInterrupt:
        logger.info('Execution interrupted by user.')
    finally:
        ret = await menv.close(log_folder, as_coro=True)
        return ret

When run_node() is called, the script will block its execution until the manager of MultiEnvironment receives a stop sign. The stop sign is sent to each node’s manager when stop_nodes() is called.
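The stop mechanism can be sketched with an asyncio future, as a simplified stand-in for the manager’s stop_received attribute (the class below is illustrative, not the creamas manager):

```python
import asyncio


class Manager:
    """Stand-in for a node manager exposing a stop signal."""

    def __init__(self):
        self.stop_received = None

    async def run_until_stopped(self):
        # The node blocks here until stop() resolves the future,
        # mirroring 'await menv.manager.stop_received' above.
        self.stop_received = asyncio.get_running_loop().create_future()
        await self.stop_received
        return 'node stopped'

    def stop(self):
        # The role stop_nodes() plays for each node's manager.
        self.stop_received.set_result(True)


async def main():
    mgr = Manager()
    runner = asyncio.ensure_future(mgr.run_until_stopped())
    await asyncio.sleep(0)  # let the node reach its await
    mgr.stop()
    return await runner


print(asyncio.run(main()))
```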

See creamas/examples/grid/ for an example implementation of a distributed agent environment.