My team came up with an architectural pattern a few years ago. I think it’s good.

I came up with the name this year. It signifies that the control plane (logical, Greek) and data plane (structural, Roman) often share many similarities, but they use different alphabets, and there isn’t always a bidirectional, information-preserving mapping between them. There always needs to be a one-way mapping (control -> data), though!

Definitions

Control plane
Software responsible for the configuration of the system
Data plane
Software responsible for the runtime behaviour of the system
Conductor
Software that owns a particular area of the data plane. It receives notifications of configuration changes and makes whatever changes in the data plane are needed for the configuration to take effect

Principles

Loose Coupling

Control Plane and Data Plane runtime components must be loosely coupled:

  • Communications must be asynchronous and indirect (e.g. using a pub/sub system)
  • Communications must be one-way, with one exception for bootstrapping: Data Plane components MAY request a fresh configuration set from the Control Plane on cold boots if the pub/sub channel doesn’t have sufficient retention to preserve the whole configuration set.
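The two rules above can be sketched in a few lines of Python. This is a toy illustration, not a reference implementation: `Bus` is an in-process stand-in for a real pub/sub system (it delivers synchronously, which a real broker would not), and the class names, the `"config"` topic, and the `snapshot` method are all assumptions made up for the example.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class Bus:
    """Toy in-process pub/sub channel standing in for a real broker."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # The publisher never learns who (if anyone) received the event.
        for handler in self._subscribers[topic]:
            handler(event)


class ControlPlane:
    """Owns the configuration and announces every change on the bus."""

    def __init__(self, bus: Bus) -> None:
        self._bus = bus
        self._config: Dict[str, object] = {}

    def set(self, key: str, value: object) -> None:
        self._config[key] = value
        self._bus.publish("config", {"key": key, "value": value})

    def snapshot(self) -> Dict[str, object]:
        # Bootstrapping exception: hand out a copy of the full configuration set.
        return dict(self._config)


class DataPlaneComponent:
    """Applies configuration changes; never talks back to the control plane."""

    def __init__(self, bus: Bus, bootstrap: Callable[[], Dict[str, object]]) -> None:
        # Cold boot: the only moment this component asks the control plane anything.
        self.config = bootstrap()
        bus.subscribe("config", self._on_change)

    def _on_change(self, event: dict) -> None:
        self.config[event["key"]] = event["value"]
```

After the cold-boot snapshot, the data plane component only ever hears about changes through the bus, and the control plane holds no reference to it.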

Different Data Consistency Models

Control plane components can, and usually should, store configuration in a CP store (consistent and partition-tolerant, in CAP theorem terms). It’s OK for configuration update functionality to be unavailable sometimes, because for configuration, consistency matters more than availability.

Data plane components can, and usually should, store data in an AP store (available and partition-tolerant). It’s OK for data to be eventually consistent, but it must always be available.

Use conductors to allow specialization without coupling

A conductor is a service that watches for control plane events and implements them in the particular region of the data plane it’s responsible for. This is where the Greco-Roman part comes in: configuration entities of type alpha in the domain model may correspond to A’s in the data plane. There are often clean mappings, but they’re definitely different alphabets.

For example, a Lambda Conductor might observe CUDs (creates, updates, and deletes) of Query objects (Greek: alphas), and create/update/delete corresponding functions on AWS Lambda (Roman: A’s) in order to enact the user’s desired domain model change.
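Here is a rough sketch of what such a conductor could look like. To keep it runnable without an AWS account it does not call AWS at all: `FakeFunctionRuntime` is a hypothetical in-memory stand-in for the function platform, and `QueryEvent`, the `query-` naming scheme, and the method names are assumptions invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class QueryEvent:
    """A control plane event about a Query object (Greek: an alpha)."""
    op: str                      # "create", "update", or "delete"
    query_id: str
    body: Optional[str] = None   # query definition; unused for deletes


class FakeFunctionRuntime:
    """Hypothetical in-memory stand-in for a function platform."""

    def __init__(self) -> None:
        self.functions: Dict[str, str] = {}

    def put_function(self, name: str, code: str) -> None:
        self.functions[name] = code

    def delete_function(self, name: str) -> None:
        self.functions.pop(name, None)


class LambdaConductor:
    """Translates Query CUDs into function CUDs (Roman: A's)."""

    def __init__(self, runtime: FakeFunctionRuntime) -> None:
        self._runtime = runtime

    @staticmethod
    def function_name(query_id: str) -> str:
        # The one-way mapping from the domain model to the data plane.
        return f"query-{query_id}"

    def handle(self, event: QueryEvent) -> None:
        name = self.function_name(event.query_id)
        if event.op in ("create", "update"):
            if event.body is None:
                raise ValueError(f"{event.op} event needs a body")
            self._runtime.put_function(name, event.body)
        elif event.op == "delete":
            self._runtime.delete_function(name)
        else:
            raise ValueError(f"unknown operation: {event.op}")
```

Note that only the conductor knows how a Query becomes a function; the control plane never sees function names, and nothing here tries to map a function back to a Query.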

Conductors can veto a CUD

Conductors are specialized to a particular runtime column and have intimate knowledge of its capabilities, so they should have sovereignty over which domain model mutations are permissible. The control plane must therefore wait for positive responses from all conductors before making a proposed domain model mutation permanent. Domain models really need to be versioned anyway, which makes this feasible to implement: the control plane can save speculative future versions of users’ work in progress without requiring quorum, and only when a user (ideally one with higher permissions than those required to edit!) decides to “make it so” is a two-phase commit required.
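A minimal sketch of the veto flow, covering only the voting half of the two-phase commit. Everything here is an assumption for illustration: conductors are modelled as plain functions returning `(accepted, reason)`, drafts are dicts kept in a list, and the timeout limit in `lambda_conductor` is an arbitrary example value rather than a statement about any real platform.

```python
from typing import Callable, Dict, List, Tuple

# A conductor's vote on a proposed domain model: (accepted, reason).
Conductor = Callable[[dict], Tuple[bool, str]]


class ControlPlane:
    def __init__(self, conductors: Dict[str, Conductor]) -> None:
        self._conductors = conductors
        self.drafts: List[dict] = []      # speculative versions, no quorum needed
        self.committed: List[dict] = []   # versions every conductor accepted

    def save_draft(self, model: dict) -> int:
        """Save work in progress without consulting any conductor."""
        self.drafts.append(model)
        return len(self.drafts) - 1

    def make_it_so(self, draft_id: int) -> Dict[str, str]:
        """Ask every conductor to vote; commit only if nobody vetoes.

        Returns a mapping of conductor name to veto reason (empty on success).
        """
        model = self.drafts[draft_id]
        vetoes: Dict[str, str] = {}
        for name, vote in self._conductors.items():
            accepted, reason = vote(model)
            if not accepted:
                vetoes[name] = reason
        if not vetoes:
            self.committed.append(model)
        return vetoes


def lambda_conductor(model: dict) -> Tuple[bool, str]:
    # Illustrative limit only: this conductor knows what its runtime can do.
    if model.get("timeout_seconds", 0) > 900:
        return False, "timeout exceeds what this runtime supports"
    return True, ""


def accept_all(model: dict) -> Tuple[bool, str]:
    return True, ""
```

Saving a draft never involves the conductors, so users can keep editing freely; only `make_it_so` pays the coordination cost.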

In-band transitions to updated domain models

It’s basically impossible to coordinate a state transition across a large, complex distributed system, so let’s cheat. Once all conductors have accepted a proposed domain model update, and signaled that they’re ready to go live, the control plane should inject a synthetic message into the data plane to signal the new configuration epoch. Runtimes should only start acting on the new configuration once they observe this message.
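One way to picture the epoch marker, as a sketch under assumptions: messages are dicts, the marker is a message with `"type": "epoch"`, and a runtime has already had the new configuration staged by its conductor. The `Runtime` class and the `greeting` setting are hypothetical names chosen for the example.

```python
from typing import Dict, List, Tuple


class Runtime:
    """Processes data plane messages, switching config only on an epoch marker."""

    def __init__(self, initial_config: dict) -> None:
        self._active = initial_config
        self._staged: Dict[int, dict] = {}   # epoch number -> config, not yet live
        self.outputs: List[Tuple[str, str]] = []

    def stage(self, epoch: int, config: dict) -> None:
        """Called by the conductor ahead of time; has no effect on processing yet."""
        self._staged[epoch] = config

    def process(self, message: dict) -> None:
        if message.get("type") == "epoch":
            # Synthetic marker injected by the control plane: go live now.
            # Raises KeyError if this epoch was never staged.
            self._active = self._staged.pop(message["epoch"])
            return
        # Ordinary data message, handled under whichever config is active.
        self.outputs.append((message["payload"], self._active["greeting"]))
```

Because the marker travels in the same stream as the data, every runtime switches at the same logical point in the stream, even if they reach that point at different wall-clock times.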