A synchronous pipeline processes one payload at a time. We could implement such a pipeline with a for loop that does the following:
- Dequeues the next payload from the input source or exits the loop if no more payloads are available
- Iterates the list of pipeline stages and invokes the Processor instance for each stage
- Enqueues the resulting payload to the output sink
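The loop described above can be sketched roughly as follows. This is a minimal illustration rather than a definitive implementation: the `Payload` type, the single-method shape of the `Processor` interface, the `ProcessorFunc` adapter, and the `runSync` helper are all assumptions made for this example, and plain slices stand in for the input and output.

```go
package main

import (
	"fmt"
	"strings"
)

// Payload is a hypothetical unit of work; a string keeps the sketch small.
type Payload string

// Processor is assumed to expose a single Process method. The real
// interface may carry more (for example, a context argument).
type Processor interface {
	Process(p Payload) (Payload, error)
}

// ProcessorFunc adapts a plain function to the Processor interface.
type ProcessorFunc func(Payload) (Payload, error)

// Process calls the wrapped function.
func (f ProcessorFunc) Process(p Payload) (Payload, error) { return f(p) }

// runSync takes payloads from input one at a time, passes each payload
// through every stage in order, and appends the result to the output.
func runSync(input []Payload, stages []Processor) ([]Payload, error) {
	output := make([]Payload, 0, len(input))
	for _, p := range input { // the loop exits when no payloads are left
		var err error
		for i, stage := range stages {
			if p, err = stage.Process(p); err != nil {
				return output, fmt.Errorf("stage %d: %w", i, err)
			}
		}
		output = append(output, p)
	}
	return output, nil
}

func main() {
	stages := []Processor{
		ProcessorFunc(func(p Payload) (Payload, error) {
			return Payload(strings.ToUpper(string(p))), nil
		}),
		ProcessorFunc(func(p Payload) (Payload, error) {
			return p + "!", nil
		}),
	}
	out, err := runSync([]Payload{"a", "b", "c"}, stages)
	if err != nil {
		fmt.Println("pipeline failed:", err)
		return
	}
	fmt.Println(out)
}
```

Because a payload is only appended to the output after every stage has finished with it, the output preserves the order of the input.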
Synchronous pipelines are a good fit for workloads where payloads must always be processed in first-in-first-out (FIFO) order. This is quite common in event-driven architectures, which usually assume that events are processed in a specific order.
As an example, let's say that we are constructing an ETL (short for extract, transform, and load) pipeline that consumes an event stream from an order-processing system. The pipeline enriches some of the incoming events with additional information by querying an external system and then transforms the enriched events into a format suitable for persisting into a relational database. The pipeline for this use case can be assembled from the following two stages:
- The first stage inspects the type of each event and enriches the event with the appropriate information by querying an external service
- The second stage converts each enriched event into a sequence of SQL queries for updating one or more database tables
By design, our processing code expects that an AccountCreated event must always precede an OrderPlaced event, which includes a reference (a UUID) to the account of the customer who placed the order. If the events were to be processed in the wrong order, the system might find itself trying to process OrderPlaced events before the customer records in the database have been created. While it is certainly possible to code around this limitation, it would make the processing code much more complicated and harder to debug when something goes wrong. A synchronous pipeline would enforce in-order processing semantics and make this a nonissue.
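To make the ordering argument concrete, here is a rough sketch of the two stages run synchronously. Everything in it is hypothetical: the `Event` fields, the table and column names, and the `enrich`, `toSQL`, and `runETL` helpers are invented for illustration, and a map lookup stands in for the call to the external service.

```go
package main

import "fmt"

// Event is a hypothetical event shape; the field names are assumptions.
type Event struct {
	Type      string // "AccountCreated" or "OrderPlaced"
	AccountID string // UUID of the account the event refers to
	Region    string // filled in by the enrichment stage
}

// enrich stands in for the first stage. The directory map replaces the
// query to an external service.
func enrich(e Event, directory map[string]string) Event {
	if e.Type == "OrderPlaced" {
		e.Region = directory[e.AccountID]
	}
	return e
}

// toSQL stands in for the second stage. Statements are built with
// Sprintf only to keep the sketch short; real code should use
// parameterized queries.
func toSQL(e Event) ([]string, error) {
	switch e.Type {
	case "AccountCreated":
		return []string{fmt.Sprintf(
			"INSERT INTO accounts (id) VALUES ('%s')", e.AccountID)}, nil
	case "OrderPlaced":
		return []string{fmt.Sprintf(
			"INSERT INTO orders (account_id, region) VALUES ('%s', '%s')",
			e.AccountID, e.Region)}, nil
	default:
		return nil, fmt.Errorf("unknown event type %q", e.Type)
	}
}

// runETL processes events strictly in arrival order, so the statements
// for an AccountCreated event are emitted before those of any later
// OrderPlaced event.
func runETL(events []Event, directory map[string]string) ([]string, error) {
	var stmts []string
	for _, e := range events {
		queries, err := toSQL(enrich(e, directory))
		if err != nil {
			return stmts, err
		}
		stmts = append(stmts, queries...)
	}
	return stmts, nil
}

func main() {
	events := []Event{
		{Type: "AccountCreated", AccountID: "acc-1"},
		{Type: "OrderPlaced", AccountID: "acc-1"},
	}
	stmts, err := runETL(events, map[string]string{"acc-1": "eu"})
	if err != nil {
		fmt.Println("ETL failed:", err)
		return
	}
	for _, s := range stmts {
		fmt.Println(s)
	}
}
```

Since each event passes through both stages before the next one is read, the account insert is always emitted ahead of the order insert that references it, with no extra coordination in the stage code.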
So what's the catch when using synchronous pipelines? The main issue is low throughput. If our pipeline consists of N stages and each stage takes 1 time unit to complete, the pipeline needs N time units to process and emit each payload. By extension, while one stage is processing a payload, the remaining N-1 stages sit idle.
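The cost can be written out as a short calculation. This is a simplified model: it assumes every stage takes the same time $t$ (1 time unit in the example above) and ignores any overhead for dequeuing and enqueuing payloads.

```latex
\begin{align*}
\text{time per payload} &= N \cdot t \\
\text{throughput} &= \frac{1}{N \cdot t} \quad \text{payloads per time unit} \\
\text{utilization of each stage} &= \frac{t}{N \cdot t} = \frac{1}{N}
\end{align*}
```

For instance, with N = 4 and t = 1, each payload takes 4 time units, the pipeline emits 0.25 payloads per time unit, and every stage is busy only 25% of the time.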