10 Data Objects in Processes

10.1 Modeling Data Flow

When a process is carried out, it uses and creates data, information, files, documents, etc. A sequence flow from one activity to another is often accompanied by data transfer. The main purpose of message flow is also the exchange of data.

So far, when modeling sequence or message flows, data have not been considered explicitly. A sequence flow purely triggers the following activity. With respect to message flow, it is only important that a certain message is received and that the related event therefore occurs. If it is necessary to know which particular data are exchanged, this must be modeled separately.

BPMN was initially developed as a graphical notation for executable processes. Therefore, it was implied that all activities within a process can access all data within a pool at any time. If this concept is actually applied (e.g. by a process engine), it is not necessary to model the movement of data within a process.

However, in many processes there is no common data pool, so it makes sense to model the data transfer explicitly. Even if there is a common data pool, it may be helpful to model the input and output data of the activities. In doing so, data-related dependencies can be identified. For example, when using a common data pool, it must still be ensured that a certain piece of information has already been produced before it is required.
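The dependency check mentioned above can be illustrated with a small sketch. It assumes a purely sequential process in which each activity declares the data it reads and writes. The class `Activity`, the function `find_missing_inputs`, the activity names, and the data names "Review Result" and "Approval" are hypothetical and chosen only for illustration. They are not part of BPMN.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    """An activity with the names of the data objects it reads and writes."""
    name: str
    inputs: set = field(default_factory=set)
    outputs: set = field(default_factory=set)


def find_missing_inputs(activities):
    """Walk the activities in sequence-flow order and report every
    (activity, data object) pair where data is read before it is produced."""
    available = set()
    problems = []
    for activity in activities:
        for item in sorted(activity.inputs - available):
            problems.append((activity.name, item))
        # Outputs become available to all following activities.
        available |= activity.outputs
    return problems


# Hypothetical linear process; only "Job Posting" is taken from the text.
process = [
    Activity("Write job posting", outputs={"Job Posting"}),
    Activity("Review job posting", inputs={"Job Posting"},
             outputs={"Review Result"}),
    Activity("Publish job posting", inputs={"Job Posting", "Approval"}),
]

problems = find_missing_inputs(process)
print(problems)  # [('Publish job posting', 'Approval')]
```

The sketch ignores gateways and parallel paths. A real analysis would have to consider every possible path through the process.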

Figure 150: Modeling data flow with directed associations

Figure 151: Data objects connected to sequence flows

Figure 150 shows how to model data flow within a process. For this purpose, data objects are used in the form of document symbols. A data object can represent any kind of data and information, such as an electronic data set, a file, or a physical document.

Directed data associations can be used for modeling data inputs and outputs of activities. The name of a data object is often appended with the object’s current state, printed in square brackets. In the process in figure 150, the data object “Job Posting” can change its state from “to be reviewed” to “to be reworked” and “approved”.
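The naming convention with the state in square brackets can be sketched in a few lines of code. The sketch below is only an illustration. The class `DataObjectReference` and the table `ALLOWED` are hypothetical, and the transition from "to be reworked" back to "to be reviewed" is an assumption, since the text only names the three states.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed state transitions. Only the three state names come from the text;
# the step from "to be reworked" back to "to be reviewed" is an assumption.
ALLOWED = {
    "to be reviewed": {"to be reworked", "approved"},
    "to be reworked": {"to be reviewed"},
    "approved": set(),
}


@dataclass(frozen=True)
class DataObjectReference:
    """A data object as shown in a diagram: a name plus an optional state."""
    name: str
    state: Optional[str] = None

    def label(self) -> str:
        """Return the diagram label, e.g. 'Job Posting [approved]'."""
        return f"{self.name} [{self.state}]" if self.state else self.name

    def with_state(self, new_state: str) -> "DataObjectReference":
        """Return a reference to the same object in a new state."""
        if self.state is not None and new_state not in ALLOWED.get(self.state, set()):
            raise ValueError(f"{self.state!r} -> {new_state!r} is not allowed")
        return DataObjectReference(self.name, new_state)


posting = DataObjectReference("Job Posting", "to be reviewed")
approved = posting.with_state("approved")
print(posting.label())   # Job Posting [to be reviewed]
print(approved.label())  # Job Posting [approved]
```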

A data association is drawn as a dotted line. It must not be confused with the dashed message flow connectors.

Data objects only exist within a process. In order to model persistent data, a data store can be used. In figure 150, the published job posting is transferred to such a data store at the very end. Thus, it will still be available when the process has finished.

Most of the data flows shown in figure 150 run in parallel to sequence flows. In such cases, it is also possible to simply connect the data object with the sequence flow, using an undirected association (figure 151). In this example, both models are equivalent. However, it is also possible to create a data object at a certain point and to use it much later in the process. This means that the data object is not passed with the sequence flow to the directly following activity. Such a case can only be modeled with directed associations, as in figure 152.

Figure 152: A data object that is used in a later activity

Figure 153: Alternative modeling of figure 152

A diagram can become confusing if there are too many lengthy data associations running through the entire process. It is therefore allowed to draw the same data object several times, as in figure 153. The two data object symbols do not represent two different data objects which happen to have the same name, but both actually reference the same object.

10.2 Multiple Data Objects

Often it is necessary to model not only single data sets, but also collections or lists. Such multiple data objects can be marked with three vertical lines, just like multi-instance activities and multi-instance participants. In the first step of the process fragment in figure 154, the collection of received applications is reviewed. In doing so, the interesting applications are selected. These interesting applications are also a collection. In the second step, one single application is selected. As a single object, it naturally no longer has a multiple marker.

10.3 Data and Events

Data objects only exist within a process; therefore data associations cannot cross the borders of pools. Direct data exchange with other processes is modeled with message flows (see chapter 5). In order to specify that the content of a message is processed as a data object, a catching message event can have an outgoing data association. Conversely, a throwing message event can have an incoming data association. The sent message then contains the contents of that data object (figure 155).

Figure 154: Multiple data objects representing lists or collections

Figure 155: Events can convert messages into data objects and vice versa

10.4 Data Stores

An indirect data exchange can also be realized by one process writing into a data store, and another process reading the data from that store. This requires that both processes can access the same data store. This is often the case if both processes belong to the same organization. In figure 156, a bill of materials is created within the engineering process and written into a data store. In production planning, the bill of materials is read from the data store in order to determine the materials required.

Figure 156: Two processes accessing the same data store

10.5 Passing Data to Called Activities

If a process calls another independent process via a call activity (cf. chapter 7.5), the passing of data can also be modeled with data objects.

A data association leading into a call activity means that the data object is passed to that activity. Conversely, an outgoing data association stands for the return of a data object from the called activity to the calling process. However, it is not possible to pass arbitrary data objects, because the process or global task to be called is defined elsewhere. This definition includes the required data as well as the produced and returned data. A global task is not modeled graphically, but for a process to be called from other processes, the required input and the provided output can be modeled.

Figure 157: Modeling data input and output for called processes

In figure 157, the process “Appraise Insurance Claim” is called from within the process “Process Insurance Claim”. The process “Appraise Insurance Claim” at the bottom contains data objects with arrows representing the process’s data input and data output. The arrow of a data output is filled, that of a data input is blank. A data input defines which data need to be provided to the process in order to work properly. Accordingly, a data output denotes which data are produced by the process and returned to the caller.

If a process is integrated into another process via a call activity, it must be ensured that the correct data inputs are provided. Likewise, only the defined data outputs can be received back from the call activity. In the example, the insurance claim process passes an insurance claim to the called process and waits for an appraisal report. This corresponds exactly to the modeled data input and output in the appraisal process. Therefore, the call will work properly.

It would be possible to model further data objects (without arrow icons) in the process “Appraise Insurance Claim”. These, however, would only be used internally in the process.

If a process model contains several data inputs or outputs, the calling process must provide data objects suitable for all inputs, and the called process will return data objects for all data outputs. The BPMN specification also defines optional data inputs and outputs. However, this optionality is not visible in the diagram and should be explained in an annotation, if required. Data inputs and data outputs can also be marked with three vertical lines as multiple data objects.
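The matching of passed data objects against the data inputs and outputs of a called process can be sketched as a simple interface check. The following is a minimal illustration, not a description of how any particular tool works. The names `DataInput`, `CallableProcess`, and `check_call` are hypothetical, as are the data objects "Customer File" and "Invoice" in the failing example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataInput:
    """A data input of a callable process; optional inputs may be omitted."""
    name: str
    optional: bool = False


@dataclass(frozen=True)
class CallableProcess:
    """A process that can be called, with its declared data interface."""
    name: str
    data_inputs: tuple
    data_outputs: tuple  # names of the data outputs


def check_call(called, provided, expected):
    """Compare the data objects attached to a call activity with the
    interface of the called process and return a list of mismatches."""
    provided, expected = set(provided), set(expected)
    declared_inputs = {d.name for d in called.data_inputs}
    errors = []
    for data_input in called.data_inputs:
        if not data_input.optional and data_input.name not in provided:
            errors.append(f"missing input: {data_input.name}")
    for name in sorted(provided - declared_inputs):
        errors.append(f"undeclared input: {name}")
    for name in sorted(expected - set(called.data_outputs)):
        errors.append(f"undeclared output: {name}")
    return errors


appraise = CallableProcess(
    name="Appraise Insurance Claim",
    data_inputs=(DataInput("Insurance Claim"),),
    data_outputs=("Appraisal Report",),
)

# The call described in the text: inputs and outputs match.
ok = check_call(appraise, {"Insurance Claim"}, {"Appraisal Report"})
print(ok)  # []

# A hypothetical mismatching call.
bad = check_call(appraise, {"Customer File"}, {"Appraisal Report", "Invoice"})
print(bad)
```

The second call reports the missing "Insurance Claim" input, the undeclared "Customer File" input, and the undeclared "Invoice" output.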

10.6 Use of Data Objects

Data objects and associations, and especially the definitions of data inputs and outputs, can carry important information for process execution. For this purpose, these elements can be connected with specific data structures, e.g. XML schema definitions. When executing a process instance, a process engine can then decide whether the required data inputs are available and whether a certain activity can therefore be triggered. Nevertheless, further detailed information is required for this in addition to the model's graphical information.
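How an engine might decide whether an activity can be triggered is sketched below. This is a strongly simplified illustration under stated assumptions: a plain dictionary of field names and Python types stands in for a real structure definition such as an XML schema, and the names `STRUCTURES`, `conforms`, and `can_start` as well as the fields `claim_id` and `amount` are hypothetical.

```python
# Hypothetical structure definitions, standing in for e.g. XML schemas.
STRUCTURES = {
    "Insurance Claim": {"claim_id": str, "amount": float},
}


def conforms(value, structure):
    """Check that value is a dict containing every field with the right type."""
    return isinstance(value, dict) and all(
        isinstance(value.get(field_name), field_type)
        for field_name, field_type in structure.items()
    )


def can_start(required_inputs, instance_data, structures):
    """Return True if every required data input is present in the process
    instance and conforms to its structure definition, if one exists."""
    for name in required_inputs:
        if name not in instance_data:
            return False
        structure = structures.get(name)
        if structure is not None and not conforms(instance_data[name], structure):
            return False
    return True


instance_data = {"Insurance Claim": {"claim_id": "C-1", "amount": 1200.0}}

print(can_start(["Insurance Claim"], instance_data, STRUCTURES))  # True
print(can_start(["Appraisal Report"], instance_data, STRUCTURES))  # False
```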

On a business level, data objects provide the means to analyze and optimize processes with respect to data creation and utilization. Process models can also be connected to data models via these elements. As an example, data models, UML class diagrams, or technical terms diagrams could be used to refine data objects in process models. In doing so, the process view could be integrated with the data view. This is important for seamless information systems development. Such an integration of different views is not part of the process modeling notation. It needs to be regulated by modeling conventions and suitable modeling tools.