Structured data, such as transactional, customer, analytical, and market data, usually resides in a local relational database. With a query language such as SQL, we can retrieve the data needed for processing, as shown in the workflow in the preceding diagram. Usually, all the data fits in memory and can be further processed with a machine learning library such as Weka, Java-ML, or MALLET.
A common practice in architecture design is to create data pipelines, in which the steps of the workflow are split into separate stages. For instance, to create a client data record, we might have to scrape data from several data sources. The record can then be saved in an intermediate database for further processing.
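The two pipeline stages described here, collecting a client's data from several sources and saving the merged record for later processing, can be sketched in a few lines of plain Java. This is only an illustration under simplifying assumptions: the class name ClientRecordPipeline, the methods collect and save, and the sample sources crm and billing are all hypothetical, and in-memory maps stand in for both the scraped data sources and the intermediate database.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: every name below is illustrative, not a library API.
final class ClientRecordPipeline {

    // Stage 1: gather the fields known about one client from several sources.
    // Each source is modeled as a map of clientId -> (field -> value).
    static Map<String, String> collect(String clientId,
                                       List<Map<String, Map<String, String>>> sources) {
        Map<String, String> record = new LinkedHashMap<>();
        record.put("id", clientId);
        for (Map<String, Map<String, String>> source : sources) {
            Map<String, String> fields = source.get(clientId);
            if (fields != null) {
                record.putAll(fields); // later sources overwrite earlier ones
            }
        }
        record.put("id", clientId); // make sure no source replaced the key
        return record;
    }

    // Stage 2: save the merged record to the intermediate store.
    // A map stands in for the intermediate database (assumption).
    static void save(Map<String, Map<String, String>> store,
                     Map<String, String> record) {
        store.put(record.get("id"), record);
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> crm =
                Map.of("c42", Map.of("name", "Ada"));
        Map<String, Map<String, String>> billing =
                Map.of("c42", Map.of("plan", "basic"));

        Map<String, Map<String, String>> intermediate = new LinkedHashMap<>();
        save(intermediate, collect("c42", List.of(crm, billing)));

        System.out.println(intermediate.get("c42"));
    }
}
```

Keeping the stages as separate methods means that each one can be replaced independently, for example by swapping the map-based store for a real database table, without touching the code that collects the data.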
To understand how the high-level aspects of big data architecture differ, let's first clarify when data is considered big.