© Adam L. Davis 2020
A. L. Davis, Spring Quick Reference Guide, https://doi.org/10.1007/978-1-4842-6144-6_14

14. Spring Batch

Adam L. Davis, Oviedo, FL, USA

Spring Batch is a project that supports long-running data conversions and similar long-running processes for enterprise systems. It has many features, some of which we cover in this chapter.

Features

Spring Batch provides features for partitioning and processing high volumes of data. It also provides reusable functions that are essential in processing large volumes of records, including transaction management, job processing statistics, job restart, retry and skip, logging and tracing, and resource management.

Overview

In the big picture, Spring Batch is composed of a JobLauncher, JobRepository, Jobs, Steps, ItemReaders, ItemProcessors, and ItemWriters.

A JobLauncher runs a Job with given Job Parameters. Each Job can have multiple Steps. Each Step is typically composed of an ItemReader, ItemProcessor, and ItemWriter. Metadata, or information about the state of each entity, is saved and loaded using the JobRepository.
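To make these relationships concrete, here is a deliberately simplified plain-Java model. MiniStep, MiniJob, MiniRepository, and MiniLauncher are hypothetical stand-ins invented for illustration, not Spring Batch classes. The launcher runs a job's steps in order with the given parameters, stops at the first failing step, and records each outcome in the repository, loosely mirroring how a JobLauncher runs a Job and how metadata is saved through the JobRepository.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for Step, Job, JobRepository, and JobLauncher.
// They are NOT the org.springframework.batch.core types; they only show how the pieces relate.
class MiniStep {
    final String name;
    final Runnable work; // in Spring Batch, usually ItemReader -> ItemProcessor -> ItemWriter

    MiniStep(String name, Runnable work) {
        this.name = name;
        this.work = work;
    }
}

class MiniJob {
    final String name;
    final List<MiniStep> steps;

    MiniJob(String name, List<MiniStep> steps) {
        this.name = name;
        this.steps = steps;
    }
}

// Records what happened; the real JobRepository stores this metadata in database tables.
class MiniRepository {
    final List<String> executions = new ArrayList<>();
}

class MiniLauncher {
    private final MiniRepository repository;

    MiniLauncher(MiniRepository repository) {
        this.repository = repository;
    }

    // Runs the job's steps in order and stops at the first step that throws.
    String run(MiniJob job, Map<String, Object> parameters) {
        repository.executions.add(job.name + " " + parameters + ": STARTED");
        for (MiniStep step : job.steps) {
            try {
                step.work.run();
                repository.executions.add(job.name + "/" + step.name + ": COMPLETED");
            } catch (RuntimeException e) {
                repository.executions.add(job.name + "/" + step.name + ": FAILED");
                return "FAILED";
            }
        }
        return "COMPLETED";
    }
}
```

In Spring Batch itself, you rarely write this wiring by hand: the framework supplies the launcher and repository, and you supply the Jobs, the Steps, and the reader, processor, and writer for each Step.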


The Example

To demonstrate Spring Batch, we use an example based on a simple Course definition: Spring Batch loads a CSV file defining courses, converts the values, and saves new Course rows to the database.

Build

For simplicity, we’ll use Spring Boot (which is covered more fully in the next chapter). Firstly, we’ll define a Gradle build with spring-batch, and then we’ll cover the Maven build.

Gradle Build

Create a file named build.gradle with the following contents:
plugins {
  id 'org.springframework.boot' version '2.3.0.RELEASE'                 //1
  id 'io.spring.dependency-management' version '1.0.8.RELEASE'
  id 'java'
}
group = 'com.example'
version = '0.0.1-SNAPSHOT'
sourceCompatibility = '1.8'
repositories {
  mavenCentral()
}
dependencies {
  implementation 'org.springframework.boot:spring-boot-starter-batch' //2
  runtimeOnly 'org.hsqldb:hsqldb'
  testImplementation('org.springframework.boot:spring-boot-starter-test') {
    exclude group: 'org.junit.vintage', module: 'junit-vintage-engine'
  }
  testImplementation 'org.springframework.batch:spring-batch-test' //3
}
test {
  useJUnitPlatform() //4
}
  1. We apply the plugins for Spring Boot and Spring dependency management, which allow us to leave off versions in the dependencies block.

  2. This line adds spring-boot-starter-batch, which brings in all the JARs needed for Spring Batch. On the next line, we include hsqldb[1] to use as the database.

  3. There's also a library specifically for testing Spring Batch, spring-batch-test.

  4. This line tells Gradle to run tests on the JUnit Platform (JUnit 5).


Maven Build

Create a file named “pom.xml” with the following:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>
        <parent>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-starter-parent</artifactId>
                <version>2.3.0.RELEASE</version>
                <relativePath/>
        </parent>
        <groupId>com.example</groupId>
        <artifactId>batch-processing</artifactId>
        <version>0.0.1-SNAPSHOT</version>
        <name>batch-processing</name>
        <description>Demo project for Spring Boot, Batch</description>
        <properties>
                <java.version>1.8</java.version>
        </properties>
        <dependencies>
                <dependency>
                        <groupId>org.springframework.boot</groupId>
                        <artifactId>spring-boot-starter-batch</artifactId>
                </dependency>
                <dependency>
                        <groupId>org.hsqldb</groupId>
                        <artifactId>hsqldb</artifactId>
                        <scope>runtime</scope>
                </dependency>
                <dependency>
                        <groupId>org.springframework.boot</groupId>
                        <artifactId>spring-boot-starter-test</artifactId>
                        <scope>test</scope>
                        <exclusions>
                                <exclusion>
                                        <groupId>org.junit.vintage</groupId>
                                        <artifactId>junit-vintage-engine</artifactId>
                                </exclusion>
                        </exclusions>
                </dependency>
                <dependency>
                        <groupId>org.springframework.batch</groupId>
                        <artifactId>spring-batch-test</artifactId>
                        <scope>test</scope>
                </dependency>
        </dependencies>
        <build>
                <plugins>
                        <plugin>
                                <groupId>org.springframework.boot</groupId>
                                <artifactId>spring-boot-maven-plugin</artifactId>
                        </plugin>
                </plugins>
        </build>
</project>

In addition to the standard Spring Boot Maven build, we include hsqldb (the database), spring-boot-starter-batch, and spring-batch-test.

Since Spring Batch typically involves an interaction with a database and saves metadata to a database by default, the starter for Spring Batch depends on spring-boot-starter-jdbc.

Schema

Since spring-boot-starter-jdbc is on the classpath, and we’ve included a database (hsqldb), the only thing necessary to initialize our database is to include a file named schema-all.sql under src/main/resources/. Create this file and add the following:
DROP TABLE course IF EXISTS ;
CREATE TABLE course  (
    course_id BIGINT IDENTITY NOT NULL PRIMARY KEY,
    title VARCHAR(200),
    description VARCHAR(250)
);

Course

We define the Course entity as a typical domain class (POJO) with a title and description:
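public class Course {
        private String title;
        private String description;
        public Course() {
        }
        public Course(String title, String description) {
                this.title = title;
                this.description = description;
        }
        // Getters and setters are required: BeanWrapperFieldSetMapper calls the
        // setters, and BeanPropertyItemSqlParameterSourceProvider calls the getters.
        public String getTitle() {
                return title;
        }
        public void setTitle(String title) {
                this.title = title;
        }
        public String getDescription() {
                return description;
        }
        public void setDescription(String description) {
                this.description = description;
        }
        @Override
        public String toString() {
                return "title: " + title + ", description: " + description;
        }
}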

CourseProcessor

Spring Batch provides the ItemProcessor<I,O> interface (I stands for input and O for output) for implementing the logic whenever an entity needs to be modified or processed in some way.

In this case, we define a CourseProcessor that implements ItemProcessor<I,O>, replaces every run of whitespace with a single space, and trims any leading or trailing whitespace:
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.item.ItemProcessor;
public class CourseProcessor implements ItemProcessor<Course, Course> { //1
  private static final Logger log =
     LoggerFactory.getLogger(CourseProcessor.class);
  @Override
  public Course process(final Course course) throws Exception {
    final String title = course.getTitle()
                               .replaceAll("\\s+", " ").trim(); //2
    final String description = course.getDescription()
                                     .replaceAll("\\s+", " ").trim();
    final Course transformedCourse = new Course(title, description);
    log.info("Converting (" + course + ") into (" + transformedCourse + ")");
    return transformedCourse; //3
  }
}
  1. We declare that CourseProcessor implements the ItemProcessor interface and that both the input and output types are the same, Course. If they were different, the first type argument would be the type of the parameter to process, and the second would be its return type.

  2. Here, we use replaceAll with the regular expression \s+ (written "\\s+" in a Java string) to replace every run of whitespace with a single space in both the title and description, and trim removes any leading or trailing whitespace. We create a new object so that the processor has no side effects: it does not modify the input object.

  3. Finally, we return the new Course instance from the process method.
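The transformation itself needs nothing from Spring Batch, so it can be checked in isolation. The following sketch applies the same regular expression to a hypothetical, immutable SimpleCourse class (a stand-in for Course, assumed here only for illustration) and shows that process returns a new object, leaves its input unchanged, and gives the same result if applied twice.

```java
// Hypothetical immutable stand-in for Course, used only for this sketch.
final class SimpleCourse {
    final String title;
    final String description;

    SimpleCourse(String title, String description) {
        this.title = title;
        this.description = description;
    }
}

// Plain-Java version of the CourseProcessor logic, without Spring Batch.
final class CourseNormalizer {
    // Collapse each run of whitespace to one space, then trim both ends.
    static String normalize(String value) {
        return value.replaceAll("\\s+", " ").trim();
    }

    // Builds a new object instead of mutating the input, like CourseProcessor.process.
    static SimpleCourse process(SimpleCourse course) {
        return new SimpleCourse(normalize(course.title), normalize(course.description));
    }
}
```

Note that \s also matches tabs and line breaks, so those are collapsed into single spaces too.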

     

BatchConfiguration

Lastly, we define a @Configuration class that defines the Step and Job, which Spring Boot runs automatically at startup. Although we have one Job and one Step in this case, there could be multiple Jobs and one or more Steps per Job. If multiple Jobs exist, you can specify which job or jobs to run with the spring.batch.job.names property.
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;
import javax.sql.DataSource;
@Configuration
@EnableBatchProcessing                  //1
public class BatchConfiguration {
    @Autowired
    public JobBuilderFactory jobBuilderFactory;
    @Autowired
    public StepBuilderFactory stepBuilderFactory;
    @Bean
    public FlatFileItemReader<Course> reader() {            //2
        return new FlatFileItemReaderBuilder<Course>()
               .name("courseItemReader")
               .resource(new ClassPathResource("sample-data.csv"))
               .delimited()
               .names(new String[]{"title", "description"})
               .fieldSetMapper(new BeanWrapperFieldSetMapper<Course>() {{
                    setTargetType(Course.class);
               }})
               .build();
    }
    @Bean
    public CourseProcessor processor() {
        return new CourseProcessor();
    }
    @Bean
    public JdbcBatchItemWriter<Course> writer(DataSource dataSource) { //3
        return new JdbcBatchItemWriterBuilder<Course>()
                .itemSqlParameterSourceProvider(
                        new BeanPropertyItemSqlParameterSourceProvider<>())
                .sql("INSERT INTO course (title, description) VALUES" +
                     " (:title, :description)")
                .dataSource(dataSource)
                .build();
    }
    @Bean
    public Step readAndSaveStep(JdbcBatchItemWriter<Course> writer,  //4
                                CourseProcessor processor) {
        return stepBuilderFactory.get("saveStep")
                .<Course, Course>chunk(10)
                .reader(reader())
                .processor(processor)
                .writer(writer)
                .build();
    }
    @Bean
    public Job importCourseJob(JobCompletionListener listener, Step step) {
        return jobBuilderFactory.get("importCourseJob")     //5
                .incrementer(new RunIdIncrementer())
                .listener(listener)
                .flow(step)
                .end()
                .build();
    }
}
Listing 14-1. BatchConfiguration.java

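  1. @EnableBatchProcessing enables Spring Batch's base configuration, which provides the default JobRepository, JobLauncher, JobBuilderFactory, StepBuilderFactory, and other Spring beans.

  2. We create a FlatFileItemReader<T>, which is one of the many helper classes provided by Spring Batch. Here, we define which file to read and the names of its comma-delimited columns, and using a BeanWrapperFieldSetMapper<T>, we set those fields on each Course through its JavaBean setters.

  3. We create a JdbcBatchItemWriter<T>, which inserts records into our database. The BeanPropertyItemSqlParameterSourceProvider fills the named parameters :title and :description from the matching Course getters.

  4. Using the StepBuilderFactory, we create a step that processes courses in chunks of ten (ten at a time). Data is processed in chunks for efficiency and performance: each chunk is written and committed in a single transaction, and if any error happens in a chunk, that chunk's transaction is rolled back.

  5. We define the Job using the JobBuilderFactory. The RunIdIncrementer adds an incremented run.id parameter on each launch, so the job can be run again with new parameters, and the listener is the JobCompletionListener described in the next section.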
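As a rough illustration of what chunk(10) means, the following plain-Java sketch (a hypothetical ChunkRunner, not Spring Batch's implementation) reads up to chunkSize items, processes each one, and then passes the whole chunk to the writer in a single call. In Spring Batch, that write happens inside the chunk's transaction; this sketch has no transactions at all.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch of chunk-oriented processing: read, process, then write per chunk.
final class ChunkRunner {
    // Returns the number of chunks handed to the writer.
    static <I, O> int run(Iterator<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        int chunks = 0;
        while (reader.hasNext()) {
            // 1. Read up to chunkSize items.
            List<I> read = new ArrayList<>(chunkSize);
            while (reader.hasNext() && read.size() < chunkSize) {
                read.add(reader.next());
            }
            // 2. Process each item in the chunk.
            List<O> processed = new ArrayList<>(read.size());
            for (I item : read) {
                processed.add(processor.apply(item));
            }
            // 3. Write the whole chunk at once (in Spring Batch, inside one transaction).
            writer.accept(processed);
            chunks++;
        }
        return chunks;
    }
}
```

With 25 input items and a chunk size of 10, the writer is called three times, with 10, 10, and 5 items.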

     
For this example, the file sample-data.csv (placed under src/main/resources/, since the reader loads it from the classpath) might look like the following; note the extra whitespace, which the processor removes:
Java   11,   Java 11 for beginners
Java    Advanced,  Advanced Java course
Spring    ,   Course for Spring Framework
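Conceptually, .delimited().names(...) splits each line on the delimiter (a comma by default) and pairs each token with a field name, which BeanWrapperFieldSetMapper then copies onto a Course. The sketch below is a simplified, hypothetical SimpleLineTokenizer that assumes a comma delimiter, no quoted fields, and untrimmed tokens; it is not Spring Batch's DelimitedLineTokenizer, which also handles quoted fields.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified tokenizer: comma delimiter, no quoting, tokens kept as-is.
final class SimpleLineTokenizer {
    private final String[] names;

    SimpleLineTokenizer(String... names) {
        this.names = names;
    }

    // Maps each field name to the token at the same position in the line.
    Map<String, String> tokenize(String line) {
        String[] tokens = line.split(",", -1); // -1 keeps trailing empty tokens
        if (tokens.length != names.length) {
            throw new IllegalArgumentException("Expected " + names.length
                    + " tokens but found " + tokens.length + ": " + line);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            fields.put(names[i], tokens[i]);
        }
        return fields;
    }
}
```

The surrounding whitespace survives tokenizing in this sketch, which is exactly what CourseProcessor then cleans up.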

JobExecutionListener

Spring Batch publishes job lifecycle events that you can listen to with a JobExecutionListener. For example, the following class, JobCompletionListener, implements the afterJob method and logs a message only when the Job has completed successfully:
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.listener.JobExecutionListenerSupport;
import org.springframework.stereotype.Component;
@Component
public class JobCompletionListener extends JobExecutionListenerSupport {
  private static final Logger log =
          LoggerFactory.getLogger(JobCompletionListener.class);
  @Override
  public void afterJob(JobExecution jobExecution) {
    if (jobExecution.getStatus() == BatchStatus.COMPLETED) {
      log.info("JOB FINISHED!");
    }
  }
}

JobExecutionListenerSupport implements JobExecutionListener with empty methods, so by extending it we only need to override afterJob.
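The Support class follows a common adapter pattern: an interface with several callbacks, plus a base class with empty implementations so that subclasses override only what they need. The following plain-Java sketch uses hypothetical SimpleJobListener, SimpleJobListenerSupport, CompletionLogger, and ListeningRunner types (not Spring Batch's) to show that pattern, and why checking the status matters: like afterJob in Spring Batch, the callback runs whether the job succeeded or failed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener interface with two callbacks (not the Spring Batch type).
interface SimpleJobListener {
    void beforeJob(String jobName);
    void afterJob(String jobName, String status);
}

// Empty implementations, so subclasses override only the callbacks they care about.
class SimpleJobListenerSupport implements SimpleJobListener {
    @Override public void beforeJob(String jobName) { }
    @Override public void afterJob(String jobName, String status) { }
}

// Like JobCompletionListener: reacts only when the job finished with COMPLETED.
class CompletionLogger extends SimpleJobListenerSupport {
    final List<String> messages = new ArrayList<>();

    @Override
    public void afterJob(String jobName, String status) {
        if ("COMPLETED".equals(status)) {
            messages.add(jobName + " FINISHED!");
        }
    }
}

// Minimal runner that calls the listener before and after the job's work.
final class ListeningRunner {
    static String run(String jobName, Runnable work, SimpleJobListener listener) {
        listener.beforeJob(jobName);
        String status;
        try {
            work.run();
            status = "COMPLETED";
        } catch (RuntimeException e) {
            status = "FAILED";
        }
        listener.afterJob(jobName, status); // called on success and on failure
        return status;
    }
}
```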

Spring Batch Metadata

Spring Batch can automatically store metadata about each batch execution, which serves as an audit record and helps with restarts and with analyzing errors in postmortems.

The Spring Batch metadata tables closely match the domain objects that represent them in Java. For example, JobInstance, JobExecution, JobParameters, and StepExecution map to BATCH_JOB_INSTANCE, BATCH_JOB_EXECUTION, BATCH_JOB_EXECUTION_PARAMS, and BATCH_STEP_EXECUTION, respectively. ExecutionContext maps to both BATCH_JOB_EXECUTION_CONTEXT and BATCH_STEP_EXECUTION_CONTEXT.

With Spring Boot, you can force this schema (the metadata tables) to be created using the following property:

spring.batch.initialize-schema=always

By default, the tables are created only if you are using an embedded database. Likewise, you can prevent the tables from being created at all using

spring.batch.initialize-schema=never

Spring Retry

Oftentimes, while running a batch process, you might want to automatically retry an operation (try the same operation multiple times) if it fails. For example, there might be a temporary network glitch, or perhaps a database has a temporary issue. This is such a commonly desired feature that Spring created the Spring Retry[2] project to implement it as a cross-cutting feature, either through AOP or programmatically.

To get started with spring-retry, first include it in the build:

Maven

<dependency>
    <groupId>org.springframework.retry</groupId>
    <artifactId>spring-retry</artifactId>
    <version>1.3.0</version>
</dependency>

Gradle

implementation 'org.springframework.retry:spring-retry:1.3.0'

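Then, to use the declarative/AOP approach, add the @EnableRetry annotation to one of your Java configuration classes (this tells Spring to look for the @Retryable annotation). The declarative approach also needs AOP classes on the runtime classpath, for example org.aspectj:aspectjweaver:
import org.springframework.retry.annotation.EnableRetry;
@Configuration
@EnableBatchProcessing
@EnableRetry
public class BatchConfiguration {
    // ... beans as before
}
Or, to use Spring Retry imperatively (programmatically), use the RetryTemplate directly, for example: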
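import org.springframework.remoting.RemoteAccessException;
import org.springframework.retry.support.RetryTemplate;

RetryTemplate template = RetryTemplate.builder()
                                .maxAttempts(3)
                                .fixedBackoff(1000)
                                .retryOn(RemoteAccessException.class)
                                .build();
String result = template.execute(ctx -> {
    // ... some code that might throw RemoteAccessException
    return "result"; // the RetryCallback must return a value (it may be null)
});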

In this example, the executed code is attempted at most three times in total (the first call plus up to two retries), is retried only when a RemoteAccessException is thrown, and waits one second (1000 milliseconds) between attempts.
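To show what this configuration means, here is a plain-Java sketch of the same policy. SimpleRetry and TransientFailure are hypothetical names (TransientFailure stands in for the unchecked RemoteAccessException), and this is not Spring Retry's implementation. It makes at most maxAttempts calls in total, retries only on the given exception type, and sleeps a fixed backoff between attempts.

```java
import java.util.function.Supplier;

// Hypothetical unchecked exception standing in for RemoteAccessException.
class TransientFailure extends RuntimeException {
    TransientFailure(String message) {
        super(message);
    }
}

final class SimpleRetry {
    static <T> T execute(Supplier<T> action, int maxAttempts, long backoffMillis,
                         Class<? extends RuntimeException> retryOn) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                if (!retryOn.isInstance(e)) {
                    throw e;                 // not retryable: fail immediately
                }
                last = e;
                if (attempt < maxAttempts) {
                    pause(backoffMillis);    // fixed backoff between attempts
                }
            }
        }
        throw last;                          // all attempts used: rethrow the last failure
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag set
            throw new IllegalStateException("Interrupted while backing off", e);
        }
    }
}
```

With maxAttempts set to 3, an operation that fails twice and then succeeds returns normally on the third call, while an exception of any other type is rethrown immediately without a retry.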

Retry Terms

Max-Attempts

Maximum number of attempts, including the first one (so three attempts means at most two retries).

Fixed-Backoff

Fixed pause between retries (in milliseconds).

Exponential-Backoff

Parameters (an initial pause, a multiplier, and a maximum pause, in milliseconds) that increase the pause between retries exponentially. This gives a problematic system more time to recover, which helps when it is down due to oversaturation.

Random-Backoff

Randomness added to the pause (from 0% to 200% of the delay time) so that retries are not correlated, that is, so a bunch of nodes don't all retry at the same time.
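The following sketch computes the pause before each retry under the three backoff strategies above. The Backoffs class and its parameter names are hypothetical, and the random variant simply scales the exponential delay by a factor between 0% and 200%, following the range given above; Spring Retry's own backoff policies are configured through RetryTemplate or @Backoff instead.

```java
import java.util.Random;

// Hypothetical helper; retry is 1 for the first retry, 2 for the second, and so on.
final class Backoffs {
    // Fixed backoff: the same pause before every retry.
    static long fixed(long delayMillis) {
        return delayMillis;
    }

    // Exponential backoff: initial, initial*multiplier, initial*multiplier^2, ...
    // capped at maxMillis.
    static long exponential(int retry, long initialMillis, double multiplier, long maxMillis) {
        double delay = initialMillis * Math.pow(multiplier, retry - 1);
        return (long) Math.min(delay, (double) maxMillis);
    }

    // Random backoff: the exponential delay scaled by a random factor in [0, 2),
    // i.e. 0% to 200% of the base delay, so nodes do not retry in lockstep.
    static long randomized(int retry, long initialMillis, double multiplier, long maxMillis,
                           Random random) {
        long base = exponential(retry, initialMillis, multiplier, maxMillis);
        return (long) (base * 2 * random.nextDouble());
    }
}
```

For example, with an initial pause of 100 ms, a multiplier of 2, and a cap of 1000 ms, the exponential pauses are 100, 200, 400, 800, and then 1000 ms.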

Retryable Annotation

Using the AOP method, you can annotate any public method of a Spring bean with @Retryable (after using @EnableRetry on a configuration class); Spring then wraps the bean in a proxy that retries calls to that method. For example, let's modify our CourseProcessor from earlier to make up to four attempts:
@Retryable(maxAttempts = 4, backoff =
        @Backoff(random = true, delay = 100, multiplier = 2))
@Override
public Course process(final Course course) throws Exception {
  // code...
  return transformedCourse;
}

Notice how we set the backoff using the @Backoff annotation. The random flag only takes effect for exponential backoff, that is, when a multiplier is set; here the pauses start from 100 milliseconds, grow exponentially with a multiplier of 2, and are randomized.