Batch Processing in Mule 4 vs. Mule 3


In Mule Batch series, we looked at the batch processing capabilities of Mule ESB 3. At the time of writing this post, Mule 4 was already available as Release Candidate version.

Mule 4 offers huge improvements and changes to Mule 3, such as Introduction of DataWeave 2.0, Reusable Streaming, improved processing strategies, operation based connectors and much more. Mule 4 also has some changes to the way Batch jobs were implemented in Mule 3. In this post, we will look at batch processing in Mule 4.

If you have missed the batch processing capabilities in Mule 3, then I recommend reading below articles from earlier series -

1 Refreshing memory - Batch in Mule 3.x

Batch processing in Mule 3 is divided into four phases - Input, Load and Dispatch, Process and On Complete. A special scoped variable set, called as 'Record Variables' is used to store any variables at the record level. These are different than flow variables, which in Mule 3, could not be used during record processing.

Mule Batch Phases
Figure 1.A: Mule Batch Phases [Source: Mule Docs]

Input Phase: This is an optional part of the batch job that can be used to retrieve the source data using any inbound connector.

Load and Dispatch: This is an implicit phase and Mule runtime takes care of it. In this phase, the payload generated in Input phase or provided to Batch from Caller flow and is turned into a collection of records.

Process: This is the required phase where actual processing of every record occurs asynchronously. Records are processed through each step while they move back-and-forth between steps and the intermediate queue.

On Complete: In this final but optional phase, A summary of batch execution is made available to possibly generate reports or any other statistics. The payload in this phase is available as an instance of BatchJobResult object. It holds information such as the number of records loaded, processed, failed, succeeded. It can also provide details of exceptions occurred in steps.

2 Batch Job in Mule 4

In Mule 3.x, Batch Job was a top-level element and exists independent of flows/subflows. It can be called inside flow using Batch Execute (similar to flow-ref) component.

In Mule 4, this is changed. Batch is now a Scope like for-each, transactional, async etc and it can be added into flow itself. No more having different elements!

This is how a batch job looks in Mule 4 -

mule 4 batch processing batch job
This is mule4 version of the batch application that we used in earlier Mule (3) Batch series.

Let’s compare the Mule 4 Batch with Mule 3 Batch and learn about the changes.

2.1 Batch Phases

There are only 3 phases in Mule 4 Batch job compared to 4 phases in Mule 3 Batch job. Input phase does not exist in Mule 4. Since Batch is a scope and it does not need any special Input phase. The payload of the flow is passed to the batch.

Load and Dispatch: This is still there, Implicit and does the same thing as in Mule 3. It will create job instance, covert payload into collection of records and then split the collection into individual records for processing.

Process: This is still a required phase and essentially same as it was in Mule 3. It will process all records asynchronously. Batch steps in this phase, still allow you to filter records using acceptExpression and/or acceptPolicy configurations.

OnComplete: The last and optional phase in batch, functions same as in Mule 3. That means, you still do not get the processed results back in calling flow and they are available as an instance of BatchJobResult in this phase only.

2.2 Flow variables instead of Record Variables.

Batch jobs in Mule 3 have a special variable set called as Record Variables. These variables, unlike flow variables, only existed during the process phase.

In Mule 4, flow variables have been enhanced to work efficiently during batch processing, just like the record variables. Flow variables created in batch steps are now automatically tied to the processing record and stays with it throughout the processing phase. No longer record variables are needed.

This also removes all special treatment we had to give to record variables during munit testing.

Flow Variables in Batch Step

In this test batch application, the 100K objects are created with DataWeave 2.0 and then passed to the batch job for processing.

%dw 2.0
output application/java
---
(1 to 100000) map {
    id: $,
    "Name": "Name-" ++ $
}

In batch step 1, it sets a flow variable recordId with value of id.

<set-variable variableName="recordId" value="#[payload.id]" doc:name="Set Variable" />

Step 2 and Step 3 then have a validation component that compares the flow variable value with payload.id. If you run this application, not a single record fails due to validation, which means all 100K flow variables stayed with their records!

Any flow variables created during process phase, exist during Process phase only. Flow variables that existed before and outside batch scope, are available thorouhout process phase as well as in oncomplete phase.

2.3 Batch Aggregator instead of Batch Commit

In Mule 3 Batch Job, you would be using Batch Commit for grouping of records for sending over outbound connector such as database, salesforce etc. Instead of calling database insert for each record, you can specify the group size to perform batch commit.

In Mule 4 Batch job, this has been replaced with Batch Aggregator component. The underline behavior and functionality of Batch Aggregator is same as Batch Commit.

Grouped collection in batch aggregator is a mutable collection i.e. it allows you to modify individual records in groups or variables associated with those. You can aggregate records to process in two ways -

  • Aggregate fixed amount of records

  • Streaming all records

The important difference about mutability in both options is, streaming provides one-read forward-only iterator of records, while the other one allows you randomly as well as sequentially access records.

3. Conclusion

Mule 4 comes with lots of improvements and enhancements. Batch processing in Mule 3 was already powerful, in Mule 4 it has been simplified for developers and thus easy to implement.

Source code is available on Github

4. References

on twitter to get updates on new posts.

Stay updated!

On this blog, I post articles about different technologies like Java, MuleSoft, and much more.

You can get updates for new Posts in your email by subscribing to JavaStreets feed here -


Lives on Java Planet, Walks on Java Streets, Read/Writes in Java, JCP member, Jakarta EE enthusiast, MuleSoft Integration Architect, MuleSoft Community Ambassador, Open Source Contributor and Supporter, also writes at Unit Testers, A Family man!