DataWeave 2.2 and Apache Avro
1. Overview
Mule Runtime 4.2 was released with DataWeave 2.2. This version adds support for content (de)serialization with Apache Avro. In this post, we will take DataWeave 2.2 for a test drive with Apache Avro.
Requirements:
- Mule Runtime 4.2.0 and above
- DataWeave 2.2 (uses Apache Avro 1.9.0)
2. Apache Avro
Apache Avro™ is a data serialization system. Some of the features Avro provides are:
- Rich data structures
- A compact, fast, binary data format
- A container file, to store persistent data
2.1 Avro Schemas
Data formats in Avro are described using schemas, which are defined in JSON. Schemas are needed when serializing data to Avro. When data is serialized, the schema content is also included in the serialized output. This makes deserializing the content easy, as the required schema is locally present in the data.
{
"namespace": "com.javastreets.avro",
"name": "com.javastreets.avro.Employee",
"type": "record",
"fields": [
{
"name": "employeeId",
"type": "int"
},
{
"name": "firstname",
"type": "string"
},
{
"name": "lastname",
"type": "string"
},
{
"name": "address",
"type": "string"
},
{
"name": "notes",
"type": "string"
}
]
}
3. DataWeave 2.2 Avro Support
DataWeave 2.2 adds support for serializing and deserializing data in the Avro format. For DataWeave to use Avro (de)serialization, the MIME type of the data must be application/avro.
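If the MIME type is not already set on the message, DataWeave also lets a script declare it with an input directive. Below is a minimal sketch, assuming the Avro reader can be selected this way (setting the MIME type on the source connector, as shown in section 3.2, is the more common approach):
%dw 2.2
input payload application/avro
output application/json
---
// The Avro reader uses the schema embedded in the serialized data
payload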
3.1 Serialize (output) with Avro
To serialize (output) data using the Avro format, the output must be set to application/avro. Schemas are required for serialization. We use the schemaUrl attribute on output to reference our schema file.
To serialize a list of Employees using the above schema, our DataWeave script would look like the following:
%dw 2.2
output application/avro schemaUrl="employee.avro.json" (1)
---
(0 to 100) map {
employeeId: $$,
firstname: "Manik" ++ $$,
lastname: "Magar",
address: "Test dummy address 123",
notes: "some more information"
}
(1) References employee.avro.json in the src/main/resources/ directory.
Note that the body of the script DOES NOT contain any Avro-specific code. This makes the script development experience exactly the same as for other formats.
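If the incoming data does not already match the schema, you shape it in the script body just as you would for any other format. Here is a minimal sketch, assuming a hypothetical JSON payload whose records carry fields id, first, last, addr, and notes:
%dw 2.2
output application/avro schemaUrl="employee.avro.json"
---
payload map (e) -> {
  employeeId: e.id as Number, // schema declares an int
  firstname: e.first,
  lastname: e.last,
  address: e.addr default "", // schema fields are required, so provide a fallback
  notes: e.notes default ""
}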
In case of larger payloads, streaming can be enabled by adding deferred=true (default false), and optionally bufferSize (default 8192), as attributes on output. For example, the following output directive will enable streaming with the provided buffer size:
output application/avro schemaUrl="employee.avro.json", deferred=true, bufferSize=8192
Serialized data from the above DataWeave script would look like below (note that it starts with the actual schema definition):
Objavro.schema�{"type":"record","name":"Employee","namespace":"com.javastreets.avro","fields":[{"name":"employeeId","type":"int"},{"name":"firstname","type":"string"},{"name":"lastname","type":"string"},{"name":"address","type":"string"},{"name":"notes","type":"string"}]} �f�m���(�%�g���_ Manik0
Magar,Test dummy address 123*some more informationManik1
Magar,Test dummy address 123*some more informationManik2
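Putting it together, a streaming variant of the earlier script might look like the sketch below (the larger range is purely illustrative):
%dw 2.2
output application/avro schemaUrl="employee.avro.json", deferred=true, bufferSize=16384
---
// With deferred=true, records are streamed to the output instead of
// being fully materialized in memory first
(0 to 1000000) map {
  employeeId: $$,
  firstname: "Manik" ++ $$,
  lastname: "Magar",
  address: "Test dummy address 123",
  notes: "some more information"
}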
3.2 Deserialize with Avro
Deserialization with Avro in DataWeave 2.2 is pretty straightforward. If the payload is data serialized with Avro, then the payload MIME type MUST be application/avro. Once that is set, DataWeave can deserialize the content and convert it to any other format.
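Because the deserialized payload is plain DataWeave data, it can be reshaped like any other structure. A minimal sketch, assuming the Employee records from the earlier schema and a hypothetical derived fullName field:
%dw 2.2
output application/json
---
payload map (employee) -> {
  id: employee.employeeId,
  fullName: employee.firstname ++ " " ++ employee.lastname
}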
Consider the following flow, which reads a file originally serialized with Avro.
<flow name="test-avro-supportFlow1">
<file:listener doc:name="On New or Updated File" directory="Documents/mule-avro/input" moveToDirectory="Documents/mule-avro/backup" autoDelete="true" outputMimeType="application/avro"> (1)
<scheduling-strategy >
<fixed-frequency />
</scheduling-strategy>
<file:matcher filenamePattern="*.avro"/> (2)
</file:listener>
<ee:transform doc:name="Transform Message">
<ee:message >
<ee:set-payload ><![CDATA[%dw 2.2
output application/json (3)
---
payload]]></ee:set-payload>
</ee:message>
</ee:transform>
<file:write doc:name="Write" path="#['Documents/mule-avro/output/test.avro' ++ attributes.creationTime ++'.json']" /> (4)
</flow>
(1) Listens for new files and sets the MIME type to application/avro.
(2) Reads *.avro files.
(3) Converts the file content (the original Employee list, for example) to JSON.
(4) Writes the JSON file to the output directory.
As per the Avro specification, serialized data contains the schema definition, and the reader uses that inline definition to deserialize the data. Again, the body of the script DOES NOT contain any Avro-specific code, which makes the script development experience exactly the same as for other formats.
This simple transformation should generate a JSON file like the one below:
[
{
"employeeId": 0,
"firstname": "Manik0",
"lastname": "Magar",
"address": "Test dummy address 123",
"notes": "some more information"
},
{
"employeeId": 1,
"firstname": "Manik1",
"lastname": "Magar",
"address": "Test dummy address 123",
"notes": "some more information"
}
]
4. Conclusion
This post demonstrated how Apache Avro can be used with DataWeave 2.2. We looked at writing (serializing) as well as reading (deserializing) data using Apache Avro. The demo source code is available on GitHub.