DataWeave 2 - JSON Writer Properties


In this post, we will look at DataWeave’s JSON writer properties. DataWeave 2 is a powerful transformation language that allows you to convert data from one format to another. When doing these transformations, you may want to control how the output is written. Let’s learn how you can do that.

Requirements:

  • Mule Runtime 4.x

  • DataWeave 2.x

1. JSON Transformations

DataWeave supports transformations from and to JSON. Here is a simple example of converting XML into JSON.

Table 1. Simple JSON Transformation

Payload (application/xml):
<?xml version="1.0" encoding="UTF-8"?>
<planets>
    <planet id="1">
        <name>Mercury</name>
        <diameter>4880</diameter>
    </planet>
    <planet id="2">
        <name>Venus</name>
        <diameter>12103.6</diameter>
    </planet>
    <planet id="3">
        <name>Earth</name>
        <orbit>149600000</orbit>
    </planet>
</planets>

Script:
%dw 2.0
output application/json
---
payload.planets.*planet map ((item, index) -> {
    name: item.name
})

Output (application/json):
[
  {
    "name": "Mercury"
  },
  {
    "name": "Venus"
  },
  {
    "name": "Earth"
  }
]

2. Writer Properties

If you look at the script in the previous section, you will see output application/json in the header. This tells DataWeave that the intended output is JSON and that it should use the JSON writer for this transformation.

It is possible to customize the behavior of this writer and change the way the output (still JSON) is written.

2.1 Compact output

The default JSON output is indented, prettified, and easy to read. For large payloads, a compact output can save a lot of network traffic.

This is where the indent writer property, which accepts true or false, comes in. Let’s modify our previous example to use it.

Default: true.

Table 2. Simple JSON Transformation

Payload (application/xml):
<?xml version="1.0" encoding="UTF-8"?>
<planets>
    <planet id="1">
        <name>Mercury</name>
        <diameter>4880</diameter>
    </planet>
    <planet id="2">
        <name>Venus</name>
        <diameter>12103.6</diameter>
    </planet>
    <planet id="3">
        <name>Earth</name>
        <orbit>149600000</orbit>
    </planet>
</planets>

Script:
%dw 2.0
output application/json indent=false (1)
---
payload.planets.*planet map ((item, index) -> {
    name: item.name
})

Output (application/json):
[{"name": "Mercury"},{"name": "Venus"},{"name": "Earth"}]

1 [Script] Indentation is disabled.

To see the impact of this property, let’s do a simple exercise. Consider the following DataWeave script, which just generates 1000 JSON objects.

DataWeave Indentation test script
%dw 2.0
output application/json indent=false
---
(1 to 1000) map {
    name: "name " ++ $
}

Execute it with indent=true, save the output to a file (test-formatted.json).

Now, run it with indent=false and save the output to another file (test-compact.json).

Let’s look at the file size difference. The compact output is about 35% smaller on disk than the pretty-printed output (20k versus 31k).

Size Name
20k test-compact.json
31k test-formatted.json
When transmitting over the network, every byte counts! Less is better :).
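
As an alternative to saving two files, the two sizes can be compared inside a single script. This is a minimal sketch that assumes the write function from the DataWeave core module, which serializes a value using the writer properties passed as its third argument. The names data, formatted, compact, formattedChars and compactChars are made up for illustration, and sizeOf counts characters here, not bytes on disk, so the numbers will not match the file listing exactly.

```dataweave
%dw 2.0
output application/json
// Same 1000 objects as the indentation test script
var data = (1 to 1000) map { name: "name " ++ $ }
// Serialize the same value twice, once per indent setting
var formatted = write(data, "application/json", { indent: true })
var compact = write(data, "application/json", { indent: false })
---
{
    // Character counts of each serialized form
    formattedChars: sizeOf(formatted),
    compactChars: sizeOf(compact)
}
```

The ratio between the two counts should be in the same range as the file sizes above, since the difference is only the whitespace added by indentation.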

2.2 Skipping Null values

When transforming data, it is not unusual for some attributes or objects to be missing. The resulting data may represent them as null values.

Table 3. JSON with Nulls

Payload (application/xml):
<?xml version="1.0" encoding="UTF-8"?>
<planets>
    <planet id="1">
        <name>Mercury</name>
        <diameter>4880</diameter>
    </planet>
    <planet id="2">
        <name>Venus</name>
        <diameter>12103.6</diameter>
    </planet>
    <planet id="3">
        <name></name>  (1)
        <orbit>149600000</orbit>
    </planet>
</planets>

Script:
%dw 2.0
output application/json
---
payload.planets.*planet map ((item, index) -> {
    name: item.name
}) ++ [ null ]  (2)

Output (application/json):
[
  {
    "name": "Mercury"
  },
  {
    "name": "Venus"
  },
  {
    "name": null  (3)
  },
  null  (4)
]

1 [Payload] The third planet is missing a name, the value that we are mapping.
2 [Script] An intentional addition of null for demo purposes.
3 [Output] The attribute resulted in null for the third planet.
4 [Output] A null item in the array.

The presence of these null values in the output can be controlled with the skipNullOn writer property.

This property accepts three values:

  • arrays: Omits null values from arrays.

  • objects: Omits all keys with a null value. If all keys of an object are omitted, an empty {} object is left in its place.

  • everywhere: Combines the arrays and objects options.

Default: null, which means null values are not skipped.

Let’s see how each of these options affects the output.

Table 4. JSON with Nulls - Skip Null On 'arrays'

Payload (application/xml):
<?xml version="1.0" encoding="UTF-8"?>
<planets>
    <planet id="1">
        <name>Mercury</name>
        <diameter>4880</diameter>
    </planet>
    <planet id="2">
        <name>Venus</name>
        <diameter>12103.6</diameter>
    </planet>
    <planet id="3">
        <name></name>
        <orbit>149600000</orbit>
    </planet>
</planets>

Script:
%dw 2.0
output application/json skipNullOn="arrays" (1)
---
payload.planets.*planet map ((item, index) -> {
    name: item.name
}) ++ [ null ]

Output (application/json):
[
  {
    "name": "Mercury"
  },
  {
    "name": "Venus"
  },
  {
    "name": null (3)
  }
  (2)
]

1 [Script] Set the writer property to ignore null in arrays.
2 [Output] Compare this output with the default output. The fourth item, the null, is omitted from the output.
3 [Output] Notice that object attributes with a null value are still present in the output.

Table 5. JSON with Nulls - Skip Null On 'objects'

Payload (application/xml):
<?xml version="1.0" encoding="UTF-8"?>
<planets>
    <planet id="1">
        <name>Mercury</name>
        <diameter>4880</diameter>
    </planet>
    <planet id="2">
        <name>Venus</name>
        <diameter>12103.6</diameter>
    </planet>
    <planet id="3">
        <name></name>
        <orbit>149600000</orbit>
    </planet>
</planets>

Script:
%dw 2.0
output application/json skipNullOn="objects" (1)
---
payload.planets.*planet map ((item, index) -> {
    name: item.name,
    dummy: null  (2)
}) ++ [ null ]

Output (application/json):
[
  {
    "name": "Mercury" (3)
  },
  {
    "name": "Venus"
  },
  {
    (4)
  },
  null  (5)
]

1 [Script] Set the writer property to ignore null in objects.
2 [Script] Adds a dummy key with a null value for demo purposes.
3 [Output] The writer omits the dummy key from every object in the output.
4 [Output] The third planet's name is missing, so all attribute values of the third object resolve to null. The writer omits all attributes but keeps an empty object.
5 [Output] Notice that the null item in the array is still present in the output.

Table 6. JSON with Nulls - Skip Null On 'everywhere'

Payload (application/xml):
<?xml version="1.0" encoding="UTF-8"?>
<planets>
    <planet id="1">
        <name>Mercury</name>
        <diameter>4880</diameter>
    </planet>
    <planet id="2">
        <name>Venus</name>
        <diameter>12103.6</diameter>
    </planet>
    <planet id="3">
        <name></name>
        <orbit>149600000</orbit>
    </planet>
</planets>

Script:
%dw 2.0
output application/json skipNullOn="everywhere" (1)
---
payload.planets.*planet map ((item, index) -> {
    name: item.name,
    dummy: null
}) ++ [ null ]

Output (application/json):
[
  {
    "name": "Mercury" (2)
  },
  {
    "name": "Venus"
  },
  {
    (3)
  }
  (4)
]

1 [Script] Set the writer property to ignore null everywhere.
2 [Output] Skip on objects effect - The writer omits the dummy key from every object in the output.
3 [Output] Skip on objects effect - The third planet's name is missing, so all attribute values of the third object resolve to null. The writer omits all attributes and just adds an empty object.
4 [Output] Skip on arrays effect - The fourth item, the null, is omitted from the output.
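
The three options can also be compared side by side in one script. The sketch below assumes the write function from the DataWeave core module accepts skipNullOn in its writer-properties object, the same way the output directive does. The sample data, the render helper and the result keys are hypothetical names chosen for this illustration.

```dataweave
%dw 2.0
output application/json
// Sample data with one null key and one null array item (illustrative only)
var data = [ { name: "Mercury", dummy: null }, null ]
// Serialize the same data with the given skipNullOn option, compact for easy comparison
fun render(option) = write(data, "application/json", { indent: false, skipNullOn: option })
---
{
    arrays: render("arrays"),
    objects: render("objects"),
    everywhere: render("everywhere")
}
```

Each value in the result is the JSON text produced under one option, so the differences described in the tables above can be read off directly.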

2.3 Handling Duplicate Keys

JSON structures are generally recommended to have unique key names under the same parent. RFC 8259 defines the JSON standard, and it says names "SHOULD", not "MUST", be unique.

The names within an object SHOULD be unique. An object whose names are all unique is interoperable in the sense that all software implementations receiving that object will agree on the name-value mappings. When the names within an object are not unique, the behavior of software that receives such an object is unpredictable. Many implementations report the last name/value pair only. Other implementations report an error or fail to parse the object, and some implementations report all of the name/value pairs, including duplicates.
— RFC8259 4. Objects

The output structure in the following example contains duplicate keys for planet.

Table 7. JSON with duplicate keys

Payload (application/xml):
<?xml version="1.0" encoding="UTF-8"?>
<planets>
    <planet id="1">
        <name>Mercury</name>
        <diameter>4880</diameter>
    </planet>
    <planet id="2">
        <name>Venus</name>
        <diameter>12103.6</diameter>
    </planet>
    <planet id="3">
        <name>Earth</name>
        <orbit>149600000</orbit>
    </planet>
</planets>

Script:
%dw 2.0
output application/json
---
payload

Output (application/json):
{
  "planets": {
    "planet": {
      "name": "Mercury",
      "diameter": "4880"
    },
    "planet": {
      "name": "Venus",
      "diameter": "12103.6"
    },
    "planet": {
      "name": "Earth",
      "orbit": "149600000"
    }
  }
}

DataWeave can read a JSON structure with duplicate keys without failing.

If you are working with a system that does not allow duplicate keys, there is a way to avoid them: the duplicateKeyAsArray writer property converts values that share a key into a single array.

Default: false.

Table 8. JSON with duplicate keys as an array

Payload (application/xml):
<?xml version="1.0" encoding="UTF-8"?>
<planets>
    <planet id="1">
        <name>Mercury</name>
        <diameter>4880</diameter>
    </planet>
    <planet id="2">
        <name>Venus</name>
        <diameter>12103.6</diameter>
    </planet>
    <planet id="3">
        <name>Earth</name>
        <orbit>149600000</orbit>
    </planet>
</planets>

Script:
%dw 2.0
output application/json duplicateKeyAsArray=true  (1)
---
payload

Output (application/json):
{
  "planets": {
    "planet": [ (2)
      {
        "name": "Mercury",
        "diameter": "4880"
      },
      {
        "name": "Venus",
        "diameter": "12103.6"
      },
      {
        "name": "Earth",
        "orbit": "149600000"
      }
    ]
  }
}

1 [Script] Set the property duplicateKeyAsArray.
2 [Output] The output contains planet as an array of objects instead of 3 objects with the same key.

If a payload contains only a single planet, the output will contain planet as an object rather than an array. The same script can therefore produce different output types depending on the input. The recommended approach is to write a script that always produces the same result types.
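
One way to get a stable result type is to build the array explicitly instead of relying on the writer. The following is a minimal sketch under that assumption: it uses the multi-value selector (.*planet) already seen in the earlier examples, which returns an array however many planet elements are present, and falls back to an empty array when there are none. The output shape (planets, planet, name) is chosen for illustration only.

```dataweave
%dw 2.0
output application/json
---
{
    planets: {
        // .*planet yields an array for one or many matches; default [] covers the no-match case
        planet: (payload.planets.*planet default []) map ((item) -> {
            name: item.name
        })
    }
}
```

With this script, planet is written as a JSON array regardless of how many planets the payload contains, so consumers do not need to handle two shapes.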

2.4 Writing key attributes

Some formats, such as XML, allow attributes on keys. For example, look at the XML payload in any of the examples above: each planet entry has an id attribute, but that id does not appear in any of the outputs. One way to include it is to map the attribute explicitly.
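
As a short sketch of the explicit mapping, the attribute selector (.@id) reads the id attribute of each planet element. The output key name id is an arbitrary choice for this example.

```dataweave
%dw 2.0
output application/json
---
payload.planets.*planet map ((item, index) -> {
    // .@id selects the XML attribute named id on the current planet element
    id: item.@id,
    name: item.name
})
```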

Another way is to instruct the writer to include all attributes by setting writeAttributes to true. All attributes of a key in the payload are then added as children of that key in the output. The names of these new keys are the original attribute names prefixed with @.

Default: false.

This property was introduced in DataWeave 2.3 with the Mule Runtime 4.3.0 release.

Table 9. JSON with write attributes

Payload (application/xml):
<?xml version="1.0" encoding="UTF-8"?>
<planets>
    <planet id="1">
        <name>Mercury</name>
        <diameter>4880</diameter>
    </planet>
    <planet id="2">
        <name>Venus</name>
        <diameter>12103.6</diameter>
    </planet>
    <planet id="3">
        <name>Earth</name>
        <orbit>149600000</orbit>
    </planet>
</planets>

Script:
%dw 2.0
output application/json writeAttributes=true  (1)
---
payload

Output (application/json):
{
  "planets": {
    "planet": {
      "@id": "1", (2)
      "name": "Mercury",
      "diameter": "4880"
    },
    "planet": {
      "@id": "2",
      "name": "Venus",
      "diameter": "12103.6"
    },
    "planet": {
      "@id": "3",
      "name": "Earth",
      "orbit": "149600000"
    }
  }
}

1 [Script] Set the property writeAttributes.
2 [Output] The id attribute is generated in the output.

2.5 Character Encoding

Sometimes characters do not render correctly because the wrong encoding is used. To avoid that, you can instruct the writer to use a specific character encoding.

Default: UTF-8.

In the following example, Mercury is written in Greek. When the writer encoding is set to ASCII, the name is written as "?????" because ASCII does not support those characters.

Table 10. JSON with encoding

Payload (application/xml):
<?xml version="1.0" encoding="UTF-8"?>
<planets>
    <planet id="1">
        <name>Ερμής</name>  (1)
        <diameter>4880</diameter>
    </planet>
</planets>

Script:
%dw 2.0
output application/json encoding="ASCII"  (2)
---
payload

Output (application/json):
{
  "planets": {
    "planet": {
      "name": "?????",  (3)
      "diameter": "4880"
    }
  }
}

1 [Payload] Name in Greek.
2 [Script] Encoding set to ASCII. Change this to UTF-8 to fix the output.
3 [Output] With ASCII, the characters are not rendered correctly. If you change the encoding to UTF-8, they render as expected.
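
For comparison, the corrected header could look like the sketch below. Since UTF-8 is the default listed above, leaving the encoding property out should behave the same way; it is spelled out here only to make the choice visible.

```dataweave
%dw 2.0
// UTF-8 can represent the Greek characters that ASCII cannot
output application/json encoding="UTF-8"
---
payload
```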

2.6 Other runtime properties

In addition to the properties we saw above, there are two more properties that can affect the writing process.

2.6.1 Deferred output

When the DataWeave processor runs, it executes the script to generate the output. For large data transformations, you may want to delay the transformation until the output is really needed, for example when transmitting the transformed output over HTTP or writing it to a file.

In that case, you can use the deferred property. It controls whether the script produces its result immediately or delays it until the output is read.

Default: false.

You can see the difference in the payload for deferred=false and deferred=true by running the "DataWeave Indentation test script" from section 2.1 above. When deferred, the output is a stream.

DataWeave JSON Immediate output
DataWeave JSON deferred output

2.6.2 Buffer size

If needed, you can change the buffer size of the writer with the bufferSize property.

Default: 8192.
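
Writer properties can be combined in the output directive by separating them with commas. The sketch below reuses the indentation test script and sets deferred together with the writer's bufferSize property. The value 16384 is an arbitrary example, not a recommendation; a suitable size depends on your payloads.

```dataweave
%dw 2.0
// Deferred, compact output with a larger writer buffer (16384 is an illustrative value)
output application/json deferred=true, bufferSize=16384, indent=false
---
(1 to 1000) map {
    name: "name " ++ $
}
```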

So that was all the DataWeave JSON writer properties.

3. Conclusion

This post explains JSON writer properties for DataWeave 2. We looked at some example transformations to understand how those properties affect the writer.

Feel free to take a look at more DataWeave posts.
