DataWeave 2.2: Compare lists using Arrays module
1. Overview
Mule Runtime 4.2 was released with DataWeave 2.2, which adds many useful new functions to the Arrays module. In this post, we will see how to compare two array lists to find matched and unmatched objects.
Requirements:
- Mule Runtime 4.2.0 and above
- DataWeave 2.2
If you are a DataWeave 1 user and haven't seen DataWeave 2 yet, I suggest checking the DataWeave 2 Syntax changes post to get started.
2. DataWeave 2.2 Arrays Module
DataWeave 2.2 comes with many new functions, such as drop, dropWhile, join, leftJoin, outerJoin, and more, in the dw::core::Arrays module. These functions provide new ways to operate on and transform array data.
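As a quick illustration of two of these functions, here is a minimal sketch. The numbers array is made up for illustration and is not part of the book example that follows.

```dataweave
%dw 2.0
import drop, dropWhile from dw::core::Arrays
output application/json
// Made-up input, used only to show what each function does
var numbers = [1, 2, 3, 4, 5]
---
{
  // drop: removes the first n items
  dropped: drop(numbers, 2),
  // dropWhile: removes leading items for as long as the condition holds
  droppedWhile: dropWhile(numbers, (n) -> n < 4)
}
// → { "dropped": [3, 4, 5], "droppedWhile": [4, 5] }
```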
Let's consider a use case of receiving book feeds from two different sources. Some books may exist in both feeds. To prevent duplicate processing, we want to compare these two lists and find both the matched and the unmatched items. The first feed, list1, looks like this:
[
{
"title": "Harry Potter and the Sorcerer's Stone",
"ASIN": "B0192CTMYG"
},
{
"title": "Harry Potter and the Chamber of Secrets", (1)
"ASIN": "B0192CTMW8"
},
{
"title": "Harry Potter and the Deathly Hallows",
"ASIN": "B0192CTMWS"
},
{
"title": "Harry Potter and the Goblet of Fire", (1)
"ASIN": "B0192CTMUU"
},
{
"title": "Harry Potter and the Order of the Phoenix", (1)
"ASIN": "B0192CTMXM"
}
]
1 | Exist in both lists |
The second feed, list2, looks like this:
[
{
"title": "Harry Potter and the Prisoner of Azkaban",
"ASIN": "B0192CTMX2"
},
{
"title": "Harry Potter and the Goblet of Fire", (1)
"ASIN": "B0192CTMUU"
},
{
"title": "Harry Potter and the Chamber of Secrets", (1)
"ASIN": "B0192CTMW8"
},
{
"title": "Harry Potter and the Order of the Phoenix", (1)
"ASIN": "B0192CTMXM"
},
{
"title": "Harry Potter and the Half-Blood Prince",
"ASIN": "B0192CTMWI"
}
]
1 | Exist in both lists |
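The scripts in this post refer to these feeds as list1 and list2 without showing where they come from. One way to try the scripts standalone, assuming you paste the data in rather than reading it from a Mule payload, is to declare both feeds as variables in the script header. A sketch:

```dataweave
%dw 2.0
output application/json
// Assumption: both feeds are declared as script variables for experimentation.
// In a real Mule flow they would more likely come from the payload or flow vars.
var list1 = [
  { title: "Harry Potter and the Sorcerer's Stone", ASIN: "B0192CTMYG" },
  { title: "Harry Potter and the Chamber of Secrets", ASIN: "B0192CTMW8" },
  { title: "Harry Potter and the Deathly Hallows", ASIN: "B0192CTMWS" },
  { title: "Harry Potter and the Goblet of Fire", ASIN: "B0192CTMUU" },
  { title: "Harry Potter and the Order of the Phoenix", ASIN: "B0192CTMXM" }
]
var list2 = [
  { title: "Harry Potter and the Prisoner of Azkaban", ASIN: "B0192CTMX2" },
  { title: "Harry Potter and the Goblet of Fire", ASIN: "B0192CTMUU" },
  { title: "Harry Potter and the Chamber of Secrets", ASIN: "B0192CTMW8" },
  { title: "Harry Potter and the Order of the Phoenix", ASIN: "B0192CTMXM" },
  { title: "Harry Potter and the Half-Blood Prince", ASIN: "B0192CTMWI" }
]
---
// Quick sanity check: the list1 ASINs that also appear in list2
list1.ASIN filter ((asin) -> list2.ASIN contains asin)
```

This should return the three ASINs marked (1) above. Copying the two var declarations into the header of any script below should make it runnable as-is.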
2.1 Arrays.outerJoin()
The new outerJoin function from the Arrays module is similar to an outer join in SQL. It returns one pair for each item of the left list that matches an item of the right list on the given criteria, plus every item from either list that has no match.
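outerJoin has two siblings in the same module, join and leftJoin, that keep fewer items. The following sketch runs all three on two tiny made-up arrays (leftFeed and rightFeed are hypothetical names) so the differences are easy to compare:

```dataweave
%dw 2.0
import join, leftJoin, outerJoin from dw::core::Arrays
output application/json
// Made-up data: "A" exists only on the left, "B" on both sides, "C" only on the right
var leftFeed = [{ id: "A" }, { id: "B" }]
var rightFeed = [{ id: "B" }, { id: "C" }]
---
{
  // join: only the matching pairs, each shaped { l, r }
  joined: join(leftFeed, rightFeed, (o) -> o.id, (o) -> o.id),
  // leftJoin: every left item; r appears only when a match exists
  leftJoined: leftJoin(leftFeed, rightFeed, (o) -> o.id, (o) -> o.id),
  // outerJoin: every item from both sides, with l, r, or both
  outerJoined: outerJoin(leftFeed, rightFeed, (o) -> o.id, (o) -> o.id)
}
```

Applied to the book feeds, join alone would return only the three books present in both lists, and leftJoin would add the two books that exist only in list1.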
Let's use this function to compare the lists by the books' ASIN values.
%dw 2.0
import * from dw::core::Arrays
output application/json
// list1 and list2 are the two book feeds shown above
---
outerJoin(list1, list2, (obj) -> obj.ASIN, (obj) -> obj.ASIN)
This outputs an array of comparison objects, each containing an l (left) item, an r (right) item, or both. The most important parts of the call are the last two arguments: the criteria functions that map each left and right object to the key used for comparison. These functions can be as simple as returning one attribute value, as we do here, or more complex when the data requires it.
[
{
"l": { (1)
"title": "Harry Potter and the Sorcerer's Stone",
"ASIN": "B0192CTMYG"
}
},
{
"l": { (2)
"title": "Harry Potter and the Chamber of Secrets",
"ASIN": "B0192CTMW8"
},
"r": { (2)
"title": "Harry Potter and the Chamber of Secrets",
"ASIN": "B0192CTMW8"
}
},
{
"l": {
"title": "Harry Potter and the Deathly Hallows",
"ASIN": "B0192CTMWS"
}
},
{
"l": {
"title": "Harry Potter and the Goblet of Fire",
"ASIN": "B0192CTMUU"
},
"r": {
"title": "Harry Potter and the Goblet of Fire",
"ASIN": "B0192CTMUU"
}
},
{
"l": {
"title": "Harry Potter and the Order of the Phoenix",
"ASIN": "B0192CTMXM"
},
"r": {
"title": "Harry Potter and the Order of the Phoenix",
"ASIN": "B0192CTMXM"
}
},
{
"r": { (3)
"title": "Harry Potter and the Prisoner of Azkaban",
"ASIN": "B0192CTMX2"
}
},
{
"r": {
"title": "Harry Potter and the Half-Blood Prince",
"ASIN": "B0192CTMWI"
}
}
]
1 | Only l present: no match in list2 (right) |
2 | Both l and r present: the object exists in both lists |
3 | Only r present: no match in list1 (left) |
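The criteria functions don't have to return a single attribute as-is. In the Arrays module documentation they are typed to return a String key, so any expression that builds a comparable string works. The sketch below assumes a hypothetical second feed that uses a different field name (asin) and lower-case values, and normalizes both sides with upper:

```dataweave
%dw 2.0
import outerJoin from dw::core::Arrays
output application/json
// Hypothetical feeds: the right side uses a different field name and lower-case ASINs
var feedA = [{ title: "Harry Potter and the Goblet of Fire", ASIN: "B0192CTMUU" }]
var feedB = [{ name: "Harry Potter and the Goblet of Fire", asin: "b0192ctmuu" }]
---
// Each criteria function maps its own object shape to the same String key
outerJoin(feedA, feedB,
  (book) -> upper(book.ASIN),
  (book) -> upper(book.asin))
```

With the keys normalized, the two Goblet of Fire entries should come back as a single pair containing both l and r.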
Now we have a comparison of both lists. Can we simplify this structure? The Book object model is the same in list1 and list2, so it is safe to merge each pair into a single object.
2.2 Merge with reduce
Let's create a final object with the structure { 'matched': [], 'unmatched': [] }. Each of those arrays will contain a collection of Book objects.
%dw 2.0
import * from dw::core::Arrays
output application/json
var joinedData = outerJoin(list1, list2, (obj) -> obj.ASIN, (obj) -> obj.ASIN) (1)
---
joinedData
reduce ((item, acc = { 'matched': [], 'unmatched': []}) (2)
-> if(item.l != null and item.r != null)
{ matched: (acc.matched default []) ++ [item.l], unmatched: acc.unmatched } (3)
else { matched: acc.matched, unmatched: (acc.unmatched default []) ++ [ if(item.l != null) item.l else item.r ]} ) (4)
1 | outerJoin the lists to get the comparison. |
2 | Use reduce with an initial accumulator object holding empty arrays. |
3 | When both l and r exist, i.e. matched, add l to the matched array and keep the unmatched array from the accumulator. |
4 | When either l or r is null, i.e. unmatched, add whichever one exists to the unmatched array and keep the matched array from the accumulator. |
The output of this transformation looks like this:
{
"matched": [ (1)
{
"title": "Harry Potter and the Chamber of Secrets",
"ASIN": "B0192CTMW8"
},
{
"title": "Harry Potter and the Goblet of Fire",
"ASIN": "B0192CTMUU"
},
{
"title": "Harry Potter and the Order of the Phoenix",
"ASIN": "B0192CTMXM"
}
],
"unmatched": [ (2)
{
"title": "Harry Potter and the Sorcerer's Stone",
"ASIN": "B0192CTMYG"
},
{
"title": "Harry Potter and the Deathly Hallows",
"ASIN": "B0192CTMWS"
},
{
"title": "Harry Potter and the Prisoner of Azkaban",
"ASIN": "B0192CTMX2"
},
{
"title": "Harry Potter and the Half-Blood Prince",
"ASIN": "B0192CTMWI"
}
]
}
1 | Objects that exist in both input lists |
2 | Objects that exist in only one of the lists |
With this output, we can process our books without worrying about duplicate processing.
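Since DataWeave 2.2 also adds partition to the Arrays module, the same matched/unmatched split can be sketched without a hand-written accumulator. This assumes partition's documented result shape, an object with success and failure arrays; the tiny list1 and list2 below are made-up stand-ins for the book feeds:

```dataweave
%dw 2.0
import outerJoin, partition from dw::core::Arrays
output application/json
// Made-up stand-ins for the book feeds; only "B" is in both
var list1 = [{ ASIN: "A" }, { ASIN: "B" }]
var list2 = [{ ASIN: "B" }, { ASIN: "C" }]
var joinedData = outerJoin(list1, list2, (obj) -> obj.ASIN, (obj) -> obj.ASIN)
// success: pairs that have both l and r; failure: pairs with only one side
var parts = partition(joinedData, (item) -> item.l != null and item.r != null)
---
{
  matched: parts.success map ((item) -> item.l),
  unmatched: parts.failure map ((item) -> item.l default item.r)
}
```

Here only ASIN B should land in matched, while A and C land in unmatched, in the same left-then-right order that outerJoin produces.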
2.3 Alternate reducing: Single list of unique books
If we don't need matched and unmatched separately, our reduce becomes simpler, as shown in the following listing:
%dw 2.0
import * from dw::core::Arrays
output application/json
var joinedData = outerJoin(list1, list2, (obj) -> obj.ASIN, (obj) -> obj.ASIN)
---
joinedData
reduce ((item, acc = []) (1)
-> acc ++ [ if(item.l != null) item.l else item.r ] ) (2)
1 | Accumulator initialized as an array. |
2 | Pick l when it exists, otherwise r. |
This produces a single list of unique books across both lists:
[
{
"title": "Harry Potter and the Sorcerer's Stone",
"ASIN": "B0192CTMYG"
},
{
"title": "Harry Potter and the Chamber of Secrets",
"ASIN": "B0192CTMW8"
},
{
"title": "Harry Potter and the Deathly Hallows",
"ASIN": "B0192CTMWS"
},
{
"title": "Harry Potter and the Goblet of Fire",
"ASIN": "B0192CTMUU"
},
{
"title": "Harry Potter and the Order of the Phoenix",
"ASIN": "B0192CTMXM"
},
{
"title": "Harry Potter and the Prisoner of Azkaban",
"ASIN": "B0192CTMX2"
},
{
"title": "Harry Potter and the Half-Blood Prince",
"ASIN": "B0192CTMWI"
}
]
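Because every outerJoin pair contributes exactly one book, the reduce in this section can also be written as a plain map using the default operator. A minimal sketch with made-up stand-in data:

```dataweave
%dw 2.0
import outerJoin from dw::core::Arrays
output application/json
// Made-up stand-ins for the book feeds
var list1 = [{ ASIN: "A" }, { ASIN: "B" }]
var list2 = [{ ASIN: "B" }, { ASIN: "C" }]
var joinedData = outerJoin(list1, list2, (obj) -> obj.ASIN, (obj) -> obj.ASIN)
---
// Each pair yields one book: l when present, otherwise r
joinedData map ((item) -> item.l default item.r)
```

This should return the books with ASINs A, B, and C, once each, without building up an accumulator.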
3. How to do it without DataWeave 2.2?
DataWeave 2.0 does not include these new Arrays module functions. So how can we achieve the same results with DataWeave 2.0?
3.1 Merge into matched and unmatched collection
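%dw 2.0
output application/json
var list1ASINs = list1.ASIN
var list2ASINs = list2.ASIN
---
{
  // books from list1 whose ASIN also appears in list2
  matched: list1 filter contains(list2ASINs, $.ASIN),
  // list1-only books first, then list2-only books (same order as the outerJoin result)
  unmatched: (list1 filter (contains(list2ASINs, $.ASIN) == false)) ++ (list2 filter (contains(list1ASINs, $.ASIN) == false))
}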
The script above finds the same matched and unmatched books as section 2.2. Notice the use of filter together with contains: for every item in one list, contains scans the ASINs of the other list, so with large list1 and list2 these nested iterations can hurt performance.
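One way to avoid rescanning the other list for every item, still using only DataWeave 2.0 core functions, is to build a lookup object per list with groupBy and check for the ASIN key. This is a sketch, not a benchmarked fix: DataWeave does not document the cost of key lookups on objects, so measure with realistic data before relying on it. The list1 and list2 below are made-up stand-ins:

```dataweave
%dw 2.0
output application/json
// Made-up stand-ins for the book feeds
var list1 = [{ ASIN: "A" }, { ASIN: "B" }]
var list2 = [{ ASIN: "B" }, { ASIN: "C" }]
// Built once per list: an object keyed by ASIN, e.g. { "B": [ { ASIN: "B" } ] }
var list1ByAsin = list1 groupBy ((book) -> book.ASIN)
var list2ByAsin = list2 groupBy ((book) -> book.ASIN)
---
{
  // list1 books whose ASIN is a key in the list2 lookup
  matched: list1 filter ((book) -> list2ByAsin[book.ASIN] != null),
  // books whose ASIN is missing from the other list's lookup, list1 first
  unmatched: (list1 filter ((book) -> list2ByAsin[book.ASIN] == null))
    ++ (list2 filter ((book) -> list1ByAsin[book.ASIN] == null))
}
```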
3.2 Alternate reducing: Single list of unique books
This is easy to do even without the new Arrays functions. The following script gives the same output as section 2.3:
%dw 2.0
output application/json
var list1ASINs = list1.ASIN
---
list1 ++ (list2 filter (contains(list1ASINs,$.ASIN) == false) )
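An even shorter DataWeave 2.0 variant, sketched below with made-up stand-in data, concatenates both lists and keeps the first book seen for each ASIN with the core distinctBy function:

```dataweave
%dw 2.0
output application/json
// Made-up stand-ins for the book feeds
var list1 = [{ ASIN: "A" }, { ASIN: "B" }]
var list2 = [{ ASIN: "B" }, { ASIN: "C" }]
---
// distinctBy keeps the first item seen for each key, so list1's copy of "B" wins
(list1 ++ list2) distinctBy ((book) -> book.ASIN)
```

One difference to keep in mind: distinctBy also removes duplicates that occur within a single list, whereas the filter version above keeps every list1 item as-is.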
4. Conclusion
This post demonstrates different ways to compare and merge array lists, using the new Arrays functions in DataWeave 2.2 as well as plain DataWeave 2.0.
If you liked this post, you can take a look at other DataWeave related posts.
This post was inspired by a question on the Mule Forum.
Lives on Java Planet, Walks on Java Streets, Read/Writes in Java, JCP member, Jakarta EE enthusiast, MuleSoft Integration Architect, MuleSoft Community Ambassador, Open Source Contributor and Supporter, also writes at Unit Testers, A Family man!