DataWeave 2.2: Compare lists using Arrays module


1. Overview

Mule Runtime 4.2 shipped with DataWeave 2.2, which adds many useful functions to the Arrays module. In this post, we will see how to compare two arrays to find matched and unmatched objects.

Requirements:

  • Mule Runtime 4.2.0 and above

  • DataWeave 2.2

If you are a DataWeave 1 user and haven’t seen DataWeave 2 yet, I suggest you read the DataWeave 2 Syntax changes post to get started.

2. DataWeave 2.2 Arrays Module

DataWeave 2.2 comes with many new functions such as drop, dropWhile, join, leftJoin, outerJoin and more in the dw::core::Arrays module. These functions provide new ways to operate on and transform array data.
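
To get a feel for a few of these functions before the book example, here is a minimal sketch. The numbers array is hypothetical sample data and is not part of the feeds used later in this post.

```dataweave
%dw 2.0
import * from dw::core::Arrays
output application/json
// Hypothetical sample data, only for illustration
var numbers = [1, 2, 3, 4, 5]
---
{
    // drop removes the first n items: [3, 4, 5]
    dropped: drop(numbers, 2),
    // dropWhile removes leading items while the condition holds: [3, 4, 5]
    droppedWhile: dropWhile(numbers, (n) -> n < 3),
    // join keeps only the pairs whose keys are equal on both sides,
    // each pair wrapped as an object with l (left) and r (right) keys
    joined: join(numbers, [4, 5, 6], (n) -> n, (n) -> n)
}
```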

Let’s consider a use case of receiving book feeds from two different sources. Some books may exist in both feeds. To prevent processing the same book twice, we want to compare the two lists and find the matched as well as the unmatched items.

list1.json: Books List 1 - List of some books from Harry Potter series
[
	{
		"title": "Harry Potter and the Sorcerer's Stone",
		"ASIN": "B0192CTMYG"
	},
	{
		"title": "Harry Potter and the Chamber of Secrets", (1)
		"ASIN": "B0192CTMW8"
	},
	{
		"title": "Harry Potter and the Deathly Hallows",
		"ASIN": "B0192CTMWS"
	},
	{
		"title": "Harry Potter and the Goblet of Fire", (1)
		"ASIN": "B0192CTMUU"
	},
	{
		"title": "Harry Potter and the Order of the Phoenix", (1)
		"ASIN": "B0192CTMXM"
	}
]
1 Exists in both lists
list2.json: Books List 2 - List of some books from Harry Potter series
[
	{
		"title": "Harry Potter and the Prisoner of Azkaban",
		"ASIN": "B0192CTMX2"
	},
	{
		"title": "Harry Potter and the Goblet of Fire",	(1)
		"ASIN": "B0192CTMUU"
	},
	{
		"title": "Harry Potter and the Chamber of Secrets", (1)
		"ASIN": "B0192CTMW8"
	},
	{
		"title": "Harry Potter and the Order of the Phoenix", (1)
		"ASIN": "B0192CTMXM"
	},
	{
		"title": "Harry Potter and the Half-Blood Prince",
		"ASIN": "B0192CTMWI"
	}
]
1 Exists in both lists

2.1 Arrays.outerJoin()

The new outerJoin function from the Arrays module is similar to a full outer join in SQL. It returns every pair of items from the left and right arrays whose keys match, plus every item from either array that has no match on the other side.

Let’s use this function to compare the two lists by each book’s ASIN value.

outerJoin: Plain outerjoin on both lists
%dw 2.0
import * from dw::core::Arrays
output application/json
---
outerJoin(list1, list2, (obj) -> (obj.ASIN), (obj) -> (obj.ASIN))

This outputs a collection of comparison objects, each containing an l (left) item, an r (right) item, or both. The most important parts of the call are the last two arguments: mapper functions that extract the value to compare from each left and each right item, respectively. They can be as simple as returning one attribute value, as we do here. In some cases, they could be more complex.
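
As an illustration of a slightly more complex mapper, the sketch below normalizes the key before comparing. The feedA and feedB variables and the normalizedAsin helper are hypothetical and only exist for this example; they assume two feeds whose ASIN values differ in case and padding.

```dataweave
%dw 2.0
import * from dw::core::Arrays
output application/json
// Hypothetical feeds: same book, ASIN differs only in case and trailing space
var feedA = [{ title: "Harry Potter and the Goblet of Fire", ASIN: "b0192ctmuu " }]
var feedB = [{ title: "Harry Potter and the Goblet of Fire", ASIN: "B0192CTMUU" }]
// Hypothetical helper: trim and upper-case the ASIN so both feeds compare equal
fun normalizedAsin(book) = upper(trim(book.ASIN))
---
outerJoin(feedA, feedB, (book) -> normalizedAsin(book), (book) -> normalizedAsin(book))
```

With the normalization in place, the two entries should end up in a single comparison object carrying both l and r, instead of two unmatched ones.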

outerJoin: Output
[
  {
    "l": { (1)
      "title": "Harry Potter and the Sorcerer's Stone",
      "ASIN": "B0192CTMYG"
    }
  },
  {
    "l": { (2)
      "title": "Harry Potter and the Chamber of Secrets",
      "ASIN": "B0192CTMW8"
    },
    "r": { (2)
      "title": "Harry Potter and the Chamber of Secrets",
      "ASIN": "B0192CTMW8"
    }
  },
  {
    "l": {
      "title": "Harry Potter and the Deathly Hallows",
      "ASIN": "B0192CTMWS"
    }
  },
  {
    "l": {
      "title": "Harry Potter and the Goblet of Fire",
      "ASIN": "B0192CTMUU"
    },
    "r": {
      "title": "Harry Potter and the Goblet of Fire",
      "ASIN": "B0192CTMUU"
    }
  },
  {
    "l": {
      "title": "Harry Potter and the Order of the Phoenix",
      "ASIN": "B0192CTMXM"
    },
    "r": {
      "title": "Harry Potter and the Order of the Phoenix",
      "ASIN": "B0192CTMXM"
    }
  },
  {
    "r": { (3)
      "title": "Harry Potter and the Prisoner of Azkaban",
      "ASIN": "B0192CTMX2"
    }
  },
  {
    "r": {
      "title": "Harry Potter and the Half-Blood Prince",
      "ASIN": "B0192CTMWI"
    }
  }
]
1 Existence of only l indicates no match on list2 (right)
2 Existence of l and r indicates object exists in both lists.
3 Existence of only r indicates no match on list1 (left)

So now we have a comparison output for both lists. Can we simplify this object structure? The Book object has the same model in list1 and list2, so for matched items it should be fine to merge l and r and keep a single object.

2.2 Merge with reduce

Let’s try to create a final object of structure { 'matched': [], 'unmatched': []}. Each of those arrays will contain a collection of Book objects.

mergeWithReduce: Reduce objects to simplify structure
%dw 2.0
import * from dw::core::Arrays
output application/json
var joinedData = outerJoin(list1, list2, (obj) -> obj.ASIN, (obj) -> obj.ASIN) (1)
---
joinedData reduce ((item, acc = { 'matched': [], 'unmatched': [] }) -> (2)
    if (item.l != null and item.r != null)
        { matched: (acc.matched default []) ++ [item.l], unmatched: acc.unmatched } (3)
    else
        { matched: acc.matched, unmatched: (acc.unmatched default []) ++ [if (item.l != null) item.l else item.r] }) (4)
1 outerJoin the lists to get comparison
2 Use reduce with initial accumulator object with empty arrays.
3 When l and r both exist, i.e. matched, add the item to the matched array. Retain the unmatched array from the accumulator.
4 When either l or r is null, i.e. unmatched, add whichever one exists to the unmatched array. Retain the matched array from the accumulator.

The output of this transformation looks like this:

mergedOutputJson: Merged single objects
{
  "matched": [ (1)
    {
      "title": "Harry Potter and the Chamber of Secrets",
      "ASIN": "B0192CTMW8"
    },
    {
      "title": "Harry Potter and the Goblet of Fire",
      "ASIN": "B0192CTMUU"
    },
    {
      "title": "Harry Potter and the Order of the Phoenix",
      "ASIN": "B0192CTMXM"
    }
  ],
  "unmatched": [ (2)
    {
      "title": "Harry Potter and the Sorcerer's Stone",
      "ASIN": "B0192CTMYG"
    },
    {
      "title": "Harry Potter and the Deathly Hallows",
      "ASIN": "B0192CTMWS"
    },
    {
      "title": "Harry Potter and the Prisoner of Azkaban",
      "ASIN": "B0192CTMX2"
    },
    {
      "title": "Harry Potter and the Half-Blood Prince",
      "ASIN": "B0192CTMWI"
    }
  ]
}
1 List of objects that exist in both input lists
2 List of objects that exist in only one of the input lists

With this output, now we know how to process our book objects without worrying about duplicate processing.
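
As an alternative to the hand-written reduce, the same split could probably be sketched with partition, another function from the dw::core::Arrays module in DataWeave 2.2. This is not the approach used above, just a variation; the parts variable name is hypothetical, and list1 and list2 are assumed to be available exactly as in the listings above.

```dataweave
%dw 2.0
import * from dw::core::Arrays
output application/json
var joinedData = outerJoin(list1, list2, (obj) -> obj.ASIN, (obj) -> obj.ASIN)
// partition returns an object of the form { success: [...], failure: [...] }
var parts = partition(joinedData, (pair) -> pair.l != null and pair.r != null)
---
{
    // Pairs with both sides present: keep the left book
    matched: parts.success map $.l,
    // Pairs with only one side present: keep whichever book exists
    unmatched: parts.failure map ($.l default $.r)
}
```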

2.3 Alternate reducing: Single list of unique books

If we don’t need to know matched and unmatched items separately, our reduce becomes simpler, as shown in the following listing:

mergeWithReduce: Single list of unique books
%dw 2.0
import * from dw::core::Arrays
output application/json
var joinedData = outerJoin(list1, list2, (obj) -> obj.ASIN, (obj) -> obj.ASIN)
---
joinedData reduce ((item, acc = []) -> (1)
    acc ++ [if (item.l != null) item.l else item.r]) (2)
1 Accumulator initialized as an array.
2 Pick l when it exists, otherwise r.

This produces a single list of the unique books across both lists:

mergeWithReduceOutput: Single list of unique books
[
  {
    "title": "Harry Potter and the Sorcerer's Stone",
    "ASIN": "B0192CTMYG"
  },
  {
    "title": "Harry Potter and the Chamber of Secrets",
    "ASIN": "B0192CTMW8"
  },
  {
    "title": "Harry Potter and the Deathly Hallows",
    "ASIN": "B0192CTMWS"
  },
  {
    "title": "Harry Potter and the Goblet of Fire",
    "ASIN": "B0192CTMUU"
  },
  {
    "title": "Harry Potter and the Order of the Phoenix",
    "ASIN": "B0192CTMXM"
  },
  {
    "title": "Harry Potter and the Prisoner of Azkaban",
    "ASIN": "B0192CTMX2"
  },
  {
    "title": "Harry Potter and the Half-Blood Prince",
    "ASIN": "B0192CTMWI"
  }
]
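
Because the reduce in this section only picks one side of each comparison object, the same list could likely be produced with a plain map as well. This is a sketch of that variation, assuming list1 and list2 are available as in the listings above.

```dataweave
%dw 2.0
import * from dw::core::Arrays
output application/json
var joinedData = outerJoin(list1, list2, (obj) -> obj.ASIN, (obj) -> obj.ASIN)
---
// Take the left book when present, otherwise fall back to the right one
joinedData map ((pair) -> (pair.l default pair.r))
```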

3. How to do it without DataWeave 2.2?

DataWeave 2.0 does not include these additional Arrays module functions. So how can we achieve the same results with DataWeave 2.0?

3.1 Merge into matched and unmatched collection

mergeWithFilter: Matched and unmatched collections using filter
%dw 2.0
output application/json
var list1ASINs = list1.ASIN
var list2ASINs = list2.ASIN
---
{
	matched: list1 filter contains(list2ASINs, $.ASIN),
	unmatched: (list1 filter (contains(list2ASINs, $.ASIN) == false)) ++ (list2 filter (contains(list1ASINs, $.ASIN) == false))
}

The script above finds the same matched and unmatched books as in section 2.2. One thing to notice is the use of contains inside filter: for every item in one list, the ASIN list of the other is scanned. With large list1 and list2, these nested iterations could affect performance.
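
One possible way to avoid scanning a list for every item is to index each list by ASIN once with groupBy and then look books up by key. The sketch below shows that idea; the list1ByASIN and list2ByASIN names are hypothetical, it assumes every book has a non-null ASIN, and whether it is actually faster depends on the runtime and the data, so it is worth measuring before relying on it.

```dataweave
%dw 2.0
output application/json
// Index each list once: keys are ASIN values, values are arrays of books
var list1ByASIN = list1 groupBy $.ASIN
var list2ByASIN = list2 groupBy $.ASIN
---
{
    // A book is matched when its ASIN is a key in the other list's index
    matched: list1 filter (list2ByASIN[$.ASIN] != null),
    // A book is unmatched when its ASIN is missing from the other index
    unmatched: (list1 filter (list2ByASIN[$.ASIN] == null)) ++ (list2 filter (list1ByASIN[$.ASIN] == null))
}
```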

3.2 Alternate reducing: Single list of unique books

This one is fairly easy to do even without the new Arrays functions. Consider the following script, which gives the same output as in section 2.3.

mergeWithFilter: Single list of unique books
%dw 2.0
output application/json
var list1ASINs = list1.ASIN
---
list1 ++ (list2 filter (contains(list1ASINs,$.ASIN) == false) )
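
Another way to sketch the same idea is with distinctBy, a core function that does not need the Arrays module: concatenate both lists and keep the first book seen for each ASIN. It assumes list1 and list2 are available as in the script above.

```dataweave
%dw 2.0
output application/json
---
// Concatenate both feeds, then keep the first book seen for each ASIN
(list1 ++ list2) distinctBy $.ASIN
```

Note one difference in behavior: distinctBy also removes duplicates that occur within a single list, whereas the filter-based script only removes books from list2 that already exist in list1.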

4. Conclusion

This post demonstrates different ways to compare and merge array lists using new functions from DataWeave 2.2 as well as DataWeave 2.0.

If you liked this post, you can take a look at other DataWeave related posts.

This post was inspired by a question on Mule Forum.


