DataWeave 2.2: Compare lists using Arrays module

September 29, 2019
Manik Magar
mule
dataweave2
arrays
dataweave

1. Overview

Mule Runtime 4.2 was released with DataWeave 2.2, which adds many useful new functions to the Arrays module. In this post, we will see how to compare two arrays to find matched and unmatched objects.

Requirements:

Mule Runtime 4.2.0 and above
DataWeave 2.2

If you are a DataWeave 1 user and haven’t seen DataWeave 2 yet, I suggest starting with the DataWeave 2 Syntax changes post.

2. DataWeave 2.2 Arrays Module

DataWeave 2.2 adds many new functions to the dw::core::Arrays module, including drop, dropWhile, join, leftJoin, and outerJoin. These functions provide additional ways to operate on and transform array data.
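As a quick orientation before the book example, here is a minimal sketch of a few of these functions. The `left` and `right` records are made-up illustration data, not part of the book feeds used in the rest of this post.

```dataweave
%dw 2.0
import * from dw::core::Arrays
output application/json
// Illustrative data only (an assumption for this sketch)
var left = [{ id: "1", name: "a" }, { id: "2", name: "b" }]
var right = [{ id: "2", price: 10 }, { id: "3", price: 20 }]
---
{
  // drop removes the first n items
  dropped: drop([1, 2, 3, 4], 2),
  // dropWhile removes leading items while the condition holds
  droppedWhile: dropWhile([1, 2, 3, 4], (n) -> n < 3),
  // join keeps only the pairs whose keys match on both sides
  joined: join(left, right, (item) -> item.id, (item) -> item.id),
  // leftJoin keeps every left item; r is present only on a match
  leftJoined: leftJoin(left, right, (item) -> item.id, (item) -> item.id)
}
```

Here both `dropped` and `droppedWhile` should contain the last two numbers, `joined` should hold a single pair for id "2", and `leftJoined` should hold one entry per item in `left`.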

Let’s consider a use case of receiving book feeds from two different sources. Some books may exist in both feeds. To prevent duplicate processing of the books, we want to compare the two lists and find the matched as well as the unmatched items between them.

list1.json: Books List 1 - List of some books from Harry Potter series

[
	{
		"title": "Harry Potter and the Sorcerer's Stone",
		"ASIN": "B0192CTMYG"
	},
	{
		"title": "Harry Potter and the Chamber of Secrets", (1)
		"ASIN": "B0192CTMW8"
	},
	{
		"title": "Harry Potter and the Deathly Hallows",
		"ASIN": "B0192CTMWS"
	},
	{
		"title": "Harry Potter and the Goblet of Fire", (1)
		"ASIN": "B0192CTMUU"
	},
	{
		"title": "Harry Potter and the Order of the Phoenix", (1)
		"ASIN": "B0192CTMXM"
	}
]

1	Exist in both lists

list2.json: Books List 2 - List of some books from Harry Potter series

[
	{
		"title": "Harry Potter and the Prisoner of Azkaban",
		"ASIN": "B0192CTMX2"
	},
	{
		"title": "Harry Potter and the Goblet of Fire",	(1)
		"ASIN": "B0192CTMUU"
	},
	{
		"title": "Harry Potter and the Chamber of Secrets", (1)
		"ASIN": "B0192CTMW8"
	},
	{
		"title": "Harry Potter and the Order of the Phoenix", (1)
		"ASIN": "B0192CTMXM"
	},
	{
		"title": "Harry Potter and the Half-Blood Prince",
		"ASIN": "B0192CTMWI"
	}
]

1	Exist in both lists

2.1 Arrays.outerJoin()

The new outerJoin function from the Arrays module is similar to a full outer join in SQL. It returns the items from the left array paired with the items from the right array that satisfy the given criteria, plus all items from both arrays that have no match.

Let’s use this function to compare the lists by each book’s ASIN value.

outerJoin: Plain outerJoin on both lists

%dw 2.0
import * from dw::core::Arrays
output application/json
---
outerJoin(list1, list2, (obj) -> (obj.ASIN), (obj) -> (obj.ASIN))

This outputs a collection of comparison objects, each containing an l (left) item, an r (right) item, or both. The most important parts are the last two arguments of the function, which are mapper functions that prepare each item for comparison: the first is applied to items of the left list and the second to items of the right list. These functions can be as simple as returning one attribute value, as we have here. In some cases, they can be more complex.
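As an illustration of a slightly more complex mapper, the sketch below normalizes the ASIN before comparing. The `feedA` and `feedB` data and the `asinKey` helper are hypothetical and exist only for this example; they assume one feed delivers ASIN values with stray whitespace and different casing.

```dataweave
%dw 2.0
import * from dw::core::Arrays
output application/json
// Hypothetical feeds whose ASIN values differ in case and whitespace
var feedA = [{ title: "Harry Potter and the Goblet of Fire", ASIN: " b0192ctmuu " }]
var feedB = [{ title: "Harry Potter and the Goblet of Fire", ASIN: "B0192CTMUU" }]
// Hypothetical helper: trim and upper-case the ASIN so both feeds compare equal
fun asinKey(book) = upper(trim(book.ASIN))
---
outerJoin(feedA, feedB, (book) -> asinKey(book), (book) -> asinKey(book))
```

With this mapper the two records should be reported as one matched pair. The mapper only affects the comparison, so the `l` and `r` objects in the output keep their original ASIN values.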

outerJoin: Output

[
  {
    "l": { (1)
      "title": "Harry Potter and the Sorcerer's Stone",
      "ASIN": "B0192CTMYG"
    }
  },
  {
    "l": { (2)
      "title": "Harry Potter and the Chamber of Secrets",
      "ASIN": "B0192CTMW8"
    },
    "r": { (2)
      "title": "Harry Potter and the Chamber of Secrets",
      "ASIN": "B0192CTMW8"
    }
  },
  {
    "l": {
      "title": "Harry Potter and the Deathly Hallows",
      "ASIN": "B0192CTMWS"
    }
  },
  {
    "l": {
      "title": "Harry Potter and the Goblet of Fire",
      "ASIN": "B0192CTMUU"
    },
    "r": {
      "title": "Harry Potter and the Goblet of Fire",
      "ASIN": "B0192CTMUU"
    }
  },
  {
    "l": {
      "title": "Harry Potter and the Order of the Phoenix",
      "ASIN": "B0192CTMXM"
    },
    "r": {
      "title": "Harry Potter and the Order of the Phoenix",
      "ASIN": "B0192CTMXM"
    }
  },
  {
    "r": { (3)
      "title": "Harry Potter and the Prisoner of Azkaban",
      "ASIN": "B0192CTMX2"
    }
  },
  {
    "r": {
      "title": "Harry Potter and the Half-Blood Prince",
      "ASIN": "B0192CTMWI"
    }
  }
]

1	Existence of only `l` indicates no match on `list2` (right)
2	Existence of `l` and `r` indicates object exists in both lists.
3	Existence of only `r` indicates no match on `list1` (left)

So now we have comparison output for both lists. Can we simplify this object structure? The Book object model is the same in list1 and list2, so for a matched pair it should be fine to keep just a single object.

2.2 Merge with reduce

Let’s try to create a final object of structure { 'matched': [], 'unmatched': []}. Each of those arrays will contain a collection of Book objects.

mergeWithReduce: Reduce objects to simplify structure

%dw 2.0
import * from dw::core::Arrays
output application/json
var joinedData = outerJoin(list1, list2, (obj) -> obj.ASIN, (obj) -> obj.ASIN) (1)
---
joinedData reduce ((item, acc = { 'matched': [], 'unmatched': [] }) -> (2)
    if (item.l != null and item.r != null)
        { matched: (acc.matched default []) ++ [item.l], unmatched: acc.unmatched } (3)
    else
        { matched: acc.matched, unmatched: (acc.unmatched default []) ++ [ if (item.l != null) item.l else item.r ] } (4)
)

1	`outerJoin` the lists to get comparison
2	Use `reduce` with initial accumulator object with empty arrays.
3	When `l` and `r` both exist, i.e. matched, add the item to the `matched` array. Retain the `unmatched` array from the accumulator.
4	When either `l` or `r` is null, i.e. unmatched, add the item to the `unmatched` array. Retain the `matched` array from the accumulator.

The output of this transformation looks like this:

mergedOutputJson: Merged single objects

{
  "matched": [ (1)
    {
      "title": "Harry Potter and the Chamber of Secrets",
      "ASIN": "B0192CTMW8"
    },
    {
      "title": "Harry Potter and the Goblet of Fire",
      "ASIN": "B0192CTMUU"
    },
    {
      "title": "Harry Potter and the Order of the Phoenix",
      "ASIN": "B0192CTMXM"
    }
  ],
  "unmatched": [ (2)
    {
      "title": "Harry Potter and the Sorcerer's Stone",
      "ASIN": "B0192CTMYG"
    },
    {
      "title": "Harry Potter and the Deathly Hallows",
      "ASIN": "B0192CTMWS"
    },
    {
      "title": "Harry Potter and the Prisoner of Azkaban",
      "ASIN": "B0192CTMX2"
    },
    {
      "title": "Harry Potter and the Half-Blood Prince",
      "ASIN": "B0192CTMWI"
    }
  ]
}

1	List of objects that exist in both input lists
2	List of objects that exist in only one of the lists

With this output, we can process the book objects without worrying about duplicate processing.
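The same matched/unmatched split can also be sketched with `partition`, another function in the DataWeave 2.2 Arrays module, instead of `reduce`. This is an alternative under the same assumptions, not a replacement for the script above; the two lists are shortened here so the example is self-contained.

```dataweave
%dw 2.0
import * from dw::core::Arrays
output application/json
// Shortened sample feeds so the sketch is self-contained
var list1 = [
  { title: "Harry Potter and the Sorcerer's Stone", ASIN: "B0192CTMYG" },
  { title: "Harry Potter and the Chamber of Secrets", ASIN: "B0192CTMW8" }
]
var list2 = [
  { title: "Harry Potter and the Chamber of Secrets", ASIN: "B0192CTMW8" },
  { title: "Harry Potter and the Prisoner of Azkaban", ASIN: "B0192CTMX2" }
]
var joinedData = outerJoin(list1, list2, (obj) -> obj.ASIN, (obj) -> obj.ASIN)
// partition puts items that satisfy the condition in success, the rest in failure
var parts = partition(joinedData, (item) -> (item.l != null and item.r != null))
---
{
  // Matched pairs: both sides are the same book, so keep the left one
  matched: parts.success map ((item) -> item.l),
  // Unmatched pairs: keep whichever side is present
  unmatched: parts.failure map ((item) -> item.l default item.r)
}
```

For these shortened lists, `matched` should contain only "Chamber of Secrets" and `unmatched` the other two books. The design trade-off is readability: `partition` separates the split from the reshaping, while `reduce` does both in one pass.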

2.3 Alternate reducing: Single list of unique books

If we don’t need to know matched and unmatched books separately, our reduce becomes simpler, as shown in the following listing:

mergeWithReduce: Single list of unique books

%dw 2.0
import * from dw::core::Arrays
output application/json
var joinedData = outerJoin(list1, list2, (obj) -> obj.ASIN, (obj) -> obj.ASIN)
---
joinedData reduce ((item, acc = []) -> (1)
    acc ++ [ if (item.l != null) item.l else item.r ] (2)
)

1	Accumulator initialized as an array.
2	Pick `l` when it exists, otherwise `r`.

This produces a single list of the unique books across both lists:

mergeWithReduceOutput: Single list of unique books

[
  {
    "title": "Harry Potter and the Sorcerer's Stone",
    "ASIN": "B0192CTMYG"
  },
  {
    "title": "Harry Potter and the Chamber of Secrets",
    "ASIN": "B0192CTMW8"
  },
  {
    "title": "Harry Potter and the Deathly Hallows",
    "ASIN": "B0192CTMWS"
  },
  {
    "title": "Harry Potter and the Goblet of Fire",
    "ASIN": "B0192CTMUU"
  },
  {
    "title": "Harry Potter and the Order of the Phoenix",
    "ASIN": "B0192CTMXM"
  },
  {
    "title": "Harry Potter and the Prisoner of Azkaban",
    "ASIN": "B0192CTMX2"
  },
  {
    "title": "Harry Potter and the Half-Blood Prince",
    "ASIN": "B0192CTMWI"
  }
]
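Because each comparison object contributes exactly one book to the result, the same list can also be sketched with `map` and `default` rather than `reduce`. The lists are shortened here so the example is self-contained.

```dataweave
%dw 2.0
import * from dw::core::Arrays
output application/json
// Shortened sample feeds so the sketch is self-contained
var list1 = [
  { title: "Harry Potter and the Sorcerer's Stone", ASIN: "B0192CTMYG" },
  { title: "Harry Potter and the Chamber of Secrets", ASIN: "B0192CTMW8" }
]
var list2 = [
  { title: "Harry Potter and the Chamber of Secrets", ASIN: "B0192CTMW8" },
  { title: "Harry Potter and the Prisoner of Azkaban", ASIN: "B0192CTMX2" }
]
var joinedData = outerJoin(list1, list2, (obj) -> obj.ASIN, (obj) -> obj.ASIN)
---
// item.l is null when the book exists only in list2, so default falls back to item.r
joinedData map ((item) -> item.l default item.r)
```

For these shortened lists this should return three books, with "Chamber of Secrets" appearing once.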

3. How to do it without DataWeave 2.2?

DataWeave 2.0 does not include these additional Arrays module functions. So how can we achieve the same results with DataWeave 2.0?

3.1 Merge into matched and unmatched collection

mergeWithFilter: Matched and unmatched books using filter

%dw 2.0
output application/json
var list1ASINs = list1.ASIN
var list2ASINs = list2.ASIN
---
{
	matched: list1 filter contains(list2ASINs, $.ASIN),
	unmatched: (list1 filter (contains(list2ASINs, $.ASIN) == false)) ++ (list2 filter (contains(list1ASINs, $.ASIN) == false))
}

The script above gives the same result as section 2.2. Note the use of filter together with contains: for every item that filter visits, contains scans the other list’s ASIN values, so the work grows roughly with the product of the two list sizes. With large lists, these nested iterations could affect performance.

3.2 Alternate reducing: Single list of unique books

This one is probably easy to do even without the new Arrays functions. Consider the following script, which gives the same output as section 2.3.

mergeWithFilter: Single list of unique books

%dw 2.0
output application/json
var list1ASINs = list1.ASIN
---
list1 ++ (list2 filter (contains(list1ASINs,$.ASIN) == false) )
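Another option that does not need the DataWeave 2.2 Arrays module is the core `distinctBy` function. The sketch below is an alternative to the filter approach and assumes that two books with the same ASIN are the same book; the lists are shortened so the example is self-contained.

```dataweave
%dw 2.0
output application/json
// Shortened sample feeds so the sketch is self-contained
var list1 = [
  { title: "Harry Potter and the Sorcerer's Stone", ASIN: "B0192CTMYG" },
  { title: "Harry Potter and the Chamber of Secrets", ASIN: "B0192CTMW8" }
]
var list2 = [
  { title: "Harry Potter and the Chamber of Secrets", ASIN: "B0192CTMW8" },
  { title: "Harry Potter and the Prisoner of Azkaban", ASIN: "B0192CTMX2" }
]
---
// Concatenate both feeds, then keep the first book seen for each ASIN
(list1 ++ list2) distinctBy ((book) -> book.ASIN)
```

Since the first occurrence wins, books from list1 take precedence over their duplicates in list2, which matches the behavior of the filter script above. One difference to keep in mind: `distinctBy` also removes duplicates within a single list, which the filter script does not.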

4. Conclusion

This post demonstrates different ways to compare and merge array lists using new functions from DataWeave 2.2 as well as DataWeave 2.0.

If you liked this post, you can take a look at other DataWeave related posts.

This post was inspired by a question on Mule Forum.

Follow @manikmagar on twitter to get updates on new posts.
