DataWeave 2 - Nested Data Structure Traversal and Enrichment with State


Given a nested data structure, can we traverse and enrich it in DataWeave? What if we need to know the state of the last transformed record to map the next one? Let’s check it out in this post.

One of my friends pointed me to a git repository, josevalim/nested-data-structure-traversal, that describes an example use case for transformation. It defines an input structure and the transformation requirements. If you browse that repo, you will see solutions written in different programming languages and frameworks. Here, we are going to attempt a solution using DataWeave 2.

For ease of reading, I am restating the details from that repository.

The problem statement from the repository:

The algorithm should receive a list of sections. A section is a key-value data structure, with a "title", a "reset_lesson_position" boolean, and a list of "lessons". A lesson is a key-value data structure with the "name" field.

Your job is to traverse the list of sections, adding a position (starting from 1) to each section, and traverse the list of lessons adding a position (starting from 1) to each lesson. Note, however, the lessons position is shared across sections. The lesson position should also be reset if "reset_lesson_position" is true.

Given the following lesson data:

Input nested lessons data
[
  {
    "title": "Getting started",
    "reset_lesson_position": false,
    "lessons": [
      {"name": "Welcome"},
      {"name": "Installation"}
    ]
  },

  {
    "title": "Basic operator",
    "reset_lesson_position": false,
    "lessons": [
      {"name": "Addition / Subtraction"},
      {"name": "Multiplication / Division"}
    ]
  },

  {
    "title": "Advanced topics",
    "reset_lesson_position": true,
    "lessons": [
      {"name": "Mutability"},
      {"name": "Immutability"}
    ]
  }
]

Then the output should be:

Output Structure
[
  {
    "title": "Getting started",
    "reset_lesson_position": false,
    "position": 1,  (1)
    "lessons": [
      {"name": "Welcome", "position": 1}, (2)
      {"name": "Installation", "position": 2}
    ]
  },

  {
    "title": "Basic operator",
    "reset_lesson_position": false,
    "position": 2, (3)
    "lessons": [
      {"name": "Addition / Subtraction", "position": 3}, (4)
      {"name": "Multiplication / Division", "position": 4}
    ]
  },

  {
    "title": "Advanced topics",
    "reset_lesson_position": true,  (5)
    "position": 3,
    "lessons": [
      {"name": "Mutability", "position": 1},  (6)
      {"name": "Immutability", "position": 2}
    ]
  }
]
1 Category position inserted as a sequential index.
2 Lesson position inserted; the sequential index starts at 1.
3 Category position index continues.
4 Lesson position index continues across categories.
5 Flag indicating a reset of lesson positions. This category should restart the lesson index from 1.
6 Lesson position index restarts at 1.

DataWeave 2 solution

Step 1: Simple map

If we leave out lesson position continuation and reset, it is a pretty straightforward mapping.

Table 1. Simple mapping
The payload (application/json), the script, and the output (application/json) follow in that order.
[
  {
    "title": "Getting started",
    "reset_lesson_position": false,
    "lessons": [
      {"name": "Welcome"},
      {"name": "Installation"}
    ]
  },

  {
    "title": "Basic operator",
    "reset_lesson_position": false,
    "lessons": [
      {"name": "Addition / Subtraction"},
      {"name": "Multiplication / Division"}
    ]
  },

  {
    "title": "Advanced topics",
    "reset_lesson_position": true,
    "lessons": [
      {"name": "Mutability"},
      {"name": "Immutability"}
    ]
  }
]
%dw 2.0
output application/json
---
payload map ((category, index) -> {
    title: category.title,
    reset_lesson_position: category.reset_lesson_position,
    position: index + 1,
    lessons: category.lessons map ((lesson, idx) -> {
        name: lesson.name,
        position: idx + 1
    })
})
[
  {
    "title": "Getting started",
    "reset_lesson_position": false,
    "position": 1, (1)
    "lessons": [
      {
        "name": "Welcome",
        "position": 1 (2)
      },
      {
        "name": "Installation",
        "position": 2
      }
    ]
  },
  {
    "title": "Basic operator",
    "reset_lesson_position": false,
    "position": 2, (3)
    "lessons": [
      {
        "name": "Addition / Subtraction",
        "position": 1 (4)
      },
      {
        "name": "Multiplication / Division",
        "position": 2
      }
    ]
  },
  {
    "title": "Advanced topics",
    "reset_lesson_position": true,
    "position": 3,
    "lessons": [
      {
        "name": "Mutability",
        "position": 1   (5)
      },
      {
        "name": "Immutability",
        "position": 2
      }
    ]
  }
]
1 Category position starts
2 Lesson position starts
3 Category position continues
4 Issue: Lesson position DOES NOT continue from the previous category
5 Issue: Lesson position is 1 only because every category starts at 1, not because the reset flag was honored

Step 2: Lesson position state management

In the last script, the lesson position does not continue across categories; it restarts in every category, whether or not reset_lesson_position is true.

No state is persisted while a DataWeave script executes. The good part is that positions are part of the transformed data. If we can access the last transformed record, we can use it to work out the next index.

Can you think of a function that gives you access to the last transformed record while you work on the next one?

What about the reduce function? While transforming, we can look into the accumulator to get the last transformed record. The transformation in Table 2 uses this idea.
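As a warm-up, here is a minimal sketch of the same idea on a plain list of numbers. The field names value and runningTotal are made up for illustration. Each new record reads the previous record from the end of the accumulator, and default 0 covers the first iteration, when the accumulator is still empty and accumulator[-1] is null.

```dataweave
%dw 2.0
output application/json
---
// Illustrative only: "value" and "runningTotal" are hypothetical field names.
[10, 20, 30] reduce ((item, accumulator = []) ->
    accumulator + {
        value: item,
        // accumulator[-1] is the record produced in the previous iteration
        // (null on the first one, so the default applies).
        runningTotal: (accumulator[-1].runningTotal default 0) + item
    }
)
```

This should yield three records with running totals 10, 30, and 60. The lesson positions can be carried forward in the same way.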

Table 2. Lesson positions
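The payload (application/json), the script, and the output (application/json) follow in that order. The payload shown here already carries position fields; the script reads only title, reset_lesson_position, and each lesson's name, so those positions are recomputed.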
[
  {
    "title": "Getting started",
    "reset_lesson_position": false,
    "position": 1,
    "lessons": [
      {
        "name": "Welcome",
        "position": 1
      },
      {
        "name": "Installation",
        "position": 2
      }
    ]
  },
  {
    "title": "Basic operator",
    "reset_lesson_position": false,
    "position": 2,
    "lessons": [
      {
        "name": "Addition / Subtraction",
        "position": 3
      },
      {
        "name": "Multiplication / Division",
        "position": 4
      }
    ]
  },
  {
    "title": "Advanced topics",
    "reset_lesson_position": true,
    "position": 3,
    "lessons": [
      {
        "name": "Mutability",
        "position": 1
      },
      {
        "name": "Immutability",
        "position": 2
      }
    ]
  },
  {
    "title": "Getting started_2",
    "reset_lesson_position": false,
    "position": 4,
    "lessons": [
      {
        "name": "Welcome",
        "position": 3
      },
      {
        "name": "Installation",
        "position": 4
      }
    ]
  },
  {
    "title": "Basic operator 3",
    "reset_lesson_position": true,
    "position": 5,
    "lessons": [
      {
        "name": "Addition / Subtraction",
        "position": 1
      },
      {
        "name": "Multiplication / Division",
        "position": 2
      }
    ]
  },
  {
    "title": "Advanced topics 4",
    "reset_lesson_position": true,
    "position": 6,
    "lessons": [
      {
        "name": "Mutability",
        "position": 1
      },
      {
        "name": "Immutability",
        "position": 2
      }
    ]
  }
]
%dw 2.0
output application/json
// Maps one section and numbers its lessons, continuing after lessonStartIdx.
fun toSection(section, sectionIndex, lessonStartIdx) = {
    title: section.title,
    reset_lesson_position: section.reset_lesson_position,
    position: sectionIndex,
    lessons: section.lessons map ((lesson, idx) -> {
        name: lesson.name,
        position: (lessonStartIdx + idx + 1) // (4)
    })
}
---
payload reduce ((section, accumulator = []) -> // (1)
do {
    var lastSection = accumulator[-1]
    var lastLessonIdx = if (section.reset_lesson_position default true) 0 else lastSection.lessons[-1].position default 0 // (2)
    ---
    accumulator + toSection(section, sizeOf(accumulator) + 1, lastLessonIdx) // (3)
})
[
  {
    "title": "Getting started",
    "reset_lesson_position": false,
    "position": 1,
    "lessons": [
      {
        "name": "Welcome",
        "position": 1
      },
      {
        "name": "Installation",
        "position": 2
      }
    ]
  },
  {
    "title": "Basic operator",
    "reset_lesson_position": false,
    "position": 2,
    "lessons": [
      {
        "name": "Addition / Subtraction",
        "position": 3 (5)
      },
      {
        "name": "Multiplication / Division",
        "position": 4
      }
    ]
  },
  {
    "title": "Advanced topics",
    "reset_lesson_position": true,
    "position": 3,
    "lessons": [
      {
        "name": "Mutability",
        "position": 1   (6)
      },
      {
        "name": "Immutability",
        "position": 2
      }
    ]
  },
  {
    "title": "Getting started_2",
    "reset_lesson_position": false,
    "position": 4,
    "lessons": [
      {
        "name": "Welcome",
        "position": 3 (5)
      },
      {
        "name": "Installation",
        "position": 4
      }
    ]
  },
  {
    "title": "Basic operator 3",
    "reset_lesson_position": true,
    "position": 5,
    "lessons": [
      {
        "name": "Addition / Subtraction",
        "position": 1   (6)
      },
      {
        "name": "Multiplication / Division",
        "position": 2
      }
    ]
  },
  {
    "title": "Advanced topics 4",
    "reset_lesson_position": true,
    "position": 6,
    "lessons": [
      {
        "name": "Mutability",
        "position": 1   (6)
      },
      {
        "name": "Immutability",
        "position": 2
      }
    ]
  }
]
1 Use reduce to iterate over the collection. The do scope lets us prepare some variables before mapping.
2 When reset_lesson_position is true (or missing, because of default true), use 0 as the last lesson index. Otherwise, take the position of the last lesson of the last category in the accumulator (the records transformed so far).
3 Finally, call our custom function to map the category. The size of the accumulator array gives the next category position.
4 Use the last lesson index to calculate the next lesson index.
5 Lesson index continues from the last category.
6 Lesson index restarts when the reset flag is true.

That is how we get the expected output described in the transformation use case.
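One case the script above does not cover is a section with an empty lessons list: there is no last lesson to read the position from, so the count would fall back to 0 for the section after it. A possible variation, sketched here under my own assumptions, carries the running lesson count explicitly in the accumulator. The accumulator shape with sections and lastLesson is hypothetical and not part of the original problem; only the sections array is returned at the end.

```dataweave
%dw 2.0
output application/json
---
// Assumption: { sections, lastLesson } is a hypothetical accumulator shape
// used only to carry the running lesson position between iterations.
(payload reduce ((section, acc = { sections: [], lastLesson: 0 }) ->
    do {
        // Start from 0 on reset, otherwise continue from the carried counter.
        var start = if (section.reset_lesson_position default true) 0 else acc.lastLesson
        var lessons = (section.lessons default []) map ((lesson, idx) -> {
            name: lesson.name,
            position: start + idx + 1
        })
        ---
        {
            sections: acc.sections + {
                title: section.title,
                reset_lesson_position: section.reset_lesson_position,
                position: sizeOf(acc.sections) + 1,
                lessons: lessons
            },
            // An empty lessons list leaves the counter unchanged.
            lastLesson: start + sizeOf(lessons)
        }
    }
)).sections
```

For the three-section input from the problem statement, this should give the same positions as the expected output. The trade-off is a slightly larger accumulator in exchange for not depending on the shape of the previous record.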

Conclusion

reduce is a very powerful function and helps solve many mapping use cases. Hopefully, you learned something new here.

You can also see other DataWeave 2 related posts.

You can visit josevalim/nested-data-structure-traversal for any other details. The repository is now archived by its owner; otherwise I would have submitted this solution there.

Do you have another way of solving this use case? Feel free to share your DataWeave scripts in the comments. I would love to see other approaches.

If you have any thoughts or feedback on this article, feel free to comment on this article or find me on Twitter or on manik.magar.me.


Lives on Java Planet, Walks on Java Streets, Read/Writes in Java, JCP member, Jakarta EE enthusiast, MuleSoft Integration Architect, MuleSoft Community Ambassador, Open Source Contributor and Supporter, also writes at Unit Testers, A Family man!