Module 5

Rishabh Makhar edited this page May 16, 2023 · 9 revisions

JSON

  • JSON (JavaScript Object Notation) is a popular data exchange format, commonly used for transmitting data between a server and a web application or between different components of an application.

  • JSON is a text-based, language-independent data format. It was derived from JavaScript, but many modern programming languages include code to generate and parse JSON-format data.

  • A JSON object is a collection of key-value pairs, where the keys are strings and the values can be strings, numbers, booleans, null, arrays, or other JSON objects.

{
  "name": "Rishabh Makhar",
  "age": 21,
  "isStudent": false,
  "hobbies": ["reading", "playing guitar", "hiking"],
  "address": {
    "street": "-----",
    "city": "indore",
    "country": "India"
  }
}
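As a small illustration of how a language-independent format is consumed in practice, the sketch below parses the example object with Python's standard `json` module. The variable names are only for illustration.

```python
import json

# The example document from above, as raw JSON text.
raw = """
{
  "name": "Rishabh Makhar",
  "age": 21,
  "isStudent": false,
  "hobbies": ["reading", "playing guitar", "hiking"],
  "address": {"street": "-----", "city": "indore", "country": "India"}
}
"""

person = json.loads(raw)           # JSON object  -> dict
age = person["age"]                # JSON number  -> int
is_student = person["isStudent"]   # JSON false   -> False
second_hobby = person["hobbies"][1]  # JSON array -> list
city = person["address"]["city"]   # nested object -> nested dict

# Serializing and parsing again yields an equal structure.
round_trip = json.loads(json.dumps(person))
print(age, is_student, second_hobby, city)
```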

JOLT (JSON Language for Transform):

Jolt is an open-source JSON-to-JSON transformation library written in Java and developed by Bazaarvoice.

  • Provides a set of transforms that can be "chained" together to form the overall JSON-to-JSON transform.
  • Jolt operates on a JSON object, which is represented as a Java Map. The transformation specifications, also known as "Jolt specs", are written in JSON format and describe how the input JSON object should be transformed into the desired output JSON object.
  • It provides a simple way to convert JSON data into a new structure by specifying a set of transformation rules in JSON format.

JOLT Transformation:

Jolt transformations work by defining a set of transformation rules, called "specs," which describe how input JSON should be transformed into the desired output JSON structure. These specs are defined in a separate JSON document and are applied to the input JSON using the Jolt library.

  • Jolt transformations take an input JSON document and apply a set of transformation rules to produce an output JSON document. The transformation can modify the structure, content, or format of the JSON data.
  • Jolt does not handle other data formats like XML or CSV.
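The idea of applying a list of rules in order can be sketched in plain Python. This is not Jolt's API: the step functions `rename_name` and `add_country` are hypothetical stand-ins for individual transforms, and `run_chain` only shows how the output of one step becomes the input of the next.

```python
from functools import reduce

def rename_name(doc):
    # Hypothetical step 1: move "name" to "fullName".
    out = dict(doc)
    out["fullName"] = out.pop("name")
    return out

def add_country(doc):
    # Hypothetical step 2: fill in "country" only if it is missing.
    out = dict(doc)
    out.setdefault("country", "India")
    return out

def run_chain(steps, doc):
    # Feed the output of each step into the next one, in order.
    return reduce(lambda current, step: step(current), steps, doc)

result = run_chain([rename_name, add_country], {"name": "Rishabh Makhar"})
print(result)
```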

Advantages of JOLT:

  • Lightweight and Efficient.
  • Declarative Transformation Specification.
  • Integration with Java Ecosystem.

Disadvantages of JOLT:

  • Jolt is limited to JSON-to-JSON transformations and does not support other data formats like XML or CSV.
  • Although the syntax is straightforward, mastering its full potential may require some learning and experimentation.
  • Jolt's error handling capabilities are somewhat limited.

Alternatives to Jolt:

  1. JSONata: JSONata is a lightweight query and transformation language specifically designed for JSON data.
  2. JSONPath: JSONPath is a query language for JSON that allows you to specify paths to access and manipulate JSON data.
  3. Apache NiFi: Apache NiFi is a data integration and processing framework that includes powerful JSON transformation capabilities.

Operations In JOLT:

Jolt provides a set of operations (transforms), each configured by its own spec, that can be used to transform JSON data. Each operation defines one step in turning the input JSON into the desired output JSON structure. Here are some commonly used operations in Jolt:

  1. shift
  2. default
  3. remove
  4. sort
  5. cardinality
  6. modify-default-beta
  7. modify-overwrite-beta

Shift:

The shift operation is the most commonly used operation in Jolt. It allows you to map and restructure JSON data from one form to another. It uses a set of transformation rules defined in a "shift spec" to specify the desired output structure. The shift operation supports various transformation patterns like renaming fields, extracting values, aggregating data, and more.

Input

{ 
  "client": { 
    "name": "Rishabh Makhar", 
    "email": "makhar@email.com", 
    "birthDate": "01/06/1997", 
    "address": "--colony indore",
    "country": "india",
    "number": "00000000000"
  } 
}

Spec:

[
  {
    "operation": "shift",
    "spec": {
      "client": {
        "name": "customer.fullName",
        "birthDate": "customer.birthDate",
        "address": "customer.address.street",
        "country": "customer.address.country",
        "number": ["customer.phoneNumber", "customer.mobileNumber"]
      }
    }
  }
]

Explanation:

Using "." (dot) notation, we can define levels in the new JSON that we want to create. With "name": "customer.fullName", we take the value of the field name and place it in the field fullName inside the object customer. With "address": "customer.address.street", we take the value of the field address and place it in the field street inside the object address, which is itself contained in the object customer. A list on the right-hand side, as with "number", copies the same value to every listed path. Input fields that the spec does not mention (here, email) are omitted from the output.

Output

{
  "customer" : {
    "fullName" : "Rishabh Makhar",
    "birthDate" : "01/06/1997",
    "address" : {
      "street" : "--colony indore",
      "country" : "india"
    },
    "phoneNumber" : "00000000000",
    "mobileNumber" : "00000000000"
  }
}
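The dot-notation placement used in this example can be imitated in a few lines of Python. This is only a sketch of the idea, not Jolt itself: `put_path` and `simple_shift` are hypothetical helpers, only literal keys are supported, and two values written to the same path simply overwrite each other here.

```python
def put_path(out, dotted_path, value):
    # Create one nested dict per path segment, then set the leaf.
    keys = dotted_path.split(".")
    node = out
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value

def simple_shift(data, spec, out=None):
    # Walk the spec; fields that the spec does not mention are dropped.
    out = {} if out is None else out
    for key, target in spec.items():
        if key not in data:
            continue
        if isinstance(target, dict):
            simple_shift(data[key], target, out)
        else:
            # A list on the right-hand side copies the value to several paths.
            paths = target if isinstance(target, list) else [target]
            for path in paths:
                put_path(out, path, data[key])
    return out

source = {"client": {"name": "Rishabh Makhar", "email": "makhar@email.com",
                     "address": "--colony indore", "number": "00000000000"}}
spec = {"client": {"name": "customer.fullName",
                   "address": "customer.address.street",
                   "number": ["customer.phoneNumber", "customer.mobileNumber"]}}
shifted = simple_shift(source, spec)
print(shifted)
```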

Default:

The default operation sets a default value for a specified field if it does not exist in the input JSON. It is useful when you want to ensure that a field is present in the output JSON, even if it is missing in the input.

Input

{
  "customer": {
    "name": "Costumer Default",
    "ssn": "123.456.789.10"
  }
}

spec:

[
  {
    "operation": "default",
    "spec": {
      "customer": {
        "birthDate": "01/01/1970"
      }
    }
  }
]

Output:

{
  "customer": {
    "name": "Costumer Default",
    "ssn": "123.456.789.10",
    "birthDate": "01/01/1970"
  }
}
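A plain-Python sketch of the same behavior is shown here, assuming "missing" means the key is absent. The helper `apply_defaults` is hypothetical and is not part of Jolt.

```python
import copy

def apply_defaults(data, defaults):
    # Return a copy of data with missing keys filled in from defaults.
    out = copy.deepcopy(data)
    for key, value in defaults.items():
        if key not in out:
            out[key] = copy.deepcopy(value)
        elif isinstance(value, dict) and isinstance(out[key], dict):
            # Both sides are objects: descend and fill in nested gaps.
            out[key] = apply_defaults(out[key], value)
    return out

source = {"customer": {"name": "Costumer Default", "ssn": "123.456.789.10"}}
defaults = {"customer": {"birthDate": "01/01/1970", "name": "Unknown"}}
filled = apply_defaults(source, defaults)
print(filled)
```

The existing name is kept because a default applies only when the field is absent.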

Remove:

The remove operation removes a specified field from the input JSON. It is used when you want to exclude certain fields from the output JSON.

Input

{
  "customer": {
    "name": "Costumer Default",
    "ssn": "123.456.789.10",
    "birthDate": "01/01/1970"
  }
}

spec:

[
  {
    "operation": "remove",
    "spec": {
      "customer": {
        "birthDate": ""
      }
    }
  }
]

Output

{
  "customer": {
    "name": "Costumer Default",
    "ssn": "123.456.789.10"
  }
}
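The effect of this spec can be imitated with a short recursive function. This is only an illustration: `apply_remove` is a hypothetical helper that handles literal keys and treats any non-object value in the spec (such as the empty string) as "delete this field".

```python
import copy

def apply_remove(data, spec):
    # Return a copy of data without the fields named in spec.
    out = copy.deepcopy(data)
    for key, sub in spec.items():
        if key not in out:
            continue
        if isinstance(sub, dict):
            # Nested spec: descend only if the data is also an object.
            if isinstance(out[key], dict):
                out[key] = apply_remove(out[key], sub)
        else:
            del out[key]
    return out

source = {"customer": {"name": "Costumer Default",
                       "ssn": "123.456.789.10",
                       "birthDate": "01/01/1970"}}
trimmed = apply_remove(source, {"customer": {"birthDate": ""}})
print(trimmed)
```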

Sort:

The sort operation sorts the keys of every object in the input JSON alphabetically, recursing through nested objects. It takes no spec, and it does not reorder the elements of arrays.

Input:

{
  "employee": {
    "phone": "9 9999-9999",
    "name": "Employee Sort",
    "birthDate": "01/01/1980",
    "role": "JOLT Analyst"
  }
}

spec

[
  {
    "operation": "sort"
  }
]

Output

{
  "employee": {
    "birthDate": "01/01/1980",
    "name": "Employee Sort",
    "phone": "9 9999-9999",
    "role": "JOLT Analyst"
  }
}
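The key ordering seen in this output can be reproduced with a small recursive function. This is a sketch, not Jolt code: `sort_keys` is a hypothetical helper that orders object keys alphabetically and leaves list element order untouched.

```python
def sort_keys(value):
    if isinstance(value, dict):
        # Rebuild the object with its keys in alphabetical order.
        return {key: sort_keys(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        # Keep element order; only sort objects nested inside the list.
        return [sort_keys(item) for item in value]
    return value

employee = {"employee": {"phone": "9 9999-9999", "name": "Employee Sort",
                         "birthDate": "01/01/1980", "role": "JOLT Analyst"}}
ordered = sort_keys(employee)
print(list(ordered["employee"]))
```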

Cardinality:

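The cardinality operation changes the cardinality of fields in the input JSON. With "MANY", a single value is wrapped in a list (an existing list is left unchanged); with "ONE", a list is reduced to its first element. It is helpful when a field sometimes arrives as a single value and sometimes as a list, and the output needs a consistent shape.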

Input

{
  "products": {
    "name": "Product A",
    "id": "123-A",
    "value": 10
  }
}

spec

[
  {
    "operation": "cardinality",
    "spec": {
      "products": "MANY"
    }
  }
]

Output

{
  "products": [
    {
      "name": "Product A",
      "id": "123-A",
      "value": 10
    }
  ]
}
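The wrapping shown in this output, and the opposite "ONE" direction, can be sketched in Python as follows. `apply_cardinality` is a hypothetical helper for top-level keys only, and returning None for an empty list under "ONE" is an arbitrary choice made for this sketch.

```python
def apply_cardinality(data, spec):
    out = dict(data)
    for key, mode in spec.items():
        if key not in out:
            continue
        value = out[key]
        if mode == "MANY":
            # Wrap a single value in a list; leave an existing list alone.
            out[key] = value if isinstance(value, list) else [value]
        elif mode == "ONE" and isinstance(value, list):
            # Keep only the first element (None for an empty list here).
            out[key] = value[0] if value else None
    return out

single = {"products": {"name": "Product A", "id": "123-A", "value": 10}}
as_list = apply_cardinality(single, {"products": "MANY"})
back_to_one = apply_cardinality(as_list, {"products": "ONE"})
print(as_list)
```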

Modify-Overwrite-Beta:

The modify-overwrite-beta operation allows you to modify the values of specific fields in the input JSON. It replaces the existing value with the specified new value. This operation is useful when you need to update or overwrite particular fields in the JSON.

Input

{
  "student": {
    "name": "Rishabh Makhar",
    "scores": [
      1,
      2,
      3,
      4
    ]
  }
}

spec

[
  {
    "operation": "modify-overwrite-beta",
    "spec": {
      "student": {
        "name": "=toUpper(@(1,name))",
        "maxScore": "=max(@(1,scores))"
      }
    }
  }
]

Output

{
  "student" : {
    "name" : "RISHABH MAKHAR",
    "scores" : [ 1, 2, 3, 4 ],
    "maxScore" : 4
  }
}

Modify-Default-Beta:

The modify-default-beta operation assigns a value to a field only if that field does not already exist; existing values are left untouched.

Input

{
  "student": {
    "name": "Rishabh",
    "scores": [
      1,
      2,
      3,
      4
    ]
  }
}

spec

[
  {
    "operation": "modify-default-beta",
    "spec": {
      "student": {
        "name": "=toUpper(@(1,name))",
        "maxScore": "=max(@(1,scores))"
      }
    }
  }
]

Output

{
  "student" : {
    "name" : "Rishabh",
    "scores" : [ 1, 2, 3, 4 ],
    "maxScore" : 4
  }
}
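The difference between overwriting a field and only filling in a missing one can be sketched in Python. The `modify` helper and the lambdas standing in for `toUpper` and `max` are hypothetical, and "does not exist" is taken here to mean the key is absent.

```python
def modify(data, computed, overwrite):
    out = dict(data)
    for key, compute in computed.items():
        # Overwrite mode always writes; default mode writes only missing keys.
        if overwrite or key not in out:
            out[key] = compute(data)
    return out

student = {"name": "Rishabh", "scores": [1, 2, 3, 4]}
computed = {
    "name": lambda s: s["name"].upper(),     # stand-in for =toUpper(@(1,name))
    "maxScore": lambda s: max(s["scores"]),  # stand-in for =max(@(1,scores))
}

overwritten = modify(student, computed, overwrite=True)
defaulted = modify(student, computed, overwrite=False)
print(overwritten["name"], defaulted["name"])
```

In both modes maxScore is added because it is missing from the input; only the overwrite mode touches the existing name.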

Wildcards in JOLT:

Jolt provides support for wildcards, which are placeholders that match and operate on multiple fields or elements in JSON data. Wildcards allow you to perform transformations on multiple fields simultaneously, without explicitly specifying each field individually. Here are the commonly used wildcards in Jolt:

  • &: Uses the key matched on the left-hand side (LHS) of the spec to compose the structure of the output JSON on the right-hand side (RHS).
  • *: References all fields and objects in a JSON without having to make their names explicit in the transformation.
  • @: References the value of a field or object in the input JSON.
  • $: References the name of a field or object contained in the input JSON to be used as the value of a field or object in the output JSON.
  • #: On the LHS, inserts a literal value into the output JSON. On the RHS, it is used to create lists.
  • |: References multiple fields or objects of an input JSON.

1. '&':

Uses the content of what is declared in the LHS to compose the structure of the output JSON, without the need to make this content explicit in the transformation. Usage: RHS.

Input

{
  "name": "Client Example",
  "email": "client-example@email.com"
}

spec

[
  {
    "operation": "shift",
    "spec": {
      "name": "client.&",
      "email": "client.&"
    }
  }
]

Output

{
  "client": {
    "name": "Client Example",
    "email": "client-example@email.com"
  }
}

2. '*':

References all fields and objects in a JSON without having to make their names explicit in the transformation.

Usage: LHS. Operations: shift, remove, cardinality, modify-default-beta, modify-overwrite-beta.

Input

{
  "name": "Customer Example",
  "email": "client-example@email.com",
  "document": "1234567890",
  "birthDate": "10/31/1990",
  "address": "Customer Example Street"
}

spec

[
  {
    "operation": "shift",
    "spec": {
      "*": "customer.&",
      "document": "customer.ssn"
    }
  }
]

Output

{
  "customer": {
    "name": "Customer Example",
    "email": "client-example@email.com",
    "ssn": "1234567890",
    "birthDate": "10/31/1990",
    "address": "Customer Example Street"
  }
}
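How "*" and "&" cooperate can be illustrated with a flat, single-level sketch in Python. This is not Jolt code: `put_path` and `shift_flat` are hypothetical helpers. An explicitly named key takes precedence over "*", so document is routed to ssn while every other field keeps its own name through "&".

```python
def put_path(out, dotted_path, value):
    # Create one nested dict per path segment, then set the leaf.
    keys = dotted_path.split(".")
    node = out
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value

def shift_flat(data, spec):
    out = {}
    for key, value in data.items():
        # A literal key in the spec wins; otherwise fall back to "*".
        target = spec.get(key, spec.get("*"))
        if target is None:
            continue
        # "&" on the right-hand side stands for the matched key name.
        put_path(out, target.replace("&", key), value)
    return out

source = {"name": "Customer Example", "document": "1234567890",
          "birthDate": "10/31/1990"}
spec = {"*": "customer.&", "document": "customer.ssn"}
shifted = shift_flat(source, spec)
print(shifted)
```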

3. '@':

References the value of a field or object contained in the input JSON.

Usage: LHS or RHS. Operations: shift (LHS and RHS), modify-default-beta (RHS), modify-overwrite-beta (RHS).

Input

{
  "key": "code",
  "value": "123-ABC"
}

spec

[
  {
    "operation": "shift",
    "spec": {
      "value": "product.@(1,key)"
    }
  }
]

Output

{
  "product": {
    "code": "123-ABC"
  }
}
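In plain Python terms, this spec uses the value of one field as the key under which another field's value is stored. The function `value_under_sibling_key` and its parameter names are hypothetical and only mirror the example above.

```python
def value_under_sibling_key(doc, value_field, key_field, parent):
    # doc[key_field] supplies the output key, like @(1,key) in the spec;
    # doc[value_field] supplies the value stored under that key.
    return {parent: {doc[key_field]: doc[value_field]}}

source = {"key": "code", "value": "123-ABC"}
product = value_under_sibling_key(source, "value", "key", "product")
print(product)
```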

4. '$':

References the name of a field or object contained in the input JSON to be used as the value of a field or object in the output JSON.

Usage: LHS. Operations: shift.

Input

{
  "product": {
    "name": "Product Example",
    "value": 10,
    "category": "CATEG-1",
    "weight": 25
  }
}

spec

[
  {
    "operation": "shift",
    "spec": {
      "product": {
        "*": {
          "$": "product[]"
        }
      }
    }
  }
]

Output

{
  "product": [
    "name",
    "value",
    "category",
    "weight"
  ]
}
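The same result, a list of field names rather than field values, can be written in Python as below. `collect_key_names` is a hypothetical helper used only to mirror this example.

```python
def collect_key_names(doc, field):
    # Keep the names of the fields and discard their values.
    return {field: list(doc[field].keys())}

source = {"product": {"name": "Product Example", "value": 10,
                      "category": "CATEG-1", "weight": 25}}
names = collect_key_names(source, "product")
print(names)
```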

5. '|':

It allows referencing multiple fields or objects of an input JSON so that, regardless of the name of the field or object, its value is allocated to the same destination in the output JSON.

Usage: LHS. Operations: shift.

Input

{
  "customer": {
    "fullName": "Customer Example",
    "email": "customer-example@email.com"
  }
}

spec

[
  {
    "operation": "shift",
    "spec": {
      "customer": {
        "fullName|customerName": "customer.name",
        "email": "customer.&"
      }
    }
  }
]

Output

{
  "customer": {
    "name": "Customer Example",
    "email": "customer-example@email.com"
  }
}

6. '#':

If used on the LHS, it inserts a literal value into the output JSON. On the RHS, it applies only when creating lists, where it groups content from the input JSON into the list being created.

Usage: LHS and RHS. Operations: shift.

Functions:

  1. String toLower, toUpper, concat, join, split, substring, trim, leftPad and rightPad
  2. Number min, max, abs, avg, intSum, doubleSum, longSum, intSubtract, doubleSubtract, longSubtract, divide and divideAndRound
  3. Type toInteger, toDouble, toLong, toBoolean, toString, recursivelySquashNulls, squashNulls, size
  4. List firstElement, lastElement, elementAt, toList, sort
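To give a feel for what a few of these functions compute, here are rough Python stand-ins keyed by the Jolt function name they imitate. These are analogies only: the lambdas are hypothetical, and Jolt's own argument order and edge-case behavior may differ.

```python
# Hypothetical Python stand-ins for a handful of the listed functions.
analogues = {
    "toUpper": lambda text: text.upper(),
    "toLower": lambda text: text.lower(),
    "trim": lambda text: text.strip(),
    "concat": lambda *parts: "".join(str(part) for part in parts),
    "max": lambda values: max(values),
    "min": lambda values: min(values),
    "abs": lambda number: abs(number),
    "size": lambda value: len(value),
    "firstElement": lambda items: items[0],
    "lastElement": lambda items: items[-1],
}

scores = [1, 2, 3, 4]
upper_name = analogues["toUpper"]("Rishabh")
highest = analogues["max"](scores)
last = analogues["lastElement"](scores)
label = analogues["concat"]("score-", highest)
print(upper_name, highest, last, label)
```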