jq – remove duplicates from arrays

Problem Description:

I want to remove the duplicates from each array in this JSON:

{
  "abc": [
    "five"
  ],
  "pqr": [
    "one",
    "one",
    "two",
    "two",
    "three",
    "three",
    "four",
    "four"
  ],
  "xyz": [
    "one",
    "one",
    "two",
    "two",
    "four"
  ]
}

The output I expect after removing the duplicates:

{
  "abc": [
    "five"
  ],
  "pqr": [
    "one",
    "two",
    "three",
    "four"
  ],
  "xyz": [
    "one",
    "two",
    "four"
  ]
}

I tried map, unique and group_by with jq, but nothing helped.

Solution – 1

unique removes duplicates, but it also sorts each array, which may or may not be what you want:

jq '.[] |= unique'

produces:

{
  "abc": [
    "five"
  ],
  "pqr": [
    "four",
    "one",
    "three",
    "two"
  ],
  "xyz": [
    "four",
    "one",
    "two"
  ]
}
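As a quick check of that sorting side effect, here is a small sketch run with jq -n on a made-up array (the values are arbitrary, not from the question). unique returns its results in jq's sort order, in which null comes before numbers and numbers before strings:

```shell
# unique de-duplicates AND sorts, using jq's ordering
# (null < false < true < numbers < strings < arrays < objects)
jq -nc '["b", 2, null, "a", 1, "b", 2] | unique'
# prints [null,1,2,"a","b"]
```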

You can restore the original order by rebuilding the array from the sorted positions at which each unique item first occurs:

jq '.[] |= [.[[index(unique[])] | sort[]]]'
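To make that one-liner less opaque, the sketch below runs it step by step on the "pqr" array from the question (the shell variable arr is just a name chosen for this example). index returns the position of the first occurrence, so sorting those positions recovers the original order:

```shell
# Hypothetical variable holding the "pqr" array from the question
arr='["one","one","two","two","three","three","four","four"]'

# 1. unique items, in sorted order
echo "$arr" | jq -c 'unique'                            # ["four","one","three","two"]
# 2. position of the first occurrence of each unique item
echo "$arr" | jq -c '[index(unique[])]'                 # [6,0,4,2]
# 3. those positions sorted, i.e. in order of first occurrence
echo "$arr" | jq -c '[index(unique[])] | sort'          # [0,2,4,6]
# 4. pick the elements at those positions
echo "$arr" | jq -c '[.[[index(unique[])] | sort[]]]'   # ["one","two","three","four"]
```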

Or avoid sorting altogether with a straightforward hand-written de-duplication that keeps the first occurrence of each item:

jq '.[] |= reduce .[] as $i ([]; . + if index($i) then [] else [$i] end)'
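The if index($i) test works because index returns null when the item is absent, and a match at position 0 is still truthy (only false and null are falsy in jq). One caveat: when $i is itself an array, index searches for it as a sub-sequence, so an array element can be dropped by mistake. The sketch below shows both points on made-up inputs, plus a variant that compares whole elements with any/2; that variant is my own substitution, not part of the answer above:

```shell
# index gives the first position, or null when absent; 0 is truthy in jq
jq -nc '["a","b"] | [index("a"), index("c")]'   # [0,null]
jq -nc 'if 0 then "truthy" else "falsy" end'     # "truthy"

# Caveat: with an array argument, index looks for a sub-sequence,
# so the reduce-based version wrongly drops the element [1,2] here
jq -nc '[1,2,[1,2]] | reduce .[] as $i ([]; . + if index($i) then [] else [$i] end)'   # [1,2]

# Variant (assumption, not from the answer): compare whole elements with any/2
jq -nc '[1,2,[1,2]] | reduce .[] as $i ([]; if any(.[]; . == $i) then . else . + [$i] end)'   # [1,2,[1,2]]
```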

In my tests, the latter performed better. Both produce:

{
  "abc": [
    "five"
  ],
  "pqr": [
    "one",
    "two",
    "three",
    "four"
  ],
  "xyz": [
    "one",
    "two",
    "four"
  ]
}

Solution – 2

Here is a sort-free alternative for obtaining the distinct items in an array (or stream) while retaining the order of first occurrence.

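It uses a filter that is a tiny bit more complex than it would otherwise be, for the sake of complete genericity: since object keys must be strings, each item is recorded under its type and its string form (tostring for non-strings), so that, for example, the number 1 and the string "1" are not treated as duplicates: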

# generate a stream of the distinct items in `stream`
# in order of first occurrence, without sorting
def uniques(stream):
  foreach stream as $s ({};
     ($s|type) as $t
     | (if $t == "string" then $s else ($s|tostring) end) as $y
     | if .[$t][$y] then .emit = false else .emit = true | (.item = $s) | (.[$t][$y] = true) end;
     if .emit then .item else empty end );
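To see the type-keyed bookkeeping at work, here is a sketch that feeds uniques a made-up stream mixing the number 1, the string "1" and null. The definition is copied verbatim in logic from above so the command runs on its own:

```shell
jq -nc '
  # uniques/1 copied from the definition above so this command is self-contained
  def uniques(stream):
    foreach stream as $s ({};
       ($s|type) as $t
       | (if $t == "string" then $s else ($s|tostring) end) as $y
       | if .[$t][$y] then .emit = false
         else .emit = true | .item = $s | .[$t][$y] = true end;
       if .emit then .item else empty end);

  # the number 1 and the string "1" are tracked separately; null is kept once
  [uniques(1, "1", 1, null, "a", "1", null)]
'
# prints [1,"1",null,"a"]
```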

Now it's just a matter of applying this filter to your JSON, with the def above placed at the start of the same jq program. One possibility is:

map_values([uniques(.[])])
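Put together, a complete invocation might look like the sketch below. It assumes the input is fed on standard input through a shell here-document; passing a file name to jq works just as well:

```shell
jq '
  # uniques/1 from above: distinct items of stream, in order of first occurrence
  def uniques(stream):
    foreach stream as $s ({};
       ($s|type) as $t
       | (if $t == "string" then $s else ($s|tostring) end) as $y
       | if .[$t][$y] then .emit = false
         else .emit = true | .item = $s | .[$t][$y] = true end;
       if .emit then .item else empty end);

  # de-duplicate every array value of the top-level object
  map_values([uniques(.[])])
' <<'EOF'
{
  "abc": ["five"],
  "pqr": ["one","one","two","two","three","three","four","four"],
  "xyz": ["one","one","two","two","four"]
}
EOF
```

With jq's default two-space indentation, this should print the expected output shown in the question.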