JQ syntax feels too unusual: it doesn't resemble any code I know, it gives me the feeling of looking at cryptic Perl or a regex, and I could never remember even the simplest things.
For example how would you take key k1 from a list of dicts [{k1: v1, k2: v2}, {k1: v3}]?
jq does have a learning curve, but just like any query language, including SQL, first you need to learn the basics of the query language in order to get things to work.
In this case:
* you know that .[] iterates over the elements of a collection, so you use it to unpack the root array,
* you know that gives you a stream of objects, so on each of those you use the .k1 filter to get the value of its k1 key.
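Put together, a minimal sketch of that query on the example input (quoted here as real JSON) looks like:

```shell
# unpack the root array, then pick k1 from each object in the stream
echo '[{"k1":"v1","k2":"v2"},{"k1":"v3"}]' | jq '.[] | .k1'
# prints:
# "v1"
# "v3"
```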
Well for comparison, I did that learning curve process with SQL and I was able to understand it. But I did the same learning curve process with JQ and I still don’t understand it.
> jq does have a learning curve, but just like any query language, including SQL, first you need to learn the basics of the query language in order to get things to work.
SQL is based on solid mathematical theory, relational algebra. I personally learned that (and tuple relational calculus) in college before learning SQL, which made it easier. That foundation helps make it coherent. Is there something like this for jq? Often when people invent languages that are not based on solid theory, they tend to lack coherence. That can make them difficult to learn if you're someone who relies on a mental model of how things "should" work, like I am.
It's a filter. You can name-drop math stuff and even mention monads and the like, but it's just predicates, maps, and reductions.
Also, I'm not aware of a single person who ever looked at relational algebra beyond the introductory lessons of a relational databases 101 course, and even then that stuff was mostly in the way.
You are the one name-dropping three mathematical concepts, though? I'm not sure I understand your reasoning here. I'm talking about the basis for jq in general, not just your example. And if your message is representative of how the people who created jq think, I guess the answer is no and jq falls into the "no solid theory behind it" category.
I don't think my message was implying that jq is a worse (or better) tool for it. I was just explaining that for some people, tools with a theory behind them are easier to learn and understand than tools without one.
I agree that jq's query language is very obtuse and probably my biggest barrier towards learning it. I have found great mileage using gron [1], which is very different from jq, but its goal is to promote exploration of a JSON file through common unix tools such as awk and grep.
`gron` is great but doesn't seem to handle some (extreme-ish) situations that `jq` can, e.g. the json output from the fastnbt-tools. You either get a `token too long` error using `gron -s` because the input is too long (it's 90MB, that's fair) or you get only one set of outputs per key (iyswim) because they get overlapped in memory.
It's definitely an oddness when you have multiple objects at the same level that aren't in an array but I guess the explanation there is "they should all be on their own individual lines as streaming json" which `gron` does handle correctly.
I want to vouch for gron as well. Apart from being grepable, I found it is easier to orient myself where I am in a very large JSON structure. The location in the hierarchy is present on every single line, no need to scroll up or down to figure it out. Granted, many other tools can help with this as well, but gron does it well.
Love this. Such a simple idea yet very helpful. It probably can't do what all jq does but it will solve most of what you usually want to do with json on the command line.
I've struggled with the jq language when doing complicated things, but generally felt it was just the problem that was tricky. Generally I feel like I'm learning an actually useful language, though I guess Perl and regex fall into that same category: what seems impenetrable at first later becomes almost second nature, as when you use $ to mean end of line in vi and so on. Then if you don't use it for a while, you forget the more obscure bits.
My approach to the example would be to use `.[] | .k1` which I think does what you want, and like bash command line pipes, you can build up to it semi-interactively.
The bits of jq I struggle with often involve irregular JSON, where a value might be missing, or null, or a list; I'm not sure what the idiomatic way to deal with that is, if there is one.
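One idiom that seems to cover most of those irregular cases (not necessarily the canonical one) combines the error-suppressing `?` with the `//` alternative operator:

```shell
# .k1? suppresses the error when an element isn't an object (e.g. a number),
# and // empty drops the nulls produced by missing keys or null elements
echo '[{"k1":"v1"},{"k2":"x"},null,5]' | jq '.[] | .k1? // empty'
# prints:
# "v1"
```

There is also `.[]?`, which silently skips inputs that can't be iterated at all.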
Had the same experience. That's why I've written jql[0], which puts a uniform lispy spin on CLI JSON processing. I now use it almost exclusively instead of jq. Check it out if you're looking for alternatives.
And by the way, you can achieve live preview with any of these CLI tools by using fzf. This is the snippet for jql for example: `echo '' | fzf --print-query --preview-window wrap --preview 'cat test.json | jql {q}'` (substitute jql for jq or anything else)
P.S.: jql might seem dead, as there are no recent commits, but it's not. It's just finished.
`jql` looks interesting - is there an easy way to do the equivalent of `jq`'s `to_entries[]`? (e.g. turns `{"x":"y"}{"a":"b"}` into `{"key":"x","value":"y"}{"key":"a","value":"b"}` which I've needed a lot recently for dealing with output with unknown keys.)
For the general case of multiple keys and values - no. It sounds reasonable, though, so I'll think about whether to add an entries function or a map function that would allow doing this in a simple way.
For the special case you wrote as an example, where each object is just a single key-value, it's possible:
> I'll think about whether to add an entries function or a map function that would allow doing this in a simple way.
That would be super, ta. `to_entries[]` is pretty much the major reason I've not managed to move off `jq` to anything else yet because it's just incredibly powerful in this situation.
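For anyone following along, this is the jq behaviour being referred to; on a stream of objects with unknown keys it looks like:

```shell
# to_entries turns {"k": v, ...} into [{"key": "k", "value": v}, ...];
# the trailing [] then streams out the individual pair objects
echo '{"x":"y"} {"a":"b"}' | jq -c 'to_entries[]'
# prints:
# {"key":"x","value":"y"}
# {"key":"a","value":"b"}
```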
It's not as terse as the jq equivalent - I'll probably add a way to create user-defined functions, so you can alias stuff like this to shorter forms - but that one will require more thought.
Wasn't expecting such a quick (if any!) response! Excellent, ta. That gives me the same output from my file as jq does with `to_entries[]`.
Unfortunately my next issue is how do I iterate over an array of objects (like jq `.[]`)? I'm guessing it's maybe something to do with `range` but I don't know how many I have in order to fill in those indices and I can't do `(elem 0) ... (elem 1)` for the same reason.
Not sure if you've gone through the README - especially the first few paragraphs should help you get an intuition on how to structure nested jql queries.
Basically, you can think about the query as a composition of many functions which result in one big function taking in your JSON and outputting a new JSON.
When you do ("mykey") or (0) you dive in one level deeper. You can also transform what lies one level deeper by writing ("mykey" (mytransform)). There is a keys function which returns the list of keys or the list of indices for the current object or list, respectively. And you can use those lists of indices for indexing purposes.
Thus, if you have an input list and want to transform it element by element, you can write ((keys) (my-single-element-transformer)). It gets the indices, uses them as an index, and transforms each object contained in the list.
So let's say you have a list of objects {"name": "abc", "surname": "xyz"} and would like to transform them into a list of {"abc": "xyz"}. You can write ((keys) (object ("name") ("surname"))). This goes over all elements and for each returns a single object with a key that is the name (it's actually a transformer/continuation which gets the name from the current object that we pass there) and value that is the surname.
You can also see that in the original "entries" query. It first zips the keys with the values, so for a list of {"mykey": "myvalue"} objects, it will give you a list of ["mykey", "myvalue"] pairs. Then it pipes that into another transform, which for each such pair creates an object {"key": "<first element of pair>", "value": "<second element of pair>"}.
The overall system isn't that straightforward at first, but playing around with it for a while should make it click and then it's easy to write even more complex queries.
> gives me the feeling of looking into cryptic Perl or regex
Dunno if you'll see this given how many replies you already got, but rather than just dumping "how do you do that" here's a realization I had a while ago that made it way easier to understand:
jq's language is a series of filters/transformers more akin to bash pipes on a stream of data than anything else.
For example, just "." selects out the current object (and is needed to match the "root" at the start of the query), and jq pretty-prints the results (when to a terminal):
jq is now iterating over 2 objects, so the next filter is the one where you select out the key you want. This can be done in two different ways for this example (per sibling replies):
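Sketching both forms on the same input:

```shell
input='[{"k1":"v1","k2":"v2"},{"k1":"v3"}]'

# explicit pipe: stream the objects into a second ".k1" filter
echo "$input" | jq '.[] | .k1'
# shorthand: the pipe can be elided between "[]" and ".k1"
echo "$input" | jq '.[].k1'
# both print:
# "v1"
# "v3"
```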
Note how I broke these up: The atoms are ".", "[]", and ".k1" - ".[]" isn't one of them, despite what it may look like at first glance when compared to ".k1". Some additional examples to show how these combine:
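A couple of combinations along those lines:

```shell
# the fully explicit spelling: "." piped into "[]" piped into ".k1"
echo '[{"k1":"v1"},{"k1":"v3"}]' | jq '. | .[] | .k1'
# prints "v1" then "v3"

# key filters compose the same way, one level per ".name"
echo '{"a":{"b":"deep"}}' | jq '.a.b'
# prints "deep"
```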
The "unwrap/flatten" [] can be used multiple times when nested arrays are involved, with or without the pipe syntax, but only works on arrays. It errors if given something else:
Also notice how the "." is needed after the pipes: these are separate filters/transformations being chained together, so each new filter needs the same leading "." as the first one.
This one has the advantage of being natively understood by aws-cli, meaning you can pass a JMESPath to an AWS call and only receive the filtered / transformed result back.
I was once upon a time working heavily with JSON back-ends and wrote a node.js script which, when piped JSON and lambdas, ran the data through the lambda and output the result. It was very productive.
But then I discovered LINQPad[0] and, "The Legendary Dump".
That's pretty cool. Would be nice if there was an option to put the currently used filter/query into the shell history or the clipboard. So that you could experiment to find the right one, then back out and use it in a pipeline.
Not sure I understand. I mean using the interactive mode to figure out the right query ("[].mumble.whatever...") and then being able to save the text of the query that you figured out by pointing and clicking. Like a graphical SQL query builder allows you to do.
I absolutely love gron, but I have to confess I feel dirty when I use it. It's unashamedly a brute-force tool in a world with plenty of elegant alternatives, and the main reason to use it is pure laziness to just shamelessly grep stuff around. And I love it for that.
Another alternative is the oj app (ojg/cmd/oj) which is part of https://github.com/ohler55/ojg. It relies on JSONPath for extraction and manipulation of JSON.
Usually the problem with dependencies is not that they are hard to install but that you have something additional to install on every machine where you want to use the tool. Sometimes you might not even have permission to do so.