FX: An interactive alternative to jq to process JSON (github.com/antonmedv)
257 points by federicoterzi on Jan 9, 2022 | hide | past | favorite | 62 comments


JQ syntax feels too unusual, doesn't resemble known code, gives me the feeling of looking into cryptic Perl or regex, could never remember the simplest things.

For example how would you take key k1 from a list of dicts [{k1: v1, k2: v2}, {k1: v3}]?


> For example how would you take key k1 from a list of dicts [{k1: v1, k2: v2}, {k1: v3}]?

Do you mean something like:

    .[].k1
Give it a try.

https://jqplay.org/

jq does have a learning curve, but just like any query language, including SQL, first you need to learn the basics of the query language in order to get things to work.

In this case:

* you know that .[] iterates over objects, so you use it to unpack the root array,

* you know you get a stream of objects, thus from those you use the .k1 filter to get the values of each k1 key.

Here's jq's manual on basic filters: https://stedolan.github.io/jq/manual/#Basicfilters

After you get jq to filter out what you want, you can work on getting it to output results in whatever format you wish.


Well for comparison, I did that learning curve process with SQL and I was able to understand it. But I did the same learning curve process with JQ and I still don’t understand it.


> jq does have a learning curve, but just like any query language, including SQL, first you need to learn the basics of the query language in order to get things to work.

SQL is based on solid mathematical theory, relational algebra. I personally learned that (and tuple relational calculus) in college before learning SQL, which made it easier. It helps make the language coherent. Is there something like this for jq? Often when people invent languages that aren't based on solid theory, they tend to lack coherence. That can make them difficult to learn if, like me, you rely on a mental model of how things "should" work.


> Is there something like this for jq?

It's a filter. You can name-drop math stuff and even mention monads and the like, but it's just predicates, maps, and reductions.

Also, I'm not aware of a single person who ever looked at relational algebra beyond the introductory lessons of a relational databases 101 course, and even then that stuff was mostly in the way.


You are the one name-dropping three mathematical concepts, though? I'm not sure I understand your reasoning here. I'm talking about the basis for jq in general, not just your example. And if your message is representative of how the people who created jq think, I guess the answer is no and jq falls into the "no solid theory behind it" category.

I don't think my message implied that jq is a worse (or better) tool for it. I was just explaining that for some people, tools with a theory behind them are easier to learn and understand than tools without.


I agree that jq's query language is very obtuse and probably my biggest barrier towards learning it. I have found great mileage using gron [1], which is very different from jq, but its goal is to promote exploration of a JSON file through common unix tools such as awk and grep.

1: https://github.com/tomnomnom/gron


> I have found great mileage using gron

`gron` is great but doesn't seem to handle some (extreme-ish) situations that `jq` can, e.g. the json output from the fastnbt-tools. You either get a `token too long` error using `gron -s` because the input is too long (it's 90MB, that's fair) or you get only one set of outputs per key (iyswim) because they get overlapped in memory.


> or you get only one set of outputs per key (iyswim) because they get overlapped in memory

That sounds like a major bug. So it will silently skip data that you wanted?


> That sounds like a major bug.

It's definitely odd when you have multiple objects at the same level that aren't in an array, but I guess the explanation there is "they should all be on their own individual lines as streaming JSON", which `gron` does handle correctly.

    (echo '{"a":"23"}'; echo '{"a":"25"}') | gron -s
    json = [];
    json[0] = {};
    json[0].a = "23";
    json[1] = {};
    json[1].a = "25";
> So it will silently skip data that you wanted?

Yeah.

    echo '{"a":"23"}{"a":"25"}' | gron
    json = {};
    json.a = "23";
The `-s` option doesn't help.

    echo '{"a":"23"}{"a":"25"}' | gron -s
    json = [];
    json[0] = {};
    json[0].a = "23";


Had a look at the source and I think I've figured out why and maybe how to fix it. Will have a bash at making a PR this week.


I want to vouch for gron as well. Apart from being grepable, I found it is easier to orient myself where I am in a very large JSON structure. The location in the hierarchy is present on every single line, no need to scroll up or down to figure it out. Granted, many other tools can help with this as well, but gron does it well.


Love this. Such a simple idea yet very helpful. It probably can't do everything jq does, but it will solve most of what you usually want to do with JSON on the command line.

Thanks for that tip!


I've struggled with the jq language when doing complicated things, but generally felt it was just the problem that was tricky. Generally I feel like I'm learning an actually useful language, though I guess Perl and regex fall into that same category: what seems impenetrable at first later becomes almost second nature as you use $ to mean end of line in vi and so on. Then if you don't do it for a while, you forget the more obscure bits.

My approach to the example would be to use `.[] | .k1` which I think does what you want, and like bash command line pipes, you can build up to it semi-interactively.

The bits of jq I struggle with often involve irregular JSON, where a value might be missing, or null, or a list; I'm not sure what the idiomatic way to deal with that is, if there is one.
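For what it's worth, jq's `?` and `//` operators cover a lot of those cases. A quick sketch (not claiming this is the one idiomatic way):

```shell
# '.k1?' suppresses "Cannot index" errors when an element isn't an object;
# '// "n/a"' substitutes a default when the value is null or missing.
echo '[{"k1": "v1"}, {}, {"k1": null}]' | jq '.[] | .k1? // "n/a"'
# "v1"
# "n/a"
# "n/a"
```

The trailing `?` only matters when an element isn't an object at all (e.g. a bare number or a list), where plain `.k1` would error out mid-stream.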


Had the same experience. That's why I've written jql[0], which puts a uniform lispy spin on CLI JSON processing. I now use it almost exclusively instead of jq. Check it out if you're looking for alternatives.

And by the way, you can achieve live preview with any of these CLI tools by using fzf. This is the snippet for jql for example: `echo '' | fzf --print-query --preview-window wrap --preview 'cat test.json | jql {q}'` (substitute jql for jq or anything else)

P.S.: jql might seem dead, as there are no recent commits, but it's not. It's just finished.

[0]: https://github.com/cube2222/jql


`jql` looks interesting - is there an easy way to do the equivalent of `jq`'s `to_entries[]`? (e.g. turns `{"x":"y"}{"a":"b"}` into `{"key":"x","value":"y"}{"key":"a","value":"b"}` which I've needed a lot recently for dealing with output with unknown keys.)
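For reference, the jq behaviour being described:

```shell
# to_entries[] explodes each object into a stream of {key, value}
# pairs; -c prints one compact JSON object per line.
printf '%s\n' '{"x":"y"}' '{"a":"b"}' | jq -c 'to_entries[]'
# {"key":"x","value":"y"}
# {"key":"a","value":"b"}
```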


For the general case of multiple keys and values - no. It sounds reasonable, though, so I'll think about whether to add an entries function or a map function that would allow doing this in a simple way.

For the special case you wrote as an example, where each object is just a single key-value, it's possible:

  (object
      "key" (pipe (keys) (0))
      "value" (pipe ((keys)) (0)))


> I'll think about whether to add an entries function or a map function that would allow doing this in a simple way.

That would be super, ta. `to_entries[]` is pretty much the major reason I've not managed to move off `jq` to anything else yet because it's just incredibly powerful in this situation.


I've actually just gone ahead and added a way to do this - the zip function - in the v0.2.0 release.

The relevant jql snippet to solve this in the general case now is:

  (pipe
    (zip
      (keys)
      ((keys)))
    ((keys)
      (object
        "key" (0)
        "value" (1))))
It's not as terse as the jq equivalent - I'll probably add a way to create user-defined functions, so you can alias stuff like this to shorter forms - but that one will require more thought.


Wasn't expecting such a quick (if any!) response! Excellent, ta. That gives me the same output from my file as jq does with `to_entries[]`.

Unfortunately my next issue is how do I iterate over an array of objects (like jq `.[]`)? I'm guessing it's maybe something to do with `range` but I don't know how many I have in order to fill in those indices and I can't do `(elem 0) ... (elem 1)` for the same reason.


Not sure if you've gone through the README - especially the first few paragraphs should help you get an intuition on how to structure nested jql queries.

Basically, you can think about the query as a composition of many functions which result in one big function taking in your JSON and outputting a new JSON.

When you do ("mykey") or (0) you dive in one level deeper. You can also transform what is that one level deeper by writing ("mykey" (mytransform)). There is a keys function which returns the list of keys or the list of indices, for the current object or list, respectively. And you can use those lists of indices for indexing purposes.

Thus, if you have an input list and want to transform it element by element, you can write ((keys) (my-single-element-transformer)). It gets the indices, uses them as an index, and transforms each object contained in the list.

So let's say you have a list of objects {"name": "abc", "surname": "xyz"} and would like to transform them into a list of {"abc": "xyz"}. You can write ((keys) (object ("name") ("surname"))). This goes over all elements and for each returns a single object with a key that is the name (it's actually a transformer/continuation which gets the name from the current object that we pass there) and value that is the surname.

You can also see that in the original "entries" query. It first zips the keys with the values, so for a list of {"mykey": "myvalue"} objects, it will give you a list of lists ["mykey", "myvalue"]. Then it pipes that into another transform, which for each such pair creates an object {"key": "<first element of pair>", "value": "<second element of pair>"}.

The overall system isn't that straightforward at first, but playing around with it for a while should make it click and then it's easy to write even more complex queries.


Does fzf let you specify autocomplete for the command based on the input? That would be amazing.


> For example how would you take key k1 from a list of dicts [{k1: v1, k2: v2}, {k1: v3}]?

    '.[].k1'


  ╰─$ echo '[{"k1": "v1", "k2": "v2"}, {"k1": "v3"}]' | jq -r '.[] | .k1'
  v1
  v3
https://codefaster.substack.com/p/mastering-jq-part-1-59c

1. parse a json value from stdin and set it as the initial result

2. for each function, apply the function to the result, and set the output as the result for the next function.

3. The final result is pretty printed on stdout.


Yeah I too found the syntax somewhat unintuitive at times. But to answer your question, you would do 'map(.k1)'
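Unlike `.[].k1`, `map(.k1)` keeps the results wrapped in an array rather than emitting a stream:

```shell
# map(f) applies the filter f to every element of the input
# array and returns a single array of the results.
echo '[{"k1": "v1", "k2": "v2"}, {"k1": "v3"}]' | jq -c 'map(.k1)'
# ["v1","v3"]
```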


> gives me the feeling of looking into cryptic Perl or regex

Dunno if you'll see this given how many replies you already got, but rather than just dumping "how do you do that" here's a realization I had a while ago that made it way easier to understand:

jq's language is a series of filters/transformers more akin to bash pipes on a stream of data than anything else.

For example, just "." selects out the current object (and is needed to match the "root" at the start of the query), and jq pretty-prints the results (when to a terminal):

  $ echo '[{"k1": "v1", "k2": "v2"}, {"k1": "v3"}]'
  [{"k1": "v1", "k2": "v2"}, {"k1": "v3"}]
  $ echo '[{"k1": "v1", "k2": "v2"}, {"k1": "v3"}]' | jq '.'
  [
    {
      "k1": "v1",
      "k2": "v2"
    },
    {
      "k1": "v3"
    }
  ]
There's only 1 matching element here, the outermost array. We want to go one deeper, so use "[]" to unwrap/flatten it:

  $ echo '[{"k1": "v1", "k2": "v2"}, {"k1": "v3"}]' | jq '.[]'
  {
    "k1": "v1",
    "k2": "v2"
  }
  {
    "k1": "v3"
  }
jq is now iterating over 2 objects, so the next filter is the one where you select out the key you want. This can be done in two different ways for this example (per sibling replies):

  $ echo '[{"k1": "v1", "k2": "v2"}, {"k1": "v3"}]' | jq '.[].k1'
  "v1"
  "v3"
  $ echo '[{"k1": "v1", "k2": "v2"}, {"k1": "v3"}]' | jq '.[] | .k1'
  "v1"
  "v3"
Note how I broke these up: The atoms are ".", "[]", and ".k1" - ".[]" isn't one of them, despite what it may look like at first glance when compared to ".k1". Some additional examples to show how these combine:

The "unwrap/flatten" [] can be used multiple times when nested arrays are involved, with or without the pipe syntax, but only works on arrays. It errors if given something else:

  $ echo '[[1,2,3],[4,[5,6]]]' | jq '.[]'
  [
    1,
    2,
    3
  ]
  [
    4,
    [
      5,
      6
    ]
  ]
  $ echo '[[1,2,3],[4,[5,6]]]' | jq '.[][]'
  1
  2
  3
  4
  [
    5,
    6
  ]
  $ echo '[[1,2,3],[4,[5,6]]]' | jq '.[][][]'
  jq: error (at <stdin>:1): Cannot iterate over number (1)

  $ echo '[[1,2,3],[4,[5,6]]]' | jq '.[] | .[]'
  1
  2
  3
  4
  [
    5,
    6
  ]
  $ echo '[[1,2,3],[4,[5,6]]]' | jq '.[] | .[] | .[]'
  jq: error (at <stdin>:1): Cannot iterate over number (1)
Also notice how the "." is needed after the pipes; these are separate filters/transformations being chained together, so each one needs the same leading "." as the first.


There’s also jp, which interprets JMESPath: https://github.com/jmespath/jp

This one has the advantage of being natively understood by aws-cli, meaning you can pass a JMESPath to an AWS call and only receive the filtered / transformed result back.


For those who are comfortable with python, I created Jello[0], which works like jq but uses python syntax.

I also created Jellex[1], which is a TUI built on Jello to assist with building the python queries.

Jello gives you the power of python but without all of the boilerplate, so it’s nicer to use in Bash scripts.

[0] https://github.com/kellyjonbrazil/jello

[1] https://github.com/kellyjonbrazil/jellex


Apparently, the author wrote this tool because jid was struggling with a 7MB JSON file.

See https://github.com/simeji/jid/issues/66#issuecomment-4436718...


Just tried out both of these for a large endpoint.

- FX "expand/collapse" functionality seems way better for exploring APIs whose shape you don't know

- jid is maybe marginally better for APIs where you have instant recall of the exact shape and need to rapidly query it

Overall, I like FX better because it provides feedback on your query faster.

I am grateful to the author(s) for creating it and I'll be using it instead of JQ whenever I need to wrangle APIs from the CLI.


FYI, fx has a jid-like mode that can be started by pressing the dot key. See this post[1] for more information.

[1] https://medium.com/@antonmedv/discover-how-to-use-fx-effecti...


Thank you! I had discovered it when I was playing around with it but forgot to mention that.


A generic (partial) solution to this type of thing is just to sample a number of lines from the large input, and do the investigation on that.

    shuf -n 1000 file

This is part of coreutils.

There's also jiq, which is a clone of jid (mentioned elsewhere) but with jq syntax


That would only work if it's line-separated JSON, though. If you sample 1000 lines of a big JSON file, the result will be invalid.


Then, use

  jq --compact-output '.' | head -10 | foo
That is also useful for grepping to filter on records of interest.
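A concrete sketch of that grep-based filtering (input and pattern are made up):

```shell
# One JSON record per line means ordinary line tools (grep, head)
# work between jq stages.
echo '[{"k1":"v1"},{"k1":"v3"}]' | jq -c '.[]' | grep v3
# {"k1":"v3"}
```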

jq also has --stream for handling large inputs.
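`--stream` trades the usual value-at-a-time model for a flat stream of `[path, value]` events, which is why it copes with inputs that don't fit in memory. Using the small example array from the jq manual's streaming section:

```shell
# Each leaf becomes a [path, value] event; a trailing [path] event
# marks the end of the enclosing array or object.
echo '[1,2,"three"]' | jq -c --stream '.'
# [[0],1]
# [[1],2]
# [[2],"three"]
# [[2]]
```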


If you want to use jq but with Python syntax, I wrote pq:

https://github.com/dvolk/pq


I was once upon a time working heavily with JSON back-ends and wrote a node.js script which, when piped JSON and lambdas, ran the data through the lambda and output the result. It was very productive.

But then I discovered LINQPad[0] and, "The Legendary Dump".

[0]: https://www.linqpad.net/


That's pretty cool. Would be nice if there was an option to put the currently used filter/query into the shell history or the clipboard. So that you could experiment to find the right one, then back out and use it in a pipeline.


    cat xxx.json | fx 'whatever' | fx

That's how I usually do it; if you don't want the interactive mode, drop the last pipe.


Not sure I understand. I mean using the interactive mode to figure out the right query ("[].mumble.whatever...") and then being able to save the text of the query that you figured out by pointing and clicking. Like a graphical SQL query builder allows you to do.


You can do this with `jid -q`: https://github.com/simeji/jid

(Its interactivity is keyboard- rather than mouse-based.)


Oh, that's great...outputs a jq-compatible query. Thanks!


gron [0] is another interesting JSON processor that follows UNIX philosophy:

[0] https://github.com/tomnomnom/gron

"Make JSON greppable!"

"gron transforms JSON into discrete assignments to make it easier to grep for what you want and see the absolute 'path' to it."


I absolutely love gron, but I have to confess I feel dirty when I use it. It's unashamedly a brute-force tool in a world with plenty of elegant alternatives, and the main reason to use it is pure laziness to just shamelessly grep stuff around. And I love it for that.


I also like https://github.com/dflemstr/rq because it supports a few more formats (protobufs for example)


If you're into the SQL side of querying arbitrary files (as opposed to jq's custom query language), I've got a comparison of some major tools here too.

https://github.com/multiprocessio/datastation/tree/main/runn...


It would be good if this were added to one of the existing[1] comparisons of similar tools; there are already a lot of them out there.

1: https://cburgmer.github.io/json-path-comparison/


Another alternative is the oj app (ojg/cmd/oj) which is part of https://github.com/ohler55/ojg. It relies on JSONPath for extraction and manipulation of JSON.


Since nobody else mentioned it, there's also jiq, which uses jq under the hood.

https://github.com/fiatjaf/jiq


https://github.com/akavel/up is another tool that can be used to make most commands interactive.


Would be really nice if there was a big warning somewhere in the installation section that it doesn't work on windows (outside wsl).


Looks really nice. I wish it handled yaml as well.


I like to have something like this in my shell:

    y2j () {
        ruby -r json -r yaml -e 'puts JSON.dump(YAML.load(STDIN))'
    }
Makes it easy to use json tooling for yaml, although of course it flattens out anchors etc.


This simply evals the argument. Not an alternative to jq.

  fx "code to eval"


This doesn't look to me like something that can be evaluated by node:

  fx .comments[].authors[].names
You might want to take a look at this post[1] to see everything fx can do.

[1] https://medium.com/@antonmedv/discover-how-to-use-fx-effecti...


Are you truly unable to see how fx adds value over plain jq?

If you are, then why comment?


Would be cool if it were not a CLI made in node...


Do any of these JSON processing tools support wildcards in key names? E.g., '.long*' instead of '.longNameWithManyDetails'.


[flagged]


On my machine jq depends on oniguruma. Both node and oniguruma should be easy to install from your distro package manager.


Usually the problem with dependencies is not that they are hard to install but that you have something additional to install on every machine where you want to use the tool. Sometimes you might not even have permission to do so.


It's because Homebrew (if you're on macOS) decided not to compile it statically.


I'm on manjaro.



