The dumbest (Emacs) indenter | natkr's ramblings

I've been using Nushell for a while, but mostly as an interactive shell. The other day I ended up writing my first nontrivial script with it, and... oh boy. There's an Emacs mode, but... it only covers syntax highlighting, not indentation.

What's the big deal, anyway?

If you haven't run into it before, Emacs' default indentation behaviour when it doesn't understand the language¹ is, to put it nicely, a bit of an oddball.

In short, it tries to align with the words of the line above, rather than any kind of standardized indentation like you'd normally expect. It also has no idea about when to begin or end indentation blocks. So, assuming that we manually mark the start of indentation blocks, it will produce code like this:

def main [] {
    dostuff | filter {
            $in > 5
            }
            if $cond {
               true
               } else {
                 false
                 }
                 }

instead of something more sensible like this:

def main [] {
    dostuff | filter {
        $in > 5
    }
    if $cond {
        true
    } else {
        false
    }
}

Oh dear

Surely we can do better? Well... Emacs expects you to provide your own parser, which is well and good but... I'm not an elisp expert and I didn't really feel like writing a full parser from scratch. There is an option to delegate it to tree-sitter, and Nushell does provide a tree-sitter grammar, but I have vague memories of tree-sitter requiring a kind of awkward amount of glue that I just didn't feel like dealing with. NEXT!

Enter SMIE

In the end, I figured I'd at least give the custom parser a try. I haven't tried it before, but... the S in SMIE stands for Simple, so surely it should be fine... right? Well... Thankfully, since we only care about producing indentation we don't actually need to parse the full language. All we really care about for basic functionality is the level of nesting! So maybe this isn't so bad, after all! Let's get started.

SMIE splits indentation into three phases.

The lexer

First, the lexer splits the text into "tokens" (the basic units of text that our parser works with, think of them like "words"), using whitespace to resolve ambiguities and then throwing it away (for example: the snippet function foo(a, b) {} might become "function" "foo" "(" "a" "," "b" ")" "{" "}").

We don't actually need to write this ourselves today! SMIE provides a basic lexer that will split on whitespace or when switching between letters and symbols, which is good enough for what we care about right now.
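To get a feel for what that default lexer hands out, here's a rough sketch that runs smie-default-forward-token over a string in a temporary buffer. The helper name demo-tokens is made up for this example. One assumption worth flagging: as far as I can tell, the default lexer returns an empty string when it reaches a bracket (SMIE recognizes those through the syntax table instead), so the sketch falls back to taking a single character in that case.

```elisp
(require 'smie)

(defun demo-tokens (text)
  "Return the list of tokens that SMIE's default lexer finds in TEXT.
Hypothetical helper, only meant for poking at the lexer."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let (tokens)
      ;; Skip leading whitespace, then stop once we run out of text.
      (while (progn (forward-comment (point-max))
                    (not (eobp)))
        (let ((token (smie-default-forward-token)))
          ;; Assumption: an empty token means we are sitting on a bracket
          ;; (or some other character the lexer doesn't consume), so take
          ;; that single character as its own token and move past it.
          (when (string= token "")
            (setq token (buffer-substring-no-properties (point) (1+ (point))))
            (forward-char 1))
          (push token tokens)))
      (nreverse tokens))))

(demo-tokens "function foo(a, b) {}")
;; => ("function" "foo" "(" "a" "," "b" ")" "{" "}")
```

The temporary buffer uses Emacs' standard syntax table, so a real mode with its own syntax table may split some symbols differently.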

The parser

Second, the parser matches that list of tokens against a grammar to produce an abstract syntax tree. For a "proper" parser that would mean having to write rules to recognize expressions, operators, statements, declarations, and so on. But we just care about indentation.. which means that we just care about distinguishing between "starts a level of nesting", "ends a level of nesting", and anything else!

SMIE uses a Lisp-flavoured variant of Backus-Naur Form (BNF), a declarative way to define the structure. Here, we define a structure "exp" (expression) that can contain itself within brackets².

(require 'smie)
(setq natkr/nushell-smie-grammar
      (smie-prec2->grammar
       (smie-bnf->prec2
        '((exp
           ("(" exp ")")
           ("{" exp "}")
           ("[" exp "]"))))))

The indentation rules

Finally, SMIE calls an "indentation rules" function for every line, to understand how it should modify its indentation based on the context around it.

(defun natkr/nushell-smie-rules (kind token)
  (pcase (cons kind token)
    ;; For each level of nesting, add `nushell-indent-offset'
    (`(:elem . basic) nushell-indent-offset)
    ;; For things that *aren't* nested, don't add any more indentation
    (`(:list-intro . "") 0)))

Connecting it all

With all of that in place, we just need to call smie-setup when nushell-mode is enabled, so that our grammar and rules actually get used!

;; Could also use something like Doom's (after!) to avoid loading nushell-mode before we have to
(require 'nushell-mode)
(defun natkr/nushell-mode-hook ()
  (smie-setup natkr/nushell-smie-grammar 'natkr/nushell-smie-rules))
(add-hook 'nushell-mode-hook #'natkr/nushell-mode-hook)

And there we have it! A starting point for a nushell indenter, and one that's dumb enough that it should work for most vaguely C-shaped languages.
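If you'd like to poke at the indenter without installing nushell-mode, something like the following sketch might do. Everything prefixed with demo- is a hypothetical name for this example: demo-indent-offset stands in for nushell-indent-offset, and the grammar and rules are copies of the ones above. It also assumes that SMIE wants some comment syntax to be defined in the buffer, so it sets comment-start by hand, which a real major mode would normally already do.

```elisp
(require 'smie)

;; Hypothetical stand-in for `nushell-indent-offset', so that the sketch
;; works without nushell-mode.
(defvar demo-indent-offset 4)

;; Same bracket-only grammar as in the post, under a demo name.
(defvar demo-smie-grammar
  (smie-prec2->grammar
   (smie-bnf->prec2
    '((exp
       ("(" exp ")")
       ("{" exp "}")
       ("[" exp "]"))))))

;; Same indentation rules as in the post, using the demo offset.
(defun demo-smie-rules (kind token)
  (pcase (cons kind token)
    (`(:elem . basic) demo-indent-offset)
    (`(:list-intro . "") 0)))

(defun demo-reindent (text)
  "Return TEXT after reindenting it with SMIE in a temporary buffer."
  (with-temp-buffer
    (insert text)
    ;; Assumption: SMIE's comment handling needs these to be set;
    ;; "# " matches Nushell's line comments.
    (setq-local comment-start "# ")
    (setq-local comment-end "")
    (smie-setup demo-smie-grammar #'demo-smie-rules)
    (indent-region (point-min) (point-max))
    (buffer-string)))

;; Print the reindented version of a flattened snippet.
(message "%s" (demo-reindent "def main [] {\ndostuff | filter {\n$in > 5\n}\n}\n"))
```

Since the temporary buffer stays in fundamental-mode, the result may not match what you'd get inside nushell-mode exactly, but it should be enough to see whether the nesting is picked up.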

¹ That is, indent-relative as inherited from prog-mode.

² Normally, a BNF grammar would have to specify all valid trees fully, but SMIE will ignore anything it doesn't recognize.