Vim Complexity

In this post, I’ll horrify you by coming up with some terrible mathematical ideas and some even more terrible names for them.

The edit distance between two strings is the minimum number of operations needed to transform one into the other. The particular example of this that you probably saw in your undergraduate algorithms course is the Levenshtein distance, which counts “operations” as single-character insertions, deletions, and substitutions. For example, eric and epic are at distance 1, because a single substitution transforms one into the other.

But wait, you say. Why do we need to constrain our definition of the edit distance to use only single-character insertions, deletions, and substitutions? It’s 2022, not 1965; editors have advanced greatly. Why don’t we make use all the wonderful features that Vim, the most awesome of modern editors, gives us? I’m glad you asked, dear reader.

Directed Vim Distance

Let’s define the directed Vim distance from string xx to yy as the minimum number of keystrokes needed in Vim to transform a buffer containing exactly xx into one containing exactly yy. Now, I’ll concede that this isn’t exactly a formal definition, but I’ll leave that as an implementation detail. Let’s just say that you have to do “reasonable” things in Vim—no shelling out, for instance.

As a brief aside, you may be thinking: Eric, this is an absolutely ingenious concept, but the name is no good. Every modern system needs a catchy name; no one wants to say “directed Vim distance” all the time. I hear you. Let’s call the directed Vim distance the divide for short. My research advisor really likes good mathematical notation. To absolutely horrify him, let’s denote the directed Vim distance (divide) from xx to yy as y/xy/x.

Okay, with that out of the way, let’s play around with some examples. Now clearly epic//eric 3\le 3, since to transform eric into epic in Vim, all you need to do is type lrp.

What about more complex transformations? Vim is actually quite powerful. For example, to get from urdqwcxokfazshgvmjbylptien to abcdefghijklmnopqrstuvwxyz, all you need to do is enter the super-intuitive sequence of keystrokes :s/\v(\w)/\1\r/g<enter>:%sort<enter>:%s/\n//g<enter>. So the divide is at most 34.1

Constructive Vim Complexity

Let’s take a step back for a moment and ask a few obvious questions about our shiny new distance function, such as: is this really a distance function? Usually when you’re asked this question in class, you need to go about proving that the triangle inequality holds. Luckily for us, we don’t need to bother with triangles, because there’s an even more obvious reason why the divide is not a metric, viz. it’s not symmetric. In particular, ε/x2\varepsilon/x \le 2 for any string xx, because to get from xx to the empty string ε\varepsilon, all you need to type in Vim is dG. However, it is clearly not the case that x/ε2x/\varepsilon \le 2 for all xx.

This seems a little bad for us, but don’t worry; there’s still some utility in the quantity x/εx/\varepsilon. This gives us a natural way to define the constructive Vim complexity of a string xx: it’s the number of keystrokes needed to construct xx starting from the empty buffer in Vim. Let’s horrify my advisor even more by inventing an even worse name for this: we’ll abbreviate the “constructive Vim complexity” as the convexity.2 For example, here’s a nice theorem:

Theorem: Any string xx has convexity at most x+1|x| + 1.

The proof is quite simple: just type i and then your string to insert it into the buffer.

If you’ve studied theoretical computer science before, perhaps this notion of convexity will remind you of the Kolmogorov complexity. For those who haven’t heard of this before: intuitively, the Kolmogorov complexity of a string is a measure of how “random” it is, based on how difficult it is to describe. For example, the string aaaaaaaaaaaa is quite easy to describe: it’s just twelve as. On the other hand, 9fa72u34j5nl is much harder to describe, despite being the same length. The formalization of this notion is a bit subtle, but hopefully the intuition is clear.

Anyway, I bring this up because there is an analogous theorem in Real Computer Science: for any string xx, the Kolmogorov complexity is bounded by K(x)x+O(1)K(x) \le |x| + O(1).

This isn’t the only thing we can steal from Real Computer Science. There is another theorem stating that there exist “incompressible” (or “Kolmogorov random”) strings, i.e. strings with K(x)xK(x) \ge |x|. We can define an analogous notion for convexity: call a string convex if x/εxx/\varepsilon \ge |x|.

Do such strings exist? Sure they do. Assuming Vim is deterministic, every sequence of less than nn keystrokes maps to exactly one string. So there are at most 2n12^n-1 strings with convexity less than nn. However, there are 2n2^n strings of length nn.

Okay, frankly it’s quite late and I need to go to work tomorrow, so I’ll cut off this post here. Maybe I’ll explore this idea more in the future.3

  1. Okay, in this case, you could have done better by just typing in ccabcdefghijklmnopqrstuvwxyz↩︎

  2. This might be the worst mathematical name that I’ve ever managed to invent.↩︎

  3. Maybe it’s coming soon to a SIGBOVIK near you!↩︎