In this post, I’ll horrify you by coming up with some terrible mathematical ideas and some even more terrible names for them.
The edit distance between two strings is the minimum number of operations needed to transform one into the other. The particular example of this that you probably saw in your undergraduate algorithms course is the Levenshtein distance, which counts “operations” as single-character insertions, deletions, and substitutions. For example, `eric` and `epic` are at distance 1, because a single substitution transforms one into the other.
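For reference, here is a minimal sketch of the textbook dynamic-programming computation of the Levenshtein distance. The function name `levenshtein` is just a label chosen for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    # prev[j] is the distance between the previous prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if the characters match
            ))
        prev = curr
    return prev[-1]


print(levenshtein("eric", "epic"))  # 1
```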
But wait, you say. Why do we need to constrain our definition of the edit distance to use only single-character insertions, deletions, and substitutions? It’s 2022, not 1965; editors have advanced greatly. Why don’t we make use of all the wonderful features that Vim, the most awesome of modern editors, gives us? I’m glad you asked, dear reader.
Let’s define the directed Vim distance from string $s$ to $t$ as the minimum number of keystrokes needed in Vim to transform a buffer containing exactly $s$ into one containing exactly $t$. Now, I’ll concede that this isn’t exactly a formal definition, but I’ll leave that as an implementation detail. Let’s just say that you have to do “reasonable” things in Vim: no shelling out, for instance.
As a brief aside, you may be thinking: Eric, this is an absolutely ingenious concept, but the name is no good. Every modern system needs a catchy name; no one wants to say “directed Vim distance” all the time. I hear you. Let’s call the directed Vim distance the divide for short. My research advisor really likes good mathematical notation. To absolutely horrify him, let’s denote the directed Vim distance (divide) from $s$ to $t$ as $t \div s$.
Okay, with that out of the way, let’s play around with some examples. Now clearly $\texttt{epic} \div \texttt{eric} \le 3$, since to transform `eric` into `epic` in Vim, all you need to do is type `lrp`.
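To make the keystroke counting concrete, here is a toy sketch that replays `lrp` on a one-line buffer. It models only two normal-mode keys, `l` (move right) and `r` followed by a character (replace under the cursor). The function `run_normal_mode` is a made-up helper for this example and is nowhere near a real Vim emulator.

```python
def run_normal_mode(buffer: str, keys: str) -> str:
    """Replay a tiny subset of Vim normal-mode keys on a single-line buffer.

    Assumption: only `l` and `r<char>` are modelled; anything else is rejected.
    """
    chars = list(buffer)
    cursor = 0  # Vim starts with the cursor on the first character
    i = 0
    while i < len(keys):
        key = keys[i]
        if key == "l":
            # Move right, but never past the last character.
            cursor = min(cursor + 1, max(len(chars) - 1, 0))
            i += 1
        elif key == "r" and i + 1 < len(keys) and chars:
            # Replace the character under the cursor with the next key.
            chars[cursor] = keys[i + 1]
            i += 2
        else:
            raise ValueError(f"unsupported key: {key!r}")
    return "".join(chars)


keys = "lrp"
print(run_normal_mode("eric", keys), len(keys))  # epic 3
```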
What about more complex transformations? Vim is actually quite powerful. For example, to get from `urdqwcxokfazshgvmjbylptien` to `abcdefghijklmnopqrstuvwxyz`, all you need to do is enter the super-intuitive sequence of keystrokes `:s/\v(\w)/\1\r/g<enter>:%sort<enter>:%s/\n//g<enter>`. So the divide is at most 34.[1]
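If you want to double-check the count of 34, here is a small sketch. It assumes each `<enter>` token is one keystroke and every other character is one keystroke (so shifted characters such as `:` count once). The `simulate` function is only a Python imitation of what the three commands do (one character per line, sort the lines, join them again). It does not run Vim.

```python
import re


def count_keystrokes(keys: str) -> int:
    """Count keystrokes, treating each <name> token (e.g. <enter>) as one key."""
    return len(re.findall(r"<[a-z]+>|.", keys))


def simulate(line: str) -> str:
    """Imitate the three commands on a one-line buffer (assumed behaviour)."""
    lines = list(line) + [""]  # one character per line, plus a trailing empty line
    lines.sort()               # stand-in for :%sort
    return "".join(lines)      # stand-in for deleting every newline


commands = r":s/\v(\w)/\1\r/g<enter>:%sort<enter>:%s/\n//g<enter>"
print(count_keystrokes(commands))              # 34
print(simulate("urdqwcxokfazshgvmjbylptien"))  # abcdefghijklmnopqrstuvwxyz
```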
Let’s take a step back for a moment and ask a few obvious questions about our shiny new distance function, such as: is this really a distance function? Usually when you’re asked this question in class, you need to go about proving that the triangle inequality holds. Luckily for us, we don’t need to bother with triangles, because there’s an even more obvious reason why the divide is not a metric, viz. it’s not symmetric. In particular, $\varepsilon \div s \le 2$ for any string $s$, because to get from $s$ to the empty string $\varepsilon$, all you need to type in Vim is `dG`. However, it is clearly not the case that $s \div \varepsilon \le 2$ for all $s$.
This seems a little bad for us, but don’t worry; there’s still some utility in the quantity $s \div \varepsilon$. This gives us a natural way to define the constructive Vim complexity of a string $s$: it’s the number of keystrokes needed to construct $s$ starting from the empty buffer in Vim. Let’s horrify my advisor even more by inventing an even worse name for this: we’ll abbreviate the “constructive Vim complexity” as the convexity.[2] For example, here’s a nice theorem:
Theorem: Any string $s$ has convexity at most $|s| + 1$.
The proof is quite simple: just type `i` and then your string to insert it into the buffer.
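The proof is short enough to write down as code. This sketch assumes every character of the string costs exactly one keystroke to type in insert mode. The names `insert_mode_keys` and `convexity_upper_bound` are invented for this example.

```python
def insert_mode_keys(s: str) -> str:
    """Keystrokes from the proof: `i` to enter insert mode, then the string itself."""
    return "i" + s


def convexity_upper_bound(s: str) -> int:
    """Length of the proof's keystroke sequence, i.e. len(s) + 1.

    Assumption: one keystroke per character of s.
    """
    return len(insert_mode_keys(s))


print(convexity_upper_bound("epic"))  # 5
```

This only gives an upper bound; it says nothing about whether a shorter sequence of keystrokes exists.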
If you’ve studied theoretical computer science before, perhaps this notion of convexity will remind you of the Kolmogorov complexity. For those who haven’t heard of this before: intuitively, the Kolmogorov complexity of a string is a measure of how “random” it is, based on how difficult it is to describe. For example, the string `aaaaaaaaaaaa` is quite easy to describe: it’s just twelve `a`s. On the other hand, `9fa72u34j5nl` is much harder to describe, despite being the same length. The formalization of this notion is a bit subtle, but hopefully the intuition is clear.
Anyway, I bring this up because there is an analogous theorem in Real Computer Science: for any string $s$, the Kolmogorov complexity $K(s)$ is bounded by $|s| + O(1)$.
This isn’t the only thing we can steal from Real Computer Science. There is another theorem stating that there exist “incompressible” (or “Kolmogorov random”) strings, i.e. strings $s$ with $K(s) \ge |s|$. We can define an analogous notion for convexity: call a string $s$ convex if its convexity is at least $|s|$, i.e. $s \div \varepsilon \ge |s|$.
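Do such strings exist? Sure they do. Assuming Vim is deterministic, every sequence of fewer than $n$ keystrokes maps to exactly one string. If we pretend the keyboard has $k \ge 2$ keys and that strings are drawn from an alphabet of the same size, there are at most $\sum_{i=0}^{n-1} k^i = \frac{k^n - 1}{k - 1} < k^n$ strings with convexity less than $n$. However, there are $k^n$ strings of length $n$, so at least one string of length $n$ must be convex.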
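The counting argument can be sanity-checked numerically. This sketch assumes, purely for illustration, a keyboard with `k` keys and strings over an alphabet of the same size `k`; both the parameter `k` and the function names are choices made here, not part of any formal definition above.

```python
def sequences_shorter_than(n: int, k: int) -> int:
    """Number of keystroke sequences of length 0, 1, ..., n - 1 over k keys."""
    return sum(k**i for i in range(n))


def strings_of_length(n: int, k: int) -> int:
    """Number of strings of length exactly n over a k-symbol alphabet."""
    return k**n


# Pigeonhole: there are always fewer short keystroke sequences than
# strings of length n, so some string of length n needs at least n keystrokes.
for k in (2, 3, 26):
    for n in range(1, 6):
        assert sequences_shorter_than(n, k) < strings_of_length(n, k)

print(sequences_shorter_than(4, 2), strings_of_length(4, 2))  # 15 16
```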
Okay, frankly it’s quite late and I need to go to work tomorrow, so I’ll cut off this post here. Maybe I’ll explore this idea more in the future.[3]