Sorting variable length arrays in Minecraft

Minecraft has a way to write 'code' in-game using their command syntax:


/say Hello World!
/summon sheep
  

It's an imperative language where you literally tell the game what command to run right now. You can build distributable functionality with .mcfunction files and datapacks:


test.mcfunction:
say Hello
say World!
  

So... what can we do with Minecraft commands?

Can we, for example, perform a sort on a variable length data structure?

The first step is to find a suitable variable length array to sort. Ideally, we want this array to be:

  • able to contain any number or type of elements
  • nestable (you can do arrays of arrays)
  • iterable (we need some way to loop over it)

Minecraft does not have variables

So, uh, yeah. Minecraft doesn't actually have variables at all, at least not the kind that can be any type like in other programming languages. What it does have is several things that feel like variables stored in different methods around the world:

Entities (like sheep) can be iterated over. This code will print out "Hello world" for each sheep in the world:


execute as @e[type=sheep] run say Hello world
  

Entities in the world also have "data" associated with them. This data is represented as a JSON object:


data get entity 6f9eaf9d-1585-4314-84ab-abbfcd0f8b02
{Brain: {memories: {}}, HurtByTimestamp: 0, Attributes:[], ...}
  

But summoning and dealing with sheep can be expensive because the game has to handle AI and motion and stuff for each entity. Let's take a closer look at the "data" that the sheep has:


{..., Attributes:[], ...}
  

Is that... is that a json array? Yep! It turns out, there are variable length arrays already in each entity's data!

The data command also has another usage:


data get storage NAME
data modify storage NAME append|insert|merge|prepend|set from OTHER_NAME
data modify storage NAME append|insert|merge|prepend|set value [...]
  

Now we can read, write, and copy around variable length arrays. We even got appending items to the end, and prepending items to the start!

For example, we can initialize an array to sort as:


data modify storage sort input set value [2,7,3,9,5,1]
  

IO is not a thing

.mcfunctions are only a collection of statements to run. Unlike in other programming languages, minecraft functions don't accept any input, and they don't return any output. They can only modify the state of the world.

We get around this by designating storage as "input" and "output" storage. We can just do copy operations into that temporary location, and out of the temporary output location after running a function:


data modify storage input data set from storage INPUT TO FUNCTION
function mergesort
data modify storage OUTPUT FROM FUNCTION set from storage output data
  

The syntax is hilariously verbose to me.

Iterating

So we have a variable length storage and functions with IO - how can we loop over it if we don't have any loop commands?

Always remember: where there's functions, there's loops - in the form of recursion!

We'll call a function recursively on our array, and index it using... wait... we can't do indexing!

What I'd like to do:


data get storage input data[i]
scoreboard players i add 1
  

But that doesn't work because the indexing for what to "get" from data has to have a hard-coded number:


data get storage input data[0]
data get storage input data[1]
...
  

Obviously, this doesn't scale to arrays of unknown length. Is there another way? There's always another way.

Peeling

First, we'll copy the input array into a temporary that we can safely modify


data modify storage temp data set from storage input data
  

Next, we'll write a function that peels one item off, does something, and then calls itself until the array is empty


iterate.mcfunction:
# Do something with temp data[0]
data remove storage temp data[0]
execute if data storage temp data[0] run function iterate
  

That's it! Here's python's len() function to get the length of an array using this method:


len.mcfunction:
data modify storage temp data set from storage input data
scoreboard players set out len 0
execute if data storage temp data[0] run function len_iter

len_iter.mcfunction:
scoreboard players add out len 1
data remove storage temp data[0]
execute if data storage temp data[0] run function len_iter
  

Sorting

Now that we have our tools, which sorting method should we use? Let's google "best sorting method arrays".

...But we're not actually dealing with an array in the traditional sense.

What I mean is, arrays have constant time random access. This means that it takes the same amount of time to lookup every item in the array at a given size. However, our version needs to be peeled. We need to call the function N times to get the N'th item. This is known as linear lookup time, and means that a different sorting method might perform better.

The linear lookup makes me think of another data structure: linked lists. The supposed best sorting method for linked lists is merge sort, so let's do that.

We can start with the base case of an input list of 0 or 1 items:


# If there are 0 or 1 elements in the list, copy input to output
execute unless data storage input data[1] run data modify storage output data set from storage input data
# Otherwise, clear output
execute if data storage input data[1] run data modify storage output data set value []
  

Note: unless is equivalent to if not

If we do have two or more items, let's try to sort them. First, we call our recursive function:


execute if data storage input data[1] run function mergesort_iter
  

So how do we merge sort recursively? We'll:

  1. split the input array into two halves
  2. merge sort each half
  3. merge the sorted arrays

We can split arrays by every other element easily:


split_every_other.mcfunction:

execute if data storage input data[0] run data modify storage output data append from storage input data[0]
execute if data storage input data[0] run data remove storage input data[0]
execute if data storage input data[0] run data modify storage output_1 data append from storage input data[0]
execute if data storage input data[0] run data remove storage input data[0]
execute if data storage input data[0] run function split_every_other
  

The second task, merge sorting each half, recursively calls our mergesort function until the array to sort is only one or zero elements, and then we just return that. After we return, we put the sorted array onto a buffer of arrays to sort. This prevents us overwriting other sorted arrays when we return:


# to_sort = [[], [], ...]
data modify storage input data set from storage to_sort data[0]
data remove storage to_sort data[0]
# to_merge = [[], [], ...]
execute unless data storage input data[1] run data modify storage to_merge data prepend from storage input data
execute if data storage input data[1] run function mergesort_iter
  

We copy this block of code twice, once for each half.

The third task, merging sorted arrays, is where the magic happens.

First, we pop off two arrays to merge:


data modify storage array data set from storage to_merge data[0]
data remove storage to_merge data[0]
data modify storage array_1 data set from storage to_merge data[0]
data remove storage to_merge data[0]
  

Next, we call merge_sorted_arrays, a comparison function that takes in two arrays, iterates through them and moves the smaller element to the end of the output array:


merge_sorted_arrays.mcfunction:

# if (len(array) >= 1) && (len(array_1) >= 1)
execute if data storage input data[0] if data storage input_1 data[0] run function merge_sorted_arrays_iter
  

merge_sorted_arrays_iter.mcfunction:

# if only (len(array) >= 1)
execute if data storage input data[0] unless data storage input_1 data[0] run data modify storage output data append from storage input data[0]
execute if data storage input data[0] unless data storage input_1 data[0] run data remove storage input data[0]

# if only (len(array_1) >= 1)
execute unless data storage input data[0] if data storage input_1 data[0] run data modify storage output data append from storage input_1 data[0]
execute unless data storage input data[0] if data storage input_1 data[0] run data remove storage input_1 data[0]

# if (len(array) >= 1) and (len(array_1) >= 1)
execute if data storage input data[0] if data storage input_1 data[0] run function merge_sorted_arrays_comp

# if (len(array) >= 1) or (len(array_1) >= 1)
scoreboard players set or_variable math_private 0
execute if data storage input data[0] run scoreboard players add or_variable math_private 1
execute if data storage input_1 data[0] run scoreboard players add or_variable math_private 1
execute if score or_variable math_private matches 1.. run function merge_sorted_arrays_iter
  

merge_sorted_arrays_comp.mcfunction:

# Get comparison variables as integers
execute store result score .a math_private run data get storage input data[0] 1000
execute store result score .b math_private run data get storage input_1 data[0] 1000

# if a >= b, move input data[0] to the end of output data
execute if score .a math_private <= .b math_private run data modify storage output data append from storage input data[0]
execute if score .a math_private <= .b math_private run data remove storage input data[0]
# else, move input data[0] to the end of output data
execute if score .a math_private > .b math_private run data modify storage output data append from storage input_1 data[0]
execute if score .a math_private > .b math_private run data remove storage input_1 data[0]
  

It works!

merge sort working

Let's take a look at what the to_merge buffer is during execution:

mergesort printouts

The algorithm seems to do:

  1. split [3, 6, 4, 7, 8, 2, 7, 5, 1] -> [3, 4, 8, 7, 1] and [6, 7, 2, 5]
  2. split [3, 4, 8, 7, 1] -> [3, 8, 1] and [4, 7]
  3. split [3, 8, 1] -> [3, 1] and [8]
  4. split [3, 1] -> [3] and [1]
  5. merge [3] and [1] -> [1, 3]
  6. merge [8] and [1, 3] -> [1, 3, 8]
  7. split [4, 7] -> [4] and [7]
  8. merge [4] and [7] -> [4, 7]
  9. merge [4, 7] and [1, 3, 8] -> [1, 3, 4, 7, 8]
  10. split [6, 7, 2, 5] -> [6, 2] and [7, 5]
  11. split [6, 2] -> [6] and [2]
  12. merge [6] and [2] -> [2, 6]
  13. split [7, 5] -> [7] and [5]
  14. merge [7] and [5] -> [5, 7]
  15. merge [5, 7] and [2, 6] -> [2, 5, 6, 7]
  16. merge [2, 5, 6, 7] and [1, 3, 4, 7, 8] -> [1, 2, 3, 4, 5, 6, 7, 7, 8]

While I don't see Minecraft surpassing Python or Javascript or C++, I'm curious what other cool things we can do with variable length arrays, iteration, and functions with IO...