Exploring Performance of Various List Iteration Options in CFML

A fellow developer came to me complaining of slow reload times on our CMS. One of the many things our CMS does is handle internationalization so that we can offer websites that use core functionality that can be translated in multiple languages. The process goes essentially as such:

Find all the source files containing translated text
Store those in a structure in memory to weed out duplicates
Write that structure to a single JSON file so we don't have to gather & read dozens of files in production

The requirements for this process were that the generation of the aggregated file can only happen in development and that the generated file must be easily diff-able by Git. Whether you agree with the latter requirement is irrelevant, but my colleagues insisted, and seeing it work this way made me a convert too.

In order to satisfy the second requirement of an easily diff-able file we first sort the structure keys and then iterate over them so the data is written alphabetically. This helps keep the Git diff easy to understand. However, it turns out that iterating over a list isn't always as cut and dried as you'd hope when it comes to looping over lists with many thousands of items.

Before you lambast me and say "use serializeJSON, idiot!" - yes, I could have. However, when we are working with files like this that are composed of other contents, it is impossible to get a good diff on the generated file so we chose to "pretty print" the structure data to the file. Rake me over the coals if you wish, but please do keep on reading for the good stuff.

Final aside, I promise - the numbers you see below are only for the provided examples. In my practical use, I started at around 17 seconds for the operation to complete but was able to drive that down to under 1 or 2 seconds.

Alright, back to it. The following snippets of code are all fed the same list of 10,000 items. My practical use had both the keys and the values are strings, not that it's likely to make a difference. All tests were executed using ColdFusion 2021.

Attempt #1 - basic index-based list iteration

function indexBasedListIteration(required struct items) {
    local.start = now().getTime();

    local.allContent = "";
    local.keys = arguments.items.keyList();

    for ( local.index = 1; local.index <= local.keys.listLen(); local.index++ ) {
        local.key = listGetAt( local.keys, local.index );
        local.value = arguments.items[ local.key ];
        local.line = "#local.key#: #local.value#";

        if ( local.index < local.keys.listLen() ) {
            local.line &= ",";
        }

        local.line &= chr(10);
        local.allContent &= local.line;
    }

    local.end = now().getTime();

    writeOutput( "function took #local.end - local.start#ms to run" );
}

In my tests, I gave this function a structure containing 10,000 items and it took roughly 1300-1400ms to complete, on average. So, that's our baseline. The code above looks like it should be pretty fast, right? What could possibly drive up the execution time of this function? If you said "the check for the index against the total number of keys", give yourself a pat on the back.

Attempt #2 - eliminating one logical check

Now that we know what the bottleneck is on that function, how can we eliminate it? In the end, I want to write a JSON file from all of my data, and I do need it to be valid, so I can't have a trailing comma. Simply leaving the check out isn't going to work. What if we eliminate the index check and use regular expressions to chop off the last comma?

function indexBasedListIterationWithCommaStrip(required struct items) {
    local.start = now().getTime();

    local.allContent = "";
    local.keys = arguments.items.keyList();

    for ( local.index = 1; local.index <= local.keys.listLen(); local.index++ ) {
        local.key = listGetAt( local.keys, local.index );
        local.value = arguments.items[ local.key ];
        local.line = "#local.key#: #local.value#,#chr(10)#";
        local.allContent &= local.line;
    }

    local.allContent = reReplace( local.allContent, ",#chr(10)#", "" );

    local.end = now().getTime();

    writeOutput( "indexBasedListIterationWithCommaStrip took #local.end - local.start#ms<br/>" );
}

Given the same structure with 10,000 items this function takes around 800-900ms to execute. By not doing the index check, we've shaved off around half a second. Not bad, but... still not great. I think we can do better...

Attempt #3 - switching to for...in syntax

At this point I was wondering if index-based loops were just slower than for...in loops. There's only one way to find out, right? This example builds off the last attempt but substitutes the list iteration type.

function forInListIterationWithCommaStrip(required struct items) {
    local.start = now().getTime();

    local.allContent = "";
    local.keys = arguments.items.keyList();

    for ( local.key in local.keys ) {
        local.value = arguments.items[ local.key ];
        local.line = "#local.key#: #local.value#,#chr(10)#";
        local.allContent &= local.line;
    }

    local.allContent = reReplace( local.allContent, ",#chr(10)#", "" );

    local.end = now().getTime();

    writeOutput( "forInListIterationWithCommaStrip took #local.end - local.start#ms<br/>" );
}

This version of the function took roughly 140-175ms to complete, on average. Remember that our naive approach with index-based iteration started us at 1300-1400ms, so we've saved well over a second at this point. If you want to look at it another way, the for...in syntax is more than 5x faster (~800ms to ~140ms) than an index-based loop that does the same thing. It's not fair to compare to attempt 1 since that had an extra logical step. The fairest comparison is attempt 2 to 3.

Summary of findings

It's pretty clear that if you can get away with it, you should use the for...in syntax for looping over lists. I suspect that, to some extent, not having to use listGetAt is a big time saver.

Additional optimization

If you're really a stickler for performance, you can turn this thing to 11 and use a Java StringBuffer instead of string concatenation:

function stringBufferWithCommaStripListLoop(required struct items) {
    local.start = now().getTime();

    local.stringBuffer = createObject("java", "java.lang.StringBuffer").init();
    local.keys = arguments.items.keyList();

    for ( local.key in local.keys ) {
        local.value = arguments.items[ local.key ];
        local.line = "#local.key#: #local.value#,#chr(10)#";
        local.stringBuffer.append(local.line)
    }

    local.allContent = reReplace( local.stringBuffer.toString(), ",#chr(10)#", "" );
    local.end = now().getTime();

    writeOutput( "stringBufferWithCommaStripListLoop took #local.end - local.start#ms<br/>" );
}

In my tests, using the StringBuffer cuts the execution time in half when compared to attempt 3 using string concatenation. I don't want to put too much focus on this since it's not really a list iteration optimization, but I'd be remiss not to point it out.