Regex Redux benchmark: F# vs Kotlin

OK, it’s time for another F# -> Kotlin port experiment, it’s regex-redux benchmark game :) It exposes Kotlin’s Coroutines a bit, compared to F# Async expressions. F# code (slightly adopted this code):

F#

open System.Text.RegularExpressions
open System
open System.IO
open System.Diagnostics

[<EntryPoint>]
let main args = 
    let sw = Stopwatch.StartNew()
    let regex pat = Regex (pat, RegexOptions.Compiled ||| RegexOptions.ExplicitCapture)
    let input = File.ReadAllText args.[0]
    let text = (regex ">.*\n|\n").Replace(input, "")

    async {
        let! newTextAsync =
            async {
               return
                   ["tHa[Nt]", "<4>"
                    "aND|caN|Ha[DS]|WaS", "<3>"
                    "a[NSt]|BY", "<2>"
                    "<[^>]*>", "|"
                    "\\|[^|][^|]*\\|" , "-"]
                   |> List.fold (fun s (code, alt) -> 
                        (regex code).Replace(s, alt)) text
            } |> Async.StartChild
        
        let! results =
            [| "agggtaaa|tttaccct"
               "[cgt]gggtaaa|tttaccc[acg]"
               "a[act]ggtaaa|tttacc[agt]t"
               "ag[act]gtaaa|tttac[agt]ct"
               "agg[act]taaa|ttta[agt]cct"
               "aggg[acg]aaa|ttt[cgt]ccct"
               "agggt[cgt]aa|tt[acg]accct"
               "agggta[cgt]a|t[acg]taccct"
               "agggtaa[cgt]|[acg]ttaccct" |]
            |> Array.map (fun pat -> async { return pat, ((regex pat).Matches text).Count })
            |> Async.Parallel
        
        for pat, count in results do printfn "%s %i" pat count
        
        let! newText = newTextAsync
        printfn "\n%i\n%i\n%i\n%d ms" input.Length text.Length newText.Length sw.ElapsedMilliseconds
    } 
    |> Async.RunSynchronously
    0

It starts a number of async computations that matches a large text with regular expressions, all in parallel, then waits them to finish. Nothing special, just ordinary F# async code.

Kotlin

import kotlinx.coroutines.experimental.*
import java.io.File
import java.util.regex.Pattern

fun main(args: Array<String>) = runBlocking {
    val start = System.nanoTime()
    val input = File(args.firstOrNull() ?: "d:\\input5000000.txt").readText()
    val text = ">.*\n|\n".toRegex().replace(input, "")

    val newTextAsync = async(CommonPool) {
        listOf(
            "tHa[Nt]" to "<4>",
            "aND|caN|Ha[DS]|WaS" to "<3>",
            "a[NSt]|BY" to "<2>",
            "<[^>]*>" to "|",
            "\\|[^|][^|]*\\|" to "-"
        ).fold(text) { s, (code, alt) -> code.toRegex().replace(s, alt) }
    }

    listOf("agggtaaa|tttaccct",
        "[cgt]gggtaaa|tttaccc[acg]",
        "a[act]ggtaaa|tttacc[agt]t",
        "ag[act]gtaaa|tttac[agt]ct",
        "agg[act]taaa|ttta[agt]cct",
        "aggg[acg]aaa|ttt[cgt]ccct",
        "agggt[cgt]aa|tt[acg]accct",
        "agggta[cgt]a|t[acg]taccct",
        "agggtaa[cgt]|[acg]ttaccct")
        .map { async(CommonPool) { it to Pattern.compile(it).splitAsStream(text).count() - 1 }}
        .map { it.await() }
        .map { (pat, count) -> println("$pat, $count") }

    val newText = newTextAsync.await()

    println(
        """
        ${input.length}
        ${text.length}
        ${newText.length}
        elapsed: ${(System.nanoTime().toDouble() - start.toDouble()) / 1E6} ms
        """.trimIndent())
}

The code is very, very similar to the F# one thanks to the new Kotlin feature - coroutines. I don’t have a solid understanding of then to decide if they are better or worse than F# computation expressions, but I can make a several remarks already:

Coroutins are a general approach to generate continuation-passing style code out from regular code, like F# does for computation expressions (Kotlin actually compiles to FSM, I’m not sure about F#).
Both allow to implement any kind of such expressions, like async computations, lazy sequence generators, etc.
The main visual difference is that continuation points are not immediately visible in Kotlin source code. Compare F#
```
async {
  ...
  let! newText = newTextAsync
  ...
}
```
to Kotlin
```
runBlocking {
 ...
 val newText = newTextAsync.await()
 ...
}
```
However, Intellij IDEA shows all such points in a gutter, like this:

which removes the problem (if it’s a problem at all. I’m not sure if let!, do!, yeild! and return! keywords make F# code more or less readable. I mean Kotlin coroutine code looks exactly the same as ordinary one and it may make it more approachable for beginners).

Another huge difference is that so-called suspended functions are allowed to be used at different places than F#’s async functions:

suspended function cannot be called outside a coroutine builder lambda or another suspended function
suspended function can be part of any expression, not only RHS of let!, return!, yeild! in F#.

To illustrate the last point, we could inline val newText = newTextAsync.await() variable right into the printf:

println(
        """
        ${input.length}
        ${text.length}
        ${newTextAsync.await().length}
        elapsed: ${(System.nanoTime().toDouble() - start.toDouble()) / 1E6} ms
        """.trimIndent())

This is impossible to do in F#.

As for performance of this benchmark, Kotlin code is faster (on a file generated with Fasta benchmark for 5M entries): 5.6 seconds vs 7.6 seconds.

fsharpblog

Regex Redux benchmark: F# vs Kotlin

F#

Kotlin