Regex Redux benchmark: F# vs Kotlin

OK, it’s time for another F# -> Kotlin port experiment, it’s regex-redux benchmark game :) It exposes Kotlin’s Coroutines a bit, compared to F# Async expressions. F# code (slightly adopted this code):


open System.Text.RegularExpressions
open System
open System.IO
open System.Diagnostics

let main args = 
    let sw = Stopwatch.StartNew()
    let regex pat = Regex (pat, RegexOptions.Compiled ||| RegexOptions.ExplicitCapture)
    let input = File.ReadAllText args.[0]
    let text = (regex ">.*\n|\n").Replace(input, "")

    async {
        let! newTextAsync =
            async {
                   ["tHa[Nt]", "<4>"
                    "aND|caN|Ha[DS]|WaS", "<3>"
                    "a[NSt]|BY", "<2>"
                    "<[^>]*>", "|"
                    "\\|[^|][^|]*\\|" , "-"]
                   |> List.fold (fun s (code, alt) -> 
                        (regex code).Replace(s, alt)) text
            } |> Async.StartChild
        let! results =
            [| "agggtaaa|tttaccct"
               "agggtaa[cgt]|[acg]ttaccct" |]
            |> (fun pat -> async { return pat, ((regex pat).Matches text).Count })
            |> Async.Parallel
        for pat, count in results do printfn "%s %i" pat count
        let! newText = newTextAsync
        printfn "\n%i\n%i\n%i\n%d ms" input.Length text.Length newText.Length sw.ElapsedMilliseconds
    |> Async.RunSynchronously

It starts a number of async computations that matches a large text with regular expressions, all in parallel, then waits them to finish. Nothing special, just ordinary F# async code.


import kotlinx.coroutines.experimental.*
import java.util.regex.Pattern

fun main(args: Array<String>) = runBlocking {
    val start = System.nanoTime()
    val input = File(args.firstOrNull() ?: "d:\\input5000000.txt").readText()
    val text = ">.*\n|\n".toRegex().replace(input, "")

    val newTextAsync = async(CommonPool) {
            "tHa[Nt]" to "<4>",
            "aND|caN|Ha[DS]|WaS" to "<3>",
            "a[NSt]|BY" to "<2>",
            "<[^>]*>" to "|",
            "\\|[^|][^|]*\\|" to "-"
        ).fold(text) { s, (code, alt) -> code.toRegex().replace(s, alt) }

        .map { async(CommonPool) { it to Pattern.compile(it).splitAsStream(text).count() - 1 }}
        .map { it.await() }
        .map { (pat, count) -> println("$pat, $count") }

    val newText = newTextAsync.await()

        elapsed: ${(System.nanoTime().toDouble() - start.toDouble()) / 1E6} ms

The code is very, very similar to the F# one thanks to the new Kotlin feature - coroutines. I don’t have a solid understanding of then to decide if they are better or worse than F# computation expressions, but I can make a several remarks already:


which removes the problem (if it’s a problem at all. I’m not sure if let!, do!, yeild! and return! keywords make F# code more or less readable. I mean Kotlin coroutine code looks exactly the same as ordinary one and it may make it more approachable for beginners).

Another huge difference is that so-called suspended functions are allowed to be used at different places than F#’s async functions:

To illustrate the last point, we could inline val newText = newTextAsync.await() variable right into the printf:

        elapsed: ${(System.nanoTime().toDouble() - start.toDouble()) / 1E6} ms

This is impossible to do in F#.

As for performance of this benchmark, Kotlin code is faster (on a file generated with Fasta benchmark for 5M entries): 5.6 seconds vs 7.6 seconds.