Go allows to run very lightweight goroutines and throw them away (for garbage collector) after they are no longer needed. You can execute any code inside them and this caode can allocate as much as needed. Both on stack and on heap. If goroutine stack is too small it grows. This grow is not free. Suppose you have a widely used function that needs to allocate a bit of memory on stack. Depending on whether there is enough of memory on stack or not it will take different amount of time, since in some cases it will include time to grow the stack. It happens that a really common function time.Now() is allocating. So this effect can be demoed with this function. Consider the following program.
package main
import (
"fmt"
"time"
)
func main() {
for depth := 0; depth < 1000; depth++ {
var res time.Duration
for i := 0; i < 10_000; i++ {
done := make(chan struct{})
go func() {
res += twoNowsDiffInDepth(depth)
close(done)
}()
<-done
}
fmt.Printf("%03d - %d\n", depth, res)
}
}
func twoNowsDiffInDepth(depth int) time.Duration {
if depth == 0 {
return nowDiff(time.Now())
}
return twoNowsDiffInDepth(depth - 1)
}
func nowDiff(t time.Time) time.Duration {
return time.Now().Sub(t)
}
The output has 1000 lines, obviously. Each line has total number of nanoseconds to call time.Now() on stacks depth from 1 to 1000 in a fresh goroutine 10 thousand times. Usually it takes 400000-600000 nanoseconds. However on some stack depth the result is different. Here is some lines from the output.
000 - 838890
001 - 796577
002 - 691937
003 - 578079
004 - 531705
005 - 499433
...
017 - 405944
018 - 9895890
019 - 9742951
020 - 594866
...
060 - 458445
061 - 20673514
062 - 20978750
063 - 617336
...
145 - 470577
146 - 42773890
147 - 42822470
148 - 462516
...
316 - 480823
317 - 87866537
318 - 89459648
319 - 438552
...
657 - 504671
658 - 177226079
659 - 177820189
660 - 484756
...
997 - 564084
998 - 556363
999 - 524782
It is clear that at some stack depths we run out of stack memory and need to create new stack and copy old stack to prefix of the new one. This operation takes time and the larger the stack the larger the cost. I found curious that without growing the stack the cost of calling time.Now() on different stack depths is constant. Sure there is price for unwinding stack if you want to return the value. Consider the following benchmark.
package main
import (
"strconv"
"testing"
"time"
)
var Blackhole time.Time
func BenchmarkNow(b *testing.B) {
for depth := 0; depth < 100; depth++ {
b.Run("Depth"+strconv.Itoa(depth), func(b *testing.B) {
for i := 0; i < b.N; i++ {
Blackhole = nowInDepth(depth)
}
})
}
}
func nowInDepth(depth int) time.Time {
if depth == 0 {
return time.Now()
}
return nowInDepth(depth - 1)
}
Here are the results with boring ones ommitted.
BenchmarkNow/Depth0 41.12 ns/op
BenchmarkNow/Depth1 38.26 ns/op
BenchmarkNow/Depth2 38.82 ns/op
BenchmarkNow/Depth3 39.13 ns/op
...
BenchmarkNow/Depth15 40.29 ns/op
BenchmarkNow/Depth16 40.55 ns/op
BenchmarkNow/Depth17 43.13 ns/op
BenchmarkNow/Depth18 107.70 ns/op
BenchmarkNow/Depth19 103.90 ns/op
BenchmarkNow/Depth20 111.50 ns/op
...
BenchmarkNow/Depth28 96.53 ns/op
BenchmarkNow/Depth29 94.81 ns/op
BenchmarkNow/Depth30 92.30 ns/op
BenchmarkNow/Depth31 246.70 ns/op
BenchmarkNow/Depth32 255.40 ns/op
BenchmarkNow/Depth33 269.50 ns/op
BenchmarkNow/Depth36 267.80 ns/op
BenchmarkNow/Depth37 276.70 ns/op
BenchmarkNow/Depth38 310.40 ns/op
BenchmarkNow/Depth39 307.20 ns/op
...
BenchmarkNow/Depth78 368.00 ns/op
BenchmarkNow/Depth79 368.30 ns/op
BenchmarkNow/Depth80 364.30 ns/op
BenchmarkNow/Depth81 551.50 ns/op
BenchmarkNow/Depth82 565.40 ns/op
BenchmarkNow/Depth83 573.00 ns/op
...
BenchmarkNow/Depth97 624.90 ns/op
BenchmarkNow/Depth98 640.00 ns/op
BenchmarkNow/Depth99 649.90 ns/op
The cost of unwinding stack grows not linear. At some stack depths there are leaps. I believe they are caused by stack memory growing beyond different levels of memory caches.