GopherCon 2019 - Optimization for Number of goroutines using Feedback Control

: 25 July 2019
: conference, golang, gophercon2019, notes

These are some notes from my experiences at GopherCon 2019. I don’t expect these will be laid out in any particularly useful way; I am mostly taking them so that in the future I can remember some of the bits I found most useful.

Why Optimize the number of goroutines?

Experimental evidence that there is an optimization to be had
- But different envs had different apparent optima
- In every case, though, performance eventually degraded once there were too many goroutines
- Different processes on the same architecture also had different apparent optima

How can we optimize?

One option: experiment in each case (but that takes time and effort)
Note: OS threads can be thought of as worker processes for the goroutine run queue(s)
We want to optimize dynamically, based on bottleneck detection in a running program

Process

Measure performance
Increase the goroutine count until the performance target is reached
Occasionally try decreasing the number of goroutines to see if we can make do with fewer, and adjust
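
To make the loop above concrete for myself, here is a minimal sketch in Go. It is not code from the talk: `nextLimit`, the probing interval, and the simulated measurements are all names and values I made up for illustration, and the measured value is assumed to be something like CPU utilisation between 0 and 1.

```go
package main

import "fmt"

// nextLimit is a hypothetical helper (not from the talk). It returns the
// goroutine limit to use for the next interval, given the current limit and
// a measured performance value such as CPU utilisation in the range 0..1.
func nextLimit(limit int, measured, target float64, probeDown bool) int {
	switch {
	case measured < target:
		return limit + 1 // below target: allow one more goroutine
	case probeDown && limit > 1:
		return limit - 1 // at target: occasionally test whether fewer suffice
	default:
		return limit
	}
}

func main() {
	// Simulated measurements; a real program would sample them at run time.
	measurements := []float64{0.30, 0.55, 0.80, 0.92, 0.95, 0.93}
	const target = 0.90
	const probeEvery = 3 // assumed probing interval, in ticks

	limit := 1
	for tick, m := range measurements {
		probe := (tick+1)%probeEvery == 0
		limit = nextLimit(limit, m, target, probe)
		fmt.Printf("tick=%d measured=%.2f limit=%d\n", tick, m, limit)
	}
}
```

If a downward probe turns out to be a mistake, the next measurement falls below the target and the limit climbs back up, which is the "and adjust" part.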

Performance Targets

Don’t really want to special-case performance per-context
Can try to use CPU usage as a target
We don’t know the right target up front, though
- Start with the target too high and, after a while of not hitting it, slowly lower it
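
A rough sketch of that "start too high, then slowly lower" idea, written after the fact. The type `targetTracker` and its fields (`patience`, `step`, `floor`) are my own assumptions about how one might do this, not something shown in the talk.

```go
package main

import "fmt"

// targetTracker is a hypothetical type (not from the talk) that holds a
// CPU-usage target and lowers it when the target keeps being missed.
type targetTracker struct {
	target   float64 // current target, as a fraction 0..1
	misses   int     // consecutive samples that fell below the target
	patience int     // misses tolerated before the target is lowered
	step     float64 // amount to lower the target by each time
	floor    float64 // the target is never lowered below this value
}

// observe records one CPU sample and returns the (possibly lowered) target.
func (t *targetTracker) observe(cpu float64) float64 {
	if cpu >= t.target {
		t.misses = 0 // target reached: reset the miss counter
		return t.target
	}
	t.misses++
	if t.misses >= t.patience {
		t.target -= t.step
		if t.target < t.floor {
			t.target = t.floor
		}
		t.misses = 0
	}
	return t.target
}

func main() {
	// All numbers here are illustrative assumptions.
	tr := &targetTracker{target: 1.0, patience: 3, step: 0.05, floor: 0.5}
	for i, cpu := range []float64{0.70, 0.72, 0.71, 0.74, 0.73, 0.75, 0.96} {
		fmt.Printf("sample=%d cpu=%.2f target=%.2f\n", i, cpu, tr.observe(cpu))
	}
}
```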

Feedback Control

Good way to make this all work
Controller: PID
- P: Proportional – enables large adjustment
- I: Integral – integral of error values over time – keeps swings from getting too large
- D: Derivative – derivative of error values over time – enables starting to change again after things have gotten good
Nested controllers adjust the target at the same time as the number of goroutines
- Both take the current CPU use as input
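
For reference, a textbook discrete PID step looks roughly like the sketch below. This is the generic form, not the speaker's implementation, and it only covers a single controller (the nested one that moves the target is left out). The type name `pid`, the use of CPU percent as the unit, and the gain values are assumptions for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// pid is a generic textbook PID controller (names are my own).
type pid struct {
	kp, ki, kd float64 // proportional, integral, and derivative gains
	integral   float64 // running sum of error * dt
	prevErr    float64 // error from the previous update
}

// update returns the control output for one time step of length dt (dt > 0).
func (c *pid) update(target, measured, dt float64) float64 {
	err := target - measured
	c.integral += err * dt
	deriv := (err - c.prevErr) / dt
	c.prevErr = err
	return c.kp*err + c.ki*c.integral + c.kd*deriv
}

func main() {
	c := &pid{kp: 0.3, ki: 0.1, kd: 0.1} // illustrative gains only
	const targetCPU = 90.0               // percent; an assumed target

	limit := 4
	// Simulated CPU readings in percent; a real program would sample these.
	for _, cpu := range []float64{40, 55, 70, 85, 95} {
		delta := c.update(targetCPU, cpu, 1)
		limit += int(math.Round(delta)) // treat the output as a change in limit
		if limit < 1 {
			limit = 1
		}
		fmt.Printf("cpu=%.0f%% delta=%.2f limit=%d\n", cpu, delta, limit)
	}
}
```

Treating the controller output as a change in the goroutine limit is one possible design; the output could just as well be mapped to the limit directly.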

Implementation

Common practice – buffered channel as a semaphore
- Can’t change the buffer size though
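
The common practice mentioned here looks something like the following sketch. `runLimited` and `peakConcurrency` are hypothetical helper names of mine. The point to notice is that the capacity passed to `make` is fixed for the life of the channel.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runLimited runs every task, letting at most limit of them execute at once.
// The buffered channel acts as a counting semaphore with a fixed capacity.
func runLimited(limit int, tasks []func()) {
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	for _, task := range tasks {
		wg.Add(1)
		sem <- struct{}{} // acquire: blocks while limit tasks are running
		go func(task func()) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when the task ends
			task()
		}(task)
	}
	wg.Wait()
}

// peakConcurrency is a demo helper: it runs n short tasks through runLimited
// and reports the highest number observed running at the same time.
func peakConcurrency(limit, n int) int {
	var mu sync.Mutex
	running, peak := 0, 0
	tasks := make([]func(), n)
	for i := range tasks {
		tasks[i] = func() {
			mu.Lock()
			running++
			if running > peak {
				peak = running
			}
			mu.Unlock()
			time.Sleep(time.Millisecond) // stand-in for real work
			mu.Lock()
			running--
			mu.Unlock()
		}
	}
	runLimited(limit, tasks)
	return peak
}

func main() {
	fmt.Println("peak concurrency:", peakConcurrency(3, 20))
}
```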
Designed an elastic semaphore to work the same way but enables changing the size
- Can’t kill running goroutines though, so long-lived worker processes aren’t great
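
I did not note how the speaker's elastic semaphore is built, so the following is only my guess at one way to get a resizable limit, using a mutex and a condition variable. All names (`elasticSemaphore`, `setLimit`, `tryAcquire`, and so on) are hypothetical and are not Kaburaya's API.

```go
package main

import (
	"fmt"
	"sync"
)

// elasticSemaphore is a hypothetical counting semaphore whose limit can be
// changed while it is in use.
type elasticSemaphore struct {
	mu    sync.Mutex
	cond  *sync.Cond
	limit int // maximum number of holders allowed right now
	inUse int // number of slots currently held
}

func newElasticSemaphore(limit int) *elasticSemaphore {
	s := &elasticSemaphore{limit: limit}
	s.cond = sync.NewCond(&s.mu)
	return s
}

// acquire blocks until a slot is free under the current limit.
func (s *elasticSemaphore) acquire() {
	s.mu.Lock()
	for s.inUse >= s.limit {
		s.cond.Wait()
	}
	s.inUse++
	s.mu.Unlock()
}

// tryAcquire takes a slot if one is free and reports whether it did.
func (s *elasticSemaphore) tryAcquire() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.inUse >= s.limit {
		return false
	}
	s.inUse++
	return true
}

// release frees a slot and wakes any waiters so they can re-check the limit.
func (s *elasticSemaphore) release() {
	s.mu.Lock()
	s.inUse--
	s.mu.Unlock()
	s.cond.Broadcast()
}

// setLimit changes the limit. Shrinking does not interrupt current holders;
// it only stops new acquisitions until enough slots have been released.
func (s *elasticSemaphore) setLimit(n int) {
	s.mu.Lock()
	s.limit = n
	s.mu.Unlock()
	s.cond.Broadcast()
}

func main() {
	s := newElasticSemaphore(1)
	s.acquire()
	fmt.Println("second slot at limit 1:", s.tryAcquire())
	s.setLimit(2)
	fmt.Println("second slot at limit 2:", s.tryAcquire())
	s.setLimit(1) // shrink: both holders keep running
	s.release()
	fmt.Println("after one release at limit 1:", s.tryAcquire())
}
```

Because shrinking only takes effect as holders release their slots, a long-lived worker that never releases would pin the old size, which matches the caveat above.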
Kaburaya (example use there)

Benefits and drawbacks

Some cases worked well, others not so well
Need to improve predictive accuracy of the target CPU use point
Picking the K values (the controller gains) takes some tuning; 0.1-0.3 worked well for him