GopherCon 2019 - Optimization for Number of goroutines using Feedback Control
These are some notes from my experiences at GopherCon 2019. I don’t expect these will be laid out in any particularly useful way; I am mostly taking them so that, in the future, I can remember some of the bits I found most useful.
Why Optimize the number of goroutines?
- Experimental evidence that there is an optimization to be had
  - But different environments had different apparent optima
  - In every case, though, past some point there were too many goroutines and performance degraded
  - Different processes on the same architecture also had different apparent optima
How can we optimize?
- One option: experiment in each case (but that takes time and effort)
- Note: Can think of OS threads as worker processes for the goroutine run queue(s)
- We want to optimize dynamically, based on bottleneck detection in a running program
Process
- Measure performance
- Increase the goroutine count until the performance target is met
- Occasionally try decreasing the count to see whether fewer goroutines suffice, and adjust
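
The steps above can be sketched as a single adjustment function. This is only an illustration under assumptions of mine, not code from the talk: the name `adjust`, the use of percent CPU as the performance measure, the step size of one goroutine, and the "probe every fifth tick" schedule are all hypothetical.

```go
package main

import "fmt"

// adjust returns the next goroutine limit from one measurement.
// measured and target share a unit (here: percent CPU, an assumption).
func adjust(limit int, measured, target float64, probeDown bool) int {
	switch {
	case measured < target:
		return limit + 1 // target not met: allow one more goroutine
	case probeDown && limit > 1:
		return limit - 1 // target met: occasionally test whether fewer suffice
	default:
		return limit // target met and no probe this tick: hold steady
	}
}

func main() {
	// Hypothetical workload: usage grows 20% per goroutine and saturates at 4.
	measure := func(n int) float64 {
		if n > 4 {
			n = 4
		}
		return float64(n) * 20
	}

	limit := 1
	for tick := 1; tick <= 12; tick++ {
		// Probe downward on every fifth tick (an arbitrary schedule).
		limit = adjust(limit, measure(limit), 80, tick%5 == 0)
		fmt.Printf("tick=%d limit=%d\n", tick, limit)
	}
}
```

After a downward probe, the next measurement either still meets the target (so the lower limit sticks) or falls short (so the limit is raised again), which is the "adjust" part of the last step.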
Performance Targets
- Don’t really want to special-case performance per-context
- Can try to use CPU usage as a target
- We don’t even know the target though
  - Start the target too high and after a while of not hitting it, slowly lower the target
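
A rough sketch of the "start too high, then slowly lower" idea follows. It is a simplified step-down rule of my own, not the mechanism from the talk (which, per the notes, adjusts the target with a controller). The type `decayingTarget` and its fields `step`, `floor`, and `patience` are hypothetical names and parameters.

```go
package main

import (
	"fmt"
	"math"
)

// decayingTarget starts with an optimistic CPU target (percent) and lowers
// it by step after patience consecutive observations that miss the target.
type decayingTarget struct {
	value    float64 // current target, percent CPU
	step     float64 // how much to lower the target each time
	floor    float64 // never lower the target below this
	patience int     // consecutive misses tolerated before lowering
	misses   int     // consecutive misses seen so far
}

// observe records one CPU measurement and returns the (possibly lowered) target.
func (t *decayingTarget) observe(cpu float64) float64 {
	if cpu >= t.value {
		t.misses = 0 // target reached: reset the miss counter
		return t.value
	}
	t.misses++
	if t.misses >= t.patience {
		t.value = math.Max(t.floor, t.value-t.step)
		t.misses = 0
	}
	return t.value
}

func main() {
	target := &decayingTarget{value: 100, step: 5, floor: 50, patience: 3}
	// Hypothetical CPU readings, in percent.
	for _, cpu := range []float64{70, 72, 71, 73, 96, 74} {
		fmt.Printf("cpu=%.0f target=%.0f\n", cpu, target.observe(cpu))
	}
}
```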
Feedback Control
- Good way to make this all work
- Controller: PID
  - P: Proportional – responds to the current error; enables large adjustments
  - I: Integral – responds to the error accumulated over time; as framed in the talk, keeps swings from getting too large
  - D: Derivative – responds to the rate of change of the error; as framed in the talk, lets the controller start changing again after things have gotten good
- Nested controllers to change the target at the same time as the # of goroutines
  - Both take the current CPU use as input
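
For reference, here is a minimal textbook discrete PID controller in Go. It is a single controller, not the nested arrangement described above, and it is not the speaker's implementation. The type `PID`, the simulated workload in `main`, and the gain values are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// PID is a textbook discrete PID controller.
type PID struct {
	Kp, Ki, Kd float64 // proportional, integral, and derivative gains
	integral   float64 // running sum of error * dt
	prevErr    float64 // error from the previous update
	primed     bool    // false until the first update has run
}

// Update takes the setpoint (target), the measured value, and the time step,
// and returns the control output to apply.
func (c *PID) Update(setpoint, measured, dt float64) float64 {
	err := setpoint - measured
	c.integral += err * dt

	var deriv float64
	if c.primed { // no previous error exists on the first call
		deriv = (err - c.prevErr) / dt
	}
	c.prevErr = err
	c.primed = true

	return c.Kp*err + c.Ki*c.integral + c.Kd*deriv
}

func main() {
	// Hypothetical plant: each goroutine adds 10% CPU, capped at 100%.
	pid := &PID{Kp: 0.1, Ki: 0.1, Kd: 0.1}
	limit := 1.0
	for tick := 0; tick < 20; tick++ {
		cpu := math.Min(100, 10*limit)
		// Treat the controller output as a change to the goroutine limit.
		limit = math.Max(1, limit+pid.Update(80, cpu, 1))
		fmt.Printf("tick=%d cpu=%.1f limit=%.2f\n", tick, cpu, limit)
	}
}
```

In a nested setup, a second controller of the same shape would adjust the setpoint passed to `Update`, with both controllers reading the same CPU measurement.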
Implementation
- Common practice – buffered channel as a semaphore
  - Can’t change the buffer size though
- Designed an elastic semaphore that works the same way but allows changing the size
  - Can’t kill running goroutines though, so long-lived worker processes aren’t great
- Kaburaya (example use there)
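
To illustrate the idea, below is one possible way to build a resizable semaphore with `sync.Mutex` and `sync.Cond`. It is a sketch of the concept only: it is not Kaburaya's API or the speaker's implementation, and the names `ElasticSemaphore`, `TryAcquire`, and `SetLimit` are hypothetical. The fixed-size version it replaces is the usual `make(chan struct{}, n)` pattern.

```go
package main

import (
	"fmt"
	"sync"
)

// ElasticSemaphore is a counting semaphore whose limit can change at runtime.
type ElasticSemaphore struct {
	mu    sync.Mutex
	cond  *sync.Cond
	limit int // maximum concurrent holders
	inUse int // current holders
}

func NewElasticSemaphore(limit int) *ElasticSemaphore {
	s := &ElasticSemaphore{limit: limit}
	s.cond = sync.NewCond(&s.mu)
	return s
}

// Acquire blocks until a slot is free under the current limit.
func (s *ElasticSemaphore) Acquire() {
	s.mu.Lock()
	for s.inUse >= s.limit {
		s.cond.Wait()
	}
	s.inUse++
	s.mu.Unlock()
}

// TryAcquire takes a slot if one is free and reports whether it did.
func (s *ElasticSemaphore) TryAcquire() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.inUse >= s.limit {
		return false
	}
	s.inUse++
	return true
}

// Release frees a slot and wakes one waiter.
func (s *ElasticSemaphore) Release() {
	s.mu.Lock()
	s.inUse--
	s.mu.Unlock()
	s.cond.Signal()
}

// SetLimit changes the limit. Shrinking does not interrupt current holders;
// it only blocks new acquisitions until enough holders have released.
func (s *ElasticSemaphore) SetLimit(n int) {
	s.mu.Lock()
	s.limit = n
	s.mu.Unlock()
	s.cond.Broadcast() // a larger limit may unblock several waiters
}

func main() {
	sem := NewElasticSemaphore(2)
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		if i == 4 {
			sem.SetLimit(4) // e.g. a controller decided to allow more goroutines
		}
		sem.Acquire()
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			defer sem.Release()
			fmt.Println("job", id)
		}(i)
	}
	wg.Wait()
}
```

Because shrinking only takes effect as holders release, this design fits short-lived tasks better than long-lived workers, which matches the limitation noted above.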
Benefits and drawbacks
- Some cases worked well, others not so well
- Need to improve the accuracy of predicting the target CPU use point
- Picking the K (gain) values takes tuning (0.1-0.3 worked well for the speaker)