GopherCon 2019 - Optimization for Number of goroutines using Feedback Control

conference, golang, gophercon2019, notes

These are some notes from my experience at GopherCon 2019. I don’t expect them to be laid out in any particularly useful way; I’m mostly taking them so I can remember some of the bits I found most useful in the future.


Why Optimize the number of goroutines?

  • Experimental evidence that there is an optimization to be had

    • But different environments had different apparent optima
    • Universally, though, at some point there were too many goroutines and performance degraded
    • Different processes on the same architecture also had different apparent optima

How can we optimize?

  • One option: experiment in each case (but that takes time and effort)
  • Note: you can think of OS threads as workers pulling goroutines off the run queue(s)
  • We want to optimize dynamically and based on bottleneck detection in a running program

Process

  • Measure performance
  • Increase the goroutine count until the performance target is met
  • Occasionally try decreasing the count to see if we can get by with fewer, and adjust (rough sketch of this loop below)
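
Roughly, the adjustment loop described above might look like the following. This is a minimal sketch; `Pool`, `Resize`, and `measureThroughput` are hypothetical placeholders, not code from the talk:

```go
package tuner

import "time"

// Pool is a hypothetical worker pool whose goroutine count we can change.
type Pool struct{ size int }

// Resize is a placeholder for changing the number of worker goroutines.
func (p *Pool) Resize(n int) {
	if n < 1 {
		n = 1
	}
	p.size = n
}

// measureThroughput is a placeholder for whatever performance metric we track.
func measureThroughput() float64 { return 0 }

// Tune measures performance, adds goroutines while below the target, and
// occasionally probes downward to see if fewer goroutines are enough.
func Tune(p *Pool, target float64) {
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	tick := 0
	for range ticker.C {
		tick++
		switch {
		case measureThroughput() < target:
			p.Resize(p.size + 1) // not hitting the target: add a worker
		case tick%10 == 0:
			p.Resize(p.size - 1) // occasionally try fewer; later ticks add back if needed
		}
	}
}
```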

Performance Targets

  • Don’t really want to special-case performance per-context

  • Can try to use CPU usage as a target

  • We don’t even know the target though

    • Start the target too high, and after a while of not hitting it, slowly lower it (sketched below)
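
One way to express that idea (an illustrative sketch; the streak length and the 10% step are my assumptions, not numbers from the talk):

```go
package tuner

// AdjustTarget lowers an overly ambitious CPU-usage target: if we keep
// missing it, the setpoint is reduced so the controller can converge on
// something achievable.
func AdjustTarget(target, observedCPU float64, missedStreak *int) float64 {
	if observedCPU >= target {
		*missedStreak = 0
		return target
	}
	*missedStreak++
	if *missedStreak >= 5 { // "a while" of not hitting the target
		*missedStreak = 0
		return target * 0.9 // slowly lower the target
	}
	return target
}
```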

Feedback Control

  • Good way to make this all work

  • Controller: PID

    • P: Proportional – enables large adjustments
    • I: Integral – integral of the error values over time – keeps the swings from getting too large
    • D: Derivative – derivative of the error values over time – lets the controller start adjusting again after things have gotten good
  • Nested controllers to change the target at the same time as the # of goroutines

    • Both take the current CPU usage as input (see the PID sketch below)
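
For reference, a textbook PID update looks like this in Go. This is a generic sketch to make the P/I/D terms concrete, not the controller from the talk or from Kaburaya:

```go
package tuner

// PID is a standard proportional-integral-derivative controller.
type PID struct {
	Kp, Ki, Kd float64 // gains (the "K values" mentioned later)
	integral   float64
	prevErr    float64
}

// Update takes the current error (target minus measured CPU use) and the
// elapsed time step, and returns the adjustment to apply, e.g. a change
// in the goroutine limit.
func (c *PID) Update(err, dt float64) float64 {
	c.integral += err * dt               // I: accumulated error
	derivative := (err - c.prevErr) / dt // D: rate of change of the error
	c.prevErr = err
	return c.Kp*err + c.Ki*c.integral + c.Kd*derivative
}
```

In the nested setup described above, both controllers would take the measured CPU usage as input: one drives the goroutine count, the other drives the target itself.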

Implementation

  • Common practice – buffered channel as a semaphore

    • Can’t change the buffer size though
  • Designed an elastic semaphore that works the same way but allows changing the size (sketch after this list)

    • Can’t kill running goroutines though, so it doesn’t suit long-lived workers
  • Kaburaya: his library implementing this approach (the talk showed example use)
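
The usual pattern is a buffered channel used as a semaphore (send a token to acquire, receive to release), but its capacity is fixed at creation. Below is a minimal sketch of one way a resizable semaphore could work; it illustrates the idea only and is not Kaburaya’s actual implementation:

```go
package tuner

import "sync"

// ElasticSemaphore is a counting semaphore whose limit can change at runtime.
type ElasticSemaphore struct {
	mu    sync.Mutex
	cond  *sync.Cond
	limit int // current capacity; can be changed while in use
	inUse int
}

func NewElasticSemaphore(limit int) *ElasticSemaphore {
	s := &ElasticSemaphore{limit: limit}
	s.cond = sync.NewCond(&s.mu)
	return s
}

// Acquire blocks until a slot is free under the current limit.
func (s *ElasticSemaphore) Acquire() {
	s.mu.Lock()
	for s.inUse >= s.limit {
		s.cond.Wait()
	}
	s.inUse++
	s.mu.Unlock()
}

// Release frees a slot and wakes a waiter.
func (s *ElasticSemaphore) Release() {
	s.mu.Lock()
	s.inUse--
	s.mu.Unlock()
	s.cond.Signal()
}

// SetLimit changes the capacity; unlike a buffered channel, the limit can
// grow or shrink while goroutines are waiting.
func (s *ElasticSemaphore) SetLimit(n int) {
	s.mu.Lock()
	s.limit = n
	s.mu.Unlock()
	s.cond.Broadcast()
}
```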

Benefits and drawbacks

  • Some cases worked well, others not so well
  • Need to improve the predictive accuracy of the target CPU-usage point
  • Picking the K (gain) values is tricky – 0.1-0.3 worked well for him