GopherCon 2019 - Machine Learning & AI with Go Workshop
These are some notes from my experience at GopherCon 2019. I don’t expect them to be laid out in any particularly useful way; I am mostly taking them so that I can later recall the bits I found most useful.
Presenter: Daniel Whitenack
Introduction to ML/AI
Benefits of Go for ML/AI
- Type safety
- Performance pretty good
- Easy concurrency
Uses of AI in the World
- Classification: input of images/text, output label or bounding boxes etc
- Control systems (self-driving): input of images, output of control deltas
- Translation: input of text, output of more text
All basically just input -> ML model -> output (data transformation)
- Input data == features
- Output data == labels, responses
- ML model is just a function
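To make the "model is just a function" idea concrete, here is a minimal Go sketch of my own (not from the workshop materials). The names `Model` and `linear` and the parameter values are made up for illustration.

```go
package main

import "fmt"

// Model maps input features to a predicted label/response.
// (Hypothetical type for illustration only.)
type Model func(features []float64) float64

// linear returns a Model with the given parameters baked in.
// It assumes len(features) >= len(weights).
func linear(weights []float64, bias float64) Model {
	return func(features []float64) float64 {
		out := bias
		for i, w := range weights {
			out += w * features[i]
		}
		return out
	}
}

func main() {
	predict := linear([]float64{2, 0.5}, 1) // parameters chosen arbitrarily
	fmt.Println(predict([]float64{3, 4}))   // 2*3 + 0.5*4 + 1, prints 9
}
```

Training, in this picture, is whatever process picks the weights and bias that get passed to `linear`.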
ML Models
- Definitions – equations, expressions, conditions (if an image is mostly color C, then a cat)
- Parameters – weights, biases (the color C)
- Hyperparameters – parameters that we choose but don’t subject to training (kind of a part of model selection)
- Building ML/AI is basically trial and error to set the parameter values
Two Major pieces to ML/AI
(at least supervised ML/AI)
- Inference / prediction – using the model
- Training – generating the model
Training
- Known Results (labels/responses for set inputs)
- Automated trial and error to find the “best” parameter value(s)
Model Selection
- How do we pick which definition is the best? Trial and error / domain knowledge
Machine Learning vs. AI
- Not a great answer to this – not well differentiated
Common Blockers
- Getting the data
  - Need annotations / known outputs for the training and evaluation data
- Overfitting
  - Only works well on the data it knows about, not novel data
  - Set aside a validation set
    - Can still overfit on the validation set, but could perhaps randomly re-select the validation set each time?
    - Can have a really separate holdout set that is not used in training too
  - Can always increase model complexity to decrease training error too
Kinds of ML/AI Problems
- Object recognition / Classification
- Prediction (customers etc -> sales)
- Forecasting (last month sales -> this month sales)
- Recommendation (netflix problem)
- Clustering (group users by “similarity”)
Model Artifacts
- After training, we save a model artifact file
  - some include both definition and parameters, others just parameters
  - various formats
  - newer format emerging: ONNX
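As a toy illustration of an artifact that carries both a definition tag and the parameters, here is a sketch using JSON from the standard library. The `LinearModel` layout and the `save`/`load` helpers are my own invention and are not ONNX or any format shown in the workshop.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LinearModel is a hypothetical artifact layout: a definition tag
// plus the trained parameters.
type LinearModel struct {
	Definition string    `json:"definition"`
	Weights    []float64 `json:"weights"`
	Bias       float64   `json:"bias"`
}

// save serializes the model; in practice the bytes would go to a file.
func save(m LinearModel) ([]byte, error) {
	return json.Marshal(m)
}

// load restores a model from previously saved bytes.
func load(artifact []byte) (LinearModel, error) {
	var m LinearModel
	err := json.Unmarshal(artifact, &m)
	return m, err
}

func main() {
	trained := LinearModel{Definition: "linear", Weights: []float64{0.5, 2}, Bias: 1}
	artifact, err := save(trained)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(artifact))

	loaded, err := load(artifact)
	if err != nil {
		panic(err)
	}
	fmt.Println(loaded.Weights, loaded.Bias)
}
```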
Linear and Logistic Regression
Linear Regression
- y = w * x + b (w == weights, b == biases)
  - example: number of users (x) -> actual sales (y)
- pick initial values somehow
  - maybe random, maybe pick 2 points and draw that line
- loss function
  - determines how good a line is
  - example: absolute vertical distance
- data normalization
  - squish values to always be 0-1 (or some other known range)
- profiling data / looking for intuition
  - for many-x worlds when wanting to pick a single x, try graphing all the pairs (gc-ml/linear_regression/example1)
- before doing learning, reminder to pick out test data (gc-ml/linear_regression/example2)
  - might want to try and ensure that test data is representative and expand as necessary
- Stochastic Gradient Descent training method (gc-ml/linear_regression/example3, gc-ml/linear_regression/example4, gc-ml/linear_regression/example5 (adds multi-linear regression))
  - epochs: number of training iterations (# of times through the training data)
  - gradient: more or less the derivative of the loss – move parameters in the direction of less error
    - derivatives of error loss wrt each parameter, adjusted w/ learning rate
  - learning rate: hyperparameter that scales each step and helps prevent huge jumps
- Evaluate data (gc-ml/linear_regression/example6)
  - test data set evaluated by RMSE (root mean squared error) when the loss function is MSE (mean squared error)
    - gets back into the units of the prediction
  - multi-regression might not get you more than linear sometimes, but it might
  - might want to un-normalize errors to better understand error numbers
Logistic Regression
- Pretty similar to linear (gc-ml/logistic_regression)
- Often used for classification where we need a step-function-like thing
- Logistic function: 1 / (1 + e^-(wx+b)) = 1 / (1 + e^-b * e^-wx)
  - Inflection at x = -b/w, where wx + b = 0 and the output is 0.5 (worked out for myself)
- Data cleaning is often necessary in the real world (gc-ml/logistic_regression/example2)
- Intuition generation again (gc-ml/logistic_regression/example3)
- Don’t forget to create test/training splits (gc-ml/logistic_regression/example4)
- Training (gc-ml/logistic_regression/example5)
- Validation (gc-ml/logistic_regression/example6)
  - Accuracy – how many things did I get right?
  - Alternatives: precision, recall, sensitivity, AUC, false pos/neg, etc.
- goml package to do a lot of this for you (gc-ml/logistic_regression/example7)
Neural Networks and Deep Learning
- Gorgonia: a Go library in the vein of TensorFlow / Theano
Neural Networks
- Semi-black-box neurons acting as mini-models
- “With enough parameters we can model just about any relationship”
  - Pile up logistic regressions (and other such things) to give enough freedom for more things
- Terminology
  - Input layer
  - Hidden layers
  - Output layer
  - Feed forward – generate predictions
  - Backpropagation – calculate error, then adjust parameters
- Architecture choice is usually finding one that someone found has worked well
- Iris flower classification example (gc-ml/neural_networks/example{1,2})
  - uses “one-hot” encoding of correct species
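A small sketch of one-hot encoding and a feed-forward pass, written in plain Go rather than Gorgonia so that I am not guessing at its API. The weights are hand-picked and untrained, the function names are hypothetical, and backpropagation is left out entirely.

```go
package main

import (
	"fmt"
	"math"
)

// oneHot encodes class index `class` out of n classes as a vector
// containing a single 1.
func oneHot(class, n int) []float64 {
	v := make([]float64, n)
	v[class] = 1
	return v
}

func sigmoid(z float64) float64 { return 1 / (1 + math.Exp(-z)) }

// layer computes one fully connected layer: each output neuron is a
// logistic regression over all inputs. weights[j] holds the input
// weights of neuron j.
func layer(inputs []float64, weights [][]float64, biases []float64) []float64 {
	out := make([]float64, len(weights))
	for j, ws := range weights {
		z := biases[j]
		for i, w := range ws {
			z += w * inputs[i]
		}
		out[j] = sigmoid(z)
	}
	return out
}

// argmax returns the index of the largest value, i.e. the class the
// network scores highest.
func argmax(v []float64) int {
	best := 0
	for i := range v {
		if v[i] > v[best] {
			best = i
		}
	}
	return best
}

func main() {
	fmt.Println(oneHot(1, 3)) // prints [0 1 0]

	// Untrained, arbitrary parameters: 4 inputs -> 3 hidden -> 3 outputs.
	hiddenW := [][]float64{{0.1, 0.2, -0.1, 0.3}, {-0.2, 0.1, 0.4, 0.0}, {0.3, -0.3, 0.2, 0.1}}
	outputW := [][]float64{{0.5, -0.4, 0.1}, {-0.3, 0.6, 0.2}, {0.2, 0.1, -0.5}}
	zeros := []float64{0, 0, 0}

	features := []float64{5.1, 3.5, 1.4, 0.2} // four flower measurements
	hidden := layer(features, hiddenW, zeros)  // feed forward: input -> hidden
	output := layer(hidden, outputW, zeros)    // feed forward: hidden -> output
	fmt.Println(output, "-> predicted class", argmax(output))
}
```

During training, the output vector would be compared against the one-hot vector of the correct species to compute the error.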
Deep Learning
- As used here: pre-trained models that we might tweak or just use to solve problems
- TensorFlow model trained in Python, used from Go (gc-ml/deep_learning/example1) for object identification
  - can be pretty verbose
- Using gocv / OpenCV to interface w/ a TensorFlow model (gc-ml/deep_learning/example2)
- Using MachineBox to do classification via a REST service
ML Pipelines with Pachyderm
- Pachyderm seems to make ML pipeline work pretty darn efficient and painless, but that is definitely just first impression