GopherCon 2018 - Machine Learning on Go
These are some notes from my experiences at GopherCon 2018. I don’t expect them to be laid out in any particularly useful way; I am mostly taking them so that I can remember the bits I found most useful.
Fun Facts
- Windows XP - 45M LoC
- Ford F-150 - 150M LoC!!!
- Google - 1B LoC
- We want better tools to help us write code better (and write better code)
Machine Learning on Go Code
- (mostly written in Python)
- Input to the model is source code (instead of plain text, images, etc.)
  - Similar to data mining, NLP, graph-based learning
- Trying to extract information about source code
  - More structured
  - Generated code should compile
  - Trying to extract things like intent vs. the code actually written
Getting the Data
- GH Archive
- Public Git Archive
- Language Identification
- File Parsing - generate language-agnostic ASTs
- Token Extraction - function names
- History Analysis (go-git)
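For Go specifically, extracting function names as tokens can be sketched with the standard `go/parser` and `go/ast` packages. This is only an illustration: it works on Go's own AST rather than the language-agnostic ASTs mentioned above, and `functionNames` is a hypothetical helper, not part of the pipeline described in the talk.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// functionNames parses a Go source file and returns the names of all
// top-level function and method declarations, in source order.
func functionNames(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, decl := range file.Decls {
		// Methods are also *ast.FuncDecl values (with a non-nil Recv).
		if fn, ok := decl.(*ast.FuncDecl); ok {
			names = append(names, fn.Name.Name)
		}
	}
	return names, nil
}

func main() {
	src := `package shapes

type Circle struct{ R float64 }

func (c Circle) Area() float64 { return 3.14159 * c.R * c.R }

func NewCircle(r float64) Circle { return Circle{R: r} }
`
	names, err := functionNames(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(names) // [Area NewCircle]
}
```

Names collected this way are the kind of training data a function-name suggestion model would need.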
Data Analysis
- How to analyze?
  - Series of bytes?
  - Token sequence?
  - AST?
    - Machine learning on trees is HARD
  - Flow graph?
- How to learn?
  - Neural Networks
    - “kind of like a puppy”
  - Given the previous 9 tokens, predict the next one
  - Recurrent Neural Networks
    - Tried on the Go standard library with charRNN: 61% accuracy
    - Generated some Go-like constructs, but they were not very accurate
  - code2vec
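Predicting the next token from the previous 9 starts with turning source code into (context, next token) training pairs. A rough sketch, assuming Go input tokenized with the standard `go/scanner` package: the charRNN experiment from the talk worked on characters rather than tokens, the helper names here are hypothetical, and the model that would consume these pairs is left out.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// tokenize returns the token sequence of Go source as strings.
// Identifiers, literals and keywords keep their source text; operators
// use their token names; every semicolon (explicit or automatically
// inserted at a newline) becomes ";". Comments are skipped.
func tokenize(src []byte) []string {
	fset := token.NewFileSet()
	file := fset.AddFile("input.go", fset.Base(), len(src))
	var s scanner.Scanner
	s.Init(file, src, nil, 0) // mode 0: comments are not returned
	var toks []string
	for {
		_, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		switch {
		case tok == token.SEMICOLON:
			toks = append(toks, ";")
		case lit != "":
			toks = append(toks, lit)
		default:
			toks = append(toks, tok.String())
		}
	}
	return toks
}

// window pairs a fixed-size context with the token that follows it.
type window struct {
	Context []string // shares the backing array of the input slice
	Next    string
}

// nextTokenWindows slides a window of n tokens over toks and returns one
// (context, next token) example per position.
func nextTokenWindows(toks []string, n int) []window {
	var out []window
	for i := 0; i+n < len(toks); i++ {
		out = append(out, window{Context: toks[i : i+n], Next: toks[i+n]})
	}
	return out
}

func main() {
	src := []byte("package p\n\nfunc add(a, b int) int { return a + b }\n")
	for _, w := range nextTokenWindows(tokenize(src), 9) {
		fmt.Printf("%v -> %q\n", w.Context, w.Next)
	}
}
```

Working on tokens rather than characters keeps each prediction step meaningful (an identifier or operator), at the cost of a much larger vocabulary.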
What Can We Build?
- Is the next token right or not?
- Identify “interesting” bits of code in a diff
- Suggest function names
- Assisted code review (src-d/lookout)
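A toy version of “is the next token right or not?” flags tokens that a model finds unlikely. This sketch swaps in a simple bigram count model in place of the neural networks discussed in the talk; the token sequences are assumed to be pre-tokenized, and the 0.1 threshold and all names are arbitrary, hypothetical choices.

```go
package main

import "fmt"

// bigramModel counts how often each token follows each other token.
// It is a deliberately simple stand-in for a learned model.
type bigramModel map[string]map[string]int

// train builds bigram counts from a set of token sequences.
func train(seqs [][]string) bigramModel {
	m := bigramModel{}
	for _, seq := range seqs {
		for i := 1; i < len(seq); i++ {
			prev, next := seq[i-1], seq[i]
			if m[prev] == nil {
				m[prev] = map[string]int{}
			}
			m[prev][next]++
		}
	}
	return m
}

// prob estimates P(next | prev) from the counts; 0 if prev was never seen.
func (m bigramModel) prob(prev, next string) float64 {
	total := 0
	for _, c := range m[prev] {
		total += c
	}
	if total == 0 {
		return 0
	}
	return float64(m[prev][next]) / float64(total)
}

// suspicious returns the positions in seq whose token has an estimated
// probability below threshold, given the token before it.
func (m bigramModel) suspicious(seq []string, threshold float64) []int {
	var idx []int
	for i := 1; i < len(seq); i++ {
		if m.prob(seq[i-1], seq[i]) < threshold {
			idx = append(idx, i)
		}
	}
	return idx
}

func main() {
	corpus := [][]string{
		{"if", "err", "!=", "nil", "{", "return", "err", "}"},
		{"if", "err", "!=", "nil", "{", "return", "nil", ",", "err", "}"},
	}
	m := train(corpus)
	candidate := []string{"if", "err", "==", "nil", "{", "return", "err", "}"}
	fmt.Println(m.suspicious(candidate, 0.1)) // [2 3]
}
```

On this tiny corpus, `err == nil` followed by `return err` is flagged at positions 2 and 3, which happens to be a real bug pattern; anything practical would need far more data and smoothing for unseen tokens.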
Future
- Bug prediction
- Education
- Style Guide Enforcement
- Automated Code Review
Future Future
- Code Generation
- Natural Analysis