Opening the AI Black Box: Distilling Machine-Learned Algorithms into Code
Can we turn AI black boxes into code? Although this mission sounds extremely challenging, we show that it is not entirely impossible by presenting a proof-of-concept method, MIPS, that can synthesize programs based on the automated mechanistic interpretability of neural networks trained to perform the desired task, auto-distilling the l