> Completely bespoke models are typically trained in Python using tools like TensorFlow, JAX or PyTorch that don't have real non-Python alternatives
The article outlines some interesting ways to evade this problem. What's the latest thinking on robustly addressing it, e.g. are there any approaches for executing inference on a tf or pytorch model from within a golang process, no sidecar required?
Perhaps check out GoMLX ("an Accelerated ML and Math Framework", there's a lot of scaffolding and it JITs to various backends. Related to that project, I sometimes use GoNB in VSCode, which is Golang notebooks [2].
For Go specifically, there are some libraries like Gorgonia (https://github.com/gorgonia/gorgonia) that can do inference.
Practically speaking though, the rate at which models change is so fast that if you opt to go this route, you'll perpetually be lagging behind the state of the art by just a bit. Either you'll be the one implementing the latest improvements or be waiting for the framework to catch up. This is the real value of the sidecar approach: when a new technique comes out (like speculative decoding, for example) you don't need to reimplement it in Go but instead can use the implementation that most other python users will use.
It is possible to include CPython in a CGO program - allowing Python to be executed from within the CGO process directly. This comes with some complexities - GIL and thread safety in Go routines, complexity of cross-compiling between architectures, overhead in copying values across the FFI, limitations of integrating as a Go module. I am hoping to see a CGO GIL'less Python integration show up here at some point that has all the answers.
These frameworks are C++ under the hood. A far as I know (not too experienced with go) you can use cgo to call any C++ code. So you should be able to serialize the model (torchscript) then run it with libtorch. Tensorflow also similarly has a C++ api
I was surprised you chose http for your IPC - I was expecting there to be a more handy tool that Python could expose and Go could leverage without needing to keep a second process constantly running.
Keeping models in memory is more efficient than constantly loading/unloading them, so a process that keeps running is the way to go