That’s a cool approach. Running OCR locally avoids the usual privacy and latency trade-offs, and turning the phone into a network-accessible endpoint is clever.
Curious about performance — how fast is the Vision Framework on-device compared to something like Tesseract or cloud OCR APIs? And does the app stay responsive if the phone is handling multiple requests at once?
Like many OCR solutions, this is unfortunately incomplete. For serious work, the final output should be something like a PDF of the original image with the OCRed text embedded. Why? Ground truth. OCR is not reliable enough for its output to stand alone, separated from the source. The original needs to be available for checking.
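The usual way to get this is a "searchable PDF": the scanned image drawn on the page, with the OCR text placed underneath in invisible text render mode (the `3 Tr` operator from the PDF spec), so search and copy-paste hit the text while the eye sees the ground-truth image. As a rough stdlib-only sketch of just the invisible text layer (the image XObject and per-word positioning from the OCR bounding boxes are omitted for brevity):

```python
def make_searchable_pdf_stub(ocr_text: str, path: str) -> None:
    """Write a minimal one-page PDF whose text is invisible (render mode 3).

    Sketch only: a real searchable PDF would also embed the scanned image
    and position each OCRed word at its bounding box.
    """
    # Escape characters that are special inside PDF literal strings.
    safe = (ocr_text.replace("\\", r"\\")
                    .replace("(", r"\(")
                    .replace(")", r"\)"))
    # "3 Tr" sets text rendering mode 3 = neither fill nor stroke (invisible).
    content = ("BT 3 Tr /F1 12 Tf 72 720 Td (%s) Tj ET" % safe).encode("latin-1")
    objs = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n%s\nendstream" % (len(content), content),
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for i, obj in enumerate(objs, start=1):
        offsets.append(len(out))
        out += b"%d 0 obj\n%s\nendobj\n" % (i, obj)
    xref_pos = len(out)
    # Cross-reference table: one free entry plus one entry per object.
    out += b"xref\n0 %d\n0000000000 65535 f \n" % (len(objs) + 1)
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += (b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n"
            % (len(objs) + 1, xref_pos))
    with open(path, "wb") as f:
        f.write(bytes(out))
```

In practice you would use a PDF library rather than hand-rolling the file, but the core trick is the same: the OCR output rides along as an invisible layer while the original image remains the visible record.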
Cool - although I can't help thinking that running a macOS VM and using the Vision Framework tool on it will be less clunky in the long run. Phones don't like running with their screens on 24/7, etc.
I also developed a macOS version using the Rust programming language! https://github.com/riddleling/macocr