• nutlope 8 months ago

    Hi all, I'm the author of llama-ocr. Thank you for sharing & for the kind comments! I built this earlier this week since I wanted a simple API to do OCR – it uses Llama 3.2 Vision (hosted on together.ai, where I work) to parse images into structured markdown. I also have it available as an npm package.

    Planning to add a bunch of other features, like the ability to parse PDFs, output a response in JSON, etc. If anyone has any questions, feel free to send them and I'll try to respond!
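
    Basic usage is meant to be just a couple of lines. Roughly this (a sketch; exact option names may shift as I add the features above):

        import { ocr } from "llama-ocr";

        // Point it at a local image and a Together AI key; get markdown back.
        const markdown = await ocr({
          filePath: "./receipt.jpg",
          apiKey: process.env.TOGETHER_API_KEY,
        });
        console.log(markdown);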

    • nh2 8 months ago

      I put in a bill that has 3 identical line items, and it didn't list them as the usual 3 bullet points but instead generated a table with a "quantity" column that doesn't exist on the original paper.

      Is this degree of transformation expected/desirable?

      (It also means that the output is sometimes a bullet point list, sometimes a table, making further automatic processing a bit harder.)

    • rch 8 months ago

      I've had trouble pulling scientific content out of poster PDFs, mostly because e.g. Nougat falls apart with different layouts.

      Have you considered that usage yet?

      • Szpadel 8 months ago

        > Need an example image? Try ours.

        Great idea. I wish more services had a similar feature.

        • gcr 8 months ago

          How accurate is this?

          When compared with existing OCR systems, what sorts of mistakes does it make?

          • Curiositry 8 months ago

            Option to use a local LLM?

            • Eisenstein 8 months ago

              I made a script which does exactly the same thing but locally using koboldcpp for inference. It downloads MiniCPM-V 2.6 with image projector the first time you run it. If you want to use a different model you can, but you will want to edit the instruct template to match.

              * https://github.com/jabberjabberjabber/LLMOCR
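
              The core of it is just one request against koboldcpp's OpenAI-compatible endpoint. A rough TypeScript sketch of the same idea (the actual script is Python; port 5001 is the koboldcpp default, and whether the vision content-part format below is accepted depends on your koboldcpp version):

                  import { readFileSync } from "node:fs";

                  // Send a base64 image to a local koboldcpp instance and ask it to read the text.
                  const image = readFileSync("page.png").toString("base64");

                  const res = await fetch("http://localhost:5001/v1/chat/completions", {
                    method: "POST",
                    headers: { "Content-Type": "application/json" },
                    body: JSON.stringify({
                      messages: [{
                        role: "user",
                        content: [
                          { type: "text", text: "Transcribe all text in this image, verbatim." },
                          { type: "image_url", image_url: { url: `data:image/png;base64,${image}` } },
                        ],
                      }],
                    }),
                  });
                  const data = await res.json();
                  console.log(data.choices[0].message.content);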

              • nirav72 8 months ago

                MiniCPM-V 2.6 is probably the best self-hosted vision model I have used so far, not just for OCR but also image analysis. I have it set up so that my NVR (Frigate) sends a couple of images to Ollama with MiniCPM-V 2.6 whenever a motion alert fires on a driveway security camera. I get a reasonably accurate description of the vehicle that pulled into the driveway, including the person who exits it and the license plate, all sent to my phone.

                • timmattison 8 months ago

                  I love this. Can you share the source?

          • Eisenstein 8 months ago

            All it does is send the image to Llama 3.2 Vision and ask it to read the text.

            Note that this is just as open to hallucination as any other LLM output: it is not reading pixels and matching character shapes, it is describing the picture, drawing on the images and captions it was trained on to decide what the text says. It may completely make up words, especially if it can't read them.

            • M4v3R 8 months ago

              This is also true for any other OCR system; we just never called these errors "hallucinations" in that context.

              • geysersam 8 months ago

                I gave this tool a picture of a restaurant menu and it made up several additional entries that didn't exist in the picture... What other OCR system would do that?

                • noduerme 8 months ago

                  No, it's not even close to OCR systems, which are based on analyzing points in a grid for each character stroke and comparing them with known characters. For one thing, OCR systems are deterministic. Deterministic. Look it up.

                  • visarga 8 months ago

                    OCR systems use vision models and as such they can make mistakes. They don't sample, but they do produce a probability distribution over words, just like LLMs.

                    • undefined 8 months ago
                      [deleted]
                    • alex_suzuki 8 months ago

                      One of my worries for the coming years is that people will forget what deterministic actually means. It terrifies me!

                      • noduerme 8 months ago

                        Not to get real dark and philosophical (but here goes): it took somewhere around 150,000 years for humans to go from spoken language to writing, and almost all of those words were irrational. Getting from there to understanding and encoding what is or isn't provable, or is or isn't logically deterministic, took until the last few hundred years.

                        People who have been steeped in looking at the world through that lens (whether you deal in pure math or need to understand, say by running a casino, what is not deterministic, so as to fold it into your model of volatility and risk) can identify very quickly which factors in any scenario are deterministic and which are not. One could almost say that this ability to discern logic from fuzz is the crowning achievement of science and civilization, the main adaptation conferred upon some humans since speech.

                        Unfortunately, it is very recent, and it's still an open question whether being able to tell the difference between magic and process is an evolutionary advantage. And yeah, it's scary to imagine a world where people can't; but that was practically the whole world a few centuries ago, and it wouldn't be terribly surprising if humanity regressed to that as people stopped understanding how to make tools and began treating them like magic again. Sad time to be alive.

                    • llm_trw 8 months ago

                      It really isn't, since those systems are character-based.

                      • 8n4vidtmkvmk 8 months ago

                        OCR tools sometimes make errors, but they don't make things up. There's a difference.

                    • bbor 8 months ago

                      Looks awesome! I've been doing a lot of OCR recently and love the addition to the space. The reigning champion for PDF -> Markdown (AFAIK) is Facebook's Nougat[1], and I'm excited to hook this up to DSPy and see which works better for philosophy books. This repo links to the Zerox[2] project by some startup, which also looks awesome, and is certainly more smoothly advertised than Nougat. Would love corrections/advice from any actual experts passing by this comment section :)

                      That said, I have a few questions if OP/anyone knows the answers:

                      1. What is Together.ai, and is this model OSS? Their website sells them as a hosting service, and the "Custom Models" page[3] seems to be about custom finetuning, not, like, training new proprietary models in-house. They might have a HuggingFace profile, but it's hard to tell if it's them: https://huggingface.co/TogetherAI

                      2. The GitHub says "hosted demo", but the hosting part is just the tiny (clean!) WebGUI, yes? It's implied that this functionality is and will always be available only through API calls?

                      P.S. The header links are broken on my desktop browser -- no onClick triggered

                      [1] https://facebookresearch.github.io/nougat/

                      [2] https://github.com/getomni-ai/zerox

                      [3] https://www.together.ai/products#custom-models

                      • jurnalanas 8 months ago

                        The project author is a DevRel at Together.ai. It's a fantastic way to advertise a dev tool, though.

                        • gexla 8 months ago

                          My guess is together.ai is at least partially sponsoring the demo.

                          • magicalhippo 8 months ago

                            Yeah, I was hoping for something I could self-host, both for privacy and cost.

                            • rajansheth 8 months ago

                              together.ai serves 100+ open-source models, including the multimodal Llama 3.2, behind an OpenAI-compatible API.
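
                              For anyone who wants to skip the wrapper, a sketch of calling it directly (the base URL and model ID are my best recollection of Together's docs, so double-check them):

                                  import OpenAI from "openai";

                                  // The standard OpenAI client, pointed at Together's endpoint.
                                  const client = new OpenAI({
                                    apiKey: process.env.TOGETHER_API_KEY,
                                    baseURL: "https://api.together.xyz/v1",
                                  });

                                  const completion = await client.chat.completions.create({
                                    model: "meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo",
                                    messages: [{
                                      role: "user",
                                      content: [
                                        { type: "text", text: "Convert this document to markdown." },
                                        { type: "image_url", image_url: { url: "https://example.com/receipt.jpg" } },
                                      ],
                                    }],
                                  });
                                  console.log(completion.choices[0].message.content);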

                            • sdflhasjd 8 months ago

                              Here's a bit of a quirk: I uploaded a webcomic as an example; all the dialogue was ALL CAPS, but the output was inconsistently either sentence case or title case between panels.

                              I also tried a real example of a problem I'd like to use OCR on: I've got some old slides that need digitising, and most of them are labelled. Uploading one of these produces this output:

                                The image appears to be a photograph of a slide or film frame, possibly from an old camera or projector. The slide is yellowed with age and has a rectangular cutout in the center, which is filled with a dark gray or black material. The cutout is surrounded by a thin border, and there is some text written on the slide in black ink.
                              
                                The text reads "Once Upon a Time" and is written in a cursive font. It is located at the bottom of the slide, below the cutout. There is also a small number "1069" written in the same font and color, but it is not clear what this number refers to.
                              
                                Overall, the image suggests that the slide is an old photograph or film frame that has been preserved for many years. The yellowing of the slide and the cursive writing suggest that it may be from the early 20th century or earlier.
                              
                              So aside from the unnecessarily repetitious description of the slide (and the "yellowing" is actually just the white balance being off, though I can forgive that), the actual written text (not cursive) was "Once Uniquitous." and the number was 106g. It's very clearly a 'g' and not a '9'.

                              What I think is interesting about this is that it might be a demonstration of bias in these models: it focused so much on the slide being an antique that it hallucinated a completely cliché title. It also missed the forest for the trees: the "black square" was the slide being front-lit so the text could be read, which is why the transparency wasn't visible.

                              Additionally, the API itself seems to have file size or resolution limits that are not documented.

                              • philips 8 months ago

                                I have recently used llama3.2-vision to handle some paper bidsheets for a charity auction, and it is fairly accurate even with some terrible handwriting. I hope to use it for my event next year.

                                I do find it rather annoying not being able to get it to consistently output a CSV though. ChatGPT and Gemini seem better at doing that but I haven’t tried to automate it.

                                The scale of my problem is about 100 pages of bidsheets, so some manual cleaning is OK. It is certainly better than burning volunteers' time.

                                https://github.com/philips/paper-bidsheets

                                • wriggler 8 months ago

                                  I'd love to hear how Handwriting OCR (https://www.handwritingocr.com) compares for your task.

                                  It's not free, but its accuracy for handwritten documents is the best out there (I am the founder, so I am biased, but I'm really excited about where the accuracy is now). It could save you time, and your 100-page project would cost only $12.

                                  • KetoManx64 8 months ago

                                    My main qualm with a project like yours is that I have to upload my documents to a third party and trust them with that data. I have a couple thousand pages' worth of journal entries from the last decade, and I would never upload those to a website to get OCR'd. With a local Ollama model I have full control of the data and it all stays local.

                                    • wriggler 8 months ago

                                      I understand your concern, and it's a common one. All we can offer are the assurances in our privacy policy: your data is used only to perform the OCR and nothing else, and you can delete everything from the server immediately after downloading your results, leaving no trace.

                                      Of course a local solution like Ollama is preferable for privacy reasons, but for now the OCR performance of available local models is just not very good, especially on handwritten documents. With a couple thousand pages of journal entries, that means a lot of post-processing and editing.

                                  • mosselman 8 months ago

                                    What about using llama3.2-vision to do the OCR bit and then deferring to ChatGPT to do the CSV part?
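
                                     Something like this sketch (Ollama's OpenAI-compatible endpoint, the model tags, and the CSV columns are all assumptions about the setup):

                                         import OpenAI from "openai";
                                         import { readFileSync } from "node:fs";

                                         // Stage 1: a local vision model transcribes the sheet.
                                         // Stage 2: a text-only model turns the transcript into CSV.
                                         const local = new OpenAI({ apiKey: "ollama", baseURL: "http://localhost:11434/v1" });
                                         const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

                                         const img = readFileSync("bidsheet.jpg").toString("base64");

                                         const ocrPass = await local.chat.completions.create({
                                           model: "llama3.2-vision",
                                           messages: [{
                                             role: "user",
                                             content: [
                                               { type: "text", text: "Transcribe every line on this bid sheet verbatim." },
                                               { type: "image_url", image_url: { url: `data:image/jpeg;base64,${img}` } },
                                             ],
                                           }],
                                         });

                                         const csvPass = await openai.chat.completions.create({
                                           model: "gpt-4o-mini",
                                           messages: [{
                                             role: "user",
                                             content: `Format these bid lines as CSV with columns name,bid:\n\n${ocrPass.choices[0].message.content}`,
                                           }],
                                         });
                                         console.log(csvPass.choices[0].message.content);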

                                  • notsylver 8 months ago

                                     I've been doing a lot of OCR recently, mostly digitising text from family photos. Normal OCR models are terrible at it, LLMs do far better. Gemini Flash came out on top from the models I tested and it wasn't even close. It still had enough failures and hallucinations to make it faster to type the text in by hand. Annoying, considering how close it feels to working.

                                     This seems worse. Sometimes it replies with just the text, sometimes it replies with a full "The image is a scanned document with handwritten text...". I was hoping for some fine-tuning or something for it to beat Gemini Flash; it would save me a lot of time. :(

                                    • philips 8 months ago

                                      Have you tried downscaling the images? I started getting better results with lower resolution images. I was using scans made with mobile phone cameras for this.

                                      convert -density 76 input.pdf output-%d.png

                                      https://github.com/philips/paper-bidsheets

                                      • notsylver 8 months ago

                                         That's interesting. I downscaled the images to something like 800px, but that was mostly to try to improve upload times. I wonder if downscaling further, with a better algorithm, would help. I remember using CLIP and finding that different scaling algorithms affected text readability. Maybe the text is just being butchered when it's rescaled.

                                         Though I also tried the high-detail setting, which I'd think would deal with most issues that come from scaling, and it didn't seem to help much.

                                      • og_kalu 8 months ago

                                        > Normal OCR models are terrible at it, LLMs do far better. Gemini Flash came out on top from the models I tested and it wasn't even close.

                                        For normal models, the state of open-source OCR is pretty terrible. Unfortunately, the closed options from Microsoft, Google, etc. are much better. Did you try those?

                                        Interesting about Flash. What LLMs did you test?

                                        • notsylver 8 months ago

                                          I tried open-source and closed-source OCR models; all were pretty bad. Google Vision was probably the best of the "OCR" models, but it liked adding spaces between characters and had other issues I've forgotten. It was bad enough that I wondered if I was using it wrong. By the time I was trying to pass the text to an LLM along with the image so it could do "touchups" and fix the mistakes, I gave up and decided to try LLMs for the whole task.

                                          I don't remember the exact models; I more or less went through the OpenRouter vision model list and tried them all. Gemini Flash performed the best, somehow better than Gemini Pro. GPT-4o/mini was terrible, and expensive enough that it would have had to be near perfect to consider it. Pixtral did terribly. That's all I remember, but I tried more than just those. I think Llama 3.2 is the only one I haven't properly tried, but I don't have high hopes for it.

                                          I think even if OCR models were perfect, they couldn't have done some of the things I was using LLMs for, like extracting structured information at the same time as the plain text - pulling any dates in the text into a standard ISO format was nice, as well as grabbing people's names. Being able to say "Only look at the hand-written text, ignore printed text" and have it work was incredible.

                                        • pbhjpbhj 8 months ago

                                          The OCR in OneNote is incredible IME. But I've not tested it on a wide range of fonts -- only that I have abysmal handwriting and it will find words that are almost unrecognisable.

                                        • danvk 8 months ago

                                          I've had really good luck recently running OCR over a corpus of images using gpt-4o. The most important thing I realized was that non-fancy data prep is still important, even with fancy LLMs. Cropping my images to just the text (excluding any borders) and increasing the contrast of the image helped enormously. (I wrote about this in 2015 and this post still holds up well with GPT: https://www.danvk.org/2015/01/07/finding-blocks-of-text-in-a...).

                                          I also found that giving GPT at most a few paragraphs at a time worked better than giving it whole pages. Shorter text = less chance to hallucinate.
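
                                          For the curious, the prep itself is mundane; e.g. with sharp (the crop box numbers are placeholders for whatever your text-detection step finds):

                                              import sharp from "sharp";

                                              // Crop to the text block and stretch contrast before sending to the model.
                                              await sharp("scan.png")
                                                .extract({ left: 120, top: 340, width: 1600, height: 900 })
                                                .greyscale()
                                                .normalise() // stretch luminance to full range for more contrast
                                                .toFile("scan-prepped.png");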

                                          • pbhjpbhj 8 months ago

                                            Have you tried doing a verification pass: giving gpt-4o the output of the first pass along with the image, and asking it to correct the text (or say whether they match, or...)?

                                            Just curious whether repetition increases accuracy, or if it just increases the opportunities for hallucination.
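
                                            Something like this, maybe (untried; the model choice and prompt wording are guesses):

                                                import OpenAI from "openai";
                                                import { readFileSync } from "node:fs";

                                                const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

                                                // Show the model its own first-pass transcript next to the image
                                                // and ask it to fix mismatches.
                                                async function verify(imagePath: string, firstPass: string): Promise<string> {
                                                  const img = readFileSync(imagePath).toString("base64");
                                                  const res = await openai.chat.completions.create({
                                                    model: "gpt-4o",
                                                    messages: [{
                                                      role: "user",
                                                      content: [
                                                        { type: "text", text: `Transcript of this image:\n\n${firstPass}\n\nFix any words that don't match the image; return only the corrected text.` },
                                                        { type: "image_url", image_url: { url: `data:image/png;base64,${img}` } },
                                                      ],
                                                    }],
                                                  });
                                                  return res.choices[0].message.content ?? "";
                                                }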

                                            • danvk 8 months ago

                                              I have not, but that's a great idea!

                                          • 8n4vidtmkvmk 8 months ago

                                            That's a bummer. I'm trying to do the exact same thing right now: digitizing family photos. Some of mine have German on the back. The last OCR tool to hit the headlines was terrible; I was hoping this would be better. ChatGPT 4o has been good, though, when I paste individual images into the chat. I haven't tried the API yet, and I'm not sure how much it would cost to process 6,500 photos, many of which are blank; I don't have an easy way to filter those out either.

                                            • notsylver 8 months ago

                                              I found 4o to be one of the worst, but I was using the API. I didn't test it, but sometimes it feels like images uploaded through ChatGPT work better than ones sent through the API. I was using Gemini Flash in the end; it seemed better than 4o, and images are so cheap that I have a hard time believing Google is making any money, even on bandwidth costs.

                                              I also tried preprocessing images before sending them through. I tried cropping them to just the text to see if it helped, then filters on top to try to brighten the text; somehow that all made it worse. The most success I had was just holding the photo in my hand and taking a picture of it; the busy background seemed to help, but I have absolutely no idea why.

                                              The main problem was that it would work well for a few dozen images, you'd start to trust it, and then it'd hallucinate, or not understand a crossed-out word with a correction, or miss text that had faded. I've pretty much given up on the idea. My new plan is to repurpose the website I made for verifying the results into one where you enter the text manually, along with date/location/favourite status.

                                              • bosie 8 months ago

                                                Use a local rubbish model to extract text. If it doesn't find any on the back, don't send it to ChatGPT?

                                                Terrascan comes to mind

                                                • 8n4vidtmkvmk 8 months ago

                                                  "Terrascan" is a vision model? The only hits I'm getting are for a static code analyzer.

                                                  • bosie 8 months ago

                                                    Sorry, I meant "Tesseract".
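
                                                    The pre-filter idea in code, as a sketch with tesseract.js (the 20-character threshold is an arbitrary guess to tune on your own scans):

                                                        import Tesseract from "tesseract.js";

                                                        // Only send a photo to the paid model if a local Tesseract
                                                        // pass finds a plausible amount of text on it.
                                                        async function hasText(path: string): Promise<boolean> {
                                                          const { data } = await Tesseract.recognize(path, "deu+eng");
                                                          return data.text.trim().length > 20;
                                                        }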

                                              • undefined 8 months ago
                                                [deleted]
                                                • bboygravity 8 months ago

                                                  Have you tried Claude?

                                                  It's not good at returning the locations of text (yet), but it's insane at OCR as far as I have tested.

                                                  • undefined 8 months ago
                                                    [deleted]
                                                  • gexla 8 months ago

                                                     Should this be a "Show HN" post? It seems to just be a front end, with no official association with the Llama name. Maybe together.ai gave them cloud space?

                                                    • mg 8 months ago

                                                       I gave it a sentence that I created by placing 500 circles via a genetic algorithm, and then drew with an actual physical circle:

                                                      https://www.instagram.com/marekgibney/p/BiFNyYBhvGr/

                                                      Interestingly, it sees the circles just fine, but not the sentence. It replied with this:

                                                          The image contains no text or other elements
                                                          that can be represented in Markdown. It is a
                                                          visual composition of circles and does not
                                                          convey any information that can be translated
                                                          into Markdown format.
                                                      • Vetch 8 months ago

                                                        Based on the fact that squinting works, I applied a Gaussian blur to the image. Here's the response I got:

                                                            Markdown:

                                                            The provided image is a blurred text that reads
                                                            "STOP THINKING IN CIRCLES." There are no other
                                                            visible elements such as headers, footers,
                                                            subtexts, images, or tables.

                                                            Markdown Content:

                                                            STOP THINKING IN CIRCLES

                                                        As the response is not deterministic, I also tried several times with the unprocessed image but it never worked. However, all the low-pass filter effects I applied worked with a high success rate.

                                                        https://imgur.com/q7Zd7fa
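
                                                        For anyone reproducing this, the preprocessing amounts to a single low-pass filter, e.g. with sharp (the sigma value is an assumption; try whatever survives your own squint test):

                                                            import sharp from "sharp";

                                                            // Gaussian-blur the circle image so the strokes merge into letterforms.
                                                            await sharp("circles.png").blur(3).toFile("circles-blurred.png");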

                                                        • mg 8 months ago

                                                          I guess blurring it is similar to reducing the resolution or to looking at the image from further away.

                                                          It's interesting that the neural net figures out the circles but not the words, because the circles are also not so easily apparent from looking closely at the image; they could just as well be whirly lines.

                                                        • DandyDev 8 months ago

                                                          I can't read this either.

                                                          Edit: at a distance it's easier to read

                                                          • thih9 8 months ago

                                                             If you squint it's easier too. I wonder if lowering the resolution of the image would make the text visible to OCR.

                                                            • pbhjpbhj 8 months ago

                                                               I wonder if you could do a composite image, like bracketed images, and so give the model multiple goes at it, letting it amalgamate the results. You could do an exposure bracket, a focus/blur bracket, maybe a stretch/compression, or an adjustment of font height as a proportion of the image.

                                                               Feed all of the alternatives to the model and tell it they each have the same textual content?

                                                          • ggerules 8 months ago

                                                             Was the original LLM ever trained on material like this?

                                                             Pretty cool use of a genetic algorithm! I'd love to see the code, or at least the reward function.

                                                            • echoangle 8 months ago

                                                              I can’t read anything but the „stop“ either without seeing the solution first

                                                              • wasyl 8 months ago

                                                                Why is it interesting? The image does not look like anything, and you need to skew it (by looking at an angle) to see any letters (barely).

                                                              • sinuhe69 8 months ago

                                                                 Very funny. I put in 3 screen captures of a (long) document and it did relatively well. But when I proofread the output, I realized the AI had made up passages that were not there!

                                                                 The reason is probably the nature of screen capturing: some sentences or paragraphs were cut short. That probably kicked off the "fill in the blank" instinct of the LLM, and it could not resist completing the unfinished paragraphs, LOL. It even put in a short conclusion paragraph that was not in the original document at all!

                                                                • abenga 8 months ago

                                                                  It boggles my mind that a technology where "making things up" is even a remote possibility is ever actually considered for use in the real world.

                                                                • rasz 8 months ago

                                                                  An old scan of the Asus P3B-F motherboard schematic from 1997:

                                                                  - It only managed to extract some of the text from the title block (project name, date, etc.)

                                                                  - Despite the distinct font, it got all the 8/B and 1/I pairs mixed up

                                                                  - The actual useful info got turned into:

                                                                      Tables
                                                                      Table 1: [Insert table 1 here]
                                                                  
                                                                      Other Elements
                                                                      [Insert other elements here]
                                                                  • nash 8 months ago

                                                                    Holy hallucinations, Batman!

                                                                    Even the example image hallucinates random text.

                                                                    • KeplerBoy 8 months ago

                                                                      Same for me. The receipt headline only says "Trader Joe's", and yet the model insists on adding information and transcribes "Trader Joe's Receipt". This is like Xeroxgate, but infinitely worse.

                                                                      Someday this will do great damage in ways we will completely overlook.

                                                                    • cheema33 8 months ago

                                                                      I uploaded a multi-page PDF and it did not know what to do. This was before I went to the GitHub repo and noticed that multi-page PDFs aren't supported. The tool should let the user know when they upload an unsupported file.

                                                                      • constantinum 8 months ago

                                                                        The problem with using LLMs for OCR is hallucination. That makes them impossible to use in business cases such as insurance, banking, and health/medical, which demand high accuracy or at least a predictable inaccuracy rate. Not to mention scale: processing millions of documents with speed at an affordable cost.

                                                                        For all the test use cases mentioned in this thread, I'd suggest trying LLMwhisperer, a general-purpose text pre-processor/OCR built for LLM consumption: https://pg.llmwhisperer.unstract.com

                                                                        • Tepix 8 months ago

                                                                          So, I uploaded an HN screenshot and it showed some rendered text, but where is the Markdown code? A site titled "Document to Markdown" that fails to give me the Markdown? What am I overlooking?

                                                                          • xenodium 8 months ago

                                                                            Japanese OCR to structured content works very well via the ChatGPT API.

                                                                            https://xenodium.com/images/chatgpt-shell-repo-splits-up/jap...

                                                                            Other unrelated examples https://lmno.lol/alvaro/chatgpt-shell-repo-splits-up

                                                                            • amelius 8 months ago

                                                                              I tried it on a Walmart receipt. It misread a 9 as a 0.

                                                                              https://imgur.com/a/ni8zOmb

                                                                              • LeoPanthera 8 months ago

                                                                                I wonder what the watts-per-character is of this tool.

                                                                                • threatripper 8 months ago

                                                                                  Joules per character

                                                                                  • amelius 8 months ago

                                                                                    I'm running this with 60Hz on my HDMI output.

                                                                                    • danielEM 8 months ago

                                                                                      I think it is perfectly fine to describe it in watts per character, as you can easily determine how many characters per second you can process.

                                                                                  • AmazingTurtle 8 months ago

                                                                                     One can combine Apache Tika's OCR with an LLM: feed Tika's text output together with the image into the LLM to fix typos.

                                                                                    • cess11 8 months ago

                                                                                      While I'm a fan of Tika, a lot of people get queasy around Java and XML; they might be better served by their preferred scripting language and https://github.com/ocrmypdf/OCRmyPDF, which uses the same OCR engine.

                                                                                      • AmazingTurtle 8 months ago

                                                                                        May I introduce you to `apache/tika:2.9.2.1-full`, with a REST API on port 9998?
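
                                                                                        A sketch of using it (the -full image bundles Tesseract, so PUTting an image at /tika runs OCR; header names are from the Tika Server docs as I recall them):

                                                                                            // docker run -d -p 9998:9998 apache/tika:2.9.2.1-full
                                                                                            import { readFileSync } from "node:fs";

                                                                                            const res = await fetch("http://localhost:9998/tika", {
                                                                                              method: "PUT",
                                                                                              headers: { "Content-Type": "image/png", Accept: "text/plain" },
                                                                                              body: readFileSync("scan.png"),
                                                                                            });
                                                                                            // OCR'd text, ready to hand to an LLM for typo cleanup.
                                                                                            console.log(await res.text());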

                                                                                        • cess11 8 months ago

                                                                                          Not sure what you mean. Are they making Graal builds you can run standalone now? I only use Tika through Maven at work, so I might not be up to date on what's happening in the project.

                                                                                    • fros1y 8 months ago

                                                                                      Are there any OCR engines out there that actually recognize underlines properly? Even the LLMs seem to struggle to model the underline (though they get the text fine).

                                                                                      • alecco 8 months ago

                                                                                        Is it possible to do this locally with open-source software? I have a lot of accounting PDFs to convert, but due to privacy concerns they should not go through the cloud.

                                                                                        • criddell 8 months ago

                                                                                          Does it have to be open source, or just running locally? The paid version of Acrobat does this well. macOS has pretty good built-in OCR capabilities, and Windows isn't far behind.

                                                                                          If you have the hardware for it, you can run some LLMs locally. Although for accounting data, I probably wouldn’t trust it.

                                                                                          • cess11 8 months ago

                                                                                            Either you need to be somewhat tolerant when it comes to misinterpretations and hallucinations, or you'll be proofreading a lot.

                                                                                            A cheap hack is to push the documents through pdftotext from Poppler and, if nothing or very little comes out, run them through OCRMyPDF and pipe the result to pdftotext. If they're scanned you probably want some flags for deskewing and so on.

                                                                                            To make a bulk load of PDFs mostly greppable it's a decent technique; to get every 0 as a 0 you're probably going to proofread every conversion.
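
                                                                                            In code, the hack is roughly this sketch (--clean needs unpaper installed, and the 100-character cutoff is arbitrary):

                                                                                                import { execFileSync } from "node:child_process";

                                                                                                // Try the embedded text layer first; only OCR when it's missing.
                                                                                                function extractText(pdf: string): string {
                                                                                                  const direct = execFileSync("pdftotext", [pdf, "-"], { encoding: "utf8" });
                                                                                                  if (direct.trim().length > 100) return direct;
                                                                                                  execFileSync("ocrmypdf", ["--deskew", "--clean", pdf, "ocr-" + pdf]);
                                                                                                  return execFileSync("pdftotext", ["ocr-" + pdf, "-"], { encoding: "utf8" });
                                                                                                }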

                                                                                            • Eisenstein 8 months ago

                                                                                              I don't recommend using it for anything important unless you very diligently proofread it, but I made one that runs locally that I linked to elsewhere in this post:

                                                                                              * https://news.ycombinator.com/item?id=42155548

                                                                                              • bugglebeetle 8 months ago

                                                                                                Yes, Docling and Marker do very similar things and can be run fully locally.

                                                                                              • undefined 8 months ago
                                                                                                [deleted]
                                                                                                • generalizations 8 months ago

                                                                                                  How does it handle images? That has seemed to be the major weak point of these doc-to-markdown systems.

                                                                                                  • d1sxeyes 8 months ago

                                                                                                    Seemed pretty good with handwriting. Didn’t make any mistakes with numbers in the sample I tried.

                                                                                                    • burnt-resistor 8 months ago

                                                                                                      I might've broken it as I gave it the Intel developer’s manual combined volumes. }:)

                                                                                                      • joeyblueee 8 months ago

                                                                                                         I get this error in the console when requesting /ocr, along with a 504 status code:

                                                                                                             An error occurred with your deployment

                                                                                                             FUNCTION_INVOCATION_TIMEOUT

                                                                                                        • revskill 8 months ago

                                                                                                           Non-English images are slow.

                                                                                                          • MattDaEskimo 8 months ago

                                                                                                            Dreamt of fine design, layers of code, art refined— found wrappers instead.

                                                                                                            Nothing to see here folks.

                                                                                                            • noduerme 8 months ago

                                                                                                              Um, I just quickly uploaded an unstructured RTF file to this and apparently broke it... unless it's just realllly slow.

                                                                                                              If this is just for converting hand-written documents, maybe put that in the header of the website. Right now it just says "Document to Markdown", which could be interpreted lots of different ways.

                                                                                                              • sumedh 8 months ago

                                                                                                                Site is dead now :(

                                                                                                                • nutlope 8 months ago

                                                                                                                  Should be up, please try again!

                                                                                                                  • mkl 8 months ago

                                                                                                                    It let me upload a file, but didn't produce any output.

                                                                                                                • anothername12 8 months ago

                                                                                                                  We tried this and it was an absolute shit show for us.

                                                                                                                  • cpursley 8 months ago

                                                                                                                    You could have at least provided some constructive feedback...

                                                                                                                  • hrpnk 8 months ago

                                                                                                                     Reading the Llama community license agreement, section "Redistribution and Use", I expected to find "Built with Llama" somewhere. Is this not required?

                                                                                                                    https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instr... links to the community license.

                                                                                                                    • kennethwolters 8 months ago

                                                                                                                      Why don't you think that calling the app "Llama-OCR" is good enough?

                                                                                                                      • sdflhasjd 8 months ago

                                                                                                                         The license is pretty specific, if the API counts as a "service":

                                                                                                                          i. If you distribute or make available the Llama Materials (or any derivative works thereof), or a product or service (including another AI model) that contains any of them, you shall (A) provide a copy of this Agreement with any such Llama Materials; and (B) prominently display “Built with Llama” on a related website, user interface, blogpost, about page, or product documentation.
                                                                                                                      • undefined 8 months ago
                                                                                                                        [deleted]
                                                                                                                      • HaiderAftab1 8 months ago

                                                                                                                        [flagged]

                                                                                                                        • nutlope 8 months ago

                                                                                                                          Thank you!