Show HN: LLMpeg (github.com)
Submitted by jjcm 4 days ago
  • PaulKeeble 3 hours ago

    FFmpeg is one of those tools that is really quite hard to use. The sheer surface area of possible commands and options is incredible, and then there is so much arcane knowledge around the right settings. Its defaults aren't very good and lead to poor-quality output in a lot of cases, and you can get some really weird errors when you combine certain settings. It's an amazingly capable tool, but it's equipped with every footgun going.

    • fastily 23 minutes ago

      ffmpeg has abysmal defaults. I've always been of the opinion that CLI utilities should have sane defaults useful to the majority of users. As someone who has used ffmpeg for well over a decade, I find it baffling that you have to pass so many arguments to get even a remotely usable result.

    • vunderba 3 hours ago

      It's good that you have a "read" statement to force the user to confirm the command, but all it takes is one accidental Enter to end up running arbitrary code returned from the LLM.

      I'd constrain the tool to only run "ffmpeg" and extract the options/parameters from the LLM instead.
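
      Something along these lines (a rough sketch of the idea, not the project's actual code; the_llm_call is made up, and naive word-splitting like this would still break on quoted filter arguments):

        cmd=$(the_llm_call "$*")   # hypothetical: however the script obtains the LLM's reply
        if [[ "$cmd" == ffmpeg\ * ]] && ! printf '%s' "$cmd" | grep -q '[;&|`$<>]'; then
          read -r -a args <<< "$cmd"             # split into argv, no shell re-parsing
          read -rp "Run: $cmd ? [y/N] " answer
          [[ "$answer" == y ]] && "${args[@]}"
        else
          echo "refusing to run non-ffmpeg or compound command" >&2
        fi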

      • mochajocha 2 hours ago

        One option is to simply not run LLM-hallucinated commands. For example, you could read the documentation and write the command yourself, which makes it far more likely to do what you want. Alternatively, you could run arbitrary commands and accept the consequences.

        • jeffgreco an hour ago

          Why did you create this account just to post repeatedly complaining about this project?

          • j45 an hour ago

            Adding an explanation of the parameters and what they do would be a great step as well, so the tool teaches as you go and helps build muscle memory.

        • davmar 4 hours ago

          I think this type of interaction is the future in lots of areas. I can imagine we replace APIs completely with a single endpoint where you hit it up with a description of what you want back. For example, hit up 'news.ycombinator.com/api' with "give me all the highest rated submissions over the past week about LLMs"; a server-side LLM translates that to SQL, executes the query, and returns the results.

          This approach is broadly applicable to lots of domains, not just FFmpeg. Very cool to see things moving in this direction.
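
          Purely as an illustration of the shape of that request (this endpoint does not exist; the whole thing is hypothetical):

            curl -s 'https://news.ycombinator.com/api' \
              --data-urlencode 'q=highest rated submissions over the past week about LLMs'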

          • mochajocha 2 hours ago

            Except you don't need an LLM to do any of this, and doing it without one is already computationally cheaper. If you don't know what results you want, you should figure that out first instead of asking a Markov chain to do it for you.

            • tomrod an hour ago

              I believe this approach is destined for a lot of disappointment. LLMs enable a LOT of entry- and mid-level performance, quickly. Rightfully, you and I worry about the edge cases and bugs. But people will trend towards things that enable them to do things faster.

          • xnx 5 hours ago

            Reminds me of llm-jq: https://github.com/simonw/llm-jq

            • kazinator 4 hours ago

              Parsing simple English and converting it to ffmpeg commands can be done without an LLM, running locally, using megabytes of RAM.

              Check out this AI:

                $ apt install cdecl
                [ ... ]
                After this operation, 62.5 kB of additional disk space will be used.
                [ ... ]
                $ cdecl
                Type `help' or `?' for help
                cdecl> declare foo as function (pointer to char) returning pointer to array 4 of pointer to function (double) returning double
                double (*(*foo(char *))[4])(double )
              
              Granted, this one has a very rigid syntax that doesn't allow for variation, but it could be made more flexible.

              If FFMpeg's command line bugged me badly enough, I'd write "ffdecl".
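
              A toy version of that could be little more than a fixed phrase-to-command table, no LLM involved (the phrases and commands below are made up for illustration):

                case "$1" in
                  "convert "*" to gif")
                    in=${1#convert }; in=${in% to gif}
                    ffmpeg -i "$in" -vf "fps=10,scale=480:-1" "${in%.*}.gif" ;;
                  "extract audio from "*)
                    in=${1#extract audio from }
                    ffmpeg -i "$in" -vn -c:a copy "${in%.*}.m4a" ;;
                  *) echo "phrase not recognized" >&2 ;;
                esac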

              • andreasmetsala 3 hours ago

                > Granted, this one has a very rigid syntax that doesn't allow for variation, but it could be made more flexible.

                That's kind of the killer feature of an LLM. You don't even need to have your fingers in the right place on the keyboard; it will parse the gibberish correctly as long as it's shifted consistently.

                • airstrike 3 hours ago

                  I tell Claude to do things like I have brainrot and it still understands me like "ok, gib fn innew codblock"

                • unleaded 3 hours ago

                  "declare foo as function (pointer to char) returning pointer to array 4 of pointer to function (double) returning double" i would not call English

                  • mochajocha 2 hours ago

                    Terms of art don't stop being English just because they're inscrutable to non-experts.

                    • bdhcuidbebe 2 hours ago

                      That should be crystal clear to the HN crowd, or is that no longer the case?

                    • minimaxir 4 hours ago

                      The system prompt may be a bit too simple, especially when using gpt-4o-mini as the base LLM, which doesn't adhere to prompts well.

                      > You write ffmpeg commands based on the description from the user. You should only respond with a command line command for ffmpeg, never any additional text. All responses should be a single line without any line breaks.

                      I recently tried to get Claude 3.5 Sonnet to solve an FFmpeg problem (write a command to output 5 equally-time-spaced frames from a video) with some aggressive prompt engineering. Its answer seemed internally consistent, but I went down a rabbit hole trying to figure out why it didn't output anything: the LLMs assume an integer frames-per-second rate, which is definitely not the case in the real world!
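
                      For what it's worth, one way around the integer-fps assumption is to ask ffprobe for the duration and seek to explicit timestamps instead (a rough sketch assuming the input is input.mp4; this is not the command Sonnet produced):

                        dur=$(ffprobe -v error -show_entries format=duration -of csv=p=0 input.mp4)
                        for i in 0 1 2 3 4; do
                          # grab one frame from the midpoint of each fifth of the video
                          ts=$(echo "$dur * ($i + 0.5) / 5" | bc -l)
                          ffmpeg -y -ss "$ts" -i input.mp4 -frames:v 1 "frame_$i.png"
                        done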

                      • sdesol an hour ago

                        I asked your question across multiple LLMs and had the answers reviewed by multiple LLMs. DeepSeek Chat said Claude 3.5 Sonnet produced an invalid command. Here is my chat:

                        https://beta.gitsense.com/?chats=197c53ab-86e9-43d3-92dd-df8...

                        Scroll to the bottom of the left window to see Claude acknowledge that the command DeepSeek produced was accurate. In the right window, you'll find the conversation I had with DeepSeek Chat about all the commands.

                        I then asked all the models again whether the DeepSeek-generated command was correct, and they all said no. And when I asked them to compare all the "correct" commands, Sonnet and DeepSeek said Sonnet's was the accurate one:

                        https://beta.gitsense.com//?chat=47183567-c1a6-4ad5-babb-9bb...

                        That command did not work either, but I got the impression that DeepSeek could probably get me to a working solution. After feeding it the errors I kept getting, it eventually wrote a bash script for me that extracts 5 equally spaced frames.

                        Long story short: changing the prompt probably won't be enough, and you will need to constantly shop around to see which LLM is most likely to give a correct response to the question you are asking.

                      • yreg 5 hours ago

                        FFmpeg is a tool that I now use purely with LLM help (and it is the only such tool for me). I do, however, want to read an explanation of what the AI-suggested command does and understand it, instead of just YOLO-running it like in this project.

                        I have had GPT/Llama suggest parameters that would have produced unintended consequences (e.g. a lower-quality video), and if I hadn't read their explanation I would never have known.

                        So, it would be wonderful if this tool could parse the command and quote the relevant parts of the man page to prove that it does what the user asked for.
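
                        A crude version of that doesn't even need the LLM: pull the flags out of the generated command and grep the ffmpeg man page for each one (a rough sketch; the flag extraction here is naive and the command is just an example):

                          cmd='ffmpeg -i in.mp4 -vf scale=1280:-2 -c:v libx264 -crf 23 out.mp4'
                          for flag in $(grep -oE ' -[a-zA-Z:]+' <<<"$cmd" | sort -u); do
                            echo "== $flag =="
                            man ffmpeg | grep -m1 -A2 -- "$flag " || echo "(not found in man page)"
                          done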

                        • fourthark 3 hours ago

                          I always wonder what's the difference between LLMing shell commands and

                            curl https://example.com | sh

                          • yreg 9 minutes ago

                            The difference is in reviewing the output. And the LLM is not a conscious malicious actor.

                            • mochajocha 2 hours ago

                              Running arbitrary LLM output isn't (yet) seen as the terrible idea it is. Give it a few years.

                            • mochajocha 2 hours ago

                              "man ffmpeg" ought to help.

                              • yreg 10 minutes ago

                                If I have to find it in the manual myself then I don't need an LLM assistant to begin with.

                            • fitsumbelay an hour ago

                              Probably more helpful for learning than for actual productivity with ffmpeg, but I really like this project ⚡

                              • dvektor an hour ago

                                This might be the best use of LLMs discovered to date.

                                • jerpint 2 hours ago

                                  Just today, while using ffmpeg, I was thinking how useful it would be to have an LLM in the logs explaining what the command you just ran will do.

                                  • alpb 5 hours ago

                                    I'd probably use GitHub's `??` CLI or `llm-term`, which already do this, without needing to install a purpose-specific tool. Do you provide any specific value-add on top of these?

                                    • lutherqueen 5 hours ago

                                      Probably the fact that the AI only has access to the ffmpeg command is valuable in itself. Much less supervision is needed versus something that could hallucinate an rm -rf in the wrong place.

                                      • stabbles 5 hours ago

                                        Did you look at the implementation? It executes arbitrary code.

                                    • scosman 5 hours ago

                                      I installed Warp, the LLM terminal, and tried to track where it helped. It was crazy helpful for ffmpeg… and not much else.

                                      • j45 an hour ago

                                        I love that this is a bash script.

                                        Long live bash scripts' universal ability to mostly just run.

                                        • behnamoh 2 hours ago

                                          This is redundant; why not just use Simon Willison's `llm`, which can do this too?

                                          * flagged.