• HackerThemAll a day ago

    What people seem to miss is that the interactive chat mode for all the models, including the best and newest (Gemini 2.5 Pro, 2.5 Flash, 2.5 Flash Lite, and older), is totally free. When working from the chat at https://aistudio.google.com/ the entire 1M context window and everything else is free of charge. You really get a very good AI for nothing.

    https://i.imgur.com/pgfRrZY.png

    • 7thpower a day ago

      Funny you mention this, I literally just spent an hour loading up the context window in AI Studio doing some prototyping, and then was frustrated when I couldn't see from billing where I was at (I knew it couldn't be that much, but I still like to know).

      I assumed that because I'm on a paid tier it would still cost something beyond a certain usage amount, but I guess not.

      • undefined a day ago
        [deleted]
        • cma a day ago

          Can you opt out of them training on your data in that free tier?

        • matesz a day ago

          Gemini's free tier allows maybe 5 messages on average, at least for 2.5 Pro, and that is not usable.

          I'm using Claude Pro as my daily driver, plus the Gemini and ChatGPT free tiers.

          • rat9988 a day ago

            > Gemini's free tier allows maybe 5 messages on average, at least for 2.5 Pro, and that is not usable.

            Not on AI Studio.

            • matesz 17 hours ago

              Oh my... I didn't know about AI Studio and didn't expect it to exist. Thanks for correcting!

            • HackerThemAll a day ago

              You are clearly confirming my comment above.

              • undefined a day ago
                [deleted]
                • thomastjeffery a day ago

                  How?

                  • ratg13 a day ago

                    Read the text, click the links, let it sink in

                    • thomastjeffery a day ago

                      I did that, and I assume GP did as well.

                      There is some information you assume you've shared that we are not picking up on.

                      • what_ever 16 hours ago

                        Maybe ask your favorite AI what you are missing. Or maybe ask using AI Studio, as that won't rate limit you ;)

            • dang a day ago

              Related ongoing thread:

              Claude Sonnet 4 now supports 1M tokens of context - https://news.ycombinator.com/item?id=44878147 - Aug 2025 (160 comments)

              • irthomasthomas a day ago

                So sonnet-4 is faster than gemini-2.5-flash at long context. That is surprising, especially since Gemini runs on those fast TPUs.

                • curl-up a day ago

                  Note that in the first test (the only one where output length is reported), Gemini Pro returned more than 3x the amount of text in less than 2x the time. From my experience with Gemini, that time was probably spent mostly on thinking, the length of which is not reported here. So looking at pure output TPS, Gemini is faster, but without clear info on the thinking time/length it's impossible to judge.

                  • jbellis a day ago

                    If they left both on defaults, Flash is thinking-by-default and Sonnet 4 is no-thinking-by-default.

                    • bitpush a day ago

                      > Claude’s overall response was consistently around 500 words—Flash and Pro delivered 3,372 and 1,591 words by contrast.

                      It isn't clear from the article whether the time they quote is time-to-first-token or time to completion. If it's the latter, then it makes sense why Gemini would take longer even with similar token throughput.

                      • lugao a day ago

                        Anthropic also uses TPUs for inference.

                        • irthomasthomas a day ago

                          Do they rent them from Google? Or are they a different brand?

                          • ancientworldnow a day ago

                            Google provides them.

                            • irthomasthomas 14 hours ago

                              Ah cool, I'll have to read up on that; I had thought Google was hoarding them.

                        • netdur a day ago

                            Output tokens must be generated in order (autoregressive decoding); inputs don't have that constraint, so prefill is parallel. With stronger kernels, KV-cache handling, and batching, Claude can outrun Gemini.
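
                            A toy sketch of that asymmetry (not a real transformer, just the data dependency):

                                def attend(query, keys):
                                    # stand-in for attention: any function of the query and all keys
                                    return query + sum(keys) / len(keys)

                                def prefill(prompt):
                                    # every prompt position is known up front, so these calls
                                    # could all run in one parallel, batched pass
                                    return [attend(tok, prompt[: i + 1]) for i, tok in enumerate(prompt)]

                                def decode(cache, n_new):
                                    # inherently sequential: step t needs the output of step t-1
                                    out = []
                                    for _ in range(n_new):
                                        tok = attend(cache[-1], cache)
                                        cache.append(tok)
                                        out.append(tok)
                                    return out

                                cache = prefill([1.0, 2.0, 3.0])  # parallel-friendly
                                print(decode(cache, 4))           # one token at a time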

                        • arnaudsm a day ago
                        • undefined a day ago
                          [deleted]
                          • akomtu a day ago

                            IMO, a good contest between LLMs would be data compression. Each LLM is given the same pile of text, and then asked to create compact notes that fit into N pages of text. Then the original text is replaced with their notes and they need to answer a bunch of questions about the original text using the notes alone.
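
                            A rough harness for that contest, assuming a hypothetical ask(model, prompt) chat-completion helper:

                                def ask(model, prompt):
                                    # hypothetical chat-completion call; wire in any provider here
                                    raise NotImplementedError

                                def compress(model, corpus, n_pages):
                                    # round 1: each model writes compact notes under the same budget
                                    return ask(model, f"Condense this into at most {n_pages} pages of notes:\n{corpus}")

                                def score(model, notes, qa_pairs):
                                    # round 2: answer questions from the notes alone, original text withheld
                                    correct = 0
                                    for question, answer in qa_pairs:
                                        reply = ask(model, f"Using only these notes:\n{notes}\n\nQ: {question}")
                                        correct += answer.lower() in reply.lower()
                                    return correct / len(qa_pairs)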

                            • rafaelmn 16 hours ago

                              Summarization? I'm pretty sure there are benchmarks for this, because people used summarization to build search indexes (at least a few years ago, when I was working on this, they did, and there were benchmarks).

                            • ozbonus 10 hours ago

                              Mess o youxwh to yt h!

                              • daft_pink a day ago

                                  i'm really curious how well they perform with a long chat history. i find that gemini often gets confused when the context is long enough and starts responding to prior prompts, whether using the CLI or its Gems chat window.

                                • XenophileJKO a day ago

                                    From my experience, Gemini is REALLY bad about context blending. It can't keep track of what I said and what it said in a conversation under 200K tokens. It blends concepts and statements together, then refers to some fabricated hybrid fact or comment.

                                  Gemini has done this in ways that I haven't seen in the recent or current generation models from OpenAI or Anthropic.

                                  It really surprised me that Gemini performs so well in multi-turn benchmarks, given that tendency.

                                  • IanCal a day ago

                                      I've not experimented with the recent models for this, but older Gemini models were awful at it: they'd lie about what I'd said or what was in their system prompt, even in short conversations.

                                • koakuma-chan a day ago

                                    I really doubt you can fit all the Harry Potter books in 1M tokens.

                                  • PeterStuer a day ago

                                      The series is 1,084,170 words. At, say, 1.4 tokens per word, that is roughly 1.5M tokens, so it would not fit in a 1M window, but it is getting close.

                                    • magicalhippo a day ago

                                      How do they do if you test[1] them for attention deficit disorder?

                                      [1]: https://www.imdb.com/title/tt0766092/quotes/?item=qt1440870

                                      • koakuma-chan a day ago

                                        It's 2M tokens for Gemini.

                                        • chrismustcode a day ago

                                            That was previous iterations; 2.5 has a 1 million token context window.

                                            https://ai.google.dev/gemini-api/docs/models (the context window is detailed under each model variant section, behind the + signs)

                                            They were meant to crank 2.5 up to 2 million at some point, though; maybe they're waiting for 3 now?

                                          • bredren a day ago

                                            Maybe consuming the resources internally.

                                            • koakuma-chan a day ago

                                              I mean the Harry Potter books are 2M tokens.

                                        • gcr a day ago

                                          The entire HP series is about one million words.

                                          • koakuma-chan a day ago

                                              Harry Potter and the Order of the Phoenix alone is 400K tokens.

                                            • kridsdale3 a day ago

                                                And takes up a proportional width of everyone's bookshelves alongside the others.

                                              • llm_nerd a day ago

                                                Curious, I found an epub, converted it to a txt, and dumped it into the Qwen3 tokenizer. It yielded 359,088 tokens, end to end.

                                                Using the GPT-4 tokenizer (cl100k_base) yields 349,371 tokens.

                                                Recent Google and Anthropic models do not have local tokenizers and ridiculously make you call their APIs to do it, so no idea about those.

                                                Just thought that was interesting.
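
                                                For anyone who wants to reproduce it, the whole experiment is roughly this (the exact Qwen checkpoint is an assumption; use whichever you grab):

                                                    import tiktoken
                                                    from transformers import AutoTokenizer

                                                    text = open("order_of_the_phoenix.txt").read()  # the epub converted to plain text

                                                    # GPT-4 tokenizer (cl100k_base) via tiktoken
                                                    enc = tiktoken.get_encoding("cl100k_base")
                                                    print(len(enc.encode(text)))  # 349,371 for me

                                                    # Qwen3 tokenizer via Hugging Face (model name is an example)
                                                    tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
                                                    print(len(tok.encode(text)))  # 359,088 for me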