I'm beginning to suspect LLMs' viability for search purposes is dependent on your existing search habits. 40% of my search queries are just copy-pasted error messages. Another 10% are business names, for the sole purpose of finding their hours or phone number. Less than 10% are complete clauses or sentences.
I just don't see how an LLM fits my search habits. I tried using ChatGPT for search purposes, and it was dreadful. Incorrect hours for businesses, gibberish results for the copy-pasted error messages, it was an all-around failure.
Have you people been typing complete sentences into Google all these years? Is that where this whole "LLMs will replace search" thing is coming from?
LLMs open up a whole new kind of 'search' for me: the case where you're not sure what you're searching for. For example, when you can't remember the name of a lib but you know roughly what it does (or you have misremembered its name). That is often very time-consuming.
It's also far better than Google for recommending things (really anything), as you can give much more "precise" requirements for the recommendations and then correct them, instead of getting, at best, some SEO spam.
It's obviously not good for stuff that requires real-time updates like opening times, though I suspect that with time OpenAI et al. will combine their own search index + RAG to solve a lot of this.
The elephant in the room, though, is that long term this is going to kill the incentive for a lot of content to be published. AI overviews have definitely reduced the amount of clicking through I do, even though I subconsciously try to 'support' sites by trying to find a link. But often it's right there.
>Have you people been typing complete sentences into Google all these years?
Well somebody has to be responsible for promoting the cancer which converted search boxes from logical set membership definition strings to "Smart™ boxes," designed to piss you off and then direct you to a sponsored product page.
I have never typed a complete sentence into Google and am still very satisfied with the quality of Google search.
But... your comment made me realize, a good use of LLMs with search would be not to ask them directly, but to use them as a front: detect what the user wants and devise the perfect Google search to find it.
LLMs should be very good at discriminating between a plain error message, a business name, and a general open question, building an excellent Google search for it, and then parsing and interpreting the results.
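A minimal sketch of that front, with a hypothetical ask_llm() helper standing in for whatever chat-completion API you'd actually call:

    # Hypothetical sketch: ask_llm is a stand-in for any chat-completion call.
    def route_query(user_input: str, ask_llm) -> str:
        """Classify the query, then return a search-engine-friendly rewrite of it."""
        prompt = (
            "Classify this input as one of: error_message, business_lookup, open_question. "
            "Then, on the final line, output the single best web search query for it.\n\n"
            + user_input
        )
        return ask_llm(prompt).strip().splitlines()[-1]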
It seems that's what Perplexity is doing, mostly. Personally I find it slower and more cumbersome than using Google Search directly, but maybe that's the direction we're all going. Machines to help us use machines.
There are some tasks that I would have used a search engine for in the past, because that was more or less the only option. Say, searching for "ffmpeg transcode syntax" and then spending 15 minutes comparing examples from Stack Overflow with the official documentation to try to make sense of them. Now I can tell Claude exactly what I'm trying to accomplish and it will give me an answer in 30 seconds that's either correct, or close enough for me to quickly make up the difference.
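For a typical H.264/AAC transcode, the kind of answer that comes back looks roughly like this (illustrative only; I'd still sanity-check the flags against the ffmpeg docs):

    ffmpeg -i input.mkv -c:v libx264 -crf 23 -preset medium -c:a aac -b:a 128k output.mp4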
I'm still going to turn to Google to find out what a store's opening hours or phone number are, as well as for a lot of other tasks. But there are types of queries that are better suited to an LLM and that previously could only be done in a search engine.
There's also a non-technical reason for LLM search. Google built its business on free search, paid for by advertising, which seemed like a good idea in the early 2000s. A couple of decades later, we have a better appreciation of the value of the ad-driven business model. Right now there's a whole lot of money being thrown at online LLMs, so for the most part they're not really doing ads yet. It's refreshing to make a query and not have sponsored results at the top of the list. Obviously, the free online LLM business model isn't going to last indefinitely. In the pretty near future, we'll either need to start paying a usage fee or parse through advertisements delivered by LLMs as well. But it's nice while it lasts.
>>>Obviously, the free online LLM business model isn't going to last indefinitely.
I think this is one of those points where LLMs have already changed the paradigm.
People liked being able to search, but would not pay for it. For many queries, the value wasn't there: users still had to scroll through pages or tinker with the query to get the result they needed.
Eventually search got so much worse from SEO spam that Kagi stepped in to fill the void.
LLMs start from a different direction. The value is clearly there. OpenAI et al. still have a ton of paying subscribers.
I do think some will eventually incorporate ads, but I think this innovation has revealed that there's a market, perhaps a substantial one, for fee-based information search with LLMs.
Ads delivered via LLMs will cost more to distribute, which means higher costs for the businesses purchasing ads, perhaps high enough to deter a lot of smaller ad customers, so I think we'll see an interesting dynamic appear there. Especially if the ad-laden, SEO-boosted sites suffer from further enshittification of Search, which has been spinning in its own vicious cycle lately.
I'm mostly in the same boat as you, but yes, I'd say most people type complete sentences (usually questions) into Google. The results for these types of queries are usually:
- An autogenerated info card that usually regurgitates Wikipedia.
- Several SEO blog-spam posts that have the exact query as the title and kinda answer it while trying to sell you something. No sources for more info, of course; just a link to the store or whatever.
With this landscape, LLMs for search kinda make sense. In a way we are already 90% of the way there.
I think the point is not what people type, but what people need... I haven't been typing complete sentences into Google because I know it's not the best way to use Google, but many of the information needs I use Google for are really questions, even if I never typed the question directly and instead tried to find some suitable keywords.
LLMs are indeed not the right tool for finding opening hours or phone numbers (at least for now), but I actually find them more useful than Google for error messages. I guess this depends a lot on what compiler/program is reporting the errors.
What I want is for the LLM to be the last layer of filtering. Basically, I want to take the "regular" search results and ask the LLM whether they are actually a good match, and downrank them if not.
So instead of getting a list of pages that contain that error message the LLM can evaluate which ones are closest to my configuration (are they the same OS, are variable parts of the error message the same or similar to mine, does the page have a solution or at least a discussion, or just one person posting the error with no other info).
Additionally, search engines generally suck at negative phrases. For example, say I'm looking for some JS library that doesn't depend on React. Current search engines really suck at differentiating between "Foo for React" and "Foo, no React required". LLMs are pretty good at this.
That being said LLMs are also imperfect, so I wouldn't want the results that they reject to be completely hidden, but they can be ranked much lower. I think the biggest improvement would come from searches with many candidate pages (like looking for some library that fits my needs) rather than rare ones (like that error that has only been mentioned twice on the internet). But lots of times I am just looking for something with a very specific set of requirements that could be evaluated by an LLM (with pretty good accuracy). That can greatly improve my experience compared to getting all textually matching searches and doing the filtering myself.
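A rough sketch of that downranking idea, with a hypothetical ask_llm() standing in for whichever model you'd call; the scoring prompt and fallback score are purely illustrative:

    def rerank(results, my_context, ask_llm):
        """Downrank, don't drop: score each ordinary search result against my actual
        setup and sort by that score, keeping poor matches at the bottom of the list."""
        scored = []
        for r in results:  # each r: dict with "title", "snippet", "url"
            prompt = (
                f"My setup: {my_context}\n"
                f"Candidate page: {r['title']} -- {r['snippet']}\n"
                "From 0 to 10, how likely is this page to help with MY situation "
                "(same OS, same variant of the error, contains a fix or a discussion, "
                "not just someone reposting the error)? Answer with the number only."
            )
            try:
                score = float(ask_llm(prompt).strip())
            except ValueError:
                score = 5.0  # unparseable answer: leave the result in the middle
            scored.append((score, r))
        return [r for _, r in sorted(scored, key=lambda t: t[0], reverse=True)]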
Similar experience. Whenever I have questions where I know what the answer should look like (e.g. errors, how to do something, ELI5), I will use LLMs. When I use Google now I almost never use the AI overview, because I am almost exclusively trying to discover something that I want from a source. There's also a whole interesting category of search called navigational queries, with low entropy (and thus low ad value): someone just goes to Google to search "espn.com", and they will almost never explore beyond that. I do not understand why people would use Perplexity or anything like it over just ChatGPT, Claude, etc., unless it's a price thing, because it occupies this middle latent space of search I don't find useful.
You should use the right tool for the right purpose. If you're searching for a physical place and want opening hours, reviews and photos, it's Google Maps. If you're looking for updates and photos for a physical place, it's Instagram. If it's detailed articles it's Wikipedia. If it's to define or translate a word, right click and "Look up". Or if it's general search, you have Kagi.
My purpose is Search, and I want one right tool for it. I have had one right tool for my entire life - first Google, and then StartPage since ~2017.
I considered Kagi. I like the concept, and I'm willing to pay for search, but Kagi costs more than I'm willing to pay.
FWIW, kagi uses bing under the hood.
I don't think Kagi uses bing anymore: https://news.ycombinator.com/item?id=36530936
Wikipedia? Are you kidding?
What would you suggest instead?
wiki is adequate most of the time for proper nouns.
It's a good starting point.
"I want to write EDM like David Guetta. Suggest where to start, books, etc for someone who understands music theory"
Good luck wading through whatever Google gives back for that.
Also I find they do well on error messages in general.
The results for this particular query are quite okay actually, a relatively recent reddit thread, a quora page, then some forum and youtube results. Funnily, this very HN post is on the first page of results as well.
It's another tool. Learn to use it, and learn when it shouldn't be used.
Google abandoned the ability to use most search qualifiers effectively, so for niche websites I'll get zero results even with perfect exact-string-match queries, even when the site is in Google's index. On the other hand, if I only vaguely remember the content and no related words or synonyms, Google is unable to turn my fuzzy feelings into an accurate search result. Plus, your ability to filter the "kind" of site is basically non-existent, making it all but impossible to find information on topics vaguely related to any word vaguely related to a product you could potentially buy.
LLMs hallucinate and have their own set of problems, but that orthogonality makes them very useful situationally.
Not too long ago I needed to track down the blog Google references internally in the design doc of their TGIF employee voting platform, regarding Wilson scoring (using confidence intervals instead of means/medians/...). That's very easy to do in Google search if you can remember the right keywords from a decade ago (like Wilson scoring), but otherwise it's impossible. Reframing that problem for an LLM, you'd use plain English to describe everything you know, add a bit of flattery to shift the output distribution to that corner of the internet which actually knows what you're asking, potentially add one sentence to stop this last batch of models from wasting their time actually searching the web, and ask for a list of the top 5 authors and blog titles they think might be correct. That gives you a whole new set of search terms to finish your quest (in my case, the right answer was always in that list of 5, no matter how many times I reran the query).
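(For reference, the Wilson lower bound in question fits in a few lines; z = 1.96 gives the usual 95% interval:)

    import math

    def wilson_lower_bound(upvotes, total, z=1.96):
        """Lower bound of the Wilson score interval: rank by how confident we are that
        the 'true' upvote ratio is at least this high, rather than by the raw mean."""
        if total == 0:
            return 0.0
        phat = upvotes / total
        denom = 1 + z * z / total
        centre = phat + z * z / (2 * total)
        margin = z * math.sqrt((phat * (1 - phat) + z * z / (4 * total)) / total)
        return (centre - margin) / denom

    # wilson_lower_bound(1, 1) ~= 0.21, while wilson_lower_bound(90, 100) ~= 0.83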
That property of having to add additional context (e.g., when asking for recipes, I'll describe the background of who the LLM is roleplaying first) to get a good result is annoying. Full sentences with proper punctuation, unfortunately, also help. I wrote a small tool to make it easier for me to keep track of prompts I found useful and execute them with modifications.
As some other commenters mention, the LLM can help you with XY problems. You're searching for recipes, techniques, nutrition spreadsheets, ..., trying to craft something that meets some set of constraints (e.g., for some hypothetical set of guests you might require: no pork, most dishes have to be vegetarian, most dishes have to be gluten-free, it's fine if it cooks all day so long as that isn't active prep time, you'd prefer to make it as tasty as possible while leaning into cheaper, homestyle cooking, it has to use these red bell peppers I have already, and nutrition doesn't really matter). That's a nontrivial collection of tasks with a proper search engine, almost all of which the LLM is piss-poor at individually, unless you already have a good idea of the kind of dish you want to make. However, if you ask the LLM in two phases to brainstorm a list of 20 meal ideas and then expand on your favorite (that would be a decent time to add any modifications), then that collection of tasks gets done all at once. You _have_ to be able to look at the recipe and decide if it's any good or not, so a beginner probably shouldn't do that, but for everyone else it's a huge time saver.
I mentioned that Google sucks at filtering the "kind" of website you'd like to visit. LLMs handle that great. Like always, you have to be able to handle hallucinations (in this case, by just going to the results and checking if they're any good), but consider a prompt like the following:
> The web nowadays has tons of ad-infested, profit-driven, barely legible SEO drivel -- even in the top 100 search results and even from huge sites -- but all the old websites like Sheldon Brown's wealth of bicycle knowledge are still out there. List the top three old-web resources I'd want to read to learn about grafting apple trees.
When I ran that, I got one commercial result, one journal with a wealth of paywalled information, and one forum with a huge collection of free information about the particular species of apple I'm interested in.
For "apple grafting" in particular, Google does just fine (a bit better arguably since the 1st Google result was the 2nd LLM result, and it's the only one I really cared about), but the more commercialized the knowledge you're looking for is the more the LLM shines out in comparison.
Don't believe for a single second the marketing speak about these two engines; they are both total crap trying to gain users with whatever hype topic is around.
Qwant used to pretend to be a champion of privacy, a "French-made" tech, but its search engine was mostly based on Bing, it lobbied with Microsoft against our interests, and its boss sucked up as much public funding as possible to finance luxury HQ locations and lavish cars...
Qwant collapsed and was sold, and now it is just a mediocre ghost product trying to capitalize on a stained brand and on being a French/European alternative.
Qwant is just burning French taxpayer money in an effort to "compete".
Is it somehow related to the Quaero search engine project that burnt 400 million € of taxpayer money in the early 2000s?
https://en.wikipedia.org/wiki/Quaero
https://www.heise.de/news/400-Millionen-Euro-fuer-europaeisc...
It is not directly related, except that the dynamic is the same and it is a repeat of the previous attempt.
There are two things that are all too common in Europe:
Politicians injecting a shitload of public money, thinking that if you give the money you will be able to reproduce American companies' success.
In the end, the money is wasted for their own interests by big groups, intermediaries, and opportunists. When the thing fails, it is no one's fault; it was just "too hard" and "maybe the budget was not enough for such a subject".
This is totally different from what drives innovation like Google's, where you have doers who create something first, and once they can show or convince people that they have a breakthrough, money flows in by itself.
And at the beginning, the cash is used for brains and development instead of big salaries for top management and political friends.
The second usual thing is the pattern with these kinds of projects:
- corporates suck up all the money
- responsibility is shared between multiple actors to spread the blame in case of problems
- the project fails and the corporates give up: "not their fault"
- one year later the initial hype subject is back on the table (European search engine sovereignty, for example) and politicians announce that they will spend that much more money to resolve it
- the same corporate vampires start again from zero...
- and it fails the same way, in a loop
>- and it fails the same way, in a loop
Because that's the desired feature, not a bug. The system works just as intended. None of the people at the helm actually believe they can create competitors to the US giants; the goal is to funnel public tax money into the right politically connected private-industry pockets.
It's just wealth redistribution with a veneer of "sovereignty", similar to large infrastructure projects, except a lot more profitable. More people understand physical infrastructure, so it can more easily be scrutinized for corruption, but almost nobody understands IT infrastructure, so it can easily be gamed as a bottomless pit for your tax euros that constantly fails in a loop while the losses are socialized and the winnings privatized.
And usually at the end of a public infrastructure project you have something of some benefit to the public, even if the ROI is bad. With tech stuff that goes nowhere, at best you have a jobs programme and some short-term PR for politicians.
The fervent wish to only let deserving people get grants results in a huge amount of box-ticking and self-promotion that (in my experience) seems to select for self-promoting parasites rather than people who want to make useful products and get rich from customers instead of government funds.
Yep. Reminds me of how the internet started.
Exactly. The worst thing that can happen for a project like this is that it becomes successful. The goal is that it fails completely, so that politicians and friends can try again in a couple of years with a massive new load of tax payer money. What about the old projects, why did they fail? Those are questions that only traitors and Russian assets would ask.
Tbf, it's not like this approach can't work at all. Airbus was born out of a similar dynamic, and it's giving Boeing a run for its money now. Afaik France has a couple of other giants in technology-heavy industries such as shipping or mining, but I couldn't speak to their success. What seems clear by now, though, is that the approach isn't suitable for "tech" (in the typical SV sense of the word), especially consumer-centric.
Airbus was a company set up by consolidating companies controlled by some of the most powerful countries in the world, which sold planes to captive state airlines and militaries controlled by those same governments and their allies.
What an insane comparison.
>Tbf, it's not like this approach can't work at all. Airbus was born out of a similar dynamic, and it's giving Boeing a run for its money now.
It literally can't work at all. When was the last time you went and bought an Airbus? Airbus doesn't make consumer products. Passengers are the consumers flying inside them, but they're not the ones buying them; it's the airlines, who only have a duopoly of two global players to choose from in a highly regulated industry with expensive moats to enter, meaning Airbus and Boeing don't really need to compete cut-throat.
Governments excel at building large infrastructure and defense companies like Airbus, Boeing, what have you, not at building consumer products at scale like Google, Apple, etc., since their success is dictated by consumer spending preferences, not by requirements a government makes up.
Communist regimes did not make the best consumer products, the free market did.
> Politicians injecting a shitload of public money, thinking that if you give the money you will be able to reproduce American companies' success
The problem with public funding is not the "what" but the "to whom".
It works fine if you put the right person in charge.
However, there are very few signals to prevent the wrong person being put in charge, as public funding removes most of the considerations and incentives private industry uses. And those are themselves already tenuous!
It's not just that there are few signals to prevent the wrong person being put in charge; this kind of government bureaucracy actively selects for the wrong person. These kinds of government IT projects are often soul-sucking to work on, and so they attract a specific kind of applicant.
It's not only that; they prefer to give a ton of money to a single entity with "market experience" and all the good-looking paperwork already in place. It would work better as lots of small amounts of money to individuals with a solid business case (think work plan, expected value, and other such things) but without the paperwork: think of a kid who just finished their studies, or someone who gained experience freelancing and has an interesting business idea but not the money to handle the paperwork. Anyway, at least that's how I feel towards SV "tech".
Sadly, our EU kids are more interested in being the next social media influencer or going on a world trip than founding a business, because the paperwork and legal burden will age them faster than drugs and will likely fine them into bankruptcy over that one arcane reporting requirement they missed about that €20 to the tax office.
The EU needs something akin to Stripe Atlas, but that is not what the politicians want, because they want the EU to be manufacturing industry only; you can always import tech from other places ... <shaking head emoji here>
I used Qwant a lot in its early phase, but then they decided to become yahoo-esque and also removed the lite version, hence I went back to ddg and brave.
What does yahoo-esque mean in this context?
Before, it was lite; now the page is fairly bloated and feels like the early days of Yahoo/MSN instead of Google. Sadly the same is transpiring with other Bing wrappers (see ecosia-dot-org), so no high hopes.
That's usually the case with companies that spout stupid buzzwords like this. You know what's not green? A data warehouse storing all those indexes. But they think they can gain followers with their buzzwords.
Every new search engine I've seen was a Bing wrapper with sometimes light reranking.
I understand that competing with Google was borderline impossible a decade ago. But in 2024, we have cheap compute, great OSS distributed DBs, powerful new vector search tech. Amateur search engines like Marginalia even run on consumer hardware. CommonCrawl text-only is ~100TB, and can fit on my home server.
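For scale, pulling one month's text-only (WET) extract really is a small job. A rough sketch, where the crawl ID and bucket layout are assumptions to check against index.commoncrawl.org:

    import gzip
    import requests

    CRAWL = "CC-MAIN-2024-33"  # assumed crawl ID; pick a current one from index.commoncrawl.org
    BASE = "https://data.commoncrawl.org/"

    # Each crawl publishes a gzipped list of its WET (extracted plain text) files.
    paths = gzip.decompress(
        requests.get(BASE + f"crawl-data/{CRAWL}/wet.paths.gz", timeout=60).content
    ).decode().splitlines()
    print(len(paths), "WET files in this crawl")

    # Grab the first shard; loop (or parallelize) over `paths` to mirror the whole month.
    with open("sample.warc.wet.gz", "wb") as f:
        f.write(requests.get(BASE + paths[0], timeout=300).content)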
Why is no company building their own search engine from scratch?
>competing with Google was borderline impossible a decade ago. But in 2024, we have cheap compute, great OSS distributed DBs, powerful new vector search tech. [...] CommonCrawl text-only is ~100TB,
Those three examples you list of today's improved 2024 computing power aren't even enough to process Google's scale of 14 years ago, in 2010, when the search index was 100+ petabytes: https://googleblog.blogspot.com/2010/06/our-new-search-index...
A modest homegrown tech stack of 2024 can maybe compete with a smaller Google circa ~1998, but that thought experiment handicaps Google's current state of the art. Instead, we have to compare OSS-today vs Google-today. There's still a big gap between the 2024 OSS tech stack and Google's internal 2024 tech stack.
E.g. for all the billions Microsoft spent on Bing, there are still some queries that I noticed Google was better at. Google found more pages of obscure people I was researching (obituaries, etc). But Bing had the edge when I was looking up various court cases with docket #s. The internet is now so big that even billion dollar search engines can't get to all of it. Each has blindspots. I have to use both search engines every single day.
I was talking about text-only, filtered and deduped content.
Most of Google's 100PB is picture and video. Filtering the spam and deduping the content helped Google reduce the ~50B page index in 2012 to ~<10B today.
Where are these figures coming from?
But what if I don't want to search Reddit, Stack Overflow, and blogs from the early 2000s, and all the content you just threw away as irrelevant actually contains the information I am looking for? There is an entire working generation that never heard a modem sound and has never even made a consideration for making sure their content is accessible in plaintext.
I'm sure all the LLM providers are already considering this, but there's so much important information that is locked away in videos and pictures that isn't even obvious from a transcript or description.
There is still large opportunity. Most of my searches are for plain text information.
> But what if I don't want to search Reddit, stack overflow, and blogs from the early 2000s
That is a strawman. There are huge numbers of websites (including authoritative ones like governments and universities) and a lot of content.
> There is an entire working generation that never heard a modem sound and has never even made a consideration for making sure their content is accessible in plaintext.
If they want video they will do the same as everyone else and search Youtube. Different niche.
> I'm sure all the LLM providers are already considering this, but there's so much important information that is locked away in videos and pictures that isn't even obvious from a transcript or description.
That is true, but if you are getting bad search results (and the market for other search engines is people who are not happy with Google and Bing results), that does not help much, as you are not seeing the information you want anyway.
> That is a strawman. There are huge numbers of websites (including authoritative ones like governments and universities) and a lot of content.
Ya know... a search engine that was limited to *.gov, *.edu and country equivalents (*.ac.uk, etc) would actually be pretty useful. Ok, I know you can do something like it with site: modifiers in the search, but if you know from the beginning you're never going to search the commercial internet you can bake that assumption into the design of your search engine in interesting ways.
And the spam problem goes away.
Hmm.
That explains why you were 10 times more likely to find something 15-20 years ago than you are today. They reduced the size by dropping a lot of sites and not crawling as much. You'd expect Google to be at 100PB x 100 with the growth of users and content over that time period. Someone made the decision to prioritize a smaller size over a more complete index, and some A/B test was run and turned out well.
Just serving up content from Reddit and HN and a few other websites would be enough to beat Google for most of us. Sprinkle in the top 100 websites and you have a legitimate contender.
There is no open web anymore. Google killed it. There are probably fewer than 100k useful websites in the world now. Which is good for startups, because the problem is entirely tractable.
It's all very regional. Despite its name, the World Wide Web is aggressively local. Some properties are global, but after a handful it's all country/language/region based.
No matter what type of market analysis I do, I almost invariably find there's something different that, say, the Koreans or the Europeans are using. The Yelp of Japan is Tabelog, the Uber Eats of the UK is Deliveroo, the Facebook of Russia is vk.ru, etc.
That's really the beachhead to capture: figure out what a "web region" is for a number of query use cases and break in there.
Search engines only searching the top 100 websites is like, the opposite of the way I want things to go...
Reddit is a good example of a company that is territorial about its content being indexed or scraped. I can't even access it via most of my VPN provider's servers anymore due to them blocking requests.
I dunno. Everybody's got their own personal long tail, and everybody's got a different long tail.
(I don't think the open web is dead, but it's looking awfully unwell).
I liken Google Search and YouTube to how Blockbusters video rental stores used to operate.
If you went into Blockbusters then there was actually a small subset of the available videos to rent. Films that had been around for decades were not on the shelves yet garbage released very recently would be there in abundance. If you had an interest in film and, say, wanted to watch everything by Alfred Hitchcock, there would not be a copy of 'The Birds' there for you.
Or another analogy would be a big toy shop. If you grew up in a small town then the toy shop would not stock every LEGO set. You would expect the big toy shop in the big city would have the whole range, but, if you went there, you would just find what the small toy shop had but piled high, the full range still not available.
Record shops were the worst for this. The promise of Virgin Megastore and their like was always a bit of a let down with the local, independently owned record shop somehow having more product than the massive record shop.
Google is a bit like this with information. Youtube is even worse. I cottoned on to this with some testing on other people's devices. Not having Apple products, I wanted to test on old iPads, Macbooks and phones. For this I needed a little bit of help from neighbours and relatives. I already knew I had a bug to work around, and that there was a tutorial on Youtube I needed in order to do a quick fix so I could test everything else. So this meant I had to open Youtube on different devices owned by very different people, with their logged-in accounts.
I was very surprised to see that we all had very similar recommendations to what I could expect. I thought the elderly lady downstairs and my sister would have very different recommendations to myself, but they did not. I am sure the adverts would have been different, but I was only there to find a particular tutorial and not be nosy.
I am sure that Google have all this stuff cached at the 'edge', wherever the local copper meets the fibre optic. It is a model a bit like Blockbusters, but where you can get anything on special request, much like how you can order a book from a library for them to get it out of storage for you.
The logical conclusion of this is to have Google text search becoming more like an encyclopedia and dictionary of old, where 90% of what you want can be looked up in a relatively small body of words. I am okay with this, but I still want the special requests. There was merit in old-school Alta Vista searches where you could do what amounts to database queries with logical 'and's 'or's and the like.
The web was written in a very unstructured way, with WYSIWYG being the starting point, with nobody using content sectioning elements to scope headings to words. This mess suits Google as they can gatekeep search, since you need them to navigate a 'sea of divs'.
Really, a nation such as France, with a language to keep, needs to make information a public good, with content structured and information indexed as a government priority. This immediately screams 'big brother', but it does not have to be like that. Google are not there to serve the customer; they only care about profits. They are not the defenders of democracy and free speech.
If a country such as France or even a country such as Sweden gets their act together and indexes their stuff in their language as a public good, they can export that knowhow to other language groups. It is ludicrous that we are leaving this up to the free market.
> It is ludicrous that we are leaving this up to the free market.
If you leave it up to the government, inevitably you're going to get only information approved by the people in power in that government.
You could call that search engine "Pravda".
You don't have to match Google's technical prowess if that capability is being superseded by MBAs doing aggressive enshittification.
You'd be surprised how long it takes to enshittify a piece of tech as well established as Google. The MBAs may be trying but there are still a lot of dedicated folks deep in the org holding out.
It's funny. I usually can't tell that from the quality of the search results.
Compared to what?
Compared to Google, X years ago, for example. Unless I'm mixing threads up, that's what we're talking about anyway: the degradation of Google's search results.
Yeah, the problem is that Google today is still generally better than its competitors today, even though Google today is worse than Google yesterday.
Result quality of Kagi is miles ahead of Google
Indeed
The article you linked doesn't say anything about 100 petabytes
>The article you linked doesn't say anything about 100 petabytes
Excerpt from the article: >[...] Caffeine takes up nearly 100 million gigabytes of storage in one database and adds new information at a rate of hundreds of thousands of gigabytes per day. [...]
Your comment did make me pause and sanity check the math: https://www.google.com/search?q=%22100+million+gigabytes%22
In any case, a lot of people translated "100 million gigabytes" to "100 petabytes" based on that blog : https://www.google.com/search?q=google+search+index+estimate...
What's the current best estimate of its size now in 2024?
To properly compete with Google search you have to:
1. Create your own index/crawl and true search engine on top of it, rather than delegating that back to bing or Google. In doing so, solve or address all the related problems like SEO/spam, and cover the “long tail” (expensive part)
2. Monetize it somehow to afford all of the personnel/hardware/software costs. Probably with your own version of Google’s ad products.
3. Solve user acquisition. Note that Google’s user acquisition process involves multiB/yr contracts with Apple and huge deals with other vendors who control the “top of funnel” like Mozilla Firefox, and that this is the sole purpose of chrome/android/chromebook/etc who you’ll never be able to make a deal with. You will probably at the very minimum need to implement your own platform (device, OS, browser alone probably won’t cut it).
4. Solve the bootstrap problem of getting enough users for people to care about letting you index their site, without initially being able to index a lot of sites, because you are not important enough.
5. Somehow pull all of this off in plain sight of Google (this would take many years to build, both technically and in terms of users) without them being able to properly fend you off
6. Somehow pull all of this off in spite of the web/web search seeming like it’s going to die off or fade into irrelevance
OR you can dedicate a decade of your life to something more likely to succeed.
I've heard more or less the same thing said about IBM, Walmart, Microsoft, etc.
There is one way to compete with Google search. Google search is a general search engine. But what if you only wanted medical information? information about cars? information about physics? electronics? history? etc.
Specializing makes for a much smaller scope of the problem, and being specialized means it could deliver more useful results.
For example, imdb.com. I don't ask google about movie stuff, I go to imdb.com because it specializes in movie info.
There are already extensive databases with search functions in most industries…?
Some of them charge a lot for access but they certainly exist.
I never go to IMDb directly as both it and its search function are noticeably slower than Google’s. So I just tend to Google “<movie name> IMDb” and click the first result.
I'd like to add to your #5. If Google deems you a legitimate threat, then they can just de-crapify their own search for a bit by going back to their old algo. It's extremely easy for them to fight back.
Is that true though? I think a big part of the algorithm's evolution is to combat undesirable forms of SEO. I would think rolling back the algorithm to some older generation would make the results quite spammy.
That would be a victory for users.
A temporary reprieve, more like. Google will just wait for the competitor to fail and then go back to their old ways.
Yeah, that is what I meant. :(
I often wonder if banning (or very strongly penalizing) websites with affiliate links would largely get rid of spam results. Sure, some very useful websites have affiliate links, but perhaps it would be worth it to at least let users hide them?
I don't think it's a solution. The average person seems to already have a strong dislike of ads, or paying for something that would otherwise be free (likely funded by ads or aff links)
A solution could be more than one algorithm being used to rank results, i.e. other engines, other rules. They'll likely use many of the signals available to them that Google uses for quality and relevance, but it's highly unlikely a genuine alternative search engine would rank them exactly the same, and much more unlikely an SEO could rank well in multiple engines.
The aff links aren't the problem, it's the proliferation of pages that are created solely to rank and get the links clicked on. Sometimes the content is useful, sometimes it's padded nonsense.
The solution is to force feed all affiliate linked sites into a living archive LLM that digests them into a summary stripped of all links.
You run this bloated mass as a co-engine to your search. If you stumble upon any of the digested sites or articles, you just have the site-eater blob regurgitate an HTML re-creation locally. They get no traffic, and you get whatever they wrote, purged of all links and with optional formatting to remove the fluffy BS copywriting that many of these sites pad the tiny core of usefulness with.
> You will probably at the very minimum need to implement your own platform (device, OS, browser alone probably won’t cut it).
What do you mean? Google gained dominance just being a dot com URL. This in a time when competitors were already very well established.
Google got big when being a dot-com URL meant something and the primary way the average person accessed the Internet was through a desktop PC. Neither of those things is true anymore.
That's why you've got to have an app. People install more apps than ever before; it's completely normal.
I don't think people associate search with an app. Search is something built in to the operating system or browser (a distinction which is disappearing too).
People use apps more than they use a browser. It's not a big deal to reform search to be an app, it will be natural. People listen to podcasts without knowing anything about mp3 files or file systems, they edit and upload their photos without knowing about jpegs, etc.
Imo you need to do all of this, plus be compellingly different. Nobody is going to beat Google playing Google's game.
The Chinese did it. Nationalism is a powerful force.
You missed out the only thing that's actually important: give users a compelling reason to use your site over Google.
Google's dominance will never be upended by an incremental improvement.
How do you let the 99.999% of the population know that your site is better? I have seen friends and family now search directly in FB; if it does not return anything satisfactory, they open the default browser that came with their device and type the query into the address bar, so whatever search engine came preset as the default on their device gets the traffic.
The only people I see manually typing the URL are my office colleagues (mostly engineers of some sort); even management does the same (skipping the FB part, obviously), and you can see it when they are sharing their screen and want to search for something.
To pull people out of Google's moat, the first step is becoming their default search engine.
This is backwards. The first step is innovating and creating a radically better search experience. This is then easy to market (and is likely to spread by word of mouth). Users love it, and it becomes the main way they search.
E.g. see how ChatGPT grew.
I'm in my second year with Kagi and loving it. I actually just upgraded to get Kagi Assistant (basically, cloud access to every LLM out there). But the search alone is worth every penny, and it's built/operated fully in-house as far as I know.
They are highly dependent on outside search engines. Someone from Kagi gave an explanation on HN of their search costs and why they can't go lower, and calling the Google API on many (most?) search queries was a major driver of their costs.
It's great that they are developing their own index, but I'm skeptical that it makes up more than a tiny fraction of what they can get from Google/Bing. DDG has been making similar claims for years but are still heavily reliant on Bing.
This isn't to knock on upstart search engines. I think that Google Search has declined massively over the past 5-10 years and I rarely use it. More competition is sorely needed, but we should be clear-eyed about the landscape.
Being dependent only as a fallback seems like the right solution: provide your own index, but vector in the default if you don't get it right.
That way users get tailored search without losing scale.
As much as I like Kagi and wish it success, it's not a search engine built from scratch. Kagi uses other search engines (Google and Bing), wraps them, and does light reranking.
Kagi is also building their own index in the background, and mixes these indexes as you search.
When I search Kagi for "Hacker News", results start with this fine text:
65 relevant results in 1.09s. 47% unique Kagi results.
So, other indexes are fillers for Kagi's own index. They can't target their bots to places, because they don't have the users' search history. They can only organically grow and process what they indexed.
How is it possible that a search for "Hacker News" produces only 65 results? There are thousands of pages out there with that exact phrase on them (including many sub-pages of this site).
The first result is almost assuredly the right one, but either they're ruling out a lot of pages as not-what-you-meant, or their index is really small.
It's another feature of Kagi. They know they have thousands of results, but they provide you a single page of most relevant results. If you want to see more because you exhausted the page, there's a "more results" button at the bottom.
Kagi reduces mental load by default, and this is a good thing.
That's an interesting positive spin on what is clearly a cost-cutting feature. I'm on the unlimited search tier, so it hasn't really been a big deal, but it's worth noting that clicking the "more results" button charges your account for an additional search too. Or at least it used to.
I'm also on the unlimited search tier, and honestly never needed to reach for the more results button even once. Kagi always delivered.
Having a low number of results has a net benefit of lower cognitive load for me, so I like how Kagi returns fewer results, not more.
But "more results" is counted towards a new search, and I didn't know that. Thanks for pointing out.
Key differentiator: Kagi still properly responds to the negation sign and quotes in your search terms. This is why I pay for it. The signal-to-noise ratio is way higher than with other engines.
Light reranking is a huge understatement, given the amount of preferences they let you manually set for ranking.
No they do not, they have their own indexes.
https://help.kagi.com/kagi/search-details/search-sources.htm...
That page includes the text "Our search results also include anonymized API calls to all major search result providers worldwide".
They source results from lots of places including Google. One way that you can confirm this is to search for something that only appears in a recent Reddit post. Google has done a deal with Reddit that they're the only company allowed to index Reddit since the summer.
DuckDuckGo gets no answers if you specify only results from the last week: https://duckduckgo.com/?q=caleb+williams+site%253Areddit.com...
Kagi is fine if you do the same: https://kagi.com/search?q=caleb+williams+site%3Areddit.com&d...
edit: I don't think this is a bad thing for Kagi. I'm a very happy subscriber, and it's nice for me that I still get results from Reddit. They're very useful!
Note that due to adversarial interoperability, search engines other than Google can scrape Reddit if they try hard enough. A rotating residential proxy subscription, while pricey, likely still costs orders of magnitude less than what Google paid. The same goes for Stack Overflow. You can also DIY by getting a handful of SIM cards. CGNAT, usually a scourge, works in your favour for this application since Reddit can't tell the difference between you loading 10000 pages and 10000 people on your ISP loading one page each (depending on the ISP)
I loved kagi, but their cost is prohibitive for me.
The only way there is a chance for me to afford Kagi might be to buy "search credit" without a subscription and without minimum consumption. And then it would only be good if they allowed more than 1000 domain rules and showed more results (when available)
The search credit model sounds great. I would probably also pay Youtube credit. I use it too rarely that paying their monthly rate makes sense. The user experience with ads sucks, so I further try to reduce my usage. At least that's good for the environment and I can do more useful things.
Something like this? https://athenut.com/
If $10 is too much to afford, the problem is not really Kagi. You need to improve your economic situation urgently, because that is starvation level poverty. Seeing that you're posting on a message board like this, I assume you have at least average talent and capabilities to have a paid occupation.
"CommonCrawl [being] text-only is only ~100TB, and can fit on my home server."
Are any individual users downloading CC for potential use in the future?
It may seem like a non-trivial task to process ~100TB at home today but in the future processing this amount of data will likely seem trivial. CC data is available for download to anyone but, to me, it appears only so-called "tech" companies and "researchers" are grabbing a copy.
Many years ago I began storing copies of the publicly-available com. and net. zonefiles from Verisign. At the time it was infeasible for me to try to serve multi-GB zonefiles on the local network at home. Today, it's feasible. And in the future it will be even easier.
NB. I am not employed by a so-called "tech" company. I store this data for personal, non-commercial use.
Is Common Crawl updated frequently?
They do monthly-ish releases https://index.commoncrawl.org/
Mostly because Google bought, developed, acquired or effectively controls all the major distribution points through default placement deals: e.g. Apple, Samsung, Chrome, Android, Firefox. In time, though, remedies are coming from the antitrust case lost versus the DoJ.
Another major factor is that building a search index and algorithms that search across billions of pages with good enough latency is very hard. Easy enough at the tens-of-millions scale, but a different challenge at billions.
Some claim(ed) that click-query data is needed at scale, and are hoping for that remedy. Our take: what is the point of replicating Google? Anyway, will this data be free or low cost? You know the answer.
Cloud infrastructure is very expensive. We save massively on costs by building our own servers, but that means capital outlay.
Remember that MS lost their antitrust case too, and after the presidential transition in 2000, it was pretty much dropped with few consequences for MS.
If I had to guess I'd say history is going to repeat itself and Google will also escape any significant consequences.
Maybe, maybe not. Remember that Google is "woke" - an enemy of the faction that now finds itself in power. They might continue the lawsuit to set an example.
Because nowadays, more than ever, the content you need is in silos.
Your Facebooks/Twitters/Instagram/Stack Overflow/Reddit... They all have limited, expensive APIs, and they have bulk-scraping detection. Sure, you can cobble together something that will work for a while, but you can't run a business on that.
Additionally, most paywalled sites (like news) explicitly whitelist Google and Bing, and if someone creates a new site, they do the same. As an upstart you would have to reach out to them to get them to whitelist you, and you would need to do it not only in the USA but globally.
Another problem is Cloudflare and other CDNs/web firewalls, so even trying to index a mom-and-pop blog could be problematic. And most mom-and-pop blogs are nowadays on some blogging platform that is just another silo.
Now that I think about it, Cloudflare might be in a good position to do it.
The AI hype, and the scraping for content to feed the models, have made it even harder for anyone new to start a new index.
This is the best (and saddest) answer. LLMs break the social contract of the internet, we're in a feudalisation process.
The decentralized nature of the internet was amazing for businesses, and monopolization could ruin the space and slow innovation down significantly.
> LLMs break the social contract of the internet
The legal concept of fair use has been and is being challenged, and will be tested in court. Is the Golden Age of Fair Use Over? Maybe [0].
[0] https://blog.mojeek.com/2024/05/is-the-golden-age-of-fair-us...
While LLMs have accelerated it, it was already the case that silos were blocking non-Google and non-Bing crawlers before LLMs. LLMs have only made the web's existing problems worse, but they were problems before LLMs too, and banning LLMs won't fix the core issues of silos and misinformation.
You're thinking too much by the rules. You can absolutely scrape them anyway. Probably the biggest relevant factor is CGNAT and other technologies that make you blend in with a crowd. If I run a scraper on my cellphone hotspot, the site can't block me without blocking a quarter of all cellphones in the country.
If the site is less aggressively blocking but only has a per-IP rate limit, buy a subscription to one of those VPNs (it doesn't matter if they're "actually secure" or not - you can borrow their IP addresses either way). If the site is extremely aggressive, you can outsource to the slightly grey market for residential proxy services - for fifty cents to several dollars per gigabyte, so make sure that fits in your business plan.
There's an upper bound to a website's aggressiveness in blocking, before they lose all their users, which tops out below how aggressive you can be in buying a whole bunch of SIM cards, pointing a directional antenna at McDonald's, or staying a night at every hotel in the area to learn their wi-fi passwords.
> You're thinking too much by the rules. You can absolutely scrape them anyway. Probably the biggest relevant factor is CGNAT and other technologies that make you blend in with a crowd. If I run a scraper on my cellphone hotspot, the site can't block me without blocking a quarter of all cellphones in the country.
I am familiar with most of that, and there is a BIG difference between trying to find a workaround for one site that you scrape occasionally and finding workarounds for all of the sites.
Big sites will definitely put entire ISPs behind annoying captchas that are designed to stop exactly this (if you ever wonder why you sometimes get captchas that seem slow to load, have long animations, or other annoyingly slow behaviour, that is why).
And once you start making enough money to employ all the people you need for doing that consistently, they will find a jurisdiction or 3 where they can sue you.
Also, good luck finding residential/mobile ISPs that will stand by and not try to throttle you after a while.
You can definitely get away with all of that for a while, but you absolutely can't build a sustainable business on it.
There are many rationalizations to not try.
And JavaScript/dynamic content. Entrenched search engines have had a long time to optimize scraping for complex sites
Kagi is not one of these and I love it enough to pay for it. It has kept the semi-"advanced" features of respecting the negative sign and emphasizing results that match quoted terms in the search query.
I'm guessing there's no money in it unless you glue an ad machine to the side and use search to drive advertising.
Brave search is doing it properly. Bought tech, but still.
Very recently found out about Brave Goggles. An amazing way to give control to users, e.g. blocking Pinterest, or searching only domains popular with HN or your own list.
Did not know about this, this is what I've wanted Google to do for so long. Pinterest, etc. will be banished to the nether realm.
> Bought tech
Could you please elaborate?
Brave Search is a continuation of Cliqz, a German company that developed a proper search engine with an independent index. They shut down, but the tech got sold off.
Cliqz was the first time for me that a Google alternative actually worked really well - and it, or now brave search, is what parent was asking for :)
Yet we're still back to Larry Page and Sergey Brin's conclusion in their "The Anatomy of a Large-Scale Hypertextual Web Search Engine" research paper[0]:
> We expect that advertising funded search engines will be inherently biased towards the advertisers and away from the needs of the consumers.
Brave Search Premium hasn't been around nearly as long as their free tier serving ads, and I'm not confident this conflict of interest is gone.
Having independent indexes is a win regardless though.
[0] https://snap.stanford.edu/class/cs224w-readings/Brin98Anatom...
Brave Search Premium launched before we showed any search ads:
It's worth noting that building a solid search index needs more than just text. The full Common Crawl archive includes metadata, links, page structure, and other signals that are needed for relevance and ranking, and to date is over 7 PiB, so that's rather more than tends to fit on home servers
It occurs to me that competing with Google on their home turf and at their scale might be impossible. But doing what Google was good at, when they just a search engine, might not be all that much harder today than it was back then. And it may fly "under the radar" of Google's current business priorities.
I'm thinking about how Etsy took "the good bit" of Ebay and ran with it. Or how tyre-and-exhaust shops take "the profitable bit" of being a mechanic and run with it. I know there's a name for this, but it eludes me.
It could be they also took the bit that eBay abandoned. There was a point when eBay announced to the business press that they were going to de-emphasize the "America's yard sale" aspect, and shift towards being a regular e-commerce site like Amazon.
Damn must be nice to be rich enough to be able to casually have 100TB of space for such projects.
Because it is super expensive and difficult to keep an index up to date. People expect to be able to get current events, and expect search results to be updated in minutes/seconds.
The Swede behind search.marginalia.nu has had a working search engine running on a single desktop-class computer in a living room, all programmed and maintained in his spare time, that was so good that in its niches (history, programming, and open source come to mind) it would often outshine Google.
Back before I found Kagi, I used to use it every time Google failed me.
So, yes, given he is the only one I know who manages this, it isn't trivial.
But it clearly isn't impossible or that expensive either to run an index of the most useful and interesting parts of internet.
I think the problem with search is that while it's relatively doable to build something that is competitive in one or a few niches, Google's real sticking power is how broad their offering is.
Google search has seamless integration with maps, with commercial directories, with translation, with their browser, with youtube, etc.
Even though there are more than a few queries where they leave something to be desired, the breadth of queries they can answer is very difficult to approach.
This is one of those things that I think is interesting about how "normal" people use the Internet. I am guessing they just always start with google.
But for me, if I want to look up local restaurants, I go straight to Maps/Yelp/FourSquare (RIP). If I want to look up releases of a band, I go straight to musicbrainz. Info about a metal band, straight to the Encyclopaedia Metallum. History/facts, straight to Wikipedia. Recipes, straight to yummly. And so on. I rarely start my search with a general search engine.
And now with GPT, I doubt I perform a single search on a general search engine (Google, Bing, DDG) even once a day.
"Normal" people don't start with Google, not any more. They start with Facebook, Instagram, X, Reddit, Discord, Substack etc. That's exactly the problem, the world-wide web has devolved back into a collection of walled gardens like things were in the BBS era, except now the boards are all run by a handful of Silicon Valley billionaires instead of random nerds in your hometown.
You have just described how "innovation" is often just a power shift, often one that does not benefit the user. Old is new again, but in different hands; the right hands, of course.
I've heard teenagers today often search first on tiktok. To find restaurants.
No search engine is refreshing every website every minute. Most websites don't update frequently, and if you poll them more than once every month, your crawler will get blocked incredibly fast.
The problem of being able to provide fresh results is best solved by having different tiers of indices, one for frequently updating content, and one for slowly updating content with a weekly or monthly cadence.
You can get a long way by driving the frequently updating index via RSS feeds and social media firehoses, which provide signals for when to fetch new URLs.
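To make the two-tier idea concrete, here is a minimal sketch of a scheduler that treats feed-driven URLs as the fast tier and everything else as the slow tier; the intervals, the feedparser dependency, and the function names are illustrative assumptions, not anything a real crawler necessarily uses:

    # Minimal two-tier recrawl scheduler sketch; thresholds are illustrative.
    import time
    import feedparser  # third-party: pip install feedparser

    FAST_INTERVAL = 15 * 60          # feed-driven content: every 15 minutes
    SLOW_INTERVAL = 30 * 24 * 3600   # everything else: roughly monthly

    def urls_from_feeds(feed_urls):
        """Pull candidate URLs out of RSS/Atom feeds to drive the fast tier."""
        for feed_url in feed_urls:
            for entry in feedparser.parse(feed_url).entries:
                if "link" in entry:
                    yield entry.link

    def due_for_recrawl(url, last_crawled, fast_tier):
        """Decide whether a URL should be fetched again, based on its tier."""
        interval = FAST_INTERVAL if url in fast_tier else SLOW_INTERVAL
        return time.time() - last_crawled.get(url, 0) > interval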
I meant this in response to the parent's point that Common Crawl only updates every month, which seemed to imply that this was sufficient.
This is too slow for a lot of the purposes people tend to use search engines for. I agree that you don't need to crawl everything every minute. My previous employer also crawled a large portion of the internet every month, but most of it didn't update between crawls.
See also: IndexNow [1], a protocol used by Bing, Naver, Yandex, Seznam, and Yep where sites can ping one of these search engines when a page is updated and all others will be immediately notified. Unfortunately it does seem somewhat closed as to requirements for joining as a search engine.
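For anyone curious what the ping actually looks like: per the public IndexNow documentation it is roughly a single GET with the changed URL and a site-owned key (the same key also has to be hosted on your domain so the engine can verify ownership). A rough sketch, with placeholder key and URLs:

    # Rough sketch of an IndexNow ping, based on the public docs; key/URL are placeholders.
    import requests  # third-party: pip install requests

    def ping_indexnow(page_url, key, endpoint="https://api.indexnow.org/indexnow"):
        """Tell participating engines that page_url changed. The same key must be
        hosted at https://<your-host>/<key>.txt for verification."""
        resp = requests.get(endpoint, params={"url": page_url, "key": key})
        return resp.status_code  # 200/202 generally means the submission was accepted

    # Hypothetical usage:
    # ping_indexnow("https://example.com/some-updated-page", "your-indexnow-key")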
What year was this thing created? Because the /.well-known URI scheme[1] has existed for a long time and is designed for exactly this kind of junk: https://www.indexnow.org/documentation#:~:text=Hosting%20a%2...
1: https://www.iana.org/assignments/well-known-uris/well-known-...
It was announced October 18, 2021: https://blogs.bing.com/webmaster/october-2021/IndexNow-Insta...
The irony of a website from two major search engines looking like it was made in the early 2000s doesn't escape me. But, to my original point, there's absolutely no way they were ignorant of well-known URIs.
Big fan of your work Viktor, thanks for everything you build and how much you document it
Some sources update faster than others: you could index news sources hourly and low-velocity sites weekly. Google does that. CommonCrawl gets 7TB/month; indexing and vectorizing that is quite manageable.
I think it is more complex than that. Common crawl does not index the whole web every month. So even if you use common crawl and just index it every month, which you could do pretty cheaply admittedly, I don't think that would lead to a good search index.
Running an index is an extremely profitable business, from multiple points of view (you can literally earn money, but also run ads, you get information you can sell, you can buy mindshare). Everybody is looking for indexes beyond Google and Bing, but there are none. If it really is as easy as indexing common crawl, then I think we'd have more indexes.
This reads like the “I could build Dropbox in a weekend”.
Haha yes, but my argument is not about individuals, but about "tech" companies that externalize everything and do not develop tech internally.
It's just files, how hard could it be?
Not that hard; it took me 4 weekends to build a private search engine with Common Crawl, Wikipedia and HN as a link authority source. It takes about a week to crunch the data on an old Lenovo workstation with 256 GB of RAM and some storage.
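Not claiming this is that exact pipeline, but the first step of this kind of project tends to look something like the sketch below: streaming the plain-text (WET) records out of a Common Crawl segment with warcio. The file name is a placeholder.

    # Minimal sketch: read extracted text from one Common Crawl WET segment.
    from warcio.archiveiterator import ArchiveIterator  # pip install warcio

    def iter_pages(wet_path):
        """Yield (url, plain_text) pairs from a .warc.wet.gz file."""
        with open(wet_path, "rb") as stream:
            for record in ArchiveIterator(stream):
                if record.rec_type != "conversion":  # WET text records are 'conversion'
                    continue
                url = record.rec_headers.get_header("WARC-Target-URI")
                text = record.content_stream().read().decode("utf-8", errors="replace")
                yield url, text

    # From here the real work is tokenising, building an inverted index, and layering
    # a link-authority signal (e.g. counts of HN or Wikipedia links) on top.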
For news-only there's https://littleberg.com
Brave Search ONLY uses their own index and the results are really good! I highly recommend it.
Brave image search is not that good though.
>Why is no company building their own search engine from scratch?
Here's one list:
>A look at search engines with their own indexes
https://seirdy.one/posts/2021/03/10/search-engines-with-own-...
HN discussions:
https://news.ycombinator.com/item?id=26429942
Google ingests almost the whole public web almost every day. I don't see any startup competing with them; they might come up with a great algorithm or something, but they will need the infrastructure and huge investments to compete.
Even then, after using a Google Gemini subscription for the last few months, I think the problem is not Google Search but rather the web ecosystem: if Google gives me most of the answers without my having to click any link, you and I might be happy, but the billions of people living off those links, directly or indirectly, won't be.
Because “search” in 2024 is much more than a search engine. It takes billions of dollars of investment per year to build something competitive to Google head-on. OpenAI may have a shot with their chat.com
The only viable option when starting small is to grow in the cracks of Google with features that users really love that Google won’t provide.
Isn’t it obvious? Look around. Hackers have been stamped out even on hacker news. Now it’s all FAANG lifers and MBA or VC types trying their hand at grifting. Nothing good comes of that. Whoever is making the next thing “for real” just gets acquired and shut down.
Moreover, get out of your echo chamber and you’ll see that for a majority of humanity Google is the internet. You have to supplant the utility, not just the brand. Most businesses cannot handle the latter let alone the former. If you want something to replace Google, you have to think about replacing the internet itself. But not many are that bold.
> Whoever is making the next thing “for real” just gets acquired and shut down.
why do you think amazon's trying to lay people off? they want them to create new startups that they can acquire.
A single CommonCrawl dump might be 100TB, but that represents less than 1% of the Internet. CommonCrawl crawls new parts of the Internet every month or quarter, and there is little overlap between crawl dumps.
Even ChatGPT Search was revealed to be a Bing wrapper; this meme summed it up: https://ibb.co/8csc3gv
Because the hard part isn't the compute, vector dbs, or whatever. It's the huge evergreen index of the whole internet. Getting over the hump of "every site lets your crawler work, gives it good results, and lets it bypass paywalls" is a massive barrier.
Because most of the good sites won't let you crawl them any more, unless you're Google.
Crawl them anyway. Adversarial interoperability is the name of the game.
There is probably room for one or five lifestyle businesses, but convincing venture capital to drop the megabux to go big would be a feat, and it would eventually land at some sub-optimal state anyway.
Finding some hack to democratize and decentralize the indexing and the expensive processes like JavaScript interpretation, image interpretation, OCR, etc. is an open angle, and even an avenue for "Web3" to offload the cost. But you will ultimately want the core search and index on a tightly knit cluster (many computers physically close to one another, for speed-of-light reasons, although you can have N of these clusters) for performance, so it's a hard nut to crack to make something equitable for both the developers and any prospectors, and safe from various takeovers and enshittification. Let us know if you know a way.
I would want to use a search engine that does not perform JavaScript interpretation, image interpretation, OCR, etc. (This is not the same as excluding web pages with JavaScripts from the search results. They would still be included but only indexed by whatever text is available without JavaScripts; if there isn't any such text, then they should be excluded. This would also apply if it is only pictures, video, etc and no text, then they also cannot be indexed, whether or not they have JavaScripts.)
Luckily none of these things are mutually exclusive.
Thanks to all the SPA idiocy, you will miss enough content to matter if you have zero JS interpretation, so you would want to let the user choose which indexes they want for a query, because sometimes you need these other resource types to answer the query.
We are @mojeek
The best way to stop SEO is to build a bot which detects commercial activity on the page (and/or availability of cart/payment controls), and commercial language in the text of the page (does the content look like an ad or product/sales page). Also combine with number of hops to a commercial page (with cart, prices, advertising material, etc), and create a distance metric. Offer the user a commercial activity slider. Sliding to zero commercial activity should result in zero product pages or SEO spam.
Also important are user provided blacklists and whitelists of domains. Experts-Exchange was a site that called for this on search engines like Google, as Experts-Exchange was universally a cash sticky-trap waiting for searchers in pain to pony up. No, and constantly pasting -site:xyz each time is unacceptable.
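As a toy illustration of the scoring idea (not a production detector: the keyword list, selectors, and weights below are made up, and a real system would also count ad and telemetry requests):

    # Toy "commercial activity" scorer; keywords, selectors and weights are illustrative.
    import re
    from bs4 import BeautifulSoup  # pip install beautifulsoup4

    COMMERCIAL_WORDS = ("buy", "sale", "discount", "cart", "checkout",
                        "sponsored", "affiliate", "price", "order now", "subscribe")

    def commercial_score(html):
        """Return a rough 0..1 score for how commercial a page looks."""
        soup = BeautifulSoup(html, "html.parser")
        text = soup.get_text(" ").lower()
        words = re.findall(r"[a-z']+", text)
        if not words:
            return 0.0
        keyword_hits = sum(text.count(w) for w in COMMERCIAL_WORDS)
        cart_controls = len(soup.select("form[action*=cart], [class*=checkout], [class*=add-to-cart]"))
        return min(1.0, (keyword_hits + 5 * cart_controls) / len(words))

    # A commercial slider at zero would then drop any result whose score
    # exceeds some small threshold, before ranking even happens.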
This seems unrealistic. Sliding the commercial slider to zero is unlikely what anyone wants. Even Wikipedia would be hidden due to its incessant thirst for donations. Visiting a nice open source repo that happens to have Patreon or GitHub sponsor? Gone. Visiting the help page for a hardware product I've already purchased? That page probably upsells you on a higher tier product and is gone. Trying to get news? All news sites have ads now and they are gone; the only ones without ads are those paid for by Big Oil or Big Pharma or similar with a hidden agenda to influence you.
Instead of bending the web into something it is not, have you tried simply visiting a library and getting all information from there?
God, I detest that attitude: all software sucks because it makes assumptions instead of giving the user as many options as possible.
Yes. IMO, ideally a good program isn't so much basic and easy to use for McNormie as it is functionally composable, like groups and algebras.
There’s a simple reason there’s so much SEO spam: Adsense.
Maximizing ad profit on search traffic necessitates more paid ads, both on the search results page and on the web pages themselves. They won't say that directly, of course, but that's what we've seen play out over the last 20 years.
Vague metrics like “good user experience” get trumped by hard metrics like total $ / search.
One blind spot I missed: advertising and telemetry activity are to count strongly in the commercial score.
When I slide the Commercial slider to zero, only web pages which there are no obvious detectable ways for the host or author to make money from showing me the page should be returned.
Bad behavior stems from the drive to make money. Make a search where it is impossible to make money by being a result in the search (especially if a Commercial slider is slid to low or zero).
> bad behavior stems from the drive to make money
“The only moral income is my income”
It’s definitely an interesting POV. A lot of the way Google works comes from the fact that people use Google at the moment they’ve already decided they want to buy something. Versus say Instagram where you might see ads but you haven’t decided to buy something and then use Instagram. It comes down to what people other than you are using search engines for.
> A lot of the way Google works comes from the fact that people use Google at the moment they’ve already decided they want to buy something.
It is exactly the point at which returning sponsored links maximally screws the user over.
> A lot of the way Google works comes from the fact that people use Google at the moment they’ve already decided they want to buy something
You missed a key part of the comment:
> Offer the user a commercial activity slider. […] if a Commercial slider is slid to low or zero
> The best way to stop SEO is to build a bot which detects commercial activity on the page (and/or availability of cart/payment controls), and commercial language in the text of the page (does the content look like an ad or product/sales page).
Thanks for describing the detection algorithm, I will now design my spam sites to defeat it.
This is how SEO works. There are no silver bullets.
All you can do is poison my results with garbage which won't make you money, which means you are paying out to do that.
What do you mean? My pages will return obfuscated JS (or WASM) that renders into regular-looking commercial features and links, but only if you have the applicable headers. So you need a highly sophisticated visual analysis engine, with potentially multiple spoofed-header passes per page, to catch anything. Doesn't sound cheap to scale.
I guess you could make a spam site that didn’t have affiliate links, advertisements, or marketing text… but why bother with the spam?
That's an excellent point! I had been contemplating the need to build a search engine that only returns results from webpages with no trackers or ads.
The presence of affiliate links should be the biggest indicator that the information provided is going to be heavily financially incentivized (ie. bad).
Would be much simpler to calculate and detect. I would be super surprised if Google's algorithm doesn't already know how to detect affiliate links.
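A heuristic along those lines is genuinely small. A sketch, with the caveat that the host and parameter patterns below are just a few well-known examples and nowhere near exhaustive:

    # Sketch of affiliate-link detection; the pattern lists are illustrative, not exhaustive.
    from urllib.parse import urlparse, parse_qs

    AFFILIATE_HOSTS = ("amzn.to", "shareasale.com", "awin1.com", "go.skimresources.com")
    AFFILIATE_PARAMS = ("tag", "affid", "aff_id", "clickid")  # e.g. Amazon Associates uses ?tag=

    def looks_like_affiliate_link(url):
        """Heuristically flag URLs that appear to carry an affiliate identifier."""
        parsed = urlparse(url)
        if any(host in parsed.netloc for host in AFFILIATE_HOSTS):
            return True
        return any(param in parse_qs(parsed.query) for param in AFFILIATE_PARAMS)

    def affiliate_density(outbound_links):
        """Fraction of a page's outbound links that look like affiliate links."""
        if not outbound_links:
            return 0.0
        return sum(looks_like_affiliate_link(u) for u in outbound_links) / len(outbound_links)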
I argue ANY money making mechanisms or language need to be detected and contribute to the commercial score.
When I am looking up baking recipes or paleontology text on T-Rex, that should not be a commercial[izable] activity. Full stop! Commercial activity is like sexual activity in this specific regard: it's unwanted! 98% of my searches are for information that autists would pour their hearts into for free out of love, and that commercial interests will only detract from (such as pirated content and software cracks... lol wut did u xpect hm?). If I could pass a law which bled webmasters and search engines dry in fines until something like this is set up... I would! (Elect me BDFL of the world.)
Many high quality youtube videos on e.g. wristwatch repair have lots (sometimes dozens) of affiliate links to every tool used, which is just an additional way for the author to make some money. If your search engine filters this out, it's worthless.
While you might like and trust these youtube creators, incentives drive everything.
It's downright silly to believe these creators won't behave rationally, given those incentives (eg. lean more and more toward pushing products with the highest affiliate payouts).
I've observed this myself in pretty much any niche I follow. Every youtuber/creator eventually devolves into an affiliate shill, no matter how honest and high quality their content is at the beginning.
If you're getting paid when you say something is good, you're not trustworthy. Period. What's funny is we all used to understand this! The internet broke everyone's brain.
Shoutout for Mojeek. Highlights:
- Has its own index
- Had a pro-privacy privacy policy well before DDG existed
- No conflated marketing (DDG wrt it passing on 3 octets of an IP to Bing for local searches, Ecosia for not including Bing's carbon footprint as a meta search engine)
- Has an API
It's a smaller index and has more limited resources, but pretty much the best genuine alternative, and great for finding older resources that have long since been buried by G/B.
Great. Now go monetize it.
I'm an API customer.
> “We could de-rank results from unethical or unsustainable companies and rank good companies higher,” Kroll says of the eco-minded Ecosia.
Understandable knowing Ecosia's goals, but I find it rather concerning that their vision of a better search involves deciding what is good and bad.
Ranking by quality (against spam & SEO sites) is fine, but it should be applied equally to all Websites, and not target specific companies.
The measure of success of a search engine is how quickly I leave it with the info I want in hand.
I too find this a bit strange. Downranking results that would otherwise be naturally highly ranked seems only to inhibit the operation of the search engine.
Yes, people aren't going to use a search engine that politically skews the results. It will end up as a tiny website for a very narrow niche of person, similar to eg Mastodon.
> I find it rather concerning their vision of a better search involves deciding what is good and bad
the entire purpose of a search engine is to do this, you've been grossly confused about the entire space if you think this isn't exactly what everyone is trying to do.
Yes, but it usually isn't "good or bad" in the moral judgement sense. Not to mention the many logical flaws involved in trying to make such a prioritization. If I'm searching for, say, desert eagles, refusing to return any gun-related content and only giving me birds isn't helping anything or anyone.
Although come to think of it I'm surprised that I haven't heard of any attempts of fundamentalists to make "moral" search engines that do things like exclude evolution.
It shouldn't be too hard to achieve what Google was good at before. Their recent search results for me (the last 6-12 months) have been so far removed from what I'm searching for that it felt like a meme.
Even after rephrasing things, adding more details, special quotations, etc. (all the 'search tricks' everyone knows), the results are terrible.
I've been using Kagi for quite some time. It's invisible to me. I search, get 30-ish high quality results per search, and I'm a happy camper. No ads, no SEO grifting, nothing.
Moreover, I can block sites and customize my own search results. This feels good.
When I first started using Kagi, it felt like leaving a closed building and stepping out to open air.
Almost same here:
One of the few software tools I care to pay for besides JetBrains.
Only in practice I almost never use the filter or block features, because Kagi does out of the box what we always wanted to do ourselves in Google: block spam sites.
The ranking also seems to be better for some reason somehow.
The funny thing is it doesn't feel like a step forward, but rather like a step back to Google ca 2009 - 2012 somewhere.
I've heard good things about it, actually. I must go check it out now. Thanks for reminding me. I don't mind paying for things that save me time.
I moved to DDG a couple of years ago, and initially, I found myself often using the `!g` switch, but I honestly can't recall the last time I needed to do that. Only when I'm shopping do I find the goog to be a slightly better tool for finding products sold by niche suppliers.
I honestly think Google's monopoly on search at this time is 100% powered by momentum; there is almost no other reason to use it over something like DDG or, hell, even Bing!
Same here. DDG usually works fine for me. Even more so if using advanced techniques like quotes for keywords and - to remove junk.
Basically exactly the same thing here. It used to be that Google had the better results, but DDG got a little better and Google a lot more shit, so here we are, and I'm pre-filtering what I look for.
I’ve been on ddg for years too, but I’d say their results quality has dropped as well. I haven’t really been happy with either recently.
For me, it seems the direction for search is going towards AI sites. (Gemini, ChatGPT)
Trying to reinvent Google/ Search in 2024 seems a bit like jumping the shark
Quite the opposite. The part of the crowd that has site:old.reddit.com in their muscle memory has to be the premium end of the search market.
Sure, garbage tier searches will be done LLM style. But few smart people might be bored with that, and pay for something better.
The audacity of power users assuming they're the majority
Parent comment makes no assumption that they are the majority. But a minority of users (probably hacker types) who search with boolean operators and keywords instead of typing full sentences do indeed represent a portion of the market willing to pay $$$. These users crave a search engine which returns what they searched for, not what the provider's black box algorithm comes up with.
GNU/Linux as a desktop OS is an example of this market. 95%+ of people will work with Windows or MacOS their whole career which fits their use case perfectly. But the 5% of powerusers who choose Linux gain so much productivity and professional value that it's a thriving ecosystem with plenty of lucrative businesses built around it.
I agree, and I think it's half and half for me. Many times when I search, my query ends with a question mark. They're questions, looking for a simple answer. Those are the searches I have been going to LLMs for more and more lately. As the LLMs get better and hallucinate less, this is becoming more and more viable. Other searches are looking for a page. Where to buy a particular component. Where to download a particular library. Things like that. For that, Google is still useful. The thing is, I think the first kind of search makes up a huge amount of Google's business. I would hazard a guess that it actually makes up more than half of their search traffic, and the loss of that kind of query is going to seriously damage them. There's also nothing stopping LLMs from eventually giving us answers to the second kind of query either, as they start being able to ingest and incorporate real time data. To me, it seems inevitable that LLMs will kill Google's main source of revenue (search is 57%) and eventually their entire company, as much of the rest of what they do is subsidized by search. They may be too large at this point to adjust to this extinction-level change to the environment.
I would argue that answer seeking queries have lower commercial value, compared to queries about products or services.
Completely agree. Search engines are dead to me.
For general information like recipes, tech stuff, how-to's, etc LLMs blow search engine results out of the water. Even the local/contextual content is better. The new maps feature in ChatGPT is amazing.
The only case where a search engine beats an LLM for me is when I'm looking for a site where the site itself provides a bespoke experience like shopping, where the engine provides a quicker path to the url I don't yet have in my browser history, like "<Some band> Tour" -> somebandofficial.com
I want to remake Yahoo!
I want to browse a catalog of interesting web content.
Like walking down the aisle of an old fashioned newsagent, flicking through the magazines that stand out.
The thing is, often I want articles, opinion pieces, and interesting reading. Sure, sometimes I'm searching for facts or answers to programming questions, but most of the time LLMs are not going to give me what I'm looking for.
Why isn’t there a distributed, decentralized or open index that all of these startups can utilize? I understand that these startups are all are focusing in on different problem areas, but doesn’t it make sense to have something like open street maps so that all of these companies can share their compute resources in order to maintain something competitive with the big guys? Or even if it’s not fully decentralized these startups teaming up to build a bigger index for themselves makes a lot of sense to me.
I have no knowledge of this field but something like that would seem to make sense.
Yacy is still around. While I wouldn't want to disrupt its decentralized/p2p nature, I think there's a case to be made for a community-managed central aggregation server to help seed the index at various snapshots. I might even be interested in helping run such a thing.
A shared index would surely be nice (Common crawl is perhaps an example of one that could be used) but say you had 10 search engines running from it. One decides a page is very important and updates constantly, so should be fetched every 30 minutes. Another search engine decides a page is spam and doesn't need to be recrawled. There's backend choices that affect the shape and crawl directions of the index.
Then things like whether the crawler should render the page (Using the end DOM content rather than the original source), does it do any tokenisation of the content, store other metrics etc, or does that need to be done by the end search engines.
Also there are issues with crawling Reddit, sites behind Cloudflare, etc. that others have gone into in more detail elsewhere on this comment page.
Pretty much exactly what I have been thinking lately. Wrote about it recently here: https://nadh.in/blog/decentralised-open-indexes/
This is somehow not about Perplexity.
Like many, I tried many other search engines, starting with DuckDuckGo way back when. I always ended up Googling (or !g… -ing).
Perplexity is the first one that consistently works for both code questions (what’s this error message) and local questions (where’s my nearest store X and when do they close). Now they just need to speed it up a bit - Google queries are effectively instant.
I would strongly prefer a search engine that searches and doesn't attempt to proactively answer questions.
I've recently started using Perplexity and find that I've started using it for finding answers to specific questions, where I'll still use traditional search to find "a few sites on which I may be able to cobble together a useful set of information about <topic>".
So, for me, "search" has fragmented into two separate use cases, one of which is served by the new not-really-a-search-engine.
This is one of the few comments advocating for LLM search use that actually lines up with my experience trying to use it. It's pretty good for a narrow subset of my search queries, but it's totally useless for everything else.
I gave up on it because "two search tools instead of one" is just a really hard sell for me. When I go to search something, the keystrokes/hand motions are totally intuitive. Having two different patterns was annoying. StartPage works great for me. I have no reason to replace it, much less half-replace it with something that has only proven to be incrementally better at a few tasks. That's a matter of personal preference, though.
I like that too but the benefit of the answer model is near-zero spam. Perplexity gives an answer with references, effectively combining search with the answer.
I'd rather hear the word from the horse's mouth rather than a hallucinating parrot which paraphrases stuff as it "understands/thinks it knows".
This is quite welcome! I like seeing more competition in such spaces. What would be really interesting is if DDG got involved... and those 3 entities (DDG, Ecosia, and Qwant) worked together. I know DDG is simply a wrapper around Bing... but I think one of these others has a similar Bing back-end... so if someone who has the mindshare of DDG got into that, I think it would make it even more interesting. I know Google is quite the behemoth, but still, it would be great to see some shake-ups (assuming of course that the results are good and provide value, etc.).
qwant had bing as a backend a few years back iirc.
I was hoping this would be about Kagi. A lot of companies that make bold promises of taking on Google end up in the drain 2 years later.
I'm a paying customer of Kagi.
Kagi itself runs on top of Google. It's certainly worth paying for (IMO), but let's not kid ourselves -- it is *not* a competitor (as in, can replace Google). Without Google, there is no Kagi.
> Without Google, there is no Kagi.
Google isn't the only search engine they use (they have a small self-made index, Brave, Marginalia, and others which they don't disclose, plus "vertical information" like Wikipedia, Wolfram Alpha, etc.), but probably most of the results are from Google. Without Google there would still be Kagi, but maybe not as good as it is today.
I used Kagi for half a year - I loved it!
I was hoping Kagi got some media coverage and this was an article covering how it's so much better than Google.
My problem is with companies who start by declaring they are going to take on Google. These kind of companies are nowhere to be found a few years later.
Right. But the higher signal-to-noise ratio is so worth it. That's why I pay for Kagi as well.
I've been trying to take them on with https://BestInternetSearch.com; implementing features, including AI interesting facts, is already done, with other innovations coming soon. The problem with taking on giants is that more emphasis is needed on small businesses and their software services, such as my search engine.
I've been using the chatGPT browser extension for a week now. It replaces Google search for me in Brave.
It was a little odd at first but now I find I won't go back. I love that I can ask follow up questions, and it shows the listing results on the right just like a search engine. But it also gives me a summary on the left, again with the ability to ask follow up questions.
Anyone trying to build a search engine to provide a Google-like experience is going to fail, in my opinion.
chatGPT search is a game changer. I'll pay $20 month for this forever.
It was the exact opposite for me interestingly. I couldn’t stand chatGPT search beyond a few days, I guess because my search patterns don’t need a bunch of useless words thrown in (locations, very specific keywords that lead to websites etc). The speed is also a big factor. Google search feels almost instantaneous and watching ChatGPT search emit a bunch of tokens before getting to what I need drove me to frustration.
Do you guys remember Cuil? Man that was a whacky ass engine. Just gave you all sorts of comical results.
For anyone not around at the time, the hype around that thing was so over the top. It was a for-sure Google killer. The first few days it barely loaded due to the load, and when it did, yeah, not great results!
I really hope they can make it work.
We all hate how shitty the big sites have gotten in the name of maximizing profit, but I think it's more insidious than that. It would be one thing if running (say) a news site with decency and integrity was merely less profitable; there would still be people doing it. But I fear that it's actually become impossible to sustain a business like that. The ones that try either die or sell out to survive. (Or are so limited in scope that one person's unpaid part-time labor can sustain them.)
My hot take on this new era of search engines is that "search is a bug" and even trying to be a search engine is a fool's errand. Search solved a problem of the legacy internet where you wanted information and that information would be on one of a million websites.
If someone is going to disrupt Google, it's because they've cut out the middleman that is search results and simply give you what you're asking for. ChatGPT and Perplexity are doing the best here so far afaik
Search is still better for getting to specific, existing documents you need. Even the RAG people have been finding that out with hybrid models becoming more popular over time. I also think you can update search indexes more cheaply than further pretraining LLM’s.
Not to mention that the cost per search in terms of compute and energy is so much smaller for web search than for running an LLM. I forget the exact numbers now, but it was orders of magnitude as I recall.
Search engines are just cheaper to run. I don't know that there's a good, long term model for a free LLM-based search replacement because of how much higher the operating costs are, ad supported or not.
On top of that, search usually uses CPU instead of GPU. A large infrastructure with CPU’s is easier to reuse for jobs other than search.
These are great reasons why this business will be hard, but given how ChatGPT and Perplexity are making inroads into search traffic, you can't deny it's an experience consumers prefer.
I agree that there’s interest in it. I found ChatGPT and AI search very convenient in some situations where I used them. Other times they hallucinated. I have no idea, though, what customers prefer until I see large-scale surveys by companies not pushing A.I..
It could also become a differentiator allowing multiple suppliers. On one hand, you have people doing search for quality results. Other search engines include the AI results. The user could choose between them on a job by job basis or the search provider might, like !G in DDG, allow 3rd-party AI search as an option.
The bigger problem I have is with scale for the dollar. Search companies with their own indexes already mostly failed. There’s a few using Bing. It’s down to just three or four with their own index. Massive consolidation of power. If GPU’s and AI search cost massively more, wouldn’t that problem further increase?
This is a great point, but I wonder how much of that kind of search intent is part of google's traffic. If that becomes the only reason people use Google I wonder if they'll go the way of Yahoo. Maybe that's hyperbolic, but there was a time when Yahoo's dominance seemed unquestionable (I'm old).
To be clear, I'm not arguing for everything should be part of a pretrained LLM, but the experience of knowledge searching that ChatGPT and Perplexity provide are pretty superior to Google today (when they work).
I’ll add that I used to love Yahoo Directory. I couldn’t imagine it doing anything but grow. Sadly, it wasn’t to be.
Anyone else recently tried to search for a location, even using keyword map, and still not be shown a Google Maps link? I would much rather be shown a map, than whatever BS AI generated dogwater summary they are dishing out instead.
If Google management can't see the user value of putting their own useful products in search, what hope is there for the rest of the world's useful information?
Until they actually produce results, does anyone actually believe them? I tried a lot of search engines to replace Google until finally settling on Kagi.
Since I don't want to pay for searching, I think that an alternative is to allow search engines to use my computer (one per cent of computer power). The problem is how can I be sure that the use of my computer is not reading my private information?
Another idea is to provide rating for some local products in exchange for searching.
I'd love to pay for search, but don't want my searches to be tied to an account.
I'm still wishing that one day we might get real microtransactions.
Since I can never know that, I pay for search (just offering the alternate practical view. I know many will not wish to do so)
I found this article much more informative:
https://techcrunch.com/2024/11/11/ecosia-and-qwant-two-europ...
I suspect that Two Upstart Search Engines Are Teaming Up To Hopefully Get Bought by Google would be a more accurate headline.
If Google's search ad business dwindles, the remaining ad-supported tech will all be targeting-based.
The adtech parts of ai are going to be weird.
Is it just me or does it feel like a lot of Big Tech empires are immovable? There are no winner takes all markets now. Massive incumbents are not replaced by another even more massive startup that gobbles up the market, but rather a collection of specialized alternatives. So Google is not being replaced, but is slowly losing bits and pieces to ChatGPT, Kagi, Perplexity, and others. Facebook/X lose a little to BlueSky, Mastodon, Threads and Telegram. Netflix by a dozen streamers. Etc. The age of massive upheavals is over? I can’t remember a single Big Tech company that went under completely in the last decade, just slowly became “not the only one”.
It went on for far longer than the last decade. The thing is that it takes a long time for a very big company of any sort to die, so it can zombie along for years. Even when private equity, notoriously, is pretty much trying to kill the company.
They need to make a browser with new features. That is the route to getting new search users.
Since ChatGPT added web search augmentation my usage of Google has dropped probably 90%. There are still those times where I hit Google -- usually involving "real world" things like finding a restaurant website and similar type things -- but for a shocking number of my normal run-to-google moments now I use ChatGPT.
NFL scores (live!) and random stats like point differentials or F1 driver / event queries. Of course the dozens of coding questions a day in this era where we all jump languages so frequently we have to keep looking up how to do basic stuff. Cooking instructions, and things like "can I substitute butter for lard". Product documentation stuff like "how can I assign a wheel on my Alesis QX61 to the pan in Logic Pro, and record it live to a track" or "how can I disable a malfunctioning pitch wheel on that controller". Historical and science queries. Language queries.
It has completely changed my usage, very much for the better. Web search provided RAG has been a game changer.
And while Google is outrageously fast, getting results on the screen quickly isn't the end. Then I'm digging through shitty bloated webpages or looking through a hundred Reddit comments filled with misinformation.
Marginalia and Kagi are the only alternatives worth considering
How things change... I recall having a subscription (had one of those friends who always seemed to know what was coming out right before it came out) right when they started publishing, what was that, 1994? It was so cool.
I just viewed the front page, looks like the definition of "internet chum".
But seriously, to upend Google, you are going to need to be the default on what people use, which I think for now is phones.
Another barrier, maybe they need to get away from this: "will require succeeding at home and growing revenue, which largely comes from running ads." So what do you do when everyone runs some ublock-origin thing? We need to figure out a search monetization beyond "feed me weird things I do not want" on the sidebar. Should websites pay? No, wait, then only those with real money are on the web. Should we pay? No, wait, we have had it for "free" for too long (could be wrong, I might at this stage in the game pay to have an actual search engine, like, say, Google circa-internet 2004; of course the net was a different place, but still).
Every new search engine I have tried so far has failed to do regional searches properly. For some queries I need a regional index without adding my country to the query.
Bing and Google do have the benefit of websites defining their market via their webmaster tools sections. And a critical mass of click through metrics, etc.
Without some innovation in business model, who cares? They could be a hundred times as good as Google at its peak, but as long as they remain advertising firms (or become them, as they soon would), what would be the point?
> Ask the search engine Ecosia about “Paris to Prague” and flight booking websites dominate the results. Ecosia’s CEO Christian Kroll would prefer to present more train options, which he considers better for the environment. But because its results are licensed from Google and Microsoft’s Bing, Ecosia has little control over what’s shown. Kroll is ready for that to change.
While I think Google sucks right now and we need something new, this specific reason is so dumb. Unless I add the word "train" or "air" etc. I would much rather be shown either all options or the one that I care about most (if it's flying, then so be it - the search engine can't and shouldn't try to filter out options FOR me without my consent)
Respectfully, the search engine is "allowed" to do what it wants to for its business purpose. In Ecosia's case, that is to prefer environmentally sound modes of travel, sites, or businesses. And that's fine! What it means is that it might not be the search engine for you or for me. And that's fine too!
Did you ever hear the one about how two competing armies in China cooperated to resist the Japanese Imperial Army during World War II?
Fingers crossed they are able to do some damage to Google. A terrible company filled with awful people.
I would love to see Yandex take English search more seriously. They're great for some things, but their spam filters are way too harsh: my email has been marked as spam since the day I made it, and I can't respond to support since the email account is flagged. I have frequent issues with their search as well; I constantly have to fill out their captchas, which never let me through anyway.
This is why we shouldn’t over regulate tech. Market forces are strong in search.
It says right on the subtitle (emphasis mine):
> Generative AI and new rules targeting tech giants are giving Ecosia and Qwant fuel to challenge Google and Microsoft and develop a web index for Europe.
“New rules” means regulation. “Market forces” kept the incumbents at the top, it is regulation that is giving others a chance.
What are the rules?
You can literally do a web search for “new rules for big tech”.
Ok, but which rule specifically allows these two search companies to compete?
It’s in the article:
> But Kroll believes tech advances have made affordable indexing more possible, and new EU regulations limiting the power of gatekeepers such as Google are making it a worthwhile pursuit.
Which links to another Wired story:
https://www.wired.com/story/europe-dma-breaking-open-big-tec...
They’re talking about the Digital Markets Act.
https://en.wikipedia.org/wiki/Digital_Markets_Act
Which is also what you’d get inundated by when doing the search I mentioned above. None of this is hard to find. If you want to discuss the article, it’s expected you make a minimum of effort to check what it says.
yandex!!!
testing? I've been using it for almost 3 years already...
You’re replying to a LLM bot.
Thank you. Flagged.
>“We could de-rank results from unethical or unsustainable companies and rank good companies higher,” Kroll says of the eco-minded Ecosia.
Censorship…for reasons. How’s this any better.
Google isn't even a competitor in the search space anymore. They've been completely unusable for a decade.
A cursory glance at their market share in the search space clearly says that’s not true.
For a big site I help run, we’re getting about 8.2x the impressions on Google compared to Bing.
To interpret GP charitably I think they mean that Google is there not because they are a good search engine these days but because of inertia.
For you as a site owner, Google is the best: it delivers the impressions.
For a user who wants to search, Google has gone downhill since around 2009, and the only thing that confuses me is why DDG - who initially felt better - chose to run after Google down the path of insisting on giving me results for things I didn't ask for.
(The usual answer is: "It is so much harder than in 2009 and SEO is so crazy these days, that nobody can do it, not even Google", to which I have to point out that even marginalia - run by one Swede - manages to do it in the niches it prioritizes.)
> the only thing that confuses me is why DDG - who initially felt better - chose to run after Google down the path of insisting on giving me results for things I didn't ask for.
I have the same frustration. The killer feature that got me to switch from Google to DDG was that DDG would reliably return results for the search query I entered, long after Google had stopped doing so. Now that they've taken the same path the benefit is much less. Although I suspect this is more due to a change on the part of Bing than a conscious decision from DDG.
It remains a competitor as long as it continues to capture attention (eye-balls), even if its usability has diminished.
It remains a competitor as long as it continues to be the default search engine in at least two of the most important mainstream web browsers.
I believe your answer is a subset of what I just said?