It might be time to update the mission statement.
“Our mission is to organize the world’s information and make it universally accessible and useful”
* for us, advertisers and our AI models
My guess is that AI training is the main issue.
Data that you can prove was generated by humans is now exceedingly valuable... and most of that comes from the days before LLMs. The situation is a bit like low-background steel: steel manufactured before the nuclear age is valuable because it isn't contaminated by fallout.
But why would people train on excerpts from Google Books when the whole books can be downloaded on libgen and such?
Copyright reasons?
Both are copyright violations.
Anna's Archive [0]:
> The largest truly open library in human history
Mirrors: https://open-slum.org/
Since I pretty much only use Google Books for public domain books, old magazines, and newspapers, I haven't noticed any problem with it. Maybe it's not as dead as this person thinks.
This was addressed in the post, I'm sure you just missed it when you read it:
"But a few days ago they removed ALL search functions for any books with previews, which are disproportionately modern books." (emphasis mine)
No, the search results went from pretty good to absolute garbage: https://bsky.app/profile/adamnemecek.bsky.social/post/3mdbup...
My guess is they detected being scraped and did this as preventive measure.
My guess is that the copyright landscape changed due to AI training, and these publishers won't let Google use that data anymore.
The books are still there, it seems like the rankings have changed though.
That's easy.
Check out Library Genesis, Anna's Archive, and Sci-Hub for content.
Piracy isn't theft if buying isn't ownership.
It's ironic that those doing the most to make information open and accessible are the criminals.
Of course. When it's criminal to make information open and accessible, only criminals will make information open and accessible.
None of these does full-text search.
And they are under constant threat from nation states. Sci-Hub hasn't added new papers in ages.
Build a local index.
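For anyone wondering what "build a local index" involves, here is a minimal sketch of a positional inverted index in Python. It assumes you already have plain-text extracts of the books in memory; the names `build_index` and `search_phrase` are hypothetical, and there is no ranking, stemming, or on-disk storage. For a real collection, an off-the-shelf engine such as SQLite's FTS5 extension would be the more practical choice.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of ASCII letters/digits as tokens (a simplifying assumption).
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, dict[str, list[int]]]:
    # Map each term to {doc_id: [token positions]} so exact-phrase queries are possible.
    index: dict = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    # Convert to plain dicts so lookups of unknown terms don't insert empty entries.
    return {term: dict(postings) for term, postings in index.items()}


def search_phrase(index: dict[str, dict[str, list[int]]], phrase: str) -> list[str]:
    # Return sorted ids of documents containing the exact token sequence in `phrase`.
    terms = tokenize(phrase)
    if not terms:
        return []
    # A candidate document must contain every term somewhere.
    candidates = set(index.get(terms[0], {}))
    for term in terms[1:]:
        candidates &= set(index.get(term, {}))
    hits = []
    for doc_id in sorted(candidates):
        positions = [set(index[t][doc_id]) for t in terms]
        # The phrase matches if term i appears at start + i for some start position.
        if any(all(start + i in positions[i] for i in range(len(terms)))
               for start in positions[0]):
            hits.append(doc_id)
    return hits


# Hypothetical usage with two tiny "books".
docs = {
    "a": "The steel was manufactured before the nuclear age.",
    "b": "Nuclear steel is a different story.",
}
idx = build_index(docs)
print(search_phrase(idx, "nuclear age"))  # ['a']
print(search_phrase(idx, "steel"))        # ['a', 'b']
```

Storing token positions rather than just document ids is what makes exact-phrase search possible, which is the capability people in this thread are missing from Google Books.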
My problem is finding references I don't know about.
Z-Library does.
I wonder whether you'd ever consider putting up a downloadable mirror of their full-text search DB?
Huh, the search is not amazing, but it will have to do. Thanks! Are there others?
The Internet Archive supports full-text search on (AFAIK) its entire scanned book collection, even books that aren't available for borrowing.
This is actually pretty good.
My guess: full-text search and indexing are expensive, so you are getting some kind of AI vector search instead.
Which tends to be kind of poop compared to true text search.
Title is: Google has seemingly entirely removed search functionality from most books on Google Books
The change happened on or around Jan 21. Overnight the results went from pretty good to absolute trash.
Here are two screenshots taken on Jan 20 and Jan 23 https://bsky.app/profile/adamnemecek.bsky.social/post/3mdbup...
They don't do full-text search anymore, especially for copyrighted books. I wonder whether this is not a regression but a deliberate move to give Google a leg up in the AI race.
It isn't obvious why the left results are preferred over the right results.
The left results are contemporary; the right are decades old. That includes editions of the same book, and surely the newer edition is going to be preferred by most readers.
> surely the newer edition is going to be preferred by most readers.
Why? Where different editions exist, the reader will want to know which one they're getting, but they're unlikely to systematically prefer newer editions.
But also, Google Books isn't aimed at "readers". You're not supposed to read books through it. It's aimed at searchers. Searchers are even less likely to prefer newer editions.
I guess. That's not immediately clear to me. However, browsing around on Google Books suggests to me that it is the corpus which changed, not the algorithms.
The corpus is still the same: searching for the name of a book will still find it, but full-text search no longer does.