20 years of Google Scholar (blog.google) | Submitted by thepuppet33r 7 hours ago
  • teruakohatu 4 hours ago

    The best thing, by a long way, that Google Scholar has achieved is denying Elsevier & co a monopoly on academic search.

    In most universities here in New Zealand, articles have to be published in a journal indexed by Elsevier's Scopus. If an article is not in a Scopus-indexed journal, it does not count any more than a reddit comment. This gives Elsevier tremendous power. But in CS/ML/AI most academics and students turn to Google Scholar first when doing searches.

    • freefaler 4 hours ago

      or turn to sci-hub and annas-archive :)

      • philipkglass 3 hours ago

        You use Google Scholar to find papers you're interested in, then use sci-hub to actually read them.

        • freefaler 3 hours ago

          indeed... and use Zotero with the correct plugin to download them automagically

          • epcoa 3 hours ago

            sci-hub hasn't been updated in 4 years and the sources for annas-archive like nexus-stc are seriously hit or miss (depends on the field).

            • freefaler 3 hours ago

              Nothing lasts forever, but the model of buying a paper for $40 from Elsevier isn't much better. Depending on the field there are other sources, but the hit rate is still about 85-90%.

        • teruakohatu 3 hours ago

          Does sci-hub have up to date content these days?

          Having pretty wide journal access through my institution means I don’t need to reach out to sci-hub.

          • epcoa 3 hours ago

            sci-hub proper hasn't been updated since its indefinite pause in December 2020. Alternatives are of variable success depending on field. It might be better for CS/Math, but for medicine and life sciences it's pretty bad.

            • whimsicalism 3 hours ago

              i believe they paused due to an indian court injunction and the case was heard this year, does anyone know any update?

          • whimsicalism 3 hours ago

            scihub is dying unfortunately :( the good news is it is happening just as all the fields i'm interested in except for some experimental physics & biology have moved to OA

        • random3 3 hours ago

          Fun fact about Google Scholar: it’s "free", but it’s just another soulless Google product - no clear strategy, no support, and a fragile proprietary dependency in what should be an open ecosystem. This creates inherent risks for the academic community. We need the equivalent of arXiv for Google Scholar

          • sitkack 38 minutes ago

            And that is semantic scholar, https://www.semanticscholar.org/

            • mapmeld 19 minutes ago

              For people unfamiliar, Semantic Scholar is run by the Allen Institute and has been researching accurate AI summarization and semantic search for years. Also they have support for author name changes.

            • afandian 2 hours ago

              The Invest in Open site has a good directory of open tools.

              https://infrafinder.investinopen.org/solutions

              • kergonath an hour ago

                Yes. On one hand I’d like Google to improve things a bit. There are some rough edges, which is a shame because it indexes some things that are not in Scopus or Web of Knowledge, like theses and preprint repositories. On the other hand I worry that some manager somewhere would kill it if they realised that it is still around.

                • griomnib 16 minutes ago

                  I’m fairly sure they only exist because Larry/Sergey might give half a fuck if they killed it outright, and it has a small enough team that the cost savings for killing aren’t enough for Ruth to want to make that argument.

                  • random3 an hour ago

                    Every 1-2 months when Chrome updates I get banned by their throttling mechanism because their extension makes too many requests and they see "unusual traffic".

                    It can take 1-2 weeks to go away and be able to use it. There's no way to get in contact with anyone. Tried the Chrome extension email, support forums.

                    It's a good reality check. There's no real support behind it and it can go away just like Google Reader did.

                    I think the motivations behind it are laudable, but they should not be the answer to the actual problem.

                • lbeckman314 5 hours ago

                  > 18. A paw-sitive contribution to Physics. F.D.C Willard (otherwise known as Chester, the Siamese cat) is listed as a co-author on an article entitled: “Two, Three, and Four-Atom Exchange Effects” that explores the magnetic properties of solid helium-3 and how interactions between its atoms influence its behavior at extremely low temperatures. Chester’s starring role came about because his co-author/owner, Jack H. Hetherington wrote the entire paper with the plural “we” instead of a single “I.”

                  ---

                  'Two-, Three-, and Four-Atom Exchange Effects in bcc 3He' by J. H. Hetherington and F. D. C. Willard [0, 1, 2]

                  [0] https://xkeys.com/media/wysiwyg/smartwave/porto/category/abo...

                  [1] https://xkeys.com/about/jackspages/fdcwillard.html

                  [2] https://en.wikipedia.org/wiki/F._D._C._Willard

                  • thepuppet33r 7 hours ago

                    Yes, Google deserves to be distrusted and avoided as a whole, but Google Scholar is a genuinely net good for humanity.

                    • dumpHero2 6 hours ago

                      I have a similar feeling for Gmail (its effective anti-spam engine), Google Maps and Google Docs (which pioneered shared docs. It feels outdated on many fronts now, but it was a pioneer).

                      • whiplash451 5 hours ago

                        Try MS OneDrive before calling google docs outdated

                        Google spanks everyone else on robustness and responsiveness

                      • coderintherye 5 hours ago

                        Good for users of Gmail, but is it a net good? Gmail spam prevention is great for the Google Apps orgs I manage. However, for the other inboxes the vast majority of spam they receive comes from @gmail.com

                        • thaumasiotes 2 hours ago

                          > Gmail spam prevention is great for the Google Apps orgs I manage.

                          Gmail is unlikely to let spam through.

                          But that doesn't make its spam filter great; it's also very prone to blocking personal communication on the grounds that it must actually have been spam. The principle of gmail's spam filter is just "don't let anything through".

                          It would be much better to get more spam and also not have my actual communications disappear.

                        • gray_-_wolf 3 hours ago

                          Most of the spam I get is from gmail. Maybe they should apply their so effective spam engine to outgoing mail as well...

                          • crazygringo 3 hours ago

                            It's probably not. You can put any domain you want on the "from" address. Just because it says it was from Gmail doesn't mean it actually was, unless it's signed with DKIM etc.

                            I had a domain for a while that people got spam "from" all the time. It had nothing to do with me and there was nothing I could do about it.
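To illustrate the point about the "From" address being unauthenticated unless DKIM backs it up, here is a minimal Python sketch. It assumes your own receiving server stamps an `Authentication-Results` header with `dkim=pass header.d=<domain>` on incoming mail; the sample message, the host `mx.example.net`, and the helper names `from_domain` and `dkim_aligned` are made up for illustration. It also uses a strict domain match, which is a simplification, and it is only meaningful if the header was added by a server you trust rather than by the sender.

```python
import re
from email import message_from_string
from email.utils import parseaddr

# Hypothetical message as it might look after a receiving server
# (mx.example.net, an assumed name) has recorded its checks.
RAW = """\
Authentication-Results: mx.example.net; dkim=pass header.d=gmail.com; spf=pass smtp.mailfrom=gmail.com
From: "Billing" <someone@gmail.com>
Subject: Invoice

body
"""

def from_domain(msg) -> str:
    """Domain of the visible From address (anyone can set this)."""
    _, addr = parseaddr(msg.get("From", ""))
    return addr.rpartition("@")[2].lower()

def dkim_aligned(msg) -> bool:
    """True if a recorded DKIM pass names the same domain as From."""
    domain = from_domain(msg)
    for header in msg.get_all("Authentication-Results", []):
        # Stay within one ';'-separated result so a failed check for one
        # domain cannot be paired with a pass for another.
        for m in re.finditer(r"dkim=pass\b[^;]*?header\.d=([\w.-]+)", header):
            if m.group(1).lower() == domain:
                return True
    return False

signed = message_from_string(RAW)
forged = message_from_string("From: someone@gmail.com\nSubject: hi\n\nbody\n")

print(dkim_aligned(signed))  # True
print(dkim_aligned(forged))  # False: From says gmail.com, nothing backs it up
```

A message that merely claims a Gmail sender fails this check, while one that was actually signed for gmail.com passes it.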

                            • dpifke 3 hours ago

                              I run mail servers for myself, a couple of side projects, and some friends and family. A double-digit percentage of all spam caught by my filters is from Google's mail servers, not just forged @gmail.com addresses.

                              Of the "too big to block outright" spam senders, behind Twilio Sendgrid and Weebly, Google is currently #3. Amazon is a close #4. None of the top four currently have useful abuse reporting mechanisms... Sendgrid used to be OK, but they no longer seem to take any action. Google doesn't even accept abuse reports, which is ironic because "does not accept or act upon abuse reports" is a criterion for being blocked by Google.

                              Most spam from Google is fake invoices and 419 scams. This is trivially filtered on my end, which makes it perplexing Google doesn't choose to do so. I can guarantee that exactly 0% of Gmail users sending out renewal invoices for "N0rton Anti-Virus" are legitimate.

                              • gray_-_wolf 2 hours ago

                                I would hope google has DKIM and SPF set.

                            • roflmaostc 6 hours ago

                              anti-spam is only an issue if people dump their email anywhere. I usually register my mail on webpages as first.last+webpage@mail.com and once they spam this address, it gets blacklisted.

                              I literally get only 1-3 real spam mails per month without any filter.

                              • dripton 5 hours ago

                                Works great, until a page rejects email with a '+' in it.

                                • AshamedCaptain 5 hours ago

                                  Or just knows about this Gmail trick (it's been 20 years already) and sends spam to your real mailbox.

                                  Actually, I am surprised _any_ spammy website these days would even honor the part after the +, and not just directly send to the real mailbox name.

                                  • thechao 3 hours ago

                                    I used to require a "+..." on all emails. Any email that didn't have the "+..." was sent to Spam automagically. My family were whitelisted. I gave up, because too many websites (early on) refused to take the "+..." marker, so I ended up losing too much to Spam. It's easier to just let Google sort it out.

                                    • gnopgnip 3 hours ago

                                      It's part of RFC 5233 Sieve Email Filtering: Subaddress Extension

                                    • hks0 5 hours ago

                                      Not everyone's cup of tea, but quite nice if one can afford it: I have my personal domain and a catch-all inbox. So if I want to register at acme-co.xyz I will just use acmecoxyz@my-domain.tld

                                      Maybe I should start using random words though? Wonder if someone will go bananas seeing their brand's name on my domain.

                                      • kroltan 2 hours ago

                                        Yeah, I've had to explain that a couple times already, usually when dealing with customer support or in-person registrations.

                                        And a "malicious" actor can get away with pretending to be another company by spoofing the username if they know your domain works like that. I don't think this has reached spammers' repertoire yet, but I wouldn't be surprised.

                                        Eventually I'd like to have a way of generating random email addresses that accept mail on demand, and put everything else in quarantine automatically.

                                      • 6510 5 hours ago

                                        dots are ignored, can filter by john.doe@gmail.com

                                        not sure about capital letters

                                      • janalsncm 5 hours ago

                                        I see this recommendation everywhere and I am genuinely surprised that it works. Any spammer can find out your real address since there is an obvious mapping from + addresses to your real address. An actual solution would hide this mapping.
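The "obvious mapping" mentioned above can be shown in a few lines. This is only a sketch of what a sender could do, not a claim about what any particular spammer does; the function name `strip_subaddress` and its `gmail_style_dots` flag are invented for the example, and the dot handling reflects the Gmail behaviour described elsewhere in this thread.

```python
def strip_subaddress(address: str, gmail_style_dots: bool = False) -> str:
    """Return the base mailbox behind a '+tag' address (illustrative helper)."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError(f"not an email address: {address!r}")
    base = local.split("+", 1)[0]  # drop everything after the first '+'
    if gmail_style_dots:
        base = base.replace(".", "")  # Gmail ignores dots in the local part
    return f"{base}@{domain}"

print(strip_subaddress("first.last+webpage@mail.com"))
# first.last@mail.com
print(strip_subaddress("john.doe+shop@gmail.com", gmail_style_dots=True))
# johndoe@gmail.com
```

Because the tag is removed by plain string handling, plus addressing only helps against senders who do not bother to do this.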

                                        • bachmeier 5 hours ago

                                          Yeah. Fastmail masked addresses are random. The best you can do is guess that an address might be masked, due to it not being johnsmith@fastmail.com, but it provides no information about your real email address.

                                      • globular-toast 3 hours ago

                                        Google maps would only be a net good if the data was available under a free licence. As it is they take data from people that should have gone to a public project like OpenStreetMap.

                                        • wbl an hour ago

                                          I ran into trouble because Open Topo does not report a stream the 7.5" series does. There are serious data quality issues that can make it not work for some applications.

                                          • arccy 3 hours ago

                                            "take", these people would never have produced any data if gmaps wasn't there...

                                            • hatthew 3 hours ago

                                              At one point I contributed quite a bit to google maps, because it was the primary map system I was using at the time. Had I been using an OSM-based system, I would have made contributions there instead.

                                              • arccy 2 hours ago

                                                indeed, osm can't paint itself like a victim, it needs good end products to bring in contributors.

                                      • malshe 5 hours ago

                                        I use Google Scholar daily and it's been a fantastic resource. Google Scholar with Zotero completes my articles search and storage.

                                        Btw, Anurag's last name is misspelt under the picture. It reads "Achurya" instead of "Acharya"

                                        Edit: They fixed it

                                        • mananaysiempre 6 hours ago

                                          21. Google Scholar will deny access to you if you (need to) self-host a VPN on a common VPS provider. Being a Google product, it also can’t be special-cased in your routing table. (I genuinely had to retrain myself to use Google Scholar again once I no longer had that need.)

                                          22. Switching on sort by date will impose a filter to papers published within the year, and you cannot do anything about that.

                                          • eesmith 6 hours ago

                                            > 22. Switching on sort by date will impose a filter to papers published within the year, and you cannot do anything about that.

                                            !!! And here I thought it's been broken for years, and a sign of decay due to lack of internal support.

                                            • buildbot 5 hours ago

                                              I swear this was working for me until literally today, it was really useful to find older ML papers?!

                                              • mananaysiempre 4 hours ago

                                                There is filter by date and sort by date. The former works. The latter, when enabled, even adds a banner on top of the page (in large but gray type) that says “Articles added in the last year, sorted by date”, and resets any filter you might have set before.

                                                • MichaelZuo 4 hours ago

                                                  Was this change ever logged or noted some way? Or did it just show up one day?

                                                  • philipkglass 3 hours ago

                                                    If it ever returned time-sorted results without limit, that was long in the past. It has truncated results to one year for the last several years I have used Scholar.

                                                    • crazygringo 3 hours ago

                                                      It seems so intentionally "broken", I can only guess it is to prevent scraping? Since searching for generic-ish search terms and sorting by date is a common scraping strategy.

                                                      Still, you'd think they'd do a cutoff of e.g. 500 or 1,000 items rather than filter by the past year.

                                                      So I can't help but wonder if it's a contractual limitation insisted on by publishers? Since the publishers also don't want all their papers being spidered via Scholar? It feels kind of like a limitation a lawyer came up with.

                                          • zeroonetwothree 6 hours ago

                                            Google Scholar is so good. I started doing research right when it came out and it was amazingly helpful. I can’t imagine how it was done before.

                                            • dekhn 3 hours ago

                                              I'd go to the card catalog (index), turn my question into a bag of words (tokenize), fetch all the cards matching each token (posting lists), drop cards which didn't include enough of the tokens (posting list intersection), ordering the cards by the number of tokens they matched (keyword match ranking), filter at some cutoff, and then reorder based on the h-index of the author (page rank). Then I would read each paper in order, following citations in a breadth-first manner.

                                              (the above is a joke comparing old school library work to search engines circa 2000; I didn't actually do all those steps. I'd usually just find the most recent review article and read the papers it cited).
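For readers who have not seen the search-engine side of that analogy, the steps can be sketched in Python roughly as follows. This is a toy illustration of the pipeline as described in the comment, not how Google Scholar or any real engine works; the card data, the `authority` scores (standing in for the h-index / PageRank step), and all function names are made up.

```python
from collections import defaultdict

def tokenize(text: str) -> set[str]:
    """Turn text into a bag of lowercase words."""
    return set(text.lower().split())

def build_index(cards: dict[str, str]) -> dict[str, set[str]]:
    """Map each token to the set of card ids containing it (posting lists)."""
    index = defaultdict(set)
    for card_id, title in cards.items():
        for token in tokenize(title):
            index[token].add(card_id)
    return index

def search(query, index, authority, min_match=2, cutoff=10):
    # Fetch the posting list for each query token and count matches per card.
    match_counts = defaultdict(int)
    for token in tokenize(query):
        for card_id in index.get(token, ()):
            match_counts[card_id] += 1
    # Drop cards that matched too few tokens.
    candidates = [c for c, n in match_counts.items() if n >= min_match]
    # Keyword match ranking, then filter at a cutoff.
    candidates.sort(key=lambda c: (-match_counts[c], c))
    shortlist = candidates[:cutoff]
    # Reorder the survivors by an authority score.
    shortlist.sort(key=lambda c: -authority.get(c, 0))
    return shortlist

cards = {
    "A": "exchange effects in solid helium",
    "B": "magnetic properties of solid helium",
    "C": "helium balloons for parties",
}
authority = {"A": 40, "B": 30, "C": 50}  # hypothetical scores

index = build_index(cards)
print(search("solid helium magnetic", index, authority))
# ['A', 'B']
```

Card C has the highest authority score but is dropped early because it matches only one query word, which is the point of filtering before reordering.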

                                              • IshKebab 6 hours ago

                                                There are alternatives, like Web of Knowledge. You basically need to be in a Uni for that though.

                                                • leephillips 4 hours ago

                                                  I would go to the library and pull volumes of Science Citation Index off the shelves. Yes, Google Scholar was a revolution.

                                                • dctoedt 6 hours ago

                                                  I'd not known about "F.D.C. Willard" — the nom de plume of a Michigan State physics professor's Siamese cat, Chester — who was listed as a co-author of a number of the professor's physics papers.

                                                  More on Chester and his co-author status: https://en.wikipedia.org/wiki/F._D._C._Willard

                                                  • elashri 6 hours ago

                                                    I did not know about the PDF Scholar Reader extension [1]. Unfortunately the reason is that I use Firefox only (and Safari on iOS) and it is not available there. The AI outlines will be useful and I can see myself using it.

                                                    I do not want to comment on number 20. I really wished that I joined CERN 10 years earlier but then it is the mistake of my parents :)

                                                    [1] https://chromewebstore.google.com/detail/google-scholar-pdf-...

                                                    • GeoAtreides 4 hours ago

                                                      oh no

                                                      they remembered google scholar exists

                                                      it's a great product and I don't trust google at all not to break it or mess with it

                                                      • crazygringo 3 hours ago

                                                        Google employs a lot of people from academia. Scholar is used and loved by a lot of people within Google. It's been around for two decades. I really don't think it's going anywhere.

                                                        • dekhn 2 hours ago

                                                          Reader was used and loved by a LOT of people WITHIN google, but it was shut down (and the leadership that loved it even made arguments in front of the company why it "had to be shut down").

                                                          AFAICT Scholar remains because Anurag built up massive cred in the early years (he was a critically important search engineer) with Larry Page and kept his infra costs and headcount really small, while also taking advantage of search infra.

                                                      • svat 6 hours ago

                                                        Related: 2014 article by Steven Levy, titled "The Gentleman Who Made Scholar": https://www.wired.com/2014/10/the-gentleman-who-made-scholar...

                                                        • Thrymr 5 hours ago

                                                          > Would he want to continue working on Scholar for another ten years? “One always believes there are other opportunities, but the problem is how to pursue them when you are in a place you like and you have been doing really well. I can do problems that seem very interesting to me, but the biggest impact I can possibly make is helping people who are solving the world’s problems to be more efficient. If I can make the world’s researchers ten percent more efficient, consider the cumulative impact of that. So if I ended up spending the next ten years doing this, I think I would be extremely happy.”

                                                          Has he still been working on it in the 10 years since this article? His name is in the byline of the new blog post, but it's not clear from that how much he's been working on it.

                                                          • the-rc 4 hours ago

                                                            12-13 years ago, I ran the system that inlined Scholar and other results on the main search result pages. Anurag was still involved, but AFAIR Alex, the other author of the post who also had been there from the start, worked on most code changes. I would guess that things are more or less the same today. (Because it had such limited headcount, Scholar was known to lag behind other services when it came to code/infrastructure migrations.)

                                                            • jll29 4 hours ago

                                                              Thanks for that inside scoop, even if it's a bit dated; I wonder if they read this discussion, perhaps.

                                                              An important feature request would be a view where only peer-reviewed publications (specifically, not ArXiv and other pre-print archives) are included in the citation counts, and self-citations are also excluded.

                                                              A way to download all citation sources would also be a great nice-to-have.

                                                        • p4bl0 3 hours ago

                                                          I wish GScholar wouldn't embrace bibliometrics so much. Sort papers by date (most recent papers first) by default on an author's page rather than by citation count, or at least give authors the choice to individually opt in to sorting by date by default.

                                                          • pkoird 2 hours ago

                                                            Unpopular opinion but I really liked Microsoft Academic instead until they canned it, sadly.

                                                            • breuleux 28 minutes ago

                                                              I liked Microsoft Academic far better, if only because it actually had an API.

                                                              • afandian an hour ago

                                                                What do you make of OpenAlex, which inherited the dataset?

                                                              • jrochkind1 3 hours ago

                                                                > 1. The team started with just two of us.

                                                                My guess for a while has been that it was back to two of them! if that!

                                                                • afandian 4 hours ago

                                                                  Some fun Google Scholar history from another perspective.

                                                                  https://youtu.be/DZ2Bgwyx3nU?t=315

                                                                  I recommend you watch the rest of the video, on the subject of open/closed and enclosure of infrastructure.

                                                                  • renewiltord 6 hours ago

                                                                    Google Scholar is fantastic stuff. I am so grateful for it. It’s crazy how easy it is to find papers these days by just going to it. University library search functions are completely useless in comparison.

                                                                    • gexaha 5 hours ago

                                                                      The most fun fact is that it still exists!

                                                                      • photochemsyn 33 minutes ago

                                                                        I've been using Google Scholar for a long time, but I'm finding ChatGPT search with well-crafted prompts gets more focused and relevant results than a complex keyword search on GS does. However it's often still easier to find a link to the pdf version of the paper using GS, but then scihub is still an option and can work when all else fails.

                                                                        • theanonymousone 2 hours ago

                                                                          The post uses the expression "delve into" :-/

                                                                          • sourcepluck 2 hours ago

                                                                            Is this a jokey reference to that time Paul Graham upset large amounts of Nigerians on Twitter? Or, rather, genuine concern at the thought that the article may have been generated by chatbots?

                                                                            • trash_cat 2 hours ago

                                                                              It's because Taylor Swift's latest album uses a lot of 'delve'.

                                                                          • robwwilliams 4 hours ago

                                                                            Our department uses GScholar as a great research-focused CV generator. Not used formally except that faculty pages have a link to their GS pages.

                                                                            • chris_wot 27 minutes ago

                                                                              How long till they kill it?

                                                                              • russellbeattie 5 hours ago

                                                                                Huh. I tried the "Listen to article" button, because I knew it was going to be generated and was curious to hear how it sounded.

                                                                                Interestingly, it highlighted the words as it read. I haven't seen that before online. Not sure how useful it is (especially for anyone interested in this particular topic), but I thought it was a neat innovation nevertheless.

                                                                                • chromatin 6 hours ago

                                                                                  21. No API

                                                                                  • kylebenzle 6 hours ago

                                                                                    I was hoping it would be 20 tips and tricks on how to use the service better not random fun facts about its history :-(

                                                                                    • wseqyrku 4 hours ago

                                                                                      For a second I thought this was buzzfeed for some reason.

                                                                                      • looneysquash 24 minutes ago

                                                                                        Oh good, it's just a celebration and not an announcement that they're killing it.