Find the oldest line in your repo (milofultz.com)
Submitted by surprisetalk 5 months ago
  • lb1lf 5 months ago

    Not on Git, but I was curious and grepped through the Siemens S7 repository I maintain at work; we've been using the same comment practice since forever, with the date in ISO 8601 format. (Since before ISO 8601 even was a thing!)

    Oldest I found?

    1986-06-17: Trygve glemte å sjekke om vi deler på null. Fikset.

    (Trygve forgot to check whether we divide by zero. Fixed.)

    • nonameiguess 5 months ago

      Yep. A huge limitation of this approach is the assumption that the code has always lived in the same VCS. I remember migrating a monorepo of geointelligence processing algorithms to multiple repos years back; its history showed it had been in Mercurial before Git, in ClearCase before Mercurial, and God knows what before that. But I was reviewing old Fortran code for handling specific vehicle missions and could see from the mission codes that some of these were satellites launched in the late 80s, so the code itself had to be even older than that.

      • loeber 5 months ago

        Incredible. This is like graffiti from Pompeii.

        • NoMoreNicksLeft 5 months ago

          You've got me beat... at a previous job, there were comments from 1991 complaining about how it was ported/rewritten from COBOL.

          • AlotOfReading 5 months ago

            A former job had a similarly old codebase. I made the mistake of asking where the docs were, and someone pulled out a typewritten booklet of yellowed pages from 1981: the only remaining copy of the original docs from the early 70s.

            The code was on at least its 4th language by that point and is probably still running robots today.

            • jamesfinlayson 5 months ago

              Ported to what?

              • NoMoreNicksLeft 5 months ago

                Oracle Pro*C. You don't want to know.

                • NetOpWibby 5 months ago

                  I’ve never even heard of this before. Talk about old-school!

                  • bogomog 5 months ago

                    I'd forgotten it existed. I was using it at a couple of jobs circa 1989-1992 and completely blocked it out of my mind.

            • akoboldfrying 5 months ago

              Ugh. That is sooo Trygve.

              • lb1lf 5 months ago

                Don't be too hard on him. For sufficiently large values of zero, you don't really need to check...

            • js2 5 months ago

              You don't need to blame every file[1]. Use `git rev-list` to find your oldest commit:

                git rev-list --reverse --date-order HEAD | head -1  # or
                git rev-list --reverse --author-date-order HEAD | head -1
              
              To see the files in that commit:

                git ls-tree -lr <commit-id>
              
              To see a particular file:

                git show <commit-id>:/path/to/file  # or
                git cat-file -p <commit-id>:/path/to/file
              
              [1] Caveat: I suppose this doesn't account for files that no longer exist or that have been completely rewritten.

              https://git-scm.com/docs/git-rev-list

              https://git-scm.com/docs/git-ls-tree

              https://git-scm.com/docs/git-show

              https://git-scm.com/docs/git-cat-file

              https://git-scm.com/docs/gitrevisions

              • kazinator 5 months ago

                Yes, your caveat is correct: it's possible that none of the lines present in your oldest commit have survived into the current head commit of your main branch.

                Not sure how the article's algorithm deals with renames. If a rename is recorded as a deletion plus an addition, that conceals the age of the lines.

                • js2 5 months ago

                  > Not sure how the article's algorithm deals with renames

                  It's relying on `git blame`'s default behavior which is: "The origin of lines is automatically followed across whole-file renames (currently there is no option to turn the rename-following off). To follow lines moved from one file to another, or to follow lines that were copied and pasted from another file, etc., see the -C and -M options."

                • roetlich 5 months ago

                  > I suppose this doesn't account for files which no longer exist or that have been completely re-written.

                  But... that's the point of this? Finding the initial commit is not nearly as fun as looking at the oldest code that is still running.

                • skeptrune 5 months ago

                  I like leaving something like GitLens on so I can see the super old lines ad hoc when I naturally come across them. It's fun to get glimpses of the past.

                  • cmgriffing 5 months ago

                    Take my upvote :)

                  • lutherqueen 5 months ago

                    Similar one-liner to paste into a macOS terminal to get the oldest line for each file extension:

                    for ext in $(git ls-files | grep -vE 'node_modules|\.git' | awk -F. '{if (NF>1) print $NF}' | sort -u); do printf '\n.%s:\n' "$ext"; git ls-files | grep "\.$ext$" | xargs -I {} git blame -w {} 2>/dev/null | sed -E 's/^[^(]*\([^)]* ([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}) /\1 &/' | LC_ALL=C sort | head -n1; done
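                    Sorting the raw blame text is fragile: the text after `(` starts with the author name, so `sort -t'(' -k2` orders lines by author rather than by date, and each date is printed in its author's own time zone. A minimal sketch, assuming bash, git, and a POSIX awk, reads `git blame --line-porcelain` instead, which repeats an `author-time` header (seconds since the epoch) before every line and prefixes the line's content with a tab. The `oldest_lines` function and its optional skip pattern are hypothetical, not part of the article's script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each tracked file, print its oldest surviving line as
#   <author-time epoch> <path>: <content>
# sorted oldest first. Optional $1 is an extended regex; lines whose content
# matches it are ignored (default: blank lines only).
oldest_lines() {
  local skip='^[[:space:]]*$' f
  if [ $# -gt 0 ]; then skip=$1; fi
  # NUL-delimited file list, so odd file names survive the loop.
  while IFS= read -r -d '' f; do
    # --line-porcelain repeats the full commit header for every line, so each
    # content line (prefixed with a tab) is preceded by its own author-time.
    git blame -w --line-porcelain -- "$f" 2>/dev/null |
      F="$f" SKIP="$skip" awk '
        /^author-time / { t = $2 + 0; next }
        /^\t/ {
          line = substr($0, 2)
          if (line ~ ENVIRON["SKIP"]) next
          if (!found || t < best) { best = t; text = line; found = 1 }
        }
        END { if (found) printf "%d %s: %s\n", best, ENVIRON["F"], text }
      '
  done < <(git ls-files -z) | sort -n
}
```

                    For example, `oldest_lines '^[[:space:]]*([{}]|#include.*)?[[:space:]]*$' | head -n 5` would also skip lone braces and `#include` lines, the kind of boilerplate that otherwise tends to come out on top. Passing the pattern through the environment (`ENVIRON`) rather than `awk -v` avoids awk reinterpreting backslashes in it.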

                    • OJFord 5 months ago

                      It's probably almost always going to be a boring config line(s) in the initial commit?

                      A section header in a pylintrc or Cargo.toml, a Django settings.py var, etc. Or even an import or variable in a file that's core enough to still exist, such as `import logging` and `LOGGER = ...`.

                      • lionkor 5 months ago

                        You underestimate the amount of software that starts with CRUFT

                        • OJFord 5 months ago

                          No, I think a lot of it sticks around for a long time.

                      • hoten 5 months ago

                        "Initial Commit", 9 years ago (transferred an at-the-time 15-year-old SVN repo)

                        sigh..

                        • krick 5 months ago

                          FWIW, when I ported SVN repos at work, I converted the commit history as well.

                          • notwhereyouare 5 months ago

                            we hired contractors to move us from SourceGear to git and they said "moving the history would be too hard, so we didn't"

                            lost probably 10+ years of history

                              • almostnormal 5 months ago

                                Renaming the company is also fun when its name is used for folders / package paths. The history isn't lost, just unusable.

                                • kridsdale3 5 months ago

                                  There are a million files in the Meta monorepo starting with FB. I don't think they even changed the practice.

                                  • pavlov 5 months ago

                                    You see, the “fb” in fbcode stands for “fierce & beautiful”.

                                    • SenHeng 5 months ago

                                      “Fast Breaking”

                                • JensRantil 5 months ago

                                  facepalm

                                • hoten 5 months ago

                                  yeah... wasn't me that did it though. Same group of people did it with a git repository they "recreated" more recently. They just don't know how to software.

                                  I got my hands on the old SVN but it's a few TBs and so I had some trouble unzipping it. Maybe someday I'll patch a branch for blame archeology.

                                • jamesfinlayson 5 months ago

                                  Sigh indeed... at a previous job there was a project that was a port of an Algol project that began in 1992. I have no idea what version control systems were used in its history (wouldn't be surprised if it started with no version control) but the last version control migration was from Team Foundation Service to GitHub and of course it was just a single commit of the then current master. 23 years of history gone.

                                • verytrivial 5 months ago

                                  Our code base still has ghost comments about code being just so because the NeXT compiler won't accept it any other way. No one has the heart to remove them.

                                  • jamesfinlayson 5 months ago

                                    I picked up a project from 1999 a few years ago that still had far pointer macros - I didn't think they were still a thing in 1999 so I'm not sure why they were there to start with. I think I've left them though.

                                    • nortlov 5 months ago

                                      I imagine future engineers as archaeologists of software development, in a way, digging through ghost comments like fossils in the code.

                                      • chrisweekly 5 months ago

                                        future engineers? archaeology is an essential part of virtually every real-world software project

                                    • zellyn 5 months ago

                                      In our monorepo (of 101470 Java files, according to

                                          find . -name '*.java' | wc -l
                                      
                                      ), I shudder to think how long that would take. For large repos, I imagine you could get quite a bit faster by only considering files created before the oldest date you've found so far.
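                                      A minimal sketch of that pruning idea, assuming bash and git, run from the top of the repository. It rests on the assumption that no surviving line is older than the commit that first added its file; copied code and rebased or back-dated commits can break that, so treat the result as a fast estimate. One `git log` pass records each path's earliest addition, and tracked files that never show up as an addition (for example, ones that only arrived through a rename, which `git blame` follows) get time 0 so they are always blamed. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; run from the repository's top-level directory.

# Print "<author-time>\t<path>" for every path, oldest first. Each path gets the
# earliest time any commit added it; tracked paths that never appear as an
# addition (e.g. they only arrived via a rename) get 0 so they are never pruned.
file_creation_times() {
  { git log -M --diff-filter=A --name-only --format='@%at'
    printf '%s\n' '@@tracked'
    git ls-files
  } | awk '
    $0 == "@@tracked" { tracked = 1; next }
    tracked { if (!($0 in c)) c[$0] = 0; next }
    /^@/ { t = substr($0, 2) + 0; next }
    NF && (!($0 in c) || t < c[$0]) { c[$0] = t }
    END { for (p in c) print c[p] "\t" p }
  ' | sort -n
}

# Blame files in creation order and stop once a file was created no earlier
# than the oldest line found so far. Prints "<author-time> <path>: <line>".
oldest_line_pruned() {
  local best='' where='' created f t text
  while IFS=$'\t' read -r created f; do
    # Every remaining file was added at or after $created: nothing older left.
    if [ -n "$best" ] && [ "$created" -ge "$best" ]; then break; fi
    t='' text=''
    # Oldest line of this one file as "<author-time>\t<content>"; files that
    # no longer exist make blame fail, which leaves $t empty.
    IFS=$'\t' read -r t text < <(
      git blame -w --line-porcelain -- "$f" 2>/dev/null |
        awk '/^author-time / { t = $2 + 0; next }
             /^\t/ { if (!n || t < b) { b = t; s = substr($0, 2); n = 1 } }
             END { if (n) printf "%d\t%s\n", b, s }')
    if [ -n "$t" ] && { [ -z "$best" ] || [ "$t" -lt "$best" ]; }; then
      best=$t where="$f: $text"
    fi
  done < <(file_creation_times)
  if [ -n "$best" ]; then printf '%s %s\n' "$best" "$where"; fi
}
```

                                      Sorting by creation time is what makes the early `break` safe under that assumption: every file after the cutoff was added no earlier than a line already found, so on a long-lived repo most files never get blamed at all.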
                                      • kazinator 5 months ago

                                        If any of the lines from the repo's first commit happen to be untouched, then that's a huge shortcut: those lines are the oldest. Finding one of those lines manually is a pretty easy task. Enumerating them all accurately, less so.
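                                        That shortcut can be sketched as follows, assuming bash and git. `git rev-list --max-parents=0 HEAD` lists the root commits (a history that merged unrelated projects can have more than one), and any line `git blame` still attributes to one of them is, barring back-dated commits, among the oldest in the repo. The `root_survivors` name is hypothetical, and only paths present in both the root commit and HEAD are checked, so lines that survived under a renamed path are missed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<path>: <line>" for every line at HEAD that
# git blame still attributes to a root commit.
root_survivors() {
  local root f
  # Root commits have no parents; merged unrelated histories can have several.
  for root in $(git rev-list --max-parents=0 HEAD); do
    # NUL-delimited list of every file in the root commit's tree.
    while IFS= read -r -d '' f; do
      git cat-file -e "HEAD:$f" 2>/dev/null || continue  # gone at HEAD
      # In --line-porcelain output each line's header starts with
      # "<sha> <orig-line> <final-line>", and its content line starts with a tab.
      git blame -w --line-porcelain HEAD -- "$f" |
        ROOT="$root" F="$f" awk '
          BEGIN { want_sha = 1 }
          want_sha { sha = $1; want_sha = 0; next }
          /^\t/ {
            if (sha == ENVIRON["ROOT"]) print ENVIRON["F"] ": " substr($0, 2)
            want_sha = 1
          }
        '
    done < <(git ls-tree -r -z --name-only "$root")
  done
}
```

                                        Checking `HEAD:<path>` first keeps files deleted since the first commit from producing blame errors, and blaming `HEAD` rather than the working tree keeps uncommitted edits out of the answer.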

                                        • JensRantil 5 months ago

                                          Not sure why the article needs all those lines of code. This is much shorter:

                                              git ls-files|xargs -n 1 git blame --date=format:%Y%m%d -f |grep -Eo '\d{8}.*' |sort -r | head -n 1 | sed 's/^[^)]*) \t//'
                                          
                                          (on MacOSX)
                                          • js2 5 months ago

                                            Get in the habit of using `ls-files -z` and `xargs -0` to prevent surprises. But there's no need to blame every file:

                                            https://news.ycombinator.com/item?id=42883340

                                          • jamesfinlayson 5 months ago

                                            Hm, tried this on a Mac but something must be askew - it returned a commit by me in 2022 in a repo that has existed since at least 2017.

                                            • JensRantil 5 months ago

                                              Whoops, it extracted the newest row :) Correction:

                                                  git ls-files -z|xargs -0 -n 1 git blame --date=format:%Y%m%d -f |grep -Eo '[0-9]{8}.*' |sort -n | head -n 1 | sed 's/^[^)]*)*[ \t]//'
                                              
                                              This also uses a null delimiter for the files and fixes a small sed pattern bug.
                                            • pc86 5 months ago

                                              Formatting strikes again

                                              • JensRantil 5 months ago

                                                Thanks. Fixed.

                                            • abejfehr 5 months ago

                                              doesn't seem to work on macOS, I get:

                                                find: illegal option -- t
                                                usage: find [-H | -L | -P] [-EXdsx] [-f path] path ... [expression]
                                                       find [-H | -L | -P] [-EXdsx] -f path [path ...] [expression]
                                              • spatten 5 months ago

                                                I ran into that too. Turns out that you need to add in the path you want to search when invoking the command. If it's your current working directory, use `.`

                                                • donatj 5 months ago

                                                  I wrote a similar, maybe hacky, script using `git blame` on every file. In our main application, we still have a couple of lines from the initial commit in 2011.

                                                  • lionkor 5 months ago

                                                    At my old job, I remember it was some time at the beginning of the 1990s. I was born like 8 years after the code I was working on was written.

                                                    • kridsdale3 5 months ago

                                                      Oldest file I ever had to fix was the same age as me: 1986. Found a bug in 2013. Timezone math.

                                                    • rozenmd 5 months ago

                                                      > README.md 2021-01-28 17:27:57 +1100

                                                      Huh, TIL the birthdate of my business was actually a couple of days ago.

                                                      • ceejayoz 5 months ago

                                                        I suspect it'll be index.php's <?php line, lol.

                                                        • rdc12 5 months ago

                                                          For the codebase I was working on today (in C): at first it was just `}`, so I filtered those out; then it was `/*` (with no comment text on that line), so I filtered those too. Then it was a bunch of `#include`s.

                                                          Not surprising, but unfortunately not insightful at all.

                                                        • password4321 5 months ago

                                                          Always start your git repos with an empty commit, right?

                                                          • francisofascii 5 months ago

                                                            Maybe a readme.md with the initial name, a license file, and a .gitignore file. Whatever it is that all repos would have regardless of the language or application type.

                                                            • password4321 5 months ago

                                                              Apparently that's not really helpful anymore.

                                                              https://stackoverflow.com/questions/77446305/what-are-the-be...

                                                            • inglor_cz 5 months ago

                                                              Sep 30, 2008 at 5:03:58pm, revision 1.

                                                              It is SVN, though, and not Git.

                                                              • kridsdale3 5 months ago

                                                                What if my repo is older than git?

                                                                • wiml 5 months ago

                                                                  At $oldjob our Git repos had all been created by importing history from SVN, and that SVN repo had been created as an import from CVS, so there were some fairly old timestamps in the history. Only about a decade older than Git, though. Nothing from the era of SCCS or (bare) RCS.