In 1999 when MP3 was getting attention, I tried to do this. I encoded a file, then inverted it, and mixed it back into the original.
It didn't cancel anything out.
The reason: MP3 dramatically alters phase. Because all the phases are different, it's hard to naively determine how the signal is altered.
Years later, I took the time to write a series of tools to investigate lossy audio: https://andrewrondeau.com/blog/2016/07/deconstructing-lossy-...
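For anyone who wants to see the phase problem in numbers, here is a minimal sketch. It is a toy model of my own, not the tools from the linked post: it assumes a single 1 kHz tone at 44.1 kHz and a "codec" that does nothing except shift the phase. The names tone, rms and residual_db are made up for illustration.

```python
import math

SAMPLE_RATE = 44_100  # Hz, assumed CD-style rate
FREQ = 1_000.0        # Hz, arbitrary test tone


def tone(phase_rad, n_samples=4410):
    """Return a unit-amplitude sine tone with the given starting phase."""
    step = 2 * math.pi * FREQ / SAMPLE_RATE
    return [math.sin(step * n + phase_rad) for n in range(n_samples)]


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def residual_db(phase_rad):
    """Level of (original - shifted copy) relative to the original, in dB."""
    original = tone(0.0)
    shifted = tone(phase_rad)
    diff = [a - b for a, b in zip(original, shifted)]
    level = rms(diff)
    if level == 0.0:
        return float("-inf")  # perfect cancellation
    return 20 * math.log10(level / rms(original))


# Print the residual level for a few phase shifts (in degrees).
for degrees in (0, 1, 10, 60, 180):
    print(degrees, round(residual_db(math.radians(degrees)), 1))
```

The residual of two equal tones that differ by a phase angle φ has amplitude 2·|sin(φ/2)|, so a 60 degree shift already leaves a "difference" as loud as the original, even though nothing audible was removed.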
Oh! Is that an artifact of Joint Stereo encoding?
"What MaGuire has proved here is that the songs we listen to every single day are not the exact master copy that the artist recorded and wanted for us to hear. Instead, they are slightly stripped versions of their art run through a set of standards created by a bunch of engineers in 1993. For many people, that won’t matter. The songs sound almost the same, but the compression of music into an MP3 format is an important question to weigh when considering artistic intent and analyzing songs that aren’t exactly the original."
I feel like this analysis isn't well grounded in what artists and sound engineers actually do, or how they think.
When I've used lame --preset extreme, it uses variable bit rate which was developed some time later.
This article could at least have a paragraph explaining (in a dumbed-down way) why various kinds of psychoacoustic masking (temporal, frequency) make what's removed almost inaudible anyway. Reading the linked source (https://www.theghostinthemp3.com/theghostinthemp3.html), he at least used LAME, but at a fixed 128 kbps bitrate, not in VBR mode =(
EDIT: nerds should read about sfb21 (https://wiki.hydrogenaud.io/index.php?title=LAME_Y_switch). AAC, Vorbis and Opus (CELT) aren't just theoretical improvements.
IIRC then 128kbps and lower means a 16kHz low-pass filter is applied, while it's not for higher bitrates. So in that case not just psychoacoustics.
I was under the impression that MP3 can't represent any frequency above 16kHz, no matter the bitrate.
Then they would all be 32 kHz, instead of 44.1 kHz or 48 kHz.
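The arithmetic behind that reply, as a small sketch: a sample rate can only carry content below half its value (the Nyquist frequency), so a hard 16 kHz ceiling would make anything above 32 kHz sampling pointless. The function names are my own, and the rates are the three common ones mentioned in this thread.

```python
def nyquist_hz(sample_rate_hz):
    """Upper frequency limit for a given sample rate (half the rate)."""
    return sample_rate_hz / 2


def min_sample_rate_hz(max_freq_hz):
    """Sample rate needed for content up to max_freq_hz.

    Strictly the rate must be greater than this value; it is the boundary.
    """
    return 2 * max_freq_hz


for rate in (32_000, 44_100, 48_000):
    print(f"{rate} Hz sampling -> content below {nyquist_hz(rate):.0f} Hz")

print("rate needed for a 16 kHz ceiling:", min_sample_rate_hz(16_000))
```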
Ironically, the 'diff' is compressed anyway, because it's on Vimeo, so that's not the actual diff either!
This is like those articles on 10-bit colour where they show you "how vibrant and rich" the higher bit depth is, but you're reading it on the same old 8-bit-per-channel monitor.
There is a trick there. A sound can mask another sound. You will not be able to tell the difference with both sounds playing at the same time, but if you subtract them you can hear it because there is no masking.
I always loved to test the ears of my "Audiophile" friends. They will tell you how different MP3s are. You make a bet they can not differentiate them in 20 trials better than chance. I won with most people but some professional musicians that can identify little differences.
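If you want to know what "better than chance" means for a 20-trial bet, here is a rough sketch using the binomial distribution. It assumes each trial is an independent 50/50 guess and uses the conventional 5% significance threshold, which is my assumption and not something the bet above specified.

```python
from math import comb


def p_value_at_least(correct, trials, p_guess=0.5):
    """Probability of scoring `correct` or more out of `trials` by guessing."""
    return sum(
        comb(trials, k) * p_guess**k * (1 - p_guess) ** (trials - k)
        for k in range(correct, trials + 1)
    )


def min_correct_for(trials, alpha=0.05):
    """Smallest score whose chance probability is at or below `alpha`."""
    for correct in range(trials + 1):
        if p_value_at_least(correct, trials) <= alpha:
            return correct
    return None  # no score is convincing with this few trials


print(min_correct_for(20))
print(p_value_at_least(15, 20))
```

With 20 trials, 14 correct still happens by luck more than 5% of the time, so under these assumptions the listener needs 15 or more to win the bet convincingly.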
I spent half my professional career doing listening tests (MUSHRA and P.800), specifically on test items like Tom's Diner. 128 kbps MP3 is fairly easy to pick out, especially if you can compare it to the original. Double the bitrate and it's a real challenge.
Modern codecs like Opus are much more efficient. At high bitrates they are fully transparent and anybody who claims to be able to hear a difference is full of shit. Put them in a controlled setting and they fail every time.
LAME @192kb/s VBR was transparent 20 years ago, that said FLAC is still a good choice because storage is cheap now and you don't want to have to deal with a copy-of-a-copy situation.
Some young folk think that 24-bit/192 kHz is the one true form and would consider a 16/44 FLAC a lossy encode, and then there are the vinyl folks. (I like vinyl, but not for the fidelity.)
Required reading: https://xiph.org/video/vid2.shtml
I've found 128 kbps opus to be the best quality to stream my music when I'm not home. It is very fast to encode on the fly, and outside the house I mostly listen to music with either Bluetooth headphones or sometimes in a car, so playing something like flac would be a waste of bandwidth.
Maybe I'm old, but I do not hear a difference between 128 kbps opus and flac. I mainly use flac because it is an excellent archival format and you can encode it to different formats easily.
Yea same with me. Unless you have a perfect setup this is good enough. Though if you want to do a little experiment, try Fatboy Slim's Kalifornia, the beginning is notorious for destroying transform based codecs.
I think these kinds of blind listening tests are fundamentally flawed. For example in the graphics realm (games, video encoding, colour science, etc) all it takes is a momentary black screen between two comparison images to make it vastly more difficult to detect differences. Likewise side by side is also more difficult than swapping between two images instantly. Audio makes it impossible to do an instant swap, at best you’re getting the equivalent of a side-by-side comparison.
Audio does not make it impossible to do an instant swap. Any good ABX tool lets you switch between test/reference samples with zero delay. Hear for yourself:
https://abx.digitalfeed.net/list.html
(you can press A, B, or X on the keyboard for instant switching)
If anything those tests make it easier to find subtle differences, which is good if transparency is the goal. I don't think that makes them fundamentally flawed. They are used throughout the industry, making results comparable.
Of course there are other ITU tests that work without hidden references, looping or even A/B comparison. They require a much bigger listener pool, are more expensive and take longer, thus used less often during development.
Maybe not fundamentally flawed, but audio ABX testing is more focused on short-term memory and opinion (especially in unskilled subjects) than I would like. I don't think there is any right answer to audio blind tests.
I'll trust actual validated limits of human perception such as 16/48 audio, 1~3 ΔE colour, etc. Techniques used in video encoding like PSNR, SSIM, etc. are also pretty well grounded in science, as is SINAD.
But anything involving a human blindly comparing audio is into audiophile pseudoscience territory, no matter how large a cohort of people or how it is executed
>A sound can mask another sound.
Details:
https://en.wikipedia.org/wiki/Auditory_masking
Bernhard Seeber of the Audio Information Processing Group at the Technical University of Munich has some good demonstration videos on Youtube:
" I won with most people but some professional musicians that can identify little differences."
Even this seems unlikely. I remember a test from the computer magazine c't, which has a very good reputation. Many professionals took part and, as far as I remember, they were not able to tell the difference.
Fun fact: The only person who scored significantly was someone who loved punk music and had hearing damage.
[EDIT] https://www-parkrocker-net.translate.goog/threads/komprimier...
What bitrates are you using? I see practically no difference between v0 and anything above that on most things but sub 192kbps it can be very evident. I feel like a lot of the FLAC people have a hardcoded bias from the Limewire days that's hard to shake off once you've got it, you're basically listening to FLAC for the assurance that you're not missing something (which is a fair reason to a point imo).
You can try it yourself:
ffmpeg -i original.wav -codec:a libmp3lame -b:a 192k output.mp3 && \
ffmpeg -i output.mp3 decoded.wav && \
ffmpeg -i original.wav -i decoded.wav -filter_complex "[1:a]aresample=async=1,volume=-1.0[inverted];[0:a][inverted]amix=inputs=2:weights=1 1" difference.wav
Note that the original project did more involved processing, as described on https://www.theghostinthemp3.com/theghostinthemp3.html:
"Using the python library headspace, and a reverb model of a small diner, I began to construct a virtual 3-d space. Beginning by fragmenting and scrambling the more transient material, I applied head related transfer functions to simulate the background conversation one might hear in a diner. Tracking the amplitude of the original melody in the verse, I applied a loose amplitude envelope to these signals. Thus, a remnant of the original vocal line comes through in its amplitude contour."
This produces an error in version 7.1 for me:
[AVFilterGraph @ 0x6000008b59d0] More input link labels specified for filter 'aeval' than it has inputs: 2 > 1
[AVFilterGraph @ 0x6000008b59d0] Error linking filters
Failed to set value '[0:a][1:a]aeval=val(0)-val(1):c=same' for option 'filter_complex': Invalid argument
Error parsing global options: Invalid argument
The theory behind it is simple: subtract each audio sample in B from the corresponding audio sample in A.
You can do the same thing in your DAW¹ by putting A (e.g. the original) onto one channel and B (the processed sound) onto another. Then you invert the phase of B and listen to/export the sum.
This trick also works for audio gear that claims to do amazing things to your sound (here you just need to make sure to match the levels if they have been changed). Then you can look how much of the signal has truly been affected by your 1000 bucks silver speaker cable.
¹ Digital Audio Workstation, something as simple as Audacity should do the trick
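Outside a DAW, the same null test with level matching can be sketched in a few lines. This is only an illustration under my own assumptions: both signals are plain lists of float samples that are already time-aligned, and the level is matched with a single least-squares gain. The names best_gain and null_test are hypothetical, and the "processing" in the example is simply halving the volume.

```python
import math


def best_gain(reference, processed):
    """Least-squares gain g that makes g * processed closest to reference."""
    energy = sum(p * p for p in processed)
    if energy == 0.0:
        return 0.0
    return sum(r * p for r, p in zip(reference, processed)) / energy


def null_test(reference, processed, match_level=True):
    """Return reference minus the (optionally level-matched) processed signal."""
    if len(reference) != len(processed):
        raise ValueError("signals must be the same length and already aligned")
    gain = best_gain(reference, processed) if match_level else 1.0
    return [r - gain * p for r, p in zip(reference, processed)]


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# Hypothetical example: the "gear" does nothing but halve the level.
reference = [math.sin(2 * math.pi * 440 * n / 44_100) for n in range(4410)]
processed = [0.5 * s for s in reference]

print("unmatched residual:", rms(null_test(reference, processed, match_level=False)))
print("matched residual:  ", rms(null_test(reference, processed, match_level=True)))
```

Without level matching the residual is just a quieter copy of the music, which says nothing about what the gear changed; with matching, a pure volume change nulls to silence.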
> Then you can look how much of the signal has truly been affected by your 1000 bucks silver speaker cable.
I have a friend who has spent ridiculous sums of money on audio gear. Like, he's in his 50's, and still lives with his parents (in part) because of it. Over the years, I've learned I will never convince him that he's being fleeced, but I've wanted to make a site to host such A/B comparisons for a very long time, to perhaps get through to others what a waste most of the "audiophile" gear is.
Have you ever checked out the https://hydrogenaud.io/index.php?action=forum forums?
use this instead: ffmpeg -i original.wav -i decoded.wav -filter_complex "[1:a]aresample=async=1,volume=-1.0[inverted];[0:a][inverted]amix=inputs=2:weights=1 1" difference.wav
But honestly the only thing you get is something that subjectively sounds exactly the same, but lower volume. Probably due to the fact that subjective sound experience is more related to the fourier transform of the waves than it is to the waves themselves.
It's because mp3 dramatically changes phase. As a result, merely mixing the inverted original won't leave you with what's filtered out.
That technique will work with simpler compression techniques, like companding. (Companding is basically doing the digital equivalent of the old Dolby NR button from the cassette days.)
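To illustrate why the null test behaves with companding, here is a minimal sketch using the continuous mu-law curve with 8-bit quantization. Mu-law is my choice of example, since no specific scheme was named above. Because the codec works sample by sample and adds no delay or phase shift, original minus decoded is exactly the quantization error.

```python
import math

MU = 255  # mu-law parameter (the value used by G.711 telephony)
_LOG_1_PLUS_MU = math.log1p(MU)


def compress(x):
    """Mu-law compress a sample in [-1, 1]: quiet samples get more resolution."""
    return math.copysign(math.log1p(MU * abs(x)) / _LOG_1_PLUS_MU, x)


def expand(y):
    """Inverse of compress()."""
    return math.copysign(math.expm1(abs(y) * _LOG_1_PLUS_MU) / MU, y)


def codec(x, bits=8):
    """Compress, round to 2**(bits - 1) - 1 steps per polarity, then expand."""
    steps = 2 ** (bits - 1) - 1
    return expand(round(compress(x) * steps) / steps)


# Arbitrary test signal: a 440 Hz tone at 0.8 of full scale.
original = [0.8 * math.sin(2 * math.pi * 440 * n / 44_100) for n in range(4410)]
decoded = [codec(s) for s in original]
residual = [a - b for a, b in zip(original, decoded)]  # the null test

print("peak residual:", max(abs(r) for r in residual))
```

The residual stays small next to the 0.8 peak of the tone, which is the behaviour the MP3 experiment was hoping for and did not get.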
Interesting approach. So are we only able to hear those sounds now because the rest of the music was removed, which would ordinarily mask the missing sounds?
To say that the mp3-encoded version is not "what the artist recorded and wanted for us to hear" would imply that we can hear all sounds in the uncompressed recording.
Yes exactly this. On a high bitrate stream these losses are imperceptible (yes even by you golden ears out here) due to how auditory filters work in human hearing.
That's assuming we understand human auditory filters AND human beings have uniform auditory filters all around
We don't need to understand those things to determine if the differences are audible. We can simply perform listening tests:
https://en.wikipedia.org/wiki/ABX_test
A properly conducted ABX test is the most favorable condition possible for detecting a difference. If you can't ABX it you can't hear it.
There's an ABX testing website with various lossy formats you could try:
https://abx.digitalfeed.net/list.html
However, failing to ABX those specific samples does not guarantee you are unable to tell the difference in all circumstances. There are some sounds that are unusually difficult to encode ("killer samples"). This is an especially big problem for MP3. The LAME project has a collection of killer samples for MP3:
https://lame.sourceforge.io/quality.php
More modern lossy formats are less susceptible to killer samples, but theoretically there could still be problematic cases.
You can test this. Get a bunch of audiophiles in a room, play various recordings from various media (24KHz wav, mp3 at good bitrates and encoding settings, bad mp3, ogg, CD, vinyl etc) without showing them which is which, ask which one they like the most and see if the results correlate with the supposed quality of the source.
I don't have a source ready but this has been a hot topic in audiophile land for decades and tldr is they'll pick out the really bad sources (eg <128kbps mp3) but not the rest. Basically the results look like those from a blind beer tasting test: no correlation between winner and supposed quality, except if the quality is especially bad.
I'm no scientist, but to me "audiophiles who really care about this stuff can't pick out the good MP3 from the uncompressed original" is sufficient proof that MP3 is, actually, based on a sufficiently well-understood model of human hearing.
So what. People listened to music on mechanical gramophones and enjoyed it. Too many audio engineers think it's all about the sound, when in the end it's about the music and the feelings it expresses.
The way I think about recorded music is, the instrument being played is ultimately a set of speakers. That’s a musical experience to be encountered on its own terms. Live music is a wonderful thing, and presenting an illusion of live performance is certainly one thing a recording / reproduction system can do, but that doesn’t mean fidelity is the only thing we value when listening to a recording. Consider the punk music fans who enjoy a cheap stretched out cassette tape; I’m not going to deny them their enjoyment any more than I’ll deny the golden eared audiophile. They enjoy listening to different instruments.
And this brings me to my point: the engineer is no mere technician whose job is only fidelity. The engineer is part of the performance through speakers, an artist no less than the musicians being recorded.
"People listened to music on mechanical gramophones and enjoyed"
Hermann Hesse (or rather Harry Haller in Steppenwolf) used to get enraged by that - distorted garbage and a blasphemy against the godly composers. But then eventually he overcame it because of exactly that reasoning - if people enjoy it and are touched by the music, it works well enough. That is the purpose of music, not arbitrary perfectionism.
You're naïve if you think that 1) Good enough prevents better from existing for some people, 2) Technological advances had the same consequences for all material. For example, orchestral music massively benefited from CD's increased dynamic range.
Very interesting! The audio of Tom's Diner rejected by the encoding sounds mesmerizing to me. I still find it to be musical - it reminds me of a record I bought a really long time ago called Modulation & Transformation, on Mille Plateaux, a collection of songs in the abstract and experimental genre.
Fairly lackluster article.
Not all .mp3's are created equally and can vary in how lossy they are based on the bitrate.
If you care enough to want to hear exactly what the artist wants you to hear, you just listen to the lossless version.
> Not all .mp3's are created equally
No kidding. There are a decent number of options to pick for quality of mp3. VBR/CBR, bitrate, joint/stereo/mono, and so on. I personally just pick something that sounds fairly close to the original. But that only really matters for me when it is side by side. Give me a few days between picking and I can not really tell anymore.
The "lossless" version is in fact not entirely lossless. There is much that gets lost in the process of digitization, and even if there is no digitization happening the analogue equipment has some frequency response range.
Maybe theoretically but for our purposes doesn't the Nyquist–Shannon sampling theorem mean it's essentially lossless to our ears?
Found the original author's page about the project (no longer on the internet): https://web.archive.org/web/20211011015410/http://ryanmaguir...
One interesting thing to note: this is a composition, not an analysis. It's not fully documented exactly what modifications to the "raw data" were made.
I can open the original link, https://www.theghostinthemp3.com/theghostinthemp3.html, just fine. Maybe it's reinstated? Or there was a temporary problem?
I think the thing that really sticks out is that the breath noises are gone, which is one of the things that gives the track its character. Willing to bet the same kind of thing happens to fret noise as well.
My headphone rig was always optimized for a nice guitar sound, and I would second that. The sound of fingers on the string and the "pluckiness" of the guitar is what gets lost.
Beyond that, the specific thing that I noticed gets lost is bass character on some tracks. Ex: Some drum and bass tracks just don't hit at low bitrate. This aspect sometimes feeds into the low guitar strings though, where they might have a bit less body.
Lastly is of course sound staging, but that's something that a headphone setup is very sensitive to.
As for quality differences, I basically fall in line with the consensus on this thread. FLAC and 320 are indistinguishable. 192 kbps is almost always indistinguishable and good enough, although there are some situations where it might be slightly noticeable. 128 is pretty easy to tell apart with a good setup.
There is also the rare track with amazing production and/or very cool stuff happening somewhere in the spectrum, so I don't entirely write off someone wanting a plus one on the take above. "I absolutely love this jazz album. It's been a large part of my musical journey as a human, I can a/b test this at 320, but for this album, I really want it lossless." I can respect that. I got to know some of my test tracks pretty well (ex Black Sun Empire & Arrakis for bass), and while 192 was fine for other tracks, I wanted 320 or lossless on those.
Funny article.
"The exact master copy that the artist recorded and wanted for us to hear" In the digital era, does that even, uniquely, exist?
"a set of standards created by a bunch of engineers in 1993" Nice!
Was hoping the article would mention the double-blind studies, available elsewhere, on the ability to perceive differences in quality between various audio file formats. Interesting, though not as overwrought as the reporting in this article.
This could also be done on visual compression with JPEGs.
Or on video compression, for that matter.
It just shows though that these diffs are invisible to a human - by design.
Animals and aliens might cringe at these images and sounds, though.
ps, you could do the same thing with watermarked content.
This is why we don't encode MP3 at 96 kbps or whatever.
I would have liked a comparison with Ogg and perhaps other formats. I hear a lot about MP3 throwing away a lot more than Ogg but I'd love to see real data on it.
I don't think the claim is that MP3 throws a lot more away, it's that Vorbis sounds better. The two are very different, Vorbis might be throwing a lot more away, but if it's only the stuff you can't hear, then it'll sound better.
Ogg is a container format, not a coding format.
Vorbis, then.
The article doesn't mention at what bitrate the difference track was made. Does anyone know? Seems disingenuous and pro-"authentic" otherwise.
It also doesn't really mention how this “lost” material was identified. If you just subtract the encoded from the original, then any phase difference will make it sound like material “disappeared”, while in reality, it just came very slightly earlier or later.
I guess it was exactly as you write - but instead of slightly earlier or later, the "lost" sounds are the high frequencies (a lot of hissing, clicking etc.) - the actual sound is mostly still there, but slightly "muffled" because it contains only the lower-frequency components.
It's 128 kbps. The useful and informative article is mentioned at the end: http://theghostinthemp3.com/theghostinthemp3.html
If it's really the original MP3 version of Tom's Diner produced by the Fraunhofer engineers, probably not a very high bitrate. Aside from that, I would say that even back in 2015, MP3 was already on its way out and replaced by better (while still lossy) compression methods?
With space not much of an issue anymore, FLAC is pretty much the default nowadays. Opus, AAC and others encode better than old MP3, but even within MP3 I guess a 128 kbps file encoded with the original Fraunhofer l3enc (the best back then) and one encoded with LAME will be different, and the LAME version will be "better" because of improvements in psychoacoustics? At least I remember l3enc being MUCH better than anything else at 128 kbps (Xing lol, cymbal washing anyone?) before LAME came along.
> With space not much of an issue anymore, FLAC is pretty much the default nowadays
The default for what? Space is not the only consideration. What about bandwidth?
I'm pretty sure spotify, deezer and the others are not transmitting FLACs, especially not at the base quality level.
Now I want this comparison for Opus. It doesn't do that whole psychoacoustics thing, does it? But it also somehow manages to ~double the compression ratio compared to MP3 without any noticeable difference in the sound quality.
I haven't gotten into reading about audio and video compression yet, but from where I am standing now it really looks like magic.
When I first experienced CD audio, it sounded too high-pitched compared to the tape versions. Then MP3 came along, and each song sounded different depending on the MP3 compression settings.
Some CDs were mastered to have pre-emphasis that boosted high frequencies if the player didn't properly account for it. This caused them to sound "high pitched".
...I think this person just created a new genre of music. Something like: "What's lost noise."
I immensely enjoyed listening to the "lost material" of Tom's Diner and would like to hear more of this!
Maybe one could diff with a lower quality version, one where more has been cut away, more is lost/left over? There are so many possibilities!
Lost-Fi.
That's the Zeitgeist.