IMPORTANT UPDATE:
I messed up.
In the original post and graphic below, I said that the results were almost certainly influenced by the comments on Facebook. But I had no idea by how much.
Since then, Kenny has run another poll, using a different voting system that allows us to see the way votes are cast over time, and we can see a clear and very strong bias introduced into the results by people’s comments on the Facebook thread.
That means the same thing will have happened in the original poll, and I commented there myself, too – meaning I have to accept that I was part of the problem.
In a nutshell, every time someone posts their preference in a Facebook comment, there’s a corresponding boost in votes for their favourite master. In the second poll, the votes for one of the masters trebled in just a few hours after one of the comments, taking it from second place to a commanding lead.
This new information means we really can’t take the results too seriously. My original title for this post was “Humans versus robots: Humans WIN – by a huge margin”. I should have known better. That conclusion isn’t valid, and I let my passion for dynamics (and humans!) get the better of me.
It doesn’t invalidate the poll completely – after all, not everyone will have been influenced by the comments, and if a particular master prompts people to make comments in the first place and people then agree with it, that also tells us something.
But my headline and conclusion in the original version of this post were over the top, and I’ve decided to edit it. I considered removing it completely, but I think it’s a good example of how confirmation bias can influence us all, even when we think we’re immune to it !
So, with all that being said, here’s the original post, with the original wording shown in ~~strikethrough~~ and my newer comments in [square brackets]:
People are always asking me what I think of “automated mastering” – services like LANDR and Aria, for example – or “intelligent” mastering plugins like the mastering assistant in Ozone 8.
So I tested them – or at least, LANDR: once by myself, and once when someone else did it – without my knowledge !
And each time, I concluded that while the results weren’t nearly as bad as you might fear, provided you use the conservative settings, they still weren’t as good as what I could do.
However
Both these tests were non-blind. I didn’t cheat and listen to a LANDR master before doing my own masters, but when listening and comparing I always knew which master was which.
And that means I was open to expectation bias – and so are you, when you’re reading or listening to me talk about them.
So maybe our opinions are influenced by that, and if we didn’t know which was which, we would have made different choices. Graham’s test also wasn’t loudness-matched, which could have influenced the results. The different versions were close in level, and mine was actually a little quieter than the others, which in theory should have been a disadvantage – but you know me: it’s not a valid test if it’s not loudness-matched, as far as I’m concerned.
Just recently though, Kenny Gioia did something different.
Kenny’s Test
Kenny created six different masters of the same song – three by humans, three by machines. And not just any humans – one of the masters was by an unknown engineer at Sterling Sound, one was by John Longley, and the third was by none other than Steven Slate. Steven doesn’t claim to be a mastering engineer, but he certainly knows his way around a studio !
For the machines, Kenny asked members of his Facebook group which services to use, resulting in the choices of LANDR, Aria and Ozone 8.
Kenny then set up a poll, and asked people to listen and vote for the masters they liked best.
And here’s where it gets interesting
First, Kenny made the files anonymous, so that no-one could tell which was which – and second, he loudness-matched them, so that listeners wouldn’t be fooled by the ‘loudness deception’.
Which means that provided people didn’t look at the waveforms, there was no way to tell which was which, except by listening.
As far as I know, this is the first time a blind, loudness-matched poll like this has been done.
[Edit – we now know it wasn’t nearly blind enough – see above !]
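(As an aside – if you ever want to loudness-match a set of files for a blind comparison of your own, here’s a rough sketch of one way to do it in Python, using the soundfile and pyloudnorm libraries. The filenames and the -14 LUFS target are just placeholders for illustration, not the files or settings Kenny actually used.)

```python
# Rough sketch: measure each master's integrated loudness (ITU-R BS.1770)
# and gain-match them all to a common target so they can be compared blind.
# Filenames and the -14 LUFS target are placeholders, not Kenny's actual settings.
import soundfile as sf        # pip install soundfile
import pyloudnorm as pyln     # pip install pyloudnorm

TARGET_LUFS = -14.0  # in practice, match DOWN to the quietest master to avoid clipping

for name in ["master_a.wav", "master_b.wav", "master_c.wav"]:
    audio, rate = sf.read(name)
    meter = pyln.Meter(rate)                     # BS.1770 loudness meter
    loudness = meter.integrated_loudness(audio)  # integrated loudness, in LUFS
    matched = pyln.normalize.loudness(audio, loudness, TARGET_LUFS)
    sf.write("matched_" + name, matched, rate)
    print(f"{name}: {loudness:.1f} LUFS -> {TARGET_LUFS:.1f} LUFS")
```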
And the results were ~~fascinating~~ interesting
You can see a summary of how they came out in this infographic, illustrated with analysis of the dynamics of each master using my Dynameter plugin, but I wanted to take a little more time to make some extra comments here. First though, the disclaimer:
We need to remember this wasn’t a scientific test, even though it was loudness-matched and [kind of] blind. People could see how other people were voting, which results in a subtle kind of peer pressure. You can download the files and look at the waveforms, or measure them in other ways, so people might have made decisions based on that, rather than the sound alone. And perhaps most importantly of all, people were commenting and discussing what they heard all the while the poll was running – which results in a distinctly un-subtle form of peer pressure bias !
[I now think this effect was the most important factor for the surprisingly big difference in overall votes]
And, this is just one test, with one song. Kenny’s mix already sounded pretty good, and was very dynamically controlled, so different songs might have given very different results.
BUT
The results are still ~~compelling~~ suggestive. We can’t rule out the possibility that they would have been different if the votes and comments had been hidden [they would !] – but I suspect these actually just caused the final scores to be more exaggerated than they would otherwise have been, rather than completely changed.
Here are the highlights:
Humans ~~WIN~~ got the most votes
Even though the results were blind, John’s master got 42% of the overall votes. Not only that, but humans scored a massive 83% of the total votes, securing all three top slots. That’s a pretty convincing victory, even if it’s not entirely unexpected.
[True, but not as impressive as it might seem. And perhaps without the effect of the comments on Facebook, the differences between the different human masters would have been much less obvious.]
Dynamics ~~WIN~~ played an important role
John’s winning master was also the most dynamic. Not only that, but the ~~winning~~ robot master with the most votes was also the most dynamic of the automated masters, although the final result was very tight.
And in fact, the only master to break the trend of “dynamic ~~sounds better~~ got more votes” was the Sterling Sound master. This was made back in 2009, when the loudness wars were in full effect, so it’s not all that surprising that it was pushed pretty hard. But again the result is quite dramatic – this Sterling master got seven times more votes than the Aria machine master of similar loudness, which suggests an interesting conclusion: if high loudness is your goal, you’re better off getting it done by a human !
[I now think the results are so biased by the comments that this isn’t a fair conclusion from this poll, although it’s still my opinion.]
Default settings suck
LANDR was the only robot master with decent dynamics, for which I applaud them – but unfortunately the heavy bass EQ of the master came in for a lot of criticism in the comments, which presumably explains why it didn’t score higher.
But elsewhere the results weren’t so positive. Kenny deliberately chose the default settings for all the automated masters, and both Aria and Ozone 8 pushed the loudness to extreme levels by default, which is not only a Bad Thing (in my opinion) but also didn’t achieve a result people liked, either.
Which means I can’t help asking – shouldn’t automated services like LANDR and Aria be offering loudness-matched previews of their output ? Otherwise, isn’t the before & after comparison they offer deeply flawed, and maybe even deliberately misleading ? Hmmm…
ANYWAY, back on topic !
EQ matters
It’s fascinating that dynamics seem to have played such ~~an important~~ [a] part in people’s preferences, given that Kenny’s mix was pretty dense and controlled already – but the other factor is the EQ. Broadly speaking, all the human masters were somewhat brighter than the automated versions. This EQ choice suits the song better, and I suspect this is an important factor in the results – especially since the LUFS loudness matching takes EQ differences into account, as far as possible.
Aria lost
That might seem an unnecessarily blunt conclusion, but I think it’s worth saying, because in many other comparisons and conversations I’ve seen, Aria has received great feedback. This may be partly because it’s the only system that uses actual analogue hardware to achieve its results, but I suspect it’s more likely that it simply returns louder masters by default, which sound superficially more impressive.
[Again, I think the comment bias in the results means we can’t draw any conclusions from the details of this poll. Maybe not even for the order of the human masters.
I also want to say that personally I thought the Aria master was the best-sounding of the automated masters overall, even though it was too heavily compressed and limited for my taste.]
That’s why the loudness-matching is so crucial – because that louder-sounds-more-impressive effect isn’t how most people hear songs for the first time. The files in this test were balanced exactly as they would be if they were uploaded as single songs to TIDAL or Pandora, and in my experience you’d get very similar results on YouTube, Spotify and Apple Radio.
So this is a great real-world representation of how most people will hear songs for the first time. CD sales are falling day on day, and the vast majority of music discovery takes place online. If you want your music to stand out and make a great impression, you need it to work well when it’s loudness-matched. And that means mixing and mastering in the “loudness sweet spot” – with balanced dynamics. To find out the strategy I recommend to achieve this, click here.
Update
Several people have strongly criticised Kenny’s decision to use default settings for the automated mastering services, saying that the humans were told not to master for loudness, so the robots should have been “told” the same thing.
That’s reasonable, and Kenny says he’ll run a new test to address this factor, but I disagree that it would have significantly changed the outcome of this poll. Here’s why:
- Two of the human masters were “loud” anyway – in Sterling’s case because it was done years ago, and in Steven’s presumably because he felt it sounded best that way. Even though they were less dynamic, people still preferred them to the similarly loud automated masters.
- LANDR ended up pretty dynamic anyway, but the EQ wasn’t right.
- The settings Kenny “should” have apparently used for Aria are labelled “classical and light acoustic” (E) and “for very dynamic mixes” (A) in the Help text on the site. This song wasn’t either of those – it’s a heavily compressed rock mix, so Kenny’s choice was reasonable, in my opinion.
- Finally, “B” is Aria’s default setting – and the service includes two other presets that are even louder.
So once again – no, this wasn’t a perfect test – but in my opinion the possibility for people to be influenced by other people’s votes and comments is a much more significant criticism than the presets used for the online services.
[And now I know this was the case to a much greater extent than I expected]
Conclusion
At the end of the day, tests like this are just a bit of fun, really. To get a truly definitive answer to the question of which masters people prefer, we would need a truly blind poll, without comments, and multiple tests using many different songs in many different genres, with many more people listening.
But for now, this is the best we have and I’m calling it:
Humans ~~WIN~~ did really well in this poll. Just as ~~they should~~ I want them to !
More info
I deliberately haven’t revealed which master is which in the poll here, in case you want to try the test for yourself. To download the files, click here. To see the poll and join the discussion, click here. (You’ll need to join Kenny’s Facebook group first, to get access.)
And to hear Kenny and me discuss the whole project in even more detail, you might like to listen to the latest episode of my Mastering Show podcast. We also reveal exactly which master is which, and I give my blind comments on the different masters, plus predictions about which is which.
If you’d like to take a listen, click here.