With the story of a salmon and a good dose of humor, Louise Leitsch (Director of Research at Appinio) published an article a few months ago that caused quite a stir in the small world of market research, on a subject that is, on the face of it, hardly the most glamorous: statistical analysis. She takes a critical view of current practices, some of which, she argues, would cost a researcher their reputation or even their job if used in academic research. She answers questions from Market Research News.
MRNews: In your article, you make some rather harsh comments about the statistical analyses commonly used in the world of marketing research… Is scientific rigor so lacking?
Louise Leitsch (Appinio): First of all, I’d like to say how much I love market research! I’ve been working in this field for 7 years now, after holding positions in university research in the Netherlands, more specifically in social psychology. The last thing I want to do is cast aspersions on market research. But, no doubt because of my training and background, I am struck by how often these studies are conducted in an unscientific manner.
Our business is called « market research », but in reality I don’t see much research. If you submit a problem to 4 institutes, each will propose its own approach and methodology, without really explaining why. But if you ask 4 different universities how to measure people’s personalities, all the researchers will draw on the same major references. When I started working in market research, I tried to find the books or scientific articles that supported this or that practice. More often than not, I was told they didn’t exist, or that they were trade secrets… Maybe I’m exaggerating a little: some references do exist. If I want to develop a brand tracker, for example, I can rely on the book « Better Brand Health », which compiles decades of research. But yes, the scientific rigor of our field leaves a lot of room for improvement.
This diversity of methods can also be seen in a positive light, as a sign that research is always open…
Yes, in the sense that it’s possible to constantly innovate. But if you’re buying research to inform important corporate decisions – on communications or R&D issues, for example – you’d probably prefer to be sure that you’re relying on the most scientifically proven methods… When you ask research practitioners what they do and why they do it, they usually reply that they’re used to doing it that way… Which isn’t exactly reassuring. In the Middle Ages, wasn’t it customary to bleed people to cure them, when in fact the effect was to kill them? As in so many other fields, real progress in medicine has come from the use of scientific knowledge.
You point the finger at a practice you consider aberrant: putting significance tests everywhere, hundreds of them for even the smallest study. Why is this so reprehensible in your view? And what should be done instead?
A significance test is never absolute. When we apply this type of test, we obtain the famous p-value and check whether it is below 0.05, or even 0.01. This value expresses not a certainty but a probability: roughly, how likely we would be to observe a difference this large – between consumer purchase intentions for two different propositions, for example – if the null hypothesis of no real difference were true. In that scenario, a hypothesis has been formulated: the company is wondering which product to launch. Except that, in the vast majority of tests, there is no hypothesis at all. We scan a multitude of tables for any deviation that might be significant, often ignoring the other results. But we are in the realm of probabilities, and deviations can be declared « significant » without being real. If you run 100 significance tests at the 0.05 level, you are almost certain to run into « alpha errors » and conclude that differences exist when in fact they don’t.
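To make the scale of the problem concrete, here is a minimal simulation in Python (an editorial illustration, not taken from the interview; all sample sizes and rates are hypothetical). It runs 100 significance tests on pairs of samples drawn from the same population, so every « significant » result it finds is, by construction, an alpha error.

```python
# Minimal illustration of the "alpha error" problem described above:
# run many significance tests on data with NO real difference and count
# how many come out "significant" anyway. All parameters are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

N_TESTS = 100       # e.g. one test per cell of a large results table
N_PER_GROUP = 1000  # a typical market-research sample size
TRUE_RATE = 0.10    # both groups share the SAME purchase-intention rate

false_positives = 0
for _ in range(N_TESTS):
    a = rng.binomial(1, TRUE_RATE, N_PER_GROUP)
    b = rng.binomial(1, TRUE_RATE, N_PER_GROUP)
    _, p_value = stats.ttest_ind(a, b)  # two-sample test on 0/1 outcomes
    if p_value < 0.05:
        false_positives += 1

print(f"'Significant' differences found: {false_positives} out of {N_TESTS}")
# Around 5 are expected; P(at least one) = 1 - 0.95**100, about 99.4%.
```

At the 0.05 level, the expected number of false positives over 100 independent tests on null data is simply 100 × 0.05 = 5.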
What’s more, just because a difference is significant doesn’t mean it’s meaningful for making a decision, even though practical relevance was precisely the concern of the inventors of these tests…
How did these famous significance tests come about?
They were designed at the Guinness brewery in Dublin in the early 20th century, by William Sealy Gosset, a statistician who published under the pseudonym « Student ». At the time, the company needed an effective method for validating the quality of its beer on small samples, of around 50 units, rather than the 1,000 or 2,000 people often surveyed today. And we were squarely in the context mentioned earlier: the idea was to confirm or refute a hypothesis.
This notion of hypothesis is therefore the missing key that needs to be systematically reintroduced into research projects… Doesn’t this indicate a lack of connection between research and decision-makers in companies?
Absolutely, and perhaps especially in large companies where research teams can be quite distant from marketing teams. Behind the hypothesis lies the essential reason for the study. You want better packaging, or a more convincing product to target young people, or communication that is more favorable to the brand’s image, whatever… The study is there to ensure that the company’s desire has a real chance of succeeding. In this case, the significance test comes into its own. It must be applied to the hypothesis, and not to all the variables, in all the results tables. If there are no hypotheses, it’s better not to carry out any studies, or perhaps to embark on other research that will enable us to take our thinking further, in particular qualitative studies.
But, from a decision-making perspective, the hypothesis needs to be precise. If I measure the appeal of new packaging, and the percentage of potential buyers rises from 10% to 13% on samples of 1,000 people, the difference is certainly statistically significant, with the caveats we mentioned earlier. But is it large enough to justify the investment the change requires? The test won’t say, yet that is the real question companies need to ask themselves…
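The packaging example is easy to check. The sketch below (an editorial illustration assuming 130 and 100 potential buyers out of 1,000 respondents per cell) uses statsmodels to confirm that the 10% vs 13% gap is indeed significant, and adds a confidence interval on the lift, which speaks more directly to the investment question than a p-value does.

```python
# Check the 10% -> 13% packaging example: statistically significant,
# yet the test says nothing about whether a 3-point lift justifies
# the cost of the change. Counts below are hypothetical.
from statsmodels.stats.proportion import (
    confint_proportions_2indep,
    proportions_ztest,
)

buyers = [130, 100]     # potential buyers: new vs. current packaging
samples = [1000, 1000]  # 1,000 respondents per cell

z_stat, p_value = proportions_ztest(buyers, samples)
print(f"z = {z_stat:.2f}, p = {p_value:.3f}")  # p ~ 0.035 -> "significant"

# A confidence interval on the lift is more decision-relevant:
low, high = confint_proportions_2indep(130, 1000, 100, 1000)
print(f"Estimated lift: 3 points, 95% CI [{low:.1%}, {high:.1%}]")
```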
In addition to this notion of hypothesis, what are the possible alternatives to the use of these significance tests?
Bayesian analyses are certainly a more modern and relevant approach, but they are still relatively little used in market research. They are much more widely used in the academic world, which is increasingly turning away from significance testing. I’m optimistic about their future in market research, but this will undoubtedly take time.
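As a flavor of what such an analysis can look like, here is a minimal Beta-Binomial sketch (an editorial illustration with hypothetical counts, not a description of any Appinio method). Instead of a p-value, it returns the probability that one proposition beats the other, which maps more naturally onto a launch decision.

```python
# Minimal Bayesian A/B comparison with a Beta-Binomial model.
# Hypothetical data: 100/1000 purchase intentions for A, 130/1000 for B.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

post_a = stats.beta(1 + 100, 1 + 900)  # uniform Beta(1, 1) prior + data
post_b = stats.beta(1 + 130, 1 + 870)

draws_a = post_a.rvs(100_000, random_state=rng)
draws_b = post_b.rvs(100_000, random_state=rng)

print(f"P(B beats A): {np.mean(draws_b > draws_a):.1%}")
print(f"Median lift:  {np.median(draws_b - draws_a):.1%}")
```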
You work in a number of different countries. Do you observe any French specificities in these areas?
Ah yes! The French are the most obsessed of all with questions of sample representativeness, far more so than the British or the Spanish, for example. In Germany, it varies with the seniority of the teams. As we all know, 100% representativeness is either impossible to achieve or economically unrealistic. Certain sections of the population are unreachable even though they are part of society: I’m thinking of the homeless, refugees, or people affected by illiteracy. But is this representativeness really necessary to help companies make decisions? I don’t think so, except in very special cases. On the other hand, when comparing the results of two tests, on packaging A and B for example, it is important to ensure that the populations studied are comparable. This raises the question of which variables should be used to ensure comparability. As Emilie has already mentioned, we at Appinio are convinced that socio-demographic data are of little relevance for this purpose, and that even psychometric values are of limited interest, except in very special cases.
Read also > Interview with Emilie Faget (Appinio): « Good segmentation is both enlightening and easy to implement ».
Do you see any other « aberrant » practices in the statistical analyses carried out in the field of market research?
I confess to being surprised by the practice of asking people to answer on a metric scale and then collapsing their answers into « Top 2 Box » scores, even though metric variables have many advantages, including the possibility of computing means and standard deviations, and thus running factor analyses. Why « pester » consumers with metric scales, only to impoverish their responses afterwards and limit the power of the analysis?
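The loss of information is easy to demonstrate. In the toy example below (entirely hypothetical data), two products end up with virtually identical Top 2 Box scores even though their full 7-point distributions, and hence their means and standard deviations, differ sharply.

```python
# Toy illustration of the information lost when 7-point ratings are
# collapsed into a Top-2-Box percentage. All data are simulated.
import numpy as np

rng = np.random.default_rng(7)
scale = [1, 2, 3, 4, 5, 6, 7]

# Product A: ratings clustered around 5; Product B: polarized ratings.
a = rng.choice(scale, size=500, p=[.02, .03, .10, .25, .35, .15, .10])
b = rng.choice(scale, size=500, p=[.25, .10, .10, .15, .15, .10, .15])

for name, x in (("A", a), ("B", b)):
    t2b = np.mean(x >= 6)  # Top-2-Box: share of 6s and 7s
    print(f"{name}: mean={x.mean():.2f}, std={x.std():.2f}, T2B={t2b:.0%}")
# Both designs have an expected T2B of 25%, yet very different means.
```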
Aren’t there certain analysis methods or principles that deserve to be used more often?
Yes, this is the case with trade-offs and, more broadly, with conjoint analysis, which is a very powerful tool for clarifying the decisions a company has to make, bypassing the difficulty people may have in responding on conventional scales. Trade-offs lead them to make choices, which is something they do all the time as consumers. So it’s much more natural for them to respond this way than to grade proposals – unless they’re teachers! (laughs)
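For a flavor of the idea, here is a toy ratings-based conjoint sketch (an editorial illustration; real studies typically use choice-based designs with more attributes, and nothing here describes Appinio’s tooling). Part-worth utilities are estimated by regressing simulated ratings on dummy-coded product attributes.

```python
# Toy ratings-based conjoint: recover part-worth utilities by linear
# regression on dummy-coded attributes. All names and effects are made up.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 400

df = pd.DataFrame({
    "price_high": rng.integers(0, 2, n),  # 1 = high price, 0 = low
    "pack_new": rng.integers(0, 2, n),    # 1 = new packaging, 0 = current
})
# Simulated ratings: consumers dislike high price, mildly like the new pack.
df["rating"] = (5 - 1.2 * df["price_high"] + 0.4 * df["pack_new"]
                + rng.normal(0, 1, n))

X = sm.add_constant(df[["price_high", "pack_new"]])
fit = sm.OLS(df["rating"], X).fit()
print(fit.params)  # estimated part-worths: ~ -1.2 for price, ~ +0.4 for pack
```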
Segmentation trees are another method I really appreciate, as they are highly effective at prioritizing the drivers of a phenomenon. They are also a good way of generating hypotheses, rather than relying on rather poor socio-demographic criteria.
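Here is a minimal segmentation-tree sketch (an editorial example on simulated data): a shallow decision tree ranks which candidate drivers best explain purchase intention, which is precisely the hypothesis-generating use described above.

```python
# Minimal segmentation tree: which variables best split purchase intent?
# All data are simulated; variable names are hypothetical.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(3)
n = 2000
features = ["price_sensitivity", "brand_affinity", "age_group"]

X = np.column_stack([
    rng.random(n),          # price_sensitivity, 0-1
    rng.random(n),          # brand_affinity, 0-1
    rng.integers(0, 3, n),  # age_group - deliberately a weak driver
])
# Intent driven mostly by brand affinity, a little by price sensitivity.
intent = (X[:, 1] + 0.3 * (1 - X[:, 0]) + rng.normal(0, 0.2, n)) > 0.8

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, intent)
print(export_text(tree, feature_names=features))
print(dict(zip(features, tree.feature_importances_.round(2))))
```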
Do you have any final important message?
As market researchers, I believe we have a real responsibility. We need to change practices and push back against the overuse of significance testing. This could spare us the infamy of being called out by the Data Colada blog, as has happened to some renowned academics who lost their reputations, and sometimes even their jobs, through the misuse of statistical tests. But above all, it will help us do our job as well as possible: informing and securing decisions in organizations and businesses.