Cherry Picking Caution

Looking for significance and finding the illusion of trends.

One of the easiest logical traps for people to fall into is to confuse coincidence with causality.
In research, a scientist would ideally:

Start with a hypothesis such as:
- "variable A" causes or significantly influences "effect X"
- "variable A" is NOT significantly causing or influencing "effect X"
Propose experiments to falsify this hypothesis.
(If something cannot be falsified, it is not a good scientific theory; it is a philosophical theory.)
Perform the experiments to test the hypothesis.
Reflect on the results of the experiments.

When the data is collected, however, the relationship between variable A and effect X may be vague or insignificant.

Lack of significant correlation is a common problem because:

Natural systems are complex.
It is unlikely that one variable will be the sole or most significant factor in effect X.
No matter how many variables you test, you may never test the most significant variable, because you don't know what it is.

When no significant relationship can be shown between variable A and effect X, it can seem like the researcher has wasted time and money, and they may feel pressure to justify the work and expense by finding SOMETHING significant in the data.

This can lead to "cherry picking" data to try to find a significant correlation.
"Cherry picking", or looking for trends in data, is not in itself bad science, as long as any trend you find is treated as a new hypothesis to be tested!
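To see why a search through the data so often turns up "something", it can help to work through the arithmetic. The sketch below is an illustration only, and it rests on assumptions that are not part of this guide: each test is independent, none of the variables truly affects effect X, and a result counts as "significant" at the conventional 5% level. The function name is hypothetical.

```python
def chance_of_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `num_tests` independent tests
    looks 'significant' at level `alpha` when no real effect exists.

    Assumes the tests are independent, which real data rarely guarantees.
    """
    # Chance that every test correctly shows nothing: (1 - alpha) ** num_tests
    return 1.0 - (1.0 - alpha) ** num_tests


for k in (1, 5, 20, 100):
    chance = chance_of_false_positive(k)
    print(f"{k:>3} variables tested: {chance:.0%} chance of at least one spurious 'trend'")
```

Under these assumptions, testing 20 unrelated variables gives roughly a 64% chance that at least one of them appears significant purely by coincidence.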

The bad science happens when you take data that shows a trend and treat it as a proven theory, without running further tests that could disprove the new hypothesis. The trend may be accidental or coincidental and non-repeatable.
In other words, it is fine to look for a trend in your data, but the "suspected trend" hypothesis needs to be tested with experiments that could falsify it, because the trend may be an illusion. Replicating the trend is essential to building confidence in the hypothesis suggested by the data collected.
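The difference between finding a trend and replicating it can be sketched with a small simulation. This is a toy illustration, not a description of any real study: every value is random noise, so no variable actually influences effect X. The sample sizes, the seed, and the names `pearson_r` and `run_study` are arbitrary choices made for this example.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def run_study(rng, num_variables=40, num_samples=20):
    """Simulate one study in which every measurement is independent noise."""
    effect = [rng.gauss(0, 1) for _ in range(num_samples)]
    variables = [
        [rng.gauss(0, 1) for _ in range(num_samples)]
        for _ in range(num_variables)
    ]
    return effect, variables


rng = random.Random(1)  # fixed seed so the run is repeatable

# Original study: cherry-pick the variable with the strongest correlation.
effect, variables = run_study(rng)
correlations = [pearson_r(v, effect) for v in variables]
best = max(range(len(variables)), key=lambda i: abs(correlations[i]))

# Replication: collect fresh data and re-test only the cherry-picked variable.
new_effect, new_variables = run_study(rng)
replicated = pearson_r(new_variables[best], new_effect)

print(f"Cherry-picked variable #{best}: r = {correlations[best]:+.2f}")
print(f"Same variable, new data:     r = {replicated:+.2f}")
```

Because the data is pure noise, the cherry-picked correlation is the largest of 40 coincidences, and in most runs it shrinks toward zero when the same variable is measured again. A trend that reflects a real relationship would be expected to survive the second study.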

Problems arise when pressure to publish, or lack of funding, makes it impractical to repeat the experiment and properly test the new hypothesis.

Thus, it is important to remind yourself that correlation between variables and effects does not necessarily indicate causation. Such a trend or relationship merely suggests a new hypothesis to be tested.