Video Details

Will Critchlow takes to the Whiteboard this week to show us some of the common mistakes in turning raw data into actionable recommendations, and how to avoid them.

Independence is important

If you plan to analyse many sets of trials, particularly ones involving humans, you need to make sure the results are independent. For example, if the same person answers the same questionnaire more than once, their later answers will be skewed because they have already seen the questions.

Don’t worry too much, but acknowledge it as a problem. With PPC display copy tests, for example, you need to be aware that a lack of independence exists, because you cannot tell whether a person has never seen the ad or has seen it multiple times before.
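For the questionnaire case, one simple way to restore independence is to keep only the first response from each person. The sketch below assumes each response arrives as a (person_id, answer) pair in the order it was submitted; the function name and data layout are illustrative, not something from the video.

```python
def first_response_per_person(responses):
    """Keep only the earliest response from each respondent.

    `responses` is assumed to be an iterable of (person_id, answer)
    pairs, already sorted by submission time.
    """
    seen = set()
    kept = []
    for person_id, answer in responses:
        if person_id not in seen:
            seen.add(person_id)
            kept.append((person_id, answer))
    return kept


if __name__ == "__main__":
    raw = [("ann", 4), ("bob", 2), ("ann", 5), ("cat", 3), ("bob", 1)]
    print(first_response_per_person(raw))
```

This does not help with the PPC case, where you usually cannot identify who has seen the ad before; there, the lack of independence is something to acknowledge rather than remove.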

Repetition can be problematic

Repetition is the key to many statistical problems, but it can also be your downfall. Take a confidence level such as 95%. That number says that, if the two things you are comparing were actually identical, the chance of getting a difference in results this extreme is less than 5%. In other words, roughly 1 in 20 comparisons of identical things will still look significant purely by chance, so the more tests you run, the more false positives you should expect.

Therefore, be very careful about celebrating a result too early. When you are running many, many trials and one of them comes up significant, stop and rerun it. If it comes up as statistically significant again, then go tell your boss.
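To see why a rerun matters, here is a minimal simulation sketch. It runs many A/A tests, where both variants have exactly the same conversion rate, and counts how often a test looks significant at the 5% level, both on a single run and when a rerun must also be significant. The conversion rate, visitor counts, and the choice of a pooled two-proportion z-test are assumptions made for illustration.

```python
import math
import random


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a difference in conversion rates (pooled z-test)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (conv_a / n_a - conv_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def run_aa_test(rng, visitors=500, rate=0.10):
    """Simulate one test in which both variants are truly identical."""
    conv_a = sum(rng.random() < rate for _ in range(visitors))
    conv_b = sum(rng.random() < rate for _ in range(visitors))
    return two_proportion_p_value(conv_a, visitors, conv_b, visitors)


def false_positive_rates(trials=1000, alpha=0.05, seed=0):
    """Return (share significant once, share significant on a rerun too)."""
    rng = random.Random(seed)
    single = 0
    confirmed = 0
    for _ in range(trials):
        if run_aa_test(rng) < alpha:
            single += 1
            # Rerun the "winning" test before believing it.
            if run_aa_test(rng) < alpha:
                confirmed += 1
    return single / trials, confirmed / trials


if __name__ == "__main__":
    single, confirmed = false_positive_rates()
    print(f"significant once: {single:.1%}, significant twice: {confirmed:.1%}")
```

With nothing real to find, the single-run share should land somewhere near 5%, while a false positive has to get lucky twice to survive the rerun (in theory about 0.05 × 0.05 = 0.25% of the time).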

Don’t train on your sample data

If you train and test your model on the same sample data, all you learn is that the model works on data it has already seen. You need to train it on one set, then test it on a completely independent set. If it still works on the independent set, you have good evidence that the model is sound.
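The following sketch illustrates the point with a deliberately naive model that just memorises its training points (a 1-nearest-neighbour lookup). The data is synthetic and every name here is hypothetical. The model scores perfectly on the data it was trained on, and the held-out set reveals how well it really predicts.

```python
import random


def make_data(rng, n=200):
    """Hypothetical data: y depends on x, plus random noise."""
    points = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        points.append((x, 2 * x + rng.gauss(0, 3)))
    return points


def split(points, test_fraction, rng):
    """Shuffle, then hold back a fraction of the points for testing."""
    shuffled = points[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def predict(train, x):
    """A model that memorises: return the y of the closest training x."""
    return min(train, key=lambda p: abs(p[0] - x))[1]


def mean_squared_error(train, points):
    return sum((predict(train, x) - y) ** 2 for x, y in points) / len(points)


def compare_errors(seed=1):
    """Return (error on the training set, error on the held-out set)."""
    rng = random.Random(seed)
    train, test = split(make_data(rng), 0.25, rng)
    return mean_squared_error(train, train), mean_squared_error(train, test)


if __name__ == "__main__":
    train_error, test_error = compare_errors()
    print(f"training error: {train_error:.2f}, held-out error: {test_error:.2f}")
```

The training error is zero because each training point is its own nearest neighbour, which says nothing about how the model copes with new data. Only the held-out error is an honest measure.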

Don’t blindly hunt for patterns

This is where you run into problems with correlation versus causation. For example, you may hypothesise that the average number of words on a page affects your rankings. The correlational data may suggest that a certain word count leads to certain rankings, but if there is no real relationship between the two variables, acting on that pattern will just leave you looking stupid in front of your boss.
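A quick way to convince yourself of the danger is to hunt for patterns in pure noise. The sketch below generates random "rankings" for a handful of pages, then checks a couple of hundred equally random, made-up metrics against them and reports the strongest correlation it finds. The page and metric counts are arbitrary assumptions, and none of the metrics has any real connection to the rankings.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_spurious_correlation(n_pages=20, n_metrics=200, seed=2):
    """Try many random metrics and keep the one that 'predicts' rankings best."""
    rng = random.Random(seed)
    rankings = [rng.random() for _ in range(n_pages)]
    best = 0.0
    for _ in range(n_metrics):
        metric = [rng.random() for _ in range(n_pages)]  # unrelated noise
        r = pearson(metric, rankings)
        if abs(r) > abs(best):
            best = r
    return best


if __name__ == "__main__":
    print(f"strongest correlation found in noise: {strongest_spurious_correlation():.2f}")
```

Try enough candidate metrics against a small sample and one of them will correlate strongly by luck alone, which is why a hypothesis chosen before looking at the data, and tested on fresh data, is worth far more than a pattern found by trawling.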