Pondering Predictability
by Claudia Perlich
In the context of building predictive models, predictability is usually considered a blessing. After all, that is the goal: to build the model with the highest predictive performance. The rise of 'big data' has, in fact, vastly improved our ability to predict human behavior, thanks to the introduction of much more informative features.
However, in practice, the target variable is often more differentiated than the data accounts for. For example, some customers churn (from a telecom provider) because they are moving, others because they got a better offer in the mail, and still others because their home is in a location with terrible reception. These are all positives for a model that learns to predict churn, but the predicted outcome has occurred for very different reasons. In many applications, such mixed scenarios mean the model will automatically gravitate to the reason that is easiest to predict, at the expense of the others.
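The churn example can be made concrete with a toy sketch. Everything in it is an assumption for illustration: the segment names, the customer counts, and the "model" itself, which is nothing more than the empirical churn rate per feature cell. It is not taken from the talk; it only shows how ranking customers by predicted churn can concentrate on the one reason that is easiest to predict.

```python
from collections import Counter

# Hypothetical, made-up counts: (feature cell, churn reason or None, customers).
# A reason of None means the customer did not churn.
SEGMENTS = [
    ("bad_reception", "reception", 90),
    ("bad_reception", None, 10),
    ("got_offer", "offer", 60),
    ("got_offer", None, 240),
    ("no_signal", "moving", 30),
    ("no_signal", None, 570),
]


def build_customers(segments):
    """Expand the count table into one (cell, reason) record per customer."""
    return [(cell, reason) for cell, reason, n in segments for _ in range(n)]


def fit_cell_rates(customers):
    """'Train' a deliberately simple model: empirical churn rate per feature cell.

    The model only sees whether a customer churned, never the reason.
    """
    totals, churned = Counter(), Counter()
    for cell, reason in customers:
        totals[cell] += 1
        churned[cell] += int(reason is not None)
    return {cell: churned[cell] / totals[cell] for cell in totals}


def reasons_in_top_k(customers, rates, k):
    """Count churn reasons among the k customers with the highest predicted score."""
    ranked = sorted(customers, key=lambda c: rates[c[0]], reverse=True)
    return Counter(reason for _, reason in ranked[:k] if reason is not None)


customers = build_customers(SEGMENTS)
rates = fit_cell_rates(customers)
top = reasons_in_top_k(customers, rates, k=100)

print(rates)  # predicted churn rate per cell
print(top)    # churn reasons among the 100 highest-scored customers
```

With these made-up counts, the 100 highest-scored customers all come from the bad-reception cell, so every churner the model surfaces left for the same reason, even though half of all churners (90 of 180) left because of an offer or a move.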
We will cover a number of applications where this takes place: ad clicks made 'intentionally' vs. 'accidentally', consumers visiting store locations vs. their phones pretending to be there, and finally customers filling out online forms vs. bots defrauding the advertising industry. In conclusion, the combination of different and highly informative features can have a significant negative impact on the usefulness and ethics of predictive modeling.