- This technique is often used when the evaluation metric is F-score
- You need to threshold the predicted scores to decide whether each example is in the positive or negative class
- This is hard when:
- The train and test distributions are different
- You introduce new models into your blend
- because each model introduces its own bias and variance
- Solution: use percentile thresholding (label the top p% of predicted scores as positive instead of using a fixed score cutoff)
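
A minimal sketch of percentile thresholding, assuming the positive fraction in the data being scored is roughly known (for example, estimated from the training labels). The function name `percentile_threshold` and the parameter `positive_rate` are hypothetical, chosen for illustration only. Because the cutoff depends on the rank of each score rather than its absolute value, the labels stay the same when a new model shifts or rescales the blended scores.

```python
import numpy as np

def percentile_threshold(scores, positive_rate):
    """Label the top `positive_rate` fraction of scores as positive (1).

    `positive_rate` is an assumed input, e.g. the share of positives
    observed in the training labels.
    """
    scores = np.asarray(scores, dtype=float)
    # Number of examples to mark positive, based on rank rather than score value.
    n_positive = int(round(positive_rate * len(scores)))
    labels = np.zeros(len(scores), dtype=int)
    if n_positive > 0:
        # Indices of the n_positive highest scores.
        top = np.argsort(scores)[-n_positive:]
        labels[top] = 1
    return labels

scores = np.array([0.9, 0.1, 0.4, 0.8, 0.3, 0.7, 0.2, 0.6, 0.5, 0.05])
labels = percentile_threshold(scores, positive_rate=0.3)

# A blend with a different scale and offset gives the same labels,
# because the ranking of the examples is unchanged.
shifted_labels = percentile_threshold(0.5 * scores + 0.2, positive_rate=0.3)
```

A fixed cutoff such as 0.5 would label the two score sets differently, while the rank-based version does not. The trade-off is that the method forces the predicted positive rate to match `positive_rate`, so it only helps when that estimate is reasonable for the test data.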