The idea of this technique is to not train on complete instances. Instead, the model trains on only a subset of each training example.

Benefits: we need less GPU memory to train on one example, because only the selected subset has to be processed in each training step.
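The text above does not say how the subset is chosen, so here is a minimal sketch under one assumption: each example is a sequence, and the subset is a random contiguous window of at most `max_len` items. The names `sample_subset` and `max_len` are hypothetical and only serve the illustration; other selection schemes (random items, fixed patches, etc.) would fit the same idea.

```python
import random


def sample_subset(example, max_len, rng):
    """Return a random contiguous window of at most max_len items.

    Assumption: a contiguous window is one possible way to pick a subset;
    the technique itself does not prescribe this choice.
    """
    if len(example) <= max_len:
        # Short examples are kept whole.
        return list(example)
    # Pick a start index so that the window fits inside the example.
    start = rng.randrange(len(example) - max_len + 1)
    return list(example[start:start + max_len])


# Toy dataset: three "examples" of very different lengths.
rng = random.Random(0)
dataset = [list(range(10)), list(range(100, 103)), list(range(200, 250))]

# Each training step would see only the sampled subsets, not the full examples.
batch = [sample_subset(ex, max_len=4, rng=rng) for ex in dataset]

for original, subset in zip(dataset, batch):
    print(len(original), "->", len(subset))
```

With this setup, no example contributes more than `max_len` items to a step, which puts an upper bound on the per-example input size. How much GPU memory that actually saves depends on the model and is not something this sketch measures.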