I want a notebook situation where the platform understands sampling, so that while I'm doing my EDA, initial development, and generally the kinds of work that are appropriate to do in a notebook, I'm never working with 100GB data frames.
I suspect that a big part of my annoyance about the current state of the data space is that parts of the ecosystem were designed with the needs of data scientists in mind, and other parts of the ecosystem were designed with the needs of data engineers in mind, and it's all been jammed together in a way that makes sure nobody can ever be happy.
You can already sample data if you want (or sequentially load partial data, which is what I usually do if I just want to test basic transformations), but if you need to worry about rare occurrences (and don't know their rate), sampling can be dangerous. For example, when validating data there are edge cases that are very rare (i.e., sometimes I catch issues occurring in fewer than one record per billion), and it can be hard to catch them without looking at all of the data.
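To put a number on why sampling is dangerous here: if a defect appears at some rate p, the chance a simple random sample of n rows contains at least one such record is 1 - (1 - p)^n. A minimal sketch, using the "one record per billion" rate from above and illustrative sample sizes (not figures from the thread):

```python
import math

def p_at_least_one(rate: float, sample_size: int) -> float:
    """Probability a uniform random sample contains >= 1 rare record.

    P(none in sample) = (1 - rate)^n, computed via log1p/exp for
    numerical stability at tiny rates; the complement is the hit chance.
    """
    return 1.0 - math.exp(sample_size * math.log1p(-rate))

rate = 1e-9  # "fewer than one record per billion"

# Even a million-row sample almost certainly misses the defect;
# only at tens-of-billions of rows does detection become likely.
for n in (1_000_000, 100_000_000, 10_000_000_000):
    print(f"n={n:>14,}: P(catch >= 1 rare record) = {p_at_least_one(rate, n):.4f}")
```

With a million-row sample the defect shows up about 0.1% of the time, which is why validation work that hunts for rare records effectively requires a full scan rather than a sample.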