One day, the team mentioned at their standup that the de-duplication of data was taking much longer than expected. It just wasn’t proceeding the way it should, and as a result our timelines were backing up a bit.
I asked, casually, “Are you hashing and scanning?”
The entire team looked at me like I had three heads.
At first, I wondered if I was somehow out of touch, as data wasn’t my primary area of expertise and it had been a while since I did day-to-day data work. Was I asking the dumbest question that could be asked, with no one gutsy enough to tell the COO, “Duh, of course”?
Then, one brave junior spoke up. “What do you mean?”
I was a little relieved but also a bit shocked. So I answered, “Take a representative subset of your data and cryptographically hash it. Then use the generated hash to filter instead of trying to force your way through the entire dataset.”
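
For anyone who has not seen the idea before, here is a minimal sketch of one way to read it, not the code the team ended up writing. It assumes the "representative subset" means a handful of fields that identify a record, and that SHA-256 is an acceptable hash. The names `record_fingerprint`, `deduplicate`, and `key_fields`, and the light normalization (trimming and lowercasing), are all hypothetical choices made for illustration.

```python
import hashlib


def record_fingerprint(record, key_fields):
    """Hash only the chosen identifying fields of a record (assumed to be a dict)."""
    # Normalize lightly so trivial differences in case or spacing do not hide a duplicate.
    parts = [str(record.get(field, "")).strip().lower() for field in key_fields]
    # Join with a separator unlikely to appear in the data, so ("ab", "c") != ("a", "bc").
    payload = "\x1f".join(parts).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def deduplicate(records, key_fields):
    """Keep the first record seen for each fingerprint, in a single pass."""
    seen = set()
    unique = []
    for record in records:
        fingerprint = record_fingerprint(record, key_fields)
        if fingerprint in seen:
            continue  # already kept a record with the same key fields
        seen.add(fingerprint)
        unique.append(record)
    return unique


# Hypothetical sample data: the first two rows differ only in case and spacing.
rows = [
    {"name": "Ada Lovelace", "email": "ada@example.com", "note": "first import"},
    {"name": "ada lovelace ", "email": "ADA@example.com", "note": "second import"},
    {"name": "Alan Turing", "email": "alan@example.com", "note": "first import"},
]
deduped = deduplicate(rows, key_fields=["name", "email"])
```

The point of the design is that each record is hashed once and checked against a set, so the work grows roughly in line with the number of records instead of comparing every record against every other one.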
