Sunday, September 10, 2023

One day, the team mentioned in their standup that the de-duplication of data was taking much longer than expected. It just wasn’t proceeding the way it should, and as a result our timelines were backing up a bit.

I asked, casually, “Are you hashing and scanning?”

The entire team looked at me like I had three heads.

At first, I wondered if I was somehow out of touch. This wasn’t my primary area of expertise, and it had been a while since I had done day-to-day data work. Was I asking the dumbest question that could be asked, with no one gutsy enough to tell the COO, “Duh, of course”?

Then, one brave junior spoke up. “What do you mean?”

I was a little relieved but also a bit shocked. So I answered, “Take a representative subset of your data and cryptographically hash it. Then use the generated hash to filter instead of trying to force your way through the entire dataset.”
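
For readers who have not seen the idea before, here is a minimal sketch of what I had in mind. It is not the team's actual code. I am assuming that "representative subset" means a handful of identifying fields per record, and the names `record_fingerprint`, `deduplicate`, and `key_fields` are hypothetical ones made up for this illustration.

```python
import hashlib


def record_fingerprint(record, key_fields):
    """Return a SHA-256 hex digest over the chosen fields of one record.

    Assumption: `record` is a dict and `key_fields` names the fields that
    identify a duplicate. Both are illustrative, not from a real pipeline.
    """
    digest = hashlib.sha256()
    for field in key_fields:
        # Light normalization so trivial differences do not hide duplicates.
        value = str(record.get(field, "")).strip().lower()
        encoded = value.encode("utf-8")
        # Length-prefix each value so ("ab", "c") and ("a", "bc") differ.
        digest.update(len(encoded).to_bytes(8, "big"))
        digest.update(encoded)
    return digest.hexdigest()


def deduplicate(records, key_fields):
    """Keep the first record seen for each fingerprint, in a single pass."""
    seen = set()
    unique = []
    for record in records:
        fingerprint = record_fingerprint(record, key_fields)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(record)
    return unique


rows = [
    {"name": "Ada", "email": "ada@example.com"},
    {"name": "ADA ", "email": "Ada@Example.com"},
    {"name": "Bob", "email": "bob@example.com"},
]
unique_rows = deduplicate(rows, ["name", "email"])
```

The point of the sketch is the shape of the work: each record is hashed once and checked against a set, so the job is one scan of the data rather than a comparison of every record against every other record.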
