The Challenge of Large File Checksums

Photo by Alexandros Giannakakis / Unsplash

How often do you use a website's provided checksum to verify that a file you downloaded wasn't corrupted or worse, what if it was tampered with during a man in the middle attack?

In the modern day of internet, I suppose you won't be using the checksum values on the website to make sure these things doesn't happen. Since we just all assume that the download is correct and if not we will just press the download button again.

We are shielded from these issues most of the time as we often relay on data transfer protocols to guarantee packet delivery.

Well, as a person who have too much time on their hand sometimes, I like to use the checksum function to make sure what I'm downloading is what the vendors share.

Every time when I try to use run checksum on large files over a few Gigabyte large, it started to feel slow, takes a few extra second than normal. So I decided I needed to take this issue into my own hand, and save myself a few extra second.

What is Merkle Tree

A Merkle Tree is when you try to compute a hash object though a tree like structure:

https://en.wikipedia.org/wiki/Merkle_tree

In this case you will be chopping up the files into chunks or data blocks, which you can individually hash them creating the leaf node of the tree.

Which than you can build upon the leaf nodes by hashing each data blocks with it's neighbor. Repeating the process recursively and it will build into a full Merkle tree

Here is what it will look like:

Spiting the large file into chunks

Computer hash of each file chunk

Computing hash of the next layer continuously until final node is reached

computing hash for each chunk and continuously hash the resulting hash nodes until final node is reach

Here is a concurrent Merkle Tree program I wrote in Golang:

When writing this program, I was trying to decide if I should use the fork-join method to hash the nodes concurrently, or if I should implement a pipeline method, where each level of the tree would be handled by a separate thread, responsible for hashing the nodes at that level until reaching the final level (root).

I ultimately went with the pipeline method because I had a few other ideas I wanted to work on, and the pipeline approach seemed more straightforward to implement.

Reference:

Update and old bugs report

It's been a while... Man, a lot has happened. Earlier this month, I got to visit San Francisco. It ended up being a pretty long trip. I still had a lot of fun and got to eat some really good sushi while I was there. Lately, I haven&

How Next.js Works Under the Hood, Server vs Client components

Hello, I wanted to do a deep dive into Next.js's structure and how it differs from traditional HTTP servers such as Flask, Express.js, and many other frameworks. As you know, Next.js is specifically written for React applications, it is made to develop full stack applications

English - 蜡笔小新《大人帝国的反击》Nostalgia, and Why I Almost Built DeepSeek

English translate of the chinese blog So this is a blog in Chinese. I haven’t written in Chinese for a long time, so I just wanted to try writing again. It’s probably been three or four years since I last wrote like this. Recently I’ve been using

蜡笔小新《大人帝国的反击》Nostalgia, and Why I Almost Built DeepSeek

Thanks for Chatgpt voice for fixing my awful chinese 碎碎念 grammar 所以这个是中文的博客，很久没写了，所以想要看一下怎么写。大概有三四年没写了吧，有时候最近越来越少用中文了，感觉可以再加努力一下。所以这是想说一下以前看的小动画片吧，这边有蜡笔小新：大人帝国的反击，然后下面连了一个视频，YouTube的视频（油管的视频），可以看一下，是比较详细的介绍吧，宣传一下。如果论知名度的话，看字幕的话大概知道它。如果单说这部电影的话，我个人觉得它是我小时候看过最好的动画电影之一。《蜡笔小新》系列其实几乎每年都会出一部剧场版，但我大概从 2016 年左右开始就慢慢不看了，主要是因为太忙了，然后就是基本看不到这次重新想起来，是因为最近又开始偶尔看看《蜡笔小新》，当作“下饭番”还挺轻松的。虽然不算特别认真地看，但还是会被一些情节触动。那我对蜡笔小新《大人帝国的反击》具体的印象就是，大概大人全部走掉，然后小孩子的天堂吧。