July 07 2009

Thoughts on hashDB – 140 characters just isn’t enough

Long Zheng (who makes me tired just following his Twitter feed) has, in the space of a few hours, taken an idea from a fun side project and turned it into a full-on site. It revolves around a centralised repository of file hashes, managed by the community and accessible through a web service. He’s got a brief summary of his idea on the site and is looking for feedback.
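
On the client side, checking a download against the repository boils down to hashing the local file and comparing digests. Here’s a minimal Python sketch, assuming the published SHA-1 has already been fetched from the web service as a hex string (the lookup itself isn’t shown, since I don’t know what hashDB’s API looks like). The function names are my own.

```python
import hashlib
import io
from typing import BinaryIO


def sha1_of_stream(stream: BinaryIO, chunk_size: int = io.DEFAULT_BUFFER_SIZE) -> str:
    """Return the SHA-1 hex digest of a binary stream.

    Reads in chunks so a multi-gigabyte ISO never has to fit in memory.
    """
    digest = hashlib.sha1()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def sha1_of_file(path: str) -> str:
    """Hash a file on disk."""
    with open(path, "rb") as f:
        return sha1_of_stream(f)


def matches_published_hash(actual_hex: str, published_hex: str) -> bool:
    """Compare a computed digest with one from the repository (case-insensitive)."""
    return actual_hex.lower() == published_hex.strip().lower()
```

Usage would be something like `matches_published_hash(sha1_of_file("setup.exe"), published)`, where `published` is whatever the (hypothetical) web service returned.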

The idea is solid (I’d use it, which is a good indicator) but I’d probably add a few things:

Trusted Users

While users can register and submit their own hashes, I’d like to streamline the hash input process for publishers who already provide the correct hashes (that is, who ship verification files alongside the download) by letting them upload those hashes directly. Cut out the moderation, call them Trusted Users for the applications they provide, and focus the moderators’ effort on the files/downloads that don’t already offer a way to verify them.
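
Roughly, the submission routing I have in mind looks like this. It’s a hypothetical sketch: the `Submission` and `HashRepository` names and the per-application trust list are my assumptions, not anything hashDB actually exposes. Trusted publishers go straight to the published set; everyone else lands in the moderation queue.

```python
from dataclasses import dataclass, field


@dataclass
class Submission:
    user: str
    application: str
    sha1_hex: str


@dataclass
class HashRepository:
    # Hypothetical: application name -> users trusted to publish hashes for it.
    trusted: dict[str, set[str]] = field(default_factory=dict)
    # Published hashes, keyed by the hash itself.
    published: dict[str, Submission] = field(default_factory=dict)
    # Everything else waits here for a human moderator.
    moderation_queue: list[Submission] = field(default_factory=list)

    def submit(self, sub: Submission) -> str:
        """Publish directly if the user is trusted for this application, else queue it."""
        if sub.user in self.trusted.get(sub.application, set()):
            self.published[sub.sha1_hex] = sub
            return "published"
        self.moderation_queue.append(sub)
        return "queued"
```

Trust is scoped per application on purpose: being the publisher of one app shouldn’t let you bypass moderation for someone else’s.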

Choice of Format

SHA-1 is the best choice for the moment: unlike MD5, no practical collisions have been demonstrated for it, and it is not computationally intensive. But what if things change? This probably falls into the “future-proofing” category, but I’d like the site to be able to switch to hash algorithm XYZ in the future, because cryptography and digital signatures are a fast-moving field and their security analysis keeps advancing. It wasn’t so long ago that MD5 was considered “safe”.
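
One cheap way to keep that door open is to store every digest tagged with the algorithm that produced it, so old SHA-1 entries and newer ones can coexist in the same database. A sketch using Python’s `hashlib`; the `algorithm:hexdigest` format is an assumption on my part, not something the site actually uses.

```python
import hashlib


def tagged_digest(data: bytes, algorithm: str = "sha1") -> str:
    """Return a digest prefixed with its algorithm name, e.g. 'sha1:<hex>'."""
    return f"{algorithm}:{hashlib.new(algorithm, data).hexdigest()}"


def verify(data: bytes, tagged: str) -> bool:
    """Re-hash `data` with whichever algorithm the stored value names."""
    algorithm, _, expected = tagged.partition(":")
    if algorithm not in hashlib.algorithms_available:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    return hashlib.new(algorithm, data).hexdigest() == expected.lower()
```

Switching to a new default later would then just mean changing the `algorithm` argument for new submissions; existing entries still verify with the algorithm they were stored under.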

Descriptors + Metadata

Not sure what Long has in mind, but I’d like to see several optional pieces of information associated with each hash.

Malware Hazard: Catching invalid hashes is all well and good, but if a known hash is associated with malware (think of those Win7 Beta ISOs which were freely available but included nasty surprises when you installed them), I’d like to warn others about it. Once it becomes apparent that the ISO I have is malicious, I’d like a mechanism to submit a comment about that hash to the site. It may not be an immediate change, but after a number of responses (other users may see the comment and verify it themselves) the moderators should flip the warning switch and mark the hash as a serious risk.
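
As a rough sketch of that workflow (the threshold of three reports and all the names here are hypothetical): reports from distinct users accumulate against a hash, and once there are enough, the entry is put in front of a moderator, who is the only one allowed to flip the switch.

```python
from dataclasses import dataclass, field

# Hypothetical: number of independent reports before a moderator looks at it.
REVIEW_THRESHOLD = 3


@dataclass
class HashEntry:
    sha1_hex: str
    # A set, so the same user reporting twice only counts once.
    malware_reports: set[str] = field(default_factory=set)
    flagged_malicious: bool = False

    def report_malware(self, user_id: str) -> None:
        self.malware_reports.add(user_id)

    def needs_review(self) -> bool:
        """True once enough distinct users have reported an unflagged hash."""
        return not self.flagged_malicious and len(self.malware_reports) >= REVIEW_THRESHOLD

    def moderator_confirm(self) -> None:
        # The warning is never raised automatically; a moderator makes the call.
        if not self.needs_review():
            raise ValueError("not enough independent reports yet")
        self.flagged_malicious = True
```

Keeping the final step manual matches the idea above: user reports are a signal, not a verdict.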

Versioning: A file called ApplicationName.X.Y.Z.exe tells you its version, but what if I have an installer simply called setup.exe? (I still get this a lot, particularly with small apps.) What I’d love is to see information about the source of the file (who provided it), the actual version (date, number, whatever) and perhaps some release notes (I’m pushing my luck, I know).
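
Something along these lines per hash would cover it. This is a hypothetical record with field names of my own choosing; only the hash itself is something the site already stores.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class VersionInfo:
    sha1_hex: str
    file_name: str                        # may be uninformative, e.g. "setup.exe"
    publisher: str                        # who provided the file
    version: str                          # number, date stamp, whatever the publisher uses
    release_date: Optional[date] = None
    release_notes: Optional[str] = None   # the "pushing my luck" field

    def describe(self) -> str:
        """One-line summary that makes a generic file name meaningful."""
        when = self.release_date.isoformat() if self.release_date else "unknown date"
        return f"{self.file_name} is {self.publisher} {self.version} ({when})"
```

For example, `VersionInfo("…", "setup.exe", "Example Publisher", "1.2.3", date(2009, 7, 7)).describe()` gives `"setup.exe is Example Publisher 1.2.3 (2009-07-07)"`.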

Application History: With all this rich metadata (related products, versioning, comments), why not throw in some functionality to search for information about a specific application the site is tracking (assuming we’re linking hashes from related applications)? The scenario for this feature might go like this: “I have application ABC and am getting crashes whenever I follow the same series of steps. Google’s no help with a solution, the support process for the application is lacking (forums/phone/nonexistent/whatever) and I’m at my wits’ end.” If I can look up my installer in the DB and obtain a history of the application, I can quickly decide whether to try (and get my hands on) a different version or hit the bottle and tackle the problem tomorrow.
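
Assuming the site links every release of an application together, the lookup itself is simple: find the application the hash belongs to, then list its known releases in version order. A hypothetical in-memory sketch (a real service would query a database, and version numbers here are assumed to be numeric tuples):

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    sha1_hex: str
    application: str
    version: tuple[int, ...]   # assumed numeric, e.g. (1, 2, 0)
    notes: str = ""


class HistoryIndex:
    """Hypothetical index linking every known hash to its application's releases."""

    def __init__(self, releases: list[Release]) -> None:
        self._by_hash = {r.sha1_hex: r for r in releases}
        self._by_app: dict[str, list[Release]] = defaultdict(list)
        for r in releases:
            self._by_app[r.application].append(r)

    def history_for(self, sha1_hex: str) -> list[Release]:
        """Given the hash of an installer, return all releases of that app, oldest first."""
        release = self._by_hash.get(sha1_hex)
        if release is None:
            return []  # unknown hash: nothing to link it to
        return sorted(self._by_app[release.application], key=lambda r: r.version)
```

Hashing the installer you already have is the entry point; from there you can see what came before and after it.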

Getting your hands on that other version could be as simple as a link to the product’s Downloads page, or a direct link to the file if that is at all possible. Direct links won’t work as well for some sites (CodePlex, for example, comes to mind), but many applications provide public FTP access to official releases, plus beta and nightly drops.

I’ll definitely keep an eye on how this turns out – he’s already looking for help with getting something out there. Good luck!

Comments

Long Zheng

For the sake of simplicity, let's mash the versioning and application history ideas together. I think it's a great idea and practically useful, especially for software; however, it may be outside the scope of this project. But it's not impossible to implement.

Someone else could very well set up a complementary service that mashes up the hashes with some richer metadata, such as publisher and versioning, similar to how Twitpic and Twitter complement each other.