Hacker News
Script to do instant MD5 collisions of any pair of PDFs (github.com/corkami)
85 points by isp on Dec 20, 2018 | 15 comments


Announced by the author here: https://twitter.com/angealbertini/status/1075417521799528448

Commit: https://github.com/corkami/pocs/commit/3832f62d8aad64d541c5d...

Readme: https://github.com/corkami/pocs/blob/master/collisions/READM... ("With this script, it takes less than a second to collide the 2 public PDF papers")


The author is also the person responsible for a lot of the polyglot files from the PoC||GTFO series: https://www.alchemistowl.org/pocorgtfo/


Note that this script takes two existing PDFs as input and produces two new PDFs that collide with each other as output, but they do not collide with either of the originals. Thus it does not enable the obvious attack (a second-preimage attack) of creating a PDF that contains different text but collides with an existing PDF that you do not control. It does enable some other forms of duplicity, but only if you are the source of both documents.


This has been possible for a long time, right? Is there anything particularly innovative about this approach which has not been done elsewhere?


It's been possible to create pairs of colliding PDF files. Taking any two existing PDFs and creating a colliding pair while keeping the same visible rendered output is probably what's new.


That's not difficult so I doubt it.

For MD5 we have what's called an identical-prefix collision (as opposed to a chosen-prefix collision, where the two files start with different prefixes). Given the start of a file, you can compute two "next blocks" that are similar but slightly different, such that both files (the prefix plus A, or the prefix plus B) have the same MD5.

Then, because MD5 is a Merkle-Damgård (MD) hash, you can append any fixed suffix whatsoever to both files and still get the same MD5, as long as A and B have the same length and end on a block boundary.
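To make the suffix property concrete, here is a minimal sketch. It does not use MD5. It assumes a made-up Merkle-Damgård-style hash (`toy_hash`, `compress`, the 16-byte block and the 16-bit state are all hypothetical) whose state is small enough that two colliding blocks can be brute-forced in well under a second. The point is only that once the internal state collides, anything appended to both files keeps the hashes equal.

```python
import hashlib
from itertools import count

BLOCK = 16        # toy block size in bytes (MD5 uses 64)
STATE_BYTES = 2   # toy 16-bit chaining state, tiny on purpose so collisions are cheap
IV = b"\x00" * STATE_BYTES

def compress(state, block):
    """Toy compression function (an assumption): truncated SHA-256 of state + block."""
    return hashlib.sha256(state + block).digest()[:STATE_BYTES]

def absorb(state, data):
    """Feed whole blocks through the compression function, one after another."""
    assert len(data) % BLOCK == 0
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state

def toy_hash(message):
    """MD-style hash: pad with 0x80, zeros and the bit length, then absorb."""
    padded = message + b"\x80"
    padded += b"\x00" * ((-len(padded) - 8) % BLOCK)
    padded += (8 * len(message)).to_bytes(8, "big")
    return absorb(IV, padded)

def find_colliding_blocks(state):
    """Brute-force two distinct blocks that take `state` to the same next state."""
    seen = {}
    for i in count():
        block = i.to_bytes(BLOCK, "big")
        out = compress(state, block)
        if out in seen:
            return seen[out], block
        seen[out] = block

# Shared prefix of exactly one block, then the two colliding blocks A and B.
prefix = b"toy-prefix".ljust(BLOCK, b".")
block_a, block_b = find_colliding_blocks(absorb(IV, prefix))

# Any identical suffix keeps the collision, because the state is already equal.
suffix = b"any suffix at all, identical in both files"
file_a = prefix + block_a + suffix
file_b = prefix + block_b + suffix

print(file_a != file_b, toy_hash(file_a) == toy_hash(file_b))
```

The collision survives because both files have the same length (so the padding is identical) and the differing blocks sit on a block boundary.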

So you put A and B inside a part of the file that isn't visible but can influence a conditional test elsewhere. Then you use the conditional test to select between the two different outputs; all the rest of the file is the same.
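A toy illustration of that layout, under loud assumptions: the container format, `build` and `render` below are invented for this sketch and have nothing to do with real PDF structure or with the script's actual method. No MD5 collision is computed here; the single selector byte just stands in for the differing collision blocks A and B.

```python
def build(selector, doc_one, doc_two):
    """Hypothetical container: selector byte | 4-byte length of doc_one | doc_one | doc_two."""
    return bytes([selector]) + len(doc_one).to_bytes(4, "big") + doc_one + doc_two

def render(blob):
    """Hypothetical viewer: the selector decides which embedded document is shown."""
    selector = blob[0]
    n = int.from_bytes(blob[1:5], "big")
    first, second = blob[5:5 + n], blob[5 + n:]
    return first if selector == 0 else second

doc_one = b"Paper A text"
doc_two = b"Paper B text"

# Both files carry both documents; only the selector differs.
file_a = build(0, doc_one, doc_two)
file_b = build(1, doc_one, doc_two)

# Positions where the two files differ (only the selector byte at offset 0).
diff = [i for i, (x, y) in enumerate(zip(file_a, file_b)) if x != y]

print(render(file_a), render(file_b), diff)
```

In a real colliding pair, the differing bytes would be the collision blocks themselves, and everything after them would be a shared suffix.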

The same structure already works for SHA-1, at far greater cost (see the 2017 SHA-1 PDF collision), and some day, though it could be decades away, it will almost certainly work for SHA-256 too.

SHA-3 is different, despite the name. We might never find any way to collide it, and even if we do, it's a sponge design rather than MD: an output collision does not imply that the hidden internal state collides, so you can't just add a suffix, or the outputs diverge again.


I don't think that's new either. I remember doing that in 2015 to demonstrate an issue.


There's this, from 2015: https://news.ycombinator.com/item?id=8555079

And a SHA-1 collision in PDF form in 2017: https://news.ycombinator.com/item?id=13723892

Not saying you didn't, but I'm unable to find any earlier reference to making two arbitrary, existing PDF files collide under MD5.


I must have done what the first link describes.

But once you can do that, this isn't new. Assuming you can append garbage to a PDF, it's the same solution.

Admittedly I know how easy it is to say, "That's obvious!" in hindsight, but... isn't it?!


It is more complex than just appending arbitrary garbage after the end of the file, as you can with a JPEG.

He explains the method here: https://github.com/corkami/pocs/blob/master/collisions/READM...


The fact that he produced something easy to see helps cement in the mind of more people exactly how possible it is. It's a benefit not in the sense that it was impossible before, but rather a benefit in the sense that more people will really believe that it is possible.


I didn't say it wasn't a benefit; I said there is no innovation here. I was kind of wrong, but only insofar as it uses a different way to embed arbitrary garbage in a PDF instead of putting it directly at the end. The concept is still the same (as it has to be).


Yes, it's been done for a long time, and it's trivial. It's a standard course project in many computer security courses.


Also relevant, a PDF SHA-1 collider: https://github.com/nneonneo/sha1collider


Sci-Hub should really switch away from MD5 in their SQL dump if future readers want to be sure they are looking at the same (or same-quality) article.



