
Author here, in case anyone wants to question, abuse, argue, etc.


Thank you! I'm very excited to test it in my projects (I found a working Python extension here: https://github.com/shenfe/python-hyperscan).

Let me seize the opportunity. I have a problem where I need to match many person names (hundreds of thousands) in huge texts. Aho-Corasick works well for exact matches. Could Hyperscan work for approximate matches?
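For reference, the exact-match baseline mentioned above, Aho-Corasick, compiles all patterns into one automaton so a single pass over the text reports every occurrence; scan time depends on text length plus number of matches, not on how many names are loaded. A minimal pure-Python sketch (not a production implementation; `build_automaton` and `search` are hypothetical names):

```python
from collections import deque


def build_automaton(patterns):
    """Build an Aho-Corasick automaton (goto, fail, output) for exact patterns."""
    goto = [{}]   # goto[state][char] -> next state; state 0 is the root
    fail = [0]    # failure link for each state
    out = [[]]    # patterns recognised on reaching each state
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first pass: a state's failure link points at the longest proper
    # suffix of its string that is also a prefix of some pattern.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]  # inherit shorter matches
    return goto, fail, out


def search(text, automaton):
    """Yield (start_index, pattern) for every occurrence, in one pass over text."""
    goto, fail, out = automaton
    state = 0
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            yield i - len(pat) + 1, pat


names = build_automaton(["he", "she", "his", "hers"])
print(list(search("ushers", names)))  # [(1, 'she'), (2, 'he'), (2, 'hers')]
```

The failure links are what let overlapping names ("she" and "he") both be reported without rescanning the text.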


We have both Levenshtein and Hamming distance parameters. Note that large distance values are not expected to perform well (e.g. matching "Smith" at Levenshtein distance 3 won't be a happy experience).
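In Hyperscan's C API these parameters are set per expression through the `edit_distance` and `hamming_distance` fields of `hs_expr_ext_t`. Hyperscan's internal implementation is not shown here; instead, this pure-Python sketch uses a swapped-in textbook technique (Sellers' dynamic-programming algorithm, plus a sliding-window Hamming check) to show what "match within distance k" means, and why a large k on a short name is unhelpful: at Levenshtein distance 3, "Smith" is satisfied by the two-character substring "th". The function names are hypothetical.

```python
def approx_matches(pattern, text, k):
    """End indices j such that some substring ending at text[j] is within
    Levenshtein distance k of pattern (Sellers' dynamic programming)."""
    m = len(pattern)
    # col[i]: best edit distance between pattern[:i] and a substring of text
    # ending at the current position (col[0] is 0: a match may start anywhere).
    col = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text):
        diag = col[0]  # previous column's col[i - 1]
        col[0] = 0
        for i in range(1, m + 1):
            prev = col[i]  # previous column's col[i], reused as the next diagonal
            cost = 0 if pattern[i - 1] == ch else 1
            col[i] = min(prev + 1,        # text char left unmatched (insertion)
                         col[i - 1] + 1,  # pattern char left unmatched (deletion)
                         diag + cost)     # match or substitution
            diag = prev
        if col[m] <= k:
            ends.append(j)
    return ends


def hamming_matches(pattern, text, k):
    """Start indices of windows of len(pattern) with at most k mismatches."""
    m = len(pattern)
    return [j for j in range(len(text) - m + 1)
            if sum(a != b for a, b in zip(pattern, text[j:j + m])) <= k]


print(approx_matches("Smith", "Smyth", 1))             # [4]
print(hamming_matches("Smith", "Smyth and Smith", 1))  # [0, 10]
print(approx_matches("Smith", "xxthxx", 2))            # []
print(approx_matches("Smith", "xxthxx", 3))            # [3]: "th" alone is close enough
```

For a 5-character name, any substring sharing just two characters in order can sit within distance 3, so nearly all text becomes a candidate match.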


Have you tried some existing Python libraries which support fuzzy searching, such as regex and fuzzysearch?


Yes. Both grow in complexity linearly/exponentially with the number of patterns to be searched.


Ah. You should likely use a different algorithmic approach.

If you add contact details to your user info I'll be happy to get in touch and help.


I ported Hyperscan to ARM several months ago with help from the following project: https://github.com/nemequ/simde

Would you be interested in working together to get an ARM port that's a bit cleaner and author-approved?


I understand this possibility, but it's not the way I would do an ARM port in an ideal world.

Being the original designer of Hyperscan isn't much help; I don't work at Intel any more and wouldn't have all that much clout in trying to get an ARM port into the codebase.

I'm thinking about building a follow-up regex matcher ("ure3"): considerably smaller in scale, and starting from the Hyperscan code base for the tools/infrastructure/boring bits. This would definitely have ARM as a first-class architecture. Anyone 64-bit and little-endian can play, I would think (legacy stuff like 32-bit, etc., can go somewhere else).

I just wish the ARM guys would get their shit together and support their own much-ballyhooed SVE. The idea of announcing something in 2016 and not supporting it in a server chip that's due for mid-2020 is... novel.


Does the ARM version of Hyperscan work well? Can you describe the detailed steps? Thanks.


The reference to "Australian-free" has piqued my curiosity. Do you care to elaborate? In my imagination you had an Australian collaborator that you fell out with and have been busy expunging all "Australian" code from the project!


I am Australian, as is the original technical team for Hyperscan. As of March 2018, no member of that team is still at Intel.

Expunging Australian code from Hyperscan would be a non-trivial task.



