
I've noticed that it correctly splits warm|est, cold|est, bleak|est, but darkest is a single token.

I've also seen it group `?"`, `."`, `!"`, and `.--` into single tokens.

It also splits some words like "Elton" as El|ton. Presumably in that case it has misidentified -ton as a suffix.
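For what it's worth, splits like these don't need any notion of prefixes or suffixes. Here is a toy sketch, assuming a made-up vocabulary and a plain greedy longest-match rule (real tokenizers such as BPE work from learned merge tables, so this is only an illustration, not the actual algorithm). The names `VOCAB` and `segment` are hypothetical.

```python
# Toy vocabulary: invented for illustration, not taken from any real tokenizer.
# "darkest" is present as a whole entry; "warmest", "coldest", "bleakest" are not.
VOCAB = {"warm", "cold", "bleak", "dark", "est", "darkest", "El", "ton", "what", '?"'}


def segment(text, vocab=VOCAB):
    """Greedy longest-match segmentation (an assumed rule, not real BPE)."""
    pieces = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a vocabulary entry matches.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                pieces.append(text[i:j])
                i = j
                break
        else:
            # Nothing matched: fall back to a single character.
            pieces.append(text[i])
            i += 1
    return pieces


for word in ["warmest", "darkest", "Elton", 'what?"']:
    print("|".join(segment(word)))
```

With this vocabulary, "darkest" stays whole simply because the full string happens to be an entry, and "Elton" falls apart into El|ton because those two pieces exist and the whole name doesn't.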


