I've also seen it group `?"`, `."`, `!"`, and `.--` into single tokens.
It also splits some words like "Elton" as El|ton. Presumably in that case it has misidentified a -ton suffix.