The self-synchronizing aspect of utf-8 is one of the best things about the encoding, for sure.
But there are a lot of reasons to validate utf-8. The article mentions security vulnerabilities, and as a langsec devotee, I approve of always parsing inputs to prevent that sort of thing.
But even if you're reasonably confident that an invalid string isn't going to power a weird machine in your codebase (and are you, really?), validating is still useful. Once a string has been validated, you can trust that algorithms written to work on codepoints are actually working on codepoints, and you don't have to keep inserting logic to detect malformed sequences and re-synchronize your read when all you want to do is work with your string.
An even simpler reason: it's easy to write something that works on utf-8 and be quite sure that, for your application, it will only ever be fed valid utf-8 or something 'close enough'. And then some wire gets crossed and you're feeding it an mp3 or something.
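To make the "validate once, then trust it" idea concrete, here's a rough sketch in Python. The names `parse_text` and `count_codepoints` are made up for illustration, and the "mp3" bytes are just an MPEG frame-sync prefix I'm assuming for the example, not a real file.

```python
def parse_text(raw: bytes) -> str:
    """Validate once at the boundary; raise ValueError on malformed input."""
    try:
        # Strict decoding rejects any byte sequence that is not valid utf-8.
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError(f"not valid utf-8 at byte offset {exc.start}") from exc


def count_codepoints(text: str) -> int:
    # No malformation checks needed here: `text` already passed validation,
    # so iterating it yields one codepoint per step.
    return sum(1 for _ in text)


if __name__ == "__main__":
    good = parse_text("h\u00e9llo".encode("utf-8"))
    print(count_codepoints(good))

    # Hypothetical crossed wire: bytes that look like the start of an MPEG
    # audio frame. 0xFF can never appear in valid utf-8.
    try:
        parse_text(b"\xff\xfb\x90\x00")
    except ValueError as err:
        print(err)
```

Everything downstream of `parse_text` takes a `str`, so the mp3 blows up at the door instead of somewhere deep inside code that assumed it was looking at text.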