Julia string indexing problem part 2

Before opening a bug report, I thought I’d do more research on the Julia string indexing problem, and found that it’s not a bug in Julia, although I’m still not sure what I think of Julia’s behavior in the matter. The actual error I got was “StringIndexError”, and it had nothing to do with my index being outside legal bounds. It is related to Unicode and UTF-8, though. The problem is that the byte index where the next character begins isn’t necessarily one greater than the index of the previous character! I assumed that each character, from decimal 0-255, took one byte to store. But noooo…silly me. If I ask Julia to store Char(0xc2) in a string, Julia will use more than one byte to store it. But why can’t I store just 1 byte in Julia? Also, if I’m reading a file that has that byte…it is one byte, a fact a simple hex editor will prove! Has Julia advanced beyond the concept of a byte?
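Here is a small sketch of what I think is going on, using a made-up three-character string rather than my real data. The variable names are just for illustration. It shows that Char(0xc2) takes two bytes inside a String, that indexing into the middle of it raises the StringIndexError, and that nextind and eachindex give the indices that are actually legal.

```julia
# Build a 3-character string whose middle character is Char(0xc2), i.e. 'Â' (U+00C2).
# UTF-8 encodes that code point as the two bytes 0xc3 0x82.
s = string('a', Char(0xc2), 'b')

ncodeunits(s)              # 4 bytes in total
length(s)                  # 3 characters
collect(eachindex(s))      # [1, 2, 4]: the byte indices where characters start

s[2]                       # 'Â'
nextind(s, 2)              # 4, where the next character begins

# Index 3 falls in the middle of 'Â', so s[3] throws instead of returning a character.
err = try
    s[3]
catch e
    e
end
err isa StringIndexError   # true

# The underlying bytes are still there, one UInt8 per index.
codeunits(s)[3]            # 0x82
```

So stepping through a string with i + 1 only works for pure ASCII; stepping with nextind(s, i), or looping over eachindex(s), seems to be the intended way.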

I remember shades of this problem when I wrote an EBCDIC to ASCII hex dump program in Julia. It about drove me crazy.

At this point I’m beginning to think it would be less work to write in Python than to wrap my head around Julia’s quirky (IMHO) way of handling non-ASCII characters. The simplicity I liked in Julia’s one-based arrays and slicing is outweighed by the complexities of its Unicode and UTF-8 handling, especially when I’m trying to write a quick and dirty script so I can move forward in my static file generator adventures. You know, sometimes a 0xC2 is simply a one-byte binary 194, no more, no less…everything isn’t Unicode!
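For what it’s worth, a byte can stay a byte if I keep the data as a Vector{UInt8} and never turn it into a String. This is only a sketch: the three bytes and the temp file are made up for illustration, and the last step assumes the bytes are Latin-1 text, which may not match a real file.

```julia
# Hypothetical file contents: 'A', a raw byte 194 (0xc2), then 'B'.
bytes = UInt8[0x41, 0xc2, 0x42]

# Write them to a temporary file so the example is self-contained.
path, io = mktemp()
write(io, bytes)
close(io)

# read(path) returns the raw bytes as a Vector{UInt8}; nothing is decoded.
data = read(path)
length(data)       # 3, same as a hex editor would show
data[2]            # 0xc2
Int(data[2])       # 194

# ASSUMPTION: the bytes are Latin-1, so each byte maps to the code point of the same value.
text = join(Char.(data))
ncodeunits(text)   # 4, because 'Â' needs two bytes once it lives in a UTF-8 String

rm(path)           # clean up the temporary file
```

The byte vector indexes one to three with no surprises; the extra byte only shows up once the data becomes a String.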
