Julia binary EBCDIC data

This problem hit me a year ago and I kind of just pushed it aside. They say to step away from a problem for a while if you’re having trouble solving it. In this case I stepped away for a year. And by step away I mean I stepped away from Julia. Instead I switched to Python 3; previously most of my Python was in version 2. So the time was still productive. I hadn’t really tried to solve it till today. The problem popped into my mind and I thought “I should look into this more”, because at one point I really liked Julia but stopped using it mostly because of this problem. Here is where I wrote/complained about it. I was trying to convert between ASCII and EBCDIC characters. In EBCDIC the letter ‘B’ is x’C2’.

So lemme try to splain the issue the best I remember. For years in the PC ASCII and IBM EBCDIC world a character was one byte with a decimal value of 0-255. However in the print/display world ASCII only used characters with a decimal byte value of 0-127. So UTF-8, the Unicode encoding Julia uses for characters and strings, represents anything above that as 2 or more bytes. In the EBCDIC world, though, the letters and digits are all above the decimal byte value of 127, and only some print/display characters (the space and most punctuation) are below it.

Personally I find EBCDIC a little strange. Unlike ASCII the alphabet isn’t defined in consecutive bytes. The lowercase a-i are the consecutive decimal byte values 129-137, then there is a gap, then j-r are the consecutive decimal byte values 145-153, and a similar pattern is used for s-z. The capital letters are grouped the same strange way: A-I, J-R and S-Z.
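
To make the gaps easier to see, here is a minimal sketch that spells the letter runs out as byte ranges. The a-i and j-r values are the ones quoted above; the s-z and capital-letter ranges are the standard EBCDIC values as I know them (for example in code page 037). The names lower_runs, upper_runs and ascii_to_ebcdic are just made up for this illustration, and the table covers letters only.

```julia
# EBCDIC byte values for the Latin letters.
# Each case is split into three runs with gaps in between.
lower_runs = [0x81:0x89, 0x91:0x99, 0xa2:0xa9]   # a-i, j-r, s-z
upper_runs = [0xc1:0xc9, 0xd1:0xd9, 0xe2:0xe9]   # A-I, J-R, S-Z

# Flatten the runs into 26 byte values per case.
lower_bytes = collect(Iterators.flatten(lower_runs))
upper_bytes = collect(Iterators.flatten(upper_runs))

# Hypothetical lookup table: ASCII letter => EBCDIC byte (letters only).
ascii_to_ebcdic = Dict{Char,UInt8}()
for (c, b) in zip('a':'z', lower_bytes)
    ascii_to_ebcdic[c] = b
end
for (c, b) in zip('A':'Z', upper_bytes)
    ascii_to_ebcdic[c] = b
end

println(Int.(lower_bytes[1:9]))                    # decimal values for a-i
println(string(ascii_to_ebcdic['B'], base = 16))   # hex value for 'B'
```

A Dict is probably overkill for a real converter (a 256-entry lookup vector would be the usual choice), but it keeps the three-run layout visible.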

So the Julia REPL says…

julia> ebcdic=Char(0xc2)
'Â': Unicode U+00C2 (category Lu: Letter, uppercase)

julia> length(ebcdic)
1

julia>

However if you write that Char out, Julia writes 2 bytes: 0xC3 0x82, the UTF-8 encoding of U+00C2. I should say if you want to write some EBCDIC out this way, it’s 2 bytes. Because EBCDIC characters with a byte value below 128 still come out as one byte!
I thought the length function was very misleading! Or is it? It counts characters, not bytes, so this is one character that is written as 2 bytes on output.
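
Here is a small sketch of how to check this without touching a file. It writes the Char into an in-memory IOBuffer and looks at the bytes that come out. The variable names c, buf and utf8_bytes are mine; length, ncodeunits, write and take! are standard Julia functions.

```julia
c = Char(0xc2)            # the code point U+00C2, not "the byte 0xC2"

# length counts characters; ncodeunits counts bytes in the UTF-8 encoding.
println(length(c))
println(ncodeunits(c))

# Capture what write() actually emits by writing into an in-memory buffer.
buf = IOBuffer()
write(buf, c)
utf8_bytes = take!(buf)   # Vector{UInt8} of everything written so far
println(utf8_bytes)
```

So length isn’t lying, it just answers a different question than “how many bytes will this be on disk”.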

Also I’m still not claiming to fully understand all the ramifications of character sets, encodings and Unicode. I just found a solution to my particular problem.

Solution: One way to properly write a one byte EBCDIC character in Julia is

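ebcdic=Char(0xc2)       # code point U+00C2; its value is the EBCDIC byte for 'B'

oneByte=UInt8[ebcdic]   # one-element byte array holding 0xc2

write(fo, oneByte)      # fo is an already-open output stream; writes exactly one byte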
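
To convince myself that only one byte goes out, here is a minimal sketch that swaps the output file for an in-memory IOBuffer (buf stands in for fo, and out is just a name I picked). It also shows two variations that I believe are equivalent: converting the Char with UInt8(...) directly, and skipping Char altogether with a UInt8 literal.

```julia
ebcdic = Char(0xc2)

buf = IOBuffer()              # stand-in for the output file `fo`
write(buf, UInt8[ebcdic])     # the array form from the solution above
write(buf, UInt8(ebcdic))     # same idea: convert the Char to a single UInt8
write(buf, 0xc2)              # or skip Char entirely and write a UInt8 literal
out = take!(buf)              # all bytes written so far
println(out)
```

Each of the three writes adds exactly one byte. Note that UInt8(...) only works for code points up to 0xFF; anything larger throws an error, which is fine here because an EBCDIC byte never goes above 255.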

I found (after I solved my problem) this very interesting, as it is very much related. There was a reply from ScottPJones, author of the Julia Strs package, which may be interesting to look at down the road. He also said “That was my problem with Julia from the start with the lack of any “BinaryString” type”.