Go – Strings, bytes, runes and characters

We’ve come a long way since 7-bit ASCII and 8-bit EBCDIC. It was an issue with Julia too. Every modern language has to deal with international characters. I guess I need to dive into it because, thanks to the Internet and other languages, a character is not necessarily one byte. Go info here. Which also pointed to a post… here, which is titled “The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)” and was written in 2003, so that sounds like a good place to start. However he does say “EBCDIC is not relevant to your life.”…that’s what he thinks! I guess I’ll find out what a rune is.

To continue after reading the 2nd article mentioned above: this tidbit is why my Julia program had no problems with ASCII, but did with EBCDIC. In UTF-8, every code point from 0-127 is stored in a single byte. Only code points 128 and above are stored using 2, 3, or more bytes (the 2003 article says up to 6; UTF-8 has since been restricted to a maximum of 4).
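To see that rule in Go terms, here is a minimal sketch (mine, not from either article) that asks the standard unicode/utf8 package how many bytes a code point needs. The helper name encodedLen and the sample code points are just picked for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodedLen reports how many bytes UTF-8 needs for one code point (a rune).
// utf8.RuneLen returns -1 for values that are not valid code points.
func encodedLen(r rune) int {
	return utf8.RuneLen(r)
}

func main() {
	// Sample code points, chosen only for illustration.
	samples := []rune{'B', 0x7F, 0x80, 0xC2, 0x20AC, 0x1F600}
	for _, r := range samples {
		fmt.Printf("U+%04X needs %d byte(s)\n", r, encodedLen(r))
	}

	// len counts bytes, utf8.RuneCountInString counts code points.
	s := "\u00c2b"
	fmt.Println(len(s), utf8.RuneCountInString(s))
}
```

Anything up to 0x7F comes back as 1, which is why plain ASCII data never trips over this, while a value like 0xC2 comes back as 2.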

Having a Go…with Go

Golang…baby steps. Go sounds cool and Golang sounds wordy. But Go is too common a word, IMHO. However, since Go comes from Google, it probably helps when googling the language. Perhaps Google makes a stronger assumption that you’re googling about the language.

Playing with Go. Reading/writing files. Learning enough to be dangerous.

Tumbleweed network activity lie

So in addition to Tumbleweed’s odd way of graphing left to right at the beginning, then switching to right to left as all my default monitors do, it also sometimes shows download activity when I’m disconnected from the network, as shown below. In this case I also physically disconnected the Ethernet cable. If you fullscreen this you can see a red x on the network icon. Also, at first glance it looks like a big download because it fills the graph. But that’s because the graph is scaled to the highs. The actual download rate is shown as 242.0 B/s.

Learning programming languages

Most tutorials start with variables and types, loops, if statements, you know…the basics. But I like to move quickly to reading/writing files, which usually implies at least some type of loop too, because that is where I’m most likely to need the other basics of programming. I also like to know how to comment my code. I’m not that likely to ask for console input and need to do much processing with that. I’m more likely to need to manipulate or get some information from a file. So for a rather simple example, maybe I want to look for something in a large file. So I’ll need to know some string handling commands and conditionals (such as if) to see if I found the string. So that’s where I usually like to start. Simply reading an input file and writing out the same file. Once I know that…I’ll learn other features out of necessity.
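In Go, that starting point might look something like the sketch below: read a file line by line, write each line back out, and count the lines that contain a search string. Treat it as a rough first attempt. The function names copyAndCount and copyText are made up for this example, the file names and search string come from the command line, and bufio.Scanner’s default line-length limit means it is not suited to files with extremely long lines. It also always ends the output with a newline, so it is not a byte-exact copy if the input lacks one.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

// copyAndCount copies in to out line by line and returns how many
// lines contain needle.
func copyAndCount(in io.Reader, out io.Writer, needle string) (int, error) {
	scanner := bufio.NewScanner(in)
	writer := bufio.NewWriter(out)
	hits := 0
	for scanner.Scan() {
		line := scanner.Text()
		if strings.Contains(line, needle) {
			hits++
		}
		if _, err := writer.WriteString(line + "\n"); err != nil {
			return hits, err
		}
	}
	if err := scanner.Err(); err != nil {
		return hits, err
	}
	return hits, writer.Flush()
}

// copyText runs copyAndCount on in-memory text, handy for trying it out.
func copyText(text, needle string) (string, int, error) {
	var sb strings.Builder
	hits, err := copyAndCount(strings.NewReader(text), &sb, needle)
	return sb.String(), hits, err
}

func main() {
	if len(os.Args) != 4 {
		// No file names given: demonstrate on in-memory text instead.
		copied, hits, err := copyText("alpha\nfind me\nomega\n", "find")
		if err != nil {
			fmt.Println("copy:", err)
			return
		}
		fmt.Printf("copied %d bytes, %d matching line(s)\n", len(copied), hits)
		return
	}

	in, err := os.Open(os.Args[1]) // input file
	if err != nil {
		fmt.Println("open:", err)
		return
	}
	defer in.Close()

	out, err := os.Create(os.Args[2]) // output file
	if err != nil {
		fmt.Println("create:", err)
		return
	}
	defer out.Close()

	hits, err := copyAndCount(in, out, os.Args[3]) // search string
	if err != nil {
		fmt.Println("copy:", err)
		return
	}
	fmt.Println("matching lines:", hits)
}
```

Run it with three arguments (input file, output file, search string) to work on real files, or with none to see the in-memory demo.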

Python PDB

Was listening to a recent Python podcast and some lady had done a talk somewhere about Python debugging…outside of print statements. She asked how many people used a debugger and found not that many. I used PDB in Python many years ago, although not for a long time, mainly because debugging is so easy with VS Code. I probably have some old version 2 programs with the import statement still in there.

I’m basically an amateur Python programmer who uses it for personal projects. I assumed professionals used debugging as a matter of habit.

Linux PDF problem

These days, I rarely have a problem doing anything on Linux. However I had to fill out some legal documents. They were editable PDFs. I was able to fill them all out except one. Okular reported that I needed a newer version of Adobe Reader. And also “This document has XFA forms, which are currently unsupported.” Adobe no longer supports Linux. A PDF reader on my phone said it wasn’t an actual PDF and something about XML. Anywho, I was able to fill it out using Adobe on my Android phone. It would have tested the limits of my patience to type this on my phone. So I answered all the questions using a Linux text editor, and pasted the responses into the document on my phone.

Julia binary data gotcha!

One more time on this topic.

After the following code runs, the file is created with a length of 128 bytes (0-127), as expected! However that is not true if you increase the upper bound of the for loop over 127. And the Invalid message never appears, because length of a Char counts characters, not bytes, so it is always 1!

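fo=open("binary.fil", "w")
for i=0:127
    local oneByte
    oneByte=Char(i)        # a character (Unicode code point), not a raw byte
    ln=length(oneByte)     # always 1 for a Char, whatever its encoded size
    if ln>1
        println("Invalid")
    end
    write(fo, oneByte)     # writes the UTF-8 encoding: 2 bytes once i > 127
end
close(fo)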

The following code however…works. And creates a 256 byte (0-255) file, X’00’-X’FF’.
>>>>>>>>>>> Note how oneByte is defined differently <<<<<<<<<<<

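fo=open("binary.fil", "w")
for i=0:255
    local oneByte
    oneByte=UInt8[i]       # a one-element byte array instead of a Char
    ln=length(oneByte)     # number of elements in the array: 1
    if ln>1
        println("Invalid")
    end
    write(fo, oneByte)     # writes exactly that one raw byte
end
close(fo)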
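The same gotcha can be reproduced outside Julia. Here is a small Go sketch (my own comparison, not a translation of the programs above) that builds the output in memory two ways: once writing each value as a character (a rune) and once as a raw byte. The function names asRunes and asBytes are invented for the example, and max is assumed to stay in the 0-255 range.

```go
package main

import (
	"bytes"
	"fmt"
)

// asRunes writes the values 0..max as UTF-8 encoded characters.
func asRunes(max int) []byte {
	var buf bytes.Buffer
	for i := 0; i <= max; i++ {
		buf.WriteRune(rune(i)) // values above 127 take 2 bytes each
	}
	return buf.Bytes()
}

// asBytes writes the values 0..max as raw single bytes.
// Assumes max <= 255 so that byte(i) does not wrap around.
func asBytes(max int) []byte {
	var buf bytes.Buffer
	for i := 0; i <= max; i++ {
		buf.WriteByte(byte(i))
	}
	return buf.Bytes()
}

func main() {
	fmt.Println(len(asRunes(127)), len(asBytes(127))) // 128 128
	fmt.Println(len(asRunes(255)), len(asBytes(255))) // 384 256
}
```

With an upper bound of 255 the character version comes out at 384 bytes (128 one-byte values plus 128 two-byte values), while the byte version is the expected 256.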

Julia binary EBCDIC data

This problem hit me a year ago and I kind of just pushed it aside. They say to step away from a problem for a while if you’re having trouble solving it. In this case I stepped away for a year. And by step away I mean stepped away from Julia. Instead I switched to Python 3; previously most of my Python was in version 2. So the time was still productive. I hadn’t really tried to solve it till today. The problem popped into my mind and I thought “I should look into this more”, because at one point I really liked Julia but stopped using it mostly because of this problem. Here is where I wrote/complained about it. I was trying to convert ASCII/EBCDIC characters. In EBCDIC the letter ‘B’ is X’C2’.

So lemme try to splain the issue the best I remember. For years in the PC ASCII and IBM EBCDIC world a byte was a decimal byte value of 0-255. However in the print/display world ASCII only used characters with a decimal byte value of 0-127. So UTF-8, the usual Unicode encoding, stores anything above that as 2 or more bytes. However in the EBCDIC world some print/display characters (the letters and digits among them) are above the decimal byte value of 127, but a few are below it.

Personally I find EBCDIC a little strange. Unlike ASCII, the alphabet isn’t defined in consecutive bytes. The lowercase a-i are the consecutive decimal byte values 129-137, then there is a gap, then j-r are the consecutive decimal byte values 145-153, and a similar pattern is used for s-z. This strange grouping is similar for the capital letters A-I, J-R and S-Z.
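To make the grouping concrete, here is a short Go sketch of a letter lookup. The a-i and j-r values come from the ranges above; the s-z start of 162 and the capital letters starting at X’C1’ are my assumption based on the common EBCDIC code pages (it does agree with ‘B’ being X’C2’). The function name ebcdicLetter is made up for the example, and it handles letters only.

```go
package main

import "fmt"

// ebcdicLetter returns the EBCDIC byte for an ASCII letter, following the
// three-group layout (a-i, j-r, s-z). The second result is false for
// anything that is not a letter.
func ebcdicLetter(c byte) (byte, bool) {
	var base byte
	switch {
	case c >= 'a' && c <= 'z':
		base = 0x80 // lowercase groups start at 0x81, 0x91, 0xA2
		c -= 'a'
	case c >= 'A' && c <= 'Z':
		base = 0xC0 // uppercase groups start at 0xC1, 0xD1, 0xE2 (assumed)
		c -= 'A'
	default:
		return 0, false
	}
	switch {
	case c < 9: // a-i
		return base + 0x01 + c, true
	case c < 18: // j-r
		return base + 0x11 + (c - 9), true
	default: // s-z
		return base + 0x22 + (c - 18), true
	}
}

func main() {
	for _, c := range []byte("aijrszB") {
		b, _ := ebcdicLetter(c)
		fmt.Printf("%c -> %d (X'%02X')\n", c, b, b)
	}
}
```

The two gaps show up as the jumps from 137 to 145 and from 153 to 162 in the output.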

So the Julia REPL says…

julia> ebcdic=Char(0xc2)
'Â': Unicode U+00C2 (category Lu: Letter, uppercase)

julia> length(ebcdic)
1

julia>

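However if you write that character out, it writes 2 bytes…0xC382, the UTF-8 encoding of U+00C2. I should say if you want to write some EBCDIC out, it’s 2 bytes. Because some EBCDIC characters (the ones below 128) are one byte!
I think the length function is very misleading! Or is it? It counts characters, not bytes, so a Char always has length 1 even when it is written as 2 bytes on output; ncodeunits(ebcdic) is what reports the 2.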

Also I’m still not claiming to fully understand all the ramifications of character sets, encodings and Unicode. I just found a solution to my particular problem.

Solution: One way to properly write a one byte EBCDIC character in Julia is

ebcdic=Char(0xc2)

oneByte=UInt8[ebcdic]

write(fo, oneByte)

I found (after I solved my problem) this very interesting, as it is very much related. There was a reply from ScottPJones, author of the Julia Strs package, which may be interesting to look at down the road. He also said “That was my problem with Julia from the start with the lack of any “BinaryString” type”.