Jupyter Notebooks in VS Code

Accidentally opened a Jupyter Notebook file with VS Code and it worked! I can run individual code cells just like in the browser!

Similarly, I opened a Python file that needs the Anaconda environment, because that’s currently the only place I have pandas installed, and as long as the interpreter is correct…it also works!

Pandas loc and trying what you learned

Watched a YouTube video that showed the loc indexer for locating a value in your data.
df.loc[df["column"] == "string"]

However when I tried it on a first name field that I knew was there…it didn’t find any. My data had trailing spaces and I found that if I added trailing spaces to the string (“string “) it found the name.

But that’s not a good way to handle the problem. I knew I had to get rid of the trailing spaces. Solution…use the str.strip() method. I could have used rstrip for just the trailing spaces, but strip removes both leading and trailing spaces. Perhaps rstrip would be slightly faster?
df.loc[df["First"].str.strip() == "string"]
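Here is a small sketch of the difference between the exact match, rstrip and strip. The names and padding are made up for illustration; only the "First" column name comes from my data.

```python
import pandas as pd

# Hypothetical padded names standing in for the fixed-length data.
df = pd.DataFrame({"First": ["Cameron   ", "Kamilah   ", "  Odis    "]})

# Exact comparison: the padding is part of the value, so nothing matches.
exact = df.loc[df["First"] == "Odis"]

# rstrip removes only trailing spaces, so the leading spaces still block the match.
right_only = df.loc[df["First"].str.rstrip() == "Odis"]

# strip removes leading and trailing spaces, so the row is found.
both_sides = df.loc[df["First"].str.strip() == "Odis"]

print(len(exact), len(right_only), len(both_sides))
```

If the data only ever has trailing padding, rstrip and strip select the same rows.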

The trailing spaces are there because I originally formatted the data as fixed length with no delimiter, to be copied to a mainframe. That format also takes up a lot more space. I really need to reformat it as delimited only, with no padding!
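A rough sketch of that cleanup, assuming the padded data is already loaded in a DataFrame. The sample rows are made up, and stripping every text column in one pass is just one way to do it, not necessarily the best for a million records.

```python
import io

import pandas as pd

# Hypothetical padded rows standing in for the fixed-length file.
df = pd.DataFrame(
    {
        "First": ["Cameron   ", "Kamilah   "],
        "Last": ["Garza     ", "Perkins   "],
        "Amount": [76.56, 61.56],
    }
)

# Find the text columns and strip the padding from each of them.
text_cols = [col for col in df.columns if pd.api.types.is_string_dtype(df[col])]
for col in text_cols:
    df[col] = df[col].str.strip()

# Write delimited output with no padding (to a buffer here instead of a real file).
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

After this, a plain df["First"] == "Cameron" comparison works without calling strip every time.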

This is how you learn. By trying what you learned. And when it doesn’t work…you’ll learn even more figuring out why!

Bye Bye PyCharm

It’s back to VS Code. Somehow a Python program I was working on with PyCharm in one of my Python subdirectories got saved to my home directory. That’s strange. That never happened in all the time I used VS Code. Oh well, I’ll just move it back to where I keep almost all…if not all…of my Python programs. However now, after opening it in PyCharm and attempting to run it, it basically tells me it can’t find it in my home directory. Well, there’s a good reason for that. It’s not in my home directory! It’s in the Python subdirectory I opened it from with PyCharm. I don’t know what dumb-ass explanation there is for why that happened, one that probably makes sense if you graduated from PyCharm U, and quite frankly…I don’t care. Evidently all the basic file handling rules I’ve learned through the years go out the window if you use the super advanced PyCharm. That’s insane. I want to spend my time right now concentrating on learning pandas, not PyCharm. I’m certainly not Charmed!

Udemy Data Analysis with Pandas and Python course

Well, predictably, my progress in my Udemy Data Analysis with Pandas and Python course is slower than I’d like. Mostly because, as I previously said, the video doesn’t work properly here, and I’m at this location ~11 hrs a day, 5 days a week. It works fine on Linux Mint and Manjaro on my home computer, so I assume it’s a MATE problem. Never fear, there are many good Data Science videos on YouTube, and I’m not having a video playback problem on YouTube. Also, it allows me a little time to play with PyCharm. Although I may switch to Atom on the older computer because PyCharm is noticeably sluggish on it.

Pandas and a large file

I had to satisfy my curiosity and use pandas on something substantial. So I did a read_csv on a million-record file. Well, IMHO…it’s fast. And that’s on my decent computer…not the fastest, by any stretch. The dataframe info returned the following in the blink of an eye. Likewise, a simple sum was instantaneous.

I also did something much more taxing…a sort_values on Last Name, First Name and it was noticeably slower (~1 sec) but IMHO, impressive considering it wasn’t an indexed SQL file…see the end for my 1st use of the timeit magic command as it was described.

df.info

<bound method DataFrame.info of Account Code1 Code2 Gender Prefix First \
0 4864130159876517 2 C M Mr Cameron
1 4029852595634794 1 B F Kamilah
2 4689177385753112 1 F M Mr Odis
3 4304237478464178 5 F M Stephan
4 4821479510829505 3 G F Angle
… … … … … … …
999995 4193458599551172 5 F M Mitchell
999996 4716923127249654 5 C M Mr Kendrick
999997 4818979260696413 3 F M Bernardo
999998 4118908054242008 1 B F Cardinal Celine
999999 4838239144084666 3 E M Mr Hyman

             Middle                  Last      Suffix       Birth  \

0 Milford Garza 1971-08-16
1 Raina Perkins 1983-12-16
2 Elias Shepherd 1969-02-06
3 Hayes DPM 1977-03-21
4 Cleora Huffman 1955-08-15
… … … … …
999995 Barron OD 1957-01-28
999996 Antwan Hickman 1968-11-28
999997 Kraig Newton 1996-02-07
999998 Shaunte Fry 1973-10-26
999999 Max Kennedy 1981-05-21

        Enroll  Amount                    Address                  City  \

0 1997-01-12 76.56 73 Piper Townline Whitlash
1 2002-02-28 61.56 56 Jean Avenue Johnson
2 2020-04-24 37.69 746 Spruce Alley Haverhill
3 2006-05-11 26.50 1108 Graham Bypass Cayey
4 1994-08-24 52.44 1210 Howth Parkway Locust Gap
… … … … …
999995 1976-11-22 61.76 623 Merrie Row Saugus
999996 2000-10-30 65.00 300 Vicksburg Nene Oxford
999997 2018-12-17 74.85 747 Chabot Circle Palmer
999998 1991-05-20 68.98 765 Bernal Heights Nene Manlius
999999 2008-10-25 90.13 149 Incinerator Turnpike Morristown

   State    Zip  

0 MT 59545
1 VT 5656
2 NH 3765
3 PR 633
4 PA 17840
… … …
999995 MA 1906
999996 MI 48370
999997 NE 68864
999998 IL 61338
999999 TN 37816

[1000000 rows x 16 columns]>
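A side note on the output above: it starts with "<bound method DataFrame.info of", which is what a notebook shows when info is referenced without parentheses. A minimal sketch of the difference, using a tiny made-up frame rather than the million-record file:

```python
import io

import pandas as pd

# Tiny made-up frame; the column names mirror two of the columns above.
df = pd.DataFrame({"First": ["Cameron", "Kamilah"], "Amount": [76.56, 61.56]})

# Without parentheses, df.info is only a reference to the method.
# A notebook displays its repr, which embeds the DataFrame itself.
method_reference = df.info

# With parentheses, the method runs and writes a compact summary:
# column names, non-null counts, dtypes and memory usage.
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()
print(summary)
```

Calling df.info() directly prints the same summary without needing the buffer; the buffer is only there so the text can be captured.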

df["Amount"].sum()
50502302.91999999

%timeit

%timeit df.sort_values(["Last", "First"])
967 ms ± 36.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
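%timeit only exists inside IPython and Jupyter. In a plain Python script, something similar can be sketched with the standard library timeit module. The four-row frame below is made up, so the timing it prints says nothing about the million-record sort.

```python
import timeit

import pandas as pd

# Small made-up frame; only the column names match the real data.
df = pd.DataFrame(
    {
        "Last": ["Garza", "Perkins", "Shepherd", "Fry"],
        "First": ["Cameron", "Kamilah", "Odis", "Celine"],
    }
)

# 7 runs of 1 loop each, roughly mirroring the %timeit settings shown above.
timings = timeit.repeat(
    lambda: df.sort_values(["Last", "First"]),
    repeat=7,
    number=1,
)

best_seconds = min(timings)
print(f"best of {len(timings)} runs: {best_seconds * 1000:.3f} ms")
```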

PyCharm

Another diversion. I’ve been using MS VS Code for a while and found it a decent IDE. However, after a long break I’m going to try PyCharm again. One thing I didn’t like about it before was that it felt sluggish, especially on first use. That was back on my older system with 4GB memory. However, at the time 4GB wasn’t terrible. It no longer feels sluggish on my Ryzen 16GB system. I was reminded about PyCharm again while listening to the latest Destination Linux podcast.

VS Code Pros:

  • Works with many languages.
  • Including Julia which I was using at the time, but not currently.
  • If I remember correctly, there was an Atom enhancement for Julia, for if/when I choose to give it another try…and I probably will!

VS Code Cons:

  • It’s Microsoft. I don’t like simply saying “because it’s Microsoft”…that’s too easy. I’ve talked about MS before and I don’t like to jump on the let’s-all-dislike-MS boat. I’ve been using Linux a long time and stated my reasons elsewhere. Today it’s more about not trusting a large company with a history of trying, unsuccessfully, to kill Linux. An OS that has helped me to continue to learn about computer technology…free of charge.
  • I hate that it wants to be my everything editor. For regular text files I prefer a simple text editor! I’m guessing this is probably an easy fix. I briefly tried a few things, unsuccessfully. I hated VS Code opening my text files so that a long line would just continue off the screen and I would have to use the scroll bar to see the whole line.

Learning distractions

The post below shows how easily I can become distracted from my primary task…learning Python pandas! Because of this course I wanted more data to play with than the course provides. Sure, it works well with a few thousand records, but how would it fare with a more realistic, larger dataset? So I decided to use a big data set generated by my create random customer Python program, which I wrote a while ago to get a feel for how SQLite would handle bigger data, and to practice SQL selects. However, my data didn’t have any numeric fields to practice pandas math routines on. I had added an amount field to my Julia program, but it no longer works. So I added it to my recent Python 2 to Python 3 conversion of the program. Well, somewhere in the middle of the changes for Python 3, I learned about the Python “mimesis” package. I don’t remember where I first heard about it. Maybe a Python podcast; it was talked about in at least two. Anyway, it all goes towards learning, but my pandas progress slowed down a bit. But unless I satisfied my question about how pandas worked with real data, it would be hard for me to maintain my enthusiasm to learn.

I mean really what was I thinking? A popular package used every day by data scientists around the world…and I’m wondering if it’s been tested. Still, sometimes you have to do something, just for your own satisfaction…

I figured it out!

The following is what I assume any professional Python developer is already well familiar with. However I’m usually familiar enough with the few packages I import for it not to be a problem. So it’s not something I’ve used in years.

I wanted to replace my account number create routine with the fake credit card number available in mimesis. By googling, I saw examples of it on the internet. However all the ways I tried failed. Supposedly credit_card_number(CardType…) was available as…

from mimesis import Person
person=Person()
ccn=person.credit_card_number(CardType…)

or…

from mimesis import Personel
personel=Personel()
ccn=personel.credit_card_number(CardType…)

Next I started guessing from what I knew…

from mimesis import Business
bus=Business()
ccn=bus.credit_card_number(CardType…)

I also tried Numbers like I did above with Business

All of these failed! I was just about to send an email asking for help, which I’d rather not do if possible, when I remembered that Python has a way to expose the names inside a module or class using “__dict__”. I had to google it…but I remembered!

import mimesis
for ls in mimesis.__dict__: print(ls)

I spotted Payment from that little bit of code…that’s probably it I thought! So from there I tried Payment…

from mimesis import Payment
for ls in Payment.__dict__: print(ls)

and I found it! So the solution (as of today) is…

from mimesis import Payment
from mimesis.enums import CardType  # CardType has to be imported too
pay=Payment()
ccn=pay.credit_card_number(CardType.VISA)
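The same discovery trick works on any module or class. Below is a minimal sketch that uses the standard library json module as a stand-in for mimesis, so it runs without mimesis installed. Base and Child are hypothetical classes, there only to show that a class’s __dict__ lists the names defined on that class itself, while dir() also includes inherited ones.

```python
import json

# Public names in a module; vars(json) is the same mapping as json.__dict__.
public_names = [name for name in vars(json) if not name.startswith("_")]
print(public_names)


# Hypothetical classes for illustration only.
class Base:
    def inherited(self):
        return "from Base"


class Child(Base):
    def own(self):
        return "from Child"


# __dict__ only holds what Child defines itself.
own_only = [name for name in vars(Child) if not name.startswith("_")]

# dir() walks the base classes too and returns a sorted list.
with_inherited = [name for name in dir(Child) if not name.startswith("_")]

print(own_only)
print(with_inherited)
```

So if a method doesn’t turn up in a class’s __dict__, it may still be there through a parent class, and dir() is worth a try.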

Udemy exercise 5

Coding exercise 5 after lecture 49

Udemy Exercise

The actual problem is above. My answers are below the last 3 comments.
I was sure my answer was correct however it was flagged as wrong!
Another student named Sridivya had a similar problem. The instructor replied…


Boris Paskhaver (Instructor), answered 2 months ago:
The code is looking for you to use the square bracket syntax instead.


Really? Where does it say that?
As you can see my answer actually used both methods to show I was paying attention.
My last answer used the square bracket syntax, which I used to show I was aware of both methods.
Nowhere does it say to use the bracket method!!!!
Also despite what they said my answer was NOT wrong.
Because you can easily type it into a Jupyter notebook and test it…and I did…and it works…and most importantly it was just taught!
He said his preferred way was to use brackets. But the Coding exercise didn’t say do it the way the instructor prefers.
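For anyone wondering what the two methods look like, here is a sketch. It assumes, since the exercise itself isn’t reproduced here, that the two methods are attribute (dot) access and square bracket access for pulling a column out of a DataFrame. The frame and column names are made up.

```python
import pandas as pd

# Made-up frame; not the data from the course exercise.
df = pd.DataFrame({"Amount": [76.56, 61.56], "First Name": ["Cameron", "Kamilah"]})

# Assumed method 1: attribute (dot) access.
by_attribute = df.Amount

# Assumed method 2: square bracket access.
by_bracket = df["Amount"]

# Both return the same Series.
same_result = by_attribute.equals(by_bracket)
print(same_result)

# Brackets also handle column names that are not valid Python identifiers,
# such as names containing a space, where dot access cannot be used.
names = df["First Name"]
```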

In the grand scheme of things it’s a small complaint. My first after almost 50 steps. The course has been very good!