Screen scraping

I was listening to a podcast on screen scraping with Python; specifically, they were talking about Scrapy. That resurrected thoughts of how I created my Birthday database. I never went into detail about how I started this project. Anywho, this was in 2002, a few years before requests (version 1 came out in late 2012), Beautiful Soup (initial release: 2004) or Scrapy (initial release: June 26, 2008) came into existence. I wrote my own Python 2 web scraper using Python’s urllib. (I think requests began life using urllib2.) urllib returned the HTML from a URL, which I had to parse myself. I scraped a few different sites, then merged the data. Someone on the podcast mentioned that it’s not illegal to scrape publicly available data.
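For anyone curious what "fetch the HTML and parse it myself" looks like with only the standard library, here is a rough sketch in modern Python 3 (not my original Python 2 code, which I no longer have in front of me). It assumes a made-up page layout where each person is a table row of name, DOB and DOD cells; the names RowParser, parse_rows and fetch are just for illustration.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class RowParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of cell strings
        self._row = None        # cells of the row currently being read
        self._in_cell = False   # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows with no <td> cells, e.g. header rows
                self.rows.append([cell.strip() for cell in self._row])
            self._row = None


def parse_rows(html):
    """Return the table rows found in an HTML string (hypothetical layout)."""
    parser = RowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def fetch(url):
    """Download a page and return it as text."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Something like parse_rows(fetch(url)) would then give a list of rows to work with. Real pages are rarely this tidy, so most of the effort goes into handling each site's quirks.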

At the time I figured no one could own basic celebrity information such as date of birth (DOB) or date of death (DOD). What I remember most, though, is how famous at least one of these websites is today. I used only the name, DOB and DOD. The sites also had a description of each celebrity, which I did not use. All the descriptions in my database were written by me, and they are not complete. Through the years I’ve added some on occasion, and I always add one when someone dies. I would imagine it would be much more difficult to crawl through some of these sites today, not to mention how much more data there probably is now. If I remember correctly, these scrapers ran over a dial-up modem.
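Merging the name, DOB and DOD records from several sites could be done along these lines. This is only a sketch under my own assumptions: records are (name, dob, dod) tuples, names are matched case-insensitively, and the first non-empty value wins. The function name merge_records is made up for this example, and real data would need smarter name matching.

```python
def merge_records(*sources):
    """Merge lists of (name, dob, dod) tuples, keyed on a normalized name."""
    merged = {}
    for source in sources:
        for name, dob, dod in source:
            # Normalize case and whitespace so "jane  doe" matches "Jane Doe".
            key = " ".join(name.lower().split())
            record = merged.setdefault(
                key, {"name": " ".join(name.split()), "dob": None, "dod": None}
            )
            # Keep the first non-empty value seen for each field.
            record["dob"] = record["dob"] or dob or None
            record["dod"] = record["dod"] or dod or None
    return sorted(merged.values(), key=lambda r: r["name"])


site_a = [("Jane Doe", "1950-01-02", "")]
site_b = [("jane  doe", "", "2001-03-04"), ("John Roe", "1960-05-06", "")]
people = merge_records(site_a, site_b)
```

Here one site supplies the birth date and the other the death date for the same (fictional) person, and the merged record ends up with both.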

Another interest I had was older magazine covers (which will remain nameless). I wrote quite a few scrapers to get those for my private enjoyment. I think these images would be much harder to grab today than they were back then.