Disclaimer: This is not a tutorial, just my thoughts while writing a Python program.
I have experience programming in a few different languages, predominantly C, Perl, and LabVIEW. Within the past year, I decided to learn Python because of its extensive collection of libraries, its support for OOP, and its overall ease of use. Over Christmas, my brothers and I wanted to track a product and be notified when it came back in stock. This problem led me to create a program to do just that.
There are many different services available online that track when websites change and alert you to these changes. The ones I have found impose limits in different ways:
– Frequency of Checking
– Vague Alerts
– False Triggers on Random Website Changes
– Limited Number of Websites to Poll
My requirements are:
– Search a list of websites
– Run as often as I want
– Email alerts of the specific changes
– Filter/Regex the changes
That seems simple enough, and like a great candidate for brushing up on Python.
NOTE: This is not an exercise in efficiency, but rather in how quickly I can meet the requirements.
Searching a List of Websites
For me, this means reading in a setup file that has a list of websites and looping over it. Any language can read in a file and do this, so let’s go on to something more interesting.
Running as Often as I Want
Sounds like a no-brainer: let’s run the script as a cron job and have it launch whenever we want.
Email Alerts of the Specific Changes
This one is a little harder. The email part is easy. Just use Gmail and SMTP. The second part is the hard part. How am I going to search a website for a particular field? How am I going to store the data between runs?
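For the "easy" email part, here is a minimal sketch using only the standard library. The helper names (build_alert, send_alert) and the message layout are my own placeholders, not the code from the original script, and Gmail may require an app password rather than the normal account password.

```python
import smtplib
from email.message import EmailMessage


def build_alert(sender, recipient, url, old_text, new_text):
    """Build a plain-text alert describing what changed on one page."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Change detected: {url}"
    msg.set_content(f"URL: {url}\nWas: {old_text}\nNow: {new_text}\n")
    return msg


def send_alert(msg, username, password):
    """Send the message through Gmail's SMTP server over SSL (port 465)."""
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(username, password)
        server.send_message(msg)
```

Keeping the message building separate from the sending makes it possible to check the alert text without touching the network.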
I did a bit of googling and that led me to the lxml Python library. This library is able to build an element tree from a web page, then navigate to a specific XPath (we will read that as the path to an HTML tag) and return the text within. Score! This is exactly what we want.
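As a rough illustration of the lxml approach, the sketch below parses an HTML string and pulls out the text at a given XPath. The sample markup, the XPath expressions, and the extract_text helper are all hypothetical; a real run would download the page first and use whatever XPath fits that site.

```python
from lxml import html


def extract_text(page_source, xpath):
    """Return the stripped text of every element matching the XPath."""
    tree = html.fromstring(page_source)
    return [el.text_content().strip() for el in tree.xpath(xpath)]


# Hypothetical page fragment standing in for a downloaded product page.
sample = """
<html><body>
  <h1 id="title">Widget 3000</h1>
  <span class="stock">Out of stock</span>
</body></html>
"""

print(extract_text(sample, '//span[@class="stock"]'))
```

Returning a list rather than a single string covers pages where the same XPath matches several elements, such as a list of product titles.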
Now on to storing the data we find. Python can write to files like every other language. I’m just tired of stringing together variables and printing them to a CSV, only to have to write something to parse it again. What other ways can I do this? I have been playing with Home-Assistant and Octoprint, and they use the YAML format for all of their configuration files, while most of the internet uses XML or JSON. Since I could see this script turning into a Flask app sometime, let’s use an internet format. Let’s see what Python has for printing to JSON files.
Well, that was quick. Python has a JSON library. It has a dumps function that serializes a variable and returns a string. The only problem is that the default output is not as pretty or as human-readable as I want. Turns out, it takes arguments to customize the formatting!
json.dumps(variable, sort_keys=True, indent=4, separators=(',', ': '))
Ok, printing is always easy, now what about reading back what I just wrote and loading my variables?
variable = json.load(jsonFile)
Man, what have I been doing all of my life? Simple. Easy. Quick. I can get used to Python.
Filter/Regex the Changes
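Put together, the save-and-reload round trip between runs might look something like this. It is only a sketch: the file name, the shape of the state dictionary, and the save_state/load_state helpers are assumptions made for illustration.

```python
import json
import os
import tempfile


def save_state(path, state):
    """Write the state dict as indented, key-sorted JSON."""
    with open(path, "w", encoding="utf-8") as json_file:
        json.dump(state, json_file, sort_keys=True, indent=4)


def load_state(path):
    """Read the state back; return an empty dict on the very first run."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as json_file:
        return json.load(json_file)


# Hypothetical state: one entry per tracked URL.
state = {"https://example.com/item": {"title": "Widget 3000", "stock": "Out of stock"}}
path = os.path.join(tempfile.mkdtemp(), "state.json")
save_state(path, state)
print(load_state(path) == state)
```

Comparing the freshly scraped values against what load_state returns is then enough to decide whether anything changed since the last run.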
I don’t think that I am using the correct method for this, but I am using re.search, which returns None if there isn’t a regex match and a match object (which carries the character location) if there is. I’m not sure if re.match is better for this, but I was able to get it to work.
I ran into some weird issues because I wasn’t sanitizing the HTML inputs. I happened to get a string that was in German and had a few non-ASCII characters in it. When my for loop got to that string, it would freak out and start looping over every other variable, and I would lose half of my data. Bizarre, so I just limited it to [^-a-zA-Z\d\s:]. Then I hit another issue: I was converting the HTML into a string, and since it was non-ASCII, the conversion would fail. So I imported the string library and used the following lines of code, which go through each title and make sure it’s printable.
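For what it’s worth, the difference between the two calls is that re.search scans the whole string while re.match only matches at the very beginning, so re.search is the natural fit for finding a phrase anywhere in scraped text. A small sketch, with a made-up pattern and sample string:

```python
import re

# Hypothetical filter: alert only when the page text mentions "in stock".
pattern = re.compile(r"in stock", re.IGNORECASE)
text = "Widget 3000: In Stock"

found = pattern.search(text)    # scans the whole string for the pattern
anchored = pattern.match(text)  # only succeeds if the pattern starts at index 0

if found:
    # The match object carries both the matched text and its position.
    print(found.group(), found.start())
print(anchored)
```

Here search returns a match object because the phrase appears later in the string, while match returns None because the string does not start with it.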
printable = set(string.printable)
filter(lambda y: y in printable, titles)
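In Python 3, filter returns an iterator rather than a string, so the same idea needs a join to get text back. The sketch below assumes titles is a list of strings (the sample values are invented) and cleans each title one character at a time. Note that this is lossy: characters such as ö or ß are simply dropped.

```python
import string

PRINTABLE = set(string.printable)


def keep_printable(text):
    """Drop every character that is not in string.printable (ASCII only)."""
    return "".join(ch for ch in text if ch in PRINTABLE)


# Hypothetical scraped titles, one containing non-ASCII characters.
titles = ["Größe: 42", "Plain title"]
cleaned = [keep_printable(title) for title in titles]
print(cleaned)
```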
Overall, this was a great learning experience with Python. Python definitely has a ton of libraries that seem really useful. It was also super simple to use. However, I have yet to use its OOP features, which are one of the reasons that it is so powerful.