Website Monitor. Build your own.

In the internet era, an online presence is a must for any business. It lets the world know about your products and services and helps convert visitors into customers. But imagine that your website, or some of its pages, goes down and, worst of all, you don’t even know about it: you lose views and customers. What if you had your own utility that continuously monitors your site and notifies you? Let’s build our own website monitor at no cost.

WEBSITE MONITOR DESIGN

I will show how you can use Google Apps Script to create a website monitor. Most websites have a sitemap.xml that lists all the links on the site. If your site doesn’t have one, it’s worth creating, as it is one of the factors for SEO. To start scripting, you need Google’s online script editor. Visit the Apps Script home page and click “New Project” to open the editor. Click “Untitled Project” and give your app a meaningful name, such as “Website Monitor”. Delete the default function in the editor, as we will define our own.

LET’S CODE

The first thing we need is a function that parses the sitemap.xml at a given URL and prepares a list of all the links on our site. The function “sitemap” does this for us. It uses UrlFetchApp to fetch the contents and the XML parsing classes (XmlService) provided by Google Apps Script to parse the file.

    function sitemap(sitemapUrl) {
      try {
        var xml = UrlFetchApp.fetch(sitemapUrl).getContentText();
        var document = XmlService.parse(xml);
        var root = document.getRootElement();
        var sitemapNameSpace = XmlService.getNamespace(root.getNamespace().getURI());
        var urls = root.getChildren('url', sitemapNameSpace);
        var locs = [];
        for (var i = 0; i < urls.length; i++) {
          locs.push(urls[i].getChild('loc', sitemapNameSpace).getText());
        }
        return locs;
      } catch (e) {
        // Log the failure and return an empty list so callers can still loop.
        Logger.log(e);
        return [];
      }
    }
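To make the parsing step concrete, here is a minimal sketch of what a sitemap.xml typically contains and which values end up in the returned list. It runs in any JavaScript engine and, purely for illustration, uses a regular expression instead of XmlService. The sample XML, the example.com URLs, and the helper name `extractLocs` are assumptions and not part of the script above.

```javascript
// Hypothetical sample sitemap; a real one is fetched from your own site.
var sampleSitemap =
  '<?xml version="1.0" encoding="UTF-8"?>' +
  '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' +
  '<url><loc>https://example.com/</loc></url>' +
  '<url><loc>https://example.com/about</loc></url>' +
  '</urlset>';

// Illustrative helper (not an Apps Script API):
// collects the text of every <loc> element in the XML string.
function extractLocs(xml) {
  var locRegex = /<loc>\s*([^<]+?)\s*<\/loc>/g;
  var locs = [];
  var match;
  while ((match = locRegex.exec(xml)) !== null) {
    locs.push(match[1]);
  }
  return locs;
}

var locs = extractLocs(sampleSitemap);
```

Each `<url>` entry carries one `<loc>` element, and those `<loc>` values are exactly what “sitemap” collects. Inside Apps Script, XmlService remains the more robust choice because it handles namespaces and escaping properly.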

Now that the “sitemap” function is ready, let’s call it with a sitemap URL to get all the links in it. We will create a testing function, “sitemapTester”, that calls “sitemap”.

    function sitemapTester() {
      var url = 'https://www.screamingfrog.co.uk/sitemap.xml';
      var mlocs = sitemap(url);
      for (var x = 0; x < mlocs.length; x++) {
        downOutBoundLinks(mlocs[x]);
      }
    }

Besides “sitemap”, “sitemapTester” calls another function, “downOutBoundLinks”. This is where our core logic sits. “downOutBoundLinks” is called for each URL we get from “sitemap”. Keep in mind that each of these pages can also contain outbound links, either to other pages of your site or to other sites. Checking whether these outbound links are up is a good idea, as it makes sure our website does not point to a page that no longer exists or is temporarily down. So we will check these links too. You can skip them if you want.

To find the outbound links, we need to parse the ‘<a>’ tags and their ‘href’ attributes. We will do this using regular expressions. Our implementation of “downOutBoundLinks” goes like this:

    function downOutBoundLinks(currentPage) {
      var res;
      try {
        res = UrlFetchApp.fetch(currentPage, {muteHttpExceptions: true});
      } catch (e) {
        // The fetch itself failed (DNS error, timeout, invalid URL).
        Logger.log(currentPage);
        return;
      }
      if (res.getResponseCode() != 200) {
        Logger.log(currentPage);
        return;
      }
      var aTagRegex = /<a\b[^>]*>[\s\S]*?<\/a>/g;
      var linkRegex = /href=["']https?:\/\/[^"']*/;
      // match() returns null when the page has no <a> tags.
      var anchors = res.getContentText().match(aTagRegex) || [];
      for (var x = 0; x < anchors.length; x++) {
        var found = anchors[x].match(linkRegex);
        if (found == null) continue;
        var link = found[0].replace(/^href=["']/, '');
        try {
          var reschild = UrlFetchApp.fetch(link, {muteHttpExceptions: true});
          if (reschild.getResponseCode() != 200) {
            Logger.log(link);
          }
        } catch (e) {
          Logger.log(link);
        }
      }
    }
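The link extraction can be tried on its own, without fetching anything. The following is a minimal sketch, assuming a helper named `extractOutboundLinks` and a small hand-written HTML snippet (both hypothetical). It keeps only absolute http(s) links, as the function above does, and additionally skips duplicates within a page.

```javascript
// Illustrative helper (hypothetical name): returns the absolute http(s)
// links found in the <a> tags of an HTML string, without duplicates.
function extractOutboundLinks(html) {
  var aTagRegex = /<a\b[^>]*>/gi;
  // Group 1 is the opening quote, group 2 the URL up to the matching quote.
  var hrefRegex = /href\s*=\s*(["'])(https?:\/\/.*?)\1/i;
  var seen = {};
  var links = [];
  var tags = html.match(aTagRegex) || [];
  for (var i = 0; i < tags.length; i++) {
    var found = tags[i].match(hrefRegex);
    if (found !== null && !seen[found[2]]) {
      seen[found[2]] = true;
      links.push(found[2]);
    }
  }
  return links;
}

// Hand-written sample page: two absolute links, one relative, one repeat.
var sampleHtml =
  '<p><a href="https://example.com/docs">Docs</a> ' +
  "<a class='ext' href='http://example.org/'>Partner</a> " +
  '<a href="/contact">Contact</a> ' +
  '<a href="https://example.com/docs">Docs again</a></p>';

var links = extractOutboundLinks(sampleHtml);
```

Here `links` holds the two absolute URLs; the relative “/contact” link is ignored because the pattern requires http or https. Regular expressions are a pragmatic shortcut and can miss unusual markup, such as unquoted attribute values.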

EXECUTION & RESULT

Let’s execute our code and see what happens. Select the “sitemapTester” function in the editor and run it by pressing the play button.

Once the execution completes (which may take a while if your site is large), check the results in View->Logs. The log lists the links that are down.

ENHANCEMENTS

So we saw how easy it is to create a website monitor. You can easily extend it to make it more useful. For example, the same link can be logged more than once when several pages point to it, so you can use a map to store unique results. Also, Google limits how long a script may run, to about 6 minutes per execution. On a large site the run can therefore end with “Exceeded maximum execution time”. You can track how long your script has been running, terminate gracefully at around 3 minutes, and resume from where it left off in the next run. The PropertiesService object provided by Google can store the index of the last link so you can resume from there. To run the monitor periodically, say once a day, you can set up a trigger from Edit->Current Project’s Triggers->Add Trigger. Want to receive the results daily by mail? Use the GmailApp class. Hope you had fun reading the article.
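The unique-results and resume ideas can be sketched together. This is a minimal sketch under stated assumptions, not a finished implementation: the function name `checkBatch`, the property key `lastIndex`, and the in-memory stand-ins for the store, the clock, and the link check are all hypothetical. In Apps Script, the object returned by PropertiesService.getScriptProperties() offers the same getProperty/setProperty methods as the stand-in store.

```javascript
// Hypothetical helper: checks links starting at the saved index and stops
// once the time budget is used up, remembering where to resume.
// - isDown: function(link) returning true when the link is down
// - store:  object with getProperty(key) and setProperty(key, value)
// - now:    function returning the current time in milliseconds
function checkBatch(links, isDown, store, budgetMs, now) {
  var startedAt = now();
  var index = parseInt(store.getProperty('lastIndex') || '0', 10);
  var down = {}; // used as a set so each failing link is reported once
  while (index < links.length && now() - startedAt < budgetMs) {
    if (isDown(links[index])) {
      down[links[index]] = true;
    }
    index++;
  }
  // Start over on the next run once every link has been visited.
  var finished = index >= links.length;
  store.setProperty('lastIndex', String(finished ? 0 : index));
  return { down: Object.keys(down), finished: finished };
}

// In-memory stand-ins so the sketch runs outside Apps Script.
var memory = {};
var fakeStore = {
  getProperty: function (key) { return key in memory ? memory[key] : null; },
  setProperty: function (key, value) { memory[key] = value; }
};
var tick = 0;
function fakeClock() { tick += 1000; return tick; } // each call advances one second

var sampleLinks = ['https://example.com/a', 'https://example.com/a',
                   'https://example.com/b', 'https://example.com/c'];
function fakeIsDown(link) { return link !== 'https://example.com/c'; }

// With a 3-second budget the first run covers two links, the second the rest.
var firstRun = checkBatch(sampleLinks, fakeIsDown, fakeStore, 3000, fakeClock);
var secondRun = checkBatch(sampleLinks, fakeIsDown, fakeStore, 3000, fakeClock);
```

In a real script you would pass the list returned by “sitemap”, a function that wraps UrlFetchApp.fetch as `isDown`, PropertiesService.getScriptProperties() as the store, a budget such as 3 * 60 * 1000, and Date.now as the clock. Passing the store and the clock in as arguments keeps the logic testable without Google’s services.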
