Automatic Web page screenshots are commonly used for monitoring tasks such as detecting copyright infringement, website defacement, and other legal and security issues. Automatic screenshotting can also be a useful tool for researchers, developers, and journalists. There are free and open-source tools that can help you automate the task of creating Web page screenshots.

Screenshot Tools

I’ll briefly go over the following tools: CutyCapt, wkhtmltoimage, Firefox, Pageres-CLI, and PhantomJS, with a bit more focus on the last two.

Some of these tools let you specify the user agent, which gives you some control over how the page will be rendered. Very long lists of user agent strings are easy to find online.

CutyCapt

Download it from the CutyCapt project page. Installation is fairly straightforward. You do need an X11 server running on your machine to use CutyCapt; alternatively, you can run it via xvfb-run, which is not difficult either.

One issue you may notice with CutyCapt is that it misses dynamic content that only becomes visible when you scroll down the page. This utility has quite a few features, but scrolling ain’t one of them.
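Here is a minimal Python sketch of scripting CutyCapt. It assumes the binary is installed as cutycapt (the name the Debian package uses) and that you want it wrapped in xvfb-run on a machine without a display. The helper name cutycapt_command and the 2-second --delay default are illustrative choices, not anything CutyCapt prescribes. A delay gives page scripts time to run, but it will not make CutyCapt scroll.

```python
import shlex


def cutycapt_command(url, out_file, user_agent=None, delay_ms=2000, headless=True):
    """Build (but do not run) a CutyCapt command line. Hypothetical helper."""
    cmd = ["cutycapt", f"--url={url}", f"--out={out_file}", f"--delay={delay_ms}"]
    if user_agent:
        # Override the User-Agent string CutyCapt sends to the server.
        cmd.append(f"--user-agent={user_agent}")
    if headless:
        # xvfb-run provides a virtual X server; -a picks a free display number.
        cmd = ["xvfb-run", "-a"] + cmd
    return cmd


# Print a copy-pasteable shell command; run it with subprocess.run(cmd, check=True).
print(shlex.join(cutycapt_command("https://example.com", "example.png")))
```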

wkhtmltoimage

This tool is part of the wkhtmltopdf package. It is quick and has a good selection of options. You can install it with sudo apt install wkhtmltopdf on Debian, or download a precompiled package from the wkhtmltopdf website.

It has trouble capturing certain page elements: in my test, some graphics in the sidebar and the footer were missing. This is dynamic content that loads only when you scroll down, which wkhtmltoimage can’t do.
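The following is a similar sketch for wkhtmltoimage. The helper name and defaults are my own assumptions. Raising --javascript-delay above its 200 ms default gives slow scripts more time before capture. It does not simulate scrolling, though, so it won’t recover the lazy-loaded sidebar and footer graphics.

```python
import shlex


def wkhtmltoimage_command(url, out_file, width=1280, js_delay_ms=3000):
    """Build (but do not run) a wkhtmltoimage command line. Hypothetical helper."""
    return [
        "wkhtmltoimage",
        "--width", str(width),
        # Wait longer than the 200 ms default for page scripts to finish.
        # This helps with slow scripts, but it does not scroll the page.
        "--javascript-delay", str(js_delay_ms),
        url,
        out_file,
    ]


print(shlex.join(wkhtmltoimage_command("https://example.com", "example.jpg")))
```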

Firefox

Firefox’s headless --screenshot option gives you hardly any control beyond the --window-size parameter. It is significantly slower than the other tools discussed here, and it only supports PNG output. With my Firefox 61.0, the screenshots were missing all graphics, and I’m not sure why.

An interesting feature is that Firefox is supposed to use any configured plugins when taking screenshots in headless mode. It would’ve been more interesting if it actually worked. So why am I including Firefox here? Because I like Firefox and I am hoping its headless functionality will improve.
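For completeness, here is a hypothetical helper that builds the headless Firefox command. It rejects non-PNG output file names because PNG is the only format Firefox writes here, and --window-size is the only sizing control it exposes.

```python
def firefox_command(url, out_file, width=1280, height=2000):
    """Build a headless Firefox screenshot command. Hypothetical helper."""
    if not out_file.lower().endswith(".png"):
        raise ValueError("Firefox's --screenshot writes PNG files only")
    return [
        "firefox",
        "--headless",
        f"--window-size={width},{height}",  # essentially the only option available
        "--screenshot", out_file,
        url,
    ]
```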

Pageres-CLI

This utility uses PhantomJS on the back end to create the actual screenshots. The advantage is that you don’t have to write scripts to access many of PhantomJS’s advanced functions.

Many, but obviously not all. It has the same trouble rendering dynamic content toward the bottom of the page. There may be an option for this, but I could not find it.
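Here is a sketch of calling Pageres-CLI from Python, assuming the pageres command is on your PATH. The helper name and defaults are illustrative. The --delay value is in seconds, and pageres saves the screenshot to the current directory under a name it generates.

```python
def pageres_command(url, resolution="1280x1024", delay_s=3, user_agent=None):
    """Build (but do not run) a pageres-cli command line. Hypothetical helper."""
    cmd = ["pageres", url, resolution, f"--delay={delay_s}"]  # delay in seconds
    if user_agent:
        cmd.append(f"--user-agent={user_agent}")
    return cmd
```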

PhantomJS

PhantomJS is a scriptable headless browser that can actually interact with Web pages, not just take screenshots. It is widely used for browser-based tests in continuous integration, as well as for writing all sorts of Web bots.

Because of its extended functionality, PhantomJS is a bit more difficult to use. First you need to create a JS file: a script that tells PhantomJS how to load and render the Web page. For the example below I am using this ~/phantomjs_rasterize.js:

To make things easier, PhantomJS comes with a bunch of handy example scripts that you can quickly adapt to your needs. There are also plenty of resources, such as GitHub, where you can find just about anything you need.

Unlike the previous tools, PhantomJS can successfully simulate scrolling and load dynamic content all the way to the bottom of the page (although it took me a bit of googling and trial and error to get this working). So some assembly is required, but overall this is the best solution for the most accurate screenshots.
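The ~/phantomjs_rasterize.js script itself isn’t reproduced here. The sketch below only shows how you might invoke such a script from Python. It assumes that, like the rasterize.js example bundled with PhantomJS, the script takes the URL and the output file as its first two arguments. The function names are hypothetical. The timeout guards against a PhantomJS script that never calls phantom.exit() and would otherwise keep running indefinitely.

```python
import os
import subprocess


def phantomjs_command(script, url, out_file, ignore_ssl_errors=False):
    """Build a PhantomJS command for a rasterize-style script (URL, output file)."""
    cmd = ["phantomjs"]
    if ignore_ssl_errors:
        # Useful for sites with self-signed or otherwise invalid certificates.
        cmd.append("--ignore-ssl-errors=true")
    cmd += [os.path.expanduser(script), url, out_file]
    return cmd


def take_screenshot(script, url, out_file, timeout_s=120):
    # subprocess.run kills PhantomJS and raises TimeoutExpired if it runs too long.
    subprocess.run(phantomjs_command(script, url, out_file), check=True, timeout=timeout_s)
```

Usage would look like take_screenshot("~/phantomjs_rasterize.js", "https://example.com", "example.png").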

Preserving for Posterity

Let’s say you have a list of URLs that you want to screenshot periodically. The following script loops through your list of links, takes screenshots with PhantomJS, adds an EXIF comment field, and inserts a hidden watermark message using steghide. The watermark is not foolproof, but it can be useful if you need to check whether the images have been tampered with.
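This is not the author’s script. It is a Python sketch of the same pipeline that assumes the following:

* phantomjs, steghide, and exiftool are installed.
* The PhantomJS script takes the URL and output file as arguments.
* The URL list is a text file with one URL per line.

Two design choices differ from the description above. Screenshots are rendered as JPEG because steghide cannot embed data into PNG files. steghide also runs before exiftool, because exiftool only rewrites metadata and leaves the modified image data alone. All function names and the file-naming scheme are hypothetical. With dry_run=True (the default), the sketch only returns the planned commands. Passing the passphrase with -p makes it visible in the process list.

```python
import datetime
import os
import re
import subprocess
import tempfile
from pathlib import Path


def slugify(url):
    """Turn a URL into a filename-safe fragment (hypothetical naming scheme)."""
    no_scheme = re.sub(r"^https?://", "", url)
    return re.sub(r"[^A-Za-z0-9.-]+", "_", no_scheme).strip("_")


def archive_commands(url, out_dir, script, message_file, passphrase, comment, stamp):
    """Commands for one URL: render to JPEG, hide a watermark, add an EXIF comment."""
    jpg = str(Path(out_dir) / f"{slugify(url)}_{stamp}.jpg")
    return [
        # Render as JPEG: steghide handles JPEG/BMP/WAV/AU, but not PNG.
        ["phantomjs", script, url, jpg],
        # Without -sf, steghide writes the hidden data into the cover file itself.
        ["steghide", "embed", "-q", "-cf", jpg, "-ef", message_file, "-p", passphrase],
        # exiftool edits metadata only, leaving steghide's image data intact.
        ["exiftool", "-q", "-overwrite_original", f"-UserComment={comment}", jpg],
    ]


def archive_all(url_file, out_dir, script, message, passphrase, comment, dry_run=True):
    """Screenshot every URL in url_file (one per line; lines starting with '#' are skipped)."""
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    urls = [line.strip() for line in Path(url_file).read_text().splitlines()
            if line.strip() and not line.strip().startswith("#")]
    # steghide embeds a file, so write the watermark message to a temporary one.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write(message)
        message_file = f.name
    planned = []
    try:
        if not dry_run:
            Path(out_dir).mkdir(parents=True, exist_ok=True)
        for url in urls:
            cmds = archive_commands(url, out_dir, script, message_file,
                                    passphrase, comment, stamp)
            planned.append(cmds)
            if not dry_run:
                for cmd in cmds:
                    subprocess.run(cmd, check=True, timeout=300)
    finally:
        os.unlink(message_file)
    return planned
```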

And, if you need to extract the watermark message, use this command:
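The author’s command isn’t shown here. Below is a hypothetical helper for the extraction step that assumes you use the same passphrase that embedded the message. steghide also has an info command that reports whether a file appears to carry embedded data.

```python
def steghide_extract_command(stego_file, passphrase, out_file="watermark.txt"):
    """Build a steghide command that recovers the hidden watermark. Hypothetical helper."""
    return [
        "steghide", "extract",
        "-sf", stego_file,   # image carrying the hidden data
        "-p", passphrase,    # must match the passphrase used when embedding
        "-xf", out_file,     # where to write the recovered message
        "-f",                # overwrite out_file if it already exists
    ]
```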

Comparing Screenshots

Being able to compare two or more screenshots may come in handy at some point. Probably the best tool for this is ImageMagick and its compare utility. In the example below I took two screenshots of the same URL some time apart. Running the following command produced a diff image:
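The author’s exact command isn’t shown here, so this is a hedged Python sketch of a typical ImageMagick comparison. It assumes the compare binary is on your PATH. On ImageMagick 7 it is also available as magick compare. The -metric AE option makes compare report the number of differing pixels, and -fuzz treats nearly identical colours as equal. The 5% default is an arbitrary starting point. compare generally expects both images to have the same dimensions.

```python
import subprocess


def compare_command(before, after, diff_out, fuzz_percent=5):
    """ImageMagick compare: write a diff image and report differing pixels."""
    return [
        "compare",
        "-metric", "AE",              # absolute error: count of differing pixels
        "-fuzz", f"{fuzz_percent}%",  # ignore tiny colour differences
        before, after, diff_out,
    ]


def changed_pixels(before, after, diff_out, fuzz_percent=5):
    result = subprocess.run(compare_command(before, after, diff_out, fuzz_percent),
                            capture_output=True, text=True)
    # compare exits with 0 (similar), 1 (dissimilar) or 2 (error).
    if result.returncode == 2:
        raise RuntimeError(result.stderr.strip())
    # The metric is printed on stderr; the first token is the pixel count.
    return float(result.stderr.split()[0])
```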

Here are the two screenshots and the resulting diff image, showing what changed:

As you can see, the highlighted changes are due to various dynamic content, such as my Twitter feed. It is possible to configure the compare utility to look for differences only in a particular area of the image. In this case, I would like it to check whether anything changed in the main body of the post, while ignoring changes in dynamic content.
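One way to restrict the comparison to a single area, such as the main body of the post, is ImageMagick’s crop-on-read filename suffix, file.png[WxH+X+Y], applied to both screenshots. This helper is hypothetical. The region coordinates in the usage note are made up, so measure them from your own page layout.

```python
def region_compare_command(before, after, diff_out, width, height, x, y, fuzz_percent=5):
    """Compare only a WxH+X+Y region of two screenshots. Hypothetical helper."""
    geometry = f"{width}x{height}+{x}+{y}"
    return [
        "compare", "-metric", "AE", "-fuzz", f"{fuzz_percent}%",
        # The [geometry] suffix makes ImageMagick crop each file as it is read.
        f"{before}[{geometry}]", f"{after}[{geometry}]",
        diff_out,
    ]
```

For example, pass region_compare_command("before.png", "after.png", "body_diff.png", 800, 1200, 300, 400) to subprocess.run. Because the arguments go in as a list rather than through a shell, the brackets don’t need quoting.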
