Saturday, 19 February 2011

Headless HTML page rendering with phantomjs

What is phantomjs?

phantomjs is a headless browser which can render HTML pages into images. It uses the WebKit rendering engine. That's cool because the generated page images aren't missing any content that is loaded dynamically via JavaScript. Even Flash movies are shown as expected.

Building phantomjs

The phantomjs homepage explains how to get and build phantomjs. If something doesn't work out for you, read the comments at the end of the build instructions page; they contain many useful hints.

Rendering my first page

phantomjs is controlled using JavaScript commands. You can launch your phantomjs JavaScript files from the command line:

$ phantomjs myScript.js

phantomjs runs your script once at startup and again every time a page has loaded. To react differently to different pages you have to track your current "state" yourself. You can persist your script's state in phantom.state: phantomjs restores the value of phantom.state even after another page is loaded, whereas all other JavaScript variables are lost when a new page loads. Initially the state is empty.
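To make this re-invocation model more tangible, here is a small simulation that runs in plain JavaScript (for example in Node.js) without phantomjs. The names createFakePhantom, runScript and simulate are hypothetical and exist only in this sketch; the fake object merely mimics the one behaviour described above, namely that state survives between runs while ordinary variables do not.

```javascript
// Hypothetical stand-in for the real phantom object (not the phantomjs API):
// only `state` is carried over from one script run to the next.
function createFakePhantom() {
  return { state: '', opened: [], exited: false };
}

// The "script": executed once at startup and once after every page load.
function runScript(phantom) {
  var visits = 0; // ordinary variable: starts from scratch on every run
  visits += 1;

  if (phantom.state.length === 0) {
    phantom.state = 'home';
    phantom.opened.push('http://example.com/'); // stands in for phantom.open(url)
  } else if (phantom.state === 'home') {
    phantom.exited = true; // stands in for phantom.exit(0)
  }
  return visits;
}

// Drive the script the way the text describes: run, load a page, run again.
function simulate(maxRuns) {
  var phantom = createFakePhantom();
  var log = [];
  for (var i = 0; i < maxRuns && !phantom.exited; i++) {
    var visits = runScript(phantom);
    log.push({ run: i + 1, state: phantom.state, visits: visits });
  }
  return log;
}
```

Calling simulate(5) stops after two runs: the first run opens the page and sets the state, the second one sees the stored state and exits. The visits counter never gets past 1, which is exactly why the state has to live in phantom.state.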

To load a page, call phantom.open(url). phantomjs will load the page at the specified URL and execute your script after the page's DOM is loaded.

The (in my opinion) coolest function is phantom.render(path). phantomjs will save a "screenshot" of your current page to the specified path on your hard drive. The path's file extension defines the image file format. Your viewport's size can be defined by setting phantom.viewportSize.
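Since the file extension decides the image format, it can help to build screenshot paths in one place. The following is a minimal sketch of such a helper. The function name screenshotPath and the list of accepted extensions are my own assumptions for illustration; check which formats your phantomjs build really supports.

```javascript
// Assumed list of extensions for this sketch; NOT taken from phantomjs itself.
var ASSUMED_FORMATS = ['png', 'jpg', 'gif'];

// Hypothetical helper: builds a file name whose extension selects the format.
function screenshotPath(name, format) {
  var ext = String(format || 'png').toLowerCase();
  if (ASSUMED_FORMATS.indexOf(ext) === -1) {
    throw new Error('Unsupported image format: ' + format);
  }
  // Replace characters that are awkward in file names with underscores.
  var safeName = String(name).replace(/[^A-Za-z0-9_-]/g, '_');
  return 'screen_' + safeName + '.' + ext;
}
```

Inside a phantomjs script this could be used as phantom.render(screenshotPath(phantom.state, 'png')).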

When you're done, you have to call phantom.exit(returnCode) to quit. The returnCode is passed to the parent process as the exit status.

Putting all those functions together in a script gives us the following:

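// First run: no state yet, so remember where we are and load the home page.
if(phantom.state.length === 0){
  phantom.state = '0_home';
  phantom.open('http://www.mini.de');
}
// Second run: the home page's DOM is loaded, so take the screenshot and quit.
else if(phantom.state === '0_home'){
  phantom.viewportSize = {width: 800, height: 600};
  phantom.sleep(2000); // wait 2 seconds so dynamically loaded content can appear
  phantom.render('home.png');
  phantom.exit(0);
}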

Executing the script produces a screenshot of www.mini.de in a file named 'home.png'. The image shows the main stage with a Flash movie and a footer with three dynamically loaded HTML fragments. You can see the result below:

Clicking links

"Clicking links" like a humanoid user is simulated by firing mouse click events. The following listing defines the clickElement(id) function which can be used to "click" elements. This allows us to execute simple page flows.

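function clickElement(id){
  var a = document.getElementById(id);

  // Build a synthetic left-button click that bubbles and is cancelable.
  var e = document.createEvent('MouseEvents');
  e.initMouseEvent('click', true, true, window, 0, 0, 0, 0, 0, false, false, false, false, 0, null);

  // Dispatch it as if the user had clicked the element.
  a.dispatchEvent(e);
}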

if(phantom.state.length !== 0){
  // save screenshot for every page / state
  phantom.viewportSize = {width: 800, height: 600};
  phantom.sleep(2000);
  phantom.render('screen_' + phantom.state + '.png');
}

if(phantom.state.length === 0){
  phantom.state = '0_home';
  phantom.open('http://www.mini.de');
}
else if(phantom.state === '0_home'){
  phantom.state = '1_config';

  clickElement('quicklink_id1');
}
else if(phantom.state === '1_config'){
  phantom.exit();
}

The script loads the URL http://www.mini.de, which is the home page. After the home page has loaded, a click event is dispatched to the element with the ID 'quicklink_id1', which should be the 'MINI KONFIGURATOR' link in the footer. The click triggers the next page load, so the script runs again in the state '1_config', saves another screenshot and exits.
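One weak spot of clickElement(id) above is that it throws a TypeError when no element with the given ID exists, for example when the page markup changes. Here is a hedged sketch of a more defensive variant. The name tryClickElement and the extra doc and win parameters are my own additions; passing the document and window in explicitly also makes it possible to try the function outside a browser with simple stand-in objects.

```javascript
// Variant of clickElement that reports a missing element instead of throwing.
// `doc` and `win` are normally the page's `document` and `window`.
function tryClickElement(doc, win, id) {
  var el = doc.getElementById(id);
  if (!el) {
    return false; // nothing to click
  }

  // Same synthetic left-button click as in clickElement.
  var e = doc.createEvent('MouseEvents');
  e.initMouseEvent('click', true, true, win, 0, 0, 0, 0, 0,
                   false, false, false, false, 0, null);

  el.dispatchEvent(e);
  return true;
}
```

In a phantomjs script the call would be tryClickElement(document, window, 'quicklink_id1'), and a false result could be turned into a non-zero exit code.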

Asserting stuff

Within phantomjs scripts you can access your page's DOM as well as its global JavaScript variables and functions. This enables us to do some kind of unit testing.

I'm extending the listing once more:

function clickElement(id){ ... }

function fail(msg){
  console.log(msg);

  phantom.exit(1);
}

function assert(condition, msg){
  if(condition){
    return;
  }

  fail(msg);
}

if(phantom.state.length !== 0){
  phantom.viewportSize = {width: 800, height: 600};
  phantom.sleep(2000);
  phantom.render('screen_' + phantom.state + '.png');
}

if(phantom.state.length === 0){
  phantom.state = '0_home';
  phantom.open('http://www.mini.de');
}
else if(phantom.state === '0_home'){
  phantom.state = '1_config';

  clickElement('quicklink_id1');
}
else if(phantom.state === '1_config'){
  assert(document.getElementById('cake'), 'I am missing the cake!');

  phantom.exit();
}

Check this out: because fail() calls phantom.exit(1), phantomjs returns a status code of 1 when the assertion fails. This can be used to do some custom post-processing in the shell:

$ phantomjs myScript.js && echo 'Test successful' || echo 'Test failed'
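The assert/fail pair above stops at the first failed assertion. If you would rather see all failures of a page at once, one option is to collect them and derive the exit code at the end. The following is only a sketch of that idea; createChecker and its methods are hypothetical names and not part of phantomjs.

```javascript
// Hypothetical helper: collects failed checks instead of exiting right away.
function createChecker() {
  var failures = [];
  return {
    // Record the message when the condition is falsy.
    check: function (condition, msg) {
      if (!condition) {
        failures.push(msg);
      }
    },
    // Copy of all recorded failure messages.
    failures: function () {
      return failures.slice();
    },
    // 0 when everything passed, 1 otherwise (same convention as fail()).
    exitCode: function () {
      return failures.length === 0 ? 0 : 1;
    }
  };
}
```

In the '1_config' state you could then run several checker.check(...) calls, log checker.failures() with console.log and finish with phantom.exit(checker.exitCode()), so the shell one-liner keeps working unchanged.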

Further information

For further information see the following pages. They are sorted by how important I think they are.