4 Tools for Improving Online Data Quality


It’s great that we can now collect data on fairly complex cognitive tasks online. However, often the data quality is relatively low. There are some ways in which you can improve engagement, and even explain some of the noise. Here I talk about 4 of them, giving examples of each. Click below to skip to any one.

  1. WebGazer.js
  2. Headtrackr.js
  3. Browser Checking
  4. Full Screen API

I have found it extremely helpful that advances in web browsers have made running tasks online (complete with generation of complex counterbalanced and randomised trial information on the fly) possible.

One significant hurdle is getting people to behave as if they are taking part in research, rather than playing a game on their laptop at home. It became frustratingly apparent in the first pilot study I ran that people do not pay attention to the task at hand. Even though I could not directly record this, it is clear that people wander off, or go and browse other websites, whilst completing your task.

Whilst I can’t control people’s behaviour at home, a good starting step is to measure engagement with your content. You can include this data as a covariate, or even a regressor, in any analysis you complete. You could even just use it to get a measure of how engaging your task is. These engagement measurements are entirely possible in HTML 5 and Javascript, thanks to certain tools:

1. WebGazer.js

This tool was developed by a team at Brown University, and it is beautifully simple to include in any task or webpage. You simply include the library's JavaScript file in your website by adding an HTML tag pointing to its location (in this case in the same directory as your HTML file):
<script src="webgazer.js" type="text/javascript"></script>

This will then include a webcam stream in your page, and print its predictions of eye-gaze location to the screen. It uses a regression model to associate the estimated position of your pupils with mouse movements on the screen.

This is not as good as a lab-based eye tracker, but it does give you a rough estimate of where participants are looking on the screen. It also uses an object tracking library to identify the face, so with some modification you can output whether a face is present (so you will know if your participant walks away during your task).
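For quality control, the most useful part is the listener callback: webgazer passes `null` data when no prediction is available, which is a cheap proxy for "no face in view". Here is a minimal sketch of logging gaze samples for later export; the `gazeSamples` array and `inViewport` helper are my own names, not part of the library.

```javascript
// Store gaze predictions for later export alongside trial data
var gazeSamples = [];

// Pure helper: did this prediction land inside the visible page?
function inViewport(x, y, width, height) {
    return x >= 0 && y >= 0 && x <= width && y <= height;
}

// Browser-only wiring: webgazer.js exposes setGazeListener() and begin()
if (typeof webgazer !== 'undefined') {
    webgazer.setGazeListener(function (data, elapsedTime) {
        if (data === null) { return; } // no prediction - face likely not visible
        gazeSamples.push({
            t: elapsedTime,
            x: data.x,
            y: data.y,
            onScreen: inViewport(data.x, data.y,
                                 window.innerWidth, window.innerHeight)
        });
    }).begin();
}
```

At the end of the task, the proportion of null or off-screen samples gives a simple per-trial engagement measure.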

You can see my demo implementation here, which will give you a good impression of this tool's accuracy (without any tweaking). The last few eye-movement samples for each target location in calibration and validation are printed out at the end of the demo.

Scan Paths Demo Screenshot


2. HeadTrackr.js

This is a slightly less complex library. It also uses webcam input, and works simply by being included in your HTML. It is intended for creating user interaction by tracking the relative position of the head using a webcam (example); however, I have repurposed it for quality control during tasks.

It is easy to include in a webpage, or data collection app:

Include the JavaScript file in your html, which is available on the GitHub Repo:
<script src="headtrackr.js" type="text/javascript"></script>

Then you can write a function in JavaScript to start the tracker, and output data to wherever you want. Below I write it to a variable that will be saved as JSON at the end of the task.

function startTracker() {

    //make canvas for the tracker, then append to body - using jquery
    //(set width/height as attributes, not CSS, so the canvas resolution is correct)
    $('<canvas/>', { id: 'inputCanvas' }).attr({ width: 320, height: 240 }).appendTo('body');

    //make video element, then append to head - again with jquery
    $('<video/>', { id: 'inputVideo', loop: 'loop', autoplay: 'autoplay' }).appendTo('head');

    var videoInput = document.getElementById('inputVideo');
    var canvasInput = document.getElementById('inputCanvas');

    //setup headtracker instance - no user interface, and sample every 100ms
    window.htracker = new headtrackr.Tracker({ ui: false, detectionInterval: 100 });
    window.htracker.init(videoInput, canvasInput);

    //start tracking
    window.htracker.start();
}

At the beginning of one of my projects, the user is shown a feed of their webcam and asked to position themselves within view. The task will not begin until a machine-learning algorithm has detected a face in the video stream. This increases the likelihood that they are in an appropriate situation for a cognitive task, and subtly encourages them to stay in the same position whilst taking part.
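The gate itself can be built on headtrackr's status events: the library dispatches `headtrackrStatus` events on `document`, with a status of `"found"` once a face has been detected. A sketch of the idea follows; `faceGate` is a hypothetical helper of my own, and I would check the exact status strings against the version of the library you use.

```javascript
// Hypothetical gate: the task polls faceGate.ready() before starting
var faceGate = (function () {
    var faceFound = false;
    return {
        update: function (status) { if (status === 'found') { faceFound = true; } },
        ready: function () { return faceFound; }
    };
})();

// Browser-only wiring: headtrackr dispatches status events on document
if (typeof document !== 'undefined' && document.addEventListener) {
    document.addEventListener('headtrackrStatus', function (event) {
        faceGate.update(event.status);
    });
}
```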

Demonstration of headtrackr.js' utility in improving data quality.
Screenshot from one of my tasks.

During completion, we are able to record the position of the head along X, Y and Z coordinates (as values in cm). There is also a confidence estimate for the face being detected at any given point. This is great, as you are able to tell if the participant has left the computer, or moved around, during a given trial, allowing us to exclude messy trials!
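Recording those coordinates is a matter of listening for headtrackr's `headtrackingEvent`, which carries `x`, `y` and `z` fields. A sketch, using my own variable names (`headDistances` is the same array the baseline snippet below reads from):

```javascript
// Arrays written out as JSON at the end of the task
var headDistances = []; // z coordinate: distance from the camera
var headPositions = []; // full position records

// Pure helper: shape one tracking event into the record we store
function toRecord(evt, timestamp) {
    return { x: evt.x, y: evt.y, z: evt.z, t: timestamp };
}

// Browser-only wiring: headtrackr dispatches headtrackingEvent on document
if (typeof document !== 'undefined' && document.addEventListener) {
    document.addEventListener('headtrackingEvent', function (event) {
        headDistances.push(event.z);
        headPositions.push(toRecord(event, Date.now()));
    });
}
```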

This library does make a problematic assumption though: that the user is 60cm away from the camera when first initialised. This will not be true for all users, as people assume a range of sitting positions, especially with laptops. To avoid relying on this, I simply treat the location data as an arbitrary scale, with each trial's timecourse analysed relative to a pre-trial baseline period. This is similar to how we analyse most pupil-size data in psychology and cognitive experiments.

Baseline recording:

// get baseline head location from the last 25 records (2.5s at the 100ms sampling interval)
var lastIns = headDistances.slice(Math.max(headDistances.length - 25, 0));
var trialBase = lastIns.reduce(function (a, b) { return a + b; }, 0) / lastIns.length;
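Once a baseline value like `trialBase` is in hand, each trial's timecourse can be re-expressed relative to it. A small sketch; `baselineCorrect` is my own helper name:

```javascript
// Express each sample in a trial relative to the pre-trial baseline
function baselineCorrect(trialSamples, baseline) {
    return trialSamples.map(function (z) { return z - baseline; });
}

// e.g. with a 60cm baseline, movement towards the camera shows as negative values
var corrected = baselineCorrect([60, 58, 55], 60); // -> [0, -2, -5]
```

This makes trials comparable across participants regardless of how far each one actually sits from their camera.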

3. Browser Checking

Many of these tools are dependent on the browser in use. Access to the camera, for instance, is made possible with the MediaDevices.getUserMedia() method. This method is not supported by Internet Explorer, and only arrived in Safari with version 11.
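Rather than (or as well as) parsing user-agent strings, you can feature-detect the camera API directly; `hasGetUserMedia` is a hypothetical helper name of my own:

```javascript
// True only when the modern promise-based camera API is available
function hasGetUserMedia() {
    return !!(typeof navigator !== 'undefined' &&
              navigator.mediaDevices &&
              navigator.mediaDevices.getUserMedia);
}

if (!hasGetUserMedia()) {
    // fall back to a message asking the participant to use a supported browser
}
```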

Greater motivation for controlling the browser client comes from the fact that some browsers are more sluggish than others. This will lead to inaccurate reaction times and poor data quality, something to avoid.

Browser checker example
An example of my browser checker

Restricting browsers is often seen as a bad thing from a user-experience perspective. However, from a data-science perspective we must prioritise the integrity of the data, so it is OK to tell our participants which browsers to use. This can be achieved in JavaScript with reasonable consistency.

For example, this function returns the browser and checks it against an approved list. If the browser is allowed, it sets a flag permitting the task to carry on; if not, it asks the user to download a different browser.

 function browserCheck(){
     //check that Firefox 52+, Chrome 49+ or Safari 11+ is in use
     //ask the browser, politely, what it is
     navigator.sayswho = (function(){
         var ua = navigator.userAgent, tem,
         M = ua.match(/(opera|chrome|safari|firefox|msie|trident(?=\/))\/?\s*(\d+)/i) || [];
         if(/trident/i.test(M[1])){
             tem = /\brv[ :]+(\d+)/g.exec(ua) || [];
             return 'IE ' + (tem[1] || '');
         }
         if(M[1] === 'Chrome'){
             tem = ua.match(/\b(OPR|Edge)\/(\d+)/);
             if(tem != null) return tem.slice(1).join(' ').replace('OPR', 'Opera');
         }
         M = M[2] ? [M[1], M[2]] : [navigator.appName, navigator.appVersion, '-?'];
         if((tem = ua.match(/version\/(\d+)/i)) != null) M.splice(1, 1, tem[1]);
         return M.join(' ');
     })();

     //compare this to the allowed browser list
     var allowed = ['Firefox 52', 'Firefox 53', 'Firefox 54', 'Firefox 55', 'Firefox 56', 'Firefox 57', 'Chrome 49', 'Chrome 56', 'Chrome 57', 'Chrome 58', 'Chrome 59', 'Chrome 60', 'Chrome 61', 'Chrome 62', 'Safari 11'];
     if (allowed.indexOf(navigator.sayswho) >= 0){
         //browser is a permitted browser, show continue message and warn about full screen
         browserFlag = true;
         ctx.fillText("Your browser is: " + navigator.sayswho + ". This meets the minimum requirements for the task", canvas.width*0.5, canvas.height*0.075);
         ctx.fillText("In order to work, this task runs in full screen, you may be asked to allow the website permission to do this, please click yes.", canvas.width*0.5, canvas.height*0.125);
         ctx.fillStyle = 'green';
         ctx.fillText("You may click to continue", canvas.width*0.5, canvas.height*0.175);
     } else {
         //ask the user to download or upgrade their browser
         ctx.fillText("Your browser is: " + navigator.sayswho + ". This does not meet minimum requirements for the task", canvas.width*0.5, canvas.height*0.075);
         ctx.fillText("This task requires Firefox 52+, Chrome 56+ or Safari 11", canvas.width*0.5, canvas.height*0.125);
         ctx.fillText("Please download or update your browser and return", canvas.width*0.5, canvas.height*0.175);
     }
 }
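Enumerating every allowed version string is brittle, as each new Chrome or Firefox release needs adding to the list. An alternative is to parse the name and major version and compare against minimums; `meetsMinimum` is a hypothetical helper of my own:

```javascript
// Minimum major versions required for the task
var minVersions = { Firefox: 52, Chrome: 49, Safari: 11 };

// Parse "Chrome 62"-style strings (as produced by navigator.sayswho above)
// and compare against the minimums
function meetsMinimum(sayswho) {
    var parts = sayswho.split(' ');
    var name = parts[0];
    var version = parseInt(parts[1], 10);
    return minVersions.hasOwnProperty(name) && version >= minVersions[name];
}
```

This way, `meetsMinimum('Chrome 70')` remains true when new versions are released, while unknown browsers such as `'IE 11'` are still rejected.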

4. Full Screen API

This is a very powerful (and simple) tool, especially if you are looking to replicate the display of a screen in a lab setting.

One of the drawbacks of web-based behavioural data is that there is a visual environment outside of the browser. Pop-ups, emails, messages, other browser windows, and an endless stream of apps can all disrupt your data. This tool enables you to expand your task to fill the whole of the user's screen, reducing distractions to a minimum.

This method also has the added benefit of allowing you to present visual stimuli at the largest possible size – as if you were in a lab setting.

Here’s an example that works using the standard Fullscreen API. It detects, and uses, the correct prefix for the individual browser.

 function GoInFullscreen(element) {
      if(element.requestFullscreen) element.requestFullscreen();
      else if(element.mozRequestFullScreen) element.mozRequestFullScreen();
      else if(element.webkitRequestFullscreen) element.webkitRequestFullscreen();
      else if(element.msRequestFullscreen) element.msRequestFullscreen();
 }

Then when we want to release the user from fullscreen:

 function GoOutFullscreen() {
      if(document.exitFullscreen) document.exitFullscreen();
      else if(document.mozCancelFullScreen) document.mozCancelFullScreen();
      else if(document.webkitExitFullscreen) document.webkitExitFullscreen();
      else if(document.msExitFullscreen) document.msExitFullscreen();
 }

You might notice the wording of the API includes 'request': this means the page is requesting fullscreen mode. Browsers will often show a pop-up box that either asks the user whether they wish to allow fullscreen, or tells them how to exit. It is vital to explain this to our participants, so they do not accidentally block your task from using the API!
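The same API lets you detect when a participant leaves fullscreen mid-task: browsers fire a `fullscreenchange` event (with prefixed variants), and `document.fullscreenElement` is null once fullscreen has been exited. A sketch, where `makeExitCounter` is a hypothetical helper of my own:

```javascript
// Hypothetical tracker: counts how often the participant left fullscreen,
// so exits can be logged against trials and reported as a covariate
function makeExitCounter() {
    var exits = 0;
    return {
        onChange: function (isFullscreen) { if (!isFullscreen) { exits += 1; } },
        count: function () { return exits; }
    };
}

var exitCounter = makeExitCounter();

// Browser-only wiring: listen for the event under each vendor prefix
if (typeof document !== 'undefined' && document.addEventListener) {
    ['fullscreenchange', 'webkitfullscreenchange',
     'mozfullscreenchange', 'MSFullscreenChange'].forEach(function (evt) {
        document.addEventListener(evt, function () {
            // fullscreenElement is null once the user has exited
            var fsElement = document.fullscreenElement ||
                            document.webkitFullscreenElement ||
                            document.mozFullScreenElement ||
                            document.msFullscreenElement;
            exitCounter.onChange(!!fsElement);
        });
    });
}
```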


To maximise the probability of getting good data quality, you should try to use some of the above tools. Together, I find that they provide a smooth experience for the participant, which in itself leads them to become more engaged, and more reliable, when completing your experiment or survey.