Watson-Glaser: a critical friend

By Ian Florance, Independent Consultant

Every test user has a tool kit: a range of preferred instruments that can be adapted to the particular needs of most clients. New instruments get added, but less often than we sometimes think. Research I commissioned a while ago suggests we tend to stick with the tests we were originally trained on, even when those tests are unfit for purpose.

I started using the Watson-Glaser Critical Thinking Appraisal six years ago, and I now use it in any project involving roles ranging from junior clerical staff to senior management. I also recommend it to organisations that ring me for advice on assessment.

Why do I find it so useful? There are many reasons, but the most obvious is that it measures the core of what people in modern organisations do.

Think about how much we have to take in, digest, analyse and decide in any one day at work. At school we spent weeks struggling to learn French conversational phrases or the capitals of African states. A single working day, by contrast, demands that we understand, and make hundreds of decisions about, matters ranging from finance and product design to the relationships between members of a team.

And that's what Watson-Glaser addresses: our ability to

  • define a problem;
  • distinguish the information that is important to solving a problem;
  • recognise assumptions;
  • create and select hypotheses;
  • draw valid conclusions and check whether inferences are valid.

    Put simply, I'm using critical reasoning in researching this article, you're using it in reading it, and you'll use it again if you start thinking about whether to use this test. Goodwin Watson used it in 1925 and Edward Glaser in 1937 when they were developing the test. And we use it outside work to evaluate newspaper articles, our children's excuses for coming home late and the rationale behind a plumber's invoice. Critical reasoning is at the core of 21st-century human enterprise.


    There are two further points to make about the place of critical reasoning in organisations.

    It's getting more important

    The digital information flood makes it even more crucial that we ignore irrelevant information and stringently check the evidence we're presented with. The provenance of much information on the internet is unknown, and certain ‘facts’ and assumptions get reproduced and distorted until they become the business equivalent of urban myths. We need our critical faculties far more than we did in an information-starved age.

    It gets more important the more senior your role

    Leadership and senior management are not about knowing everything. They’re about using analytical and judgement skills to evaluate other people’s highly informed arguments in specific areas. Watson-Glaser measures leaders’ ability to do this.

    These are the obvious reasons why I use and recommend Watson-Glaser so much. In marketing terms, they're the test's benefits. But it has some good features as well.

    The report is excellent

    Twenty years ago automated test reports looked like insurance policies, and were about as comprehensible. Since then test publishers have spent millions creating reports that genuinely communicate with different audiences. Some personality test reports are as long, and as lavishly illustrated, as guides to major cities.

    Like most ability test reports, the Watson-Glaser one is short (only two pages) and to the point. But it’s well designed, its full-colour graphics are good and it provides a short, non-technical description of what the assessment measures.

    In other words, it gives me the information I need, and I can share it with clients and test takers without having to spend hours explaining the basics or correcting misunderstandings.

    It can be used with other tests

    In combination with a good personality test, the Watson-Glaser is ideal for leadership recruitment and development, as well as for succession planning, talent identification and development, and selection for information-rich jobs.

    Used with high-level numerical reasoning tests (such as RANRA), it can help identify technical managers, as I found recently when I had a short deadline to select a senior technical manager for a TV company.

    It’s technically good

    This should go without saying. I assume that Getfeedback preselect their tools and quality-assure them.

    Given the new ways of gathering and analysing data that IT advances have made possible, there’s no excuse for contemporary test publishers to produce tests that aren’t technically robust, and even less excuse for not laying their technical cards on the table in an exhaustive manual, so that I can see whether the test does what I want.

    Watson-Glaser’s age is, in many ways, an advantage. If publishers have revised and updated an older test regularly, that is a good reason for sticking with the tried and trusted, the tendency I mentioned at the beginning. The depth with which the concepts and scales have been investigated, and the amount of data collected, make an older test particularly rich.

    Years of practice in interpreting an established test also make our insights subtler and sharper.

    Having forty differentiated norm groups helps in setting cut-off scores, which are always an issue with ability tests.

    Good practice and process

    Getfeedback allow Watson-Glaser to be used only in proctored environments. This reflects a widely held view, though not everyone’s practice: that personality tests are easier to use unproctored than ability tests. This requirement highlights an increasingly important issue: process.

    For some years good test practice has been based on a combination of a well-trained user and an appropriate and robust test. With on-line administration, the emphasis in test user training has moved away from administration towards interpretation. Some of the process disciplines that informed paper-and-pencil testing have fallen by the wayside, risking less accurate measurement and challenges to decisions. Many professional and accrediting bodies are seeking to solve this problem by looking to quality-mark testing processes as a whole, rather than assessing training and test quality separately.

    The rush to let anyone take a test anywhere is understandable in terms of return on investment and absolute costs. It has made testing more available and accessible, but it has brought some problems in measurement. Getfeedback’s insistence that Watson-Glaser be used only under proctored conditions goes against this trend, but for extremely good reasons.


    Writing this has made me aware of a shift in how we evaluate tests. In the 1970s, when I started in the sector, a ‘good’ test was still defined in fairly technical terms relating to aspects of reliability and validity. To my mind, there was a tension between these seemingly absolute measures of quality and our understanding that tests are purposive: they are good for particular tasks and not for others.

    These technical quality requirements are still important in choosing the best test for the job, but as testing has become a core business function, new requirements have arisen and certain emphases have changed. So, in evaluating a test, I’d add a number of items to my informal checklist.

    • Will people taking the test see it as relevant to them and the job? People are more test-aware and critical than they were thirty years ago. Generation Y demand reasons and answers. They don’t do a test just because we say it’s OK.
    • Equally, does the report communicate with everyone it needs to without hours of preliminary talk? Can I show it, if necessary, to the purchasing manager and the test taker and be sure they’ll understand it? To do this, any report has to be well written, carefully designed and well organised.
    • Tests can’t be evaluated outside the process you use them in. So it’s now up to me to decide not just whether the test is good enough in itself but whether it will work with a hundred people on-line over thirty days, or with five people in a one-day intensive assessment centre.
    • On-line tests need good manuals just as much as printed ones do. Publishers shouldn’t fob us off with pretty graphics and a one-page PDF.
    • Answering the ‘So what?’ question. Some tests describe someone, then make their excuses and leave. I have to decide what to do about the results, and that depends on whether what’s measured, and how it’s reported, really taps into the job, the organisation and business practice.

    Watson-Glaser ticks these boxes.

    Mental ability testing is less ‘sexy’ than personality testing. It seems more ‘nuts and bolts’. In particular, most of us tend to use mental ability tests for more junior roles: the more responsible the job, the fewer mental ability tests we use.

    There are some good reasons for this: leadership success, in my experience, depends on behaviour, and you can get some idea of leaders’ intellectual ability from their track record. There are also bad reasons: leaders often look at you sadly or angrily if you suggest a mental ability test, because they feel they’re too senior to be ‘examined’.

    Watson-Glaser covers the full range of ‘thinking’ occupations and perhaps suggests a way forward in applying what we know about ability to more senior roles.

    In my experience, plenty of other people have tried to create an improved competitor to the test and, as far as I know, no one has succeeded. It’s a classic and, unusually, one that remains fully relevant.