Usability - Evaluation Testing and Summary
Understand the spectrum of usability evaluation methods, how to apply them throughout development, and the key metrics for measuring usability performance.
Summary
Evaluation Methods for Usability
Introduction
Usability evaluation is a systematic approach to measuring how well users can interact with a design. Rather than relying on intuition or assumptions, evaluators use structured methods to understand whether products are easy to learn, efficient to use, and satisfying for users. This article covers the major families of evaluation methods, each of which plays a different role in the design lifecycle. Some methods predict usability through cognitive models before any testing occurs; others involve expert inspection; and still others require actual users to perform tasks while researchers observe.
Understanding these methods helps you choose the right evaluation approach for your design phase and goals.
Cognitive Modeling Methods
Cognitive modeling creates computational models based on psychological principles to estimate how long users will need to perform a task. Rather than asking real users to perform a task, these methods let designers predict performance by simulating human cognition. This is particularly valuable early in design when a working prototype doesn't yet exist.
GOMS (Goals, Operators, Methods, Selection Rules)
GOMS is a cognitive model that breaks down user interaction into four hierarchical components:
Goals are what the user wants to accomplish (e.g., "save a document")
Operators are the basic physical or cognitive actions a user performs (e.g., "press key," "move mouse," "retrieve from memory")
Methods are sequences of operators that achieve a goal (e.g., "type filename, then press Enter")
Selection Rules determine which method to use when multiple options exist
By mapping out a task using GOMS notation and assigning realistic time estimates to each operator, you can predict how long an experienced user will take to complete a task. GOMS works best for routine, repetitive tasks where users follow consistent procedures.
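To make the calculation concrete, here is a minimal sketch of this kind of prediction using the Keystroke-Level Model, a simplified GOMS variant. The operator times are commonly cited approximations and the "save a document" method below is a hypothetical example, not one taken from this course.

```python
# Minimal Keystroke-Level Model (KLM) sketch: estimate an expert's task time
# by summing per-operator time estimates. The values below are commonly cited
# approximations, not measurements from this course.
OPERATOR_TIMES = {
    "K": 0.28,  # press a key or button (average typist)
    "P": 1.10,  # point at a target with the mouse
    "H": 0.40,  # home hands between keyboard and mouse
    "M": 1.35,  # mental preparation before an action
}

def predict_task_time(operators):
    """Return the predicted completion time (in seconds) for a sequence of operators."""
    return sum(OPERATOR_TIMES[op] for op in operators)

# Hypothetical "save a document" method: mentally prepare, point to the File
# menu and click, point to "Save As" and click, type a 10-character filename,
# then press Enter.
save_document = ["M", "P", "K", "P", "K"] + ["K"] * 10 + ["K"]
print(f"Predicted expert time: {predict_task_time(save_document):.2f} s")
```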
Why use GOMS? It's quick, requires no user testing, and provides quantitative predictions. However, it doesn't capture learning curves for new users or unexpected difficulties.
Human Processor Model
The Human Processor Model takes a broader view, modeling how human cognition works during interaction. It represents the human mind as an information processor with distinct subsystems: perception, cognition, and motor action.
The model works like this: information enters through the perceptual subsystem (senses → perceptual processor → visual and auditory image storage). The information then flows to the cognitive subsystem (working memory, long-term memory, and cognitive processor), which interprets meaning and decides on actions. Finally, the motor subsystem executes movement responses.
Each subsystem has measurable parameters:
Cycle times: How long each processor takes to complete one cycle (roughly 100 ms per cognitive cycle)
Decay times: How long information persists before fading (visual: 500 ms; auditory: 1500 ms)
Capacities: How much information can be held (working memory: 7 items)
By understanding these parameters, designers predict bottlenecks. For example, if an interface requires users to hold more than 7 pieces of information in working memory, performance will degrade. Similarly, if important visual feedback disappears in under 500 ms, users may miss it.
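As a rough illustration, the sketch below applies the parameters listed above as simple thresholds to flag potential bottlenecks in a single interaction step; the step itself is hypothetical, and a real analysis would use parameter ranges rather than single values.

```python
# Sketch: use the Human Processor Model parameters above as simple thresholds
# to flag likely bottlenecks in one interaction step. The step is hypothetical.
WORKING_MEMORY_ITEMS = 7   # items users can hold in working memory
VISUAL_DECAY_MS = 500      # how long visual information persists before fading

def check_step(items_to_remember, feedback_duration_ms):
    """Return a list of predicted problems for a single interaction step."""
    problems = []
    if items_to_remember > WORKING_MEMORY_ITEMS:
        problems.append(f"memory overload: {items_to_remember} items > {WORKING_MEMORY_ITEMS}")
    if feedback_duration_ms < VISUAL_DECAY_MS:
        problems.append(f"feedback too brief: {feedback_duration_ms} ms < {VISUAL_DECAY_MS} ms")
    return problems

# A step that asks users to remember a 9-digit code while showing a 300 ms toast:
print(check_step(items_to_remember=9, feedback_duration_ms=300))
```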
Why use the Human Processor Model? It reveals fundamental cognitive limits and helps predict where design will cause problems. Like GOMS, it doesn't require user testing, but it requires careful mapping of the task flow.
Inspection Methods
Inspection methods rely on expert judgment rather than user testing. An expert evaluates the interface against established criteria, looking for potential usability problems.
Heuristic Evaluation
Heuristic evaluation is a structured expert review where a small group (typically 3–5) of usability professionals independently examine an interface against a set of usability heuristics—principles that represent good design practice.
The most common framework is Nielsen's Ten Usability Heuristics:
System visibility and feedback (users should know what's happening)
Match between system and real world (use familiar language)
User control and freedom (support undo/redo)
System consistency and standards (follow conventions)
Error prevention and recovery (prevent problems; help users recover)
Recognition vs. recall (minimize memory load)
Flexibility and efficiency (offer shortcuts for experts)
Aesthetic and minimalist design (remove distractions)
Help and documentation (provide clear, task-focused support)
Error messages (plain language, suggest solutions)
Each expert evaluates the interface independently, then results are compiled. This approach is fast, inexpensive, and requires no real users—but it identifies problems experts think users will encounter, which may differ from actual user behavior.
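A minimal sketch of the compilation step might look like the following; the findings and severity ratings are invented for illustration, and real studies often use Nielsen's 0–4 severity scale as shown here.

```python
# Sketch: compile independent heuristic-evaluation findings.
# Each evaluator reports (problem, heuristic, severity 0-4); data is illustrative.
from collections import defaultdict

findings = {
    "evaluator_1": [("No undo on delete", "User control and freedom", 4),
                    ("Jargon in error dialog", "Match with real world", 2)],
    "evaluator_2": [("No undo on delete", "User control and freedom", 3)],
    "evaluator_3": [("Jargon in error dialog", "Match with real world", 3),
                    ("Inconsistent button labels", "Consistency and standards", 2)],
}

compiled = defaultdict(list)
for evaluator, problems in findings.items():
    for description, heuristic, severity in problems:
        compiled[(description, heuristic)].append(severity)

# Rank problems by how many evaluators found them, then by mean severity.
for (description, heuristic), severities in sorted(
        compiled.items(), key=lambda kv: (-len(kv[1]), -sum(kv[1]) / len(kv[1]))):
    print(f"{description} [{heuristic}]: found by {len(severities)} evaluator(s), "
          f"mean severity {sum(severities) / len(severities):.1f}")
```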
When to use heuristic evaluation: Early in design, to quickly catch obvious problems before user testing.
Prototyping and Early-Stage Methods
Rapid Prototyping
Rapid prototyping involves quickly creating low-fidelity models—such as paper sketches, wireframes, or clickable mockups—to test design concepts without building the complete system. A low-fidelity prototype might be as simple as screenshots arranged to show task flow, or paper cutouts representing interface elements that a facilitator moves as users give commands.
Why rapid prototyping? It's cheap to create and destroy, allowing designers to test multiple directions quickly. Even rough prototypes reveal whether core concepts make sense to users. Fidelity can increase as the design matures and confidence grows.
Structured Evaluation Methods
Cognitive Walkthrough
A cognitive walkthrough is an inspection-like method, but instead of checking against heuristics, evaluators walk through a prototype or product as if they were a user learning the system for the first time. The focus is specifically on the user's decision-making process at each step.
For each action a user must take, the walkthrough team asks:
Does the user know what goal they're trying to achieve?
Can they find the right action or control?
Will they recognize that the action they found is correct?
If they perform the action, will they understand the feedback?
The cognitive walkthrough simulates the user's thought process, uncovering where confusion or wrong assumptions arise. It's particularly effective for evaluating whether new or infrequent users can learn a system by exploration.
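One simple way to record a walkthrough is sketched below: each step stores yes/no answers to the four questions, and any "no" is flagged as a learning problem. The task and answers are invented for illustration.

```python
# Sketch: record answers to the four walkthrough questions for each step and
# flag steps where any answer is "no". Task and answers are hypothetical.
QUESTIONS = [
    "Knows the goal?",
    "Can find the right control?",
    "Recognizes the control as correct?",
    "Understands the feedback?",
]

walkthrough = [
    ("Open the sharing dialog", [True, False, True, True]),
    ("Enter the recipient's email", [True, True, True, True]),
    ("Confirm and send", [True, True, True, False]),
]

for step, answers in walkthrough:
    failures = [q for q, ok in zip(QUESTIONS, answers) if not ok]
    status = "OK" if not failures else "PROBLEM: " + "; ".join(failures)
    print(f"{step}: {status}")
```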
When to use: When learning ease is a priority, especially for self-service systems where users won't get training.
Benchmarking
Benchmarking uses standardized test materials and rigorous experimental protocols to measure core usability metrics:
Task completion time
Error-fixing time
Learning time (time to reach proficiency)
System functionality coverage
A benchmark study establishes baseline measurements for current designs. New designs are then tested against the same tasks under the same conditions to determine relative improvement. Benchmarking follows the rigor of psychology laboratory experiments—controlled variables, consistent task definitions, and statistical analysis.
Why benchmarking? It provides quantitative, comparable results. You can prove that a redesign is actually better, not just different. However, benchmarking is expensive and time-consuming, so it's typically used for high-stakes products or when iterative improvements must be validated.
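For instance, a benchmark comparison of task completion times might be analyzed as in the sketch below (assuming SciPy is available); the timing data is purely illustrative.

```python
# Sketch: compare task completion times from a baseline design and a redesign,
# as a benchmark study might. The timing data below is illustrative only.
from statistics import mean
from scipy import stats  # assumes SciPy is installed

baseline_times = [48.2, 55.1, 61.0, 44.7, 52.3, 58.9, 49.5, 63.2]  # seconds
redesign_times = [39.8, 42.5, 47.1, 36.0, 44.9, 41.2, 38.7, 45.5]

t_stat, p_value = stats.ttest_ind(baseline_times, redesign_times, equal_var=False)
print(f"Baseline mean: {mean(baseline_times):.1f} s, redesign mean: {mean(redesign_times):.1f} s")
print(f"Welch's t = {t_stat:.2f}, p = {p_value:.3f}")
```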
Persona Development
Personas are fictional characters representing distinct user types. They are built from real data—demographics, psychographics (attitudes, values, motivations), and technographics (technical skill, device preferences)—and are created early in the design process.
Where does persona data come from? Online surveys, web analytics, customer feedback forms, usability tests, and interviews with support staff.
Personas serve two purposes: they foster empathy (designers think "Will Sarah, a busy parent with low tech confidence, understand this?") and guide design decisions (features prioritized for one persona may differ from another). A product typically has 2–5 key personas, each representing a distinct user segment.
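A persona can be kept as a structured record combining the three data types described above; the fields and the example persona below are hypothetical.

```python
# Sketch: a persona as a structured record built from demographic,
# psychographic, and technographic data. Fields and values are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Persona:
    name: str
    demographics: dict = field(default_factory=dict)    # age, occupation, location
    psychographics: dict = field(default_factory=dict)  # attitudes, motivations
    technographics: dict = field(default_factory=dict)  # skill level, devices

sarah = Persona(
    name="Sarah, the busy parent",
    demographics={"age": 38, "occupation": "teacher"},
    psychographics={"motivation": "finish tasks quickly", "tech_confidence": "low"},
    technographics={"primary_device": "phone", "skill": "basic"},
)
print(sarah.name, "-", sarah.technographics["primary_device"])
```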
When to use personas: Early in design, before detailed decisions are made, to keep the team focused on actual user needs rather than hypothetical "power users."
Testing Methods and Metrics
Usability Testing
Usability testing brings typical users into a controlled or realistic environment where they perform representative tasks while observers record their behavior, emotions, and difficulties. Unlike inspection methods that rely on expert judgment, usability testing reveals how actual users interact with a design.
A usability test typically involves:
Recruiting participants who match your target user profile
Defining representative tasks (realistic scenarios, not artificial exercises)
Observing performance as users attempt tasks
Recording data on their success, difficulty, and feedback
Analyzing results to identify patterns and problems
Usability testing can be quantitative (measuring performance metrics) or qualitative (exploring how and why users struggle).
Key Metrics from Usability Tests
When running a usability test, you can measure:
Task completion rate: The percentage of users who successfully finish a task (e.g., "75% of users completed checkout")
Task completion time: How long it takes users to finish a task (useful for efficiency; compare against benchmarks to show improvement)
Error rate: The number of errors users make, and their severity (minor: wrong button clicked but recovered; critical: task failure)
Success-to-failure ratio: The ratio of successful to failed task attempts across all users
Time spent on errors: How long users spend trying to correct mistakes (reveals where the interface confused them)
Satisfaction rating: User-provided ratings (e.g., a 1–5 scale: "How easy was this task?")
Frustration indicators: Observable signs—sighs, comments ("This is confusing"), body language—that suggest emotional difficulty
Which metrics matter most? It depends on your goals. If you're redesigning a checkout process, task completion time and errors matter most. If you're evaluating learning software, satisfaction and error recovery time matter.
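A minimal sketch of how these metrics are computed from per-participant session records is shown below; the session data is illustrative only.

```python
# Sketch: compute common usability-test metrics from per-participant records.
# Each record notes success, time on task, error count, and a 1-5 satisfaction
# rating; the data is illustrative.
sessions = [
    {"success": True,  "time_s": 95,  "errors": 1, "satisfaction": 4},
    {"success": True,  "time_s": 120, "errors": 0, "satisfaction": 5},
    {"success": False, "time_s": 240, "errors": 4, "satisfaction": 2},
    {"success": True,  "time_s": 150, "errors": 2, "satisfaction": 3},
]

n = len(sessions)
completion_rate = sum(s["success"] for s in sessions) / n
mean_time = sum(s["time_s"] for s in sessions) / n
error_rate = sum(s["errors"] for s in sessions) / n
mean_satisfaction = sum(s["satisfaction"] for s in sessions) / n

print(f"Completion rate: {completion_rate:.0%}")
print(f"Mean time on task: {mean_time:.0f} s")
print(f"Errors per participant: {error_rate:.1f}")
print(f"Mean satisfaction: {mean_satisfaction:.1f} / 5")
```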
Remote Usability Testing
Remote usability testing is conducted online, typically asynchronously, allowing larger sample sizes and geographically dispersed participants. Participants perform tasks in their own environment using their own devices, providing naturalistic data.
Quantitative remote testing uses task-based surveys where participants report whether they completed tasks and how long it took. This scales well—hundreds of participants can be tested cheaply—but provides limited detail on why they failed.
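As a sketch, the self-reported results of such a survey can be summarized as below; the response counts are illustrative, and a simple normal-approximation confidence interval is used for brevity.

```python
# Sketch: summarize task-based survey responses from an unmoderated remote study.
# Response counts are illustrative; a normal-approximation 95% CI is used.
from math import sqrt

completed, total = 312, 400  # participants who reported completing the task
p = completed / total
margin = 1.96 * sqrt(p * (1 - p) / total)
print(f"Reported completion rate: {p:.1%} (95% CI: {p - margin:.1%} to {p + margin:.1%})")
```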
Qualitative remote testing uses screen recordings and think-aloud commentary (where participants narrate their thoughts as they work). This captures rich detail about user decision-making and confusion points, but requires smaller sample sizes due to time needed for analysis.
When to use remote testing: When you need scale, geographic diversity, or naturalistic conditions. The downside is less control over the environment and less opportunity to probe deeper.
Meta-Analysis
Meta-analysis uses statistical techniques to combine results from multiple usability studies into overall quantitative findings. Rather than relying on a single study (which may have quirks or limitations), meta-analysis identifies patterns across studies to draw robust conclusions.
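One common approach is inverse-variance (fixed-effect) pooling, sketched below; the per-study effect estimates and standard errors are illustrative values, not real study results.

```python
# Sketch: fixed-effect (inverse-variance) pooling of an effect size reported by
# several studies, e.g. the mean reduction in task time. Values are illustrative.
from math import sqrt

# (effect estimate, standard error) per study
studies = [(12.0, 4.0), (8.5, 3.0), (15.2, 6.0), (10.1, 2.5)]

weights = [1 / se**2 for _, se in studies]
pooled = sum(w * effect for (effect, _), w in zip(studies, weights)) / sum(weights)
pooled_se = sqrt(1 / sum(weights))
print(f"Pooled effect: {pooled:.1f} ± {1.96 * pooled_se:.1f} (95% CI)")
```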
International Standards
Several international standards guide method selection and evaluation:
ISO/TR 16982 provides guidance on choosing appropriate usability evaluation methods
ISO 9241 covers ergonomics of human-computer interaction, including usability principles
IEC 62366 applies usability engineering to medical devices
These standards help organizations ensure their evaluation approach is rigorous and defensible, particularly important in regulated industries like healthcare.
Summary: Choosing the Right Method
The usability evaluation toolkit contains many methods, each suited to different questions and development stages:
Early design (no prototype yet): Use cognitive modeling (GOMS, Human Processor Model) to predict task times, or rapid prototyping + cognitive walkthrough to test concepts.
Design phase (prototype available): Use heuristic evaluation for quick expert feedback, or small-scale usability testing to validate assumptions.
Late design or released product: Use full usability testing with appropriate metrics, benchmarking to demonstrate improvement, or remote testing for scale.
Throughout design: Use personas to keep user needs visible, and iterate based on findings.
Effective usability design requires early user focus, empirical measurement, and iterative refinement—and the methods discussed here are the tools that make this possible.
Flashcards
Which four components does GOMS use to analyze user interaction?
Goals
Operators
Methods
Selection rules
Which three metrics does the Human Processor Model predict to model human information processing?
Cycle times
Decay times
Capacities
Who performs a Heuristic Evaluation to assess a user interface?
Small groups of experts.
What is the main benefit of using Prototyping Methods in the design process?
They allow early testing of design concepts without building the complete system.
What level of fidelity is typically used in Rapid Prototyping models, such as paper prototypes?
Low-fidelity.
What specific user group does a Cognitive Walkthrough focus on when assessing ease of learning?
New or infrequent users.
What specific process does a Cognitive Walkthrough focus on at each interaction step?
The user's decision-making process.
Which four standardized metrics are measured during Benchmarking?
Core task completion time
Error-fixing time
Learning time
System functionality
How does a Meta-Analysis produce overall quantitative findings in usability?
By statistically combining results from multiple usability studies.
What three types of data are used to build Personas?
Demographic
Psychographic
Technographic
Why are Personas created early in the design process?
To foster empathy and guide design decisions.
What is the core activity of typical users during Usability Testing?
Performing tasks in realistic or simulated environments while being observed.
What does the task completion rate metric represent?
The percentage of users who finish a task.
What factors are measured by the error rate metric?
The number and severity of errors made.
What does a satisfaction rating measure?
How pleasant the experience was according to the user.
According to general takeaways, what three elements are required for designing for usability?
Early user focus
Empirical measurement
Iterative refinement
Quiz
Usability - Evaluation Testing and Summary Quiz Question 1: What usability aspect does a cognitive walkthrough primarily assess?
- Ease of learning for new or infrequent users (correct)
- Overall system performance speed
- Consistency of visual design across screens
- Long‑term user satisfaction after extended use
Usability - Evaluation Testing and Summary Quiz Question 2: Which components are analyzed in the GOMS model?
- Goals, operators, methods, and selection rules (correct)
- Users, hardware, network latency, and visual layout
- Color schemes, typography, whitespace, and icons
- Budget, schedule, resources, and risk
Usability - Evaluation Testing and Summary Quiz Question 3: In heuristic evaluation, who typically performs the assessment?
- Small groups of usability experts (correct)
- End users recruited from the market
- Software developers only
- Marketing team members
Usability - Evaluation Testing and Summary Quiz Question 4: What is the primary benefit of using prototypes in usability design?
- Early testing of concepts before full system development (correct)
- Final production deployment without further changes
- Performance benchmarking of hardware components
- Security testing of network protocols
Usability - Evaluation Testing and Summary Quiz Question 5: At what stage are personas typically developed to maximize their usefulness?
- Early in the design process (correct)
- After the product has been launched
- During final post‑release testing
- During the marketing campaign rollout
Usability - Evaluation Testing and Summary Quiz Question 6: Which of the following is a common data source for developing personas?
- Online surveys (correct)
- Hardware specifications sheets
- Server uptime logs
- Corporate financial statements
Usability - Evaluation Testing and Summary Quiz Question 7: Who typically performs the tasks during usability testing?
- Typical end users (correct)
- System administrators
- Software developers
- Automated testing scripts
Usability - Evaluation Testing and Summary Quiz Question 8: Which three practices are essential for designing usable systems?
- Early user focus, empirical measurement, iterative refinement (correct)
- Late‑stage testing, speculative design, fixed requirements
- One‑time release, no user feedback, static documentation
- Heavy marketing focus, price optimization, brand alignment
Usability - Evaluation Testing and Summary Quiz Question 9: Which categories of methods collectively support usability assessment across development stages?
- Cognitive models, inspections, inquiries, prototypes, and testing (correct)
- Marketing, sales, finance, and legal compliance
- Hardware design, network engineering, database optimization, and security audits
- Branding, advertising, public relations, and customer service
Usability - Evaluation Testing and Summary Quiz Question 10: How do benchmark usability studies typically conduct their experiments?
- They follow rigorous experimental protocols similar to psychology laboratory experiments (correct)
- They rely on informal observations in real‑world settings without control groups
- They use only anecdotal feedback from a small number of expert users
- They evaluate designs solely through automated performance metrics
Usability - Evaluation Testing and Summary Quiz Question 11: What does task completion time measure in a usability test?
- The time taken by a user to finish a specific task (correct)
- The percentage of users who successfully finish the task
- The number of errors a user makes while performing the task
- The overall satisfaction rating the user gives after the task
Usability - Evaluation Testing and Summary Quiz Question 12: What statistical approach combines findings from several usability studies to yield an overall quantitative result?
- Meta‑analysis (correct)
- Single‑case study
- Heuristic inspection
- Think‑aloud protocol
Usability - Evaluation Testing and Summary Quiz Question 13: What characteristic best describes low‑fidelity prototypes created during rapid prototyping?
- Simple, inexpensive, and quickly produced (correct)
- Fully interactive with high visual detail
- Developed using functional codebases
- Rendered in immersive virtual reality environments
Usability - Evaluation Testing and Summary Quiz Question 14: In quantitative remote usability testing, which tool is typically used to collect validation data?
- Task‑based surveys (correct)
- Screen‑recordings with think‑aloud commentary
- In‑person observation sessions
- Eye‑tracking hardware in participants’ homes
Key Concepts
Usability Evaluation Methods
Heuristic Evaluation
Cognitive Walkthrough
Usability Testing
Remote Usability Testing
Benchmarking (usability)
Cognitive Models and Frameworks
GOMS
Human Processor Model
Meta‑analysis
User-Centered Design
Persona (user experience)
ISO 9241
Definitions
GOMS
A cognitive modeling technique that breaks down user tasks into Goals, Operators, Methods, and Selection Rules to predict performance.
Human Processor Model
A theoretical framework that models human information processing cycles to estimate task execution times.
Heuristic Evaluation
An inspection method where usability experts assess an interface against established design heuristics.
Cognitive Walkthrough
An evaluative approach where reviewers simulate a new user’s problem‑solving steps to identify learning difficulties.
Usability Testing
A method in which representative users perform tasks while observers record performance metrics and subjective feedback.
Remote Usability Testing
Conducting usability studies online, allowing participants to complete tasks in their natural environment.
Benchmarking (usability)
Creating standardized test scenarios to compare the performance of different designs against a reference baseline.
Meta‑analysis
A statistical technique that aggregates results from multiple usability studies to derive overall conclusions.
Persona (user experience)
Fictional archetypes representing target user groups, built from demographic and behavioral data to guide design.
ISO 9241
An international standard series that defines ergonomic requirements for the design of interactive systems.