WILSON CHEN

2FA Beyond the Basics: A Usability Study of Duo Mobile at the University of Washington

YEAR 2024
ORG University of Washington, HCDE · Duo Security / Cisco
ROLE Researcher: Study Design, Recruitment & Survey, Analysis & Reporting
TYPE Usability Study
DURATION 10 weeks (Winter 2024)
TEAM Wilson Chen, Bhoomika Bangalore Rajeeva, Candelaria Herrera, Anushka Kelkar, Gavriil Kochevrin
METHODS Screening survey (n=23), moderated usability testing (n=6), think-aloud protocol, affinity mapping, severity rating
STATUS Complete

The Key Insight

Duo Security sponsored our team to study how UW power users (people who had linked five or more personal accounts) navigated the app. We couldn't find them. A screening survey of 23 UW respondents revealed that 39.1% didn't know they could link third-party accounts to Duo at all. The feature existed; the awareness didn't.

When we pivoted the study to test whether regular users could complete the account-linking flow, four of six participants failed. Three interlocking design problems made the flow nearly impossible for first-time users: missing context, inaccurate instructions, and a trust gap that turned a security feature into a perceived threat. We documented them, rated their severity, and delivered actionable recommendations directly to Duo's product team at Cisco.

My Role

On a five-person team sponsored by Vanessa Chien Lai at Duo Mobile / Cisco, I contributed across research design, execution, and synthesis. The table below maps what I specifically owned versus what was shared.

Area My contribution
Survey design Co-designed the screening survey instrument, wrote question copy, and coordinated distribution through UW channels.
Study design Contributed to the usability study plan and study kit, including task definitions and moderator guide.
Testing Moderated usability test sessions with think-aloud protocol, audio recording, and post-task surveys.
Synthesis Co-led affinity mapping of qualitative data and applied the HCDE 517 severity rating scale to classify findings by frequency, impact, and persistence.
Reporting Contributed to the final study report and stakeholder presentation delivered to Duo's product team; Gavriil Kochevrin served as the team's point of contact.

Context

Duo Security handles over one billion monthly authentications. At the University of Washington, it's the mandatory gateway for logging into Canvas, email, and every other institutional system. Nearly every student, faculty member, and staff member uses it daily, but almost exclusively for one thing: approving the push notification when logging into a UW service.

Duo also supports a feature most users don't know exists: linking third-party personal accounts (Gmail, Instagram, Amazon) to use Duo as a general-purpose 2FA app. In HCDE 517: Usability Studies, our five-person team was sponsored by Duo Security to investigate how well UW users could discover and use that feature.

The research question Duo brought to us was focused on power users: people who had intentionally linked five or more personal accounts and were using Duo as a primary 2FA manager.

Original research question: How do power users interact with Duo Mobile's interface? How do they navigate when logging into several applications back to back, and how do they find the right passcode in a long list of connected services?

We couldn't find them. Survey recruitment across UW channels reached 23 respondents, and the data made clear why power users were rare: 39.1% of UW Duo users didn't know they could link third-party accounts at all. The feature's target user didn't exist in quantity because the feature wasn't discoverable.

Refined research question: Can regular UW users complete the third-party account-linking flow at all, and if not, what's stopping them?

This pivot became the study's first finding before testing even began. Designing for power users was premature when basic users couldn't complete the flow.

Method

The study ran in two phases over the Winter 2024 quarter, with findings delivered to Duo stakeholders at the end of Week 11.

Method Scope Format
Screening survey n = 23 Demographics, Duo usage patterns, feature awareness, and usability test recruitment; distributed via UW channels over 9 days (Feb 16–25)
Usability testing n = 6 Moderated in-person sessions with think-aloud protocol, audio recording, and post-task questionnaires; 3 tasks per participant (Feb 25 – Mar 1)

Phase 1: Screening survey

A Google Forms survey gathered demographic data, Duo usage patterns, and awareness of the third-party linking feature. The latter half invited interested respondents to sign up for a usability test session. We received 23 responses; 17 expressed interest in follow-up testing.

The survey doubled as a recruiting screener: participants needed to be UW-affiliated, actively using Duo in an academic context, 18 or older, and willing to use their mobile device during a session.

Phase 2: Moderated usability testing

We selected six participants from the 17 interested respondents and conducted moderated, in-person sessions in booked campus rooms, with audio recording (with consent). Each participant completed three tasks using a think-aloud protocol, followed after each task by a post-task questionnaire (7-point scales for difficulty, perceived duration, and likelihood of continued use).

For analysis, we organized the qualitative data through affinity mapping and applied the HCDE 517 severity rating scale to classify each finding. The scale uses three dimensions (frequency, impact, and persistence), each rated as moderate or serious.
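One plausible way to encode a rubric like this is to let the count of "serious" ratings drive the 1–4 severity score used in the findings table below. This mapping is an illustrative assumption for the sketch, not the course's published formula:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    """A usability finding rated on three dimensions, each either
    "moderate" or "serious" (per the HCDE 517-style rubric described above)."""
    name: str
    frequency: str
    impact: str
    persistence: str

    def severity(self) -> int:
        # Hypothetical mapping: 1 (cosmetic) through 4 (catastrophic),
        # driven by how many of the three dimensions are rated "serious".
        serious = [self.frequency, self.impact, self.persistence].count("serious")
        return 1 + serious

print(Finding("missing context", "serious", "serious", "serious").severity())  # 4
print(Finding("ambiguous button label", "moderate", "moderate", "moderate").severity())  # 1
```

Under this assumed mapping, a finding that is serious on all three dimensions lands at severity 4, matching the "Catastrophic" ratings reported for Task 3.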

Participants

Six participants: four students, one instructor, and one employee; four iOS users and two Android; four women and two men. All were active UW Duo users in educational contexts.

Task 3 post-task survey data showing average difficulty of 2.1/7, perceived time of 2.6/7, likelihood to continue of 1.6/7, and average time on task of 9.8 minutes
Task 3 post-task data: participants rated the task difficult, long, and unlikely to repeat. Average time on task: 9.8 minutes.

Findings

Tasks 1 and 2 surfaced almost no usability issues. All six participants completed both (a 100% success rate), and their mental models of the login flow (Task 1) matched their actual behavior (Task 2) exactly. Post-task ratings agreed: participants found both tasks easy and short and said they were likely to continue use. Aside from a single minor issue on Task 2, the standard authentication flow is well understood and well designed.

Task 3 was the opposite. The severity matrix tells the story:

Cosmetic (1) Minor (2) Major (3) Catastrophic (4) Success Rate
Task 1 0 0 0 0 100% (6/6)
Task 2 0 1 0 0 100% (6/6)
Task 3 0 0 0 3 33% (2/6)

Average time on task for Task 3: 9.8 minutes. Range: 6 to 18 minutes. Post-task ratings were the inverse of Tasks 1–2: participants found the task difficult, long, and were unlikely to continue use. All three issues identified were rated Severity 4 (Catastrophic).

Finding 1: Participants couldn't complete the flow

The third-party linking process requires navigating between Duo and a target app, using either a QR code or an activation code to complete the connection. The interface provided no explanation of what a QR code was for in this context, no indication of where to find the activation code in the target app, and no progress signal of how many steps remained.

Participants would reach an impasse: the Duo screen showed a QR code scanner with a vague instruction to "go back to [app name]," and they had no idea where to go or what to do next. Without any path forward, most stopped.

"The task felt impossible." (P6)

"It's impossible for me to complete the task and I would definitely not want to try again." (P4)

Recommendations: Deep-link from within Duo directly to the 2FA settings page of the target app, eliminating the need for users to navigate through unfamiliar menus. Add a clear progress indicator to the linking flow so users know how many steps remain and where they are.

Annotated Duo Mobile screens showing the account-linking flow with callouts for missing context, incomplete information, and ambiguous next steps
Finding 1 analysis: the "Add account" and activation code screens with usability issues annotated

Finding 2: Instructions were inaccurate and generic

Duo's in-app instructions read "Head back to Amazon" regardless of which app the user was linking. For participants trying to link Gmail or Instagram, this instruction was not just unhelpful but actively misleading. The instructional copy had not been updated to reflect how different apps surface their 2FA settings, and the generic fallback created confusion about what step they were even being asked to perform.

A secondary issue: the "Use QR Code" and "Use Activation Code" buttons appeared side by side with no explanation of when to use each, and were rendered in gray, a visual treatment that communicated "inactive" rather than "available." Participants consistently hesitated, unsure whether they'd already made a mistake.

"I am so lost! What is happening here?" (P6)

"It's just not enough information and when I try to add it, it's just bad design. Bad interaction." (P5)

"I wish it would give some steps on how to do that." (P1)

Recommendations: Replace generic copy with app-specific instructions, including direct links to the relevant settings section of each supported service. Replace the gray icon treatment with instructional imagery showing exactly where in the target app users should navigate, with visual arrows. Clarify when to use QR vs. activation code with a single unambiguous decision point.

Annotated Duo Mobile screen showing 'Head back to Amazon' instructions with callouts for incomplete instructions, gray icon confusion, and unclear button actions
Finding 2 analysis: generic "Head back to Amazon" instruction, gray icon treatment, and ambiguous QR/activation code buttons

Finding 3: Participants were unwilling to proceed due to mistrust

Three of six participants believed Duo was a University of Washington application, built and operated by UW for UW purposes. When asked to link a personal Gmail or Instagram account to what they understood as a UW-controlled system, they refused or hesitated significantly. They didn't understand why Duo needed access to their personal accounts, what Duo would be able to see, or whether this was safe.

This wasn't irrational. Duo's interface provided no onboarding, no explanation of what the app is outside of mandatory institutional authentication, and no visible privacy or security information at the point of account linking. The absence of trust-building content turned a feature into a threat.

"I don't understand the point of this app at all. Like, how does it actually add value to my data protection?" (P4)

"I don't know enough about Duo outside of UW and how they treat my data." (P5)

"There are just too many things that are put on my decision-making without enough information." (P5)

Recommendations: Create a short onboarding sequence for new users explaining what Duo is, why it's trustworthy, and what it can and cannot access. Add inline "info" buttons at each decision point that explain what an action does before the user commits to it. Include a contextual reminder when users initiate account-linking that explains how the feature keeps their accounts safe.

Annotated Duo Mobile home screen showing lack of context about what the app is, missing help buttons, and unclear purpose of account linking
Finding 3 analysis: Duo's home screen provided no explanation of the app's identity or purpose beyond UW authentication
Affinity mapping board showing qualitative data organized into clusters from usability testing and synthesis
Affinity mapping: qualitative data from usability testing organized into finding clusters

Additional Findings

Several secondary observations from testing added nuance to the three main findings.

Outcome

The study results were delivered to Duo Security / Cisco stakeholders at the end of Winter 2024 in the form of a final report and a stakeholder presentation. The project was a course deliverable for HCDE 517: Usability Studies. Recommendations were delivered to the sponsor, but implementation decisions rested with Duo's product team.

What the research produced was a reframing of Duo's challenge: the original goal of studying power users revealed a fundamental feature awareness problem. The feature's target user doesn't exist in quantity because the feature isn't discoverable. Designing for power users is premature when basic users can't complete the flow.

The sponsor also identified areas for future research based on our findings: how users deal with authentication when they don't have their phone, what happens during phone switching or merging, and whether power users (once found) distinguish between Duo-endorsed apps and third-party additions.

Limitations

Our sample sizes (n=23 survey, n=6 usability tests) were sufficient for identifying usability problems (Jakob Nielsen's research suggests 5 participants uncover approximately 85% of usability issues), but they limit the generalizability of any quantitative claims. Survey respondents were self-selected through UW channels, which may skew the sample toward more engaged students.
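Nielsen's figure comes from the issue-discovery model P(n) = 1 − (1 − λ)^n, where λ is the probability a single participant surfaces a given issue; λ ≈ 0.31 is the value commonly cited from Nielsen and Landauer's work. A quick sketch, assuming that λ:

```python
def discovery_rate(n: int, lam: float = 0.31) -> float:
    """Expected proportion of usability issues found by n participants,
    using the 1 - (1 - lambda)^n model (lambda ~= 0.31 is the
    per-participant detection rate commonly cited from Nielsen & Landauer)."""
    return 1 - (1 - lam) ** n

for n in (1, 3, 5, 6):
    print(f"{n} participants -> {discovery_rate(n):.0%}")
```

With n = 5 this gives about 84%, consistent with the ~85% heuristic; our n = 6 pushes it to roughly 89% under the same assumption.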

Sessions were moderated solo by different team members rather than in pairs, so moderators could not cross-check interpretations in real time. Earlier pilot testing would also have helped: some ambiguities in the task wording became apparent only once participants encountered them.

A practical limitation surfaced during testing: because all participants were reluctant to link personal accounts to what they perceived as a UW app, the sponsor suggested creating a "burner" Duo account for participants to use instead. This didn't resolve the issue. Participants were then reluctant to add personal information to an account that wasn't theirs. The workaround itself became evidence of the trust problem we documented in Finding 3.

Reflection

The study's most interesting moment was the pivot. We designed a study for one user group and discovered mid-recruitment that the group barely existed, and that its absence was itself a finding. Reframing the study around first-time use rather than expert use was the right call, and it gave the sponsor more actionable data than a power-user study would have.

Delivering evidence-based recommendations with real user data to an active product team at Cisco was the kind of outcome that makes research feel grounded in something that matters.

Appendix

Project artifacts from the HCDE 517 course deliverables:

Artifact Description
Final Study Report Executive summary, methods, findings with severity ratings, recommendations, and reflection
Stakeholder Presentation Slide deck delivered to Duo Security / Cisco with findings, severity matrix, and recommendations
Usability Study Plan Research questions, participant profiles, task list, data collection plan, and project timeline
Usability Study Kit Recruitment survey, consent form, moderator guide, post-task questionnaire, and data logging plan
Preliminary Proposal Initial heuristic evaluation and key usability issues identified before the study