Documentation of Usability Test
Usability Test Plan & Heuristic Evaluation
Part 1: Heuristic Evaluation
Methodology: Nielsen’s 10 Usability Heuristics
Scope: Initial MVP Prototype (Android / Jetpack Compose)
Evaluators: Team Snorly
1) Description of the Evaluation
We examined the interface against Jakob Nielsen's usability heuristics to identify usability problems before user testing. We focused on the critical paths: Setting an alarm, Sleep Tracking, and Puzzle Dismissal.
2) Results (Heuristic Violations & Severity)
Severity scale: 0 (No problem) to 4 (Usability catastrophe)
A. Visibility of System Status (Heuristic #1)
- Finding: When "Sleep Tracking" is active, there is no clear indication outside the app (e.g., notification bar) that the app is running.
- Impact: Users might think tracking stopped if they minimize the app.
- Severity: 3 (Major)
- Recommendation: Implement a persistent notification or a distinct “Recording” animation on the TopAppBar while tracking is active.
B. User Control and Freedom (Heuristic #3)
- Finding: While the "Puzzle Alarm" intends to restrict control, there is no emergency dismiss (e.g., long press for 10 seconds) in case of a bug or a sensitive real-world context.
- Impact: Users may force-close or uninstall the app if they cannot stop the alarm in an inappropriate setting.
- Severity: 2 (Minor / Strategic)
- Recommendation: Keep the puzzle and keep “Dismiss” hard, but offer a more accessible snooze and consider a deliberate emergency dismiss (e.g., a 10-second long press).
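The long-press emergency dismiss mentioned in the finding is mostly timing logic. The following is a minimal, platform-independent sketch in Python, not the app's implementation: the class `HoldToDismiss` and its methods are hypothetical names, timestamps are plain seconds supplied by the caller, and the 10-second default is taken from the example in the finding. In the app, the same logic would be driven by touch-down and touch-up events.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HoldToDismiss:
    """Tracks a single long press (hypothetical helper, for illustration only)."""
    required_seconds: float = 10.0  # hold duration from the finding's example
    _pressed_at: Optional[float] = field(default=None, init=False)

    def press(self, now: float) -> None:
        # Finger down: remember when the hold started.
        self._pressed_at = now

    def release(self) -> None:
        # Finger up: any partial hold is discarded.
        self._pressed_at = None

    def progress(self, now: float) -> float:
        # Fraction of the required hold completed, clamped to the range 0..1.
        if self._pressed_at is None:
            return 0.0
        held = max(0.0, now - self._pressed_at)
        return min(1.0, held / self.required_seconds)

    def should_dismiss(self, now: float) -> bool:
        # True only while the finger is still down and the full duration has elapsed.
        return self.progress(now) >= 1.0
```

Exposing `progress` separately would let the UI show how much of the hold remains, so the emergency path stays deliberate without being hidden.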
C. Error Prevention (Heuristic #5)
- Finding: In the “Set Alarm” dialog, the AM/PM toggle (if using 12h format) is small or too close to the “Save” button.
- Impact: Users might accidentally set an alarm for 7:00 PM instead of 7:00 AM.
- Severity: 4 (Catastrophe: fails the app's core value)
- Recommendation: Use a 24h clock by default or make AM/PM very distinct. Add a “Time until alarm” confirmation (e.g., “Alarm set for 8 hours from now”).
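The “Time until alarm” confirmation can be derived from the current time and the chosen alarm time. The sketch below shows one way to do this in Python; the function names `time_until_alarm` and `confirmation_text` and the message wording are assumptions for illustration, and the calculation uses naive local times, so daylight-saving changes are ignored.

```python
from datetime import datetime, timedelta


def time_until_alarm(now: datetime, hour: int, minute: int) -> timedelta:
    """Time from `now` to the next occurrence of hour:minute (24h clock)."""
    alarm = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if alarm <= now:
        # The time has already passed today, so the alarm rings tomorrow.
        alarm += timedelta(days=1)
    return alarm - now


def confirmation_text(now: datetime, hour: int, minute: int) -> str:
    """Build a confirmation message to show after the alarm is saved."""
    total_minutes = int(time_until_alarm(now, hour, minute).total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"Alarm set for {hours} h {minutes} min from now"
```

At 23:00, an alarm for 07:00 gives “Alarm set for 8 h 0 min from now”, while the mistaken 19:00 gives “Alarm set for 20 h 0 min from now”, which makes an AM/PM slip visible before the user goes to sleep.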
D. Recognition Rather Than Recall (Heuristic #6)
- Finding: Puzzle difficulty is labeled “Easy/Medium/Hard”, but there is no preview of what “Medium” looks like until the alarm rings.
- Impact: Users have to guess if they can solve it while groggy.
- Severity: 2 (Minor)
- Recommendation: Add a “Preview Puzzle” button on the Settings screen.
Part 2: User Test Plan (Lab/Field Test)
1) Hypotheses & Testable Questions
- H1: Users who dismiss the alarm using a puzzle will take longer to dismiss the alarm than users who dismiss with a standard swipe, but will report feeling more awake immediately afterward.
- H2: Users will rate the Math puzzle as more effective for waking up than the Memory puzzle, but will also rate it as more frustrating (lower satisfaction / higher perceived annoyance).
- Testable question: When users experience difficulty or frustration with the puzzle dismissal, do they attempt to bypass it (e.g., force-close, lower phone volume, or disable the feature) instead of completing the puzzle?
2) Planned Data to Collect (Variables)
Independent Variables
- Dismissal Type: Standard Swipe vs. Math Puzzle vs. Memory Puzzle
- Context: Testing during the day (simulated nap) vs. morning (real use)
Dependent Variables
- Time to Dismiss (Quantitative): Time in seconds from alarm start to successful dismissal
- Error Rate (Quantitative): Failed attempts or accidental “Snooze” taps
- User Satisfaction (Quantitative, self-reported): System Usability Scale (SUS) score, 0–100
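The SUS questionnaire yields one score per participant on a 0–100 scale. Under the standard SUS scoring rule, odd-numbered items contribute (answer − 1), even-numbered items contribute (5 − answer), and the sum is multiplied by 2.5. The sketch below applies that rule in Python; the function names `sus_score` and `mean_sus` are assumptions, and it assumes the form exports ten answers per participant on a 1–5 scale in questionnaire order.

```python
from statistics import mean
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Standard SUS score (0-100) from ten answers on a 1-5 scale."""
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")
    total = 0
    for index, r in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded: contribution = answer - 1.
            total += r - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded: contribution = 5 - answer.
            total += 5 - r
    return total * 2.5


def mean_sus(all_responses: Sequence[Sequence[int]]) -> float:
    """Mean SUS score across participants."""
    return mean(sus_score(r) for r in all_responses)
```

As an illustration with made-up answers (not study data), the most favourable response pattern `[5, 1, 5, 1, 5, 1, 5, 1, 5, 1]` scores 100.0 and a neutral `[3] * 10` scores 50.0.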
3) Methods & Protocol (Tasks & Materials)
Participants: 5-8 students (hallway testing / fellow students)
Materials:
- Smartphone with MVP app installed
- Observation sheet (spreadsheet)
- SUS questionnaire (Google Forms)
- Post-test questionnaire (Google Forms)
Task List (Script):
- Task 1: Setup (Onboarding)
- Prompt: “Open the app for the first time. Set up an alarm for 2 minutes from now. Choose a ‘Math’ puzzle as your wake-up requirement.”
- Observation focus: Can they find the settings?
- Task 2: The Sleep Simulation
- Prompt: “Start the sleep tracking feature and lock the phone. Wait for the alarm.”
- Observation focus: Do they know how to start sleep tracking? Does the app stay active?
- Task 3: The Wake Up
- Prompt: “When the alarm rings, turn it off completely.”
- Observation focus: Watch their fingers. Do they struggle with touch targets? Do they try to swipe it away out of habit?
- Task 4: The Sleep Checker
- Prompt: “Create a sleep entry and change its rating.”
- Observation focus: Is it intuitive that tracking sleep and then waking up produces an entry?
User Test Results
The charts below are generated from our collected data (demographics, questionnaire, and SUS). Note: the error rate was 0, so it is not reported.
Participants
Count: —
Gender: —
Age: —
Charts:
- Boxplot: SUS Score (0–100)
- App Structure Rating (1 = good, 5 = bad)
- Puzzle Difficulty Appropriate?
- Participant Ages
Quantitative Summary
| Participant | Gender | Age | App Structure (1 = good, 5 = bad) | Puzzle Difficulty Appropriate? | SUS Score (0–100) |
|---|---|---|---|---|---|
Task Completion Times (Time Rate)
Times were taken from the observation sheet.
| Participant | Task 1 (s) | Task 2 (s) | Task 3 (s) | Task 4 (s) | Notes |
|---|---|---|---|---|---|
Limitations:
- Small sample size (n=6)
- Mostly male participants
- Some tests were run on an emulator rather than on real devices
- No long-term real sleep usage was measured
Most Relevant Feedback
- Emulator problems (wrong time on the device)
- Phone didn't ring when the device was locked
- Difficulty finding the challenge
- The challenge restarted when the phone was flipped
Updates Made Based on Feedback
- Improved name clarity
- Improved UI and UX
- Changed state storage so the challenge no longer restarts when the phone is flipped
Conclusion
The usability test indicates very high overall usability (mean SUS ≈ 95), which is considered excellent. Participants were able to complete the core tasks with little difficulty. However, the poor ratings for app structure and multiple comments about navigation indicate that the information architecture needs improvement. The puzzle-based dismissal was perceived as effective, but some users were confused at first, which suggests a need for clearer onboarding and UI guidance.
Evaluation of Hypotheses
- H1: Partially supported. Puzzle dismissal increased interaction time, and participants reported feeling more awake, but wakefulness was not measured quantitatively.
- H2: Not fully tested, as only one puzzle type was evaluated in this iteration.
- Testable question: No participant force-closed the app, but two participants showed signs of confusion, indicating potential frustration risk.