Snorly – Usability

Documentation of Usability Test

Usability Test Plan & Heuristic Evaluation

Part 1: Heuristic Evaluation

Methodology: Nielsen’s 10 Usability Heuristics

Scope: Initial MVP Prototype (Android / Jetpack Compose)

Evaluators: Team Snorly

1) Description of the Evaluation

We examined the interface against Jakob Nielsen's 10 usability heuristics to identify usability problems before user testing. We focused on the critical paths: setting an alarm, sleep tracking, and puzzle dismissal.

2) Results (Heuristic Violations & Severity)

Severity scale (Nielsen): 0 (no problem), 1 (cosmetic), 2 (minor), 3 (major), 4 (usability catastrophe)

A. Visibility of System Status (Heuristic #1)

Finding: While "Sleep Tracking" is active, nothing outside the app (e.g., a notification in the notification bar) clearly indicates that tracking is still running.
Impact: Users might think tracking has stopped if they minimize the app.
Severity: 3 (Major)
Recommendation: Implement a persistent notification or a distinct “Recording” animation on the TopAppBar while tracking is active.

B. User Control and Freedom (Heuristic #3)

Finding: The "Puzzle Alarm" restricts control by design, but there is no emergency dismiss (e.g., a 10-second long press) for cases such as a bug or a sensitive real-world setting.
Impact: Users may force-close or uninstall the app if they cannot stop the alarm in an inappropriate setting.
Severity: 2 (Minor / Strategic)
Recommendation: Keep the puzzle, but consider making snooze easier to reach while keeping “Dismiss” hard.

C. Error Prevention (Heuristic #5)

Finding: In the “Set Alarm” dialog, the AM/PM toggle (if using 12h format) is small or too close to the “Save” button.
Impact: Users might accidentally set an alarm for 7:00 PM instead of 7:00 AM.
Severity: 4 (Catastrophe: the alarm fails its core purpose)
Recommendation: Use a 24h clock by default or make AM/PM very distinct. Add a “Time until alarm” confirmation (e.g., “Alarm set for 8 hours from now”).
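The “Time until alarm” confirmation comes down to finding the next occurrence of the chosen clock time and subtracting the current time. The following is a minimal sketch of that logic in Python, independent of the app's actual code. The function names `next_alarm` and `confirmation_message` are hypothetical, and the sketch assumes naive local times and ignores daylight-saving changes.

```python
from datetime import datetime, time, timedelta


def next_alarm(now: datetime, alarm_time: time) -> datetime:
    """Return the next occurrence of alarm_time strictly after now."""
    candidate = datetime.combine(now.date(), alarm_time)
    if candidate <= now:
        # The time has already passed today, so the alarm rings tomorrow.
        candidate += timedelta(days=1)
    return candidate


def confirmation_message(now: datetime, alarm_time: time) -> str:
    """Build the confirmation text shown after the user taps Save."""
    remaining = next_alarm(now, alarm_time) - now
    total_minutes = int(remaining.total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"Alarm set for {hours} h {minutes} min from now"
```

With this kind of message, an alarm saved at 23:00 for 7:00 PM instead of 7:00 AM would read as 20 hours away rather than 8, which makes the AM/PM slip visible before the user goes to sleep.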

D. Recognition Rather Than Recall (Heuristic #6)

Finding: Puzzle difficulty is labeled “Easy/Medium/Hard”, but there is no preview of what “Medium” looks like until the alarm rings.
Impact: Users have to guess if they can solve it while groggy.
Severity: 2 (Minor)
Recommendation: Add a “Preview Puzzle” button on the Settings screen.

Part 2: User Test Plan (Lab/Field Test)

1) Hypotheses & Testable Questions

H1: Users who dismiss the alarm using a puzzle will take longer to dismiss the alarm than users who dismiss with a standard swipe, but will report feeling more awake immediately afterward.
H2: Users will rate the Math puzzle as more effective for waking up than the Memory puzzle, but will also rate it as more frustrating (lower satisfaction / higher perceived annoyance).
Testable question: When users experience difficulty or frustration with the puzzle dismissal, do they attempt to bypass it (e.g., force-close, lower phone volume, or disable the feature) instead of completing the puzzle?

2) Planned Data to Collect (Variables)

Independent Variables

Dismissal Type: Standard Swipe vs. Math Puzzle vs. Memory Puzzle
Context: Testing during the day (simulated nap) vs. morning (real use)

Dependent Variables

Time to Dismiss (Quantitative): Time in seconds from alarm start to successful dismissal
Error Rate (Quantitative): Failed attempts or accidental “Snooze” taps
User Satisfaction (Quantitative, self-reported): System Usability Scale (SUS) score (0–100)
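For reference, SUS scores are derived from the raw questionnaire answers as sketched below. The sketch assumes the standard 10-item SUS with answers from 1 (strongly disagree) to 5 (strongly agree), in the standard item order. The function name `sus_score` is hypothetical, and the Google Forms export format is not modelled here.

```python
def sus_score(responses):
    """Compute a SUS score (0-100) from ten answers on a 1-5 scale.

    Odd-numbered items are positively worded and contribute (answer - 1).
    Even-numbered items are negatively worded and contribute (5 - answer).
    The summed contributions (0-40) are scaled by 2.5.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    if any(not 1 <= r <= 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")

    total = 0
    for item_number, answer in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += answer - 1
        else:
            total += 5 - answer
    return total * 2.5
```

Because the result is scaled to 0–100 but is not a percentage, it is usually reported as a score and compared against published benchmarks rather than read as "percent usable".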

3) Methods & Protocol (Tasks & Materials)

Participants: 5-8 students (hallway testing / fellow students)

Materials:

Smartphone with MVP app installed
Observation sheet (spreadsheet)
SUS questionnaire (Google Forms)
Post-test questionnaire (Google Forms)

Task List (Script):

Task 1: Setup (Onboarding)
- Prompt: “Open the app for the first time. Set up an alarm for 2 minutes from now. Choose a ‘Math’ puzzle as your wake-up requirement.”
- Observation focus: Can they find the settings?
Task 2: The Sleep Simulation
- Prompt: “Start the sleep tracking feature and lock the phone. Wait for the alarm.”
- Observation focus: Do they know how to start sleep tracking? Does the app stay active?
Task 3: The Wake Up
- Prompt: “When the alarm rings, turn it off completely.”
- Observation focus: Watch their fingers. Do they struggle with touch targets? Do they try to swipe it away out of habit?
Task 4: The Sleep Checker
- Prompt: “Create a sleep entry and change its rating.”
- Observation focus: Is it intuitive that they need to track sleep and wake up to get an entry?

User Test Results

The charts below are generated from our collected data (demographics, questionnaire, and SUS). (Note: The error rate was 0, so it is not reported.)

Participants

Count: —
Gender: —
Age: —

Boxplot: SUS Score (0–100)

App Structure Rating (1 = good, 5 = bad)

Puzzle Difficulty Appropriate?

Participant Ages

Quantitative Summary

Participant	Gender	Age	App Structure	Puzzle Difficulty appropriate	SUS Score

Task Completion Times (Time Rate)

Times are based on observation during the test sessions.

Participant	Task 1 (s)	Task 2 (s)	Task 3 (s)	Task 4 (s)	Notes
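A minimal sketch of how the per-task times from the observation sheet could be summarized. It assumes each participant is one row mapping a task name to its time in seconds, with `None` for a task that was not completed. The function name `summarize_task_times` and this row layout are hypothetical. No measured values are included here.

```python
from statistics import mean, median


def summarize_task_times(rows):
    """Return n, median, and mean seconds per task.

    rows: list of dicts such as {"Task 1": 42.0, "Task 2": None}.
    Tasks with a time of None (not completed) are left out of the statistics.
    """
    times_by_task = {}
    for row in rows:
        for task, seconds in row.items():
            if seconds is not None:
                times_by_task.setdefault(task, []).append(seconds)

    return {
        task: {"n": len(times), "median": median(times), "mean": mean(times)}
        for task, times in times_by_task.items()
    }
```

With a sample this small, the median is usually the safer headline number, since a single participant who struggles can shift the mean noticeably.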

Limitations:

Small sample size (n=6)
Mostly male participants
Some tests were run on an emulator rather than on real devices
No long-term real sleep usage was measured

Most Relevant Feedback

Emulator problems (wrong time on the device)
Phone didn't ring when the device was locked
Difficulty finding the challenge
The challenge restarted when the phone was flipped

Updates Made Based on Feedback

Improved name clarity
Improved UI and UX
Changed state storage so the challenge no longer restarts when the phone is flipped

Conclusion

The usability test indicates very high overall usability (mean SUS ≈ 95), which is considered excellent. Participants were able to complete the core tasks with little difficulty. However, the low ratings for app structure and multiple comments about navigation indicate that the information architecture needs improvement. The puzzle-based dismissal was perceived as effective, but some users showed initial confusion, suggesting the need for clearer onboarding and UI guidance.

Evaluation of Hypotheses

H1: Partially supported. Puzzle dismissal increased interaction time, and participants reported feeling more awake, but wakefulness was not measured quantitatively.
H2: Not fully tested. Only one puzzle type was evaluated in this iteration, so the Math and Memory puzzles could not be compared.
Testable question: No participant force-closed the app, but two participants showed signs of confusion, indicating potential frustration risk.