Interpreting User Feedback Signals for AI Agents
When a user gives your AI agent a thumbs up, what does that actually mean?
The naive interpretation is approval. The agent did something well; do more of it. This is how most feedback pipelines work: positive signals reinforce behavior, negative signals suppress it.
But human feedback is far more ambiguous than a binary good/bad signal. A thumbs up might mean "this is exactly right." It might also mean "good enough, I do not want to spend 10 minutes correcting this." These are profoundly different signals, and treating them the same produces agents that plateau at adequate rather than improving toward excellent.
Three Types of Explicit Feedback (and Why They All Lie a Little)
Approval signals (thumbs up, 5 stars, "looks good") are the most common. They are also the most overloaded. Users approve things to close the interaction loop, to avoid the effort of explaining what was wrong, and occasionally because they genuinely loved the result. All three responses look identical in your feedback log.
Correction signals (edits, revisions, "actually, change this to...") are more specific but still ambiguous. When a user changes a word in a draft, are they fixing a stylistic preference or correcting a factual error? One warrants remembering the preference; the other warrants questioning the agent's accuracy on similar topics.
Rejection signals (thumbs down, "that's wrong") are the clearest in direction but often the vaguest in reason. Without a follow-up explanation, a rejection tells you something failed but not what or why.
The 2025 research paper "Reinforcement Learning from User Feedback" (arXiv:2505.14946) found that production systems relying solely on explicit binary feedback converged on user preferences more slowly than systems that incorporated behavioral signals alongside ratings. The gap was largest in areas where users struggled to articulate their preferences explicitly but had strong implicit reactions.
Behavioral Signals Are Stronger Evidence
The most actionable feedback often comes from what users do after the agent acts, not from what they rate.
Immediate undo (within 10-30 seconds of an agent action): The user saw the result and reversed it. This is a cleaner negative signal than any rating. The action, the context, and the outcome are all captured in the log. A user who undoes 40% of an agent's file organization actions is giving you a precise training signal even if they never complain (see the timing sketch after this list).
Partial modification: A user who changes 10% of an agent-written email is signaling something different from one who rewrites 80% of it. Light modification suggests the output was mostly right but missed a nuance. Heavy modification suggests the agent significantly misunderstood the task or the user's style. Tracking the modification ratio over time reveals whether an agent is improving or stagnating.
Selective adoption: When an agent suggests three actions and the user takes one, the two ignored suggestions are negative signals. Agents that track adoption rate per suggestion type can learn which categories of help are actually wanted versus which are tolerated.
Dwell time before acting on output: A user who immediately applies an agent's suggestion treats it as a trusted action. A user who stares at the suggestion for 90 seconds before deciding is communicating uncertainty - either about the suggestion's correctness or their own comfort with it.
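A sketch of the undo timing test mentioned above. The 30-second cutoff mirrors the window described earlier; it is a tunable assumption, not a fixed rule, and the datetime-based event times are assumed to come from the agent's action log:

from datetime import datetime, timedelta

UNDO_WINDOW = timedelta(seconds=30)  # Assumed threshold; tune per agent

def is_feedback_undo(action_time: datetime, undo_time: datetime) -> bool:
    """An undo within the window is a clean rejection of the result;
    a much later reversal is more likely a changed plan than feedback."""
    return undo_time - action_time <= UNDO_WINDOW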
Desktop agents have a structural advantage here. They can observe these behavioral signals directly rather than inferring them from survey data. The agent that reorganized your files can observe whether you moved them again within a week. The agent that drafted your email can observe how much you changed before sending.
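A minimal sketch of that measurement, assuming the agent retains its draft and can read the version the user actually sent:

import difflib

def modification_ratio(agent_draft: str, user_final: str) -> float:
    """Fraction of the agent's draft the user changed: near 0.0 for a
    light touch-up, near 1.0 for a full rewrite."""
    similarity = difflib.SequenceMatcher(None, agent_draft, user_final).ratio()
    return 1.0 - similarity

Tracked per action category over time, a falling ratio is direct evidence the agent is converging on the user's style; a flat one suggests stagnation.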
Building a Multi-Signal Feedback System
A practical feedback pipeline for a desktop agent combines six signal types with different weights:
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class FeedbackRecord:
    action_id: str
    signal: str
    weight: float
    valence: float  # 1.0 counts toward approval, 0.0 toward rejection
    context: dict
    timestamp: datetime

class FeedbackEngine:
    SIGNAL_WEIGHTS = {
        "explicit_approve": 0.3,            # Noisiest - many motivations
        "explicit_reject": 0.6,             # Clearer direction, vague reason
        "behavioral_undo": 0.9,             # Strong - definite rejection
        "behavioral_modify": 0.7,           # Strong - shows what was wrong
        "behavioral_ignore": 0.4,           # Weak - could be deferred, not rejected
        "behavioral_adopt_immediate": 0.8,  # Strong approval signal
    }
    # Signal types that count toward approval; all others count against.
    POSITIVE_SIGNALS = {"explicit_approve", "behavioral_adopt_immediate"}

    def __init__(self):
        self.records: list[FeedbackRecord] = []

    def record_signal(self, action_id: str, signal_type: str,
                      context: dict | None = None) -> None:
        self.records.append(FeedbackRecord(
            action_id=action_id,
            signal=signal_type,
            weight=self.SIGNAL_WEIGHTS.get(signal_type, 0.5),
            valence=1.0 if signal_type in self.POSITIVE_SIGNALS else 0.0,
            context=context or {},
            timestamp=datetime.now(),
        ))

    def aggregate_preference(self, action_category: str,
                             window_days: int = 30) -> float:
        """Returns a preference score in [0, 1] for an action category."""
        cutoff = datetime.now() - timedelta(days=window_days)
        recent = [r for r in self.records
                  if r.context.get("category") == action_category
                  and r.timestamp >= cutoff]
        if not recent:
            return 0.5  # Neutral default - no evidence either way
        weighted_sum = sum(r.weight * r.valence for r in recent)
        total_weight = sum(r.weight for r in recent)
        return weighted_sum / total_weight
The key design choices here: behavioral signals carry higher weight than explicit ratings (which are noisier), and context is stored with every signal so you can later analyze not just what failed but what conditions surrounded the failure.
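A quick usage sketch; the action IDs and category name are hypothetical:

engine = FeedbackEngine()
engine.record_signal("act-101", "behavioral_undo",
                     context={"category": "file_organization"})
engine.record_signal("act-102", "behavioral_adopt_immediate",
                     context={"category": "file_organization"})
# 0.8 / (0.9 + 0.8) ~= 0.47 - leaning negative despite one strong approval
print(engine.aggregate_preference("file_organization"))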
Feedback Decay and Preference Drift
User preferences change. A workflow that was correct six months ago may not reflect how the user works today. A feedback system without decay treats stale signals as equally valid as recent ones, which makes agents slow to adapt to genuine preference changes.
A simple fix is exponential decay with a half-life matched to how frequently the agent runs:
def decayed_weight(signal: FeedbackRecord, half_life_days: float = 60.0) -> float:
    """Halve a signal's influence every half_life_days."""
    age_days = (datetime.now() - signal.timestamp).days
    decay_factor = 0.5 ** (age_days / half_life_days)
    return signal.weight * decay_factor
For a daily-use agent, a 60-day half-life means signals from two months ago carry about half the weight of today's signals. For a weekly-use agent, extend the half-life proportionally.
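Wiring decay into the earlier aggregation is a small change. A sketch, reusing FeedbackRecord and decayed_weight from above and the same valence convention:

def aggregate_with_decay(records: list[FeedbackRecord],
                         half_life_days: float = 60.0) -> float:
    """Preference score in [0, 1] where older signals count for less."""
    if not records:
        return 0.5  # Neutral default
    weights = [decayed_weight(r, half_life_days) for r in records]
    total = sum(weights)
    if total == 0:
        return 0.5  # All signals fully decayed
    return sum(w * r.valence for w, r in zip(weights, records)) / total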
The Calibration Problem
Even with behavioral signals and decay, there is a fundamental calibration challenge: the agent can only learn from actions it took. It cannot observe what would have happened if it had acted differently.
This means agents trained purely on feedback tend toward conservative behavior over time - they learn what the user currently accepts and optimize for that, rather than exploring whether more ambitious actions might be more appreciated.
One practical mitigation: periodically propose slightly more ambitious versions of actions the user has consistently approved. If a user always approves Tier 1 file organization, occasionally propose a Tier 2 action (reorganizing an entire folder structure) as an explicit suggestion. Their response gives you a calibration signal for where their tolerance actually sits.
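One way to schedule such probes, sketched here with a hypothetical numeric tier scale and a fixed 10% probe rate:

import random

def choose_action_tier(approved_tier: int, max_tier: int,
                       probe_rate: float = 0.1) -> int:
    """Mostly act at the tier the user consistently approves, but
    occasionally propose one tier higher to test where their tolerance
    actually sits. Probes should be explicit suggestions, not silent actions."""
    if approved_tier < max_tier and random.random() < probe_rate:
        return approved_tier + 1  # Calibration probe
    return approved_tier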
The goal is an agent that improves continuously toward the user's actual preferences - not one that learns to produce the minimal acceptable output every time. The difference between those two agents is often the feedback system design more than the model quality.
Fazm is an open source macOS AI agent, available on GitHub.