perf/core: Use POLLHUP for pinned events in error

Pinned performance events can enter an error state when they fail to be
scheduled in the context due to a failed constraint or some other conflict
or condition.

In error state these events won't generate any samples anymore and are
silently ignored until they are recovered by PERF_EVENT_IOC_ENABLE,
or the condition can also change so that they can be scheduled in.

Tooling should be allowed to know about the state change, but
currently there's no mechanism to notify tooling when events enter
an error state.

One way to do this is to issue a POLLHUP event to poll(2) to handle this.
Reading events in an error state would return 0 (EOF) and it matches to
the behavior of POLLHUP according to the man page.

Tooling should remove the fd of the event from pollfd after getting
POLLHUP, otherwise it'll be returned repeatedly.

[ mingo: Clarified the changelog ]

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250317061745.1777584-1-namhyung@kernel.org
This commit is contained in:
Namhyung Kim 2025-03-16 23:17:45 -07:00 committed by Ingo Molnar
parent b6ecb57f1f
commit f4b07fd62d

View File

@ -3984,6 +3984,11 @@ static int merge_sched_in(struct perf_event *event, void *data)
if (event->attr.pinned) {
perf_cgroup_event_disable(event, ctx);
perf_event_set_state(event, PERF_EVENT_STATE_ERROR);
if (*perf_event_fasync(event))
event->pending_kill = POLL_HUP;
perf_event_wakeup(event);
} else {
struct perf_cpu_pmu_context *cpc = this_cpc(event->pmu_ctx->pmu);
@ -5925,6 +5930,10 @@ static __poll_t perf_poll(struct file *file, poll_table *wait)
if (is_event_hup(event))
return events;
if (unlikely(READ_ONCE(event->state) == PERF_EVENT_STATE_ERROR &&
event->attr.pinned))
return events;
/*
* Pin the event->rb by taking event->mmap_mutex; otherwise
* perf_event_set_output() can swizzle our rb and make us miss wakeups.