Kerntrace attaches eBPF probes to the running kernel and streams every syscall, page fault, TCP retransmit, and off-CPU stall straight off your fleet: no agent to install, no sidecar, no recompile. Overhead is measured in nanoseconds, and every probe is proven safe by the kernel's own verifier before a single instruction runs.
$ kerntrace attach --probe=syscall:openat --where 'comm == "checkout-svc"'
[verifier] program accepted — 412 insns, 0 unbounded loops, 184B stack
[loaded] kprobe/sys_enter_openat attached to 38 hosts in 1.2s
ts            host      pid   latency  ret     path
12:04:07.118  api-3     2241  14µs     0       /etc/ssl/certs/ca.pem
12:04:07.119  api-3     2241  9.8ms    ESTALE  /var/lib/secrets/db.key  ←
12:04:07.121  worker-7  8830  11µs     0       /tmp/upload-7f3a.part
[anomaly] openat p99 +1,920% on api-3 — ESTALE on a stale NFS mount, not your code
[culprit] blocking read against a dead NFS handle — 9.8ms/call, 4.1k calls/min
→ off-CPU flamegraph? trace the fd leak? pin this probe to Grafana?
Running in production at teams that live below the syscall boundary
Application metrics stop at the edge of your process. Kerntrace starts where they end — in the kernel, where the real stalls, drops, and blocked I/O actually live.
kprobes and tracepoints on the syscall boundary capture openat, read, write, connect, futex, and the other 300-plus calls — each tagged with PID, container, cgroup, and the exact argument that hurt.
Most profilers only show you CPU burning. Kerntrace shows the time your threads spend asleep — blocked on locks, disk, or the network — and stacks it into an off-CPU flamegraph built from scheduler events.
Trace TCP retransmits, connection resets, and SYN-to-accept latency straight from the socket layer. Catch the tail latency that never shows up in your application's request timer.
uprobes and USDT markers attach to your own binaries and to libssl, libc, or the JVM — read a function's arguments and return value with no debugger, no restart, no recompile.
Watch major faults, OOM-kill decisions, and slab churn as the kernel makes them. Find the service quietly thrashing the page cache before it takes a node down with it.
Correlate an open with no matching close, sockets wedged in CLOSE_WAIT, and growing fd tables back to the line of code and the request that opened them.
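To make "an open with no matching close" concrete, here is a minimal Python sketch of the pairing logic, not Kerntrace's implementation. The `FdEvent` shape, the `find_unclosed` helper, and the fd numbers are hypothetical. It assumes open and close events arrive tagged with PID and fd number.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class FdEvent:
    """Hypothetical event shape: one record per open or close."""
    ts: float       # seconds since the start of the trace
    pid: int
    op: str         # "open" or "close"
    fd: int
    path: str = ""  # only meaningful on "open"


def find_unclosed(events: Iterable[FdEvent], now: float, min_age: float) -> List[FdEvent]:
    """Return open events that have had no matching close for at least min_age seconds."""
    open_fds: Dict[Tuple[int, int], FdEvent] = {}
    for ev in sorted(events, key=lambda e: e.ts):
        key = (ev.pid, ev.fd)
        if ev.op == "open":
            # fd numbers are reused after close, so the latest open wins
            open_fds[key] = ev
        elif ev.op == "close":
            open_fds.pop(key, None)
    return [ev for ev in open_fds.values() if now - ev.ts >= min_age]


# Illustrative data only: the fd numbers and timestamps are made up.
events = [
    FdEvent(0.0, 2241, "open", 7, "/var/lib/secrets/db.key"),
    FdEvent(1.0, 2241, "open", 8, "/tmp/upload-7f3a.part"),
    FdEvent(2.0, 2241, "close", 8),
    FdEvent(50.0, 2241, "open", 8, "/etc/ssl/certs/ca.pem"),
]
leaked = find_unclosed(events, now=60.0, min_age=30.0)
for ev in leaked:
    print(ev.pid, ev.fd, ev.path)
```

The age threshold matters: a descriptor opened a moment ago is not a leak yet, so only opens older than `min_age` are reported.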
What 'production-safe' actually measures
Kernel modules can panic a box. ptrace stops the process it traces. Kerntrace runs inside the eBPF sandbox the kernel itself enforces: every program is proven bounded before it is ever allowed to execute.
Before a probe loads, the in-kernel eBPF verifier walks every instruction path and proves it terminates, never reads out-of-bounds memory, and respects a fixed stack. A program that can't be proven safe simply does not load. Kerntrace surfaces the verifier's verdict — instruction count, bounded loops, stack depth — so every probe you ship carries the kernel's own guarantee.
Compile Once, Run Everywhere (CO-RE) uses BTF type information to relocate field offsets at load time. One binary runs across kernels 5.4 through 6.x, with no headers to chase and no module to rebuild per host.
Events stream through the perf ring buffer in-kernel and are aggregated on the host before anything leaves it. Cardinality and volume are bounded at the source — not on your network bill.
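One way to read "bounded at the source" is an aggregation table with a hard cap on distinct keys, much like a fixed-size BPF hash map. The Python sketch below illustrates that idea only. The `BoundedAggregator` class and its overflow counter are assumptions, not a description of how Kerntrace actually rolls events up.

```python
from typing import Dict, Hashable, Tuple


class BoundedAggregator:
    """Counts events per key, but never tracks more than max_keys distinct keys."""

    def __init__(self, max_keys: int) -> None:
        self.max_keys = max_keys
        self.counts: Dict[Hashable, int] = {}
        self.overflow = 0  # events whose key did not fit in the table

    def record(self, key: Hashable) -> None:
        if key in self.counts:
            self.counts[key] += 1
        elif len(self.counts) < self.max_keys:
            self.counts[key] = 1
        else:
            # Table is full: keep the volume, drop the label.
            self.overflow += 1

    def flush(self) -> Tuple[Dict[Hashable, int], int]:
        """Return (per-key counts, overflow count) and reset for the next window."""
        snapshot = (self.counts, self.overflow)
        self.counts, self.overflow = {}, 0
        return snapshot


# Illustrative usage: keys are (syscall, return code) pairs.
agg = BoundedAggregator(max_keys=2)
for key in [("openat", 0), ("openat", "ESTALE"), ("openat", 0), ("read", 0), ("connect", 0)]:
    agg.record(key)
print(agg.flush())
```

Whatever the workload does, each flush ships at most `max_keys` rows plus one overflow counter, so the cost of a noisy host is capped before any data leaves it.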
Probes observe; they don't mutate. No kernel module to taint your kernel, no LD_PRELOAD shim, nothing that can wedge a running process.
When a host gets hot, sampling backs off on its own and the kernel falls back to a counter instead of a full event — observability that never becomes the incident.
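The back-off behaviour can be pictured as a sampler with an event budget per interval. The Python sketch below is an assumption-laden illustration, not Kerntrace's algorithm. The `AdaptiveSampler` class, the power-of-two sampling intervals, and the budget values are all hypothetical.

```python
from typing import Any, Optional, Tuple


class AdaptiveSampler:
    """Emits 1 in `interval` events and falls back to counting only when
    even the coarsest sampling would exceed the per-tick budget."""

    def __init__(self, budget_per_tick: int, max_interval: int = 64) -> None:
        self.budget = budget_per_tick
        self.max_interval = max_interval
        self.interval = 1        # emit every event to begin with
        self.count_only = False  # True means counters only, no full events
        self.seen = 0
        self.emitted = 0

    def offer(self, event: Any) -> Optional[Any]:
        """Count the event; return it only if it should be emitted in full."""
        self.seen += 1
        if self.count_only or self.seen % self.interval != 0:
            return None
        self.emitted += 1
        return event

    def tick(self) -> Tuple[int, int]:
        """Close the interval, re-plan sampling from the observed load,
        and return (seen, emitted) for the interval that just ended."""
        stats = (self.seen, self.emitted)
        interval = 1
        # Assume the next tick looks like this one and widen the interval
        # until the expected number of full events fits the budget.
        while stats[0] // interval > self.budget and interval < self.max_interval:
            interval *= 2
        self.interval = interval
        self.count_only = stats[0] // interval > self.budget
        self.seen = self.emitted = 0
        return stats
```

In this sketch every event is still counted even when it is not emitted, so the rate stays visible while the detail is shed, and sampling widens or recovers one tick after the load changes.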
Every Kerntrace investigation ends at a specific kernel event, a specific PID, and the exact argument that caused it. Here is the path it walks for you.
p99 on checkout jumps to 9ms. Your APM shows the request handler sitting idle the whole time — the latency is being spent somewhere your instrumentation can't see.
Kerntrace attaches a syscall probe scoped to that one service and immediately catches the threads blocked in a read() against a stale NFS mount — not in your code at all.
The off-CPU flamegraph shows 94% of wall-clock time asleep in the VFS layer, stacked under a single blocking read. The kernel was the bottleneck the whole time.
Remount with a sane timeout and the same probe shows openat latency back at 14µs. The alert auto-resolves, and the probe stays pinned to catch the next regression before a human does.
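For readers curious how an off-CPU flamegraph like the one in this walkthrough comes together, here is a minimal Python sketch under stated assumptions. It pairs each "switched out" scheduler event with the next "switched in" event for the same thread and charges the gap to the stack captured at switch-out. The event tuple layout, the function names, and the frame names are hypothetical, and the numbers are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Stack = Tuple[str, ...]
# (timestamp in microseconds, thread id, "out" or "in", stack captured at switch-out)
SchedEvent = Tuple[int, int, str, Stack]


def off_cpu_totals(events: Iterable[SchedEvent]) -> Dict[Stack, int]:
    """Sum the microseconds each stack spent off-CPU."""
    sleeping: Dict[int, Tuple[int, Stack]] = {}
    totals: Dict[Stack, int] = defaultdict(int)
    for ts, tid, kind, stack in sorted(events, key=lambda e: e[0]):
        if kind == "out":
            sleeping[tid] = (ts, stack)
        elif kind == "in" and tid in sleeping:
            started, blocked_stack = sleeping.pop(tid)
            totals[blocked_stack] += ts - started
    return dict(totals)


def to_folded(totals: Dict[Stack, int]) -> List[str]:
    """Render 'frame;frame;frame value' lines, largest first, the folded-stack
    text format that flamegraph tools commonly accept."""
    ordered = sorted(totals.items(), key=lambda kv: -kv[1])
    return [f"{';'.join(stack)} {us}" for stack, us in ordered]


# Illustrative trace for a single thread (frame names are hypothetical).
events = [
    (0, 2241, "out", ("handle_request", "read", "vfs_read", "nfs_file_read")),
    (9400, 2241, "in", ()),
    (9500, 2241, "out", ("handle_request", "futex_wait")),
    (10100, 2241, "in", ()),
]
for line in to_folded(off_cpu_totals(events)):
    print(line)
```

Each stack's share of the summed sleep time is what sets the width of its tower in the rendered graph, which is how a single blocking read can dominate the picture.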
“We spent two weeks blaming our own code for a latency spike. Kerntrace found it in twenty minutes — a stale NFS mount blocking in read(). It was never our code. It was always the kernel.”
“No agent, no restart, no recompile. We attached a uprobe to our running payment binary in production and watched the exact argument that was timing out. That used to be a deploy-and-pray exercise.”
“The verifier output is what got it past our kernel team. They could read, line by line, that the program was bounded before it ever loaded. That is how you trace prod without flinching.”
We don't meter events or charge per gigabyte — that would punish you for tracing more. You pay for the hosts you run, and the kernel does the aggregation before anything costs you a cent.
For homelabs, side projects, and learning eBPF.
For teams running real production at scale.
For regulated, air-gapped, and multi-region fleets.
No. Every program passes the in-kernel verifier, which proves it is bounded, terminating, and memory-safe before it loads — an unsafe program never runs. Probes are read-only, overhead is around 87 nanoseconds per syscall, and sampling backs off automatically under load. There is no kernel module to taint or panic your kernel.
No. Kerntrace attaches directly to the running kernel via kprobes, tracepoints, and uprobes. There is nothing to add to your application, no library to import, no sidecar, and no restart. You attach a probe to a process that is already running and start seeing events immediately.
Any Linux kernel from 5.4 onward with BTF enabled — which covers modern Ubuntu, Debian, RHEL/Rocky, Amazon Linux, and most managed Kubernetes nodes. Thanks to CO-RE, a single Kerntrace binary relocates against each host's own type information, so you never compile or ship a module per kernel version.
APMs trace inside your process and stop at the syscall boundary. Kerntrace starts there. It sees the time spent off-CPU, the blocking I/O, the TCP retransmits, and the page faults your application timer is blind to — the parts of a slow request that aren't your code at all.
Yes. Kerntrace speaks bpftrace-style one-liners, so the probes you already write in an ad-hoc session become saved, fleet-wide probes with retention and alerting. You keep the language you know and get a control plane around it.
Events are aggregated in-kernel through the perf ring buffer and rolled up on each host before anything leaves it. You can run Kerntrace entirely inside your own VPC or fully air-gapped, pin data residency by region, and export raw events whenever you like.
Attach your first probe in under a minute — no agent, no restart, no recompile. Watch the syscalls your dashboards have never shown you.