6 ----------------------------------------------------------------------
9 2000. -fno-omit-frame-pointer
10 ----------------------------------------------------------------------
12 Fedora 38 compiles (almost) everything with frame pointers, making
13 total system performance analysis much easier.
15 For details of why this is better, see my blog:
16 https://rwmj.wordpress.com/2023/02/14/frame-pointers-vs-dwarf-my-verdict/
18 Unfortunately Fedora <= 37 and RHEL do not have frame pointers unless
19 you recompile your program and all library dependencies (including
23 2100. sudo perf record -a -g -- nbdkit -U - null 1G --run 'export uri; fio nbd.fio'
24 ----------------------------------------------------------------------
26 You can use perf to record the whole system while running a command.
28 -a => record the whole system
30 -g => use frame pointers to gather stack traces
32 On every CPU, hundreds of times a second, an interrupt fires and
33 the full stack trace is collected. If it's running in the kernel,
34 it collects the kernel stack and the userspace stack. If it's
35 running in userspace, only the userspace stack is collected.
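
To make the stack collection above concrete, here is a minimal sketch of
walking a frame-pointer chain. It is only a simulation, not what perf or the
kernel actually runs. It assumes an x86-64 style layout where each frame
stores the caller's frame pointer at [fp] and the return address at [fp + 8].
The function name walk_frame_pointers, the memory dictionary and all the
addresses are made up for illustration.

```python
WORD = 8  # assumed 64-bit words, x86-64 style frame layout


def walk_frame_pointers(memory, fp, pc, max_depth=128):
    """Return code addresses from the innermost frame to the outermost.

    memory is a simulated address -> value map (hypothetical data).
    """
    trace = [pc]  # the sampled instruction pointer is the innermost frame
    for _ in range(max_depth):
        if fp == 0 or fp not in memory or (fp + WORD) not in memory:
            break
        next_fp = memory[fp]                 # saved frame pointer of the caller
        trace.append(memory[fp + WORD])      # return address into the caller
        # The stack grows down, so caller frames live at higher addresses.
        # Anything else suggests a corrupt chain, so stop walking.
        if next_fp != 0 and next_fp <= fp:
            break
        fp = next_fp
    return trace


# Three nested frames: main -> compress -> inflate (all hypothetical).
memory = {
    0x7000: 0x7020, 0x7008: 0x401200,  # inflate frame, returns into compress
    0x7020: 0x7040, 0x7028: 0x401100,  # compress frame, returns into main
    0x7040: 0x0,    0x7048: 0x401000,  # main frame, returns into startup code
}
trace = walk_frame_pointers(memory, fp=0x7000, pc=0x401300)
print([hex(addr) for addr in trace])
```

The walk needs only two memory reads per frame and no debug information,
which is why it is cheap enough to do on every sample.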
38 2200/2250. flamegraph > analysis.svg
39 ----------------------------------------------------------------------
41 You can post-process the perf output to get a flamegraph, an
44 I wrote the flamegraph shell script to paper over some
45 unnecessary complexity in the tools.
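
As a rough illustration of the post-processing step (this is not the
flamegraph script itself), the sketch below collapses sampled stacks into the
semicolon-separated "folded" form that flamegraph tools commonly consume:
one line per unique stack, followed by the number of times it was seen. The
function names fold_stacks and format_folded and the sample data are
hypothetical.

```python
from collections import Counter


def fold_stacks(samples):
    """Count identical stacks. Each sample lists frames outermost first."""
    return Counter(";".join(stack) for stack in samples)


def format_folded(folded):
    """Render 'frame;frame;frame count' lines, sorted for stable output."""
    return [f"{stack} {count}" for stack, count in sorted(folded.items())]


# Made-up samples, the first element standing in for the command name.
samples = [
    ["nbdkit", "main", "handle_request", "memcpy"],
    ["nbdkit", "main", "handle_request", "memcpy"],
    ["fio", "main", "do_io", "write"],
]
lines = format_folded(fold_stacks(samples))
print("\n".join(lines))
```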
48 3000. << a flamegraph >>
49 ----------------------------------------------------------------------
51 Opens in a web browser.
53 Interactive, click in for more detail.
59 ----------------------------------------------------------------------
61 The width of each box is its share (%) of total system (non-sleeping) time.
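
A small sketch of how such a width could be computed from folded stacks,
assuming the width is the share of all samples in which a function appears
anywhere on the stack. Real flamegraphs draw one box per call path rather
than one per function, so this is a simplification. The name frame_widths
and the sample counts are made up.

```python
def frame_widths(folded):
    """Map each function to the % of samples in which it appears."""
    total = sum(folded.values())
    counts = {}
    for stack, count in folded.items():
        # set() so a recursive function is counted once per sample
        for func in set(stack.split(";")):
            counts[func] = counts.get(func, 0) + count
    return {func: 100.0 * n / total for func, n in counts.items()}


# Hypothetical folded stacks: stack -> number of samples.
folded = {
    "main;read_block;inflate": 60,
    "main;read_block": 15,
    "main;write_block": 25,
}
widths = frame_widths(folded)
for func, pct in sorted(widths.items(), key=lambda kv: -kv[1]):
    print(f"{func:12s} {pct:5.1f}%")
```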
65 3200. Height is stack depth
66 ----------------------------------------------------------------------
68 Not usually interesting.
72 ----------------------------------------------------------------------
74 Plateaus indicate functions consuming a lot of time.
76 Remember I said that every core fires hundreds of interrupts a second
77 and collects a stack trace each time? The stack trace is shown upside
78 down in a flamegraph, with the innermost stack frame at the top.
80 So a plateau indicates a function that was actually running
81 when the interrupt happened.
83 (Show osq_lock and others)
85 (Describe MSR problem in UKL)
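
The plateau idea can be sketched as "self time": count only the innermost
frame of each sample, because that is the function that was running when the
interrupt fired. This is a minimal sketch over made-up folded stacks. The
name self_time is hypothetical and the call chain around osq_lock is
invented for the example.

```python
def self_time(folded):
    """Return (percent, function) pairs for innermost frames, largest first."""
    total = sum(folded.values())
    counts = {}
    for stack, count in folded.items():
        leaf = stack.split(";")[-1]  # innermost frame = top of the flamegraph
        counts[leaf] = counts.get(leaf, 0) + count
    return sorted(
        ((100.0 * n / total, func) for func, n in counts.items()),
        reverse=True,
    )


# Hypothetical data: stack (outermost first) -> number of samples.
folded = {
    "main;worker;mutex_lock;osq_lock": 70,
    "main;worker;compute": 20,
    "main;worker": 10,
}
plateaus = self_time(folded)
for pct, func in plateaus:
    print(f"{pct:5.1f}%  {func}")
```

In this made-up data, worker appears in every sample but was the running
function in only 10% of them, while osq_lock forms the 70% plateau.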
88 3400. Left to right is NOT time
89 ----------------------------------------------------------------------
91 Stack frames are ordered alphabetically
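
A minimal sketch of why the x-axis carries no timing information: identical
frames are merged into a tree, and the children of each node are then laid
out sorted by name. The names build_tree and layout_row and the sample
counts are assumptions made for illustration.

```python
def build_tree(folded):
    """Merge folded stacks into a tree of {'count', 'children'} nodes."""
    root = {"count": 0, "children": {}}
    for stack, count in folded.items():
        node = root
        node["count"] += count
        for frame in stack.split(";"):
            node = node["children"].setdefault(
                frame, {"count": 0, "children": {}}
            )
            node["count"] += count
    return root


def layout_row(node):
    """Children in left-to-right drawing order: sorted by name, not by time."""
    return [(name, child["count"]) for name, child in sorted(node["children"].items())]


# Insertion order mimics the order in which samples arrived (hypothetical).
folded = {
    "main;write_block": 3,
    "main;compress": 5,
    "main;read_block": 2,
}
tree = build_tree(folded)
row = layout_row(tree["children"]["main"])
print(row)
```

Although write_block was sampled first here, it is drawn last, because the
row is ordered by name.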
94 3500. Wide function with wide function on top
95 ----------------------------------------------------------------------
97 Time consumed in the inner function, not the outer function.
100 3600. Kernel threads and other "disconnected" or "background" work
101 ----------------------------------------------------------------------
103 Noisy machine, running firefox.
105 Note kernel threads (kcryptd) taking longer than the task being measured.
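
One way to quantify this kind of background work is to total the samples per
command. This is a sketch under the assumption that the first element of
each folded stack is the command name, which depends on how the stacks were
collapsed. The name samples_by_command and all the data are made up.

```python
def samples_by_command(folded):
    """Total sample counts per command, largest first."""
    totals = {}
    for stack, count in folded.items():
        comm = stack.split(";", 1)[0]  # assumed: first frame is the command
        totals[comm] = totals.get(comm, 0) + count
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical whole-system recording on a noisy machine.
folded = {
    "kcryptd;worker_thread;crypt_convert": 50,
    "firefox;main;paint": 30,
    "fio;main;do_io": 15,
    "fio;main;submit": 5,
}
by_command = samples_by_command(folded)
for comm, count in by_command.items():
    print(f"{comm:10s} {count}")
```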
108 3700/3750. Sleeping / blocked time is not recorded
110 ----------------------------------------------------------------------
114 Show unexpected graph.
116 Unless you take special steps. The easiest is to use the perf
117 --off-cpu flag, but it is still not supported in the Fedora perf
122 4000. What kind of things can be revealed?
123 ----------------------------------------------------------------------
127 - Inefficient algorithms like zlib inflate
129 - MSRs / slow CPU features
131 - Serialization through a CPU
137 - Excessive mmap/munmap (from memory allocations)
139 - Opportunities to short-cut through the stack
143 ----------------------------------------------------------------------
155 - Watch out for disconnected / background work
157 - perf --off-cpu (in future)
159 - Many fascinating, revealing insights
161 The backbone of performance work in UKL. Students spent literally
162 months studying flamegraphs (and other perf-adjacent tools) to get