diff options
author | 0xb10c <0xb10c@gmail.com> | 2021-05-20 15:21:34 +0200 |
---|---|---|
committer | 0xb10c <0xb10c@gmail.com> | 2021-07-27 16:32:01 +0200 |
commit | 84ace9aef116a05e034730f2bb2f109d1d77aac7 (patch) | |
tree | edc6fe1bf43e7ce91de13b3d3e4cf5b25db0e83f /doc/tracing.md | |
parent | 979f410e69a0350da8bf67f329f760d4dd3a4f44 (diff) |
doc: Add initial USDT documentation
Both added files are extended in the following commits.
doc/usdt.md is based on earlier work by laanwj.
Co-authored-by: W. J. van der Laan <laanwj@protonmail.com>
Diffstat (limited to 'doc/tracing.md')
-rw-r--r-- | doc/tracing.md | 204 |
1 files changed, 204 insertions, 0 deletions
diff --git a/doc/tracing.md b/doc/tracing.md new file mode 100644 index 0000000000..4c472b4154 --- /dev/null +++ b/doc/tracing.md @@ -0,0 +1,204 @@ +# User-space, Statically Defined Tracing (USDT) for Bitcoin Core + +Bitcoin Core includes statically defined tracepoints to allow for more +observability during development, debugging, code review, and production usage. +These tracepoints make it possible to keep track of custom statistics and +enable detailed monitoring of otherwise hidden internals. They have +little to no performance impact when unused. + +``` +eBPF and USDT Overview +====================== + + ┌──────────────────┐ ┌──────────────┐ + │ tracing script │ │ bitcoind │ + │==================│ 2. │==============│ + │ eBPF │ tracing │ hooks │ │ + │ code │ logic │ into┌─┤►tracepoint 1─┼───┐ 3. + └────┬───┴──▲──────┘ ├─┤►tracepoint 2 │ │ pass args + 1. │ │ 4. │ │ ... │ │ to eBPF + User compiles │ │ pass data to │ └──────────────┘ │ program + Space & loads │ │ tracing script │ │ + ─────────────────┼──────┼─────────────────┼────────────────────┼─── + Kernel │ │ │ │ + Space ┌──┬─▼──────┴─────────────────┴────────────┐ │ + │ │ eBPF program │◄──────┘ + │ └───────────────────────────────────────┤ + │ eBPF kernel Virtual Machine (sandboxed) │ + └──────────────────────────────────────────┘ + +1. The tracing script compiles the eBPF code and loads the eBPF program into a kernel VM +2. The eBPF program hooks into one or more tracepoints +3. When the tracepoint is called, the arguments are passed to the eBPF program +4. The eBPF program processes the arguments and returns data to the tracing script +``` + +The Linux kernel can hook into the tracepoints during runtime and pass data to +sandboxed [eBPF] programs running in the kernel. These eBPF programs can, for +example, collect statistics or pass data back to user-space scripts for further +processing. + +[eBPF]: https://ebpf.io/ + +The two main eBPF front-ends with support for USDT are [bpftrace] and +[BPF Compiler Collection (BCC)]. BCC is used for complex tools and daemons and +`bpftrace` is preferred for one-liners and shorter scripts. Examples for both can +be found in [contrib/tracing]. + +[bpftrace]: https://github.com/iovisor/bpftrace +[BPF Compiler Collection (BCC)]: https://github.com/iovisor/bcc +[contrib/tracing]: ../contrib/tracing/ + +## Tracepoint documentation + +The currently available tracepoints are listed here. + +## Adding tracepoints to Bitcoin Core + +To add a new tracepoint, `#include <util/trace.h>` in the compilation unit where +the tracepoint is inserted. Use one of the `TRACEx` macros listed below +depending on the number of arguments passed to the tracepoint. Up to 12 +arguments can be provided. The `context` and `event` specify the names by which +the tracepoint is referred to. Please use `snake_case` and try to make sure that +the tracepoint names make sense even without detailed knowledge of the +implementation details. Do not forget to update the tracepoint list in this +document. + +```c +#define TRACE(context, event) +#define TRACE1(context, event, a) +#define TRACE2(context, event, a, b) +#define TRACE3(context, event, a, b, c) +#define TRACE4(context, event, a, b, c, d) +#define TRACE5(context, event, a, b, c, d, e) +#define TRACE6(context, event, a, b, c, d, e, f) +#define TRACE7(context, event, a, b, c, d, e, f, g) +#define TRACE8(context, event, a, b, c, d, e, f, g, h) +#define TRACE9(context, event, a, b, c, d, e, f, g, h, i) +#define TRACE10(context, event, a, b, c, d, e, f, g, h, i, j) +#define TRACE11(context, event, a, b, c, d, e, f, g, h, i, j, k) +#define TRACE12(context, event, a, b, c, d, e, f, g, h, i, j, k, l) +``` + +For example: + +```C++ +TRACE6(net, inbound_message, + pnode->GetId(), + pnode->GetAddrName().c_str(), + pnode->ConnectionTypeAsString().c_str(), + sanitizedType.c_str(), + msg.data.size(), + msg.data.data() +); +``` + +### Guidelines and best practices + +#### Clear motivation and use-case +Tracepoints need a clear motivation and use-case. The motivation should +outweigh the impact on, for example, code readability. There is no point in +adding tracepoints that don't end up being used. + +#### Provide an example +When adding a new tracepoint, provide an example. Examples can show the use case +and help reviewers testing that the tracepoint works as intended. The examples +can be kept simple but should give others a starting point when working with +the tracepoint. See existing examples in [contrib/tracing/]. + +[contrib/tracing/]: ../contrib/tracing/ + +#### No expensive computations for tracepoints +Data passed to the tracepoint should be inexpensive to compute. Although the +tracepoint itself only has overhead when enabled, the code to compute arguments +is always run - even if the tracepoint is not used. For example, avoid +serialization and parsing. + +#### Semi-stable API +Tracepoints should have a semi-stable API. Users should be able to rely on the +tracepoints for scripting. This means tracepoints need to be documented, and the +argument order ideally should not change. If there is an important reason to +change argument order, make sure to document the change and update the examples +using the tracepoint. + +#### eBPF Virtual Machine limits +Keep the eBPF Virtual Machine limits in mind. eBPF programs receiving data from +the tracepoints run in a sandboxed Linux kernel VM. This VM has a limited stack +size of 512 bytes. Check if it makes sense to pass larger amounts of data, for +example, with a tracing script that can handle the passed data. + +#### `bpftrace` argument limit +While tracepoints can have up to 12 arguments, bpftrace scripts currently only +support reading from the first six arguments (`arg0` till `arg5`) on `x86_64`. +bpftrace currently lacks real support for handling and printing binary data, +like block header hashes and txids. When a tracepoint passes more than six +arguments, then string and integer arguments should preferably be placed in the +first six argument fields. Binary data can be placed in later arguments. The BCC +supports reading from all 12 arguments. + +#### Strings as C-style String +Generally, strings should be passed into the `TRACEx` macros as pointers to +C-style strings (a null-terminated sequence of characters). For C++ +`std::strings`, [`c_str()`] can be used. It's recommended to document the +maximum expected string size if known. + + +[`c_str()`]: https://www.cplusplus.com/reference/string/string/c_str/ + + +## Listing available tracepoints + +Multiple tools can list the available tracepoints in a `bitcoind` binary with +USDT support. + +### GDB - GNU Project Debugger + +To list probes in Bitcoin Core, use `info probes` in `gdb`: + +``` +$ gdb ./src/bitcoind +… +(gdb) info probes +Type Provider Name Where Semaphore Object +stap net inbound_message 0x000000000014419e /src/bitcoind +stap net outbound_message 0x0000000000107c05 /src/bitcoind +stap validation block_connected 0x00000000002fb10c /src/bitcoind +… +``` + +### With `readelf` + +The `readelf` tool can be used to display the USDT tracepoints in Bitcoin Core. +Look for the notes with the description `NT_STAPSDT`. + +``` +$ readelf -n ./src/bitcoind | grep NT_STAPSDT -A 4 -B 2 +Displaying notes found in: .note.stapsdt + Owner Data size Description + stapsdt 0x0000005d NT_STAPSDT (SystemTap probe descriptors) + Provider: net + Name: outbound_message + Location: 0x0000000000107c05, Base: 0x0000000000579c90, Semaphore: 0x0000000000000000 + Arguments: -8@%r12 8@%rbx 8@%rdi 8@192(%rsp) 8@%rax 8@%rdx +… +``` + +### With `tplist` + +The `tplist` tool is provided by BCC (see [Installing BCC]). It displays kernel +tracepoints or USDT probes and their formats (for more information, see the +[`tplist` usage demonstration]). There are slight binary naming differences +between distributions. For example, on +[Ubuntu the binary is called `tplist-bpfcc`][ubuntu binary]. + +[Installing BCC]: https://github.com/iovisor/bcc/blob/master/INSTALL.md +[`tplist` usage demonstration]: https://github.com/iovisor/bcc/blob/master/tools/tplist_example.txt +[ubuntu binary]: https://github.com/iovisor/bcc/blob/master/INSTALL.md#ubuntu---binary + +``` +$ tplist -l ./src/bitcoind -v +b'net':b'outbound_message' [sema 0x0] + 1 location(s) + 6 argument(s) +… +``` |