doc/tracing.md


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419

# User-space, Statically Defined Tracing (USDT) for Bitcoin Core

Bitcoin Core includes statically defined tracepoints to allow for more
observability during development, debugging, code review, and production usage.
These tracepoints make it possible to keep track of custom statistics and
enable detailed monitoring of otherwise hidden internals. They have
little to no performance impact when unused.

```
eBPF and USDT Overview
======================

                ┌──────────────────┐            ┌──────────────┐
                │ tracing script   │            │ bitcoind     │
                │==================│      2.    │==============│
                │  eBPF  │ tracing │      hooks │              │
                │  code  │ logic   │      into┌─┤►tracepoint 1─┼───┐ 3.
                └────┬───┴──▲──────┘          ├─┤►tracepoint 2 │   │ pass args
            1.       │      │ 4.              │ │ ...          │   │ to eBPF
    User    compiles │      │ pass data to    │ └──────────────┘   │ program
    Space    & loads │      │ tracing script  │                    │
    ─────────────────┼──────┼─────────────────┼────────────────────┼───
    Kernel           │      │                 │                    │
    Space       ┌──┬─▼──────┴─────────────────┴────────────┐       │
                │  │  eBPF program                         │◄──────┘
                │  └───────────────────────────────────────┤
                │ eBPF kernel Virtual Machine (sandboxed)  │
                └──────────────────────────────────────────┘

1. The tracing script compiles the eBPF code and loads the eBPF program into a kernel VM
2. The eBPF program hooks into one or more tracepoints
3. When the tracepoint is called, the arguments are passed to the eBPF program
4. The eBPF program processes the arguments and returns data to the tracing script
```

The Linux kernel can hook into the tracepoints during runtime and pass data to
sandboxed [eBPF] programs running in the kernel. These eBPF programs can, for
example, collect statistics or pass data back to user-space scripts for further
processing.

[eBPF]: https://ebpf.io/

The two main eBPF front-ends with support for USDT are [bpftrace] and
[BPF Compiler Collection (BCC)]. BCC is used for complex tools and daemons and
`bpftrace` is preferred for one-liners and shorter scripts. Examples for both can
be found in [contrib/tracing].

[bpftrace]: https://github.com/iovisor/bpftrace
[BPF Compiler Collection (BCC)]: https://github.com/iovisor/bcc
[contrib/tracing]: ../contrib/tracing/

## Tracepoint documentation

The currently available tracepoints are listed here.

### Context `net`

#### Tracepoint `net:inbound_message`

Is called when a message is received from a peer over the P2P network. Passes
information about our peer, the connection and the message as arguments.

Arguments passed:
1. Peer ID as `int64`
2. Peer Address and Port (IPv4, IPv6, Tor v3, I2P, ...) as `pointer to C-style String` (max. length 68 characters)
3. Connection Type (inbound, feeler, outbound-full-relay, ...) as `pointer to C-style String` (max. length 20 characters)
4. Message Type (inv, ping, getdata, addrv2, ...) as `pointer to C-style String` (max. length 20 characters)
5. Message Size in bytes as `uint64`
6. Message Bytes as `pointer to unsigned chars` (i.e. bytes)

Note: The message is passed to the tracepoint in full, however, due to space
limitations in the eBPF kernel VM it might not be possible to pass the message
to user-space in full. Messages longer than a 32kb might be cut off. This can
be detected in tracing scripts by comparing the message size to the length of
the passed message.

#### Tracepoint `net:outbound_message`

Is called when a message is sent to a peer over the P2P network. Passes
information about our peer, the connection and the message as arguments.

Arguments passed:
1. Peer ID as `int64`
2. Peer Address and Port (IPv4, IPv6, Tor v3, I2P, ...) as `pointer to C-style String` (max. length 68 characters)
3. Connection Type (inbound, feeler, outbound-full-relay, ...) as `pointer to C-style String` (max. length 20 characters)
4. Message Type (inv, ping, getdata, addrv2, ...) as `pointer to C-style String` (max. length 20 characters)
5. Message Size in bytes as `uint64`
6. Message Bytes as `pointer to unsigned chars` (i.e. bytes)

Note: The message is passed to the tracepoint in full, however, due to space
limitations in the eBPF kernel VM it might not be possible to pass the message
to user-space in full. Messages longer than a 32kb might be cut off. This can
be detected in tracing scripts by comparing the message size to the length of
the passed message.

### Context `validation`

#### Tracepoint `validation:block_connected`

Is called *after* a block is connected to the chain. Can, for example, be used
to benchmark block connections together with `-reindex`.

Arguments passed:
1. Block Header Hash as `pointer to unsigned chars` (i.e. 32 bytes in little-endian)
2. Block Height as `int32`
3. Transactions in the Block as `uint64`
4. Inputs spend in the Block as `int32`
5. SigOps in the Block (excluding coinbase SigOps) `uint64`
6. Time it took to connect the Block in nanoseconds (ns) as `uint64`

### Context `utxocache`

The following tracepoints cover the in-memory UTXO cache. UTXOs are, for example,
added to and removed (spent) from the cache when we connect a new block.
**Note**: Bitcoin Core uses temporary clones of the _main_ UTXO cache
(`chainstate.CoinsTip()`). For example, the RPCs `generateblock` and
`getblocktemplate` call `TestBlockValidity()`, which applies the UTXO set
changes to a temporary cache. Similarly, mempool consistency checks, which are
frequent on regtest, also apply the UTXO set changes to a temporary cache.
Changes to the _main_ UTXO cache and to temporary caches trigger the tracepoints.
We can't tell if a temporary cache or the _main_ cache was changed.

#### Tracepoint `utxocache:flush`

Is called *after* the in-memory UTXO cache is flushed.

Arguments passed:
1. Time it took to flush the cache microseconds as `int64`
2. Flush state mode as `uint32`. It's an enumerator class with values `0`
   (`NONE`), `1` (`IF_NEEDED`), `2` (`PERIODIC`), `3` (`ALWAYS`)
3. Cache size (number of coins) before the flush as `uint64`
4. Cache memory usage in bytes as `uint64`
5. If pruning caused the flush as `bool`

#### Tracepoint `utxocache:add`

Is called when a coin is added to a UTXO cache. This can be a temporary UTXO cache too.

Arguments passed:
1. Transaction ID (hash) as `pointer to unsigned chars` (i.e. 32 bytes in little-endian)
2. Output index as `uint32`
3. Block height the coin was added to the UTXO-set as  `uint32`
4. Value of the coin as `int64`
5. If the coin is a coinbase as `bool`

#### Tracepoint `utxocache:spent`

Is called when a coin is spent from a UTXO cache. This can be a temporary UTXO cache too.

Arguments passed:
1. Transaction ID (hash) as `pointer to unsigned chars` (i.e. 32 bytes in little-endian)
2. Output index as `uint32`
3. Block height the coin was spent, as `uint32`
4. Value of the coin as `int64`
5. If the coin is a coinbase as `bool`

#### Tracepoint `utxocache:uncache`

Is called when a coin is purposefully unloaded from a UTXO cache. This
happens, for example, when we load an UTXO into a cache when trying to accept
a transaction that turns out to be invalid. The loaded UTXO is uncached to avoid
filling our UTXO cache up with irrelevant UTXOs.

Arguments passed:
1. Transaction ID (hash) as `pointer to unsigned chars` (i.e. 32 bytes in little-endian)
2. Output index as `uint32`
3. Block height the coin was uncached, as `uint32`
4. Value of the coin as `int64`
5. If the coin is a coinbase as `bool`

### Context `coin_selection`

#### Tracepoint `coin_selection:selected_coins`

Is called when `SelectCoins` completes.

Arguments passed:
1. Wallet name as `pointer to C-style string`
2. Coin selection algorithm name as `pointer to C-style string`
3. Selection target value as `int64`
4. Calculated waste metric of the solution as `int64`
5. Total value of the selected inputs as `int64`

#### Tracepoint `coin_selection:normal_create_tx_internal`

Is called when the first `CreateTransactionInternal` completes.

Arguments passed:
1. Wallet name as `pointer to C-style string`
2. Whether `CreateTransactionInternal` succeeded as `bool`
3. The expected transaction fee as an `int64`
4. The position of the change output as an `int32`

#### Tracepoint `coin_selection:attempting_aps_create_tx`

Is called when `CreateTransactionInternal` is called the second time for the optimistic
Avoid Partial Spends selection attempt. This is used to determine whether the next
tracepoints called are for the Avoid Partial Spends solution, or a different transaction.

Arguments passed:
1. Wallet name as `pointer to C-style string`

#### Tracepoint `coin_selection:aps_create_tx_internal`

Is called when the second `CreateTransactionInternal` with Avoid Partial Spends enabled completes.

Arguments passed:
1. Wallet name as `pointer to C-style string`
2. Whether the Avoid Partial Spends solution will be used as `bool`
3. Whether `CreateTransactionInternal` succeeded as` bool`
4. The expected transaction fee as an `int64`
5. The position of the change output as an `int32`

### Context `mempool`

#### Tracepoint `mempool:added`

Is called when a transaction is added to the node's mempool. Passes information
about the transaction.

Arguments passed:
1. Transaction ID (hash) as `pointer to unsigned chars` (i.e. 32 bytes in little-endian)
2. Transaction virtual size as `int32`
3. Transaction fee as `int64`

#### Tracepoint `mempool:removed`

Is called when a transaction is removed from the node's mempool. Passes information
about the transaction.

Arguments passed:
1. Transaction ID (hash) as `pointer to unsigned chars` (i.e. 32 bytes in little-endian)
2. Removal reason as `pointer to C-style String` (max. length 9 characters)
3. Transaction virtual size as `int32`
4. Transaction fee as `int64`
5. Transaction mempool entry time (epoch) as `uint64`

#### Tracepoint `mempool:replaced`

Is called when a transaction in the node's mempool is getting replaced by another.
Passes information about the replaced and replacement transactions.

Arguments passed:
1. Replaced transaction ID (hash) as `pointer to unsigned chars` (i.e. 32 bytes in little-endian)
2. Replaced transaction virtual size as `int32`
3. Replaced transaction fee as `int64`
4. Replaced transaction mempool entry time (epoch) as `uint64`
5. Replacement transaction ID or package hash as `pointer to unsigned chars` (i.e. 32 bytes in little-endian)
6. Replacement transaction virtual size as `int32`
7. Replacement transaction fee as `int64`
8. `bool` indicating if the argument 5. is a transaction ID or package hash (true if it's a transaction ID)

Note: In cases where a replacement transaction or package replaces multiple
existing transactions in the mempool, the tracepoint is called once for each
replaced transaction, with data of the replacement transaction or package
being the same in each call.

#### Tracepoint `mempool:rejected`

Is called when a transaction is not permitted to enter the mempool. Passes
information about the rejected transaction.

Arguments passed:
1. Transaction ID (hash) as `pointer to unsigned chars` (i.e. 32 bytes in little-endian)
2. Reject reason as `pointer to C-style String` (max. length 118 characters)

## Adding tracepoints to Bitcoin Core

Use the `TRACEPOINT` macro to add a new tracepoint. If not yet included, include
`util/trace.h` (defines the tracepoint macros) with `#include <util/trace.h>`.
Each tracepoint needs a `context` and an `event`. Please use `snake_case` and
try to make sure that the tracepoint names make sense even without detailed
knowledge of the implementation details. You can pass zero to twelve arguments
to the tracepoint. Each tracepoint also needs a global semaphore. The semaphore
gates the tracepoint arguments from being processed if we are not attached to
the tracepoint. Add a `TRACEPOINT_SEMAPHORE(context, event)` with the `context`
and `event` of your tracepoint in the top-level namespace at the beginning of
the file. Do not forget to update the tracepoint list in this document.

For example, the `net:outbound_message` tracepoint in `src/net.cpp` with six
arguments.

```C++
// src/net.cpp
TRACEPOINT_SEMAPHORE(net, outbound_message);
…
void CConnman::PushMessage(…) {
  …
  TRACEPOINT(net, outbound_message,
      pnode->GetId(),
      pnode->m_addr_name.c_str(),
      pnode->ConnectionTypeAsString().c_str(),
      sanitizedType.c_str(),
      msg.data.size(),
      msg.data.data()
  );
  …
}
```
If needed, an extra `if (TRACEPOINT_ACTIVE(context, event)) {...}` check can be
used to prepare somewhat expensive arguments right before the tracepoint. While
the tracepoint arguments are only prepared when we attach something to the
tracepoint, an argument preparation should never hang the process. Hashing and
serialization of data structures is probably fine, a `sleep(10s)` not.

```C++
// An example tracepoint with an expensive argument.

TRACEPOINT_SEMAPHORE(example, gated_expensive_argument);
…
if (TRACEPOINT_ACTIVE(example, gated_expensive_argument)) {
    expensive_argument = expensive_calulation();
    TRACEPOINT(example, gated_expensive_argument, expensive_argument);
}
```

### Guidelines and best practices

#### Clear motivation and use case
Tracepoints need a clear motivation and use case. The motivation should
outweigh the impact on, for example, code readability. There is no point in
adding tracepoints that don't end up being used.

#### Provide an example
When adding a new tracepoint, provide an example. Examples can show the use case
and help reviewers testing that the tracepoint works as intended. The examples
can be kept simple but should give others a starting point when working with
the tracepoint. See existing examples in [contrib/tracing/].

[contrib/tracing/]: ../contrib/tracing/

#### Semi-stable API
Tracepoints should have a semi-stable API. Users should be able to rely on the
tracepoints for scripting. This means tracepoints need to be documented, and the
argument order ideally should not change. If there is an important reason to
change argument order, make sure to document the change and update the examples
using the tracepoint.

#### eBPF Virtual Machine limits
Keep the eBPF Virtual Machine limits in mind. eBPF programs receiving data from
the tracepoints run in a sandboxed Linux kernel VM. This VM has a limited stack
size of 512 bytes. Check if it makes sense to pass larger amounts of data, for
example, with a tracing script that can handle the passed data.

#### `bpftrace` argument limit
While tracepoints can have up to 12 arguments, bpftrace scripts currently only
support reading from the first six arguments (`arg0` till `arg5`) on `x86_64`.
bpftrace currently lacks real support for handling and printing binary data,
like block header hashes and txids. When a tracepoint passes more than six
arguments, then string and integer arguments should preferably be placed in the
first six argument fields. Binary data can be placed in later arguments. The BCC
supports reading from all 12 arguments.

#### Strings as C-style String
Generally, strings should be passed into the `TRACEPOINT` macros as pointers to
C-style strings (a null-terminated sequence of characters). For C++
`std::strings`, [`c_str()`]  can be used. It's recommended to document the
maximum expected string size if known.


[`c_str()`]: https://www.cplusplus.com/reference/string/string/c_str/


## Listing available tracepoints

Multiple tools can list the available tracepoints in a `bitcoind` binary with
USDT support.

### GDB - GNU Project Debugger

To list probes in Bitcoin Core, use `info probes` in `gdb`:

```
$ gdb ./build/src/bitcoind
…
(gdb) info probes
Type Provider   Name             Where              Semaphore Object
stap net        inbound_message  0x000000000014419e 0x0000000000d29bd2 /build/src/bitcoind
stap net        outbound_message 0x0000000000107c05 0x0000000000d29bd0 /build/src/bitcoind
stap validation block_connected  0x00000000002fb10c 0x0000000000d29bd8 /build/src/bitcoind
…
```

### With `readelf`

The `readelf` tool can be used to display the USDT tracepoints in Bitcoin Core.
Look for the notes with the description `NT_STAPSDT`.

```
$ readelf -n ./build/src/bitcoind | grep NT_STAPSDT -A 4 -B 2
Displaying notes found in: .note.stapsdt
  Owner                 Data size	Description
  stapsdt              0x0000005d	NT_STAPSDT (SystemTap probe descriptors)
    Provider: net
    Name: outbound_message
    Location: 0x0000000000107c05, Base: 0x0000000000579c90, Semaphore: 0x0000000000d29bd0
    Arguments: -8@%r12 8@%rbx 8@%rdi 8@192(%rsp) 8@%rax 8@%rdx
…
```

### With `tplist`

The `tplist` tool is provided by BCC (see [Installing BCC]). It displays kernel
tracepoints or USDT probes and their formats (for more information, see the
[`tplist` usage demonstration]). There are slight binary naming differences
between distributions. For example, on
[Ubuntu the binary is called `tplist-bpfcc`][ubuntu binary].

[Installing BCC]: https://github.com/iovisor/bcc/blob/master/INSTALL.md
[`tplist` usage demonstration]: https://github.com/iovisor/bcc/blob/master/tools/tplist_example.txt
[ubuntu binary]: https://github.com/iovisor/bcc/blob/master/INSTALL.md#ubuntu---binary

```
$ tplist -l ./build/src/bitcoind -v
b'net':b'outbound_message' [sema 0xd29bd0]
  1 location(s)
  6 argument(s)
…
```