The Portable Hardware Locality (hwloc) software package provides a
portable abstraction (across OS, versions, architectures, ...) of the
hierarchical topology of modern architectures, including NUMA memory
nodes, sockets, shared caches, cores and simultaneous multithreading.
It also gathers various system attributes such as cache and memory
information as well as the locality of I/O devices such as network
interfaces, InfiniBand HCAs or GPUs. It primarily aims at helping
applications with gathering information about modern computing
hardware so as to exploit it accordingly and efficiently.

Optional dependences:

numactl:
  libnuma provides conversion helpers between hwloc CPU setsand
  libnuma-specific types, such as bitmasks. It helps you use libnuma
  memory-binding functions with hwloc CPU sets.

cudatoolkit:
  CUDA enables interoperability with NVIDIA CUDA Driver and Runtime
  interfaces. For instance, it may return the list of processors near
  NVIDIA GPUs. Note that if I/O device discovery is enabled, GPUs may
  also appear as PCI objects in the topology.