umr - Man Page

AMDGPU Userspace Register Debugger

Description

umr is a userspace tool that reads, displays, and writes AMDGPU device MMIO, PCIE, SMC, and DIDT registers.  It can autodetect and scan AMDGPU devices (SI and up).

Device Selection

--database-path,  -dbp <path>

Specify a database path for register, ip, and asic model data.

--gpu,  -g <asicname>(@<instance> | =<pcidevice>)

Select a gpu by ASIC name and either the instance number or the PCI bus identifier.  For instance, "raven1@1" would pick the raven1 device in the 2nd DRI instance slot.  Similarly, "raven1=0000:06:00.0" would pick a raven1 device with the PCI bus address '0000:06:00.0'.
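The two selector forms can be illustrated with a short Python sketch.  The helper name parse_gpu_selector and its return shape are hypothetical and not part of umr; the sketch only mirrors the '@' and '=' forms described above, and umr's own parser may behave differently on malformed input.

```python
def parse_gpu_selector(selector):
    """Split a --gpu selector into (asicname, kind, value).

    kind is "instance" for the '@' form and "pci" for the '=' form.
    Hypothetical helper for illustration only.
    """
    if "@" in selector:
        asic, instance = selector.split("@", 1)
        return asic, "instance", int(instance)
    if "=" in selector:
        asic, pci = selector.split("=", 1)
        return asic, "pci", pci
    raise ValueError("expected <asicname>@<instance> or <asicname>=<pcidevice>")


print(parse_gpu_selector("raven1@1"))
print(parse_gpu_selector("raven1=0000:06:00.0"))
```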

--instance,  -i <number>

Pick a device instance to work with.  Defaults to the 0'th device.  The instance refers to a directory under /sys/kernel/debug/dri/ where 0 is the first card probed.

--force,  -f <number>

Force a PCIE Device ID in hex or by asic name.  This is used in case the amdgpu driver is not yet loaded or a display is not yet attached.  A '.' prefix will specify a virtual device which is handy for looking up register decodings for a device not present in the system, for instance, '.vega10'.

--pci <device>

Force a specific PCI device using the domain:bus:slot.function format in hex. This is useful when more than one GPU is available. If the amdgpu driver is loaded the corresponding instance will be automatically detected.

--by-pci <device>

Like --pci but still uses the traditional debugfs path to interface with the hardware.  This is useful for interacting with APIs that identify hardware by the PCI bus address.

--gfxoff,  -go <0 | 1>

Turn on or off GFXOFF on select hardware.  A non-zero value enables the GFXOFF feature and a zero value disables it.

--vm-partition,  -vmp <-1, 0...n>

Select a VM partition for all GPUVM accesses.  Default is -1 which refers to the 0'th instance of the VM hub which is not the same as specifying '0'.  Values above -1 are for ASICs with multiple IP instances.

--vgpr-granularity,  -vgpr <-1, 0...n>

Specify the VGPR size granularity as a power of 2, e.g., '2' means 4 DWORDs per increment.
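As a worked example of the power-of-2 rule, the following Python sketch converts a granularity value into DWORDs per increment.  The function name is hypothetical.  The option also accepts -1, but its meaning is not stated here, so the sketch simply rejects negative values.

```python
def vgpr_increment_dwords(granularity):
    """Return the number of DWORDs per VGPR increment for a power-of-2 granularity."""
    if granularity < 0:
        raise ValueError("negative granularity is not covered by this sketch")
    return 1 << granularity  # 2 ** granularity


print(vgpr_increment_dwords(2))  # 4
```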

--option,  -O <string>[,<string>,...]

Specify options to the tool.  Multiple options can be specified as comma separated strings.  Options should be specified before --update or --force commands (among others) to enable options specified.

quiet
    Disable various informative but not required (for functionality) outputs.

read_smc
    Enable scanning of SMC registers.

bits
    Enables displaying bitfields for scanned blocks.

bitsfull
    Enables displaying bitfields using their entire path for scanned blocks.

empty_log
    Empties the MMIO log after reading it.

follow
    Causes the --logscan command to repeatedly produce output without
    exiting.

no_follow_ib
    Instruct the --ring-stream command to not attempt to follow IBs pointed to by the packets
    in the ring.

use_pci
    Enable PCI access for MMIO instead of using debugfs.  Used by the --read,
    --scan, --top, --write, and --writebit commands.  Does not currently
    support multiple instances of the same GPU (PCI device ID).  Note that access
    to non-MMIO registers might be disabled when using this flag.

use_colour
    Enable colour output for --top command, scales from blue, green, yellow, to red.  Also
    accepted is 'use_color'.

no_kernel
    Disable using kernel files to access the device.  Implies 'use_pci'.  This is meant to
    be used only if the KMD is hung or otherwise not working correctly.  Using it on live systems
    may result in race conditions.

verbose
    Enable verbose diagnostics (used in --vram).

halt_waves
    Halt/resume all waves while reading wave status.

disasm_early_term
    Terminate shader disassembly when first s_endpgm is hit.  This is required for
    older UMDs (or non-mesa UMDs) that don't use the quintuple 0xBF9F0000 to signal the true
    end of a shader.

no_disasm
    Disable shader disassembler logic (text is still output, but LLVM is not used to decode it).  Useful
    if the linked llvm-dev doesn't support the hardware being debugged.  Avoids segfaults/asserts.

disasm_anyways
    Enable shader disassembly in --waves even if the rings aren't halted.

wave64
    Enable full 64 wave disassembly.

full_shader
    Enable full shader disassembly in --waves when '-O bits' is used and the shader is found in
    a gfx or compute ring.

no_fold_vm_decode
    Disable folding of PDEs when VM decoding multiple pages of memory.  By default,
    when subsequent pages are decoded, PDEs that match previous pages are omitted to cut down
    on the verbosity of the output.  This option disables the folding and prints the full chain of
    PDEs for every page decoded.

no_scan_waves
    Disable scanning wave data during --ring-stream output.

force_asic_file
    Force using a database .asic file matching in pci.did instead of IP discovery.

Bank Selection

--bank,  -b <se> <sh> <instance>

Select a GRBM se/sh/instance bank in decimal.  Can use 'x' to denote a broadcast selection.

--sbank,  -sb <me> <pipe> <queue> [vmid]

Select a SRBM me/pipe/queue bank in decimal.  VMID is optional (default: 0).

--cbank,  -cb <context_reg_bank>

Select a context register bank (value is multiplied by 0x1000).  Used for context registers in the range 0xA000..0xAFFF.
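The multiplication can be shown with a tiny Python sketch.  The function name is hypothetical, and how umr applies the resulting offset to a register address is not described here, so the sketch only computes the multiple.

```python
def context_bank_offset(bank):
    """Return the offset implied by a --cbank value (value multiplied by 0x1000)."""
    if bank < 0:
        raise ValueError("bank must be non-negative")
    return bank * 0x1000


print(hex(context_bank_offset(2)))  # 0x2000
```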

Device Information

--config,  -c

Print out configuration data read from the kernel driver.

--enumerate,  -e

Enumerate all AMDGPU supported devices.

--list-blocks,  -lb

List all blocks attached to the asic that have been detected.

--list-regs,  -lr <string>

List all registers in an IP block.  The '-O bits' option can be used to list bitfields as well.

Register Access

--lookup,  -lu <address_or_regname> <number>

Look up an MMIO register by address and bitfield decode the value specified (with 0x prefix) or by register name.  The register name string must include the ipname, e.g., uvd6.mmUVD_CONTEXT_ID.

--write,  -w <string> <number>

Write a value specified in hex to a register specified with a complete register path in the form <asicname.ipname.regname>, for example, fiji.uvd6.mmUVD_CGC_GATE.  The value of asicname and/or ipname can be * to simplify scripting.  This command can be used multiple times to write to multiple registers in a single invocation.

--writebit,  -wb <string> <number>

Write a value specified in hex to a register bitfield specified with a complete register path as in the --write command.

--read,  -r <string>

Read a value from a register specified by a register path to stdout. This command uses the same syntax as the --write command but also allows * for the regname field to read an entire block.  Additionally, a * can be appended to a register name to read any register that contains a partial match.  For instance, "*.vcn10.ADDR*" would read any register from the 'VCN10' block which contains 'ADDR' in the name.
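The wildcard rules can be sketched in Python as follows.  This is only an illustration of the behaviour described above, not umr's actual matcher: the function name is hypothetical, case handling is simplified to exact comparison, and the register name mmEXAMPLE_ADDR_LO is a made-up placeholder.

```python
def path_matches(pattern, path):
    """Check an asicname.ipname.regname path against a --read style pattern."""
    p_asic, p_ip, p_reg = pattern.split(".", 2)
    asic, ip, reg = path.split(".", 2)
    # '*' in the asicname or ipname field matches anything.
    if p_asic != "*" and p_asic != asic:
        return False
    if p_ip != "*" and p_ip != ip:
        return False
    # A bare '*' regname reads the entire block.
    if p_reg == "*":
        return True
    # A trailing '*' matches any register whose name contains the text.
    if p_reg.endswith("*"):
        return p_reg[:-1] in reg
    return p_reg == reg


print(path_matches("*.vcn10.ADDR*", "raven1.vcn10.mmEXAMPLE_ADDR_LO"))
print(path_matches("*.vcn10.ADDR*", "fiji.uvd6.mmUVD_CGC_GATE"))
```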

--scan,  -s <string>

Scan and print an IP block by name, for example, uvd6 or carrizo.uvd6. Can be used multiple times in a single invocation.

Device Utilization

--top,  -t

Summarize GPU utilization.  Can select a SE block with --bank.  The relevant options are use_colour and use_pci.

--waves,  -wa [ <ring_name> | <vmid>@<addr>.<size> ]

Print out information about any active CU waves.  Note that if GFX power gating is enabled this command may result in a GPU hang, although this is unlikely unless it is invoked very rapidly.  Unlike the wave count reading in --top, this command will operate regardless of whether GFX PG is enabled or not.  Can use '-O bits' to decode the wave bitfields.  An optional ring name can be specified (default: gfx) to search for pointers to active shaders to find extra debugging information.  Alternatively, an IB can be specified by a vmid, address, and size (in hex bytes) triplet.

--profiler,  -prof [pixel= | vertex= | compute=]<nsamples> [ring]

Capture 'nsamples' samples of wave data.  Optionally specify a ring to use when searching for IBs that point to shaders; the default is 'gfx'.  A shader type (pixel, vertex, or compute) can also be selected so that only shaders of that type are profiled.

Virtual Memory Access

VMIDs are specified in umr as 16 bit numbers where the lower 8 bits indicate the hardware VMID and the upper 8 bits indicate which VM space to use.

0 - GFX hub

1 - MM hub

2 - VC0 hub

3 - VC1 hub

For instance, 0x107 would specify the 7'th VMID on the MM hub.
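The encoding above can be written out as a short Python sketch.  The function names and the HUBS dictionary keys are hypothetical labels chosen for illustration; only the bit layout and hub numbers come from the text above.

```python
# Hub numbers as listed above; the string keys are illustrative labels.
HUBS = {"gfx": 0, "mm": 1, "vc0": 2, "vc1": 3}


def encode_vmid(hub, hw_vmid):
    """Pack a hub name and hardware VMID into the 16 bit value umr expects."""
    if not 0 <= hw_vmid <= 0xFF:
        raise ValueError("hardware VMID must fit in 8 bits")
    return (HUBS[hub] << 8) | hw_vmid


def decode_vmid(value):
    """Split a 16 bit umr VMID into (hub_number, hardware_vmid)."""
    return (value >> 8) & 0xFF, value & 0xFF


print(hex(encode_vmid("mm", 7)))  # 0x107
print(decode_vmid(0x107))         # (1, 7)
```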

--vm-decode,  -vm vmid@<address> <num_of_pages>

Decode page mappings at a specified address (in hex) from the VMID specified. The VMID can be specified in hexadecimal (with leading '0x') or in decimal. Implies '-O verbose' for the duration of the command so does not require it to be manually specified.

--vm-read,  -vr [vmid@]<address> <size>

Read 'size' bytes (in hex) from the address specified (in hexadecimal) from VRAM to stdout.  Optionally specify the VMID (in decimal or in hex with a 0x prefix) treating the address as a virtual address instead.  Can use 'use_pci' to directly access VRAM.

--vm-write,  -vw [vmid@]<address> <size>

Write 'size' bytes (in hex) to the address specified (in hexadecimal) to VRAM from stdin.

--vm-write-word,  -vww [vmid@]<address> <data>

Write a 32-bit word 'data' (in hex) to a given address (in hex) in host machine order.

--vm-disasm,  -vdis [<vmid>@]<address> <size>

Disassemble 'size' bytes (in hex) from a given address (in hex).  The size can be specified as zero to have umr try to compute the shader size.

Ring and PM4 Decoding

--ring-stream,  -RS <string>[range]

Read the contents of the ring named by the string amdgpu_ring_<string>, i.e. without the amdgpu_ring_ prefix.  By default it reads and prints the entire ring.  A range is optional and has the format '[start:end]'.  The starting and ending addresses are non-negative integers or the '.' (dot) symbol, which indicates the rptr when on the left side and the wptr when on the right side of the range.  For instance, "-RS gfx" prints the entire gfx ring, "-RS gfx[0:16]" prints the contents from 0 to 16 inclusive, and "-RS gfx[.]" or "-RS gfx[.:.]" prints the range [rptr,wptr].  When one of the range limits is a number while the other is the dot, the number indicates the relative range before or after the corresponding ring pointer.  For instance, "-RS sdma0[16:.]" prints the words [wptr-16, wptr] of the SDMA0 ring, and "-RS sdma1[.:32]" prints the double-words [rptr, rptr+32] of the SDMA1 ring.  The contents of the ring are always interpreted when they can be.
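The range rules can be summarised with a Python sketch that resolves a --ring-stream argument into a start and end position.  This is an illustration under stated assumptions, not umr's parser: the function name is hypothetical, rptr, wptr, and ring_size are supplied by the caller, the "entire ring" case is assumed to span 0 to ring_size-1, and wrap-around at the ring boundary is not modelled.

```python
import re


def resolve_ring_range(spec, rptr, wptr, ring_size):
    """Return (ring_name, start, end) for an argument such as 'gfx[.:32]'."""
    m = re.fullmatch(r"([^\[\]]+)(?:\[(\.|\d+)(?::(\.|\d+))?\])?", spec)
    if m is None:
        raise ValueError("unrecognised ring specification: " + spec)
    name, left, right = m.groups()
    if left is None:
        # No range given: the entire ring (assumed to be 0..ring_size-1).
        return name, 0, ring_size - 1
    if right is None:
        # Only the "[.]" shorthand is described for a single-element range.
        if left != ".":
            raise ValueError("a single-element range must be '[.]'")
        right = "."
    if left == "." and right == ".":
        return name, rptr, wptr
    if left == ".":
        # '.' on the left is the rptr; the number is relative to it.
        return name, rptr, rptr + int(right)
    if right == ".":
        # '.' on the right is the wptr; the number is relative to it.
        return name, wptr - int(left), wptr
    return name, int(left), int(right)


print(resolve_ring_range("sdma0[16:.]", rptr=40, wptr=100, ring_size=1024))
```

With rptr=40 and wptr=100, "sdma0[16:.]" resolves to the span from 84 to 100, matching the [wptr-16, wptr] rule.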

--dump-ib,  -di [vmid@]address length [pm]

Dump an IB packet at an address with an optional VMID.  The length is specified in bytes.  The type of decoder <pm> is optional and defaults to PM4 packets. Can specify '4' for PM4 packets, '3' for SDMA packets, '2' for MES packets, '1' for VPE packets, and '5' for UMSCH packets.

--dump-ib-file,  -df filename [pm]

Dump an IB stored in a file as a series of hexadecimal DWORDS, one per line.  If the filename ends in .bin the file is treated as binary.  If the filename ends in .ring the file is treated as a ring copy and the first 12 bytes are skipped.  Can optionally specify '3' for SDMA packets, '2' for MES packets, '1' for VPE packets, and '5' for UMSCH packets.  The default is PM4.
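The three file layouts can be sketched in Python as a loader that turns file contents into a list of 32-bit words.  The function name is hypothetical, and two details are assumptions rather than documented behaviour: binary data is read as little-endian, and a .ring file is treated as binary after its 12 byte header.

```python
import struct


def load_ib_words(filename, data):
    """Convert raw file bytes into 32-bit words based on the file extension."""
    if filename.endswith(".ring"):
        data = data[12:]  # ring copy: skip the first 12 bytes
    if filename.endswith((".bin", ".ring")):
        count = len(data) // 4
        # Assumption: words are stored little-endian.
        return list(struct.unpack("<%dI" % count, data[:count * 4]))
    # Default: text file with one hexadecimal DWORD per line.
    return [int(token, 16) for token in data.decode("ascii").split()]


print(load_ib_words("ib.txt", b"c0001000\n00000001\n"))
```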

--header-dump,  -hd [HEADER_DUMP_reg]

Dump the contents of the HEADER_DUMP buffer and decode the opcode into a human readable string.

--print-cpc,  -cpc

Dump CPC register data.

--print-sdma,  -sdma

Dump SDMA register data.

--logscan,  -ls

Read and display contents of the MMIO register log.  Usually specified with '-O bits,follow,empty_log' to enable continual dumping of the trace log.
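Since options should be given before the commands they affect, a script that drives umr might assemble its argument list as in the Python sketch below.  The helper name build_umr_argv is hypothetical.  The sketch only builds the list and does not run umr.

```python
def build_umr_argv(options, command_args):
    """Build a umr argument list with -O options placed before the command."""
    argv = ["umr"]
    if options:
        # Multiple options are passed as one comma separated string.
        argv += ["-O", ",".join(options)]
    return argv + list(command_args)


print(build_umr_argv(["bits", "follow", "empty_log"], ["--logscan"]))
```

The resulting list could be handed to subprocess.run on a system where umr is installed and the caller has access to debugfs.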

Power and Clock

--power,  -p

Read the clocks, temperature, and GPU load at runtime.  The 'use_colour' option colourizes the output.

--clock-scan,  -cs [clock]

Scan the current hierarchy value of each clock.  By default the hierarchy values of all clocks are listed.  If a clock is named, e.g. sclk, only that clock is listed.

--clock-manual,  -cm [clock] [value]

Set the value of the corresponding clock.  Use the -cs command to check the hierarchy values of a clock, then use -cm with a value to set the clock.

--clock-high,  -ch

Set power_dpm_force_performance_level to high.

--clock-low,  -cl

Set power_dpm_force_performance_level to low.

--clock-auto,  -ca

Set power_dpm_force_performance_level to auto.

--ppt-read,  -pptr [ppt_field_name]

Read powerplay table value and print it to stdout.  This command will print all the powerplay table information or the corresponding string in powerplay table.

--gpu-metrics,  -gm [delay]

Print the GPU metrics table for the device, optionally continuously read every 'delay' milliseconds.

Video BIOS Information

--vbios-info,  -vi

Print Video BIOS information.

Test Vector Generation

--test-log,  -tl <filename>

Log all MMIO/memory reads to a file.

--test-harness,  -th <filename>

Use a test harness file instead of reading from hardware.

RUMR Commands

--rumr-client <server>

Run as a RUMR client connecting to 'server', e.g. tcp://127.0.0.1:9000.  You can also use the 'RUMR_SERVER_ADDR' environment variable to instruct umr to connect as a client.  With the environment variable set you don't need to specify --rumr-client.

--rumr-server <server>

Run as a RUMR server binding to 'server', e.g. tcp://127.0.0.1:9000.

KFD Support

--runlist,  -rls <node>

Dump any runlists for the specified KFD node.

Notes

- The "Waves" field in the DRM section of --top only works if GFX PG has been disabled.  Otherwise, GPU hangs occur frequently.  When PG is enabled it will read a constant 0.

Environmental Variables

UMR_LOGGER
   Directory to output "umr.log" file when capturing samples with the --top command.

UMR_DATABASE_PATH
   Should be set to the top directory of the database tree used for register, IP, and ASIC model data.

RUMR_SERVER_ADDR
   Specifies the server address the rumr client should connect to.  This can be set to avoid needing to add --rumr-client to the command line.

Files

${CMAKE_INSTALL_PREFIX}/share/bash-completion/completions/umr contains completion for bash shells. You'd normally source this file in your ~/.bashrc.

${CMAKE_INSTALL_PREFIX}/share/umr/database contains database files for ASICs, IPs, and registers. UMR_DATABASE_PATH is usually set to point to here.

Info

February 2022 AMD (c) 2022 User Manuals