ipmctl-start-diagnostic - Man Page

Starts a diagnostic test

Synopsis

ipmctl start [OPTIONS] -diagnostic [TARGETS]

Description

Starts one or more diagnostic tests on the PMem modules. If no test name is supplied, all tests are run.

Options

-h, -help

Displays help for the command.

-ddrt

Used to specify DDRT as the desired transport protocol for the current invocation of ipmctl.

-smbus

Used to specify SMBUS as the desired transport protocol for the current invocation of ipmctl.

Note: The -ddrt and -smbus options are mutually exclusive and may not be used together.

-lpmb

Used to specify large transport payload size for the current invocation of ipmctl.

-spmb

Used to specify small transport payload size for the current invocation of ipmctl.

Note: The -lpmb and -spmb options are mutually exclusive and may not be used together.

-o (text|nvmxml), -output (text|nvmxml)

Changes the output format. One of: "text" (default) or "nvmxml".
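
The two mutual-exclusion rules in this section can be checked before ipmctl is invoked. The following is a minimal sketch, not part of ipmctl: check_ipmctl_options is a hypothetical helper name, and it only inspects an option list; it never runs a command.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not an ipmctl feature).
# Succeeds when the option list avoids both documented conflicts:
#   -ddrt with -smbus, and -lpmb with -spmb.
# The body runs in a subshell, so its variables do not leak to the caller.
check_ipmctl_options() (
    ddrt=0 smbus=0 lpmb=0 spmb=0
    for opt in "$@"; do
        case $opt in
            -ddrt)  ddrt=1 ;;
            -smbus) smbus=1 ;;
            -lpmb)  lpmb=1 ;;
            -spmb)  spmb=1 ;;
        esac
    done
    if [ "$ddrt" -eq 1 ] && [ "$smbus" -eq 1 ]; then
        echo "error: -ddrt and -smbus may not be used together" >&2
        exit 1
    fi
    if [ "$lpmb" -eq 1 ] && [ "$spmb" -eq 1 ]; then
        echo "error: -lpmb and -spmb may not be used together" >&2
        exit 1
    fi
    exit 0
)
```

A script might call it as `check_ipmctl_options -smbus -spmb && ipmctl start -smbus -spmb -diagnostic Quick`, so that a conflicting option list is rejected before ipmctl starts.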

Targets

-diagnostic [Quick|Config|Security|FW]

Start a specific test by supplying its name. All tests are run by default. One of:

  • "Quick" - This test verifies that the PMem module host mailbox is accessible and that basic health indicators can be read and are currently reporting acceptable values.
  • "Config" - This test verifies that the BIOS platform configuration matches the installed hardware and that the platform configuration conforms to best known practices.
  • "Security" - This test verifies that all PMem modules have a consistent security state. It is a best practice to enable security on all PMem modules rather than just some.
  • "FW" - This test verifies that all PMem modules of a given model have consistent FW installed and other FW modifiable attributes are set in accordance with best practices.
    Note that the test cannot verify that the installed FW is the optimal version for a given PMem module model, only that it has been consistently applied across the system.

-dimm [DimmIDS]

Starts a diagnostic test on specific PMem modules by optionally supplying one or more comma-separated PMem module identifiers. The default is to start the specified tests on all manageable PMem modules. Only valid for the Quick diagnostic test.
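
The target rules above (an optional test name, and a comma-separated -dimm list that applies only to the Quick test) can be sketched as a small command builder. build_diag_command is a hypothetical helper, not part of ipmctl; it prints the command line rather than running it. It accepts only the exact test-name spellings shown in this page and, as a conservative assumption, rejects -dimm identifiers when no test name is given.

```shell
#!/bin/sh
# Hypothetical helper: build_diag_command [TEST [DIMM_ID...]]
# Prints an "ipmctl start -diagnostic ..." command line, or fails when
# PMem module identifiers are combined with anything other than Quick.
build_diag_command() (
    test_name=$1
    [ $# -gt 0 ] && shift
    case $test_name in
        ""|Quick|Config|Security|FW) ;;
        *) echo "error: unknown test '$test_name'" >&2; exit 1 ;;
    esac
    if [ $# -gt 0 ] && [ "$test_name" != Quick ]; then
        echo "error: -dimm is only valid for the Quick test" >&2
        exit 1
    fi
    cmd="ipmctl start -diagnostic"
    [ -n "$test_name" ] && cmd="$cmd $test_name"
    if [ $# -gt 0 ]; then
        # Join the remaining arguments with commas.
        ids=$(IFS=,; printf '%s' "$*")
        cmd="$cmd -dimm $ids"
    fi
    printf '%s\n' "$cmd"
)
```

For example, `build_diag_command Quick 0x0001 0x0011` prints `ipmctl start -diagnostic Quick -dimm 0x0001,0x0011`, where the two identifiers are placeholders.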

Examples

Starts all diagnostics.

ipmctl start -diagnostic

Starts the quick check diagnostic on PMem module 0x0001.

ipmctl start -diagnostic Quick -dimm 0x0001

Limitations

If a PMem module is unmanageable, the Quick test reports the reason, while the Config, Security, and FW tests skip unmanageable PMem modules.

Return Data

Each diagnostic generates one or more log messages. A successful test generates a single log message per PMem module indicating that no errors were found. A failed test might generate multiple log messages, each highlighting a specific error with all the relevant details. Each log contains the following information.

Test

The test name along with overall execution result. One of:

  • "Quick"
  • "Config"
  • "Security"
  • "FW"

State

The collective result state for each test. One of:

  • "Ok"
  • "Warning"
  • "Failed"
  • "Aborted"

Message

The message indicates the status of the test. One of:

  • "Ok"
  • "Failed"

SubTestName

The subtest name for the given Test.

Test Name   Valid SubTest Names
Quick       Manageability, Boot status, Health
Config      PMem module specs, Duplicate PMem module, System Capability, Namespace LSA, PCD
Security    Encryption status, Inconsistency
FW          FW Consistency, Viral Policy, Threshold check, System Time

State

The severity of the error for each sub-test displayed with SubTestName. One of:

  • "Ok"
  • "Warning"
  • "Failed"
  • "Aborted"
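
A script that consumes these results may want to reduce several sub-test states to a single value. The sketch below does that under two assumptions that this page does not state: that the states order by severity as Ok, Warning, Failed, Aborted, and that a combined result is the most severe sub-test state. state_rank and worst_state are hypothetical helper names.

```shell
#!/bin/sh
# ASSUMPTION: severity order Ok < Warning < Failed < Aborted.
# Prints a numeric rank for a state name; fails for unknown names.
state_rank() {
    case $1 in
        Ok)      echo 0 ;;
        Warning) echo 1 ;;
        Failed)  echo 2 ;;
        Aborted) echo 3 ;;
        *)       return 1 ;;
    esac
}

# Prints the most severe of the given states ("Ok" when none are given).
# Runs in a subshell so its variables stay private.
worst_state() (
    worst=Ok
    for state in "$@"; do
        rank=$(state_rank "$state") || {
            echo "error: unknown state '$state'" >&2
            exit 1
        }
        if [ "$rank" -gt "$(state_rank "$worst")" ]; then
            worst=$state
        fi
    done
    printf '%s\n' "$worst"
)
```

With these assumptions, `worst_state Ok Warning Ok` prints `Warning`.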

Events are generated as a result of invoking the Start Diagnostics command in order to analyze the Intel™ Optane™ PMem module for potential issues.

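Diagnostic events may fall into the following categories:

  • Quick health check
  • Platform configuration check
  • Security check
  • Firmware consistency and settings check

Each event includes the following pieces of information:

  • Code - the numeric event code
  • Severity - one of Info, Warning, or Error
  • Message - the event text, with numbered placeholders such as [1]
  • Arguments - the meaning of each numbered placeholder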

The following sections list each of the possible events grouped by category of the event.
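
In the sections that follow, the listed event codes fall into distinct hundreds ranges: 5xx for the quick health check, 6xx for the platform configuration check, 8xx for the security check, and 9xx for the firmware consistency and settings check. The sketch below maps a code to its category on that basis. event_category is a hypothetical helper, and treating every code in a hundreds range as belonging to that category is an assumption drawn from the codes listed here, not a documented guarantee.

```shell
#!/bin/sh
# Hypothetical helper: print the event category for a three-digit event code.
# ASSUMPTION: categories are identified by the leading digit, as suggested
# by the codes listed in this page (500-545, 600-633, 800-805, 900-911).
event_category() {
    case $1 in
        5[0-9][0-9]) echo "Quick Health Check" ;;
        6[0-9][0-9]) echo "Platform Configuration Check" ;;
        8[0-9][0-9]) echo "Security Check" ;;
        9[0-9][0-9]) echo "Firmware Consistency and Settings Check" ;;
        *)           echo "Unknown"; return 1 ;;
    esac
}
```

For example, `event_category 608` prints `Platform Configuration Check`, while a code outside the four ranges prints `Unknown` and returns a non-zero status.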

Quick Health Check Events

The quick health check diagnostic verifies that the Intel™ Optane™ PMem module’s host mailboxes are accessible and that basic health indicators can be read and are currently reporting acceptable values.

Table 1. Quick Health Check Events

Code  Severity  Message  Arguments
500  Info     The quick health check succeeded.
501  Warning  The quick health check detected that PMem module [1] is not manageable because subsystem vendor ID [2] is not supported. UID: [3]
  1. PMem module Handle
  2. Subsystem Vendor ID
  3. PMem module UID
502  Warning  The quick health check detected that PMem module [1] is not manageable because subsystem device ID [2] is not supported. UID: [3]
  1. PMem module Handle
  2. Subsystem Device ID
  3. PMem module UID
503  Warning  The quick health check detected that PMem module [1] is not manageable because firmware API version [2] is not supported. UID: [3]
  1. PMem module Handle
  2. FW API version
  3. PMem module UID
504  Warning  The quick health check detected that PMem module [1] is reporting a bad health state [2]. UID: [3]
  1. PMem module Handle
  2. Actual Health State
  3. PMem module UID
505  Warning  The quick health check detected that PMem module [1] is reporting a media temperature of [2] C which is above the alarm threshold [3] C. UID: [4]
  1. PMem module Handle
  2. Actual Media Temperature
  3. Media Temperature Threshold
  4. PMem module UID
506  Warning  The quick health check detected that PMem module [1] is reporting percentage remaining at [2]% which is less than the alarm threshold [3]%. UID: [4]
  1. PMem module Handle
  2. Actual Percentage Remaining
  3. Percentage Remaining Threshold
  4. PMem module UID
507  Warning  The quick health check detected that PMem module [1] is reporting reboot required. UID: [2]
  1. PMem module Handle
  2. PMem module UID
511  Warning  The quick health check detected that PMem module [1] is reporting a controller temperature of [2] C which is above the alarm threshold [3] C. UID: [4]
  1. PMem module Handle
  2. Actual Controller Temperature
  3. Controller Temperature Threshold
  4. PMem module UID
513  Error    The quick health check detected that the boot status register of PMem module [1] is not readable. UID: [2]
  1. PMem module Handle
  2. PMem module UID
514  Error    The quick health check detected that the firmware on PMem module [1] is reporting that the media is not ready. UID: [2]
  1. PMem module Handle
  2. PMem module UID
515  Error    The quick health check detected that the firmware on PMem module [1] is reporting an error in the media. UID: [2]
  1. PMem module Handle
  2. PMem module UID
519  Error    The quick health check detected that PMem module [1] failed to initialize BIOS POST testing. UID: [2]
  1. PMem module Handle
  2. PMem module UID
520  Error    The quick health check detected that the firmware on PMem module [1] has not initialized successfully. The last known Major:Minor Checkpoint is [2]. UID: [3]
  1. PMem module Handle
  2. Major checkpoint : Minor checkpoint in Boot Status Register
  3. PMem module UID
523  Error    The quick health check detected that PMem module [1] is reporting a viral state. The PMem module is now read-only. UID: [2]
  1. PMem module Handle
  2. PMem module UID
529  Warning  The quick health check detected that PMem module [1] is reporting that it has no package spares available. UID: [2]
  1. PMem module Handle
  2. PMem module UID
530  Info     The quick health check detected that the firmware on PMem module [1] experienced an unsafe shutdown before its latest restart. UID: [2]
  1. PMem module Handle
  2. PMem module UID
533  Error    The quick health check detected that the firmware on PMem module [1] is reporting that the AIT DRAM is not ready. UID: [2]
  1. PMem module Handle
  2. PMem module UID
534  Error    The quick health check detected that the firmware on PMem module [1] is reporting that the media is disabled. UID: [2]
  1. PMem module Handle
  2. PMem module UID
535  Error    The quick health check detected that the firmware on PMem module [1] is reporting that the AIT DRAM is disabled. UID: [2]
  1. PMem module Handle
  2. PMem module UID
536  Error    The quick health check detected that the firmware on PMem module [1] failed to load successfully. UID: [2]
  1. PMem module Handle
  2. PMem module UID
538  Error    PMem module [1] is reporting that the DDRT IO Init is not complete. UID: [2]
  1. PMem module Handle
  2. PMem module UID
539  Error    PMem module [1] is reporting that the mailbox interface is not ready. UID: [2]
  1. PMem module Handle
  2. PMem module UID
540  Error    An internal error caused the quick health check to abort on PMem module [1]. UID: [2]
  1. PMem module Handle
  2. PMem module UID
541  Error    The quick health check detected that PMem module [1] is busy. UID: [2]
  1. PMem module Handle
  2. PMem module UID
542  Error    The quick health check detected that the platform FW did not map a region to SPA on PMem module [1]. ACPI NFIT NVPMem module State Flags Error Bit 6 Set. UID: [2]
  1. PMem module Handle
  2. PMem module UID
543  Error    The quick health check detected that PMem module [1] DDRT Training is not complete/failed. UID: [2]
  1. PMem module Handle
  2. PMem module UID
544  Error    PMem module [1] is reporting that the DDRT IO Init is not started. UID: [2]
  1. PMem module Handle
  2. PMem module UID
545  Error    The quick health check detected that the ROM on PMem module [1] has failed to complete initialization, last known Major:Minor Checkpoint is [2].
  1. PMem module Handle
  2. Major checkpoint : Minor checkpoint in Boot Status Register
  3. PMem module UID
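
Each message in the table uses numbered placeholders ([1], [2], and so on) that correspond to the numbered arguments listed beneath it. The following sketch shows how such a template can be expanded. fill_event_message is a hypothetical helper, not something ipmctl provides, and the argument values in the usage note are placeholders rather than real output. It relies on awk for literal (non-regex) substitution; note that awk interprets backslash escapes in values passed with -v.

```shell
#!/bin/sh
# Hypothetical helper: fill_event_message TEMPLATE ARG1 [ARG2...]
# Replaces every literal "[n]" in TEMPLATE with the n-th argument.
# Runs in a subshell so its variables stay private.
fill_event_message() (
    if [ $# -lt 1 ]; then
        echo "usage: fill_event_message TEMPLATE [ARG...]" >&2
        exit 1
    fi
    message=$1
    shift
    n=1
    for value in "$@"; do
        # index()/substr() give literal matching, so "[" and "]" are
        # not treated as regular-expression characters.
        message=$(printf '%s\n' "$message" |
            awk -v ph="[$n]" -v val="$value" '{
                rest = $0
                out = ""
                while ((p = index(rest, ph)) > 0) {
                    out = out substr(rest, 1, p - 1) val
                    rest = substr(rest, p + length(ph))
                }
                print out rest
            }')
        n=$((n + 1))
    done
    printf '%s\n' "$message"
)
```

For instance, calling it with the template for code 541 and the placeholder values `0x0001` and `EXAMPLE-UID` yields "The quick health check detected that PMem module 0x0001 is busy. UID: EXAMPLE-UID".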

Platform Configuration Check Events

This diagnostic test group verifies that the BIOS platform configuration matches the installed hardware and the platform configuration conforms to best known practices.

Table 2. Platform Configuration Check Events

Code  Severity  Message  Arguments
600  Info     The platform configuration check succeeded.
601  Info     The platform configuration check detected that there are no manageable PMem modules.
606  Info     The platform configuration check detected that PMem module [1] is not configured. UID: [2]
  1. PMem module Handle
  2. PMem module UID
608  Error    The platform configuration check detected [1] PMem modules installed on the platform with the same serial number [2].
  1. Number of PMem modules with duplicate serial numbers.
  2. The duplicate serial number
609  Info     The platform configuration check detected that PMem module [1] has a goal configuration that has not yet been applied. A system reboot is required for the new configuration to take effect. UID: [2]
  1. PMem module Handle
  2. PMem module UID
618  Error    The platform configuration check detected that a PMem module with physical ID [1] is present in the system but failed to initialize. UID: [2]
  1. PMem module handle in the SMBIOS table
  2. PMem module UID
621  Error    The platform configuration check detected PCD contains invalid data on PMem module [1]. UID: [2]
  1. PMem module Handle
  2. PMem module UID
622  Error    The platform configuration check was unable to retrieve the namespace information.
623  Warning  The platform configuration check detected that the BIOS settings do not currently allow memory provisioning from this software.
624  Error    The platform configuration check detected that the BIOS could not apply the configuration goal on PMem module [1] because of errors in the goal data. The detailed status is COUT table status: [2] [3], Partition change table status: [4], Interleave change table 1 status: [5], Interleave change table 2 status: [6].
  1. PMem module Handle
  2. Validation Status
  3. Text error code corresponding to the status code
  4. Partition Size Change Status
  5. Interleave Change Status
  6. Interleave Change Status
625  Error    The platform configuration check detected that the BIOS could not apply the configuration goal on PMem module [1] because the system has insufficient resources. The detailed status is COUT table status: [2] [3], Partition change table status: [4], Interleave change table 1 status: [5], Interleave change table 2 status: [6].
  1. PMem module Handle
  2. Validation Status
  3. Text error code corresponding to the status code
  4. Partition Size Change Status
  5. Interleave Change Status
  6. Interleave Change Status
626  Error    The platform configuration check detected that the BIOS could not apply the configuration goal on PMem module [1] because of a firmware error. The detailed status is COUT table status: [2] [3], Partition change table status: [4], Interleave change table 1 status: [5], Interleave change table 2 status: [6].
  1. PMem module Handle
  2. Validation Status
  3. Text error code corresponding to the status code
  4. Partition Size Change Status
  5. Interleave Change Status
  6. Interleave Change Status
627  Error    The platform configuration check detected that the BIOS could not apply the configuration goal on PMem module [1] for an unknown reason. The detailed status is COUT table status: [2] [3], Partition change table status: [4], Interleave change table 1 status: [5], Interleave change table 2 status: [6].
  1. PMem module Handle
  2. Validation Status
  3. Text error code corresponding to the status code
  4. Partition Size Change Status
  5. Interleave Change Status
  6. Interleave Change Status
628  Error    The platform configuration check detected that interleave set [1] is broken because the PMem modules were moved [2].
  1. Interleave set index ID
  2. List of moved PMem modules.
629  Error    The platform configuration check detected that the platform does not support ADR and therefore data integrity is not guaranteed on the PMem modules.
630  Error    An internal error caused the platform configuration check to abort.
631  Error    The platform configuration check detected that interleave set [1] is broken because the PMem module with UID: [2] is missing from location (Socket-Die-iMC-Channel-Slot) [3].
  1. Interleave set index ID
  2. PMem module UID
  3. Location ID
632  Error    The platform configuration check detected that interleave set [1] is broken because the PMem module with UID: [2] is misplaced. It is currently in location (Socket-Die-iMC-Channel-Slot) [3] and should be moved to (Socket-Die-iMC-Channel-Slot) [4].
  1. Interleave set index ID
  2. PMem module UID
  3. Location ID
  4. Location ID
633  Error    The platform configuration check detected that the BIOS could not fully map memory on PMem module [1] because of an error in current configuration. The detailed status is CCUR table status: [2] [3].
  1. PMem module Handle
  2. Current Configuration Status
  3. Text error code corresponding to the status code

Security Check Events

The security check diagnostic test group verifies that all Intel™ Optane™ PMem modules have a consistent security state.

Table 3. Security Check Events

Code  Severity  Message  Arguments
800  Info     The security check succeeded.
801  Info     The security check detected that there are no manageable PMem modules.
802  Warning  The security check detected that security settings are inconsistent [1].
  1. A comma-separated list of the number of PMem modules in each security state
804  Info     The security check detected that security is not supported on all PMem modules.
805  Error    An internal error caused the security check to abort.

Firmware Consistency and Settings Check Events

This test group verifies that all PMem modules of a given subsystem device ID have consistent FW installed and other FW modifiable attributes are set in accordance with best practices.

Table 4. Firmware Consistency and Settings Check Events

Code  Severity  Message  Arguments
900  Info     The firmware consistency and settings check succeeded.
901  Info     The firmware consistency and settings check detected that there are no manageable PMem modules.
902  Warning  The firmware consistency and settings check detected that firmware version on PMem modules [1] with subsystem device ID [2] is non-optimal, preferred version is [3].
  1. Comma-separated list of PMem module UIDs
  2. Subsystem device ID
  3. Preferred firmware version
903  Warning  The firmware consistency and settings check detected that PMem module [1] is reporting a non-critical media temperature threshold of [2] C which is above the fatal threshold [3] C. UID: [4]
  1. PMem module Handle
  2. Current media temperature threshold
  3. Fatal media temperature threshold
  4. PMem module UID
904  Warning  The firmware consistency and settings check detected that PMem module [1] is reporting a non-critical controller temperature threshold of [2] C which is above the fatal threshold [3] C. UID: [4]
  1. PMem module Handle
  2. Current controller temperature threshold
  3. Fatal controller temperature threshold
  4. PMem module UID
905  Warning  The firmware consistency and settings check detected that PMem module [1] is reporting a percentage remaining of [2]% which is below the recommended threshold [3]%. UID: [4]
  1. PMem module Handle
  2. Current percentage remaining threshold
  3. Recommended percentage remaining threshold
  4. PMem module UID
906  Warning  The firmware consistency and settings check detected that PMem modules have inconsistent viral policy settings.
910  Error    An internal error caused the firmware consistency and settings check to abort.
911  Warning  The firmware consistency and settings check detected that PMem modules have inconsistent first fast refresh settings.

Referenced By

ipmctl(1).

2023-01-19 ipmctl