ethfindgood - Man Page

Name

ethfindgood

Checks for hosts that can be pinged, can be accessed via SSH, and are active on the Intel(R) Ethernet Fabric. Produces a list of good hosts that meet all criteria. Typically used to identify good hosts for further testing and benchmarking during initial cluster staging and startup.

The resulting good file lists each good host exactly once and can be used as input to create mpi_hosts files for running mpi_apps and the NIC-SW cable test. The files alive, running, active, good, and bad are created in the selected directory, each listing the hosts that pass the corresponding criterion.
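As a minimal sketch of that workflow, the good file can be turned into a hosts file by dropping blank lines and comment lines. The function name make_mpi_hosts and the assumption that an mpi_hosts file is simply one host name per line are illustrative and are not taken from the tool itself:

```shell
#!/bin/sh
# Hypothetical helper (not part of ethfindgood): copy host names from a
# "good" file into an mpi_hosts-style file, skipping blank lines and
# lines that start with '#'. Assumes one host name per line.
make_mpi_hosts() {
    good_file=$1
    out_file=$2
    [ -r "$good_file" ] || { echo "cannot read $good_file" >&2; return 1; }
    # NF is 0 for blank lines; $1 is the first word on the line.
    awk 'NF && $1 !~ /^#/ { print $1 }' "$good_file" > "$out_file"
}

# Example call (paths are assumptions):
#   make_mpi_hosts /etc/eth-tools/good ./mpi_hosts
```

If several processes per host are needed, the resulting file would have to be adjusted to whatever format the MPI launcher in use expects.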

This command automatically generates the file FF_RESULT_DIR/punchlist.csv, which provides a concise summary of the bad hosts found. The file can be imported directly into Excel as a *.csv file. Alternatively, its contents can be cut and pasted into Excel, and the Data/Text to Columns toolbar can be used to separate the information into multiple columns at the semicolons.

A sample generated output is:

# ethfindgood

3 hosts will be checked

2 hosts are pingable (alive)

2 hosts are ssh'able (running)

2 total hosts have RDMA active on one or more fabrics (active)

1 hosts are alive, running, active (good)

2 hosts are bad (bad)

Bad hosts have been added to /root/punchlist.csv

# cat /root/punchlist.csv

2015/10/09 14:36:48;phs1fnivd13u07n4;Doesn't ping

2015/10/09 14:36:48;phs1fnivd13u07n4;Can't ssh

2015/10/09 14:36:48;phs1fnivd13u07n3;No active RDMA port

For a given run, a line is generated for each failing host. Hosts are reported exactly once for a given run. Therefore, a host that does not ping is NOT also listed as Can't ssh or No active RDMA port. There may be cases where ports are active for hosts that do not ping. However, the lack of ping often implies other fundamental issues, such as PXE boot problems or an inability to access DNS or DHCP to get the proper host name and IP address. Therefore, reporting additional failures for hosts that do not ping is typically of limited value.
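Because each punchlist line is semicolon-separated, the file is also easy to summarize from the shell. The following is a minimal sketch that counts entries per failure reason; the function name summarize_punchlist is hypothetical, and the three-field layout (timestamp;host;reason) is assumed from the sample above:

```shell
#!/bin/sh
# Hypothetical helper (not part of ethfindgood): count punchlist entries
# per failure reason. Assumes three semicolon-separated fields per line:
#   timestamp;host;reason
summarize_punchlist() {
    awk -F';' '
        NF >= 3 { count[$3]++ }
        END     { for (r in count) printf "%s;%d\n", r, count[r] }
    ' "$1" | LC_ALL=C sort
}

# Example call (path taken from the sample output above):
#   summarize_punchlist /root/punchlist.csv
```

The output keeps the semicolon separator, so it can be imported the same way as the punchlist itself.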

Syntax

ethfindgood [-R|-A] [-d dir] [-f hostfile] [-h 'hosts'] [-T timelimit]

Options

--help

Produces full help text.

-R

Skips the running test (SSH). Recommended if password-less SSH is not set up.

-A

Skips the active test. Recommended if the Intel(R) Ethernet Fabric Suite software or the fabric is not up.

-d dir

Specifies the directory in which to create the alive, active, running, good, and bad files. The default is the /etc/eth-tools directory.

-f hostfile

Specifies the file that lists the hosts in the cluster. The default is /etc/eth-tools/hosts.

-h hosts

Specifies the list of hosts to ping.

-T timelimit

Specifies the time limit, in seconds, for a host to respond to SSH. The default is 20 seconds.

Environment Variables

The following environment variables are also used by this command:

HOSTS

List of hosts, used if the -h option is not supplied.

HOSTS_FILE

File containing the list of hosts, used if neither -f nor -h is supplied.

FF_MAX_PARALLEL

Maximum concurrent operations.

Examples

ethfindgood

ethfindgood -f allhosts

ethfindgood -h 'arwen elrond'

HOSTS='arwen elrond' ethfindgood

HOSTS_FILE=allhosts ethfindgood
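The options can also be combined in a small wrapper. The following is a sketch only: the function name run_findgood is hypothetical, it uses just the -d and -f options described above, and it assumes that the good and bad files contain one host per line. The exit status of ethfindgood is not described here, so the wrapper ignores it:

```shell
#!/bin/sh
# Hypothetical wrapper (not part of ethfindgood): run ethfindgood into a
# chosen directory, then report how many hosts are in the good and bad
# files created there. Assumes one host per line in those files.
run_findgood() {
    out_dir=$1
    hostfile=$2
    mkdir -p "$out_dir" || return 1
    # Exit status is not documented in this page, so it is not checked.
    ethfindgood -d "$out_dir" -f "$hostfile"
    for f in good bad; do
        if [ -f "$out_dir/$f" ]; then
            n=$(wc -l < "$out_dir/$f" | tr -d ' ')
        else
            n=0
        fi
        echo "$f: $n"
    done
}

# Example call (directory and file names are assumptions):
#   run_findgood /tmp/staging allhosts
```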

Info

Intel Corporation Copyright(C) 2020 EFSFFCLIRG (Man Page)