dbrvstatdiff man page
dbrvstatdiff — evaluate statistical differences between two random variables
Synopsis
dbrvstatdiff [-f format] [-c ConfRating] [-h HypothesizedDifference] m1c sd1c n1c m2c sd2c n2c
OR
dbrvstatdiff [-f format] [-c ConfRating] m1c n1c m2c n2c
Description
Produce statistics on the difference of sets of random variables. If a hypothesized difference is given (with "-h"
), to does a Student's t-test.
Random variables are specified by:
- "m1c", "m2c"
The column names of means of random variables.
- "sd1c", "sd2c"
The column names of standard deviations of random variables.
- "n1c", "n2c"
Counts of number of samples for each random variable
These values can be computed with dbcolstats.
Creates up to ten new columns:
- "diff"
The difference of RV 2 - RV 1.
- "diff_pct"
The percentage difference (RV2-RV1)/1
- "diff_conf_{half,low,high}" and "diff_conf_pct_{half,low,high}"
The half half confidence intervals and low and high values for absolute and relative confidence.
- "t_test"
The T-test value for the given hypothesized difference.
- "t_test_result"
Given the confidence rating, does the test pass? Will be either "rejected" or "not-rejected".
- "t_test_break"
The hypothesized value that is break-even point for the T-test.
- "t_test_break_pct"
Break-even point as a percent of m1c.
Confidence intervals are not printed if standard deviations are not provided. Confidence intervals assume normal distributions with common variances.
T-tests are only computed if a hypothesized difference is provided. Hypothesized differences should be proceeded by <=, >=, =. T-tests assume normal distributions with common variances.
Options
- -c FRACTION or --confidence FRACTION
Specify FRACTION for the confidence interval. Defaults to 0.95 for a 95% confidence factor (alpha = 0.05).
- -f FORMAT or --format FORMAT
Specify a printf(3)-style format for output statistics. Defaults to
"%.5g"
.- -h DIFF or --hypothesis DIFF
Specify the hypothesized difference as
"DIFF"
, where"DIFF"
is something like"<=0"
or">=0"
, etc.
This module also supports the standard fsdb options:
- -d
Enable debugging output.
- -i or --input InputSource
Read from InputSource, typically a file name, or
"-"
for standard input, or (if in Perl) a IO::Handle, Fsdb::IO or Fsdb::BoundedQueue objects.- -o or --output OutputDestination
Write to OutputDestination, typically a file name, or
"-"
for standard output, or (if in Perl) a IO::Handle, Fsdb::IO or Fsdb::BoundedQueue objects.- --autorun or --noautorun
By default, programs process automatically, but Fsdb::Filter objects in Perl do not run until you invoke the run() method. The
"--(no)autorun"
option controls that behavior within Perl.- --help
Show help.
- --man
Show full manual.
Sample Usage
Input
#fsdb title mean2 stddev2 n2 mean1 stddev1 n1 example6.12 0.17 0.0020 5 0.22 0.0010 4
Command
cat data.fsdb | dbrvstatdiff mean2 stddev2 n2 mean1 stddev1 n1
Output
#fsdb title mean2 stddev2 n2 mean1 stddev1 n1 diff diff_pct diff_conf_half diff_conf_low diff_conf_high diff_conf_pct_half diff_conf_pct_low diff_conf_pct_high example6.12 0.17 0.0020 5 0.22 0.0010 4 0.05 29.412 0.0026138 0.047386 0.052614 1.5375 27.874 30.949 # | dbrvstatdiff mean2 stddev2 n2 mean1 stddev1 n1
Input 2
(example 7.10 from Scheaffer and McClave):
#fsdb title x2 sd2 n2 x1 sd1 n1 example7.10 9 35.22 24.44 9 31.56 20.03
Command 2
dbrvstatdiff -h '<=0' x2 sd2 n2 x1 sd1 n1
Output 2
#fsdb title n1 x1 sd1 n2 x2 sd2 diff diff_pct diff_conf_half diff_conf_low diff_conf_high diff_conf_pct_half diff_conf_pct_low diff_conf_pct_high t_test t_test_result example7.10 9 35.22 24.44 9 31.56 20.03 3.66 0.11597 4.7125 -1.0525 8.3725 0.14932 -0.033348 0.26529 1.6465 not-rejected # | /global/us/edu/ucla/cs/ficus/users/johnh/BIN/DB/dbrvstatdiff -h <=0 x2 sd2 n2 x1 sd1 n1
Case 3
A common use case is to have one file with a set of trials from two experiments, and to use dbrvstatdiff to see if they are different.
Input 3:
#fsdb case trial value a 1 1 a 2 1.1 a 3 0.9 a 4 1 a 5 1.1 b 1 2 b 2 2.1 b 3 1.9 b 4 2 b 5 1.9
Command 3
cat two_trial.fsdb | dbmultistats -k case value | dbcolcopylast mean stddev n | dbrow '_case eq "b"' | dbrvstatdiff -h '=0' mean stddev n copylast_mean copylast_stddev copylast_n | dblistize
Output 3:
#fsdb -R C case mean stddev pct_rsd conf_range conf_low conf_high conf_pct sum sum_squared min max n copylast_mean copylast_stddev copylast_n diff diff_pct diff_conf_half diff_conf_low diff_conf_high diff_conf_pct_half diff_conf_pct_low diff_conf_pct_high t_test t_test_result t_test_break t_test_break_pct case: b mean: 1.98 stddev: 0.083666 pct_rsd: 4.2256 conf_range: 0.10387 conf_low: 1.8761 conf_high: 2.0839 conf_pct: 0.95 sum: 9.9 sum_squared: 19.63 min: 1.9 max: 2.1 n: 5 copylast_mean: 1.02 copylast_stddev: 0.083666 copylast_n: 5 diff: -0.96 diff_pct: -48.485 diff_conf_half: 0.12202 diff_conf_low: -1.082 diff_conf_high: -0.83798 diff_conf_pct_half: 6.1627 diff_conf_pct_low: -54.648 diff_conf_pct_high: -42.322 t_test: -18.142 t_test_result: rejected t_test_break: -1.082 t_test_break_pct: -54.648 # | dbmultistats -k case value # | dbcolcopylast mean stddev n # | dbrow _case eq "b" # | dbrvstatdiff -h =0 mean stddev n copylast_mean copylast_stddev copylast_n # | dbfilealter -R C
(So one cannot say that they are statistically equal.)
See Also
Fsdb. dbcolstats. dbcolcopylast.
AUTHOR and COPYRIGHT
Copyright (C) 1991-2015 by John Heidemann <johnh@isi.edu>
This program is distributed under terms of the GNU general public license, version 2. See the file COPYING with the distribution for details.