20030312, 15:09  #12 
Mar 2003
3^{4} Posts 
Prime95 secret
It seems I begin understand why other preshmen's progs are
very slow. 1) The slowest: don't use FFT at all, use scholar multiplication O(n^2) instead. 2) Slow: Use two FFT every iteration 3) Fast: Use only once FFT at startup and than perform all calculation with FFTimages in complex field. Is 3rd item correct and does Prime95 do so? 
20030312, 17:04  #13  
∂^{2}ω=0
Sep 2002
República de California
2·13·449 Posts 
Quote:
I used the revised version to estimate what FFT length would be needed for a Pe'pin test (or p1/ecm factoring work) on F31, before Kruppa & Forbes found the small factor of that number. The error analysis (with the key adjustable asymptotic constant set based on data from runs with GIMPSsized numbers) predicts that N = 2^27 floating doubles should me more than adequate to test F31sized numbers  in fact the slightly smaller 15*2^23 should also suffice. Since I didn't have a nonpowerof2 runlength capability in my Fermat code (although one can use such for Fermatmod DWT, by combining the standard negacyclic weighting with a Mersennelike variablebase DWT  the final version of the paper also discusses this), I used my Mersenne code with M(M31) as input for the test. Sure enough, ~100K iterations with lengths 15*2^23 and 2^27 yielded no fractional errors anywhere close to dangerous (or at least what we consider dangerous based on lots of practical experience with GIMPS), but 14*2^23 quickly yielded fatal RO errors. The same error analysis also predicts that 16 bits per input will become too large around F3536, at which point being able to do a nonpowerof2runlength Fermatmod DWT will become very important  otherwise we'd have to drop down to 8 bits per input, which would more than double our runtime. 

20030312, 17:19  #14  
∂^{2}ω=0
Sep 2002
República de California
2·13·449 Posts 
Re: Prime95 secret
Quote:
2) No, all the best programs still need 2 FFTs per iteration  they just do them very efficiently. 3) I'm not sure what you mean here. If you're referring to doing all the iterations entirely in Fourier space, that has been much discussed, but simply doesn't work  you need to come out of Fourier space to do error removal (in a floatingpointFFTbased scheme), normalization of digits and carry propagation. Then you need to take the resulting new set of input digits and get back into Fourier space to do the squaring, and so on. There's your two FFTs  no way around that, I'm afraid. So there's no black magic, no incredibly sophisticated mathematical sleight of hand (besides the FFTbased multiply and the DWT, both of which are really nifty but certainly not miraculous by any stretch)  just a small amount of higher mathematics and a hell of a lot of hard work. (Very much in the spirit of Thomas Edison's famous phrase about genius being 10% inspiration and 90% perspiration. Many would argue that the 10% figure is much too high. :)) 

20030312, 19:28  #15  
∂^{2}ω=0
Sep 2002
República de California
11674_{10} Posts 
Quote:


20030312, 19:40  #16 
Oct 2002
2B_{16} Posts 
I feel rather bad, now, about referring to your paper (or rather, the estimate contained in it) so roughly... but my comments were entirely accurate when I wrote them. (In fact, I seem to recall mentioning it to Richard C. at the time, and being told something to the effect of "well, you might be right, but we're not going to change it now"  but that was a couple years ago, so I might be getting confused with something else.)

20030312, 20:10  #17  
∂^{2}ω=0
Sep 2002
República de California
2×13×449 Posts 
Quote:


20030312, 20:38  #18  
Oct 2002
43 Posts 
Quote:
Quote:
Quote:


20030313, 11:31  #19  
Mar 2003
121_{8} Posts 
Quote:
But it's still incredible that Prime95 is 30times faster than such suboptimal progs (instead 2 or 4 times). How long digits are safe enough and optimal for 80 and 64 bit FFT? (I believe that 16 and 8 bit per "digit") 

20030313, 18:26  #20  
∂^{2}ω=0
Sep 2002
República de California
11674_{10} Posts 
Quote:
For a guide to how many bits per diigt can safely be used, look at http://www.mersenne.org/bench.htm . For instance, at 1024K (that's the number of *real* elements in the FFT  if you're using a complexdata FFT that means 19 radix2 passes, not 20) one can use ~19 bits per input digit. Of course to achieve this in practice requires careful coding  you need to use the balanceddigit representation described in the 1994 Crandall/Fagin Math. Comp. paper, use the DWT to halve the vector length over a zeropadded implementation, and preferably use a higherradix FFT to redce the number of passes through the data (= faster) and reduce the number of complex multiplies by FFT sincos data (= more accurate). 

20030319, 18:14  #21 
Mar 2003
121_{8} Posts 
IBWDT
*** ewmayer
Would you be so kind as explaining me how IBWDT works? According to its name I've tried to multiply 79 by 83 (mod 100) using 101^0.5 (i.e. 10.04987562112089) as base. In this system 79 and 83 will be, for instance, (7 8.65087065215377) and (8 2.60099503103288). Two FFT and one iFFT go here, which lead us to (87.41393043446033 78.50087158036013) The cyclic carry prolonging gives us: (4.96504984437232 7.10186661139301) In decimal system it will be: 57.000000000000 That is rare luck then irrationality was eliminated at the very end, but more often was not! What is the correct algorithm? Thank you in advance!!! 
20030320, 22:10  #22  
Aug 2002
3×83 Posts 
Re: Prime95 secret
Quote:


Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
why is http://www.mersenne.org/ so slow  Unregistered  Information & Answers  19  20120417 03:12 
Slow Down?  R.D. Silverman  GMPECM  55  20111016 17:28 
How hot is too hot? Slow is too slow?  petrw1  Hardware  13  20081110 23:25 
Slow computer  Housemouse  Hardware  7  20080215 18:18 
Really slow machines?  Doorbasher  Hardware  5  20040823 22:18 