Sunday, 22 December 2013

Though you are not sentient and do not comprehend, I nonetheless consider you a true and valued friend.

...some quick notes, mostly for myself. I've been calculating massive amounts of cross correlation at work recently and the CPU-time needed for that is significant so I was looking into using the GPU instead. I found that GeForce GTX 670 does what I need about 5 times faster than i7-2600K (though utilization of the cores was poor in this code, but nice finger exercise none the less).

Small update: I managed to get a speedup of more than 8-10 using GPU with large number of repetitions.
More update: Using single precision floating point operations gives 2x 240 MS/s with the GPU.


% This program compares the difference in speed between calculating
% a cross correlation of 2x8 MS vectors using CPU and GPU
% on i7-2600K @ 3.4 GHz with 8 GB of RAM vs. GeForce GTX 670 
% the speedup is around 4.9, cross correlation can be efficiently 
% calculated through the use of fast fourier transform.
% If only cross-spectrum is of interest then inverse fft is
% unnecessary and some time can be saved.

clear all;
a = randn(2^23, 1);
b = randn(2^23, 1);

tic;
A = fft(a);
B = fft(b);
C = ifft(times(conj(A), B), 'symmetric'); % xcorr of CH1 and CH2
aa = dot(a, a); % zero delay autocorr of CH1
bb = dot(b, b); % zero delay autocorr of CH2
timecpu = toc

tic;
a = gpuArray(a);
b = gpuArray(b);
A = fft(a);
B = fft(b);
C = ifft(times(conj(A), B), 'symmetric');
aa = dot(a, a);
bb = dot(b, b);
C = gather(C);
aa = gather(aa);
bb = gather(bb);
timegpu = toc

speed = timecpu/timegpu

---------------


N=21;
n=100;
n1 = randn(2^N, 1);
n2 = randn(2^N, 1);
tic;
for z=1:n
    a = gpuArray(n1);
    b = gpuArray(n2);
    A = fft(a);
    B = fft(b);
    C = ifft(times(conj(A), B), 'symmetric');
    C = gather(C); % if you don't gather -> 96 MSPS (x2)
end
timegpu = toc
speed = n*2^N/timegpu/1e6


timegpu =

    2.8006


speed =

   74.8825



2x 75 MS/s, if you don't gather (for example just average on the GPU) 2x 96 MS/s, 2x240 MS/s with single precision floating point instructions ... randn(2^N, 1, 'single').

http://en.wikipedia.org/wiki/Cross_correlation

--------------------


Intel Core i7-2600K (4x 3.4 GHz)
GeForce GTX 670 (1344 CUDA-cores, 915 MHz, 6 GHz memory)

Averaging performance including data transfer overheads :

CPU single precision dot-product (1-point xcorr) 2x 600 MS/s
CPU double precision dot-product (1-point xcorr) 2x 300 MS/s
CPU single precision cross-correlation 2x  23 MS/s
CPU double precision cross-correlation 2x  13 MS/s

GPU single precision dot-product (1-point xcorr) 2x 433 MS/s
GPU double precision dot-product (1-point xcorr) 2x 227 MS/s
GPU single precision cross-correlation 2x 225 MS/s
GPU double precision cross-correlation 2x  90 MS/s


--------------------

In the spirit of the Hobbit...
Some fiddling with matlab again. I wasn't aware that java could be integrated this way into matlab, but nice to know...

% Simple RSA cryptosystem testing in matlab

one = java.math.BigInteger.ONE;
p = one.probablePrime(256, java.util.Random);
q = one.probablePrime(256, java.util.Random);


n = p.multiply(q); % first part of the public key (n, e)
tmp0 = p.subtract(one);
tmp1 = q.subtract(one);
totient = tmp0.multiply(tmp1);
tmp0 = java.math.BigInteger('2');
tmp1 = totient.divide(tmp0);
e = tmp1.nextProbablePrime(); % second part of the public key (n, e)
d = e.modInverse(totient); % the private key

m = java.math.BigInteger('123321243425345'); % message

c = m.modPow(e, n) % encryption
m = c.modPow(d, n) % decryption

http://en.wikipedia.org/wiki/RSA_(algorithm)

--------------------


% Fermat primality test

one = java.math.BigInteger.ONE;
p = java.math.BigInteger('221');
n = p.subtract(java.math.BigInteger.ONE);

a = one.nextProbablePrime();
a.modPow(n, p) % a^n mod p

a = a.nextProbablePrime();
a.modPow(n, p) % this needs to be always one if p is prime
% ...

http://en.wikipedia.org/wiki/Fermat_primality_test
http://en.wikipedia.org/wiki/AKS_primality_test

http://en.memory-alpha.org/wiki/Ode_to_Spot

-----------------

Efficient modpow algorithm for large numbers...

a = java.math.BigInteger('2988348162058574136915891421498819466320163312926952423791023078876139');
b = java.math.BigInteger('2351399303373464486466122544523690094744975233415544072992656881240319');
m = java.math.BigInteger('87814412832655818143772433337418883492663173730772486450699007318453048545183');

ZERO = java.math.BigInteger.ZERO;
ONE = java.math.BigInteger.ONE;
s = ONE;
t = a;
u = b;

while u.equals(ZERO) == 0
    if u.and(ONE).equals(ONE) == 1
        s = s.multiply(t).mod(m);
    end
    u = u.shiftRight(1)
    t = t.multiply(t).mod(m);
end

a.modPow(b, m)
s

No comments:

Post a Comment