numactl --interleave=all ./testing_dgetrf -N 100 -N 1000 --range 10:90:10 --range 100:900:100 --range 1000:9000:1000 --range 10000:20000:2000
MAGMA 1.5.0  compiled for CUDA capability >= 3.5
CUDA runtime 7000, driver 7000. OpenMP threads 16. MKL 11.2.3, MKL threads 16. 
device 0: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
device 1: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
device 2: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
Usage: ./testing_dgetrf [options] [-h|--help]

ngpu 1
    M     N   CPU GFlop/s (sec)   GPU GFlop/s (sec)   |PA-LU|/(N*|A|)
=========================================================================
  100   100     ---   (  ---  )      0.02 (   0.04)     ---   
 1000  1000     ---   (  ---  )     24.88 (   0.03)     ---   
   10    10     ---   (  ---  )      0.03 (   0.00)     ---   
   20    20     ---   (  ---  )      0.21 (   0.00)     ---   
   30    30     ---   (  ---  )      0.37 (   0.00)     ---   
   40    40     ---   (  ---  )      0.96 (   0.00)     ---   
   50    50     ---   (  ---  )      1.30 (   0.00)     ---   
   60    60     ---   (  ---  )      1.85 (   0.00)     ---   
   70    70     ---   (  ---  )      1.09 (   0.00)     ---   
   80    80     ---   (  ---  )      2.21 (   0.00)     ---   
   90    90     ---   (  ---  )      2.34 (   0.00)     ---   
  100   100     ---   (  ---  )      3.12 (   0.00)     ---   
  200   200     ---   (  ---  )      1.57 (   0.00)     ---   
  300   300     ---   (  ---  )      2.93 (   0.01)     ---   
  400   400     ---   (  ---  )      5.34 (   0.01)     ---   
  500   500     ---   (  ---  )      6.85 (   0.01)     ---   
  600   600     ---   (  ---  )     11.65 (   0.01)     ---   
  700   700     ---   (  ---  )     13.60 (   0.02)     ---   
  800   800     ---   (  ---  )     15.60 (   0.02)     ---   
  900   900     ---   (  ---  )     24.71 (   0.02)     ---   
 1000  1000     ---   (  ---  )     28.77 (   0.02)     ---   
 2000  2000     ---   (  ---  )     81.33 (   0.07)     ---   
 3000  3000     ---   (  ---  )    160.16 (   0.11)     ---   
 4000  4000     ---   (  ---  )    230.79 (   0.18)     ---   
 5000  5000     ---   (  ---  )    307.33 (   0.27)     ---   
 6000  6000     ---   (  ---  )    401.87 (   0.36)     ---   
 7000  7000     ---   (  ---  )    484.96 (   0.47)     ---   
 8000  8000     ---   (  ---  )    556.84 (   0.61)     ---   
 9000  9000     ---   (  ---  )    465.87 (   1.04)     ---   
10000 10000     ---   (  ---  )    537.38 (   1.24)     ---   
12000 12000     ---   (  ---  )    666.13 (   1.73)     ---   
14000 14000     ---   (  ---  )    734.33 (   2.49)     ---   
16000 16000     ---   (  ---  )    800.74 (   3.41)     ---   
18000 18000     ---   (  ---  )    849.48 (   4.58)     ---   
20000 20000     ---   (  ---  )    884.50 (   6.03)     ---   
numactl --interleave=all ./testing_dgetrf_gpu -N 100 -N 1000 --range 10:90:10 --range 100:900:100 --range 1000:9000:1000 --range 10000:20000:2000
MAGMA 1.5.0  compiled for CUDA capability >= 3.5
CUDA runtime 7000, driver 7000. OpenMP threads 16. MKL 11.2.3, MKL threads 16. 
device 0: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
device 1: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
device 2: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
Usage: ./testing_dgetrf_gpu [options] [-h|--help]

    M     N   CPU GFlop/s (sec)   GPU GFlop/s (sec)   |PA-LU|/(N*|A|)
=========================================================================
  100   100     ---   (  ---  )      0.27 (   0.00)     ---  
 1000  1000     ---   (  ---  )     53.63 (   0.01)     ---  
   10    10     ---   (  ---  )      0.01 (   0.00)     ---  
   20    20     ---   (  ---  )      0.08 (   0.00)     ---  
   30    30     ---   (  ---  )      0.20 (   0.00)     ---  
   40    40     ---   (  ---  )      0.47 (   0.00)     ---  
   50    50     ---   (  ---  )      0.72 (   0.00)     ---  
   60    60     ---   (  ---  )      1.06 (   0.00)     ---  
   70    70     ---   (  ---  )      0.78 (   0.00)     ---  
   80    80     ---   (  ---  )      1.43 (   0.00)     ---  
   90    90     ---   (  ---  )      1.76 (   0.00)     ---  
  100   100     ---   (  ---  )      1.94 (   0.00)     ---  
  200   200     ---   (  ---  )      2.13 (   0.00)     ---  
  300   300     ---   (  ---  )      6.02 (   0.00)     ---  
  400   400     ---   (  ---  )     11.28 (   0.00)     ---  
  500   500     ---   (  ---  )     17.43 (   0.00)     ---  
  600   600     ---   (  ---  )     25.38 (   0.01)     ---  
  700   700     ---   (  ---  )     33.92 (   0.01)     ---  
  800   800     ---   (  ---  )     42.64 (   0.01)     ---  
  900   900     ---   (  ---  )     52.44 (   0.01)     ---  
 1000  1000     ---   (  ---  )     59.67 (   0.01)     ---  
 2000  2000     ---   (  ---  )    148.62 (   0.04)     ---  
 3000  3000     ---   (  ---  )    208.77 (   0.09)     ---  
 4000  4000     ---   (  ---  )    211.33 (   0.20)     ---  
 5000  5000     ---   (  ---  )    285.97 (   0.29)     ---  
 6000  6000     ---   (  ---  )    394.14 (   0.37)     ---  
 7000  7000     ---   (  ---  )    470.15 (   0.49)     ---  
 8000  8000     ---   (  ---  )    542.20 (   0.63)     ---  
 9000  9000     ---   (  ---  )    729.40 (   0.67)     ---  
10000 10000     ---   (  ---  )    764.43 (   0.87)     ---  
12000 12000     ---   (  ---  )    854.70 (   1.35)     ---  
14000 14000     ---   (  ---  )    930.72 (   1.97)     ---  
16000 16000     ---   (  ---  )    969.89 (   2.82)     ---  
18000 18000     ---   (  ---  )    997.58 (   3.90)     ---  
20000 20000     ---   (  ---  )   1022.62 (   5.22)     ---  
