This directory, paral, contains tests which exercise parallel
features of the ABINIT package.

Copyright (C) 1998-2007 ABINIT group (XG,LSi)
This file is distributed under the terms of the
GNU General Public License, see ~abinit/COPYING
or http://www.gnu.org/copyleft/gpl.txt .
For the initials of contributors, see ~abinit/doc/developers/contributors.txt .

=============================================================================

Most of these tests are designed primarily to exercise parts of the code
quickly, NOT necessarily to give physically sensible results.
For tests of correctness, see the Tutorial directory.
For greater speed, some tests are not run to full convergence.
Also the quality parameters (especially ecut) are minimal, i.e.
the calculations are underconverged.

Tests A, B, D, E, H, I, J, and K are NOT intended to be used as a measure of the
parallelisation speed-up : they contain too much initialisation.
Tests C, F, G, M and N should be OK for some speed-up testing : they are more
realistic than the others. They represent the case of a large number
of k points. In this respect, test C could be modified, to have a still finer
grid of k points (ngkpt and nkpt should be changed).
Also, one might test localrdwf=0 as well as localrdwf=1.

WARNING : tests F and G use outputs of test C, so test C must be run
before tests F or G can be run. One run of test C is enough
to initialize all further uses of tests F and G in the same test directory.
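
The ordering constraint above can be sketched as follows. This is only
an illustration in Python; the names PREREQS and run_order are
hypothetical and are not part of the Run script or of ABINIT.

```python
# Hypothetical sketch: order a list of requested tests so that test C
# is run before tests F and G, unless C was already run in this directory.
PREREQS = {"F": ["C"], "G": ["C"]}

def run_order(requested, already_run=()):
    """Return the requested tests, with missing prerequisites inserted first."""
    done = set(already_run)
    order = []
    for test in requested:
        # Insert each prerequisite that has not been run yet.
        for dep in PREREQS.get(test, ()):
            if dep not in done:
                order.append(dep)
                done.add(dep)
        # Schedule the test itself, unless it is already scheduled.
        if test not in order:
            order.append(test)
            done.add(test)
    return order

print(run_order(["F", "G"]))
print(run_order(["G"], already_run=["C"]))
```

In this sketch, a single earlier run of test C (passed in already_run)
is enough for any later request of F or G, as stated in the warning.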

Test "kpt+spin" is intended for speed-up testing. Its use is
explained in the last part of this file.

==============================================================================

To run these tests :

1. Submit the 'Run' script. See the header of the Run file, for a
   description of the procedure.
   The script 'Run' will create a subdirectory with the name_of_machine and the
   date, where all the results will be placed.
   Beware : only some machine names are allowed, since the procedure
   to launch parallel execution differs from machine to machine.

2. In the directory so created, you will find, for each test case that you have
   run, a log file (with the name of the test case), an output
   file, and a 'diff.xxx' file, automatically created by running
   'diff' against the output files in the "Refs" subdirectory.
   The "Refs" subdirectory contains output files from a recent version of the code.
   There may be large differences in timing but there should only
   be minor differences in the output of physical quantities.

3. There is also a global report file, generated by the use of the
   fldiff script. Its name is fldiff.report . See the install_notes
   in the Infos directory for information about the use of this file.
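
What "minor differences in the output of physical quantities" can mean
in practice is sketched below in Python. This is NOT the fldiff script
and does not reproduce its rules; the function names and the tolerance
value are assumptions chosen for illustration only.

```python
import re

# Matches integers and floating-point numbers, including Fortran-style
# exponents written with D (e.g. 1.0D-03).
_NUM = re.compile(r"[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eEdD][-+]?\d+)?")

def numbers(line):
    """Extract the numeric values found on one line of output."""
    return [float(tok.replace("d", "e").replace("D", "e"))
            for tok in _NUM.findall(line)]

def lines_agree(ref_line, new_line, tol=1.0e-8):
    """True if both lines hold the same count of numbers and each pair
    agrees within tol (absolute for small values, relative for large ones).
    The default tol is an assumed value, not one taken from ABINIT."""
    ref, new = numbers(ref_line), numbers(new_line)
    if len(ref) != len(new):
        return False
    return all(abs(x - y) <= tol * max(1.0, abs(x), abs(y))
               for x, y in zip(ref, new))

print(lines_agree("etotal -8.8662238960E+00", "etotal -8.8662238961E+00"))
```

Timing lines would be expected to fail such a check between machines,
which is why they are of little significance in the diff files.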

**********

Test cases :

set A :  Si in diamond structure; 60 special points in core; low ecut.
set B :  Si in diamond structure; 60 special points, not in core; low ecut.
set C :  FCC Al metallic; 10 special points
set D :  Molybdenum slab (5 atoms+3 vacuum), with ixc=1. 4 k-points, in core.
            Use iprcel=45 for SCF cycle (iscf=3).
set E :  GaAs in zinc-blende structure; GS and RF calculation
            (similar to test v2 #30, except that only two q points are
             considered)
            localrdwf=1
set F :  FCC Al metallic : 2 non-self-consistent calculations with 256 k-points,
         for q=Gamma and q=1/4 -1/8 1/8, from the results of set C.
set G :  FCC Al metallic : phonon RF calculation at q=1/4 -1/8 1/8 .
         Needs the output files of set C.
set H :  GaAs in zinc-blende structure; GS and RF calculation
            (similar to set E, except that localrdwf=0)
set I :  Fe in FCC structure; GS and RF calculation (RF at q=0 0 0)
            Test the parallelism on both spin and k points
set J :  GaAs in zinc-blende structure; GS and RF calculation
            (similar to set E, except that mkmem,mkqmem,mk1mem=0)
set K :  GaAs in hypothetical wurtzite structure; GS and RF calculation
            parallelism over the perturbations (contributed by PPlaenitz)
            (not activated in the automatic testing suite at present - v4.7)
set L :  Si3N4, parallelism over G basis set (contributed by THoefler).
            (not activated in the automatic testing suite at present - v4.7)
set M :  Si, Bulk, 2 atoms, GW calculation, parallelism over k points (contributed by RShaltaf).
set N :  Si, Bulk, 2 atoms, GW calculation, parallelism over bands (contributed by RShaltaf).

For each set, one usually has the following cases :
0  use the sequential code (abinis), as a check
1  use the parallel code (abinip), with only one processing element
2  use abinip, with two threads and two processing elements
4  use abinip, with four threads and four processing elements
10 use abinip, with ten threads and ten processing elements
   (not for tests D and I, as there are only 4 k points for test D,
    and 2 spins and 2 k points for test I)
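
The restriction on tests D and I follows from a simple count: with
parallelism over k points and spins, no more processors can be used
than there are (k point, spin) pairs. A minimal Python sketch of that
count is given below; the function names are hypothetical and are not
part of ABINIT.

```python
# Hypothetical helpers illustrating the limit on k-point/spin parallelism.
def max_useful_procs(nkpt, nspin=1):
    """Number of (k point, spin) pairs, i.e. the largest useful processor count."""
    return nkpt * nspin

def usable_cases(cases, nkpt, nspin=1):
    """Keep only the cases whose processor count does not exceed the limit."""
    limit = max_useful_procs(nkpt, nspin)
    return [n for n in cases if n <= limit]

parallel_cases = [1, 2, 4, 10]
print("test D:", usable_cases(parallel_cases, nkpt=4))
print("test I:", usable_cases(parallel_cases, nkpt=2, nspin=2))
```

Both tests have only four such pairs, so the case with ten processing
elements is dropped for them.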

=============================================================================

To test the speed-up on more than 10 processors, one
should use the files t_kpt+spin.in and t_kpt+spin.files .
They allow the parallelisation over k-points and spins to be tested.

One is advised first to execute the above-mentioned test I,
and to check that the output file is correct by consulting the fldiff.report file,
where the automatic analysis is recorded. The test "kpt+spin"
contains parameters that make the run much longer, and much
more suitable for parallelisation, while still being quite realistic.

Supposing that test I went smoothly, the kpt+spin
test should be performed as follows :
(1) Go inside the directory created for test I
 (likely ,name_of_machine_yyyymmdd)
(2) Execute :
 cp ../t_kpt+spin.in .
 cp ../t_kpt+spin.files .
(3) Run your job with a command like
 /usr/local/mpi-pgi4/bin/mpirun -np 32 ../../abinip < t_kpt+spin.files >& log
 (this being for 32 processors).
 The main output file is called t_kpt+spin.out .
 At its end, it contains an analysis of the CPU and wall clock time.
 Run sequentially, this job takes between 3000 secs (IFC compiler)
 and 4000 secs (PGI compiler) on a PC Intel 2.8 GHz bought in May 2005.
 With 182 k-points and 2 spins, it could be parallelized over up to 364 processors,
 but the sequential part of the job is estimated at a bit more than 1%, so that the
 speed-up should saturate below 100 .
(4) You can edit the file t_kpt+spin.in, and increase
 ngkpt, or set ndtset 2 , or uncomment the definitions
 of wfoptalg and nbdblock, as suggested in the file,
 then go back to step 3 and run the job.
 Increasing the number of k points (through ngkpt, e.g. ngkpt 16 16 16) will
 allow a better maximal speed-up. However, the test becomes
 less realistic.
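
The saturation estimate given in step (3) can be reproduced with
Amdahl's law, which this file does not name but which matches the
reasoning: with a sequential fraction s, the speed-up on N processors
is 1 / (s + (1 - s) / N), and can never exceed 1 / s. The Python
sketch below assumes s = 0.01 exactly, whereas the text only says
"a bit more than 1%"; the function name is hypothetical.

```python
# Sketch of Amdahl's law, assuming a sequential fraction of exactly 1%.
def amdahl_speedup(nproc, serial_fraction):
    """Ideal speed-up on nproc processors for a given sequential fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / nproc)

serial_fraction = 0.01  # assumed value; the estimate above is slightly larger
for nproc in (32, 364):
    print(nproc, round(amdahl_speedup(nproc, serial_fraction), 1))

# Upper bound reached only in the limit of infinitely many processors.
print("limit:", 1.0 / serial_fraction)
```

Under this assumption the speed-up on 364 processors is slightly under
80, and the bound of 100 is never reached. A sequential fraction
larger than 1% lowers both figures.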

You can get more information about the t_kpt+spin.in and
t_kpt+spin.files files, and the corresponding output files, by reading the
http://www.abinit.org/Infos/abinis_help.html help file.

==============================================================================
