Date: Fri, 25 Apr 2008 12:39:27 -0400
From: Neil Horman <nhorman@tuxdriver.com>
To: Cliff Wickman <cpw@sgi.com>
Cc: nhorman@tuxdriver.com, linux-numa@vger.kernel.org
Subject: Re: [PATCH] fix cpumask ncpus parameter handling

In a follow-up to my previous post, I tracked down the bug involving the
Cpus_allowed field of proc_pid_status.  It's not a kernel bug after all, but
rather a subtlety of what the field reports.  As one might expect, Cpus_allowed
reports the cpus on which the associated task is permitted to run.  However,
there are cases in which tasks are allowed to run on all available cpus, and in
some of these cases the kernel will set the Cpus_allowed field to CPU_MASK_ALL,
which boils down to a bitmask of all 1's, NR_CPUS bits long.  Since NR_CPUS is
statically defined in smp kernels to be a large number giving the maximum
number of cpus a kernel can manage, it's possible for the mask in
/proc/<pid>/status[Cpus_allowed] to be a superset of the cpus actually
available.  This can lead to sched_setaffinity returning EINVAL even if your
physcpubind parsing completed successfully.  This patch should correct the
problem by limiting the Cpus_allowed mask to the number of bits implied by
sysconf(_SC_NPROCESSORS_CONF).  Tested successfully by me.
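
For illustration only, here is a minimal standalone sketch of the masking idea
described above.  It is not the libnuma code: the helper names are hypothetical,
and it assumes the whole mask fits in one 64-bit word, which a real cpumask need
not do.  It uses a 64-bit unsigned constant for the shift and guards the case
where the cpu count reaches the word width, since shifting by the full width is
undefined in C.

```c
#include <limits.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of libnuma: keep only the low `ncpus`
 * bits of `mask`.  Assumes the mask fits in one unsigned long long.
 */
unsigned long long clamp_cpu_mask(unsigned long long mask, long ncpus)
{
	const long width = (long)(sizeof(mask) * CHAR_BIT);

	if (ncpus <= 0)
		return 0;	/* no cpus reported: treat as the empty set */
	if (ncpus >= width)
		return mask;	/* a full-width shift is undefined; nothing to trim */
	return mask & ((1ULL << ncpus) - 1);
}

/*
 * Clamp a Cpus_allowed-style mask to the number of configured cpus,
 * as reported by sysconf(_SC_NPROCESSORS_CONF).
 */
unsigned long long clamp_to_configured_cpus(unsigned long long mask)
{
	return clamp_cpu_mask(mask, sysconf(_SC_NPROCESSORS_CONF));
}
```

With an all-ones mask and 4 configured cpus, clamp_cpu_mask() yields 0xf, which
is the kind of trimmed mask the description above is after.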

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

 libnuma.c |   19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff -up numactl-2.0.0-rc1/libnuma.c.orig numactl-2.0.0-rc1/libnuma.c
--- numactl-2.0.0-rc1/libnuma.c.orig	2008-04-25 09:33:01.000000000 -0400
+++ numactl-2.0.0-rc1/libnuma.c	2008-04-25 09:52:19.000000000 -0400
@@ -450,6 +450,7 @@ set_thread_constraints(void)
 	int buflen;
 	char *buffer;
 	FILE *f;
+	int ncpumask = (1<<(sysconf(_SC_NPROCESSORS_CONF)))-1;
 	/*
 	 * The maximum line size consists of the string at the beginning plus
 	 * a digit for each 4 cpus and a comma for each 64 cpus.
@@ -480,10 +481,22 @@ set_thread_constraints(void)
 	fclose(f);
 	free (buffer);
 
-	if (maxprocnode < 0) {
+	/*
+	 * Cpus_allowed in the kernel can be defined to all f's
+	 * i.e. it may be a superset of the actual available processors.
+	 * As such let's reduce maxproccpu with a mask of the actual 
+	 * available cpus.
+	 */
+	maxproccpu &= ncpumask;
+
+	/*
+	 * Sanity checks
+	 */
+	if (maxproccpu == 0)
+		numa_warn(W_cpumap, "Available cpus are empty set");
+
+	if (maxprocnode < 0)
 		numa_warn(W_cpumap, "Cannot parse %s", mask_size_file);
-		return;
-	}
 	return;
 }
 
-- 
