GPU utilisation and optimisation

Stephan Goll
Joined: 13 Dec 05
Posts: 20
Credit: 1874367
RAC: 0
Topic 84897

Okay, let's start with some simple things. First: I'm running Linux ... and on an NVIDIA-equipped box I get:

boinc@zaphod:~$ lsmod
Module Size Used by
nvidia 11197407 20
The log from Einstein@Home says:
[04:17:37][12634][INFO ] Using CUDA device #0 "GeForce GT 430" (48 CUDA cores / 134.40 GFLOPS)

I also have an NVIDIA ION-based box:
boinc@atom:~$ lsmod
Module Size Used by
nvidia 12286260 18
[02:31:52][20014][INFO ] Using CUDA device #0 "ION" (16 CUDA cores / 52.80 GFLOPS)

This looks to me like there are up to 20 processes / tasks / jobs (whatever) using this kernel module. I run the box headless, with no X server, so I guess these are BOINC tasks. When I stop BOINC, the number drops to zero. So far, so good.
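
A way to double-check which processes actually hold the GPU device files open (just a sketch, assuming the fuser tool from the psmisc package is installed; it usually needs root to show all owners):

sudo fuser -v /dev/nvidia*

That should list the processes using the card directly, instead of guessing from the module use count.
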
Most NVIDIA GPUs have n * 16 CUDA cores. To me it would make sense to run as many BOINC tasks on the GPU as it has CUDA cores. That brings me to my first question:

Is my guess right, and is it possible to set the number of BOINC tasks to 16? Or maybe 15 ... for those who run a GUI on their computers, leaving one CUDA core free for the GUI? Is this number adjustable at all?

Okay, I believe the 20 is equal to the number of E@H tasks put into one workunit. So I think adjusting this number may be easy ... hopefully.
Because this would make some things easier, and it brings me to question two:

With n CUDA cores (n may vary) and 20 (?) BOINC tasks per job ... would it make sense to run more than one job in parallel on the GPU, up to the number of CUDA cores of the built-in NVIDIA card?

Yes, I know there have been suggestions to do this ... but my point is not the empirical approach; I want to do it "right"(tm). Meaning: I'd like to know what I'm doing.

Which brings me to question three:

Does anyone know what it looks like when running E@H on ATI / AMD GPU cards?

boinc@celeron:~$ lsmod | grep fglrx
fglrx 2610490 113
button 4650 2 i915,fglrx

Okay, I don't know why the button kernel module is using the fglrx module, but who cares. I can't believe it is doing that much computing. :D After stopping BOINC I get:

boinc@celeron:~$ lsmod | grep fglrx
fglrx 2610490 80
button 4650 2 i915,fglrx

Looks like 32 (113 - 1 - 80) BOINC tasks / processes are using the AMD GPU. I'm running a Radeon HD 5570, and going by http://de.wikipedia.org/wiki/ATI-Radeon-HD-5000-Serie#Modelldaten there is a little reserve: 400 stream processors are available, and only 113 seem to be in use.
Maybe there is some potential for improvement.

That's it from my simple point of view. Any comments ... suggestions ... ideas?
Stephan

Bikeman (Heinz-Bernd Eggenstein)
Joined: 28 Aug 06
Posts: 164
Credit: 1864017
RAC: 0

GPU utilisation and optimisation

Hi!

Quote:

Okay, let's start with some simple things. First: I'm running Linux ... and on an NVIDIA-equipped box I get:

boinc@zaphod:~$ lsmod
Module Size Used by
nvidia 11197407 20
The log from Einstein@Home says:
[04:17:37][12634][INFO ] Using CUDA device #0 "GeForce GT 430" (48 CUDA cores / 134.40 GFLOPS)

I also have an NVIDIA ION-based box:
boinc@atom:~$ lsmod
Module Size Used by
nvidia 12286260 18
[02:31:52][20014][INFO ] Using CUDA device #0 "ION" (16 CUDA cores / 52.80 GFLOPS)

This looks to me like there are up to 20 processes / tasks / jobs (whatever) using this kernel module.

Not directly; AFAIK this is the number of other kernel modules that depend on the module in question.

Anyway, the crucial thing to note is that GPU CUDA cores work differently from CPU cores: simplified, they all need to run the same code, but on different portions of the data, while CPU cores can work separately and independently of one another. And this is a good thing, because the more powerful NVIDIA GPU cards have several hundred or even thousands of CUDA cores. Imagine if every core needed its own BOINC task :-( !!! Each BRP4 task, for example, requires roughly 250 MB of RAM; if you needed several thousand of them to saturate the hardware, that would mean hundreds of GB of fast RAM. But the way CUDA works, we can (at least theoretically) make use of all cores with a single process, by splitting the work so that the cores operate on different parts of the same data set.
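
To illustrate the idea (just a toy sketch in plain CUDA C, not the Einstein@Home code): one host process launches a single kernel, and the hardware spreads its threads over however many CUDA cores the card happens to have.

// Toy sketch only -- not Einstein@Home code.
// One process, one kernel launch; the GPU distributes the threads
// over all available CUDA cores, be it 16 (ION) or hundreds.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *data, float factor, int n)
{
    // Every thread runs the same code on a different element.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

int main()
{
    const int n = 1 << 20;                   // ~1 million samples
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    int threads = 256;                       // threads per block
    int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(d_data, 2.0f, n);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    printf("done\n");
    return 0;
}

Whether the card has 16 or 1000+ cores, it is still one process and one BOINC task; only the mapping of threads onto cores changes.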

Having said that, it sometimes helps performance a bit to run more than one BOINC GPU task at a time, because then one task can do computations while the other is busy transferring data to or from memory. But this is about using 2, maybe 3, perhaps 4 tasks, not dozens or hundreds.
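
If you want to experiment with that, newer BOINC clients let you control it per application with an app_config.xml file placed in the project's directory under the BOINC data folder. A minimal sketch (the app name below is only a placeholder; the exact name for your machine is listed in client_state.xml):

<!-- sketch only: save as app_config.xml in the Einstein@Home project directory -->
<app_config>
  <app>
    <name>einsteinbinary_BRP4</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.2</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

Here gpu_usage 0.5 means two tasks share one GPU (0.33 would allow three), and cpu_usage is the CPU fraction budgeted per GPU task; after saving, tell the client to reread its config files. Older clients can only do this via an anonymous-platform app_info.xml, which is more fiddly.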

Does this answer your question?

Cheers
HBE

Stephan Goll
Joined: 13 Dec 05
Posts: 20
Credit: 1874367
RAC: 0


Message 79384 in response to message 79383

> Having said that, it sometimes helps performance a bit to run more than one
> BOINC GPU task at a time, because then one task can do computations while the
> other is busy transferring data to or from memory. But this is about using
> 2, maybe 3, perhaps 4 tasks, not dozens or hundreds.

> Does this answer your question?

Hello Bikeman,
yes, this gives me an idea of how "running more tasks on one GPU" works and where the speed-up comes from. And it gives me some additional information that may help somehow.
Thanks,
Stephan
