Wrong GPU reported in output file.

Alez
Joined: 8 Apr 13
Posts: 7
Credit: 4335055
RAC: 0
Topic 84919

I noticed on this computer of mine running Albert
http://albertathome.org/host/6816

that the CUDA tasks, i.e. Binary Radio Pulsar Search v1.33 (BRP4cuda32nv301),
example task here
http://albertathome.org/task/727451
correctly report that they are running on an nVidia GTX 650.

The OpenCL tasks, however, i.e. Gamma-ray pulsar search #2 v1.05 (FGRPopencl-nvidia),
example task here
http://albertathome.org/task/725134
report that they are running on an nVidia GT 610, which they are not. That card is present in the system but is not used to run Albert; only the GTX 650 is presently doing that work.
The output file also reports the GT 610 as device 0, whereas device 0 is in fact an nVidia GTX 660 Ti. The 610 is device 1 and the GTX 650 is device 2, as reported by BOINC. I have also noticed that these units run for a little while, then seem to stop doing any work, with 0% load on the GPU, but they keep running and the percentage complete slowly rises.

Bikeman (Heinz-Bernd Eggenstein)
Joined: 28 Aug 06
Posts: 164
Credit: 1864017
RAC: 0

Hi!

Thanks for reporting this. This is a very nice system for tests in a multi-GPU environment. Plus some of the tasks show failures during execution but do not terminate (as they should).

We'll look into this.

Just to make sure:

Quote:
reports that it is running on a nVidia GT 610, which they are not. That card is present in the system but is not used to run Albert, only the GTX 650 is presently doing that work.

I understand you are sure that the GT 610 is not used by BOINC? Or are you saying that it's not supposed to be running BOINC tasks? E.g. is GPU-Z showing the 610 as idle?

Are you using the 32 bit or 64 bit version of BOINC?

Thx
HBE

Neil Newell
Joined: 9 Jan 13
Posts: 13
Credit: 4081564
RAC: 0

Message 79704 in response to message 79703

Not sure if this is relevant to the issue, but BOINC certainly seems to mis-report the GPUs present on the web pages (the start-up messages are correct). For example this host is actually a GTX570 and a GTX460 but shows as 2xGTX460, while this host shows as 2xGTX580 whereas it's really a GTX580 and a GTX570 (n.b. these hosts are only on einstein, not albert).

Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

Message 79705 in response to message 79704

Quote:
Not sure if this is relevant to the issue, but BOINC certainly seems to mis-report the GPUs present on the web pages (the start-up messages are correct). For example this host is actually a GTX570 and a GTX460 but shows as 2xGTX460, while this host shows as 2xGTX580 whereas it's really a GTX580 and a GTX570 (n.b. these hosts are only on einstein, not albert).


That problem is due to a well-known design limitation in the BOINC back-end database which drives the website: it wasn't given a separate relational table which would allow multiple individual (and different) GPUs to be associated with a single host.

The BOINC client itself does enumerate the GPUs individually, and (E&OE) reports them correctly to the server. In theory, the scheduler should be able to handle disparate GPUs correctly - it's just the subsequent cosmetic reporting which is broken.
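
The limitation described above can be illustrated with a small sketch. The table and column names below are hypothetical and are not BOINC's actual schema; the point is only that a single "model plus count" slot per host cannot represent a mixed pair of cards, whereas one row per GPU can.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical layout 1: GPU info squeezed into the host row itself.
    CREATE TABLE host_flat (id INTEGER PRIMARY KEY, gpu_model TEXT, gpu_count INTEGER);
    -- Hypothetical layout 2: one row per physical GPU, linked to its host.
    CREATE TABLE host (id INTEGER PRIMARY KEY);
    CREATE TABLE host_gpu (host_id INTEGER REFERENCES host(id),
                           device_num INTEGER, model TEXT);
""")

gpus = ["GTX 570", "GTX 460"]  # the mixed pair mentioned in this thread

# Flat layout: only one model name fits, so the pair collapses to
# "<count> x <one model>". Which model survives is arbitrary in this sketch.
conn.execute("INSERT INTO host_flat VALUES (1, ?, ?)", (gpus[-1], len(gpus)))
count, model = conn.execute(
    "SELECT gpu_count, gpu_model FROM host_flat WHERE id = 1").fetchone()
flat_summary = f"{count}x{model}"

# Per-GPU layout: each card keeps its own row and device number.
conn.execute("INSERT INTO host VALUES (1)")
conn.executemany("INSERT INTO host_gpu VALUES (1, ?, ?)", list(enumerate(gpus)))
per_gpu = [row[0] for row in conn.execute(
    "SELECT model FROM host_gpu WHERE host_id = 1 ORDER BY device_num")]

print(flat_summary)
print(per_gpu)
```

With the flat layout the web page can only ever show something like "2x" of one model, which matches the cosmetic mis-reporting seen on the host pages.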

Neil Newell
Joined: 9 Jan 13
Posts: 13
Credit: 4081564
RAC: 0

Message 79706 in response to message 79705

Interesting - thanks for the info, I'd been wondering about it for a while (plus someone asked why my "2xGTX460" system was so fast!).

Alez
Joined: 8 Apr 13
Posts: 7
Credit: 4335055
RAC: 0

Message 79707 in response to message 79703

Quote:

Hi!

Thanks for reporting this. This is a very nice system for tests in a multi-GPU environment. Plus some of the tasks show failures during execution but do not terminate (as they should).

We'll look into this.

Just to make sure:

Quote:
reports that it is running on a nVidia GT 610, which they are not. That card is present in the system but is not used to run Albert, only the GTX 650 is presently doing that work.

I understand you are sure that the GT 610 is not used by BOINC? Or are you saying that it's not supposed to be running BOINC tasks? E.g. is GPU-Z showing the 610 as idle?

Are you using the 32 bit or 64 bit version of BOINC?

Thx
HBE

The GT 610 is used to run the display and also runs Milkyway and SETI. I use a cc_config.xml file to control what runs on which GPU; these are the only two projects running on that card and, truth be told, the only two projects it can run. During the tests all GPUs were in use. The split was as follows:
GTX 660 Ti - PrimeGrid
GTX 650 - Albert / Einstein
GT 610 - SETI / Milkyway. The display is also connected to this GPU.
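
For readers who want to reproduce a split like this, here is a minimal sketch that generates a cc_config.xml keeping Albert off devices 0 and 1. It assumes the client supports the `<exclude_gpu>` option and that the project URL is http://albertathome.org/; the actual cc_config.xml used on this host was not posted, so this is not a copy of it. The helper name `build_cc_config` is made up for the example.

```python
import xml.etree.ElementTree as ET

def build_cc_config(project_url, excluded_devices, gpu_type="NVIDIA"):
    """Return cc_config.xml text that keeps one project off the listed GPUs."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    for device_num in excluded_devices:
        rule = ET.SubElement(options, "exclude_gpu")
        ET.SubElement(rule, "url").text = project_url
        ET.SubElement(rule, "device_num").text = str(device_num)
        ET.SubElement(rule, "type").text = gpu_type
    ET.indent(root)  # pretty-print; needs Python 3.9 or later
    return ET.tostring(root, encoding="unicode")

# BOINC numbering from this thread: 0 = GTX 660 Ti, 1 = GT 610, 2 = GTX 650.
# Excluding 0 and 1 leaves only the GTX 650 for Albert (URL is an assumption).
config_text = build_cc_config("http://albertathome.org/", [0, 1])
print(config_text)
```

Note that the device numbers here are the ones the BOINC client reports in its start-up messages, which is exactly why a science app that numbers the cards differently can end up on the wrong one.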

I'm running BOINC 7.0.60 (x64) under Windows 7 Home Premium x64, SP1.

I noticed the failures on the OpenCL apps and have suspended them for now. All the CUDA apps work correctly.

Bikeman (Heinz-Bernd Eggenstein)
Joined: 28 Aug 06
Posts: 164
Credit: 1864017
RAC: 0

Hi!

Interesting.

I guess the problem is that the BOINC client and the science app are enumerating the devices in a different order, so the science app actually tries to run on the 610, and because there's already other stuff running there, it can't get enough RAM. So the root cause is a problem in device enumeration and a secondary problem is not handling the exception/error condition correctly.
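
To make the enumeration mismatch concrete, here is a small sketch. The BOINC order is the one reported in this thread; the app-side order is hypothetical (only "GT 610 listed first" is taken from the output file, the rest is assumed), and `translate_index` is an illustrative helper, not a function from the science app.

```python
def translate_index(source_order, target_order, source_index):
    """Map a device index from one enumeration to another by device name."""
    name = source_order[source_index]
    return target_order.index(name)

# Order reported by the BOINC client in this thread.
boinc_order = ["GTX 660 Ti", "GT 610", "GTX 650"]
# Hypothetical order seen by the science app (assumption for illustration).
app_order = ["GT 610", "GTX 660 Ti", "GTX 650"]

# Indices at which the same number refers to a different card.
mismatched = [i for i, (b, a) in enumerate(zip(boinc_order, app_order)) if b != a]
print(mismatched)

# BOINC's device 1 (the GT 610) is a different index on the app side.
print(translate_index(boinc_order, app_order, 1))
```

Matching by name only works when all cards differ, as they do here; with two identical cards some other identifier, such as the PCI bus ID, would be needed.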

Two bugs in one report...not bad :-) ! Thanks again

Cheers
HB

Alez
Joined: 8 Apr 13
Posts: 7
Credit: 4335055
RAC: 0

That would explain the zero load on the GTX 650. I wonder if the issue could be compounded by the different device enumerations.
For BOINC, the GTX 660 Ti is 0, the GT 610 is 1 and the GTX 650 is 2;
however, as the display is run from the GT 610, Windows would regard that as the primary device. Could this be causing a conflict, with the app using the Windows device listing rather than the BOINC device enumeration?

If it was trying to run on the 610 then I'm actually surprised that the card didn't crash, as it's not a very powerful card, and the run times of the units (Milkyway) running on the card didn't indicate another project running simultaneously. Also, would trying to run on a card already running a different app not cause a lock file error? Or is that only when projects try to use the same slots in the BOINC folder?
If Albert was trying to access the GT 610, it never actually managed to run on the card. It may have thought it was, but it was never physically running on the card.
Let me know if you require me to try the test app again.
Alex

Bikeman (Heinz-Bernd Eggenstein)
Joined: 28 Aug 06
Posts: 164
Credit: 1864017
RAC: 0

Message 79710 in response to message 79709

Hi all,

Thanks for your patience. Hopefully the bug discussed in this thread is finally fixed in the new version discussed here: http://albertathome.org/node/84921.

Thanks for the testing!!!

Cheers
HBE
