Errors - 197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED

Jacob Klein
Jacob Klein
Joined: 6 Nov 11
Posts: 16
Credit: 2938967
RAC: 0

Just wanted to post to

Just wanted to post to mention that I will likely no longer be testing these problematic work units.

Now that my RacerX machine has [GTX 660 Ti + GTX 660 Ti + GTX 460] instead of [GTX 660 Ti + GTX 460 + GTS 240], I will likely not be running any more [Albert/Einstein/SETI/SETIBETA] on it. I had only been running those 4 projects on the GTS 240, since it couldn't run any other projects, but it is now in storage. I pulled it out because the R343+ NVIDIA drivers are dropping support for pre-Fermi GPUs, so I replaced it with another GTX 660 Ti, and all 3 of my GPUs can now focus on GPUGrid.

Thanks,
Jacob

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

RE: I have the same errors,

Message 80267 in response to message 80263

Quote:
I have the same errors, but my wing(wo)men with nVidia cards also have this error. If done by a CPU, then it is validated. So I think it has something to do with the GPU app.
Only Gamma-ray pulsar search #3 v1.11 (FGRPopencl-nvidia) has this error (on my side).


@ tjreuter,

Could you possibly unhide your host(s) at this project, or give us a direct link to the one you're having problems with?

It would help us to give you more specific advice, and it would also help us (and the project) to understand more clearly why this problem happens in the first place.

I didn't want to spam the boards with my stats - just milestone threads - but apparently signatures are no longer optional. Follow the link if you're interested.

http://www.boincsynergy.com/images/stats/comb-3475.jpg

tjreuter
tjreuter
Joined: 11 Feb 05
Posts: 32
Credit: 2084544
RAC: 0

RE: RE: I have the same

Message 80268 in response to message 80267

Quote:
Quote:
I have the same errors, but my wing(wo)men with nVidia cards also have this error. If done by a CPU, then it is validated. So I think it has something to do with the GPU app.
Only Gamma-ray pulsar search #3 v1.11 (FGRPopencl-nvidia) has this error (on my side).

@ tjreuter,

Could you possibly unhide your host(s) at this project, or give us a direct link to the one you're having problems with?

It would help us to give you more specific advice, and it would also help us (and the project) to understand more clearly why this problem happens in the first place.


Rigs should be visible now, Richard. However, I have checked the Gamma-ray pulsar search out. (At Einstein@Home they work, though.)

Greetings from,
TJ.

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

RE: RE: RE: I have the

Message 80269 in response to message 80268

Quote:
Quote:
Quote:
I have the same errors, but my wing(wo)men with nVidia cards also have this error. If done by a CPU, then it is validated. So I think it has something to do with the GPU app.
Only Gamma-ray pulsar search #3 v1.11 (FGRPopencl-nvidia) has this error (on my side).

@ tjreuter,

Could you possibly unhide your host(s) at this project, or give us a direct link to the one you're having problems with?

It would help us to give you more specific advice, and it would also help us (and the project) to understand more clearly why this problem happens in the first place.


Rigs should be visible now, Richard. However, I have checked the Gamma-ray pulsar search out. (At Einstein@Home they work, though.)


Yes, visible now, thanks.

I assume we're talking about

Error Gamma-ray pulsar search #3 tasks for computer 7731 - tasks issued yesterday.

Unfortunately, Application details for host 7731 shows no APR for that app, because none of the tasks completed successfully.

And the server log https://albert.phys.uwm.edu/host_sched_logs/7/7731 isn't much use either, because the last scheduler contact was to report work only, with no new work requested.

What I'd like to see, if at all possible, is a copy of the server log for an example of a work request where an FGRP task was issued. It would look something like

Quote:
2014-07-01 17:18:03.1608 [PID=30917] [version] Checking plan class 'FGRPopencl-nvidia'
2014-07-01 17:18:03.1608 [PID=30917] [version] plan_class_spec: parsed project prefs setting 'gpu_util_fgrp' : true : 1.000000
2014-07-01 17:18:03.1609 [PID=30917] [version] [AV#913] (FGRPopencl-nvidia) using conservative projected flops: 20.12G
2014-07-01 17:18:03.1609 [PID=30917] [version] Best app version is now AV913 (29.38 GFLOP)
2014-07-01 17:18:03.1610 [PID=30917] [version] [AV#913] (FGRPopencl-nvidia) 11362
2014-07-01 17:18:03.1610 [PID=30917] [version] Best version of app hsgamma_FGRP3 is [AV#913] (20.12 GFLOPS)
2014-07-01 17:18:03.1610 [PID=30917] [send] est delay 0, skipping deadline check
2014-07-01 17:18:03.1629 [PID=30917] [send] Sending app_version hsgamma_FGRP3 2 111 FGRPopencl-nvidia; projected 20.12 GFLOPS
2014-07-01 17:18:03.1630 [PID=30917] [CRITICAL] No filename found in [WU#605548 LATeah0109C_32.0_99_-5.66e-10]
2014-07-01 17:18:03.1630 [PID=30917] [send] est. duration for WU 605548: unscaled 745.62 scaled 745.95
2014-07-01 17:18:03.1630 [PID=30917] [send] [HOST#11362] sending [RESULT#1453006 LATeah0109C_32.0_99_-5.66e-10_0] (est. dur. 745.95s (0h12m25s95)) (max time 14912.31s (4h08m32s31))


Note that in my case (from host 11362) the server is estimating - last line - that the task will run for 746 seconds (which is what I'm seeing locally too), and won't be thrown out with a time limit error for over four hours.

That's calculated from "using conservative projected flops: 20.12G" a few lines above (which is a new one on me). Since your tasks error out in under 4 minutes, I assume the initial estimates must have been 20 times smaller than that - 12 seconds or something.
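
To make the arithmetic explicit - this is only a rough sketch, in Python terms, assuming the usual BOINC bookkeeping where the estimated duration is the workunit's flops estimate divided by the projected speed, and the time limit is the flops bound divided by the same speed:

# Rough sketch, assuming standard BOINC accounting; the fpops figures
# are back-calculated from the log above, not read from the workunit.
projected_flops = 20.12e9    # "conservative projected flops: 20.12G"
est_duration = 745.62        # seconds, from the log
max_time = 14912.31          # seconds, from the log

rsc_fpops_est = est_duration * projected_flops    # about 1.5e13 FLOP
rsc_fpops_bound = max_time * projected_flops      # about 3.0e14 FLOP, i.e. 20x the estimate

# Error 197 (EXIT_TIME_LIMIT_EXCEEDED) fires when a task runs past
# max_time = rsc_fpops_bound / projected_flops. So a host whose speed is
# projected far too high gets a far too short limit: an allowance of about
# 4 minutes implies an initial estimate of roughly 240 / 20 = 12 seconds.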

What I'd ideally like to see is a similar server log from your machine, showing the GFlops value it's using to calculate your runtime. You have to be quick to catch it: there seem to be very few tasks around at the moment, and I had to try several times. Then, you have to capture the server log within a minute, otherwise another attempt will overwrite the successful one (unless you set NNT before your computer asks again). There's something very odd about the way the Albert server is setting these estimated speeds, and we haven't fully got to the bottom of it yet.
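
If anyone wants to automate catching it, something along these lines should do - just a suggestion, not anything the project provides; it simply polls the same host_sched_logs URL as above (swap 7731 for your own host ID) and keeps timestamped copies so an overwrite doesn't matter:

import time
import urllib.request

# Poll the per-host scheduler log and keep timestamped copies, so the
# request that actually got an FGRP task isn't lost at the next contact.
URL = "https://albert.phys.uwm.edu/host_sched_logs/7/7731"  # use your own host ID

while True:
    try:
        text = urllib.request.urlopen(URL, timeout=30).read().decode(errors="replace")
        fname = "sched_log_" + time.strftime("%Y%m%d_%H%M%S") + ".txt"
        with open(fname, "w") as f:
            f.write(text)
    except Exception as exc:
        print("fetch failed:", exc)
    time.sleep(30)  # the server rewrites this file at every scheduler contact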


tjreuter
tjreuter
Joined: 11 Feb 05
Posts: 32
Credit: 2084544
RAC: 0

Thank you Richard for your

Message 80270 in response to message 80269

Thank you Richard for your swift reply. I will see if I can "catch" this server log in the coming days.

By the way all your assumptions are correct.

Greetings from,
TJ.

Claggy
Claggy
Joined: 29 Dec 06
Posts: 122
Credit: 4040969
RAC: 0

My HD7770's estimates have

My HD7770's estimates have just got to the point where one of the apps for Binary Radio Pulsar Search (Perseus Arm Survey) now completes without error; the other app version is still erroring out at 422 seconds.

All Binary Radio Pulsar Search (Perseus Arm Survey) tasks for computer 8143

Claggy

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

I think we're going to have

I think we're going to have real problems with the Gamma-ray pulsar search #3 app for a while.

I posted that my host 11362 was getting runtime estimates of 12 minutes, time allowed 4 hours, at a projected 20 GFLOPS.

Turns out that two of the three tasks I've returned so far would have exceeded bounds if I hadn't inoculated them. So my GTX 470 GPU is running at an effective rate of 1 GFLOPS or less. As is described elsewhere, this app is very much still a work-in-progress: very little work is done on the GPU, and most of it is still on the CPU - it wants a full CPU core, and uses it to the hilt.

Similarly, TJ's GTX 660 has been taking around three hours for the matching tasks over at the main Einstein project. So that makes even more of a mockery of the server dishing out a bounds limit of four minutes for his machine - his speed must be mis-estimated by a factor of 1,000 or so.
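
For what it's worth, the back-of-envelope version of those numbers (again only a sketch, assuming each task is the same ~1.5e13 FLOP as in the scheduler log I quoted earlier - the real per-host figures may differ):

# Back-of-envelope, assuming ~1.5e13 FLOP per task as in the log above.
rsc_fpops_est = 745.62 * 20.12e9    # about 1.5e13 FLOP

# My GTX 470: tasks brushing the ~14,900 s limit
effective_flops = rsc_fpops_est / 14912.0    # about 1e9, i.e. ~1 GFLOPS

# TJ's host: a 4-minute limit means an estimate of ~12 s, so the server
# must be projecting roughly
projected_flops_tj = rsc_fpops_est / 12.0    # about 1.25e12 FLOPS

# Against a real rate of ~1 GFLOPS on this CPU-bound app, that is an
# overestimate by a factor of roughly a thousand.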

And to put the icing on the cake, all three of my returned results have been paired with different anonymous Intel HD 2500 GPUs running with the dodgy OpenCL 1.1 driver that Claggy noticed. Inconclusive, the lot of them. It's going to take a while to get the server averages back into kilter...


tullio
tullio
Joined: 22 Jan 05
Posts: 53
Credit: 137342
RAC: 0

Most of my gamma ray units

Most of my gamma ray units finish in time but are not validated. All on CPU.
Tullio

Claggy
Claggy
Joined: 29 Dec 06
Posts: 122
Credit: 4040969
RAC: 0

RE: And to put the icing on

Message 80274 in response to message 80272

Quote:
And to put the icing on the cake, all three of my returned results have been paired with different anonymous Intel HD 2500 GPUs running with the dodgy OpenCL 1.1 driver that Claggy noticed. Inconclusive, the lot of them. It's going to take a while to get the server averages back into kilter...


I've got something like 26 inconclusives spread across all these Intel GPU hosts, all of them running OpenCL 1.1 drivers; most of them are anonymous, with an i3-3220, HD Graphics 2500, and BOINC 7.0.64:

https://albertathome.org/host/4792
https://albertathome.org/host/5414
https://albertathome.org/host/9043
https://albertathome.org/host/9046
https://albertathome.org/host/9048
https://albertathome.org/host/9041
https://albertathome.org/host/9045
https://albertathome.org/host/9089
https://albertathome.org/host/9090
https://albertathome.org/host/9091
https://albertathome.org/host/9094
https://albertathome.org/host/9095
https://albertathome.org/host/9099
https://albertathome.org/host/9101
https://albertathome.org/host/9106
https://albertathome.org/host/9114
https://albertathome.org/host/9115
https://albertathome.org/host/9119
https://albertathome.org/host/9122
https://albertathome.org/host/9129
https://albertathome.org/host/10714

Can we have an OpenCL 1.2 requirement put into the FGRPopencl-intel_gpu app, please?

Claggy

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

I've already reported to

Message 80275 in response to message 80274

I've already reported to Bernd, by email:

Quote:

And, (4), a quite different gripe. https://albertathome.org/host/11362/tasks is plodding through some FGRP #3 OpenCL tasks. EVERY SINGLE ONE (sorry for shouting) has been paired with a different one from a sequence of apparently identical, anonymous, "Intel(R) Core(TM) i3-3220 CPU" with HD 2500 iGPUs. So far, I've returned results paired with hosts:

9042, 9045, 9046, 9089, 9093, 9095, 9100, 9110, 9128

They were all created between 28 September and 2 October last year, all are still active (have contacted the server within the last 24 hours), and all have the faulty OpenCL v1.1 driver which makes all tasks inconclusive. Somebody is wasting a lot of time and electricity with those machines: they feel like a job lot, and although anonymous are probably on the same institutional account. I wondered if you could identify the account holder's email address, and persuade them to update their drivers? It would speed FGRP testing up a lot.


He replied,

Quote:
I'll take a look; not sure this will fit in today, though.


'today' being Thursday 03 July.

