Errors - 197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED

Jacob Klein
Jacob Klein
Joined: 6 Nov 11
Posts: 16
Credit: 2938967
RAC: 0

Just wanted to post to

Just wanted to post to mention that I will likely no longer be testing these problematic work units.

Now that my RacerX machine has [GTX 660 Ti + GTX 660 Ti + GTX 460] instead of [GTX 660 Ti + GTX 460 + GTS 240], I will likely not be running any more [Albert/Einstein/SETI/SETIBETA] on it. I had only been running those 4 projects on the GTS 240, since it couldn't run any other projects, but it is now in storage. I pulled it out because the R343+ NVIDIA drivers are dropping support for pre-Fermi GPUs, so I replaced it with another GTX 660 Ti, and all 3 of my GPUs can now focus on GPUGrid.

Thanks,
Jacob

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

RE: I have the same errors,

Message 80267 in response to message 80263

Quote:
I have the same errors, but my wing(wo)men with nVidia cards also have this error. If done by a CPU, then it is validated. So I think it has something to do with the GPU app.
Only Gamma-ray pulsar search #3 v1.11 (FGRPopencl-nvidia) has this error (on my side).


@ tjreuter,

Could you possibly unhide your host(s) at this project, or give us a direct link to the one you're having problems with?

It would help us to give you more specific advice, and it would also help us (and the project) to understand more clearly why this problem happens in the first place.

I didn't want to spam the boards with my stats - just milestone threads - but apparently signatures are no longer optional. Follow the link if you're interested.

http://www.boincsynergy.com/images/stats/comb-3475.jpg

tjreuter
tjreuter
Joined: 11 Feb 05
Posts: 32
Credit: 2084544
RAC: 0

RE: RE: I have the same

Message 80268 in response to message 80267

Quote:
Quote:
I have the same errors, but my wing(wo)men with nVidia cards also have this error. If done by a CPU, then it is validated. So I think it has something to do with the GPU app.
Only Gamma-ray pulsar search #3 v1.11 (FGRPopencl-nvidia) has this error (on my side).

@ tjreuter,

Could you possibly unhide your host(s) at this project, or give us a direct link to the one you're having problems with?

It would help us to give you more specific advice, and it would also help us (and the project) to understand more clearly why this problem happens in the first place.


Rigs should be visible now, Richard. However, I have checked the Gamma-ray pulsar search out. (At Einstein@Home they work, though.)

Greetings from,
TJ.

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

RE: RE: RE: I have the

Message 80269 in response to message 80268

Quote:
Quote:
Quote:
I have the same errors, but my wing(wo)men with nVidia cards also have this error. If done by a CPU, then it is validated. So I think it has something to do with the GPU app.
Only Gamma-ray pulsar search #3 v1.11 (FGRPopencl-nvidia) has this error (on my side).

@ tjreuter,

Could you possibly unhide your host(s) at this project, or give us a direct link to the one you're having problems with?

It would help us to give you more specific advice, and it would also help us (and the project) to understand more clearly why this problem happens in the first place.


Rigs should be visible now, Richard. However, I have checked the Gamma-ray pulsar search out. (At Einstein@Home they work, though.)


Yes, visible now, thanks.

I assume we're talking about

Error Gamma-ray pulsar search #3 tasks for computer 7731 - tasks issued yesterday.

Unfortunately, Application details for host 7731 shows no APR for that app, because none of the tasks completed successfully.

And the server log https://albert.phys.uwm.edu/host_sched_logs/7/7731 isn't much use either, because the last scheduler contact was to report work only, with no new work requested.

What I'd like to see, if at all possible, is a copy of the server log for an example of a work request where an FGRP task was issued. It would look something like

Quote:
2014-07-01 17:18:03.1608 [PID=30917] [version] Checking plan class 'FGRPopencl-nvidia'
2014-07-01 17:18:03.1608 [PID=30917] [version] plan_class_spec: parsed project prefs setting 'gpu_util_fgrp' : true : 1.000000
2014-07-01 17:18:03.1609 [PID=30917] [version] [AV#913] (FGRPopencl-nvidia) using conservative projected flops: 20.12G
2014-07-01 17:18:03.1609 [PID=30917] [version] Best app version is now AV913 (29.38 GFLOP)
2014-07-01 17:18:03.1610 [PID=30917] [version] [AV#913] (FGRPopencl-nvidia) 11362
2014-07-01 17:18:03.1610 [PID=30917] [version] Best version of app hsgamma_FGRP3 is [AV#913] (20.12 GFLOPS)
2014-07-01 17:18:03.1610 [PID=30917] [send] est delay 0, skipping deadline check
2014-07-01 17:18:03.1629 [PID=30917] [send] Sending app_version hsgamma_FGRP3 2 111 FGRPopencl-nvidia; projected 20.12 GFLOPS
2014-07-01 17:18:03.1630 [PID=30917] [CRITICAL] No filename found in [WU#605548 LATeah0109C_32.0_99_-5.66e-10]
2014-07-01 17:18:03.1630 [PID=30917] [send] est. duration for WU 605548: unscaled 745.62 scaled 745.95
2014-07-01 17:18:03.1630 [PID=30917] [send] [HOST#11362] sending [RESULT#1453006 LATeah0109C_32.0_99_-5.66e-10_0] (est. dur. 745.95s (0h12m25s95)) (max time 14912.31s (4h08m32s31))


Note that in my case (from host 11362) the server is estimating - last line - that the task will run for 746 seconds (which is what I'm seeing locally too), and won't be thrown out with a time limit error for over four hours.

That's calculated from "using conservative projected flops: 20.12G" a few lines above (which is a new one on me). Since your tasks error out in under 4 minutes, I assume the initial estimates must have been 20 times smaller than that - 12 seconds or something.
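
To make the arithmetic explicit - this is only a rough sketch, in Python terms, assuming the usual BOINC bookkeeping where the estimated duration is the workunit's flops estimate divided by the projected speed, and the time limit is the flops bound divided by the same speed:

# Rough sketch, assuming standard BOINC accounting; the fpops figures
# are back-calculated from the log above, not read from the workunit.
projected_flops = 20.12e9    # "conservative projected flops: 20.12G"
est_duration = 745.62        # seconds, from the log
max_time = 14912.31          # seconds, from the log

rsc_fpops_est = est_duration * projected_flops    # about 1.5e13 FLOP
rsc_fpops_bound = max_time * projected_flops      # about 3.0e14 FLOP, i.e. 20x the estimate

# Error 197 (EXIT_TIME_LIMIT_EXCEEDED) fires when a task runs past
# max_time = rsc_fpops_bound / projected_flops. So a host whose speed is
# projected far too high gets a far too short limit: an allowance of about
# 4 minutes implies an initial estimate of roughly 240 / 20 = 12 seconds.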

What I'd ideally like to see is a similar server log from your machine, showing the GFlops value it's using to calculate your runtime. You have to be quick to catch it: there seem to be very few tasks around at the moment, and I had to try several times. Then, you have to capture the server log within a minute, otherwise another attempt will overwrite the successful one (unless you set NNT before your computer asks again). There's something very odd about the way the Albert server is setting these estimated speeds, and we haven't fully got to the bottom of it yet.
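
If anyone wants to automate catching it, something along these lines should do - just a suggestion, not anything the project provides; it simply polls the same host_sched_logs URL as above (swap 7731 for your own host ID) and keeps timestamped copies so an overwrite doesn't matter:

import time
import urllib.request

# Poll the per-host scheduler log and keep timestamped copies, so the
# request that actually got an FGRP task isn't lost at the next contact.
URL = "https://albert.phys.uwm.edu/host_sched_logs/7/7731"  # use your own host ID

while True:
    try:
        text = urllib.request.urlopen(URL, timeout=30).read().decode(errors="replace")
        fname = "sched_log_" + time.strftime("%Y%m%d_%H%M%S") + ".txt"
        with open(fname, "w") as f:
            f.write(text)
    except Exception as exc:
        print("fetch failed:", exc)
    time.sleep(30)  # the server rewrites this file at every scheduler contact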


tjreuter
tjreuter
Joined: 11 Feb 05
Posts: 32
Credit: 2084544
RAC: 0

Thank you Richard for your

Message 80270 in response to message 80269

Thank you Richard for your swift reply. I will see if I can "catch" this server log in the coming days.

By the way all your assumptions are correct.

Greetings from,
TJ.

Claggy
Claggy
Joined: 29 Dec 06
Posts: 122
Credit: 4040969
RAC: 0

My HD7770's estimates have

My HD7770's estimates have just got to the point where one of the apps for Binary Radio Pulsar Search (Perseus Arm Survey) now completes without error; the other app version is still erroring out at 422 seconds.

All Binary Radio Pulsar Search (Perseus Arm Survey) tasks for computer 8143

Claggy

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

I think we're going to have

I think we're going to have real problems with the Gamma-ray pulsar search #3 app for a while.

I posted that my host 11362 was getting runtime estimates of 12 minutes, time allowed 4 hours, at a projected 20 GFLOPS.

Turns out that two of the three tasks I've returned so far would have exceeded bounds if I hadn't inoculated them. So my GTX 470 GPU is running at an effective rate of 1 GFLOPS or less. As is described elsewhere, this app is very much still a work-in-progress: very little work is done on the GPU, and most of it is still on the CPU - it wants a full CPU core, and uses it to the hilt.

Similarly, TJ's GTX 660 has been taking around three hours for the matching tasks over at the main Einstein project. So that makes even more of a mockery of the server dishing out a bounds limit of four minutes for his machine - his speed must be mis-estimated by a factor of 1,000 or so.
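
For what it's worth, the back-of-envelope version of those numbers (again only a sketch, assuming each task is the same ~1.5e13 FLOP as in the scheduler log I quoted earlier - the real per-host figures may differ):

# Back-of-envelope, assuming ~1.5e13 FLOP per task as in the log above.
rsc_fpops_est = 745.62 * 20.12e9    # about 1.5e13 FLOP

# My GTX 470: tasks brushing the ~14,900 s limit
effective_flops = rsc_fpops_est / 14912.0    # about 1e9, i.e. ~1 GFLOPS

# TJ's host: a 4-minute limit means an estimate of ~12 s, so the server
# must be projecting roughly
projected_flops_tj = rsc_fpops_est / 12.0    # about 1.25e12 FLOPS

# Against a real rate of ~1 GFLOPS on this CPU-bound app, that is an
# overestimate by a factor of roughly a thousand.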

And to put the icing on the cake, all three of my returned results have been paired with different anonymous Intel HD 2500 GPUs running with the dodgy OpenCL 1.1 driver that Claggy noticed. Inconclusive, the lot of them. It's going to take a while to get the server averages back into kilter...


tullio
tullio
Joined: 22 Jan 05
Posts: 53
Credit: 137342
RAC: 0

Most of my gamma ray units

Most of my gamma ray units finish in time but are not validated. All on CPU.
Tullio

Claggy
Claggy
Joined: 29 Dec 06
Posts: 122
Credit: 4040969
RAC: 0

RE: And to put the icing on

Message 80274 in response to message 80272

Quote:
And to put the icing on the cake, all three of my returned results have been paired with different anonymous Intel HD 2500 GPUs running with the dodgy OpenCL 1.1 driver that Claggy noticed. Inconclusive, the lot of them. It's going to take a while to get the server averages back into kilter...


I've got something like 26 inconclusives spread across all these Intel GPU hosts, all of them running OpenCL 1.1 drivers; most of them are anonymous, with an i3-3220, HD Graphics 2500, and BOINC 7.0.64:

https://albertathome.org/host/4792
https://albertathome.org/host/5414
https://albertathome.org/host/9043
https://albertathome.org/host/9046
https://albertathome.org/host/9048
https://albertathome.org/host/9041
https://albertathome.org/host/9045
https://albertathome.org/host/9089
https://albertathome.org/host/9090
https://albertathome.org/host/9091
https://albertathome.org/host/9094
https://albertathome.org/host/9095
https://albertathome.org/host/9099
https://albertathome.org/host/9101
https://albertathome.org/host/9106
https://albertathome.org/host/9114
https://albertathome.org/host/9115
https://albertathome.org/host/9119
https://albertathome.org/host/9122
https://albertathome.org/host/9129
https://albertathome.org/host/10714

Can we have an OpenCL 1.2 requirement put into the FGRPopencl-intel_gpu app, please?

Claggy

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

I've already reported to

Message 80275 in response to message 80274

I've already reported to Bernd, by email:

Quote:

And, (4), a quite different gripe. https://albertathome.org/host/11362/tasks is plodding through some FGRP #3 OpenCL tasks. EVERY SINGLE ONE (sorry for shouting) has been paired with a different one from a sequence of apparently identical, anonymous, "Intel(R) Core(TM) i3-3220 CPU" with HD 2500 iGPUs. So far, I've returned results paired with hosts:

9042, 9045, 9046, 9089, 9093, 9095, 9100, 9110, 9128

They were all created between 28 September and 2 October last year, all are still active (have contacted the server within the last 24 hours), and all have the faulty OpenCL v1.1 driver which makes all tasks inconclusive. Somebody is wasting a lot of time and electricity with those machines: they feel like a job lot, and although anonymous are probably on the same institutional account. I wondered if you could identify the account holder's email address, and persuade them to update their drivers? It would speed FGRP testing up a lot.


He replied,

Quote:
I'll take a look; not sure this will fit in today, though.


'today' being Thursday 03 July.

