Albert@home is still an unofficial, non-public test project. Don't expect anything to work here. The only type of work Albert@home is currently sending out is for a highly experimental BRP4 OpenCL application.
Copyright © 2024 Einstein@Home. All rights reserved.
Comments
Since the upgrade to 7.0.2
Since the upgrade to 7.0.2 I'm not getting any work for the GPUs ..
08-Dec-2011 07:20:02 [Albert@Home] Sending scheduler request: To fetch work.
08-Dec-2011 07:20:02 [Albert@Home] Requesting new tasks for ATI
08-Dec-2011 07:20:05 [Albert@Home] Scheduler request completed: got 0 new tasks
08-Dec-2011 07:20:05 [Albert@Home] No tasks sent
08-Dec-2011 07:58:09 [Albert@Home] Sending scheduler request: To fetch work.
08-Dec-2011 07:58:09 [Albert@Home] Requesting new tasks for NVIDIA
08-Dec-2011 07:58:11 [Albert@Home] Scheduler request completed: got 0 new tasks
08-Dec-2011 07:58:11 [Albert@Home] No tasks sent
host details are here:
http://albertathome.org/host/1396
Any thoughts?
Click on the "last contact"
Click on the "last contact" link to see the scheduler logs.
In the case of your host I see:
Hm - looks like there's something wrong with the GPU RAM size reporting. I'll look into that tomorrow.
BM
Hm ... according to
Hm ... according to sched_request the ATI device isn't OpenCL capable:
[pre]
1
ATI Radeon HD 5800 series (Cypress)
1024458752.000000
1
0
0.000000
0.000000
0.000000
50000000000.000000
1.4.815
8
1024
1788
508
0
0
64
20
1
256
256
16384
16384
16384
[/pre]
BM
RE: Hm ... according to
I don't think CAL version 1.4.815 is OpenCL capable ...
STE\/E
Weird .. with BOINC 6.13 the
Weird .. with BOINC 6.13 the same ATI GPU completed scores of work units:
For example:
http://albertathome.org/task/53182
And I use the ATI GPU for OpenCL development all the time ..
What exactly does BOINC do to
What exactly does BOINC do to check for OpenCL?
RE: I don't think Cal
It is, but you have to install the SDK and register the OpenCL ICD yourself. Installing the driver is not sufficient.
Gaurav, have you changed anything in your setup?
Oliver
RE: What exactly does BOINC
It uses libOpenCL via late binding to query a few basic properties. Using 10.8 (as you do) you have to make sure it's available as it's not installed automatically in the usual library paths. Better still, upgrade your driver to 11.7 (11.11 on Linux!). That version of the Catalyst driver installs the OpenCL runtime all by itself. Our app officially requires at least 11.7 anyway as we build it using SDK 2.5.
Oliver
Thanks Oliver. That was
Thanks Oliver. That was helpful. Figured it out .. it was a 64-bit vs. 32-bit issue. The ATI app is 32-bit, but the BOINC client I built is 64-bit. So I had to make both 32- and 64-bit OpenCL libs available in my library path.
It's working now (finished a result successfully):
http://albertathome.org/task/54521
RE: Aborting task
3 WUs were aborted after ~6395 sec of run time for the same reason:
p2030.20100912.G57.94-00.24.S.b5s0g0.00000_416_1 - 6,395.56 sec
p2030.20100913.G44.55+00.20.C.b5s0g0.00000_1824_2 - 6,395.53 sec
p2030.20100912.G57.94-00.24.S.b5s0g0.00000_744_0 - 6,395.32 sec
WTF?
Hi! My guess is that the way
Hi!
My guess is that the way BOINC calculates the maximum allowed elapsed time for a task to finish has changed with version 7.x. So the workunits would now have to be generated with a higher number of estimated/max floating point ops per task. Because BRP4 workunits are all created equal, no matter on which platform they will eventually get crunched, one value of estimated floating point ops must be used for CPU, NVIDIA/CUDA and ATI/OpenCL (and all supported BOINC client versions). No trivial task.
Until this gets fixed on the server side for new workunits, one could theoretically do a workaround in the client_state.xml file to prevent more WUs erroring out (and wrecking your quota):
- Stop BOINC
- open the client_state.xml file in an editor
- replace occurrences of the following two lines
[pre]
<rsc_fpops_est>140000000000000.000000</rsc_fpops_est>
<rsc_fpops_bound>2800000000000000.000000</rsc_fpops_bound>
[/pre]
with (say)
[pre]
<rsc_fpops_est>1400000000000000.000000</rsc_fpops_est>
<rsc_fpops_bound>28000000000000000.000000</rsc_fpops_bound>
[/pre]
(Actually you should check that only WUs of the Albert@Home project are changed if you use that BOINC instance for other projects as well).
CU
HBE
RE: My guess is that the
No, it's worse - it's the new server code ("credit new") that does this. Although we are still using server-assigned credit here on Albert, the run time estimation etc. is handled by the new system. This is part of what we intend to test here.
BM
oh.... Were you able to
oh....
Were you able to compensate for this effect? That is, will newly generated WUs have a chance to be computed in time without the manual editing I described above?
Thx,
HB
So far I didn't get any help
So far I didn't get any help from the BOINC devs that I asked for, so I'm still analyzing and digging through the code myself.
Indeed it currently looks like some things have changed on both ends - client and server - and I still need to understand how these changes work together.
BM
Edit:
Hm, apparently rsc_fpops_est and rsc_fpops_bound pass through the server code unchanged; these are still the values written by the WUG...
In the Client there is:
[pre]
max_elapsed_time = rp->wup->rsc_fpops_bound/rp->avp->flops;
[/pre]
where rsc_fpops_bound should be what it gets passed from the server, and avp->flops the "flops" of the "app version". Apparently that's also sent by the server for the App it sends.
Bikeman, what's that in your case? There must be something like
[pre]
...
<flops>62700155339.574905</flops>
...
[/pre]
in the client_state.xml you edited.
BM
Hi! Let me see...
Hi!
Let me see...
[pre]
<app_version>
    <app_name>einsteinbinary_BRP4</app_name>
    <version_num>119</version_num>
    <platform>i686-pc-linux-gnu</platform>
    <avg_ncpus>0.150000</avg_ncpus>
    <max_ncpus>1.000000</max_ncpus>
    <flops>2402526409719.028320</flops>
    <plan_class>atiOpenCL</plan_class>
</app_version>
[/pre]
(but that is after I made the editing).
As different users see different cut-off times, I would have expected that this scales with the BOINC benchmark result for the individual graphics card. For mine, it seems to be this:
[pre]
<count>1</count>
<name>ATI Radeon HD 5800 series (Cypress)</name>
<available_ram>1002438656.000000</available_ram>
<have_cal>1</have_cal>
<have_opencl>1</have_opencl>
<peak_flops>4176000000000.000000</peak_flops>
[/pre]
CU
HB
The "peak_flops" is not
The "peak_flops" is not benchmarked. It's merely a theoretical upper bound derived basically from the number of "cores" times the clock frequency. Thus the peak_flops should be identical for similar devices.
According to the CreditNew description "The scheduler adjusts this [peak_flops], using the elapsed time statistics, to get the app_version.flops_est it sends to the client (from which job durations are estimated)."
BM
RE: [pre]
Yes, let's see. Local stuff, from the HD4850
[pre]
<app_version>
    <app_name>einsteinbinary_BRP4</app_name>
    <version_num>109</version_num>
    <platform>windows_intelx86</platform>
    <avg_ncpus>0.200000</avg_ncpus>
    <max_ncpus>1.000000</max_ncpus>
    <flops>33321667311.708328</flops>
    <plan_class>ATIOpenCL</plan_class>
    <api_version>6.13.8</api_version>
</app_version>
[/pre]
For the actual card, I'll do the whole shebang:
[pre]
1
ATI Radeon HD 4700/4800 (RV740/RV770)
1040187392.000000
1
1
2000000000000.000000
1.4.1607
5
1024
2047
2047
625
900
64
10
1
256
4096
8192
8192
8192
ATI RV770
Advanced Micro Devices, Inc.
4098
1
0
62
63
1
1
cl_amd_fp64 cl_khr_gl_sharing cl_amd_device_attribute_query cl_khr_d3d10_sharing
1073741824
16384
625
10
OpenCL 1.1 AMD-APP-SDK-v2.5 (793.1)
OpenCL 1.0 AMD-APP-SDK-v2.5 (793.1)
CAL 1.4.1607
[/pre]
How come my flops are so low on the einsteinbinary? Only 33321667311 for Windows versus Heinz's 2402526409719 for Linux? Yeah, I get it, those are flops estimated by the server, but heck. When the peak flops my GPU can do is 2000000000000, or 60 times the estimated amount my GPU is credited with, no wonder a) work is estimated so (s)low and b) I have a TDCF of 9.59! Plonk.
Hi! This is quite a mission
Hi!
This is quite a mission impossible from a BOINC perspective, isn't it? If the initial estimation is off too much, the WUs will ALL get terminated prematurely, and the server will NEVER get a valid result to adjust the estimation of the computation performance, which is actually needed to provide a good estimation for the max elapsed time in the first place!!
I think (dreaming) that it would be best if (optionally) apps could have a self-profiling option. E.g. if the app info stuff defined for the app in question includes a special tag (say ), then the BOINC client could call the app with all the WU command line items of a workunit, plus the additional command line option --benchmark-only. The app would then do a short test run (app developers would know best how to do that) that returns an estimation of the runtime for the entire workunit.
HB
RE: This is quite a mission
If no work ever validates, like in my situation, then the adjustment will also never happen. Why will the work never validate? That's still where I come up clueless - no hint from Oliver either. Where is he by the way, seems like he evaporated. ;-)
Either that or allow that the user sets the amount of flops for all OpenCL capable hardware. Now it can only be set on CUDA work and then only when using the anonymous platform.
But really, why is the flops count for my hardware set so low on the server, when the peak flops show that it can do way better. For the HD5800 the amount of digits for the peak flops and the 'actual flops' is the same (13). For my HD4850 it's 13 for the peak flops and 11 for the actual flops. There's got to be something wrong there.
I think I'll get work, then exit BOINC, adjust the flops number in client_state.xml, restart BOINC and see if that will make a difference. Maybe that will even validate work.
Edit: hehe, I edited the flops value to 1919403979592.269394, now all tasks think they'll run for 11 minutes. That's gonna wreck my DCF completely. ;-)
Maybe I should make it even less...
RE: This is quite a mission
Yep, that occurred to me, too.
I already added some code to our plan-class stuff that should allow me to play around with the flops estimation a bit. I intend to do this tomorrow, together with some more analysis of the scheduler code (sched_version.cpp).
BM
@Jord: Well, because I
@Jord: Well, because I fiddled manually in client_state.xml, finally a lot of workunits were completed (even validated, not sure if that matters), and my card took ca 4500 sec per WU.
I think your card takes around 33k sec to complete a WU, so no matter what the theoretical (computed) peak performance is, the server would be right to assign a ca 7 times lower performance to your card.
Why is your card slower? Actually the debugging output gives a hint that because of the physical capabilities of your card, the app was forced to re-size the internal processing layout to make it fit. I'm afraid it requires some deep analysis to find out whether this re-sizing is leading to differences in the result of the computation, and whether the differences are tolerable (==>validator adjustment) or intolerable (maybe the re-sizing has a bug).
It would be instructive to see whether the debugging output in question is common to all 4xxx series cards. Ah..but that's a different subject.
@Bernd: my host is now almost out of work (was on nomorework) so I'll give it a try tomorrow or whenever it's ready.
CU
HB
Over on GPUGRID, I saw
Over on GPUGRID, I saw something about them finding that the HD4xxx series cards had some type of memory access problem - a limit on the amount of graphics memory each processor on the GPU can access before it starts using a much lower bandwidth path to the computer's main memory instead. I haven't kept up with whether more recent software updates have removed this restriction.
RE: I think your card takes
)
7 or 60? Quite some difference. But OK, I am running with a changed flops value, still only 11 digits long but different than what Albert gave me. Since its estimates are all too low (you're right about the ~32k seconds) I've made it think that the tasks are actually longer, not shorter.
Just too bad I'm still quite busy with Skyrim. That hacks into the time anything else can use the GPU. ;-)
The way of calculating
The way (projected_)flops is calculated differs greatly depending on how many tasks with this app version your host has successfully computed. Maybe this value differs between your hosts.
Anyway, I did change the scheduler (the "projected_flops" supplied by the plan classes should be much lower now). At least they should not overestimate the actual flops any more, which is what could lead to "maximum time exceeded" errors. Time estimates on the client side may be far off now, though. Have a try.
BM
Hmmm....I get the same
Hmmm... I get the same cut-off time as before (even though I reset the Albert project before allowing new work). In addition, the app now seems to be configured to use a full CPU core.
http://albertathome.org/task/66620
CU
HB
Ok, scheduler reverted. Needs
Ok, scheduler reverted. Needs further investigation.
BM
RE: The way of calculating
LOL, like zero times for me? None of the tasks I do validate, remember?
As for testing your over_flops, what do I do with the extra tasks? Stupid BOINC always fetches 6 tasks, doesn't matter that it then takes ~9 days to do them... Though you now have ~9 days to come up with a better schedule(r). ;-)
Have you thought of starting
Have you thought of starting with a certain number of dummy tasks, to be replaced with similar information from tasks actually completed as soon as there are enough of them?
Some BOINC projects limit the number of tasks any computer can have downloaded and in progress at first, with this limit relaxed as soon as there are enough tasks successfully completed by that computer to get a better idea of how often it can handle yet another workunit.
RE: Some BOINC projects
That's certainly an option to limit the effect of the runtime estimation / work fetch going mad. But actually I'd like to understand and fix what's going wrong in the first place.
For now I raised the FLOPS estimate and thus the FLOPS limit by a factor of 10 for newly generated workunits. It will take some time (usually about 1.5 days) until the first tasks from that batch are sent out, though.
BM
I've read that at least some
I've read that at least some of the BOINC versions never initialize one of the variables often used in runtime estimation. You may want to add reporting of the variables you use so you can check for signs of this.
RE: That's certainly an
There is something weird going on with the amount of tasks one has per day. As you can see from my double zero credit & RAC, I haven't had one task validate yet. So by now, the amount of tasks I should be able to download for the v1.19 app should be 1, maybe 2.
Yesterday it was 26, now it is 32. Why is it going up?
I am not returning any valid work. Shouldn't it, like in the old days, continue to go down and eventually only give me 1 task per device (CPU core or GPU) per day? As with this, I can continue ad infinitum doing 'bad work'.
I incorporated D.A.s recent
I incorporated D.A.'s recent fix for using a "conservative flops estimate" in case "we don't have enough statistics" (i.e. too few valid results) into the scheduler running on Albert.
Let's see whether this helps ...
BM
PS: Besides I added some logging that should write the Client's max runtime for every job sent to the scheduler log. You may spot it in the logs for your hosts.
BM
Hi! I just got
Hi!
I just got this:
[pre]
2011-12-22 22:39:37.5065 [PID=14669] [version] Checking plan class 'atiOpenCL'
2011-12-22 22:39:37.5065 [PID=14669] [version] host_flops: 2.972295e+09, speedup: 15.00, projected_flops: 4.458442e+10, peak_flops: 4.176000e+12, peak_flops_factor: 1.00
[/pre]
Still, the estimated CPU time as displayed by boinccmd for such a task is below 50 seconds ... :-( It will actually take almost 100 times longer.
HB
I guess I got a few from the
I guess I got a few from the old batch.
Now everything is fine: the runtime estimate is reasonably pessimistic and tasks validate OK.
HB
RE: no hint from Oliver
Sort of, holiday season... :-)
Happy new year!
RE: It would be instructive
They will. The 4xxx series doesn't support local memory, it's emulated via global memory which incurs a big impact on performance. Also, this series only allows for 64 work items per work group when local memory is used, hence the resizing. However, I doubt that the resizing actually affects the accuracy of the computation, but if it does, it needs to be fixed!
Oliver
Hi, I also have aborted task
Hi,
I also have an aborted task due to
.
The GPU is in a bad state, with a reboot required. All other downloaded OpenCL tasks are started by BOINC and immediately aborted with:
This stops after reaching the daily quota of tasks.
I previously finished atiOpenCL tasks successfully, with ~50000 s runtime.
System: Linux Ubuntu Oneiric
OpenCL: ATI GPU 0: Juniper (driver version CAL 1.4.1646, device version OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213), 1024MB)
Catalyst 11.11
RE: They will. The 4xxx
Well, it turned out it does indeed! We'll fix it ASAP.
Oliver
Ok, bug fix implemented and
Ok, bug fix implemented and tested. We'll release v1.20 shortly...
Oliver