Albert@home is still an unofficial, non-public test project. Don't expect anything to work here. The only type of work Albert@home is currently sending out is for a highly experimental BRP4 OpenCL application.
Copyright © 2024 Einstein@Home. All rights reserved.
Comments
Since the upgrade to 7.0.2
Since the upgrade to 7.0.2 I'm not getting any work for the GPUs ..
08-Dec-2011 07:20:02 [Albert@Home] Sending scheduler request: To fetch work.
08-Dec-2011 07:20:02 [Albert@Home] Requesting new tasks for ATI
08-Dec-2011 07:20:05 [Albert@Home] Scheduler request completed: got 0 new tasks
08-Dec-2011 07:20:05 [Albert@Home] No tasks sent
08-Dec-2011 07:58:09 [Albert@Home] Sending scheduler request: To fetch work.
08-Dec-2011 07:58:09 [Albert@Home] Requesting new tasks for NVIDIA
08-Dec-2011 07:58:11 [Albert@Home] Scheduler request completed: got 0 new tasks
08-Dec-2011 07:58:11 [Albert@Home] No tasks sent
host details are here:
http://albertathome.org/host/1396
Any thoughts?
Click on the "last contact"
Click on the "last contact" link to see the scheduler logs.
In the case of your host I see:
Hm - looks like there's something wrong with the GPU RAM size reporting. I'll look into that tomorrow.
BM
Hm ... according to
Hm ... according to sched_request the ATI device isn't OpenCL capable:
[pre]
1
ATI Radeon HD 5800 series (Cypress)
1024458752.000000
1
0
0.000000
0.000000
0.000000
50000000000.000000
1.4.815
8
1024
1788
508
0
0
64
20
1
256
256
16384
16384
16384
[/pre]
BM
RE: Hm ... according to
I don't think CAL version 1.4.815 is OpenCL capable ...
STE\/E
Weird .. with BOINC 6.13 the
Weird .. with BOINC 6.13 the same ATI GPU completed scores of work units:
For example:
http://albertathome.org/task/53182
And I use the ATI GPU for OpenCL development all the time ..
What exactly does BOINC do to
What exactly does BOINC do to check for OpenCL?
RE: I don't think Cal
It is, but you have to install the SDK and register the OpenCL ICD yourself. Installing the driver is not sufficient.
Gaurav, have you changed anything in your setup?
Oliver
RE: What exactly does BOINC
It uses libOpenCL via late binding to query a few basic properties. Using 10.8 (as you do) you have to make sure it's available as it's not installed automatically in the usual library paths. Better still, upgrade your driver to 11.7 (11.11 on Linux!). That version of the Catalyst driver installs the OpenCL runtime all by itself. Our app officially requires at least 11.7 anyway as we build it using SDK 2.5.
Oliver
Thanks Oliver. That was
Thanks Oliver. That was helpful. Figured it out .. it was a 64-bit vs. 32-bit issue. The ATI app is 32-bit, but the BOINC client I built is 64-bit. So I had to make both 32- and 64-bit OpenCL libs available in my library path.
It's working now (finished a result successfully):
http://albertathome.org/task/54521
RE: Aborting task
3 WUs were aborted after ~6395 sec of run time for the same reason:
p2030.20100912.G57.94-00.24.S.b5s0g0.00000_416_1 - 6,395.56 sec
p2030.20100913.G44.55+00.20.C.b5s0g0.00000_1824_2 - 6,395.53 sec
p2030.20100912.G57.94-00.24.S.b5s0g0.00000_744_0 - 6,395.32 sec
WTF?
Hi! My guess is that the way
Hi!
My guess is that the way BOINC calculates the maximum allowed elapsed time for a task to finish has changed with version 7.x. So the workunits would now have to be generated with a higher number of estimated/max floating point ops per task. Because BRP4 workunits are all created equal, no matter on which platform they will eventually get crunched, one value of estimated floating point ops must be used for CPU, NVIDIA/CUDA and ATI/OpenCL (and all supported BOINC client versions). No trivial task.
Until this gets fixed on the server side for new workunits, one could theoretically do a workaround in the client_state.xml file to prevent more WUs erroring out (and wrecking your quota):
- Stop BOINC
- open the client_state.xml file in an editor
- replace occurrences of the following two lines
[pre]
<rsc_fpops_est>140000000000000.000000</rsc_fpops_est>
<rsc_fpops_bound>2800000000000000.000000</rsc_fpops_bound>
[/pre]
with (say)
[pre]
<rsc_fpops_est>1400000000000000.000000</rsc_fpops_est>
<rsc_fpops_bound>28000000000000000.000000</rsc_fpops_bound>
[/pre]
(Actually you should check that only WUs of the Albert@Home project are changed if you use that BOINC instance for other projects as well).
CU
HBE
RE: My guess is that the
No, it's worse - it's the new server code ("credit new") that does this. Although we are still using server-assigned credit here on Albert, the run time estimation etc. is handled by the new system. This is part of what we intend to test here.
BM
oh.... Were you able to
oh....
Were you able to compensate for this effect? That is, will newly generated WUs have a chance to be computed in time without the manual editing I described above?
Thx,
HB
So far I didn't get any help
So far I didn't get any help from the BOINC devs that I asked for, so I'm still analyzing and digging through the code myself.
Indeed it currently looks like some things have changed on both ends - client and server - and I still need to understand how these changes work together.
BM
Edit:
Hm, apparently rsc_fpops_est and rsc_fpops_bound pass through the server code unchanged; these are still the values written by the WUG...
In the Client there is:
[pre]
max_elapsed_time = rp->wup->rsc_fpops_bound/rp->avp->flops;
[/pre]
where rsc_fpops_bound should be what it gets passed from the server, and avp->flops the "flops" of the "app version". Apparently that's also sent by the server for the App it sends.
Bikeman, what's that in your case? There must be something like
[pre]
...
<flops>62700155339.574905</flops>
...
[/pre]
in the client_state.xml you edited.
BM
Hi! Let me see...
Hi!
Let me see...
[pre]
<app_version>
    <app_name>einsteinbinary_BRP4</app_name>
    <version_num>119</version_num>
    <platform>i686-pc-linux-gnu</platform>
    <avg_ncpus>0.150000</avg_ncpus>
    <max_ncpus>1.000000</max_ncpus>
    <flops>2402526409719.028320</flops>
    <plan_class>atiOpenCL</plan_class>
</app_version>
[/pre]
(but that is after I made the editing).
As different users see different cut-off times, I would have expected that this scales with the BOINC benchmark result for the individual graphics card. For mine, it seems to be this:
[pre]
<count>1</count>
<name>ATI Radeon HD 5800 series (Cypress)</name>
<available_ram>1002438656.000000</available_ram>
<have_cal>1</have_cal>
<have_opencl>1</have_opencl>
<peak_flops>4176000000000.000000</peak_flops>
[/pre]
CU
HB
The "peak_flops" is not
The "peak_flops" is not benchmarked. It's merely a theoretical upper bound derived basically from the number of "cores" times the clock frequency. Thus the peak_flops should be identical for similar devices.
According to the CreditNew description "The scheduler adjusts this [peak_flops], using the elapsed time statistics, to get the app_version.flops_est it sends to the client (from which job durations are estimated)."
BM
RE: [pre]
Yes, let's see. Local stuff, from the HD4850
[pre]
<app_version>
    <app_name>einsteinbinary_BRP4</app_name>
    <version_num>109</version_num>
    <platform>windows_intelx86</platform>
    <avg_ncpus>0.200000</avg_ncpus>
    <max_ncpus>1.000000</max_ncpus>
    <flops>33321667311.708328</flops>
    <plan_class>ATIOpenCL</plan_class>
    <api_version>6.13.8</api_version>
</app_version>
[/pre]
For the actual card, I'll do the whole shebang:
[pre]
1
ATI Radeon HD 4700/4800 (RV740/RV770)
1040187392.000000
1
1
2000000000000.000000
1.4.1607
5
1024
2047
2047
625
900
64
10
1
256
4096
8192
8192
8192
ATI RV770
Advanced Micro Devices, Inc.
4098
1
0
62
63
1
1
cl_amd_fp64 cl_khr_gl_sharing cl_amd_device_attribute_query cl_khr_d3d10_sharing
1073741824
16384
625
10
OpenCL 1.1 AMD-APP-SDK-v2.5 (793.1)
OpenCL 1.0 AMD-APP-SDK-v2.5 (793.1)
CAL 1.4.1607
[/pre]
How come my flops are so low on the einsteinbinary? Only 33321667311 for Windows versus Heinz's 2402526409719 for Linux? Yeah, I get it, those are flops estimated by the server, but heck. When the peak flops my GPU can do is 2000000000000, or 60 times the estimated amount my GPU is credited with, no wonder a) work is estimated so (s)low and b) I have a TDCF of 9.59! Plonk.
Hi! This is quite a mission
Hi!
This is quite a mission impossible from a BOINC perspective, isn't it? If the initial estimation is off too much, the WUs will ALL get terminated prematurely, and the server will NEVER get a valid result to adjust the estimation of the computation performance, which is actually needed to provide a good estimation for the max elapsed time in the first place!!
I think (dreaming) that it would be best if (optionally) apps could have a self-profiling option. E.g. if the app info stuff defined for the app in question includes a special tag (say ), then the BOINC client could call the app with all the WU command line items of a workunit, plus the additional command line option --benchmark-only. The app would then do a short test run (app developers would know best how to do that) that returns an estimation of the runtime for the entire workunit.
HB
RE: This is quite a mission
If no work ever validates, like in my situation, then the adjustment will also never happen. Why will the work never validate? That's still where I come up clueless - no hint from Oliver either. Where is he by the way, seems like he evaporated. ;-)
Either that or allow that the user sets the amount of flops for all OpenCL capable hardware. Now it can only be set on CUDA work and then only when using the anonymous platform.
But really, why is the flops count for my hardware set so low on the server, when the peak flops show that it can do way better. For the HD5800 the amount of digits for the peak flops and the 'actual flops' is the same (13). For my HD4850 it's 13 for the peak flops and 11 for the actual flops. There's got to be something wrong there.
I think I'll get work, then exit BOINC, adjust the flops number in client_state.xml, restart BOINC and see if that will make a difference. Maybe that will even validate work.
Edit: hehe, I edited the flops value to 1919403979592.269394, now all tasks think they'll run for 11 minutes. That's gonna wreck my DCF completely. ;-)
Maybe I should make it even less...
RE: This is quite a mission
Yep, that occurred to me, too.
I already added some code to our plan-class stuff that should allow me to play around with the flops estimation a bit. I intend to do this tomorrow, together with some more analysis of the scheduler code (sched_version.cpp).
BM
@Jord: Well, because I
@Jord: Well, because I fiddled manually in client_state.xml, finally a lot of workunits were completed (even validated, not sure if that matters), and my card took ca 4500 sec per WU.
I think your card takes around 33k sec to complete a WU, so no matter what the theoretical (computed) peak performance is, the server would be right to assign a ca 7 times lower performance to your card.
Why is your card slower? Actually the debugging output gives a hint that because of the physical capabilities of your card, the app was forced to re-size the internal processing layout to make it fit. I'm afraid it requires some deep analysis to find out whether this re-sizing is leading to differences in the result of the computation, and whether the differences are tolerable (==>validator adjustment) or intolerable (maybe the re-sizing has a bug).
It would be instructive to see whether the debugging output in question is common to all 4xxx series cards. Ah..but that's a different subject.
@Bernd: my host is now almost out of work (was on nomorework) so I'll give it a try tomorrow or whenever it's ready.
CU
HB
Over on GPUGRID, I saw
Over on GPUGRID, I saw something about them finding that the HD4xxx series cards had some type of memory access problem - a limit on the amount of graphics memory each processor on the GPU can access before it starts using a much lower bandwidth path to the computer's main memory instead. I haven't kept up with whether more recent software updates have removed this restriction.
RE: I think your card takes
)
7 or 60? Quite some difference. But OK, I am running with a changed flops value, still only 11 digits long but different than what Albert gave me. Since its estimates are all too low (you're right about the ~32k seconds) I've made it think that the tasks are actually longer, not shorter.
Just too bad I'm still quite busy with Skyrim. That hacks into the time anything else can use the GPU. ;-)
The way of calculating
The way (projected_)flops is calculated differs greatly depending on how many tasks with this app version your host has successfully computed. Maybe this value differs between your hosts.
Anyway, I did change the scheduler (the "projected_flops" supplied by the plan classes should be much lower now). At least they should not overestimate the actual flops any more, which is what could lead to "maximum time exceeded" errors. Time estimates on the client side may be far off now, though. Have a try.
BM
Hmmm....I get the same
Hmmm... I get the same cut-off time as before (even though I reset the Albert project before allowing new work). In addition, the app now seems to be configured to use a full CPU core.
http://albertathome.org/task/66620
CU
HB
Ok, scheduler reverted. Needs
Ok, scheduler reverted. Needs further investigation.
BM
RE: The way of calculating
LOL, like zero times for me? None of the tasks I do validate, remember?
As for testing your over_flops, what do I do with the extra tasks? Stupid BOINC always fetches 6 tasks, doesn't matter that it then takes ~9 days to do them... Though you now have ~9 days to come up with a better schedule(r). ;-)
Have you thought of starting
Have you thought of starting with a certain number of dummy tasks, to be replaced with similar information from tasks actually completed as soon as there are enough of them?
Some BOINC projects limit the number of tasks any computer can have downloaded and in progress at first, with this limit relaxed as soon as there are enough tasks successfully completed by that computer to get a better idea of how often it can handle yet another workunit.
RE: Some BOINC projects
That's certainly an option to limit the effect of the runtime estimation / work fetch going mad. But actually I'd like to understand and fix what's going wrong in the first place.
For now I raised the FLOPS estimate and thus the FLOPS limit by a factor of 10 for newly generated workunits. It will take some time (usually about 1.5 days) until the first tasks from that batch are sent out, though.
BM
I've read that at least some
I've read that at least some of the BOINC versions never initialize one of the variables often used in runtime estimation. You may want to add reporting of the variables you use so you can check for signs of this.
RE: That's certainly an
There is something weird going on with the amount of tasks one has per day. As you can see from my double zero credit & RAC, I haven't had one task validate yet. So by now, the amount of tasks I should be able to download for the v1.19 app should be 1, maybe 2.
Yesterday it was 26, now it is 32. Why is it going up?
I am not returning any valid work. Shouldn't it, like in the old days, continue to go down and eventually only give me 1 task per device (CPU core or GPU) per day? As with this, I can continue ad infinitum doing 'bad work'.
I incorporated D.A.s recent
I incorporated D.A.'s recent fix for using a "conservative flops estimate" in case "we don't have enough statistics" (i.e. too few valid results) into the scheduler running on Albert.
Let's see whether this helps ...
BM
PS: Besides I added some logging that should write the Client's max runtime for every job sent to the scheduler log. You may spot it in the logs for your hosts.
BM
Hi! I just got
Hi!
I just got this:
[pre]
2011-12-22 22:39:37.5065 [PID=14669] [version] Checking plan class 'atiOpenCL'
2011-12-22 22:39:37.5065 [PID=14669] [version] host_flops: 2.972295e+09, speedup: 15.00, projected_flops: 4.458442e+10, peak_flops: 4.176000e+12, peak_flops_factor: 1.00
[/pre]
Still, the estimated CPU time as displayed by boinccmd for such a task is below 50 seconds ... :-( It will actually take almost 100 times longer.
HB
I guess I got a few from the
I guess I got a few from the old batch.
Now everything is fine: the runtime estimate is reasonably pessimistic and tasks validate OK.
HB
RE: no hint from Oliver
Sort of, holiday season... :-)
Happy new year!
RE: It would be instructive
They will. The 4xxx series doesn't support local memory, it's emulated via global memory which incurs a big impact on performance. Also, this series only allows for 64 work items per work group when local memory is used, hence the resizing. However, I doubt that the resizing actually affects the accuracy of the computation, but if it does, it needs to be fixed!
Oliver
Hi, I also have aborted task
Hi,
I also have an aborted task due to
.
The GPU is in a bad state, with a reboot required. All other downloaded OpenCL tasks are started by BOINC and immediately aborted with:
This stops after reaching the daily quota of tasks.
I previously finished atiOpenCL tasks successfully, with ~50000 s runtime.
System: Linux Ubuntu Oneiric
OpenCL: ATI GPU 0: Juniper (driver version CAL 1.4.1646, device version OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213), 1024MB)
Catalyst 11.11
RE: They will. The 4xxx
Well, it turned out it does indeed! We'll fix it ASAP.
Oliver
Ok, bug fix implemented and
Ok, bug fix implemented and tested. We'll release v1.20 shortly...
Oliver