Project server code update

The project will be taken down in about an hour to perform an update of the BOINC server code. Ideally you shouldn't notice anything, but usually the world isn't ideal. See you again on the other side.

Comments

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

RE: @Richard, yeah the pfc

Message 79998 in response to message 79997

Quote:
@Richard, yeah the pfc scale should be compensating handily for any initial SIMD related disagreement there (It's had enough time), but since the scaling swing is in the opposite direction to GPU, and likely below 1 as at seti (which implies magical CPU fairies), I believe the coarse scaling correction there should be the first step in isolation. Supporting effects include the SIMAP non-SIMD app on SIMD aware android client whetstone, as well as Seti's uniformly below 1 pfc_scales despite quite tight theoretically based estimates.


Are you sure that the SETI rsc_fpops_est are 'tight'?

I remember that when we were helping Josh Von Korff choose initial runtime estimates for Astropulse, we had a rule-of-thumb that the *stock* MB CPU app reached a DCF - only available scaling factor in those days - of ~0.2 on the then cutting-edge Intel Core2 range (Q6600 and similar). The stock SETI app has had internal despatch of at least some SIMD pathways for a long time, and more have been added over the years.

Knowing Eric's approach to these matters - he's never wanted to exclude anyone from the search for ET, no matter how primitive their hardware - I suspect rsc_fpops_est may have been 'tight' for the mythical cobblestone reference machine (1 GHz FPU only), but never since.

Certainly the same GPU I'm plotting here has a SETI APR of 180, again running two tasks at once. So, even allowing for the inefficient GPU processing of autocorrelations (which I don't think has been reversed into the AR-fpops curve), SETI thinks this card is twice as fast as the stock BOINC code here and at GPUGrid does. At GPUGrid, it runs pretty tight cuda60 code, and since the whole project has revolved around NV and cuda since they dropped the PS3 dead-end, I reckon they know their stuff. They over-pay credit, but that's a manual decision, not BOINC's fault.

I didn't want to spam the boards with my stats - just milestone threads - but apparently signatures are no longer optional. Follow the link if you're interested.

http://www.boincsynergy.com/images/stats/comb-3475.jpg

jason_gee
jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

RE: Are you sure that the

Message 79999 in response to message 79998

Quote:
Are you sure that the SETI rsc_fpops_est are 'tight'?

Yes, to +/- 10% of actual operations w/o overhead (which is not 'paid' at this point). That's still quite a spread in user and machine utilisation terms, which is the entropy that damped responses should be absorbing (as opposed to estimates). That's still approximately 3.3-30x 'tighter' than the coarse scaling error induced by unaccounted-for AVX multiplied by machine utilisation.

Quote:
Certainly the same GPU I'm plotting here has a SETI APR of 180, again running two tasks at once. So, even allowing for the inefficient GPU processing of autocorrelations (which I don't think has been reversed into the AR-fpops curve), SETI thinks this card is twice as fast as the stock BOINC code here and at GPUGrid does. At GPUGrid, it runs pretty tight cuda60 code, and since the whole project has revolved around NV and cuda since they dropped the PS3 dead-end, I reckon they know their stuff. They over-pay credit, but that's a manual decision, not BOINC's fault.

It's relative. [You use the theoretically best serial algorithm for the estimate, as opposed to reflecting back the implementation, inefficient or otherwise]... On Seti the GPU autocorrelation uses my own 4NFFT method, so in [uncounted] computation it's 4x... Since that's drowned by latencies to the tune of 60% on high end cards, you roughly double the claim for that portion. Other areas (variable) are more efficient. Then divide the overall claim by about 3.3 (because of the AVX global pfc). The net result is 'shorties' that should be getting ~100 credits, getting ~40-60.

A similar effect is happening here, with tasks that should be ~1000, seeing a median of ~500. It's not the project supplied estimates that are out, but the induced scaling error.
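To put toy numbers on that (plain arithmetic only, not the CreditNew code; the 1.3-2.0 range below is just my reading of 'roughly double for that portion, other areas more efficient'):

[pre]
# Toy arithmetic: fair credit, inflated by the uncounted GPU ops, then divided
# by the ~3.3x AVX-induced global pfc scale. Factors are the ones quoted above.
def toy_grant(fair_credit, claim_inflation, avx_divisor=3.3):
    return fair_credit * claim_inflation / avx_divisor

print(round(toy_grant(100, 1.3)), round(toy_grant(100, 2.0)))  # 39 61 -> the ~40-60 shorties
print(round(toy_grant(1000, 1.65)))                            # 500  -> the ~500 median here
[/pre]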

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

RE: RE: Are you sure that

Message 80000 in response to message 79999

Quote:
Quote:
Are you sure that the SETI rsc_fpops_est are 'tight'?

Yes, to +/- 10% of actual operations w/o overhead (which is not 'paid' at this point). That's still quite a spread in user and machine utilisation terms, which is the entropy that damped responses should be absorbing (as opposed to estimates). That's still approximately 3.3-30x 'tighter' than the coarse scaling error induced by unaccounted-for AVX multiplied by machine utilisation.


I have to say I'm ... surprised.

For a recently completed task, I can see

14979862651149.264000
111064200000000.000000
Flopcounter: 38995606754768.391000

Because I'm running Anon. Plat., the first is scaled to allow the client to show a decent runtime estimate using its internal reference speed. It seems to be using 9235454731.586426, although the APR on that machine is 122.47 GFLOPS.

The second and third differ by Eric's infamous 'credit multiplier' of x2.85, of course.
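For anyone who wants to check the sums, it's just division of the three figures above (nothing added here beyond the arithmetic):

[pre]
first  = 14979862651149.264   # the scaled figure the client works with
second = 111064200000000.0    # rsc_fpops_est as sent
third  = 38995606754768.391   # the app's own Flopcounter
speed  = 9235454731.586426    # the reference speed the client seems to be using

print(second / third)        # ~2.848 -> the x2.85 'credit multiplier'
print(first / speed / 60)    # ~27.0  -> minutes of estimated runtime
[/pre]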

I'll have to load up a machine with a stock app sometime and try that one again.

I didn't want to spam the boards with my stats - just milestone threads - but apparently signatures are no longer optional. Follow the link if you're interested.

http://www.boincsynergy.com/images/stats/comb-3475.jpg

jason_gee
jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

RE: RE: RE: Are you

Message 80001 in response to message 80000

Quote:
Quote:
Quote:
Are you sure that the SETI rsc_fpops_est are 'tight'?

Yes, to +/- 10% of actual operations w/o overhead (which is not 'paid' at this point). That's still quite a spread in user and machine utilisation terms, which is the entropy that damped responses should be absorbing (as opposed to estimates). That's still approximately 3.3-30x 'tighter' than the coarse scaling error induced by unaccounted-for AVX multiplied by machine utilisation.

I have to say I'm ... surprised.

For a recently completed task, I can see

14979862651149.264000
111064200000000.000000
Flopcounter: 38995606754768.391000

Because I'm running Anon. Plat., the first is scaled to allow the client to show a decent runtime estimate using its internal reference speed. It seems to be using 9235454731.586426, although the APR on that machine is 122.47 GFLOPS.

The second and third differ by Eric's infamous 'credit multiplier' of x2.85, of course.

I'll have to load up a machine with a stock app sometime and try that one again.

'That' has already been downscaled. 2.85x would indeed be a great compromise if on average most of the hosts were SSE+ (~2.25x) equipped with about a third taking up AVX (3.375x).

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

Bernd Machenschalk
Bernd Machenschalk
Administrator
Joined: 15 Oct 04
Posts: 155
Credit: 6218130
RAC: 0

RE: Attached a new host to

Message 80002 in response to message 79976

Quote:

Attached a new host to Albert, looking through the logs i keep getting the following download error:

14-Jun-2014 06:06:34 [Albert@Home] Giving up on download of EatH_mastercat_1344952579.txt: permanent HTTP error

Was related to the web code update. Shouldn't have affected crunching. Should be fixed now.

BM

BM

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

RE: 'That' has already

Message 80003 in response to message 80001

Quote:
'That' has already been downscaled. 2.85x would indeed be a great compromise if on average most of the hosts were SSE+ (~2.25x) equipped with about a third taking up AVX (3.375x).


Yes, I said I was running anon. plat. I'll have to get a raw one on a stock machine.

Quote:
seeing a median of ~500


?? My own machine - attached after the bulk of the averaging has been done by the population at large - is showing a median of 1716.32 currently. That's climbed from 1644.33 at last night's show.

I didn't want to spam the boards with my stats - just milestone threads - but apparently signatures are no longer optional. Follow the link if you're interested.

http://www.boincsynergy.com/images/stats/comb-3475.jpg

jason_gee
jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

RE: RE: 'That' has

Message 80004 in response to message 80003

Quote:
Quote:
'That' has already been downscaled. 2.85x would indeed be a great compromise if on average most of the hosts were SSE+ (~2.25x) equipped with about a third taking up AVX (3.375x).

Yes, I said I was running anon. plat. I'll have to get a raw one on a stock machine.

Quote:
seeing a median of ~500

?? My own machine - attached after the bulk of the averaging has been done by the population at large - is showing a median of 1716.32 currently. That's climbed from 1644.33 at last night's show.

i.e. median on *my* GPU here early in the piece, which has been running one task at a time, from the figures you posted earlier at http://albertathome.org/node/84961&postid=112911

We know that after that it's been going up. It's not clear yet, IMO, whether that's a controlled rise/correction or another instability.

[There are two scales fighting: the host app version driving it upward, and the global PFC scale selected from the underclaiming CPU app driving it down. Which one wins is a matter of numbers]

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

RE: i.e. median on *my* GPU

Message 80005 in response to message 80004

Quote:

i.e. median on *my* GPU here early in the piece, which has been running one task at a time, from the figures you posted earlier at http://albertathome.org/node/84961&postid=112911

We know that after that it's been going up. It's not clear yet, IMO, whether that's a controlled rise/correction or another instability.


Still rising inexorably here - this is roughly the last three days (right margin is midnight UTC tonight). Horizontal lines are 1K and 10K, still logarithmic. My minimum (of 60) is 1355

Edit - to your edit: there's no CPU app here - that was Eric's point.

'Binary Radio Pulsar Search' and 'Binary Radio Pulsar Search (Arecibo, GPU)' are deployed as different Applications on this project, not just different app_versions.

I didn't want to spam the boards with my stats - just milestone threads - but apparently signatures are no longer optional. Follow the link if you're interested.

http://www.boincsynergy.com/images/stats/comb-3475.jpg

Eyrie
Eyrie
Joined: 20 Feb 14
Posts: 48
Credit: 2410
RAC: 0

RE: RE: RE: RE: Are

Message 80006 in response to message 80001

Quote:
Quote:
Quote:
Quote:
Are you sure that the SETI rsc_fpops_est are 'tight'?

Yes, to +/- 10% of actual operations w/o overhead (which is not 'paid' at this point). That's still quite a spread in user and machine utilisation terms, which is the entropy that damped responses should be absorbing (as opposed to estimates). That's still approximately 3.3-30x 'tighter' than the coarse scaling error induced by unaccounted-for AVX multiplied by machine utilisation.

I have to say I'm ... surprised.

For a recently completed task, I can see

14979862651149.264000
111064200000000.000000
Flopcounter: 38995606754768.391000

Because I'm running Anon. Plat., the first is scaled to allow the client to show a decent runtime estimate using its internal reference speed. It seems to be using 9235454731.586426, although the APR on that machine is 122.47 GFLOPS.

The second and third differ by Eric's infamous 'credit multiplier' of x2.85, of course.

I'll have to load up a machine with a stock app sometime and try that one again.

'That' has already been downscaled. 2.85x would indeed be a great compromise if on average most of the hosts were SSE+ (~2.25x) equipped with about a third taking up AVX (3.375x).


CPU flops under anon are [unless supplied in app_info.xml] raw whetstone for CPU and some mystical figure for GPU. At some point it was just 10X CPU and I am seeing roughly 10x for Eve.

rsc_fpops_est for anon is a combination of APR and rsc_fpops_est iirc.

Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.

jason_gee
jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

RE: Still rising inexorably

Message 80007 in response to message 80005

Quote:
Still rising inexorably here - this is roughly the last three days (right margin is midnight UTC tonight). Horizontal lines are 1K and 10K, still logarithmic. My minimum (of 60) is 1355

No certainty in how that will react to correcting the CPU app scale as patch one. In an ideal world it wouldn't react at all (though I firmly believe it will react visibly, I'm open to surprises). That's what I want to watch, because I suspect it should bump up a bit further, then level, drop and oscillate (for subsequent smoothing in patch two). Anyway, with such steady work it should have stabilised by now and hasn't, so I'm rolling up my sleeves for the first pass (CPU coarse scale correction).

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

Eyrie
Eyrie
Joined: 20 Feb 14
Posts: 48
Credit: 2410
RAC: 0

RE: [There's two scales

Message 80008 in response to message 80004

Quote:

[There are two scales fighting: the host app version driving it upward, and the global PFC scale selected from the underclaiming CPU app driving it down. Which one wins is a matter of numbers]


There is no underclaiming CPU app!

edit: what Richard said... global pfc are for GPU apps only - but there are several of those according to applications

Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.

jason_gee
jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

RE: CPU flops under anon

Message 80009 in response to message 80006

Quote:

CPU flops under anon are [unless supplied in app_info.xml] raw whetstone for CPU and some mystical figure for GPU. At some point it was just 10X CPU and I am seeing roughly 10x for Eve.

rsc_fpops_est for anon is a combination of APR and rsc_fpops_est iirc.

Yep, that's pretty much the way I remember it (spaghetti, including assorted mysticism and voodoo :P). Once we get the CPU mis-scaling out of the picture, pass 2 might be focussed on smoothing/damping, GPU scale correction, or both. In any of those cases, I'll let what happens with the CPU be the guide.

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

jason_gee
jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

RE: ... global pfc are for

Message 80010 in response to message 80008

Quote:
... global pfc are for GPU apps only

whatever the web pages call them (probably not the same thing), in code the global pfc_scale for a suite of applications is taken from the lowest-claiming application for given tasks (by unstable averages). That's then used to downscale the estimate of every other application wholesale, and is how the underclaiming SSE-AVX apps are dividing the credit. They are claiming fewer operations than it actually takes to do a task, evidenced by pfc_scales being below 1 (magical fairy CPU applications doing work for free).

I'm telling you it takes no fewer than nlogn operations to do an FFT ...
- AVX CPU + App tells me it did it in nlogn/3.3 . 'I'm Magic'
- Server stupidly sets pfc_scale to 1/3.3 'This magic app wins'
- Jason says 'hang on a minute, Boinc client uses broken inapplicable non-SIMD whetstone to calculate that... Ya canna defy the laws of physics, there are no CPU fairies or magic... It's using SIMD and doing up to 8 operations per cycle for an average of ~3.3x throughput' (scaling is about 50%)
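To put the same argument in toy numbers (a sketch of the claim above only, not the credit.cpp code; the version names and the GPU figure are made up for illustration):

[pre]
# A SIMD app does ~3.3x more ops/second than non-SIMD Whetstone can see, so its
# claimed ops come out ~3.3x low and its scale lands well below 1.
version_scales = {
    'CPU AVX/SSE, Whetstone-benchmarked': 1 / 3.3,   # ~0.30 -- the 'magic' app
    'CPU portable, no SIMD':              1.0,
    'GPU cuda (illustrative)':            1.6,       # over-claims via latency etc.
}
global_pfc_scale = min(version_scales.values())      # lowest claim 'wins'
print(round(global_pfc_scale, 2))                    # 0.3
# ...and every other version's estimate/grant then gets downscaled by that 0.3.
[/pre]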

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

Eyrie
Eyrie
Joined: 20 Feb 14
Posts: 48
Credit: 2410
RAC: 0

RE: RE: ... global pfc

Message 80011 in response to message 80010

Quote:
Quote:
... global pfc are for GPU apps only

whatever the web pages call them (probably not the same thing), in code the global pfc_scale for a suite of applications is taken from the lowest-claiming application for given tasks (by unstable averages). That's then used to downscale the estimate of every other application wholesale, and is how the underclaiming SSE-AVX apps are dividing the credit. They are claiming fewer operations than it actually takes to do a task, evidenced by pfc_scales being below 1 (magical fairy CPU applications doing work for free).

that would be whatever _GPU_ application claims lowest!
AVX doesn't come into play there. Don't ask me what happens when there is no CPU version to take the lead. Well, we see what happens: we end up pretty low (compared to the dip CPU apps take, as per my 'day zero' analysis. Where did I put that? wiki?)

take your pick of opencl_ati for win, mac linux and cuda versions for win mac and linux.

Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.

jason_gee
jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

RE: RE: RE: ... global

Message 80012 in response to message 80011

Quote:
Quote:
Quote:
... global pfc are for GPU apps only

whatever the web pages call them (probably not the same thing), in code the global pfc_scale for a suite of applications is taken from the lowest-claiming application for given tasks (by unstable averages). That's then used to downscale the estimate of every other application wholesale, and is how the underclaiming SSE-AVX apps are dividing the credit. They are claiming fewer operations than it actually takes to do a task, evidenced by pfc_scales being below 1 (magical fairy CPU applications doing work for free).

that would be whatever _GPU_ application claims lowest!
AVX doesn't come into play there. Don't ask me what happens when there is no CPU version to take the lead. Well, we see what happens: we end up pretty low (compared to the dip CPU apps take, as per my 'day zero' analysis. Where did I put that? wiki?)

take your pick of opencl_ati for win, mac linux and cuda versions for win mac and linux.

I'm pretty sure the CPU apps will be claiming far fewer operations, so will be chosen as the scale. Here it's a bit special because of those aggregate GPU tasks, so how that's weaved in is another question (i.e. treated as GPU only? or 16x a CPU task?).

With the current logic, pure GPU-only projects would likely grant from 3.3x-100x the credit, and initial estimates would not be all that bad.

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

RE: RE: ... global pfc

Message 80013 in response to message 80010

Quote:
Quote:
... global pfc are for GPU apps only

whatever the web pages call them (probably not the same thing), in code the global pfc_scale for a suite of applications is taken from the lowest-claiming application for given tasks (by unstable averages). That's then used to downscale the estimate of every other application wholesale, and is how the underclaiming SSE-AVX apps are dividing the credit. They are claiming fewer operations than it actually takes to do a task, evidenced by pfc_scales being below 1 (magical fairy CPU applications doing work for free).


Well, you're the one who's been walking the code, but let's standardise on one set of terminology.

Using *BOINC* terminology, the hierarchy is

[pre]Project
|__> Application
|__> App_version (each separate binary executable)[/pre]

The 'underclaiming SSE-AVX' *executables* are a suite within the BRP4 (CPU only) Application. That is different from, and will (if I understand you correctly) have a *different* pfc_scale from, the BRP4G (GPU only) suite.

According to the CreditNew whitepaper, normalisation is applied across versions of each application, not across the project as a whole. So, in the specific case in point (BRP4 and BRP4G), the GPU app will not be scaled by CPU concerns, because there isn't a CPU executable within the GPU suite.
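As a sketch of how I read that (illustrative data structure only; the scale numbers are hypothetical):

[pre]
# Per the whitepaper reading above, scales are compared only among versions of
# ONE application, so a low CPU claim inside BRP4 cannot drag BRP4G down.
project = {
    'BRP4':  {'SSE CPU': 0.45, 'AVX CPU': 0.30},                     # CPU-only app
    'BRP4G': {'BRP4G-cuda32-nv301': 1.4, 'BRP4G-opencl-ati': 1.6},   # GPU-only app
}
for app, version_scales in project.items():
    print(app, min(version_scales.values()))   # two independent normalisations
[/pre]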

I didn't want to spam the boards with my stats - just milestone threads - but apparently signatures are no longer optional. Follow the link if you're interested.

http://www.boincsynergy.com/images/stats/comb-3475.jpg

jason_gee
jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

RE: RE: RE: ... global

Message 80014 in response to message 80013

Quote:
Quote:
Quote:
... global pfc are for GPU apps only

whatever the web pages call them (probably not the same thing), in code the global pfc_scale for a suite of applications is taken from the lowest-claiming application for given tasks (by unstable averages). That's then used to downscale the estimate of every other application wholesale, and is how the underclaiming SSE-AVX apps are dividing the credit. They are claiming fewer operations than it actually takes to do a task, evidenced by pfc_scales being below 1 (magical fairy CPU applications doing work for free).

Well, you're the one who's been walking the code, but let's standardise on one set of terminology.

Using *BOINC* terminology, the hierarchy is

[pre]Project
|__> Application
|__> App_version (each separate binary executable)[/pre]

The 'underclaiming SSE-AVX' *executables* are a suite within the BRP4 (CPU only) Application. That is different from, and will (if I understand you correctly) have a *different* pfc_scale from, the BRP4G (GPU only) suite.

According to the CreditNew whitepaper, normalisation is applied across versions of each application, not across the project as a whole. So, in the specific case in point (BRP4 and BRP4G), the GPU app will not be scaled by CPU concerns, because there isn't a CPU executable within the GPU suite.

That's where it depends on how they hooked in those *4G and *5G aggregates of 16 tasks. If the estimate is standalone, then there is no visible reason it should give us 3 second estimates for hour long tasks. If it is hooked in via a multiple of the CPU apps PFC_Scale &/or CPU app estimate, then that would explain it.

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

Eyrie
Eyrie
Joined: 20 Feb 14
Posts: 48
Credit: 2410
RAC: 0

RE: RE: RE: RE: ...

Message 80015 in response to message 80012

Quote:
Quote:
Quote:
Quote:
... global pfc are for GPU apps only

whatever the web pages call them (probably not the same thing), in code the global pfc_scale for a suite of applications is taken from the lowest-claiming application for given tasks (by unstable averages). That's then used to downscale the estimate of every other application wholesale, and is how the underclaiming SSE-AVX apps are dividing the credit. They are claiming fewer operations than it actually takes to do a task, evidenced by pfc_scales being below 1 (magical fairy CPU applications doing work for free).

that would be whatever _GPU_ application claims lowest!
AVX doesn't come into play there. Don't ask me what happens when there is no CPU version to take the lead. Well, we see what happens: we end up pretty low (compared to the dip CPU apps take, as per my 'day zero' analysis. Where did I put that? wiki?)

take your pick of opencl_ati for win, mac linux and cuda versions for win mac and linux.

I'm pretty sure the CPU apps will be claiming far fewer operations, so will be chosen as the scale. Here it's a bit special because of those aggregate GPU tasks, so how that's weaved in is another question (i.e. treated as GPU only? or 16x a CPU task?).

With the current logic, pure GPU-only projects would likely grant from 3.3x-100x the credit, and initial estimates would not be all that bad.

JASON, CPU and GPU have different APPS here!

there is a BRP app and a BRPG app. that's like MB and AP on SETI...

or are you saying pfc for MB and AP are the same? :P

Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.

jason_gee
jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

RE: JASON, CPU and GPU have

Message 80016 in response to message 80015

Quote:

JASON, CPU and GPU have different APPS here!

there is a BRP app and a BRPG app. that's like MB and AP on SETI...

or are you saying pfc for MB and AP are the same? :P

No, and don't yell at me please ;)

I am saying that the BRP4G and 5G estimates come from CPU app estimates multiplied by 16 ---> different app & hardware, but estimates are linked.

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

Eyrie
Eyrie
Joined: 20 Feb 14
Posts: 48
Credit: 2410
RAC: 0

RE: That's where it depends

Message 80017 in response to message 80014

Quote:
That's where it depends on how they hooked in those *4G and *5G aggregates of 16 tasks. If the estimate is standalone, then there is no visible reason it should give us 3 second estimates for hour long tasks. If it is hooked in via a multiple of the CPU apps PFC_Scale &/or CPU app estimate, then that would explain it.

I think that one is for Bernd and/or Oliver to answer. Or find the code for it...

Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.

jason_gee
jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

RE: RE: That's where it

Message 80018 in response to message 80017

Quote:
Quote:
That's where it depends on how they hooked in those *4G and *5G aggregates of 16 tasks. If the estimate is standalone, then there is no visible reason it should give us 3 second estimates for hour long tasks. If it is hooked in via a multiple of the CPU apps PFC_Scale &/or CPU app estimate, then that would explain it.

I think that one is for Bernd and/or Oliver to answer. Or find the code for it...

That's what I asked for the customisations for. In either case, It's still broke ;)

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

Eyrie
Eyrie
Joined: 20 Feb 14
Posts: 48
Credit: 2410
RAC: 0

RE: RE: JASON, CPU and

Message 80019 in response to message 80016

Quote:
Quote:

JASON, CPU and GPU have different APPS here!

there is a BRP app and a BRPG app. that's like MB and AP on SETI...

or are you saying pfc for MB and AP are the same? :P

No, and don't yell at me please ;)


I'm female. If you don't listen I yell. At which point the voice becomes so high-pitched, males go into automatic 'isn't she cute when she is upset' ignore mode :P Aren't stereotypes great?

Quote:
I am saying that the BRP4G and 5G estimates come from CPU app estimates multiplied by 16 ---> different app & hardware, but estimates are linked.


that might be an explanation for that unreasonable extra scaling we are seeing, indeed.

Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.

Eyrie
Eyrie
Joined: 20 Feb 14
Posts: 48
Credit: 2410
RAC: 0

RE: In either case, It's

Message 80020 in response to message 80018

Quote:
In either case, It's still broke ;)


If It is broke, It needs to either work, win the lottery or find somebody that will keep It [I shudder to think what services It might supply in return].

Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

RE: That's where it depends

Message 80021 in response to message 80014

Quote:
That's where it depends on how they hooked in those *4G and *5G aggregates of 16 tasks. If the estimate is standalone, then there is no visible reason it should give us 3 second estimates for hour long tasks. If it is hooked in via a multiple of the CPU apps PFC_Scale &/or CPU app estimate, then that would explain it.


The runtime estimate we see displayed in a BOINC client doesn't come - as an estimate - from the server. Instead, we get two different numbers from the server - job size and host speed - and the client does the math.

size --> rsc_fpops_est
speed -->

I have no doubt that Bernd, Oliver et al will have set the rsc_fpops_est for the GPU app at 16* the est for the CPU app. We can check that, because rsc_fpops_est is set by hand into the workunit template, and transmitted *unchanged* through the workunit generator (analog of SETI's 'splitter') and on to our computers. At this project - thankfully - all workunits of a given type will have the same rsc_fpops_est.

You saw 3-second estimates because the *speed* term in the formula was wrong. We need to track that down, but it's nothing to do with a task size estimate. And it corrected itself once there was a usable APR for the host, to substitute for the faulty initial estimate.
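In other words the client-side sum is just size divided by speed. A sketch (the 280e12 size and ~69 GFLOPS speed are the BRP4G figures that turn up in the server logs later in this thread; the 3-second case is only there to show what kind of 'speed' it would take):

[pre]
rsc_fpops_est   = 280e12     # job size, passed through unchanged from the WU template
projected_flops = 69.17e9    # the speed term once APR is established

print(rsc_fpops_est / projected_flops)   # ~4048 s, about 1h07m -- sensible
print(rsc_fpops_est / 3)                 # ~9.3e13: the absurd 'speed' a 3 s estimate implies
[/pre]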

I didn't want to spam the boards with my stats - just milestone threads - but apparently signatures are no longer optional. Follow the link if you're interested.

http://www.boincsynergy.com/images/stats/comb-3475.jpg

jason_gee
jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

RE: RE: That's where it

Message 80022 in response to message 80021

Quote:
Quote:
That's where it depends on how they hooked in those *4G and *5G aggregates of 16 tasks. If the estimate is standalone, then there is no visible reason it should give us 3 second estimates for hour long tasks. If it is hooked in via a multiple of the CPU apps PFC_Scale &/or CPU app estimate, then that would explain it.

The runtime estimate we see displayed in a BOINC client doesn't come - as an estimate - from the server. Instead, we get two different numbers from the server - job size and host speed - and the client does the math.

size --> rsc_fpops_est
speed -->

I have no doubt that Bernd, Oliver et al will have set the rsc_fpops_est for the GPU app at 16* the est for the CPU app. We can check that, because rsc_fpops_est is set by hand into the workunit template, and transmitted *unchanged* through the workunit generator (analog of SETI's 'splitter') and on to our computers. At this project - thankfully - all workunits of a given type will have the same rsc_fpops_est.

You saw 3-second estimates because the *speed* term in the formula was wrong. We need to track that down, but it's nothing to do with a task size estimate. And it corrected itself once there was a usable APR for the host, to substitute for the faulty initial estimate.

Actually at least for anon platform the code says est and bound are being scaled by *something* ... checking what, and where non anon-platform sets those, is requiring some backtracking.

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

RE: RE: RE: That's

Message 80023 in response to message 80022

Quote:
Quote:
Quote:
That's where it depends on how they hooked in those *4G and *5G aggregates of 16 tasks. If the estimate is standalone, then there is no visible reason it should give us 3 second estimates for hour long tasks. If it is hooked in via a multiple of the CPU apps PFC_Scale &/or CPU app estimate, then that would explain it.

The runtime estimate we see displayed in a BOINC client doesn't come - as an estimate - from the server. Instead, we get two different numbers from the server - job size and host speed - and the client does the math.

size --> rsc_fpops_est
speed -->

I have no doubt that Bernd, Oliver et al will have set the rsc_fpops_est for the GPU app at 16* the est for the CPU app. We can check that, because rsc_fpops_est is set by hand into the workunit template, and transmitted *unchanged* through the workunit generator (analog of SETI's 'splitter') and on to our computers. At this project - thankfully - all workunits of a given type will have the same rsc_fpops_est.

You saw 3-second estimates because the *speed* term in the formula was wrong. We need to track that down, but it's nothing to do with a task size estimate. And it corrected itself once there was a usable APR for the host, to substitute for the faulty initial estimate.

Actually at least for anon platform the code says est and bound are being scaled by *something* ... checking what, and where non anon-platform sets those, is requiring some backtracking.


Completely agree for anon_plat. But we're going down the other fork here.

I didn't want to spam the boards with my stats - just milestone threads - but apparently signatures are no longer optional. Follow the link if you're interested.

http://www.boincsynergy.com/images/stats/comb-3475.jpg

jason_gee
jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

...

Message 80024 in response to message 80023

...

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

jason_gee
jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

RE: RE: Actually at least

Message 80025 in response to message 80023

Quote:
Quote:
Actually at least for anon platform the code says est and bound are being scaled by *something* ... checking what, and where non anon-platform sets those, is requiring some backtracking.

Completely agree for anon_plat. But we're going down the other fork here.

Yep, we need both... looking for where the scheduler calls add_result_to_reply(), which is located in sched_send.cpp. If the preceding raw estimates come straight from the WU generator (aka splitter, as you say), and never get a scale, then there are further mysteries to track down in the GPU portion of the expedition.

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

Claggy
Claggy
Joined: 29 Dec 06
Posts: 122
Credit: 4040969
RAC: 0

For your info, my

Message 80026 in response to message 80025

For your info, my i7-2600K/HD7770 is now picking up Gamma-ray pulsar search #3 tasks; the initial CPU estimates look OK at 4hrs 55mins, the ATI estimates are at 5 seconds.
(This application type has CPU, Nvidia, ATI and Intel apps across Windows, Mac and Linux (But no Intel app on Linux))

All Gamma-ray pulsar search #3 tasks for computer 8143

Claggy

Eyrie
Eyrie
Joined: 20 Feb 14
Posts: 48
Credit: 2410
RAC: 0

RE: Actually at least for

Message 80027 in response to message 80022

Quote:
Actually at least for anon platform the code says est and bound are being scaled by *something* ... checking what, and where non anon-platform sets those, is requiring some backtracking.

One of those somethings is APR, since APR can't be sent as under anon.

Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

RE: RE: RE: Actually at

Message 80028 in response to message 80025

Quote:
Quote:
Quote:
Actually at least for anon platform the code says est and bound are being scaled by *something* ... checking what, and where non anon-platform sets those, is requiring some backtracking.

Completely agree for anon_plat. But we're going down the other fork here.

Yep, we need both... looking for where the scheduler calls add_result_to_reply(), which is located in sched_send.cpp. If the preceding raw estimates come straight from the WU generator (aka splitter, as you say), and never get a scale, then there are further mysteries to track down in the GPU portion of the expedition.


Let's focus on the NON-anon_plat case for now. That's the one in widest use across the BOINC community.

We can see the process in action in the server logs in this project. Taking my latest for reference:

2014-06-17 15:50:45.2751 [PID=29665]    [version] looking for version of einsteinbinary_BRP4G
2014-06-17 15:50:45.2751 [PID=29665]    [version] Checking plan class 'BRP4G-opencl-ati'
2014-06-17 15:50:45.2751 [PID=29665]    [version] plan_class_spec: parsed project prefs setting 'gpu_util_brp' : true : 0.480000
2014-06-17 15:50:45.2751 [PID=29665]    [version] plan_class_spec: No AMD GPUs found
2014-06-17 15:50:45.2751 [PID=29665]    [version] [AV#721] app_plan() returned false
2014-06-17 15:50:45.2751 [PID=29665]    [version] Checking plan class 'BRP4G-cuda32'
2014-06-17 15:50:45.2751 [PID=29665]    [version] plan_class_spec: parsed project prefs setting 'gpu_util_brp' : true : 0.480000
2014-06-17 15:50:45.2752 [PID=29665]    [version] plan_class_spec: driver version required max: 29053, supplied: 33788
2014-06-17 15:50:45.2752 [PID=29665]    [version] [AV#723] app_plan() returned false
2014-06-17 15:50:45.2752 [PID=29665]    [version] Checking plan class 'BRP4G-cuda32-nv301'
2014-06-17 15:50:45.2752 [PID=29665]    [version] plan_class_spec: parsed project prefs setting 'gpu_util_brp' : true : 0.480000
2014-06-17 15:50:45.2752 [PID=29665]    [version] [AV#716] (BRP4G-cuda32-nv301) setting projected flops based on host elapsed time avg: 69.17G
2014-06-17 15:50:45.2752 [PID=29665]    [version] [AV#716] (BRP4G-cuda32-nv301) comparison pfc: 95.26G  et: 69.17G
2014-06-17 15:50:45.2752 [PID=29665]    [version] Best app version is now AV716 (70.26 GFLOP)
2014-06-17 15:50:45.2752 [PID=29665]    [version] Checking plan class 'BRP4G-opencl-ati'
2014-06-17 15:50:45.2752 [PID=29665]    [version] plan_class_spec: parsed project prefs setting 'gpu_util_brp' : true : 0.480000
2014-06-17 15:50:45.2752 [PID=29665]    [version] plan_class_spec: No AMD GPUs found
2014-06-17 15:50:45.2752 [PID=29665]    [version] [AV#720] app_plan() returned false
2014-06-17 15:50:45.2752 [PID=29665]    [version] [AV#716] (BRP4G-cuda32-nv301) setting projected flops based on host elapsed time avg: 69.17G
2014-06-17 15:50:45.2752 [PID=29665]    [version] [AV#716] (BRP4G-cuda32-nv301) comparison pfc: 95.26G  et: 69.17G
2014-06-17 15:50:45.2752 [PID=29665]    [version] Best version of app einsteinbinary_BRP4G is [AV#716] (69.17 GFLOPS)
2014-06-17 15:50:45.2753 [PID=29665]    [send] est delay 0, skipping deadline check
2014-06-17 15:50:45.2753 [PID=29665]    [version] get_app_version(): getting app version for WU#620860 (p2030.20131124.G176.58-00.38.S.b6s0g0.00000_1008) appid:29
2014-06-17 15:50:45.2753 [PID=29665]    [version] returning cached version: [AV#716]
2014-06-17 15:50:45.2753 [PID=29665]    [send] est delay 0, skipping deadline check
2014-06-17 15:50:45.2800 [PID=29665]    [send] Sending app_version einsteinbinary_BRP4G 2 133 BRP4G-cuda32-nv301; projected 69.17 GFLOPS
2014-06-17 15:50:45.2801 [PID=29665]    [send] est. duration for WU 620860: unscaled 4048.01 scaled 4050.90
2014-06-17 15:50:45.2801 [PID=29665]    [send] [HOST#5367] sending [RESULT#1493785 p2030.20131124.G176.58-00.38.S.b6s0g0.00000_1008_0] (est. dur. 4050.90s (1h07m30s90)) (max time 80960.18s (22h29m20s17))


In my case, the main estimation variable is

setting projected flops based on host elapsed time avg: 69.17G

I see something new in there, too:

comparison pfc: 95.26G et: 69.17G

Later on, the server does its own version of the runtime estimate: that's purely to check that it's not sending more work than the host requested. There's a little more scaling at that stage, but only to account for hosts which don't run 24/7. My correction is tiny, but Eyrie would see a big rescale for restricted BOINC availability.
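As a rough check on that last point, dividing the two figures from the log above (the 50% case is hypothetical):

[pre]
unscaled = 4048.01    # the server's raw size/speed estimate
scaled   = 4050.90    # what it actually sent
print(unscaled / scaled)   # ~0.9993 -- the implied availability factor for this host
print(unscaled / 0.5)      # ~8096 s -- what the same job would show on a half-time host
[/pre]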

I didn't want to spam the boards with my stats - just milestone threads - but apparently signatures are no longer optional. Follow the link if you're interested.

http://www.boincsynergy.com/images/stats/comb-3475.jpg

Eyrie
Eyrie
Joined: 20 Feb 14
Posts: 48
Credit: 2410
RAC: 0

RE: For your info, my

Message 80029 in response to message 80026

Quote:

For your info, my i7-2600K/HD7770 is now picking up Gamma-ray pulsar search #3 tasks; the initial CPU estimates look OK at 4hrs 55mins, the ATI estimates are at 5 seconds.
(This application type has CPU, Nvidia, ATI and Intel apps across Windows, Mac and Linux (But no Intel app on Linux))

Claggy

whetstone, Flops and rsc_fpops_est for GPU and CPU?

edit: 'please' - sorry ::)

Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.

Eyrie
Eyrie
Joined: 20 Feb 14
Posts: 48
Credit: 2410
RAC: 0

RE: In my case, the main

Message 80030 in response to message 80028

Quote:


In my case, the main estimation variable is

setting projected flops based on host elapsed time avg: 69.17G

I see something new in there, too:

comparison pfc: 95.26G et: 69.17G

Later on, the server does its own version of the runtime estimate: that's purely to check that it's not sending more work than the host requested. There's a little more scaling at that stage, but only to account for hosts which don't run 24/7. My correction is tiny, but Eyrie would see a big rescale for restricted BOINC availability.

I've seen that annotation before, somewhere.
rr_sim I think - can you look at a sample please, to check local boinc log against server values?

Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

RE: RE: In my case, the

Message 80031 in response to message 80030

Quote:
Quote:


In my case, the main estimation variable is

setting projected flops based on host elapsed time avg: 69.17G

I see something new in there, too:

comparison pfc: 95.26G et: 69.17G

Later on, the server does its own version of the runtime estimate: that's purely to check that it's not sending more work than the host requested. There's a little more scaling at that stage, but only to account for hosts which don't run 24/7. My correction is tiny, but Eyrie would see a big rescale for restricted BOINC availability.

I've seen that annotation before, somewhere.
rr_sim I think - can you look at a sample please, to check local boinc log against server values?


17/06/2014 18:08:02 | | [rr_sim] start: work_buf min 25920 additional 3456 total 29376 on_frac 0.999 active_frac 1.000
17/06/2014 18:08:02 | Albert@Home | [rr_sim] 0.00: p2030.20130202.G202.32-01.96.N.b1s0g0.00000_3552_3 finishes (17271.46G/69.12G)
17/06/2014 18:08:02 | Albert@Home | [rr_sim] 2808.68: p2030.20131124.G176.58-00.38.S.b5s0g0.00000_3024_2 finishes (240866.14G/69.12G)
17/06/2014 18:08:02 | Albert@Home | [rr_sim] 3551.29: p2030.20131124.G176.58-00.38.S.b5s0g0.00000_2912_2 finishes (280000.00G/69.12G)
17/06/2014 18:08:02 | Albert@Home | [rr_sim] 4293.89: p2030.20131124.G176.30-00.82.S.b4s0g0.00000_3040_2 finishes (280000.00G/69.12G)

The first two must be running tasks, partially completed. But the next two show the same 280 e12 I posted earlier as rsc_fpops_est here, divided by the familiar speed from APR.
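The division rr_sim is doing in those brackets, with the figures copied from the lines above:

[pre]
print(280000.00e9 / 69.12e9)   # ~4050.9 s per full task at the APR-derived speed,
                               # which lines up with the server's 'scaled 4050.90' earlier
print(17271.46e9 / 69.12e9)    # ~250 s of remaining work on the first running task
[/pre]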

I didn't want to spam the boards with my stats - just milestone threads - but apparently signatures are no longer optional. Follow the link if you're interested.

http://www.boincsynergy.com/images/stats/comb-3475.jpg

Claggy
Claggy
Joined: 29 Dec 06
Posts: 122
Credit: 4040969
RAC: 0

RE: RE: For your info, my

Message 80032 in response to message 80029

Quote:
Quote:

For your info, my i7-2600K/HD7770 is now picking up Gamma-ray pulsar search #3 tasks; the initial CPU estimates look OK at 4hrs 55mins, the ATI estimates are at 5 seconds.
(This application type has CPU, Nvidia, ATI and Intel apps across Windows, Mac and Linux (But no Intel app on Linux))

Claggy

whetstone, Flops and rsc_fpops_est for GPU and CPU?

edit: 'please' - sorry ::)

CPU p_fpops is 4514900817.923695

HD7770 peak_flops is 3584000000000.000000

flops for the CPU app_version of hsgamma_FGRP3 is 845960315.482654

flops for the ATI GPU app_version of hsgamma_FGRP3 is 2950327174499.708000

rsc_fpops_est is 15000000000000.000000, with rsc_fpops_bound at 300000000000000.000000
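Plugging those figures straight into size divided by speed (nothing added beyond the division):

[pre]
rsc_fpops_est = 15000000000000.0     # 15e12
cpu_flops     = 845960315.482654     # flops for the CPU app_version
ati_flops     = 2950327174499.708    # flops for the ATI app_version

print(rsc_fpops_est / cpu_flops / 3600)   # ~4.93 h -> the sensible '4hrs 55mins'
print(rsc_fpops_est / ati_flops)          # ~5.08 s -> the silly ATI estimate,
                                          #           matching 'unscaled 5.08' in the log below
[/pre]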

With a Gamma-ray pulsar search #3 only request I got:

https://albert.phys.uwm.edu/host_sched_logs/8/8143

2014-06-17 17:18:23.1994 [PID=2155 ] [send] CPU: req 8330.13 sec, 0.00 instances; est delay 0.00
2014-06-17 17:18:23.1995 [PID=2155 ] [send] AMD/ATI GPU: req 8692.21 sec, 0.00 instances; est delay 0.00
2014-06-17 17:18:23.1995 [PID=2155 ] [send] work_req_seconds: 8330.13 secs
2014-06-17 17:18:23.1995 [PID=2155 ] [send] available disk 95.78 GB, work_buf_min 95040
2014-06-17 17:18:23.1995 [PID=2155 ] [send] on_frac 0.923624 active_frac 0.985800 gpu_active_frac 0.984082
2014-06-17 17:18:23.1995 [PID=2155 ] [send] CPU features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt aes syscall nx lm vmx tm2 pbe
2014-06-17 17:18:23.3103 [PID=2155 ] [mixed] sending locality work first

2014-06-17 17:18:23.3223 [PID=2155 ] [version] get_app_version(): getting app version for WU#604131 (LATeah0109C_32.0_0_-1.48e-10) appid:30
2014-06-17 17:18:23.3223 [PID=2155 ] [version] looking for version of hsgamma_FGRP3
2014-06-17 17:18:23.3224 [PID=2155 ] [version] Checking plan class 'FGRPopencl-ati'
2014-06-17 17:18:23.3234 [PID=2155 ] [version] reading plan classes from file '/BOINC/projects/AlbertAtHome/plan_class_spec.xml'
2014-06-17 17:18:23.3234 [PID=2155 ] [version] plan_class_spec: parsed project prefs setting 'gpu_util_fgrp' : true : 1.000000
2014-06-17 17:18:23.3234 [PID=2155 ] [version] [AV#911] (FGRPopencl-ati) adjusting projected flops based on PFC avg: 2950.33G
2014-06-17 17:18:23.3234 [PID=2155 ] [version] Best app version is now AV911 (85.84 GFLOP)
2014-06-17 17:18:23.3235 [PID=2155 ] [version] Checking plan class 'FGRPopencl-intel_gpu'
2014-06-17 17:18:23.3235 [PID=2155 ] [version] plan_class_spec: parsed project prefs setting 'gpu_util_fgrp' : true : 1.000000
2014-06-17 17:18:23.3235 [PID=2155 ] [version] [version] No Intel GPUs found
2014-06-17 17:18:23.3235 [PID=2155 ] [version] [AV#912] app_plan() returned false
2014-06-17 17:18:23.3235 [PID=2155 ] [version] Checking plan class 'FGRPopencl-nvidia'
2014-06-17 17:18:23.3235 [PID=2155 ] [version] plan_class_spec: parsed project prefs setting 'gpu_util_fgrp' : true : 1.000000
2014-06-17 17:18:23.3235 [PID=2155 ] [version] plan_class_spec: No NVIDIA GPUs found
2014-06-17 17:18:23.3235 [PID=2155 ] [version] [AV#925] app_plan() returned false
2014-06-17 17:18:23.3235 [PID=2155 ] [version] [AV#911] (FGRPopencl-ati) adjusting projected flops based on PFC avg: 2950.33G
2014-06-17 17:18:23.3235 [PID=2155 ] [version] Best version of app hsgamma_FGRP3 is [AV#911] (2950.33 GFLOPS)
2014-06-17 17:18:23.3236 [PID=2155 ] [send] est delay 0, skipping deadline check
2014-06-17 17:18:23.3264 [PID=2155 ] [send] Sending app_version hsgamma_FGRP3 7 111 FGRPopencl-ati; projected 2950.33 GFLOPS
2014-06-17 17:18:23.3265 [PID=2155 ] [CRITICAL] No filename found in [WU#604131 LATeah0109C_32.0_0_-1.48e-10]
2014-06-17 17:18:23.3265 [PID=2155 ] [send] est. duration for WU 604131: unscaled 5.08 scaled 5.59
2014-06-17 17:18:23.3265 [PID=2155 ] [send] [HOST#8143] sending [RESULT#1450173 LATeah0109C_32.0_0_-1.48e-10_1] (est. dur. 5.59s (0h00m05s59)) (max time 101.68s (0h01m41s68))
2014-06-17 17:18:23.3291 [PID=2155 ] [locality] send_old_work(LATeah0109C_32.0_0_-1.48e-10_1) sent result created 344.0 hours ago [RESULT#1450173]
2014-06-17 17:18:23.3291 [PID=2155 ] [locality] Note: sent NON-LOCALITY result LATeah0109C_32.0_0_-1.48e-10_1
2014-06-17 17:18:23.3292 [PID=2155 ] [locality] send_results_for_file(h1_0997.00_S6Direct)
2014-06-17 17:18:23.3365 [PID=2155 ] [locality] in_send_results_for_file(h1_0997.00_S6Direct, 0) prev_result.id=1488887

Claggy

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

RE: RE: RE: For your

Message 80033 in response to message 80032

Quote:
Quote:
Quote:

For your info, my i7-2600K/HD7770 is now picking up Gamma-ray pulsar search #3 tasks; the initial CPU estimates look OK at 4hrs 55mins, the ATI estimates are at 5 seconds.
(This application type has CPU, Nvidia, ATI and Intel apps across Windows, Mac and Linux (But no Intel app on Linux))

Claggy

whetstone, Flops and rsc_fpops_est for GPU and CPU?

edit: 'please' - sorry ::)

CPU p_fpops is 4514900817.923695

HD7770 peak_flops is 3584000000000.000000

flops for the CPU app_version of hsgamma_FGRP3 is 845960315.482654

flops for the ATI GPU app_version of hsgamma_FGRP3 is 2950327174499.708000

rsc_fpops_est is 15000000000000.000000, with rsc_fpops_bound at 300000000000000.000000

Claggy


Unfortunately I missed the server log for a fetch - just got a 'report only' RPC instead. Could you grab a log if it does another work_fetch, please?

I didn't want to spam the boards with my stats - just milestone threads - but apparently signatures are no longer optional. Follow the link if you're interested.

http://www.boincsynergy.com/images/stats/comb-3475.jpg

jason_gee
jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

RE: I've seen that

Message 80034 in response to message 80030

Quote:
I've seen that annotation before, somewhere.
rr_sim I think - can you look at a sample please, to check local boinc log against server values?

yes, we were there the other day digging out where whetstone was hiding: sched_version.cpp, estimate_flops() functions. That one for non-anon, and another slightly different for anon. For non-anon, before statistics are gathered it's Boinc Whetstone for CPU (incidentally SIMD-aware on Android but not x86), and some mystery guesstimate for GPUs.

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

RE: RE: I've seen that

Message 80035 in response to message 80034

Quote:
Quote:
I've seen that annotation before, somewhere.
rr_sim I think - can you look at a sample please, to check local boinc log against server values?

yes, we were there the other day digging out where whetstone was hiding: sched_version.cpp, estimate_flops() functions. That one for non-anon, and another slightly different for anon. For non-anon, before statistics are gathered it's Boinc Whetstone for CPU (incidentally SIMD-aware on Android but not x86), and some mystery guesstimate for GPUs.


Those mystery guesstimates for GPUs are one of the major quarries for our quest.

Claggy's ATI is running at 2.95 Teraflops, to put it in simpler numbers.

I didn't want to spam the boards with my stats - just milestone threads - but apparently signatures are no longer optional. Follow the link if you're interested.

http://www.boincsynergy.com/images/stats/comb-3475.jpg

Claggy
Claggy
Joined: 29 Dec 06
Posts: 122
Credit: 4040969
RAC: 0

RE: Unfortunately I missed

Message 80036 in response to message 80033

Quote:
Unfortunately I missed the server log for a fetch - just got a 'report only' RPC instead. Could you grab a log if it does another work_fetch, please?


I did another request, and suspended network:

https://albert.phys.uwm.edu/host_sched_logs/8/8143

Claggy

jason_gee
jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

RE: RE: RE: I've seen

Message 80037 in response to message 80035

Quote:
Quote:
Quote:
I've seen that annotation before, somewhere.
rr_sim I think - can you look at a sample please, to check local boinc log against server values?

yes, we were there the other day digging out where whetstone was hiding: sched_version.cpp, estimate_flops() functions. That one for non-anon, and another slightly different for anon. For non-anon, before statistics are gathered it's Boinc Whetstone for CPU (incidentally SIMD-aware on Android but not x86), and some mystery guesstimate for GPUs.


Those mystery guesstimates for GPUs are one of the major quarries for our quest.

Claggy's ATI is running at 2.95 Teraflops, to put it in simpler numbers.

Yep. Also be aware in that area, just to complicate matters, that there is a scheduler config option David's thrown in, enabling a random multiplier across the project_flops for each app_version, so that app versions get juggled at least before stats are gathered.

I'm getting the distinct impression he's 'lost' the old 0.1 GPU flops scaling there (haven't come across it yet anyway, still looking), meaning that'll probably be using the raw client-supplied marketing flops value, possibly multiplied by some random number...
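To see what's at stake in that guess (a paraphrase only, not the sched_version.cpp source; the 3584 GFLOPS figure is the HD7770 peak Claggy quoted earlier):

[pre]
# The initial GPU 'speed' before any stats exist, under the two possibilities above.
def initial_gpu_projected_flops(peak_flops, scale):
    return peak_flops * scale

print(initial_gpu_projected_flops(3584e9, 0.1))   # ~358 GFLOPS if the old 0.1 scaling survives
print(initial_gpu_projected_flops(3584e9, 1.0))   # ~3.6 TFLOPS if the raw marketing peak is used
[/pre]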

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

RE: RE: Unfortunately I

Message 80038 in response to message 80036

Quote:
Quote:
Unfortunately I missed the server log for a fetch - just got a 'report only' RPC instead. Could you grab a log if it does another work_fetch, please?

I did another request, and suspended network:

https://albert.phys.uwm.edu/host_sched_logs/8/8143

Claggy


[version] [AV#911] (FGRPopencl-ati) adjusting projected flops based on PFC avg: 2950.33G

I didn't want to spam the boards with my stats - just milestone threads - but apparently signatures are no longer optional. Follow the link if you're interested.

http://www.boincsynergy.com/images/stats/comb-3475.jpg

jason_gee
jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

RE: RE: RE: Unfortunate

Message 80039 in response to message 80038

Quote:
Quote:
Quote:
Unfortunately I missed the server log for a fetch - just got a 'report only' RPC instead. Could you grab a log if it does another work_fetch, please?

I did another request, and suspended network:

https://albert.phys.uwm.edu/host_sched_logs/8/8143

Claggy


[version] [AV#911] (FGRPopencl-ati) adjusting projected flops based on PFC avg: 2950.33G

That's not TeraFlops (a speed), that's peak flop count, as in # of operations.

(verifying in code now)

*scratch that* looks broken, walking the lot with beer

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

RE: RE: RE: RE: Unfor

Message 80040 in response to message 80039

Quote:
Quote:
Quote:
Quote:
Unfortunately I missed the server log for a fetch - just got a 'report only' RPC instead. Could you grab a log if it does another work_fetch, please?

I did another request, and suspended network:

https://albert.phys.uwm.edu/host_sched_logs/8/8143

Claggy


[version] [AV#911] (FGRPopencl-ati) adjusting projected flops based on PFC avg: 2950.33G

That's not TeraFlops (speed), That's peak flop count, as in # of operations.

(verifying in code now)

*scratch that* looks broken, walking the lot with beer


The server is using it as a speed for estimation purposes. Maybe that's our problem.
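
Roughly speaking (simplified, not the exact server code), the runtime estimate handed to the client scales as the task's flops estimate divided by that 'speed', so however much the speed is inflated, the runtime estimates shrink by the same factor:

[pre]// Simplified relationship only, not the exact server code: the duration
// estimate scales inversely with projected_flops, so a 10x overestimate of
// the speed means runtime estimates that are 10x too short.
double estimated_runtime_secs(double rsc_fpops_est, double projected_flops) {
    return rsc_fpops_est / projected_flops;
}[/pre]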

I didn't want to spam the boards with my stats - just milestone threads - but apparently signatures are no longer optional. Follow the link if you're interested.

http://www.boincsynergy.com/images/stats/comb-3475.jpg

Eyrie
Eyrie
Joined: 20 Feb 14
Posts: 48
Credit: 2410
RAC: 0

RE: *scratch that* looks

Message 80041 in response to message 80039

Quote:

*scratch that* looks broken, walking the lot with beer

Peanut gallery: that's like saying that water is wet after falling in and getting soaked...
Enjoy the beer. Valium might be the better choice.

Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.

Claggy
Claggy
Joined: 29 Dec 06
Posts: 122
Credit: 4040969
RAC: 0

RE: RE: RE: RE: Unfor

Message 80042 in response to message 80039

Quote:
Quote:
Quote:
Quote:
Unfortunately I missed the server log for a fetch - just got a 'report only' RPC instead. Could you grab a log if it does another work_fetch, please?

I did another request, and suspended network:

https://albert.phys.uwm.edu/host_sched_logs/8/8143

Claggy


[version] [AV#911] (FGRPopencl-ati) adjusting projected flops based on PFC avg: 2950.33G

That's not TeraFlops (speed), That's peak flop count, as in # of operations.

(verifying in code now)

*scratch that* looks broken, walking the lot with beer

Boinc startup says:

17/06/2014 18:17:17 | | CAL: ATI GPU 0: AMD Radeon HD 7700 series (Capeverde) (CAL version 1.4.1848, 1024MB, 984MB available, 3584 GFLOPS peak)
17/06/2014 18:17:17 | | OpenCL: AMD/ATI GPU 0: AMD Radeon HD 7700 series (Capeverde) (driver version 1348.5 (VM), device version OpenCL 1.2 AMD-APP (1348.5), 1024MB, 984MB available, 3584 GFLOPS peak)
17/06/2014 18:17:17 | | OpenCL CPU: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz (OpenCL driver vendor: Advanced Micro Devices, Inc., driver version 1348.5 (sse2,avx), device version OpenCL 1.2 AMD-APP (1348.5))

The GTX460 always had a much lower GFLOPS peak value, but was a lot more effective at Seti v6, v7 and AP v6; the exceptions were here and the OpenCL Gamma-ray pulsar search #3 1.07 app, where the HD7770 was a little faster:

https://albert.phys.uwm.edu/host_app_versions.php?hostid=8143

Quote:

Gamma-ray pulsar search #3 1.07 windows_x86_64 (FGRPopencl-ati)
Number of tasks completed 13
Max tasks per day 45
Number of tasks today 0
Consecutive valid tasks 13
Average processing rate 3.55 GFLOPS
Average turnaround time 0.37 days

Gamma-ray pulsar search #3 1.07 windows_x86_64 (FGRPopencl-nvidia)
Number of tasks completed 12
Max tasks per day 44
Number of tasks today 0
Consecutive valid tasks 12
Average processing rate 2.87 GFLOPS
Average turnaround time 0.88 days

http://boinc.berkeley.edu/dev/forum_thread.php?id=8767&postid=51659

04/12/2013 21:25:07 | | CUDA: NVIDIA GPU 0: GeForce GTX 460 (driver version 331.58, CUDA version 6.0, compute capability 2.1, 1024MB, 854MB available, 1075 GFLOPS peak)
04/12/2013 21:25:07 | | CAL: ATI GPU 0: AMD Radeon HD 7700 series (Capeverde) (CAL version 1.4.1848, 1024MB, 984MB available, 3584 GFLOPS peak)
04/12/2013 21:25:07 | | OpenCL: NVIDIA GPU 0: GeForce GTX 460 (driver version 331.58, device version OpenCL 1.1 CUDA, 1024MB, 854MB available, 1075 GFLOPS peak)
04/12/2013 21:25:07 | | OpenCL: AMD/ATI GPU 0: AMD Radeon HD 7700 series (Capeverde) (driver version 1348.4 (VM), device version OpenCL 1.2 AMD-APP (1348.4), 1024MB, 984MB available, 3584 GFLOPS peak)
04/12/2013 21:25:07 | | OpenCL CPU: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz (OpenCL driver vendor: Advanced Micro Devices, Inc., driver version 1348.4 (sse2,avx), device version OpenCL 1.2 AMD-APP (1348.4))
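
For what it's worth, putting rough numbers on it (treating the APR as the 'real' speed, which is only an approximation):

[pre]// Rough comparison of what the peak values predict against what the APRs
// show, using the startup logs and the app_version table above.
#include <cstdio>

int main() {
    double peak_hd7770 = 3584.0, peak_gtx460 = 1075.0;  // GFLOPS peak
    double apr_hd7770  = 3.55,   apr_gtx460  = 2.87;    // APR, FGRP3 1.07

    std::printf("peak ratio: %.2fx\n", peak_hd7770 / peak_gtx460);  // ~3.33x
    std::printf("APR  ratio: %.2fx\n", apr_hd7770 / apr_gtx460);    // ~1.24x
    return 0;
}[/pre]

So the peak figures say the HD7770 should be over 3x faster, while the measured rates here put it at barely 1.2x - and at Seti the GTX460 usually came out ahead.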

Claggy

Eyrie
Eyrie
Joined: 20 Feb 14
Posts: 48
Credit: 2410
RAC: 0

RE: RE: RE: RE: Quote

Message 80043 in response to message 80040

Quote:
Quote:
Quote:
Quote:
Quote:
Unfortunately I missed the server log for a fetch - just got a 'report only' RPC instead. Could you grab a log if it does another work_fetch, please?

I did another request, and suspended network:

https://albert.phys.uwm.edu/host_sched_logs/8/8143

Claggy


[version] [AV#911] (FGRPopencl-ati) adjusting projected flops based on PFC avg: 2950.33G

That's not TeraFlops (speed), That's peak flop count, as in # of operations.

(verifying in code now)

*scratch that* looks broken, walking the lot with beer


The server is using it as a speed for estimation purposes. Maybe that's our problem.


Of course it's a speed - it's the APR later on. 'Based on' is our problem - something is being factored in incorrectly. AFAIK on SETI there's no such gross overestimation of GPU speed.

@ Claggy what is the peak flop count for that card? (sorry if you posted that already)

edit: ta.

peak flops x pfc_ave ? the latter being <1 ?

Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.

jason_gee
jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

yes, this is bizarre: once

Message 80044 in response to message 80043

yes, this is bizarre:

once stats are gathered:

Quote:
[pre]// sched_version.cpp: once this app_version has enough completed samples,
// the initial guess is replaced by peak_flops divided by the PFC average.
if (av.pfc.n > MIN_VERSION_SAMPLES) {
    hu.projected_flops = hu.peak_flops/av.pfc.get_avg();
    if (config.debug_version_select) {
        log_messages.printf(MSG_NORMAL,
            "[version] [AV#%d] (%s) adjusting projected flops based on PFC avg: %.2fG\n",
            av.id, av.plan_class, hu.projected_flops/1e9
        );
    }
}[/pre]

Dodgy average aside (and we know all about the problems of sampled averages, particularly with very few samples), that looks like the ratio of the marketing flops estimate (from the client) to the effective claimed operations.

Going to check whether he's tweaked the definition of pfc here, because a flops rate over an average operation count gives inverse time, not flops... checking that pfc with that beer...

[Edit:] no sign of our 0.1x scaling for GPU either, at least in the Albert code.
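
Putting Claggy's numbers into that line (assuming the peak_flops in play really is the client's 3584 GFLOPS figure for the HD7770):

[pre]// Back-of-envelope check, assuming hu.peak_flops is the client-reported
// 3584 GFLOPS for the HD7770 (see the startup log).
#include <cstdio>

int main() {
    double peak_flops = 3584e9;     // client-reported peak
    double projected  = 2950.33e9;  // from the scheduler log
    // projected = peak / pfc_avg  =>  pfc_avg = peak / projected
    std::printf("implied pfc_avg = %.2f\n", peak_flops / projected);  // ~1.21
    // i.e. ~82% of theoretical peak is being treated as the working speed,
    // while the measured APR for this app is only ~3.55 GFLOPS.
    return 0;
}[/pre]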

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 143
Credit: 5409572
RAC: 0

Jason, with the high-scoring

Jason, with the high-scoring late validations, your average is now above par, at 1003.97

And your median is higher still, at 1168.97

I didn't want to spam the boards with my stats - just milestone threads - but apparently signatures are no longer optional. Follow the link if you're interested.

http://www.boincsynergy.com/images/stats/comb-3475.jpg

Eyrie
Eyrie
Joined: 20 Feb 14
Posts: 48
Credit: 2410
RAC: 0

Ok, so it is effectively

OK, so it is effectively using a scaled (marketing) peak flops value - in other words, a totally unrealistic estimate.

We do need something as a starting point though. Those peak flops are as inadequate as using 10X CPU speed was.

Eve comes in at 91e9 peak flops. From SETI (she's too small to run the GPU tasks here) her GPU is slightly faster than her CPU, and the CPU needs ~2h for BRP. So roughly the GPU tasks would take 32 hours. That makes her about 32x slower than a 780 - that's the span we are dealing with, and it will only grow larger as GPUs get ever faster.

91*32 = 2912 - which is about the figure we saw earlier for fast GPUs - so the slope of the peak flops is not too bad, but the offset is. With an APR of 33 for the 780 and about 1 for Eve, we are looking at a ~90x overestimate - for BRP at least.

That scaling value being applied must bring the estimates into the correct magnitude over on SETI...
Any chance of getting that number from Eric?

I don't know. If you underestimate the speed, you cache too few tasks - more frequent top-ups - which is only a problem if you really can't connect for longer periods, as you'd run dry (not really a problem either ;) ).

It's the overestimation that runs afoul of the built-in safety-checks.

So how about using 1/100 of peak flops as a GPU starting point? I mean you have to start _somewhere_ ...

Any problems with underestimating I've failed to consider?
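
Something like this, purely as a sketch (the name and the constant are mine - nothing like it exists in the scheduler today):

[pre]// Sketch of the proposed starting point only; name and constant illustrative.
const double GPU_PEAK_DERATE = 0.01;   // the proposed 1/100 of peak

double initial_gpu_projected_flops(double client_peak_flops) {
    // Following the numbers above: a fast card the server currently puts
    // around 2900 GFLOPS would start near 29, in the ballpark of the 780's
    // APR of ~33; Eve (91 GFLOPS peak) would start near 0.9, close to her
    // APR of ~1. Both err on the slow side, which only costs a few extra
    // top-up requests instead of tripping the runtime safety checks.
    return client_peak_flops * GPU_PEAK_DERATE;
}[/pre]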

Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.

jason_gee
jason_gee
Joined: 4 Jun 14
Posts: 111
Credit: 1043639
RAC: 0

RE: Jason, with the

Message 80047 in response to message 80045

Quote:

Jason, with the high-scoring late validations, your average is now above par, at 1003.97

And your median is higher still, at 1168.97

good. better late than never :D

Yes, we'll definitely need to stabilise the CPU here first. The GPU is going to take a bit more digging yet, and whether there's any connection at the estimate, scheduler or validation stage needs to be determined before that one's tackled in detail.

There are definitely those dicey averages in play (everywhere) to start with, and then I'm also surprised to be finding reliance on those (nearly useless) GPU marketing flops figures embedded even after stats are gathered. Until the primary CPU scales are fixed, and averages of all kinds are replaced with damped values, any particular odd logic choice in there is likely to be obliterated in the noise anyway. (Paraphrasing the comments about chaos burying the noise, lol.)
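
For the record, by 'damped' I mean something along these lines (a sketch with an arbitrary weight, not a tuned value and not existing BOINC code):

[pre]// One common form of damped (exponentially weighted) average, as opposed to
// the plain sample mean used now. The weight is an arbitrary illustration.
struct DampedAverage {
    double value  = 0.0;
    bool   primed = false;
    double alpha  = 0.1;   // fraction of each new sample taken on board

    void sample(double x) {
        if (!primed) { value = x; primed = true; return; }
        value += alpha * (x - value);   // move part-way towards the new sample
    }
};
// A single wild outlier only shifts this by alpha times its deviation from
// the current value, whereas a plain mean over a handful of samples can be
// dragged anywhere by one task.[/pre]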

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage