I am getting computation error

Claggy
Joined: 29 Dec 06
Posts: 122
Credit: 4040969
RAC: 0

Message 80535 in response to message 80528

Oliver Bock wrote:

Claggy wrote:

According to a stderr at SETI, your host has two GPUs, though BOINC is reporting only the most capable one to the server:

[...]

I wonder if there is an API bug present in the BRP apps where GPUs and OpenCL platforms aren't aligned correctly.

This API bug fix might be related (the Albert GPU apps predate it):

http://boinc.berkeley.edu/gitweb/?p=boinc-v2.git;a=commit;h=328d87be2625ce68a04b0d1caf29e3826eab25f3

I don't think that this bug is related to the symptoms we see here, at least if Charlie describes the conditions that trigger it correctly (see commit message):

  • The host in question has only one OpenCL platform (if your log excerpt from SETI is correct)
  • The host has only GPUs from one vendor

I'll look into the problem...

Oliver

Raistmer's stderr.txt listing of OpenCL devices only reports GPU devices, not CPU devices.

BOINC should be reporting CPU OpenCL support from the installed AMD drivers (at startup, in the Event Log), and is unlikely to report CPU OpenCL support from Intel drivers (AMD CPU).

 

I've been looking at:

http://boinc.berkeley.edu/trac/wiki/GPUApp

OpenCL Sample Notes

The openclapp sample calls boinc_get_opencl_ids(), which reads an init_data.xml file to determine which GPU to use. Actual project applications must also do this. Please see the ReadMe.txt file for more information.

This same sample is designed to run with AMD, NVIDIA and Intel Ivy Bridge GPUs. It is supplied with 3 minimal init_data.xml files, one for each of these 3 vendors (GPU "types".) Copy the appropriate init_data.xml file into the directory containing the openclapp executable.

 

This seems to imply you should send an init_data.xml suitable for each vendor's device.

 

http://boinc.berkeley.edu/trac/wiki/OpenclApps

Application requirements

Your application must call

int boinc_get_opencl_ids(int argc, char** argv, int type, cl_device_id* device, cl_platform_id* platform);

This returns the OpenCL platform and device IDs for the GPU that your app should use. Pass the argc and argv your application receives from the BOINC client. The third argument type should specify the vendor of the desired GPU and can be one of the following:

Symbol                  Value
PROC_TYPE_NVIDIA_GPU    1
PROC_TYPE_AMD_GPU       2
PROC_TYPE_INTEL_GPU     3

With BOINC Clients version 7.0.12 or later, the first 3 arguments will be ignored and all data will be taken from the init_data.xml file in the slot directory. The first 3 arguments allow this to work with older BOINC Clients. If your OpenCL app can use OpenCL-capable GPUs from any vendor, you can pass 0 for the third argument (type); if you pass a type value of 0, the type will be taken from the gpu_type field of the init_data.xml file on newer clients, but will return an error code of CL_INVALID_DEVICE_TYPE on older clients.

This function is in the library boinc/api/libboinc_opencl.a (on Macs: boinc/mac_build/build/Deployment/libboinc_opencl.a). As an alternative to linking the library, you can add the file boinc/api/libboinc_opencl.cpp to your source files when building your project application.

 

This implies that if your app can run on a GPU from any vendor and you pass 0 as the type (as opposed to 1, 2 or 3), then the type is taken from whatever is actually in the init_data.xml. Is the wrong init_data.xml being supplied?
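
To make that reading concrete, here is a minimal Python sketch of the selection rule as the wiki text describes it. It is only a model of the quoted description, not BOINC's implementation: resolve_gpu_type, GpuTypeError and LEGACY_TYPES are made-up names, and a missing gpu_type value is assumed to stand for an older client.

```python
# Simplified model of the behaviour quoted from the wiki; NOT BOINC source code.
PROC_TYPE_NVIDIA_GPU = 1
PROC_TYPE_AMD_GPU = 2
PROC_TYPE_INTEL_GPU = 3

# Hypothetical lookup table, used only to give readable return values.
LEGACY_TYPES = {
    PROC_TYPE_NVIDIA_GPU: "PROC_TYPE_NVIDIA_GPU",
    PROC_TYPE_AMD_GPU: "PROC_TYPE_AMD_GPU",
    PROC_TYPE_INTEL_GPU: "PROC_TYPE_INTEL_GPU",
}


class GpuTypeError(Exception):
    """Hypothetical stand-in for the CL_INVALID_DEVICE_TYPE error path."""


def resolve_gpu_type(type_arg, init_data_gpu_type):
    """Return the GPU type an app would end up with under the quoted rules.

    init_data_gpu_type is the gpu_type value from init_data.xml, or None
    when the file has no such field (assumed here to mean an older client).
    """
    # Newer clients: the type argument is ignored, init_data.xml decides.
    if init_data_gpu_type:
        return init_data_gpu_type
    # Older clients: "any vendor" (0) cannot be resolved, so it is an error.
    if type_arg == 0:
        raise GpuTypeError("type 0 needs gpu_type in init_data.xml")
    # Older clients with an explicit vendor: the argument is used as given.
    return LEGACY_TYPES[type_arg]
```

With this model, resolve_gpu_type(0, "ATI") gives back whatever the file says, while resolve_gpu_type(0, None) fails, which is why the contents of init_data.xml matter so much for an "any vendor" app.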

 

http://boinc.berkeley.edu/trac/wiki/ProjectPlan

Part of the Boinc project plan is:

 

Update boinc_get_opencl_ids() API code

  • Special value for type arg to mean "any vendor" (return error on older clients which don't have aid.gpu_type in init_data.xml file.)
  • what should it do if type arg does not match aid.gpu_type from init_data.xml file?
  • compatibility with <coproc> option in cc_config.xml?

 

Claggy

Oliver Behnke
Moderator
Administrator
Joined: 4 Sep 07
Posts: 320
Credit: 8545955
RAC: 0

Message 80536 in response to message 80535

Claggy wrote:

This seems to imply you should send an init_data.xml suitable for each vendor's device.

That's not the case: init_data.xml is written by the client to communicate details to the app (which uses the BOINC API to access it).

Claggy wrote:

Is the wrong init_data.xml being supplied?

N/A, see above. 

 

The problem is clearly stated in the failing task's STDERR output:

[09:27:06][8096][INFO ] Starting data processing...
GPU type not found in init_data.xml
[09:27:06][8096][ERROR] Failed to get OpenCL platform/device info from BOINC (error: -161)!
[09:27:06][8096][ERROR] Demodulation failed (error: -161)!
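
For anyone who wants to check their own file locally instead of posting it, a small Python sketch along these lines should do. It assumes the GPU type is stored as a <gpu_type>...</gpu_type> element in init_data.xml (I have not confirmed the exact layout here), and read_gpu_type is just an illustrative helper name. It uses a plain text search rather than an XML parser so it does not depend on the rest of the file being well-formed.

```python
import re
from pathlib import Path

# Assumption: the GPU type is written as <gpu_type>VALUE</gpu_type>.
GPU_TYPE_RE = re.compile(r"<gpu_type>\s*([^<]*?)\s*</gpu_type>")


def read_gpu_type(init_data_path):
    """Return the gpu_type value from the file, or None if missing or empty."""
    text = Path(init_data_path).read_text(encoding="utf-8", errors="replace")
    match = GPU_TYPE_RE.search(text)
    if match is None:
        return None  # no gpu_type element at all
    return match.group(1) or None  # empty element counts as missing
```

A return value of None would be consistent with the "GPU type not found in init_data.xml" message above, but it does not say why the client left the field out.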

@Tullio: I need a copy of an affected init_data.xml file. You'll find them underneath the slots directory of the BOINC data directory. Please make sure that the particular slot is used to run one of the failing tasks.

PLEASE DON'T PASTE THE FILE HERE (CONTAINS CONFIDENTIAL DATA)! SEND IT TO ME VIA PRIVATE MESSAGE! THANKS

Thanks,
Oliver 

Oliver Behnke
Moderator
Administrator
Joined: 4 Sep 07
Posts: 320
Credit: 8545955
RAC: 0

FYI, we deployed a different scheduler on the 16th which was definitely flawed and we have an idea why you receive work you never requested. I reverted that one and the original one is in place again. This will likely fix your problem.

@Tullio: please run an update/work request on both of your machines and report back.

Summary (to avoid confusion):

There are two issues at hand

  1. The scheduler sends types of tasks not requested (here: Intel GPU): both computers affected (wait for Tullio's work requests)
  2. The (correctly sent) ATI GPU tasks fail to run because of a seemingly flawed init_data.xml (wait for Tullio's PM)

 

Best,
Oliver 

tullio
Joined: 22 Jan 05
Posts: 53
Credit: 137342
RAC: 0

I am getting errors in the binary pulsar radio search in Albert@home 1.40, while I do not get errors in Einstein@home with 1.39 on the same PC. It runs Windows 8.1 and has two ATI/AMD graphics processors: one on board, called Devastator (8670D), and one discrete, called Caicos (HD 8470). Albert@home and Einstein@home seem to see only the first, while SETI@home sees both.

Oliver Behnke
Moderator
Administrator
Joined: 4 Sep 07
Posts: 320
Credit: 8545955
RAC: 0

Tullio, I do understand your original problem description (see my summary above). What I was trying to say is:

  • We think we fixed the reason for problem 1, so I need you to try again and report back the result. Your hosts haven't contacted the server since we deployed the fix, please click on "Update" in the BOINC Manager.
  • We need a copy of an affected init_data.xml file in order to help you with problem 2. Please PM that file to me.

Thanks,
Oliver 

Claggy
Joined: 29 Dec 06
Posts: 122
Credit: 4040969
RAC: 0

Message 80541 in response to message 80540

Oliver Bock wrote:

  • We need a copy of an affected init_data.xml file in order to help you with problem 2. Please PM that file to me.

I think there's a problem with that. When BOINC starts a task it populates a slot; when the task finishes, errors, etc., it empties the slot (even with network activity suspended). I just tried it: two tasks running on my E8500/9800GTX+, one pending upload, only two slots.

Tullio's tasks are erroring between 1.5 and 2.75 seconds after start, so he hasn't got much time to grab the init_data.xml before the slot ceases to exist.

He might be able to use the Activity menu to suspend GPU usage quickly enough to recover an init_data.xml before the task errors, if he's lucky.

Claggy

 

Oliver Behnke
Moderator
Administrator
Joined: 4 Sep 07
Posts: 320
Credit: 8545955
RAC: 0

That's right, unfortunately. One can, however, pass --exit_before_start to the BOINC client so that it prepares the slot directory but exits right away. This way the init_data.xml file will be retained. Of course this means the core client has to be started manually, not automatically via the BOINC Manager or as a service.

HTH,
Oliver

Claggy
Joined: 29 Dec 06
Posts: 122
Credit: 4040969
RAC: 0

Message 80543 in response to message 80542

Oliver Bock wrote:

That's right, unfortunately. One can, however, pass --exit_before_start to the BOINC client so that it prepares the slot directory but exits right away. This way the init_data.xml file will be retained. Of course this means the core client has to be started manually, not automatically via the BOINC Manager or as a service.

HTH,
Oliver

I tried putting --exit_before_start in a shortcut, but couldn't get it to work: the app started and kept running. I suspended usage via the task bar, suspended the tasks that had been running and tried fresh tasks, with the same result.

Instead I tried --exit_after_app_start 1

That did work: BOINC started an app, ran it for a second, then exited.

Tullio will have to make a shortcut to boinc.exe and put the following in the Target box: "C:\Program Files\BOINC\boinc.exe" --exit_after_app_start 1

He'll have to suspend GPU usage via the task bar, grab some GPU work from Albert, let it download, exit BOINC, double-click his new shortcut, wait for boinc.exe to start in the command window and then exit, then navigate to the slot directories:

C:\ProgramData\BOINC\slots

and search each slot for the einsteinbinary_BRP5_1.40_windows_intelx86__BRP5-opencl-ati.exe app. Once he's found the right slot, he can PM the init_data.xml.
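
If hunting through the slots by hand is a nuisance, the search step could be scripted. This is only a rough Python sketch: the directory and the app file name are the ones given above, find_init_data is a made-up helper name, and it assumes the app's file name shows up directly inside the slot directory next to init_data.xml.

```python
from pathlib import Path

# Values taken from the post above; adjust them for other apps or installs.
SLOTS_DIR = r"C:\ProgramData\BOINC\slots"
APP_NAME = "einsteinbinary_BRP5_1.40_windows_intelx86__BRP5-opencl-ati.exe"


def find_init_data(slots_dir=SLOTS_DIR, app_name=APP_NAME):
    """Return the init_data.xml paths of all slots that contain app_name."""
    root = Path(slots_dir)
    if not root.is_dir():
        return []  # no slots directory here, nothing to search
    hits = []
    for slot in sorted(root.iterdir()):
        if not slot.is_dir():
            continue
        init_data = slot / "init_data.xml"
        # Keep the slot only if both the app file and init_data.xml exist.
        if (slot / app_name).exists() and init_data.exists():
            hits.append(init_data)
    return hits


if __name__ == "__main__":
    for path in find_init_data():
        print(path)
```

The printed path is the file to send by PM; as Oliver said, don't paste its contents into the thread.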

Claggy

Oliver Behnke
Moderator
Administrator
Joined: 4 Sep 07
Posts: 320
Credit: 8545955
RAC: 0

Yep, thanks for the details.

@Tullio: please try to follow the steps Claggy just described.

Oliver

 

Yacob
Joined: 8 Dec 14
Posts: 5
Credit: 233336
RAC: 0

Dear all,

several of my computers are constantly giving errors of different types. I had to suspend work for this project while I see what's going on:

Computer 12403: example task 1777793

I am not sure but I guess that I should apply the solution described by Claggy:

 --exit_after_app_start 1, am I right? 

 

Computer 12382: example task 1779193

For this one, an NVIDIA Quadro FX 3450 (on Ubuntu 12.04) which is not detected by BOINC, I guess I should just remove it. I cannot find the applications available for the project, but I guess they are only for GPU, so there is no point in keeping this computer if the GPU is not recognized/not valid. Am I right?

 

Computer 12620: example task 1780214

And about the 3rd one, I don't know what the hell is going on :)

 

The 1st and 3rd ones work well with GPU projects like Milkyway, Einstein, SETI, Collatz and others.

Please give me some clues.

Thanks
