thrust::system::system_error: parallel_for failed: cudaErrorInvalidValue during simulation execution #12

@joker-li717

Hello,

I am encountering an issue while running the simulation on my system with the following specifications:

Operating System: Ubuntu 18.04

CUDA: 10.2

GCC: 7.5.0

GPU: NVIDIA TITAN RTX

Simulation Parameters:

verbose level: 3

NPART: 262144

NRAND: 262144

NTHREAD_PER_BLOCK: 256

Number of devices detected: 2

Using device: 0 (NVIDIA TITAN RTX)
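As a quick sanity check on these parameters, the sketch below computes how many blocks a one-dimensional launch would need and compares the result with the device limits printed in the log further down. It assumes one thread per item and ceiling division, which the report does not confirm. Thrust also picks its own launch configuration internally, so this covers only the explicit parameters. The function names `blocks_needed` and `launch_is_valid` are hypothetical.

```python
# Values copied from the report. The launch scheme is an assumption.
MAX_THREADS_PER_BLOCK = 1024   # "Maximum threads per block" in the log
MAX_GRID_DIM_X = 2147483647    # first grid dimension in the log


def blocks_needed(n_items: int, threads_per_block: int) -> int:
    """Blocks for a 1-D launch with one thread per item (ceiling division)."""
    if n_items < 0 or threads_per_block <= 0:
        raise ValueError("n_items must be >= 0 and threads_per_block > 0")
    return (n_items + threads_per_block - 1) // threads_per_block


def launch_is_valid(n_items: int, threads_per_block: int) -> bool:
    """True if block size and block count both fit the limits above."""
    blocks = blocks_needed(n_items, threads_per_block)
    return (threads_per_block <= MAX_THREADS_PER_BLOCK
            and 1 <= blocks <= MAX_GRID_DIM_X)


# NPART from the parameters, and the initial radical count from the log.
for n in (262144, 3024):
    print(n, blocks_needed(n, 256), launch_is_valid(n, 256))
```

Under these assumptions both sizes stay well inside the limits, so the explicit launch geometry alone would not explain the invalid-argument error.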

The simulation runs normally through the physical and prechemical stages. It fails during GPU memory initialization for the chemical stage, right after the log prints "Finish copying data to GPU for chemical stage", with the following error:
terminate called after throwing an instance of 'thrust::system::system_error'
what(): parallel_for failed: cudaErrorInvalidValue: invalid argument

Here’s a summary of the relevant logs leading to the error:
verbose is 3
NPART is 262144
NRAND is 262144
NTHREAD_PER_BLOCK is 256
trying to use device 0
set device 0 success
Number of device: 2
Using device #: 0
Major revision number: 7
Minor revision number: 5
Name: NVIDIA TITAN RTX
Total global memory: 24212.38 MB
Total shared memory per block: 48.00 kB
Total registers per block: 65536
Warp size: 32
Maximum memory pitch: 2147483647
Maximum threads per block: 1024
Maximum dimension of block: 1024 x 1024 x 64
Maximum dimension of grid: 2147483647 x 65535 x 65535
Clock rate: 1.77 GHz
Total constant memory: 64.00 kB
Texture alignment: 512
Concurrent copy and execution: Yes
Number of multiprocessors: 72
Kernel execution timeout: Yes

Start initialize random numbers
the first 5 random seeds are 1307079635
the first 5 random seeds are 3812457767
the first 5 random seeds are 93795793
the first 5 random seeds are 1175666032
the first 5 random seeds are 513151060
Initializing Physical Stage

Test world 1 0.000000 0.000000 0.000000 0.010000 0.010000 0.010000

Reading ./tables/physics/totalProtonCS3.dat
load data ./tables/physics/totalProtonCS3.dat to GPU ok
Reading ./tables/physics/DACS.txt
Reading ./tables/physics/ionizationCS.txt
Reading ./tables/physics/excitationCS.txt
Reading ./tables/physics/elasticCS.txt
Reading data and loading successfully!
finish writing source position
Total memory 24212 MB, estimated required memory 1061 MB
Total incident particles = 1 in batchs = 1
particles per batch is 1
Estimated Max. batch radical states = 16008000
Estimated Max. batch 2nd particles = 10005000
rm: cannot remove './output/events.dat': No such file or directory
sim_num is 1
physicsPType is 1
sorting by thrust
particle.parentId = -1 particle.e is 200000000.000000
second_num is 407
where whereall are 433 433
sorting by thrust
second_num is 394
where whereall are 900 1333
sorting by thrust
second_num is 62
where whereall are 487 1820
sorting by thrust
second_num is 3
where whereall are 66 1886
sorting by thrust
second_num is 1
where whereall are 4 1890
sorting by thrust
second_num is 0
where whereall are 1 1891
total Time is 1173.557861 ms

ROI shape and size 1 5500.000000 5500.000000 5500.000000

In total
elec 749
ionize 867
a1b1 78
b1a1 24
rd 56
dis 117
The file 'deposit.txt' doesn't exist, will be created and initialized as 0 0
total particle simulated is 1, total energy deposited in ROI is 0.000774 MeV, total energy deposited in world is 0.018304 MeV

loading ./tables/prechem/branchInfo_prechem_org.txt
Information is listed in the following
number of branches
6
type 0 has 2 products: 3 1
type 1 has 2 products: 1 2
type 2 has 0 products:
type 3 has 3 products: 3 1 0
type 4 has 3 products: 4 1 1
type 5 has 3 products: 4 1 5
Brach types information for recombined electrons
Branch 1 prob 0.550000
Branch 4 prob 0.150000
Branch 2 prob 0.300000

loading ./tables/prechem/thermRecombInfo_prechem.txt
Information is listed in the following
There are 1481 entries for thermolizing electrons
start=0, end=37820
the total number of initial reactant is 1891
i = 0/1891
idx_elec = 749, idx_wi = 867, idx_we_a1b1 = 78, idx_we_b1a1 = 24, idx_we_rd = 56, idx_w_dis = 117
After removing, numCurPar = 3024
File ./tables/chem/RadiolyticSpecies.txt was read as the following
In total 11 species
Species e- diffusion rate 0.004900 nm^2/ps nominal reaction radius 0.164000 nm
Species .OH diffusion rate 0.002800 nm^2/ps nominal reaction radius 0.130000 nm
Species .H diffusion rate 0.007000 nm^2/ps nominal reaction radius 0.205000 nm
Species H3O+ diffusion rate 0.009000 nm^2/ps nominal reaction radius 0.232000 nm
Species H2 diffusion rate 0.004800 nm^2/ps nominal reaction radius 0.173000 nm
Species OH- diffusion rate 0.005000 nm^2/ps nominal reaction radius 0.173000 nm
Species H2O2 diffusion rate 0.002300 nm^2/ps nominal reaction radius 0.115000 nm
Species O2 diffusion rate 0.002400 nm^2/ps nominal reaction radius 0.170000 nm
Species .HO2 diffusion rate 0.002300 nm^2/ps nominal reaction radius 0.210000 nm
Species O2- diffusion rate 0.001750 nm^2/ps nominal reaction radius 0.220000 nm
Species HO2- diffusion rate 0.001400 nm^2/ps nominal reaction radius 0.250000 nm

Reaction list is reorganized as the following:
Number of reactions list in the file: 21
Number of reactants for each reaction:
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
type of reactants for each reaction:
2 0
2 1
2 2
4 1
6 0
3 0
3 5
1 0
1 1
0 0
2 7
0 7
2 9
1 8
1 9
1 10
2 8
0 8
0 9
3 9
3 10
Number of new particles for each reaction:
2 0 1 1 2 1 0 1 1 3 1 1 1 1 2 2 1 1 3 1 1
type of new particles for each reaction:
particle type: 0, number of potential reactants for this particle type: 8
particle type of these potential reactants:
2 6 3 1 0 7 8 9
reaction type corresponds to these potential reactants:

particle type: 1, number of potential reactants for this particle type: 7
particle type of these potential reactants:
2 4 0 1 8 9 10
reaction type corresponds to these potential reactants:

particle type: 2, number of potential reactants for this particle type: 6
particle type of these potential reactants:
0 1 2 7 9 8
reaction type corresponds to these potential reactants:

particle type: 3, number of potential reactants for this particle type: 4
particle type of these potential reactants:
0 5 9 10
reaction type corresponds to these potential reactants:

particle type: 4, number of potential reactants for this particle type: 1
particle type of these potential reactants:
1
reaction type corresponds to these potential reactants:

particle type: 5, number of potential reactants for this particle type: 1
particle type of these potential reactants:
3
reaction type corresponds to these potential reactants:

particle type: 6, number of potential reactants for this particle type: 1
particle type of these potential reactants:
0
reaction type corresponds to these potential reactants:

particle type: 7, number of potential reactants for this particle type: 2
particle type of these potential reactants:
2 0
reaction type corresponds to these potential reactants:

particle type: 8, number of potential reactants for this particle type: 3
particle type of these potential reactants:
1 2 0
reaction type corresponds to these potential reactants:

particle type: 9, number of potential reactants for this particle type: 4
particle type of these potential reactants:
2 1 0 3
reaction type corresponds to these potential reactants:

particle type: 10, number of potential reactants for this particle type: 2
particle type of these potential reactants:
1 3
reaction type corresponds to these potential reactants:

In total 21 reactions
Reaction 0 radius 0.106994 0.160685 0.188323 0.217122 0.259417
Reaction 1 radius 0.079690 0.116624 0.134561 0.152323 0.176404
Reaction 2 radius 0.059631 0.081543 0.090634 0.098591 0.107715
Reaction 3 radius 0.000714 0.000722 0.000723 0.000724 0.000725
Reaction 4 radius 0.089671 0.135914 0.160209 0.185999 0.225061
Reaction 5 radius 0.086730 0.125359 0.143593 0.161225 0.184305
Reaction 6 radius 0.267838 0.437376 0.541703 0.670880 0.933775
Reaction 7 radius 0.135098 0.213396 0.258374 0.310351 0.402185
Reaction 8 radius 0.048267 0.068468 0.077599 0.086126 0.096753
Reaction 9 radius 0.039413 0.051999 0.056796 0.060757 0.065001
Reaction 10 radius 0.102360 0.155129 0.182845 0.212260 0.256794
Reaction 11 radius 0.101174 0.155419 0.184768 0.216786 0.267627
Reaction 12 radius 0.066543 0.095727 0.109355 0.122415 0.139295
Reaction 13 radius 0.072683 0.109659 0.128889 0.149113 0.179271
Reaction 14 radius 0.091116 0.142015 0.170434 0.202377 0.255924
Reaction 15 radius 0.081020 0.125221 0.149457 0.176230 0.219706
Reaction 16 radius 0.064677 0.092265 0.104899 0.116820 0.131891
Reaction 17 radius 0.084958 0.127919 0.150161 0.173457 0.207963
Reaction 18 radius 0.087767 0.133322 0.157372 0.183016 0.222154
Reaction 19 radius 0.157987 0.249298 0.301638 0.361995 0.468200
Reaction 20 radius 0.164099 0.260314 0.316085 0.381107 0.498001
At least here!!
Reading ./tables/dna/WholeNucleoChromosomesTable.bin
Straight Chromatin Table: Reading ./tables/dna/StraightChromatinFiberUnitTable.txt
Bend Chromatin Table: Reading ./tables/dna/BentChromatinFiberUnitTable.txt
Bent Histone Table: Reading ./tables/dna/BentHistonesTable.txt
Straight Histone Table: Reading ./tables/dna/StraightHistonesTable.txt
DNA geometry has been loaded to GPU memory
Finish initialize neighborindex
radius 0.242674
radius 0.377493
radius 0.485348
radius 0.350529
radius -269.638031
radius 0.000000
radius 0.287839
radius 0.434117
radius 0.301995
radius 0.287839
radius 0.084936
radius 0.000000
Setting for judging DNA damage
inipar is 0 test
Reading ./output/prechemRes.dat
inipar is 3024 test2
inipar is 3024 test3
Read initial radical information as the following
Initial radical number is 3024
type 0 radcial/molecule number 753
type 1 radcial/molecule number 1082
type 2 radcial/molecule number 74
type 3 radcial/molecule number 871
type 4 radcial/molecule number 127
type 5 radcial/molecule number 117
type 6 radcial/molecule number 0
type 7 radcial/molecule number 0
type 8 radcial/molecule number 0
type 9 radcial/molecule number 0
type 10 radcial/molecule number 0

Start GPU memory initialization
Finish copying data to GPU for chemical stage

terminate called after throwing an instance of 'thrust::system::system_error'
what(): parallel_for failed: cudaErrorInvalidValue: invalid argument
Aborted (core dumped)
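One detail in the log above may be worth a closer look. Among the `radius` lines printed after "Finish initialize neighborindex", one value is negative (-269.638031) and two are exactly zero. Whether this is related to the crash is not established here, and zero may be a legitimate placeholder. The Python sketch below flags such lines, assuming the console output has been saved as text. The function name `suspicious_radii` is hypothetical.

```python
import re

# Matches standalone lines such as "radius 0.242674" or "radius -269.638031".
# Lines like "Reaction 0 radius ..." do not start with "radius" and are skipped.
RADIUS_LINE = re.compile(r"^radius\s+(-?\d+(?:\.\d+)?)$")


def suspicious_radii(log_text: str):
    """Return (line_number, value) for every non-positive 'radius' line."""
    hits = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        match = RADIUS_LINE.match(line.strip())
        if match and float(match.group(1)) <= 0.0:
            hits.append((lineno, float(match.group(1))))
    return hits


# Small excerpt in the same shape as the log in this report.
sample = """Finish initialize neighborindex
radius 0.242674
radius -269.638031
radius 0.000000
"""
print(suspicious_radii(sample))
```

Run against the full log, this would point at the lines to inspect first when checking whether an invalid size or range reaches the GPU code.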

Steps to Reproduce:

Set up the simulation with the specified parameters.

Run the simulation with the provided environment.

The error occurs during GPU memory initialization for the chemical stage, right after "Finish copying data to GPU for chemical stage" is printed.

Additional Information:

The issue happens consistently during this step.

I’ve verified that the system and CUDA environment are configured correctly, and the GPU is functioning as expected.

I’ve also checked that all the required data files (such as ./tables/chem/RadiolyticSpecies.txt) are correctly loaded without issues.

Could anyone help identify what might be causing the cudaErrorInvalidValue error and how to resolve it? Any insights or suggestions would be greatly appreciated!

Thank you!
