Description
In a Repast HPC model, we saw non-portable results when executing a seeded model on different machines, or within a Singularity container.
Through debugging / reducing the problem size, we narrowed down the difference in results to be caused by the ordering of agent vectors returned by void repast::Context< T >::selectAgents(int count, std::vector< T * > & selectedAgents, bool remove = false).
On different platforms, this returns the same set of agents, but in a different order on each machine. Later parts of the model process the vector in order and apply RNG-based probability checks to each agent, so a different order results in differing behaviour on different machines.
This (I believe) occurs due to the comparison of pointers within the default ordering of std::set<T*>, which is used by the selectAgents process, so when agent pointers are in a different order on different machines / platforms the order of the returned vector is different.
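To illustrate the suspected mechanism in isolation, here is a minimal sketch that does not use Repast at all. The `Agent` struct and both function names are hypothetical stand-ins. The agents live in one array so that their relative addresses are known, and a default `std::set<Agent*>` then iterates them by address rather than by id or insertion order:

```cpp
#include <set>
#include <vector>

// Hypothetical stand-in for a model agent; only the id matters here.
struct Agent {
    int id;
};

// Collect agent ids in the order a default std::set<Agent*> iterates them.
// The default comparator is std::less<Agent*>, i.e. ordering by address.
std::vector<int> idsInSetOrder(const std::vector<Agent*>& insertionOrder) {
    std::set<Agent*> agentSet(insertionOrder.begin(), insertionOrder.end());
    std::vector<int> ids;
    for (Agent* agent : agentSet) {
        ids.push_back(agent->id);
    }
    return ids;
}

// Agents stored in one array, so addresses ascend with the array index.
// Ids are deliberately assigned in the opposite order to the addresses.
std::vector<int> demoReversedAddresses() {
    static Agent storage[4] = {{4}, {3}, {2}, {1}};
    // Insert by ascending id, which here means descending address.
    std::vector<Agent*> byId = {&storage[3], &storage[2], &storage[1], &storage[0]};
    // The set ignores insertion order and yields ascending addresses.
    return idsInSetOrder(byId);
}
```

With heap allocation the relative addresses are not under the program's control, which would explain why the resulting order can differ between machines even with identical seeds.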
It is not clear from the selectAgents documentation if this is intended to be portable or not.
I've not managed to reliably reproduce this behaviour in a simpler/smaller model while using new for agent allocation, but I have produced a MWE which uses placement new to enforce ascending and non-ascending pointer ordering, demonstrating that the order of the returned vector depends on the order of the agent pointers in memory:
https://github.com/ptheywood/repasthpc-select-agents-vector-order
We've implemented a workaround for this in the model by sorting the returned vector by the agent id, then performing a shuffle using the seeded PRNG to generate the same sequence reliably across separate machines.
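A minimal sketch of that workaround follows, under some loudly stated assumptions: `Agent` is a hypothetical type with a single integer id (a real `AgentId` has several components), `canonicaliseAndShuffle` is an illustrative name, and `std::mt19937` stands in for the model's seeded PRNG. The shuffle is written by hand rather than with `std::shuffle`, because the standard leaves the algorithm used by `std::shuffle` and `std::uniform_int_distribution` to the implementation, whereas the raw output sequence of `std::mt19937` is fully specified:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Hypothetical agent type; a single integer id is an assumption of this sketch.
struct Agent {
    int id;
};

// Put the agents into an address-independent order, then shuffle them
// with a seeded engine so every machine produces the same sequence.
void canonicaliseAndShuffle(std::vector<Agent*>& agents, std::uint32_t seed) {
    // Step 1: sort by id, discarding whatever order the pointers imposed.
    std::sort(agents.begin(), agents.end(),
              [](const Agent* a, const Agent* b) { return a->id < b->id; });

    // Step 2: Fisher-Yates shuffle driven directly by the engine output.
    // The modulo introduces a small bias, accepted here for simplicity.
    std::mt19937 engine(seed);
    for (std::size_t i = agents.size(); i > 1; --i) {
        std::size_t j = engine() % i;
        std::swap(agents[i - 1], agents[j]);
    }
}
```

Two vectors holding the same agents in different initial orders end up identical after this call with the same seed, which is the property the model needs.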
Real-world example
Using examples from the larger model in question, initialised with a small number of agents (4) on a single MPI rank with a given fixed seed (the minimal scale problem which shows this difference on my local machine), on one machine the agents were allocated at initialisation with the following pointers:
# Allocation in order, added to the context
AgentId(1, 0, 0, 0) 0x55d8fa142d60
AgentId(2, 0, 0, 0) 0x55d8fa2fe640
AgentId(3, 0, 0, 0) 0x55d8fa31f370
AgentId(4, 0, 0, 0) 0x55d8fa270cd0
# returned by selectAgents
AgentId(1, 0, 0, 0) 0x55d8fa142d60
AgentId(3, 0, 0, 0) 0x55d8fa31f370
AgentId(2, 0, 0, 0) 0x55d8fa2fe640
AgentId(4, 0, 0, 0) 0x55d8fa270cd0
# Sorted by pointer
AgentId(1, 0, 0, 0) 0x55d8fa142d60
AgentId(4, 0, 0, 0) 0x55d8fa270cd0
AgentId(2, 0, 0, 0) 0x55d8fa2fe640
AgentId(3, 0, 0, 0) 0x55d8fa31f370
When executed on the same machine within a Singularity container (with the same OS/repast/mpich/gcc versions), the ordering of agent pointers is:
# Allocation in order, added to the context
AgentId(1, 0, 0, 0) 0x560d5572dd60
AgentId(2, 0, 0, 0) 0x560d557e87e0
AgentId(3, 0, 0, 0) 0x560d55b8e580
AgentId(4, 0, 0, 0) 0x560d55b52760
# returned by selectAgents
AgentId(1, 0, 0, 0) 0x560d5572dd60
AgentId(3, 0, 0, 0) 0x560d55b8e580
AgentId(4, 0, 0, 0) 0x560d55b52760
AgentId(2, 0, 0, 0) 0x560d557e87e0
# Sorted by pointer
AgentId(1, 0, 0, 0) 0x560d5572dd60
AgentId(2, 0, 0, 0) 0x560d557e87e0
AgentId(4, 0, 0, 0) 0x560d55b52760
AgentId(3, 0, 0, 0) 0x560d55b8e580
I.e. the same agents are returned by selectAgents but in a different order (1,3,2,4 vs 1,3,4,2).
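The two listings above are consistent with one fixed shuffle being applied to two different pointer-sorted inputs. As a small check, the sketch below applies a single position permutation to both "Sorted by pointer" orders. The permutation is inferred from the listings, not taken from the actual RNG draws, and all names are illustrative:

```cpp
#include <cstddef>
#include <vector>

// Apply a fixed position permutation: result[i] = input[permutation[i]].
std::vector<int> applyPermutation(const std::vector<int>& input,
                                  const std::vector<std::size_t>& permutation) {
    std::vector<int> result;
    result.reserve(permutation.size());
    for (std::size_t source : permutation) {
        result.push_back(input[source]);
    }
    return result;
}

// Agent ids in pointer-sorted order, copied from the two listings.
const std::vector<int> hostSorted = {1, 4, 2, 3};
const std::vector<int> containerSorted = {1, 2, 4, 3};

// Inferred permutation (positions 1 and 3 exchanged); an assumption of this sketch.
const std::vector<std::size_t> inferredPermutation = {0, 3, 2, 1};

// applyPermutation(hostSorted, inferredPermutation)      gives 1, 3, 2, 4
// applyPermutation(containerSorted, inferredPermutation) gives 1, 3, 4, 2
```

Both results match the orders returned by selectAgents, which supports the idea that the seeded shuffle itself is reproducible and only its input order differs.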
A MWE has been produced which reliably shows this issue by forcing non-ascending ordering of pointers through placement new: https://github.com/ptheywood/repasthpc-select-agents-vector-order
Possible Cause
I think this is caused by the use of std::set<T*> within the random agent selection process when selectAgents is called.
Related methods in repast hpc appear to be:
repast.hpc/src/repast_hpc/Context.h
Lines 1002 to 1005 in 7c03167
template<typename T>
void Context<T>::selectAgents(int count, std::vector<T*>& selectedAgents, bool remove){
  selectNElementsInRandomOrder(begin(), end(), count, selectedAgents, remove);
}
repast.hpc/src/repast_hpc/Random.h
Lines 701 to 704 in 07abc53
template<typename T, typename I>
void selectNElementsInRandomOrder(I iteratorStart, I iteratorEnd, int count, std::vector<T*>& selectedElements, bool remove = false){
  selectNElementsInRandomOrder(iteratorStart, countOf(iteratorStart, iteratorEnd), count, selectedElements, remove);
}
repast.hpc/src/repast_hpc/Random.h
Lines 662 to 670 in 07abc53
template<typename T, typename I>
void selectNElementsInRandomOrder(I iterator, int size, int count, std::vector<T*>& selectedElements, bool remove = false){
  // Transfer all elements from the vector to a set
  std::set<T*> selectedElementSet;
  selectedElementSet.insert(selectedElements.begin(), selectedElements.end());
  selectedElements.clear();
  selectNElementsAtRandom(iterator, size, count, selectedElementSet, remove);
  shuffleSet(selectedElementSet, selectedElements);
}
repast.hpc/src/repast_hpc/Random.h
Lines 351 to 367 in 07abc53
template<typename T>
void shuffleSet(std::set<T*>& elementSet, std::vector<T*>& elementList){
  elementList.assign(elementSet.begin(), elementSet.end());
  shuffleList(elementList);
  // // Or- probably faster... ??
  //
  // elementList.reserve(elementSet.size());
  // typename std::set<T*>::iterator setIterator = elementSet.begin();
  // DoubleUniformGenerator rnd = Random::instance()->createUniDoubleGenerator(0,1);
  // for(int i = 0; i < elementSet.size(); i++){
  //   int j = (int)(rnd.next() * (i + 1));
  //   elementList.push_back(elementList[j]);
  //   elementList[j] = *setIterator;
  //   setIterator++;
  // }
}
repast.hpc/src/repast_hpc/Random.h
Lines 317 to 327 in 07abc53
template<typename T>
void shuffleList(std::vector<T*>& elementList){
  if(elementList.size() <= 1) return;
  IntUniformGenerator rnd = Random::instance()->createUniIntGenerator(0, elementList.size() - 1);
  T* swap;
  for(size_t i = 0, sz = elementList.size(); i < sz; i++){
    int other = rnd.next();
    swap = elementList[i];
    elementList[i] = elementList[other];
    elementList[other] = swap;
  }
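Ultimately, I believe this is because shuffleSet initialises elementList in the iteration order of elementSet, i.e. the std::less<T*> pointer ordering that std::set<T*> uses by default, before shuffleList permutes it. The shuffle is seeded and reproducible, but its starting order depends on the agent addresses.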
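One possible direction, sketched here outside Repast and purely as an illustration, would be to order the intermediate set by a stable identifier instead of by address. `Agent`, `AgentIdLess` and `idsInIdOrderedSet` are hypothetical names, and the sketch assumes every element exposes a unique, comparable id; a set ordered this way would treat two distinct objects with equal ids as duplicates:

```cpp
#include <set>
#include <vector>

// Hypothetical agent type with a unique integer id (an assumption of this sketch).
struct Agent {
    int id;
};

// Comparator that orders agent pointers by id rather than by address.
struct AgentIdLess {
    bool operator()(const Agent* lhs, const Agent* rhs) const {
        return lhs->id < rhs->id;
    }
};

// Collect ids in the iteration order of a set that uses the id comparator.
// The result no longer depends on where the agents were allocated.
std::vector<int> idsInIdOrderedSet(const std::vector<Agent*>& agents) {
    std::set<Agent*, AgentIdLess> agentSet(agents.begin(), agents.end());
    std::vector<int> ids;
    for (Agent* agent : agentSet) {
        ids.push_back(agent->id);
    }
    return ids;
}
```

With such an ordering, the vector handed to the shuffle would start in the same order on every machine, so a seeded shuffle would produce the same result everywhere.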
This may not be the only source of pointer-comparison based non-portability.