Description
An alternative to the multiprocessing implementation used by paintGrid exists in hexrd.fitgrains. The approach works as follows:
- Create a `Worker` class that implements the `multiprocessing.Process` interface but is not a subclass of it. This class is used when multiprocessing is disabled, for example during profiling. The worker exits when the queue is empty.
- Create a `WorkerMP` class that subclasses `Worker` and `multiprocessing.Process`. All this class needs to do is call `Process.__init__` to enable multiprocessing.
- Create a `multiprocessing.JoinableQueue` and populate it with the job-specific information, which tends to be very small (for example, a single quaternion).
- Pack all of the contextual data into a dictionary, to be passed to the individual workers during instantiation.
- Create a managed list with `multiprocessing.Manager().list()` to hold the results.
- Start the multiprocessing workers sequentially:

      for i in range(n_cpus):
          w = Worker(queue, results, params)
          w.start()

  Each worker begins processing immediately; it is possible that processing completes before all workers have been spun up.
- Wait until the results list is complete, updating progress bars based on its length.
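The `Worker` / `WorkerMP` pair described above could be sketched roughly as follows. This is not the hexrd code: the `process` method, the `scale` parameter, the 0.1 s timeout, and the `run_serial` helper are hypothetical placeholders, and `Worker` is assumed to expose only `run()`, so that `start()` on `WorkerMP` resolves to the one inherited from `multiprocessing.Process`.

```python
import multiprocessing
import queue


class Worker:
    """Offers the run() part of the multiprocessing.Process interface
    without subclassing it, so it can run in-process (e.g. when profiling)."""

    def __init__(self, jobs, results, params):
        self.jobs = jobs        # queue of small job items
        self.results = results  # list (or managed list) collecting output
        self.params = params    # dict of contextual data

    def process(self, job):
        # Hypothetical stand-in for the real per-job computation.
        return job * self.params["scale"]

    def run(self):
        while True:
            try:
                # A short timeout tolerates the brief delay before items put
                # on a multiprocessing queue become visible to consumers.
                job = self.jobs.get(timeout=0.1)
            except queue.Empty:
                return  # queue is empty: the worker exits
            self.results.append(self.process(job))
            self.jobs.task_done()


class WorkerMP(Worker, multiprocessing.Process):
    """Same behaviour, but start() launches run() in a child process."""

    def __init__(self, *args, **kwargs):
        multiprocessing.Process.__init__(self)
        Worker.__init__(self, *args, **kwargs)


def run_serial(job_data, params):
    """Multiprocessing disabled: drain an in-process queue with one Worker."""
    jobs = queue.Queue()
    for item in job_data:
        jobs.put(item)
    results = []
    Worker(jobs, results, params).run()
    return results
```

With these definitions, `run_serial([1, 2, 3], {"scale": 10})` returns `[10, 20, 30]`. For the parallel case, the same constructor arguments would instead be a `multiprocessing.JoinableQueue` and a managed list, and `WorkerMP(...).start()` would be called once per CPU. On platforms that use the spawn start method, that launch code needs to sit under an `if __name__ == "__main__":` guard.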
Improvements to be made:
- Refactor this multiprocessing approach into a separate module containing abstract base classes to avoid code duplication.
- Implement a custom map function that is called with a `Worker` class (not an instance), the contextual information, the number of CPUs, a list of data to iterate over, and a progress callback. It creates a queue to pass to the workers, creates a managed list to hold the results, spins up the workers sequentially so that they begin processing, and then loops to report progress until processing is complete. The function returns the list of results.
- Consider breaking this into a custom `Pool` class with a `map` method. This implementation is cleaner: `Pool` would be instantiated with the `Worker` class, the contextual data dict, and the number of CPUs, and `map` would be called with the list of data over which to iterate and the callback. The problem is that for smaller datasets, much of the processing time appears to be consumed by spinning up the workers themselves, so we want each worker to begin processing immediately rather than wait until the entire pool is ready. We should time the initialization step, though; perhaps it is not such a big issue.
- Convert paintGrid multiprocessing to use this approach.
- Refactor fitgrains multiprocessing to use this new approach.
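As a starting point for discussion, the proposed map function might look something like the sketch below. The name `worker_map`, the `poll` interval, and the `(done, total)` callback signature are assumptions made for illustration. So is the rule that a worker class not derived from `multiprocessing.Process` is run in-process. None of this is existing hexrd API.

```python
import multiprocessing
import queue
import time


def worker_map(worker_cls, params, n_cpus, data, progress=None, poll=0.05):
    """Apply worker_cls to every item of data and return the list of results.

    worker_cls must accept (jobs, results, params) and provide run(); if it
    also subclasses multiprocessing.Process, n_cpus processes are started.
    """
    data = list(data)
    total = len(data)

    if not issubclass(worker_cls, multiprocessing.Process):
        # Multiprocessing disabled (e.g. profiling): one in-process worker.
        jobs = queue.Queue()
        for item in data:
            jobs.put(item)
        results = []
        worker_cls(jobs, results, params).run()
        if progress is not None:
            progress(len(results), total)
        return results

    jobs = multiprocessing.JoinableQueue()
    for item in data:
        jobs.put(item)
    with multiprocessing.Manager() as manager:
        results = manager.list()
        workers = []
        for _ in range(n_cpus):
            w = worker_cls(jobs, results, params)
            w.start()  # begins processing right away, before the rest exist
            workers.append(w)
        # Report progress until every result is in (or all workers have died).
        while len(results) < total and any(w.is_alive() for w in workers):
            if progress is not None:
                progress(len(results), total)
            time.sleep(poll)
        for w in workers:
            w.join()
        if progress is not None:
            progress(len(results), total)
        return list(results)
```

Because each worker is started inside the loop, the first one is already consuming jobs while the others are still being created, which is the behaviour wanted for small datasets. A `Pool` class would mostly move `worker_cls`, `params`, and `n_cpus` into a constructor and keep the rest as its `map` method.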