If you’re looking for a place to start, W3Schools has a Python tutorial that’s pretty straightforward. It breaks things down ...
a naive parallel scan algorithm compatible with arbitrary sized input arrays; a work-efficient parallel scan algorithm compatible with arbitrary sized input arrays; a stream compaction algorithm built ...