concurrency
core idea is that there are three classes of software
- single thread single process (single core)
- multiple threads multiple processes (2-8 cores)
- distributed processing (9+ cores)
- (3) is growing as computers improve. (2) is shrinking and is not future-proof for most scalability problems; (2) is for gamers.
Python supports concurrency; you just need to understand how to fit your problem into Python's concurrency paradigms.
- Threads
- pro: shared state
- con: also shared state (race conditions)
- Processes
- pro: independence
- con: pickling / interprocess communication
- Async
- pro: cheap and easy
- con: only for IO-bound tasks (see the sketch after this list)
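To make the trade-offs concrete, a minimal sketch of the same map-style job run under both threads and processes via the standard-library concurrent.futures API (cpu_bound is a hypothetical stand-in for real work):

```python
import concurrent.futures

def cpu_bound(n):
    # hypothetical stand-in for CPU-heavy work
    return sum(i * i for i in range(n))

def main():
    inputs = [100_000] * 8

    # threads: cheap, share state, but the GIL limits CPU-bound parallelism
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as ex:
        thread_results = list(ex.map(cpu_bound, inputs))

    # processes: true parallelism, but arguments and results must be picklable
    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as ex:
        process_results = list(ex.map(cpu_bound, inputs))

    assert thread_results == process_results

if __name__ == "__main__":  # guard matters where worker processes are spawned
    main()
```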
asyncio
- based on an event loop (as in twisted, gevent, etc.)
- intelligently switches execution while the program is awaiting IO
- async switches are cheap because they are built on generators internally
- with explicit keywords (‘yield’, ‘await’), switching is cooperative, so state is never left inconsistent at a switch point (sketch below)
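A minimal asyncio sketch of cooperative switching; asyncio.sleep stands in for real IO here:

```python
import asyncio

async def fetch(name, delay):
    # 'await' is an explicit, cooperative switch point: the event loop
    # runs other coroutines while this one waits on (simulated) IO
    await asyncio.sleep(delay)
    return f"{name} finished after {delay}s"

async def main():
    # three IO-bound tasks overlap: total time is ~2s, not ~4.5s
    results = await asyncio.gather(
        fetch("a", 1.0), fetch("b", 1.5), fetch("c", 2.0)
    )
    print(results)

asyncio.run(main())
```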
threads
- threads share state, which makes race conditions easy to introduce
- threads switch on their own, basically for free, so code must always assume it can be interrupted
- this is where the GIL comes in: it is a single protective lock around the interpreter's internal state
- would you rather have one simple lock, or many, many individual locks? (see the race-condition sketch below)
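A small sketch of the race condition and the lock that guards against it; 'counter += 1' is a read-modify-write, so unlocked threads can interleave and lose updates:

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        # without the lock, another thread can run between the read
        # and the write of 'counter += 1', losing an update
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000 with the lock; often less without it
```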
rules:
- pick between locks or queues (queues are preferred: with too many locks, code becomes effectively serial; queue sketch below)
- thread after you fork, not before (forking a process with live threads can duplicate locks those threads hold)
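A minimal sketch of the queue-preferred style: one queue.Queue feeds a few worker threads, and a sentinel tells each worker to stop (the squaring is a hypothetical stand-in task):

```python
import queue
import threading

def worker(q, results):
    while True:
        item = q.get()
        if item is None:  # sentinel: no more work for this worker
            return
        results.append(item * item)  # list.append is atomic in CPython

q = queue.Queue()
results = []
workers = [threading.Thread(target=worker, args=(q, results)) for _ in range(4)]
for w in workers:
    w.start()
for item in range(10):
    q.put(item)
for _ in workers:
    q.put(None)  # one sentinel per worker
for w in workers:
    w.join()
print(sorted(results))
```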
processes
- not every task is parallelizable
- “putting 5 workers on making a baby does not give you a baby in 1 month”
- amdahl’s law - there is a spectrum of theoretical speedup depending on how much of a task can run concurrently
- on the scale from lawn mowing to baby making, how parallelizable is this task? (see the sketch below)
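For reference, Amdahl's law as a quick sketch: with parallel fraction p and n workers, the theoretical speedup is 1 / ((1 - p) + p / n):

```python
def amdahl_speedup(p, n):
    """Theoretical speedup for parallel fraction p with n workers."""
    return 1 / ((1 - p) + p / n)

# lawn mowing is almost fully parallel; baby making is fully serial
print(amdahl_speedup(0.95, 8))  # ~5.9x from 8 workers
print(amdahl_speedup(0.0, 8))   # 1.0x: extra workers do not help
```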