| Approach | Notes |
|---|---|
| asyncio with asyncio.Queue + worker coroutines | Elegant for I/O-bound work, and the single-threaded event loop avoids GIL concerns; requires making the fetch/parse step async (e.g., wrapping blocking calls with asyncio.to_thread or switching to aiohttp). Risk: if the provided helper is a blocking call, it will stall the event loop unless the call is explicitly offloaded. Several candidates who tried writing a custom async HTML parser hit encoding errors or ran out of time. |
| DFS (recursive or stack-based) single-threaded baseline | Simpler to implement first; some candidates started here and then refactored. The recursive variant risks exceeding the recursion limit on deep sites (an explicit stack avoids this). BFS is generally preferred because it discovers pages level by level and its queue-based frontier is easier to parallelize. |
| Distributed multi-server design (follow-up only, no code required) | Use a central queue (e.g., Redis) with one coordinator assigning URL batches to worker servers; this adds fault tolerance and horizontal scale, at the cost of significant operational complexity. |
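
The asyncio row in the table above can be sketched roughly as follows. This is a minimal illustration, not a reference solution: `get_links` and `FAKE_SITE` are hypothetical stand-ins for whatever blocking helper the question provides, and the sketch assumes Python 3.9+ (for `asyncio.to_thread`). It shows the blocking helper being offloaded to a thread so the event loop is not stalled.

```python
import asyncio

# Hypothetical in-memory "site" standing in for real pages (assumption for illustration).
FAKE_SITE = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/c"],
    "http://example.com/b": ["http://example.com/"],
    "http://example.com/c": [],
}


def get_links(url):
    """Hypothetical blocking helper: returns the links found on `url`."""
    return FAKE_SITE.get(url, [])


async def crawl(start_url, num_workers=4):
    """Crawl from start_url using a shared queue and worker coroutines."""
    queue = asyncio.Queue()
    seen = {start_url}  # URLs already enqueued, to avoid revisiting
    queue.put_nowait(start_url)

    async def worker():
        while True:
            url = await queue.get()
            try:
                # Offload the blocking call so the event loop stays responsive.
                links = await asyncio.to_thread(get_links, url)
                for link in links:
                    # No await between the check and the add, so this is
                    # race-free within the single-threaded event loop.
                    if link not in seen:
                        seen.add(link)
                        queue.put_nowait(link)
            finally:
                queue.task_done()

    workers = [asyncio.create_task(worker()) for _ in range(num_workers)]
    await queue.join()  # returns once every enqueued URL has been processed
    for w in workers:
        w.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return seen


if __name__ == "__main__":
    print(sorted(asyncio.run(crawl("http://example.com/"))))
```

Error handling is deliberately omitted here: if the helper raises, that worker exits, so a fuller version would catch and log failures inside the loop. Hostname filtering and politeness limits are likewise left out to keep the sketch short.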