Week 9: Workers
The week didn't start at Starbucks. It didn't start at 2pm.
It started Monday at 6:30am in LA, at my girlfriend's mom's house, while she was still asleep. I'd gone down for the weekend. Different room, different city, different context entirely. I sat down at her kitchen table with my laptop and started drawing boxes.
Three processes. A server that routes. A client that submits requests and leaves. A worker that stays connected and processes what it's given. Job struct. Workers linked list. An event loop in the middle. I sketched the rough shape of everything before I wrote a line of code, which isn't how I usually start.
It helped.
The Same Linked List, Again
The first real work Monday was building the structural backbone: jobs.h, workers.h, the node linked lists that would underpin both.
I've written this same boilerplate so many times. Struct with a next pointer. Head, tail, count. Functions to add, remove, look up by ID. I built it for players in Month 2. I built it for packets. I built it for pending ACKs. Every new project in C, I open an old header file, copy the node structure, adjust the type names, and start over.
Monday I got tired of it and spent twenty minutes looking into whether C could do what C++ does with templates: write a linked list once, reuse it for any type. The answer is that it can't. Not in any way that's worth it. C has macros, and you can do things with void * pointers, but none of it gives you what C++ templates give you. C doesn't have the concept of a generic. It knows what a struct is. Structs don't inherit. There's no <T>.
I wrote the linked list twice, once for jobs and once for workers, copy-pasted and adjusted, and moved on.
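The boilerplate in question is the same shape every time. A minimal sketch of the pattern, using illustrative names rather than the actual jobs.h definitions:

```c
/* The per-type boilerplate that gets copy-pasted: an intrusive node
 * with a next pointer, plus head/tail/count bookkeeping. Field and
 * function names here are illustrative, not the repo's exact code. */
#include <stdlib.h>

struct Job {
    int id;
    struct Job *next;   /* the pointer that gets rewritten per type */
};

struct JobList {
    struct Job *head;
    struct Job *tail;
    int count;
};

/* Append at the tail in O(1). */
void job_list_add(struct JobList *list, struct Job *job) {
    job->next = NULL;
    if (list->tail)
        list->tail->next = job;
    else
        list->head = job;
    list->tail = job;
    list->count++;
}

/* Linear lookup by ID -- the part a C++ template would let you
 * write once for any element type. */
struct Job *job_list_find(struct JobList *list, int id) {
    for (struct Job *j = list->head; j; j = j->next)
        if (j->id == id)
            return j;
    return NULL;
}
```

Swap `Job` for `Worker`, adjust the fields, and you have the second list.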
Learning a language's limits is part of learning the language. Not a failure to figure it out. Just information about what the tool does and doesn't do.
The other Monday piece was epoll, something I'd wanted to experiment with since the game server's event loop felt clunky. I got a basic epoll instance running, then did something I'd always wanted in the previous projects but never figured out: I added file descriptor 0 to the epoll set. Stdin.
In the game server and the proxy, the only way to interact with a running server was to kill it. Change a constant, recompile, restart. There was no way to send a command to a live process. Connecting stdin to the event loop fixed that. Type stats while the server is running, get a snapshot. Type quit, get a clean shutdown. Small thing. Felt significant in a way that's hard to explain unless you've wanted it for months.
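The mechanism is small: stdin is just another file descriptor, so epoll treats it like any socket. A sketch under that assumption, with the command handling simplified to stand-ins:

```c
/* Sketch: register stdin (fd 0) in the epoll set so typed commands
 * flow through the same event loop as sockets. The command names
 * match the post; the handler bodies are placeholders. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/epoll.h>

/* Add any pollable fd (socket, pipe, tty) to the epoll set. */
int watch_fd(int epfd, int fd) {
    struct epoll_event ev;
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Called from the main loop when fd 0 reports EPOLLIN. */
void handle_stdin(void) {
    char line[128];
    if (!fgets(line, sizeof line, stdin))
        return;
    line[strcspn(line, "\n")] = '\0';
    if (strcmp(line, "stats") == 0)
        printf("(print job/worker snapshot here)\n");
    else if (strcmp(line, "quit") == 0)
        printf("(clean shutdown here)\n");
}
```

In the server, `watch_fd(epfd, 0)` runs once at startup; after that a typed line is just another ready event.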
By end of Monday: jobs.h, workers.h, common.h, and a server that echoed whatever I typed at it. No connections. No submissions. Just a process watching fd 0 through epoll and a cleaner mental model of what the week was going to look like.
Clients Don't Need to Stay
Tuesday I started building client connection support and quickly discovered I'd been planning wrong.
The original design had a client struct, a persistent data structure for tracking connected clients, similar to how the game server tracked players. I started sketching the client list before I realized I didn't need one.
A player in a game is a persistent entity. They connect, they move, they interact with other players, they disconnect. The server needs to know who they are across multiple messages. You need a struct, a linked list, a lookup function.
A client submitting a job is a transaction. Connect, send the request, receive the response, close the connection. There is no state between messages. The server calls accept(), reads the packet, parses the command type, calls the appropriate handler, sends a reply, and closes the fd. The entire interaction happens in one call to handle_client_request(). The client doesn't live in memory because there's nothing to remember.
Not every connection needs to be a relationship.
That distinction shaped the whole architecture. The client listener is fire-and-forget: accept, respond, close. The worker listener registers each connection with epoll. Workers are persistent, they stay connected, they get job assignments pushed to them and send results back asynchronously. Two sockets, two ports (CLIENT_PORT 1209, WORKER_PORT 1205), two fundamentally different patterns living inside the same event loop.
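The fire-and-forget side fits in one function. `handle_client_request` is the name from the post; the command parsing and replies below are simplified stand-ins for the real handlers:

```c
/* Sketch of the transactional client path: one read, one reply,
 * close. No client struct exists because there is no state to
 * carry between messages. */
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>

/* Serve a single transaction on an already-accepted fd. */
void handle_client_request(int fd) {
    char buf[256];
    ssize_t n = read(fd, buf, sizeof buf - 1);
    if (n > 0) {
        buf[n] = '\0';
        if (strncmp(buf, "submit", 6) == 0)
            write(fd, "queued", 6);     /* real code: create job, reply with id */
        else if (strncmp(buf, "status", 6) == 0)
            write(fd, "in queue", 8);   /* real code: look up the job */
        else
            write(fd, "error", 5);
    }
    close(fd);  /* connection over; nothing to remember */
}
```

The worker listener is the opposite: its accepted fds go into epoll and the workers list, and stay there.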
By end of Tuesday the client could submit jobs, check status, and retrieve results. Jobs accumulated in the jobs list and returned "in queue" to anyone who asked. No workers existed yet so nothing got processed, but the foundation held. I also pulled the buffer management and time utilities from Month 2 and dropped them straight into the utils folder, one of the first times building something new felt genuinely like building on top of something I'd already made.
Wednesday's Worker
Wednesday I built worker.c, and for the first time it felt like a system.
The design took shape faster than I expected, which usually means I'd been thinking about it without realizing it. Worker connects to the server on WORKER_PORT, receives a WPACKET_CONNECTED packet with its assigned ID, and enters its own event loop. Server adds the worker fd to epoll and to the workers linked list. From there, the worker waits. When a job comes in from a client, the server scans the linked list for the first W_READY worker, sends it a WPACKET_NEWJOB packet with the job metadata, and marks it W_BUSY. Worker extracts the metadata, calls process_job(), sends a WPACKET_RESULTS packet back when done. Server stores the result, marks the job J_SUCCESS, sets the worker back to W_READY.
The job logic lived in job_processing.c. process_job() calls determine_job_type() first. That function extracts the first word from the submission buffer using memmove() to shift the rest of the content left, then returns a job type constant. Three types to start: echo, wordcount, capitalize. If the keyword doesn't match anything known, it returns WERR_INVALIDJOB.
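The keyword-stripping step can be sketched directly. The memmove trick is from the post; the enum values and buffer handling here are illustrative:

```c
/* Sketch of determine_job_type(): read the first word, shift the
 * rest of the buffer left with memmove, map the keyword to a type
 * constant. Constant values are assumptions. */
#include <string.h>

enum { JTYPE_ECHO, JTYPE_WORDCOUNT, JTYPE_CAPITALIZE, WERR_INVALIDJOB = -1 };

int determine_job_type(char *buf) {
    char keyword[32] = {0};
    size_t i = 0;
    while (buf[i] && buf[i] != ' ' && i < sizeof keyword - 1)
        keyword[i] = buf[i], i++;
    while (buf[i] == ' ')
        i++;                                         /* skip separator */
    memmove(buf, buf + i, strlen(buf + i) + 1);      /* drop keyword */

    if (strcmp(keyword, "echo") == 0)       return JTYPE_ECHO;
    if (strcmp(keyword, "wordcount") == 0)  return JTYPE_WORDCOUNT;
    if (strcmp(keyword, "capitalize") == 0) return JTYPE_CAPITALIZE;
    return WERR_INVALIDJOB;
}
```

After the call, the buffer holds only the payload, which is what the per-type handlers want to see.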
One design decision I liked: when a new job was created, its results field was immediately populated with the original metadata the client sent. If you queried results before the job finished, you got your own input back, not an error, not garbage. A reminder of what you submitted. It prevented null-read crashes and it was actually useful as a sanity check.
Wednesday evening: three terminals, server running, one worker connected, one client. Submit a job. Watch the server assign it. Watch the worker print "processing..." and come back with a result. ./client results 0 returned real output.
Getting it working isn't the end. But it's the moment you stop wondering if it can work.
Thirty Minutes on a Data Structure
Thursday started with a design problem I spent half an hour on.
Before Thursday, the server's behavior when a job came in and no workers were free was simple: discard the job. That's not a queue. That's routing with losses. A real queue holds jobs when all workers are busy and drains them as workers become free.
The data structure question: what do I store? All I needed was job IDs. The jobs themselves lived in the jobs linked list with all their data. The queue just needed to know which IDs were waiting and in what order.
My instinct was a dynamic array. Append to the back, pop from the front. Simple. Familiar. Then I thought through popping. To remove an element from the front of an array, you shift every remaining element left by one. O(N) per pop. The queue gets popped every time a worker becomes available, which in a busy system happens constantly. That's not a pop operation at scale; it's a copy of the entire queue on every job assignment.
Linked list. Just job IDs and next pointers. struct JobQ with an int job_id and a struct JobQ *next. That's the whole node. pop_queue() returns the id, frees the node, O(1). check_queue() runs at the bottom of every main loop iteration: if there's a W_READY worker and a job in the queue, pop and assign.
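The whole queue, sketched. The node shape and `pop_queue` are from the post; the push function and the -1 empty sentinel are assumptions:

```c
/* Sketch of the job-ID queue: a node is just an id and a next
 * pointer. Push appends at the tail, pop_queue detaches the head
 * in O(1). Returns -1 when empty (an assumed sentinel). */
#include <stdlib.h>

struct JobQ {
    int job_id;
    struct JobQ *next;
};

static struct JobQ *q_head, *q_tail;

void push_queue(int job_id) {
    struct JobQ *n = malloc(sizeof *n);
    n->job_id = job_id;
    n->next = NULL;
    if (q_tail) q_tail->next = n;
    else        q_head = n;
    q_tail = n;
}

/* O(1): no shifting, just a pointer swap and a free. */
int pop_queue(void) {
    if (!q_head) return -1;
    struct JobQ *n = q_head;
    int id = n->job_id;
    q_head = n->next;
    if (!q_head) q_tail = NULL;
    free(n);
    return id;
}
```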
Working through a tradeoff yourself is different from being told the answer. One you understand; the other you remember.
Thursday also added the error taxonomy that retry logic depends on. Two codes: WERR_INVALIDJOB when the submission keyword is unknown, WERR_UNKNOWN for anything unexpected. Invalid jobs fail immediately and permanently. Retrying a malformed submission produces the same error every time. Unknown failures retry up to three times, incrementing retry_ct on the job struct each time, then fail permanently. The worker sends its error code in every WPACKET_STATUS packet so the server knows which branch to take.
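The two error branches reduce to a few lines. The codes and the retry cap are from the post; this `Job` struct is a cut-down stand-in for the real one:

```c
/* Sketch of the retry branch: invalid jobs fail permanently,
 * unknown failures requeue up to three times, then fail. The
 * struct layout and status constants are assumptions. */
#define WERR_INVALIDJOB 1
#define WERR_UNKNOWN    2
#define MAX_RETRIES     3

enum { J_QUEUED, J_FAILED, J_SUCCESS };

struct Job { int status; int retry_ct; };

void handle_job_error(struct Job *job, int err_code) {
    if (err_code == WERR_INVALIDJOB) {
        job->status = J_FAILED;        /* retrying can't help a bad submission */
        return;
    }
    /* WERR_UNKNOWN: possibly transient, so try again */
    job->retry_ct++;
    if (job->retry_ct <= MAX_RETRIES)
        job->status = J_QUEUED;        /* requeue for another worker */
    else
        job->status = J_FAILED;
}
```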
Worker disconnection mid-job: handle_worker_disconnection() checks if that worker's cur_job_id is valid. If so, it finds the job and requeues it. The worker is removed from the list. Nothing is dropped.
Four Types
Friday I found a bug that had been in the code since Wednesday.
job_wordcount() was counting characters, not words.
The original loop incremented a counter for every non-space character in the buffer. That's charcount. I'd been testing manually on short strings where the numbers were close enough to seem right. I caught it before stress testing, which is the only reason it didn't cause more confusion.
The fix: job_wordcount() now counts word transitions. A new word starts when a non-space character follows a space or the beginning of the string. Then I wrote job_charcount() as a separate function, because counting characters is its own valid operation. Added JTYPE_CHARCOUNT to common.h. Four job types instead of three.
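Both counters, side by side. These are illustrative reconstructions of the fixed logic, not the repo's exact code:

```c
/* Sketch of the two counters: a word begins wherever a non-space
 * character follows a space or the start of the buffer. */
#include <ctype.h>

int job_wordcount(const char *buf) {
    int count = 0, in_word = 0;
    for (; *buf; buf++) {
        if (!isspace((unsigned char)*buf)) {
            if (!in_word)
                count++;               /* transition: a word starts here */
            in_word = 1;
        } else {
            in_word = 0;
        }
    }
    return count;
}

/* What the buggy loop was actually computing all along. */
int job_charcount(const char *buf) {
    int count = 0;
    for (; *buf; buf++)
        if (!isspace((unsigned char)*buf))
            count++;
    return count;
}
```

On "one two three" the first returns 3 and the second 11, which is exactly the gap the short test strings had been hiding.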
A Thousand Workers
Typing ./client submit "wordcount one two three" into a terminal repeatedly stopped being useful fast. I wanted to summon N workers with one command and flood the server with submissions to watch what held.
create_workers.c is short. It takes a count, forks N times, and calls execl("./worker", "./worker", NULL) in each child. That's the whole program. But execl does something I didn't fully appreciate until I wrote it: it replaces the calling process's program image entirely. The PID stays. The old program's stack, its memory, its state are gone. The child becomes a completely different program wearing the same process ID.
Which means one binary call can put a hundred workers on the server. Open a terminal, run ./create_workers 100, and a hundred persistent worker processes connect simultaneously. submit_jobs.c is the other side: it mimics client.c exactly but loops, opening N connections sequentially in a single process, submitting N packets without spawning anything new. No forking. One call, ten thousand jobs.
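The fork-and-exec core of create_workers.c can be sketched as a function. The binary path is parameterized here for illustration; the real program hardcodes "./worker":

```c
/* Sketch of create_workers.c: fork N times, each child replaces
 * itself with the worker binary via execl. Returns how many
 * children were successfully forked. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int spawn_n(const char *binary, int n) {
    int spawned = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork failed");     /* e.g. the process-table ceiling */
            break;
        }
        if (pid == 0) {
            /* Child: same PID, entirely new program image. */
            execl(binary, binary, (char *)NULL);
            _exit(127);                /* only reached if execl fails */
        }
        spawned++;
    }
    return spawned;
}
```

`spawn_n("./worker", 100)` is the one-command version of opening a hundred terminals.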
I stress-tested with a thousand workers and ten thousand submissions. The server handled it. I watched the queue drain.
What I couldn't stop thinking about was the treasure game. That project had bots, but spawning them meant opening terminals one at a time and running the binary by hand. Ten bots meant ten terminals, each started manually, each stopped manually. With this pattern I could have written a spawn_bots.c that forked a hundred of them in under a second and shut them all down just as fast. The bots would have run games against each other while I watched from one terminal. The game could have playtested itself.
The most powerful tools are the ones you wish you'd had in the last project.
There was a ceiling I hit along the way: fork failed: Resource temporarily unavailable. I'd been forking workers faster than they finished and the OS process table filled up. Not a bug in my code. A hard constraint from the kernel. There are only so many processes the system can manage at once, governed by ulimit, and I'd blown past it. submit_jobs.c was the workaround: sequential connections in a single process instead of a new process per job. The load test runs. The ceiling doesn't matter.
There's a difference between a bug and a limit. Bugs are yours to fix. Limits are the system telling you something true about the world.
What Week 9 Proved
The clearest thing I understood by Friday was why the architecture worked.
Three processes, three clearly defined roles. The server knows nothing about what jobs do. It routes and tracks state. The worker knows nothing about the queue. It connects, waits, processes what arrives, reports back. The client knows nothing about workers. It submits and asks. Each piece communicates through a defined protocol and can change independently as long as the protocol holds.
I've heard this described in every systems course I've taken. Separation of concerns. Loose coupling. It sounds obvious until you're making the decisions yourself: realizing the client struct isn't needed, choosing where determine_job_type() lives, deciding that the results field should be pre-populated with metadata. Those aren't concepts. They're judgment calls under ambiguity, and whether they're right only becomes clear once the system is running.
Something else is clearer now too. By Month 3, the hardest part isn't showing up. The sessions fly by. I find myself wanting to extend them. Six weeks ago there was a negotiation every time I sat down; now there isn't. What's harder is deciding what to focus on long enough to actually master it before moving on. Month 3 had twenty possible directions. I picked one and committed. That choosing is where the real work lives now, not the sitting down.
The nature of the difficulty has changed. The conflict shifted from whether I'd show up to what I should build while I'm here. I don't always have a good answer. The code is real. So is the person writing it.
And Monday, 6:30am, someone else's kitchen, forty days of the habit already behind me. The context was completely different. The habit didn't care.
Same table. 2pm. Day 46.