
Ethan Thornberg · 11 min read

Tags: Deep Work, Networking, UDP, Proxy, Game Development, Network Simulation, Month 2

Week 7: Hostile Networks

Week 6 ended with something I was proud of.

Multiplayer treasure hunt game. Eighty treasures on a 100x100 map. Up to twelve players simultaneously. Bot AI with nearest-treasure pathfinding. A 10x10 viewport that scrolls with you. Thursday afternoon I'd added bots, turned up their movement speed, and watched twelve of them race around competing for treasure. It was genuinely fun to watch.

Week 7 I built a proxy that sits between client and server and pretends to be the internet.

Then I ran the game through it.

That was the week.

The Routing Problem

Monday I started on the proxy. The idea was straightforward: intercept every packet between client and server, add artificial delay, randomly drop some of them, see how the game holds up.

The architecture seemed obvious at first. Client sends a packet, proxy intercepts it, proxy forwards it to the server. Thirty minutes in, I hadn't written anything useful.

UDP doesn't preserve destination addresses. When a packet arrives at the proxy, I know who sent it. I have no idea where it's supposed to go. This is fundamental — UDP is stateless. Each packet is a standalone thing that flies through the air and lands wherever you aim it. The proxy receives the packet but has no memory of where it was supposed to go.

I could solve this on the proxy side. Build a routing table, track every client, map sender addresses to their corresponding destination. But that meant the proxy had to understand the game's protocol. Every time I added a new client type or changed a message, I'd need to update the proxy too. Brittle, maintenance-heavy, and defeating the point.

I wanted the proxy to be invisible. Something you could drop in front of any UDP application — not just this game — without touching a line of the application code.

After staring at the problem for a while, I landed on address embedding.

Instead of sending a packet directly to the server, the client sends it to the proxy with the destination address prepended to the start of the packet itself. The proxy reads the first few bytes, extracts the destination, strips the header, and forwards the payload to where it was actually supposed to go. On the return trip, the server sends to the proxy with the client's address embedded the same way.

The key was building this through wrapper functions — send_proxy() and rec_proxy() — that match sendto() and recvfrom() exactly. Same arguments, same return type. The only thing a caller has to change is the function name.

The best abstraction is the one that costs nothing to adopt.
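The post never shows the wrapper itself, so here is a minimal sketch of what the sending half could look like. Only the name send_proxy() comes from the project; the 6-byte header layout (4-byte IPv4 address plus 2-byte port), the embed_dest() helper, proxy_addr, and the buffer size are my assumptions.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Proxy address, filled in at startup -- in the real project this would
   come from something like proxy_config.h. */
static struct sockaddr_in proxy_addr;

/* Build the on-the-wire datagram: a 6-byte destination header (IPv4
   address, then port, both already in network byte order) followed by
   the payload. Returns the total length written into pkt. */
size_t embed_dest(unsigned char *pkt, const struct sockaddr_in *dest,
                  const void *payload, size_t len)
{
    memcpy(pkt,     &dest->sin_addr.s_addr, 4);  /* destination IPv4 */
    memcpy(pkt + 4, &dest->sin_port,        2);  /* destination port */
    memcpy(pkt + 6, payload, len);
    return 6 + len;
}

/* Drop-in replacement for sendto(): same arguments, same return type.
   The real destination is embedded in the header and the datagram is
   sent to the proxy instead. Assumes payloads fit a typical MTU. */
ssize_t send_proxy(int sock, const void *buf, size_t len, int flags,
                   const struct sockaddr *dest, socklen_t destlen)
{
    unsigned char pkt[6 + 1400];
    size_t n = embed_dest(pkt, (const struct sockaddr_in *)dest, buf, len);
    (void)destlen;
    return sendto(sock, pkt, n, flags,
                  (const struct sockaddr *)&proxy_addr, sizeof proxy_addr);
}
```

The proxy does the inverse: read the first six bytes of each datagram, strip them, and forward the remaining payload to the extracted address.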

By end of Monday, proxy_config.h, proxy_utils.h/c, and proxy.c were working. The routing was clean. I had something I could actually test with.

Making It Hurt

Tuesday I added what I actually wanted: configurable delay and packet loss.

The delay worked through a packet queue. Each incoming packet gets timestamped and enqueued. Before forwarding anything, the proxy checks the head of the queue: has delay_ms elapsed since this packet arrived? If not, it waits. FIFO ordering, so packets leave in the same order they came in — just delayed.

Packet loss was literally one line. Before forwarding, roll a random number between 1 and 100. If it's at most drop_rate, return without sending. The packet is gone. No one on either side knows it ever existed.

Command line configuration: ./proxy 100 10 for 100ms delay and 10% drop rate. The proxy prints live statistics — packets received, forwarded, dropped — so you can see exactly what's happening.
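Both knobs are small enough to sketch. Assuming drop_rate and delay_ms are the two values parsed from the command line (./proxy 100 10), the drop roll and the queue-head check might look like this; the function names, the FIFO queue itself, and the millisecond clock are my assumptions.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Roll once per packet: 1..100. A result at most drop_rate means the
   packet is silently discarded, giving a drop_rate-percent loss rate. */
bool should_drop(int drop_rate)
{
    return (rand() % 100 + 1) <= drop_rate;
}

/* The head of the FIFO queue only leaves once delay_ms has elapsed
   since the packet arrived. All timestamps in milliseconds. */
bool ready_to_forward(long arrived_ms, long now_ms, long delay_ms)
{
    return now_ms - arrived_ms >= delay_ms;
}
```

Because the queue is FIFO and every packet waits the same delay, ordering is preserved: packets leave in arrival order, just shifted by delay_ms.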

Then I integrated it into the treasure hunt game.

Changed every sendto() to send_proxy(). Changed every recvfrom() to rec_proxy(). That was it. The entire integration.

Five seconds.

I sat with that for a moment. On Monday I'd designed the wrapper functions specifically so this would be trivial. It was. One conceptual decision made 24 hours earlier had completely eliminated integration friction. The design paid off exactly as intended. That almost never happens when you're still learning.

I made a note in my journal and moved on to testing.

When the Game Broke

50ms latency.

I pressed 'w' to move up.

There was a pause. Nothing dramatic. Just... a gap. My input went somewhere and had to come back before the character moved. I pressed a few more keys. Each one had that lag behind it.

50ms is a normal latency. That's what you get connecting to a server a few hundred miles away. That's not a bad connection. That's just the internet.

I reminded myself of that fact, and it made the frustration worse.

100ms. The gap doubled. I'd press a direction and see my character start moving almost a full second later because the round-trip added up. More unsettling: I could see other players not where they were, but where they were 100ms ago. Their afterimage. If someone was racing me for a treasure, I was reacting to a ghost. By the time my movement arrived, they'd already been somewhere else for a tenth of a second.

Predicting where I'd be, where they'd be, and where the treasure was relative to both of us — in real time, with delayed input — was genuinely difficult.

200ms. I stopped trying to collect treasures. I couldn't predict where I'd be by the time I arrived anywhere.

Then I turned on packet loss. 10%.

The game fundamentally broke.

Movement packets dropped silently. I pressed 'd'. Nothing happened. The server never got the update. I was pressing buttons and the character was frozen. I'd hold the key down for a full second and sometimes nothing would happen until a packet finally got through.

Treasure collection desynced in a way that was maddening. I'd walk onto a treasure — nothing. Score didn't change. Treasure didn't disappear. What had happened: the server got my movement update, moved me onto the treasure, processed the collection, removed it from the world, and sent me a confirmation. But the confirmation packet got dropped. My client never heard back. It was still trying to collect a treasure that the server had already removed.

The worst bug I found: when a player position packet went missing entirely, the client didn't handle the gap gracefully. Instead of skipping the missing data, it unpacked the next available bytes as if they were that player's coordinates. Those bytes were actually the next player's data. The result: my character teleported across the map. Not because I moved. Not because of lag. Just because a packet was missing and the byte unpacking went wrong.
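To make that failure concrete, here is a sketch of the fix rather than the bug. The wire format (one id-tagged 3-byte record per player) is entirely my assumption, but the point stands: keying records by a player id instead of by position in the buffer is what makes a missing record harmless.

```c
#include <stddef.h>
#include <stdint.h>

struct player {
    uint8_t x, y;   /* last known position */
    int     seen;   /* nonzero once an update has arrived */
};

/* Parse id-tagged records of 3 bytes each: (id, x, y). If a record is
   missing, the affected player simply stays at its last position --
   nobody inherits another player's bytes, so nobody teleports. */
void parse_positions(const uint8_t *buf, size_t len,
                     struct player *players, size_t nplayers)
{
    for (size_t i = 0; i + 3 <= len; i += 3) {
        uint8_t id = buf[i];
        if (id < nplayers) {      /* ignore out-of-range ids */
            players[id].x = buf[i + 1];
            players[id].y = buf[i + 2];
            players[id].seen = 1;
        }
    }
}
```

With positional records, dropping player 1's entry means player 2's coordinates get unpacked as player 1's; with id-tagged records, the gap just means one player's state goes stale for a frame.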

I sat back and looked at the screen.

Combined conditions — 100ms plus 10% loss. Half my movements failed silently. Treasures were permanently desynced, just sitting there unable to be collected while the server had already removed them. Players blinking across the map. Score frozen.

"I realize how poorly architected my program is for these scenarios."

That thought arrived quietly. Not panic, not embarrassment. Just clarity.

The Architecture Was the Problem

I wasn't just dealing with bugs. I was dealing with assumptions.

The entire architecture assumed a perfect network. Packets always arrive. In order. Immediately. Every player movement is confirmed before the next frame. The server is the source of truth and nothing is ever lost.

That's not how networks work. Even a good connection has latency. Even a reliable connection occasionally drops a packet. 50ms and 1-5% packet loss isn't a bad network — it's an average one.

The game I'd built worked fine in my terminal. Client and server running on the same machine, connected through localhost. Zero latency. Zero packet loss. I'd been testing against a hypothetical network that doesn't exist anywhere real players would actually connect from.

I could trace five specific architectural failures.

- No client prediction: every movement waits for server confirmation before rendering, so one round-trip of lag is baked into every single keypress.
- No reliability for critical messages: connection packets, treasure confirmations, and player joins can disappear with no retry logic and no acknowledgment.
- No state reconciliation: when client and server diverge due to dropped packets, there's nothing to pull them back together; they just drift.
- No interpolation: other players either have data or they don't, and no data means teleporting.
- And underlying all of it: the system was designed for ideal conditions that don't exist anywhere outside of localhost.

Building something that works locally is easy. Building something that survives a hostile network is a different problem entirely.

I hadn't understood that distinction by reading about it. I understood it by pressing 'w' and watching nothing happen.

Slower Work

Wednesday was documentation.

No new features. Just writing down what the proxy revealed. Why the address embedding approach worked. What each failure mode meant at the architecture level. What the specific failures were under each test condition.

I also added color output to the proxy stats, reorganized the folder structure — proxy_v1 for routing only, proxy_v2 for delay and loss — and prepared notes for the blog post.

There's a version of me that would skip this day. The problem was obvious, the solutions were obvious, just start building. But the problem wasn't quite as obvious as I thought. The difference between "network is unreliable" as an abstract fact and "here are the specific things that break in my specific codebase when latency hits 100ms" is the difference between understanding something and knowing what to actually fix.

Writing forces you to be precise. You can't write a clear explanation of something you half-understand.

Thursday I spent reading. Glenn Fiedler's networking articles. Gabriel Gambetta's client-server game architecture series. Valve's Source engine documentation.

I'd read Gambetta before, back in Week 5. The concepts were interesting but vague. Client prediction. Server reconciliation. Lag compensation. I'd read the words without fully understanding the problems they were solving.

Reading them Thursday was different. I had a concrete failure in front of me. Every technique Gambetta described mapped to something specific I'd watched break on Tuesday.

The key insight: not everything needs to be reliable.

Position updates are fine as fire-and-forget UDP. If one drops, the next one arrives 33ms later with more accurate data. Retransmitting old positions would add latency for no benefit. But connection handshakes, treasure collection confirmations, player joins — these can't be lost. One dropped connection packet and the server never knows a client exists. That's a permanent failure with no recovery.

Selective reliability. Not everything needs an ACK. But some things absolutely must have one.

That distinction made the Week 8 architecture cleaner than I expected. Not rebuilding everything. Just adding a reliable delivery layer to a specific subset of messages and leaving position updates alone.
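A reliable layer for just those messages doesn't need much state. Here is one possible shape, a pending-ACK table with a retransmit timeout; all names, the fixed table size, and the timeout scheme are assumptions, not the project's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_PENDING 64

/* One entry per critical message awaiting an ACK. Position updates
   never enter this table -- they stay fire-and-forget UDP. */
struct pending {
    uint32_t seq;       /* sequence number of the critical message */
    long     sent_ms;   /* when it was last (re)sent */
    bool     in_use;
};

static struct pending table[MAX_PENDING];

/* Remember a critical send so it can be retried until acknowledged. */
void record_send(uint32_t seq, long now_ms)
{
    for (int i = 0; i < MAX_PENDING; i++)
        if (!table[i].in_use) {
            table[i] = (struct pending){ seq, now_ms, true };
            return;
        }
}

/* The peer acknowledged seq: stop retrying it. */
void on_ack(uint32_t seq)
{
    for (int i = 0; i < MAX_PENDING; i++)
        if (table[i].in_use && table[i].seq == seq)
            table[i].in_use = false;
}

/* Called every tick. Returns true -- and refreshes the timestamp -- if
   seq is still unacked and its retransmit timeout has expired, meaning
   the caller should send the message again. */
bool due_for_resend(uint32_t seq, long now_ms, long rto_ms)
{
    for (int i = 0; i < MAX_PENDING; i++)
        if (table[i].in_use && table[i].seq == seq &&
            now_ms - table[i].sent_ms >= rto_ms) {
            table[i].sent_ms = now_ms;
            return true;
        }
    return false;
}
```

The split matches the insight above: movement packets bypass the table entirely, while a dropped connection handshake or treasure confirmation keeps getting resent until an ACK clears it.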

Early Start

Friday I started Week 8 before the week officially started.

Documentation was done. Research was done. I had a clear plan and a productive session ahead of me.

Added a YOUR_ID_IS message type. Server now assigns each client a stable numeric ID on connection and sends it to them. Implemented send_user_id() to handle the assignment. Clients store the ID and use it going forward.

Started the foundation for client-side prediction: the client now ignores position broadcasts for its own player. Instead of waiting for the server to confirm movement and then rendering it, the client renders its own position immediately from its local state. Server corrections will come later to handle divergence. But the input lag is gone.
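Sketching that prediction half: apply input to local state immediately, and skip server broadcasts about your own player. YOUR_ID_IS and the 100x100 map come from the post; every name and detail below is otherwise assumed.

```c
#include <stdbool.h>
#include <stdint.h>

struct pos { int x, y; };

/* Apply a movement key to local state immediately -- no waiting for the
   server round-trip. Bounds assume the 100x100 map with 'w' moving up
   (decreasing y). */
void predict_move(struct pos *p, char key)
{
    switch (key) {
    case 'w': if (p->y > 0)  p->y--; break;
    case 's': if (p->y < 99) p->y++; break;
    case 'a': if (p->x > 0)  p->x--; break;
    case 'd': if (p->x < 99) p->x++; break;
    }
}

/* Position-broadcast handler: ignore our own entry, since the locally
   predicted position was already rendered. my_id is the stable id from
   the YOUR_ID_IS message; caller guarantees sender_id is in range.
   Returns true if the update was applied. */
bool apply_broadcast(uint8_t sender_id, uint8_t my_id,
                     struct pos *players, struct pos update)
{
    if (sender_id == my_id)
        return false;          /* server echo of our own movement */
    players[sender_id] = update;
    return true;
}
```

Server corrections for divergence come later, as the post says; until then the local state simply wins for your own player.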

Each piece directly addressed something I'd watched fail on Tuesday.

What Week 7 Proved

The proxy worked exactly as designed.

Five seconds to integrate. Transparent routing. The address embedding approach meant the game code needed essentially no changes. That was the best outcome I could have hoped for on Monday. The design decision paid off.

What the proxy exposed was harder to sit with.

Watching the game break wasn't discouraging the way I expected it to be. It was clarifying. All the abstract problems I'd read about — client prediction, server reconciliation, reliable delivery — became concrete. They weren't best practices for some hypothetical scale. They were solutions to failures I'd just directly experienced.

I've been building something that works locally for six weeks. This week was the moment I realized "works locally" and "works on a real network" are different categories. The proxy built the bridge between them.

The game wasn't broken because of bad code. It was broken because of what I'd assumed when writing it.

You don't have to plan for failure from day one. But you have to be honest when failure shows up and tells you what it needs.

The proxy is now the testing framework for everything Week 8 builds. Every improvement I make — client prediction, selective reliability, server reconciliation — can be validated by running it through ./proxy 100 10 and seeing if things still break the same way.

Thirty-five days. Zero missed.

Week 8 is about building for the network that actually exists, not the one I was testing against.

Same table. 2pm. Day 36.