Fallacies
Fallacies of distributed computing
List
The network is reliable.
assume that network calls always succeed.
networks can be unreliable — packets get dropped, connections time out, routers go down.
System my crash or hang
Best Practice
Use timeouts, retries, circuit breakers (e.g. Resilience4j).
Latency is zero.
assume messages travel instantly.
even on fast networks, latency can vary significantly
e.g. network hops, cloud VPC
If you assume zero latency, you might chain too many remote calls synchronously, leading to bad performance.
Best Practice
Design APIs with latency in mind
Use asynchronous messaging (e.g. Kafka) where appropriate.
Bandwidth is infinite.
We assume the network can handle as much data as we send.
In reality, bandwidth is limited and can saturate quickly.
A service tries to send a 1GB file to another service via REST — network congestion slows everything else down.
A burst of large responses might overwhelm the load balancer or the database replication stream.
Best Practice
Use streaming APIs (e.g. application/ndjson).
Limit payload sizes.
Compress data if needed.
The network is secure.
assume the network is private and can’t be attacked.
In reality, networks can be sniffed, man-in-the-middled, or spoofed.
Always use TLS (HTTPS).
Authenticate and authorize service calls (e.g. mutual TLS or OAuth2).
Topology doesn't change.
We assume the network structure (hosts, IPs, routing) is stable.
In reality, servers go down, IPs change (especially in the cloud), or services are moved.
Use service discovery (e.g. Eureka, Consul) instead of hard-coded IPs.
Expect dynamic scaling and design for it.
There is one administrator.
assume a single entity manages the entire system.
In reality, different teams or organizations might manage different parts.
Your team manages a microservice but relies on a shared Kafka cluster managed by another team — they might change configurations or restart the broker unexpectedly.
Define clear contracts and SLAs.
Expect partial outages of dependencies.
Transport cost is zero.
assume that sending data over the network is free.
In reality, data transfer can cost money (especially in the cloud) and CPU.
AWS charges for data transfer between availability zones or regions. Sending lots of logs or files can generate high bills.
Large payloads also cost CPU cycles on both ends.
Optimize data transfer.
Use caching and batching where appropriate.
The network is homogeneous.
assume all parts of the network are the same.
In reality, they vary: different OS, software versions, hardware, languages, protocols.
A Spring Boot service on Linux calls a .NET service on Windows — different serialization, case sensitivity, or character encodings might cause bugs.
Kafka clients might behave differently between Java and Python.
Use standard protocols (e.g. JSON, gRPC).
Test across multiple environments.
Links
https://ably.com/blog/8-fallacies-of-distributed-computing
Last updated
Was this helpful?