Back-of-envelope estimation
The aim is to get a rough estimate of the number of users of a system and the load they generate, to decide how to go about designing it
Terms
DAU = Daily active users
Normally focused on averages
Users perform activities that are either reads or writes; these need to be considered separately in the design
Key numbers
| OPERATION | NOTE | LATENCY | SCALED LATENCY |
|---|---|---|---|
| L1 cache reference | Level-1 cache, usually built onto the microprocessor chip itself. | 0.5 ns | 1 s (the scale treats one L1 cache reference as 1 second) |
| Branch mispredict | During the execution of a program, the CPU predicts the next set of instructions. A branch misprediction is when it predicts wrongly, so the speculative work has to be discarded and the correct instructions fetched and executed instead. | 5 ns | 10 s |
| L2 cache reference | Level-2 cache is memory built on a separate chip. | 7 ns | 14 s |
| Mutex lock/unlock | Simple synchronization method used to ensure exclusive access to resources shared between many threads. | 25 ns | 50 s |
| Main memory reference | Time to reference main memory, i.e. RAM. | 100 ns | 3 m 20 s |
| Compress 1K bytes with Snappy | Snappy is a fast data compression and decompression library written in C++ by Google and used in many Google projects like BigTable, MapReduce and other open source projects. | 3,000 ns | 1 h 40 m |
| Send 1K bytes over 1 Gbps network | | 10,000 ns | 5 h 33 m 20 s |
| Read 1 MB sequentially from memory | Read from RAM. | 250,000 ns | 5 d 18 h 53 m 20 s |
| Round trip within same datacenter | We can assume a DNS lookup will be much faster within a data center than going over an external router. | 500,000 ns | 11 d 13 h 46 m 40 s |
| Read 1 MB sequentially from SSD disk | Assumes SSD disk. SSD boasts random data access times of 100,000 ns or less. | 1,000,000 ns | 23 d 3 h 33 m 20 s |
| Disk seek | Disk seek is moving the head to the sector on the disk where the required data lives. | 10,000,000 ns | 231 d 11 h 33 m 20 s |
| Read 1 MB sequentially from disk | Assumes a regular spinning disk, not SSD. Note the difference compared to SSD! | 20,000,000 ns | ~1.2 years |
| Send packet CA -> Netherlands -> CA | Round trip for packet data from the USA to Europe and back. | 150,000,000 ns | ~9.5 years |
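The scaled-latency column just multiplies each latency by the factor that stretches an L1 cache reference (0.5 ns) into 1 second, i.e. 2 × 10^9. A minimal sketch of that conversion in Python, using a few numbers from the table above:

```python
# Factor that stretches an L1 cache reference (0.5 ns) into 1 second.
SCALE = 1 / 0.5e-9  # = 2e9

latencies_ns = {
    "Main memory reference": 100,
    "Read 1 MB sequentially from SSD": 1_000_000,
    "Send packet CA->Netherlands->CA": 150_000_000,
}

for op, ns in latencies_ns.items():
    scaled_s = ns * 1e-9 * SCALE
    print(f"{op}: {scaled_s:,.0f} s (~{scaled_s / 86_400:,.1f} days)")
# Main memory reference: 200 s -> 3 m 20 s
# Read 1 MB sequentially from SSD: 2,000,000 s -> ~23.1 days
# Send packet CA->Netherlands->CA: 300,000,000 s -> ~3,472 days, i.e. ~9.5 years
```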
Picture representation
Imagine working on an assignment for college
Your brain is the CPU; anything already in your head (the L1 cache) is super fast
Going to a book on your desk is like going to RAM
Going to a book on a bookshelf is like going to disk
Going to the library to get a book and making the return trip to your desk at home is like making a network call
Common numbers
data conversions
1 byte = 8 bit
1024 bytes = 1 kilobyte
1024 kilobytes = 1 megabyte
1024 megabytes = 1 gigabyte
1024 gigabytes = 1 terabyte
1 char = 1 byte
1 integer (32-bit) = 4 bytes
unix timestamp = 4 bytes
time
3600 seconds per hour
86,400 seconds per day
~2.5 million seconds per month (30 × 86,400 ≈ 2.6 million)
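These conversions and time constants come up in every estimate below; a minimal sketch capturing them as Python constants (the names are my own):

```python
# Data sizes (binary units, as in the conversions above).
KB = 1024
MB = 1024 * KB
GB = 1024 * MB
TB = 1024 * GB

# Rough per-field sizes.
BYTES_PER_CHAR = 1
BYTES_PER_INT = 4            # 32-bit integer
BYTES_PER_UNIX_TIMESTAMP = 4

# Time.
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR   # 86,400
SECONDS_PER_MONTH = 30 * SECONDS_PER_DAY  # ~2.6 million
```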
Traffic estimate example
Info to find out
Average number of users of the system (e.g. DAU)
Average number of actions (reads/writes) per user
Time frame over which these actions happen
Average size of the data read or written
The costs of memory/CDN, bandwidth and storage
Think about approximate average revenue per user to determine profitability
10 million DAU × 30 reads each = 300 million read requests per day
10 million DAU × 1 write each = 10 million write requests per day
300 million reads per day = 300,000,000 / 86,400 ≈ 3,472 read requests per second
10 million writes per day = 10,000,000 / 86,400 ≈ 115 write requests per second
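A minimal sketch of the requests-per-second arithmetic above; the 10 million DAU, 30 reads and 1 write per user per day are the example's assumptions:

```python
SECONDS_PER_DAY = 86_400

dau = 10_000_000
reads_per_user_per_day = 30
writes_per_user_per_day = 1

read_requests_per_day = dau * reads_per_user_per_day    # 300 million
write_requests_per_day = dau * writes_per_user_per_day  # 10 million

print(read_requests_per_day / SECONDS_PER_DAY)   # ~3,472 reads/s
print(write_requests_per_day / SECONDS_PER_DAY)  # ~115 writes/s
```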
Memory used
300 million reads × ~500 bytes each = 150 GB read per day (this sizes the CDN)
The amount of distinct data read will be smaller (say 20% of the total) since the same data is read repeatedly; this hot data will probably be cached
150 GB × 0.2 = 30 GB to store in cache
Allow for replication, i.e. copies of the data in a distributed cache (this affects cost):
30 GB × 3 replicas = 90 GB
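A minimal sketch of the cache sizing above, assuming ~500 bytes per read, 20% of the data being hot, and 3 replicas as in the example:

```python
read_requests_per_day = 300_000_000
avg_object_size_bytes = 500   # rough per-read payload
hot_data_fraction = 0.2       # same items are read repeatedly
replicas = 3                  # copies in the distributed cache

data_read_per_day_gb = read_requests_per_day * avg_object_size_bytes / 1e9
cache_size_gb = data_read_per_day_gb * hot_data_fraction
total_with_replicas_gb = cache_size_gb * replicas

print(data_read_per_day_gb)    # 150 GB read per day
print(cache_size_gb)           # 30 GB of hot data
print(total_with_replicas_gb)  # 90 GB across 3 replicas
```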
Bandwidth
300 million requests × 1.5 MB payload = 450,000 GB (450 TB) per day
Average data transfer per second = 450,000 GB / 86,400 ≈ 5.2 GB/s
Need to cater for variability, as traffic is never stable (peaks sit well above the average)
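A minimal sketch of the bandwidth figure, using the example's 1.5 MB average payload:

```python
SECONDS_PER_DAY = 86_400

read_requests_per_day = 300_000_000
avg_payload_mb = 1.5

daily_transfer_gb = read_requests_per_day * avg_payload_mb / 1000  # 450,000 GB (~450 TB) per day
avg_bandwidth_gb_per_s = daily_transfer_gb / SECONDS_PER_DAY       # ~5.2 GB/s on average

print(daily_transfer_gb, avg_bandwidth_gb_per_s)
```

Provision for the peaks, not for the 5.2 GB/s average.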
Storage
Much more will be needed for long-term storage, e.g. 10 years; this is an average since users leave (data is kept for x years and then deleted)
10 years of data (365 × 10 days) × 10 million writes per day × 1.5 MB per payload ≈ 55 petabytes
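A minimal sketch of the 10-year storage figure (10 million writes per day at 1.5 MB each, as above):

```python
writes_per_day = 10_000_000
avg_payload_mb = 1.5
days = 365 * 10

total_mb = writes_per_day * avg_payload_mb * days
total_pb = total_mb / 1e9  # MB -> PB (decimal units)

print(total_pb)  # ~55 PB
```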