Wednesday, April 4, 2007

Memory latency

I recently found someone who measured (circa 2006) the latencies of the different levels of the memory hierarchy on a typical PC. Here are the important numbers:
Level                                   Latency
L1 cache (on die)                       1.4 ns
L2 cache (off die)                      9.7 ns
RAM (PC2700 DDR)                        28.5 ns
Hard drive (250 GB Maxtor, 7200 RPM)    25.6 ms

The test machine had an AMD Athlon XP 3000+ processor. Another interesting piece of data is the throughput (sustained data transfer rate): 2541 MB/s for the RAM versus 67 MB/s for the hard disk.
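To put these numbers in perspective, here is a small sketch that expresses each latency as a multiple of the L1 latency and compares the two throughput figures. The values are the ones quoted above; the names LATENCY_NS and relative_to_l1 are just made up for this illustration.

```python
# Latencies quoted in the post, converted to nanoseconds.
LATENCY_NS = {
    "L1 cache": 1.4,
    "L2 cache": 9.7,
    "RAM": 28.5,
    "Hard drive": 25.6e6,  # 25.6 ms expressed in ns
}

# Sustained throughput quoted in the post, in MB/s.
RAM_MB_PER_S = 2541
DISK_MB_PER_S = 67


def relative_to_l1(latencies):
    """Return each latency as a multiple of the L1 cache latency."""
    base = latencies["L1 cache"]
    return {level: ns / base for level, ns in latencies.items()}


for level, ratio in relative_to_l1(LATENCY_NS).items():
    print(f"{level:<12} {ratio:>14,.1f}x the L1 latency")

print(f"RAM throughput is {RAM_MB_PER_S / DISK_MB_PER_S:.1f}x the disk throughput")
```

With these figures, RAM is roughly 20 times slower than L1, while the hard drive is on the order of 18 million times slower. The throughput gap is much smaller, about 38 to 1.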

Note that on Intel x86 and AMD processors, besides the cache hierarchy, you also have the translation lookaside buffer (TLB), a small cache that keeps the most recent virtual-to-physical address translations. This is quite important if your code runs on an OS that uses virtual memory, such as Windows, Linux, Solaris or Mac OS X. When there is a TLB miss, the CPU has to walk the page tables (living in RAM) to find the appropriate entry, which results in about 3 to 4 RAM reads. In terms of size, the 1995 Pentium had one code TLB and two data TLBs: one data TLB had 64 entries (for 4 KB pages) and the other 8 entries (for 4 MB pages). The instruction TLB had 32 entries.
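A rough back-of-the-envelope sketch of what those TLB sizes mean: how much memory each data TLB can cover without a miss, and what a miss might cost. Note the big assumption here: it combines the Pentium TLB sizes with the 28.5 ns RAM latency measured on the Athlon XP machine, so the nanosecond figures are only illustrative. The helper names tlb_reach and miss_cost_ns are invented for this example.

```python
KB = 1024
MB = 1024 * KB

# Assumption: RAM latency borrowed from the Athlon XP measurements,
# not from a 1995 Pentium system.
RAM_LATENCY_NS = 28.5


def tlb_reach(entries, page_size):
    """Bytes of address space covered when every TLB entry is in use."""
    return entries * page_size


def miss_cost_ns(ram_reads, ram_latency_ns=RAM_LATENCY_NS):
    """Approximate cost of a TLB miss as a number of uncached RAM reads."""
    return ram_reads * ram_latency_ns


small_pages = tlb_reach(64, 4 * KB)  # data TLB for 4 KB pages
large_pages = tlb_reach(8, 4 * MB)   # data TLB for 4 MB pages

print(f"64 entries x 4 KB pages cover {small_pages // KB} KB")
print(f"8 entries x 4 MB pages cover {large_pages // MB} MB")
print(f"TLB miss: about {miss_cost_ns(3):.1f} to {miss_cost_ns(4):.1f} ns")
```

So with 4 KB pages the data TLB covers only 256 KB of memory, and under the assumptions above each miss costs somewhere around 85 to 114 ns, far more than an L1 or L2 hit.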

The biggest TLB-related hit (that is, the biggest cost of virtual-to-physical translation) happens when the OS switches processes. This changes the active page tables, which in turn requires a total (or partial) TLB invalidation.

Credit for the measurements goes to Diomidis Spinellis and his paper "Some Types of Memory are More Equal than Others".