Memory Hierarchy
SRAM Cell¶
6T SRAM Cell (6 transistors)
SRAM is volatile because the cross-coupled NOT gates need power to maintain their state (recall CMOS technology: VDD is needed for these gates to work).
The word line is the read/write enable.
The complementary bit lines are connected to a single sense amplifier. The job of this sense amp is to compute the difference between the voltages on the two lines: if the difference is positive, the sense amp outputs a valid logic 1; otherwise, a valid logic 0.
This is what happens on a read: the word line is set high, and the sense amp detects the voltage difference and outputs the stored bit accordingly.
When writing, the word line is set high and a strong voltage is driven onto the bit line and its complement (to force the inverter loop to flip states if necessary).
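A minimal Python sketch of this read/write protocol (behavioral only, not an electrical simulation; the class name and the 1.0/0.0 "voltages" are made up for illustration):

```python
# Behavioral toy model of a 6T SRAM cell (not an electrical simulation).
class SRAMCell:
    def __init__(self):
        self.q = 0  # bit held by the cross-coupled inverter loop

    def write(self, word_line, bit):
        # Write: word line high + strong voltage driven on BL and its
        # complement forces the inverter loop to flip if necessary.
        if word_line:
            self.q = bit

    def read(self, word_line):
        # Read: word line high exposes the state on the bit lines; the
        # sense amp outputs 1 if V(BL) - V(BLB) > 0, else 0.
        if not word_line:
            return None  # cell not selected
        v_bl, v_blb = (1.0, 0.0) if self.q else (0.0, 1.0)
        return 1 if (v_bl - v_blb) > 0 else 0

cell = SRAMCell()
cell.write(word_line=True, bit=1)
assert cell.read(word_line=True) == 1
```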
DRAM¶
2 components: an access transistor and a capacitor (the bit is stored as charge on the capacitor)
DRAM is volatile because it needs power and periodic refreshing to maintain its state.
To read, we set the word line high and sense the voltage of the capacitor on the bit line.
To write, we set the word line high and supply a strong 1 or 0 to charge or discharge the capacitor.
DRAM loses charge (and hence data) over time.
DRAM cells need to be refreshed periodically (the refresh rate!) to keep the data intact.
DRAM is significantly slower than SRAM, but cheaper.
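A toy behavioral model may help tie this together; the leak rate, sense threshold, and refresh interval below are made-up illustrative values:

```python
# Toy model of a 1T1C DRAM cell: the capacitor leaks charge over time,
# so the value must be refreshed (read and rewritten) periodically.
class DRAMCell:
    LEAK_PER_TICK = 0.1   # fraction of charge lost per time step (made up)
    THRESHOLD = 0.5       # sense threshold between logic 0 and logic 1

    def __init__(self):
        self.charge = 0.0

    def write(self, bit):
        # Word line high + strong 1/0 charges or discharges the capacitor.
        self.charge = 1.0 if bit else 0.0

    def tick(self):
        # Charge leaks away over time -> data is lost without refresh.
        self.charge *= (1.0 - self.LEAK_PER_TICK)

    def read(self):
        return 1 if self.charge > self.THRESHOLD else 0

    def refresh(self):
        # A refresh is just a read followed by a full-strength rewrite.
        self.write(self.read())

cell = DRAMCell()
cell.write(1)
for t in range(10):
    cell.tick()
    if t % 4 == 3:        # refresh often enough and the data survives
        cell.refresh()
assert cell.read() == 1
```

Without the refresh calls, the charge would decay to \(0.9^{10} \approx 0.35\), below the threshold, and the stored 1 would read back as 0.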
A D latch uses a mux (4 NAND gates, 16T).
A DFF uses 2 muxes (32T) plus 2T for an inverter.
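A quick sanity check of this transistor arithmetic (assuming 4 transistors per CMOS NAND gate):

```python
# Transistor counts for the mux-based D latch and DFF described above.
T_PER_NAND = 4                # assumption: 4T per CMOS NAND gate

d_latch = 4 * T_PER_NAND      # mux built from 4 NAND gates -> 16T
dff = 2 * d_latch + 2         # 2 mux-based latches + 2T inverter -> 34T

print(d_latch, dff)           # 16 34
```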
Hard Disk¶
Memory Addressing¶
Memory Hierarchy¶
Locality of Reference¶
It is possible to give the user the illusion of running at SRAM speed at all times, thanks to locality of reference.
The principle of locality states that a reference to memory location \(X\) at time \(t\) implies that a reference to \(X+\Delta X\) at time \(t + \Delta t\) becomes more probable as \(\Delta X\) and \(\Delta t\) approach zero.
In layman's terms: a CPU tends to access the same set of memory locations repeatedly over a short period of time.
Evidence that memory reference patterns exhibit locality of reference:
- Local stack frames grow near one another
- Related program instructions are near one another
- Data (e.g., arrays) are also near one another (part of why Python lists are slow: they store pointers to objects scattered around memory, which hurts locality; see the sketch below)
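A rough way to see spatial locality from Python (timings are machine- and interpreter-dependent, and interpreter overhead blunts the effect, but a sequential scan of a contiguous buffer typically beats a random-order scan):

```python
# Sequential vs random-order scans of the same contiguous buffer.
# Sequential access keeps hitting the cache; random access does not.
import array
import random
import time

N = 1 << 22
buf = array.array('i', range(N))   # contiguous ints, unlike a Python list
seq = list(range(N))
rnd = seq[:]
random.shuffle(rnd)

def scan(order):
    t0 = time.perf_counter()
    s = 0
    for i in order:
        s += buf[i]
    return time.perf_counter() - t0

print(f"sequential: {scan(seq):.3f}s  random: {scan(rnd):.3f}s")
```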
Average LD/ST Times¶
\(t_{ave} = \alpha t_c + (1-\alpha)(t_c + t_m) = t_c + (1-\alpha)t_m\). The cache access time \(t_c\) is always paid; a fraction \(1-\alpha\) of the time (a miss), we also pay the memory access time \(t_m\).
\(\alpha\) - cache hit ratio; \(t_c\) - cache access time; \(t_m\) - main-memory access time (the extra penalty on a miss)
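Plugging in illustrative numbers (the latencies and hit ratio below are made up) shows how a high hit ratio yields near-SRAM average latency:

```python
# Worked example of t_ave = t_c + (1 - alpha) * t_m with made-up values.
t_c = 1.0       # cache access time (ns), always paid
t_m = 100.0     # extra main-memory penalty on a miss (ns)
alpha = 0.95    # cache hit ratio

t_ave = t_c + (1 - alpha) * t_m
print(f"{t_ave:.1f} ns")   # 6.0 ns: close to SRAM speed despite slow DRAM
```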
Most modern CPUs have at least three independent caches: an instruction cache to speed up executable instruction fetch, a data cache to speed up data fetch and store, and a Translation Lookaside Buffer (TLB) used to speed up virtual-to-physical address translation for both (executable) instructions and data. The data cache is usually organized as a hierarchy of cache levels (L1, L2, L3, L4, etc.).
FA (Fully Associative) Cache¶
- Expensive
- Needs 64 bits (32 bits address, 32 bits data) to store each entry
- Bitwise compare at each cache line
- Tri-state buffer at each row
- Large `OR` gate to compute `HIT`
- Very fast because of parallel lookup when given an address
- Flexible because any memory address can go to any cache line (see the sketch below)
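A simplified sketch of a fully associative lookup (hypothetical class; the hardware's parallel tag compare is modeled here with a sequential loop):

```python
# Fully associative cache: the full address is stored as the tag, every
# line is compared "in parallel" (hardware ORs the match signals into HIT,
# and tri-state buffers put the matching line's data on the bus).
class FACache:
    def __init__(self, n_lines):
        self.lines = [None] * n_lines    # each line: (tag, data) or None

    def lookup(self, addr):
        for line in self.lines:          # models the parallel tag compare
            if line is not None and line[0] == addr:
                return True, line[1]     # HIT
        return False, None               # MISS

    def fill(self, addr, data, victim):
        # Any address can go into any line (flexible placement).
        self.lines[victim] = (addr, data)

cache = FACache(n_lines=4)
cache.fill(0xDEADBEEF, data=42, victim=0)
print(cache.lookup(0xDEADBEEF))   # (True, 42)
print(cache.lookup(0x00000000))   # (False, None)
```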
DM (Direct Mapped) Cache¶
- Cheaper:
    - Less SRAM is used, as the `TAG` only contains the `T` upper bits of the address `A`
    - Only one bitwise compare is needed for `2^(32-T)` addresses
    - A demux and `32-T` selector bits are needed
- Not as flexible as the FA cache, because `2^(32-T)` addresses are stored consecutively
- Contention problem: 2 different addresses with the same lower `32-T` bits map to the same cache line (see the sketch below)
- Slower, since there is no parallel searching
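A simplified sketch of a direct-mapped cache that also demonstrates the contention problem (hypothetical class; `T = 24` chosen arbitrarily):

```python
# Direct-mapped cache: the lower 32-T bits of the address select the line,
# only the T upper bits are stored as the tag, and one compare decides HIT.
T = 24                                 # tag bits -> 32 - T = 8 index bits
N_LINES = 1 << (32 - T)                # 256 lines

class DMCache:
    def __init__(self):
        self.lines = [None] * N_LINES  # each line: (tag, data) or None

    def split(self, addr):
        index = addr & (N_LINES - 1)   # lower 32-T bits pick the line
        tag = addr >> (32 - T)         # upper T bits are stored/compared
        return index, tag

    def access(self, addr, data=None):
        index, tag = self.split(addr)
        line = self.lines[index]
        hit = line is not None and line[0] == tag
        if not hit:                    # on a miss, (re)fill the line
            self.lines[index] = (tag, data)
        return hit

cache = DMCache()
a, b = 0x1000_0042, 0x2000_0042        # same lower bits, different tags
cache.access(a, "A")
print(cache.access(b, "B"))            # False: contention, B evicts A
print(cache.access(a, "A"))            # False: they keep evicting each other
```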