bzdww

Get answers and suggestions for various questions from here

Cache/memory Coherence model

cms

1. Multi-Core, Multi-Cache

In this era of CPU cores reaching double digits, L1/L2 Cache is not limited to all core shared designs. In the general SMP structure of the CPU, generally each core will have its own L1 Cache or even L2 Cache, while several cores share an L2 Cache or L3 Cache. Cache and Memory are no longer the one-to-one correspondence, and multiple Caches will interact with the same Memory.

In this case, to ensure the consistency of the data taken by the core, the easiest way is to:


All read and write operations occur directly on the memory.

Naturally, this has a huge impact on performance. Cache does not play any role at all, and multiple cores need to preempt the bus for Memory.

Let's start by making improvements to this simplest model.


2. Valid bit

In order to indicate whether the current Cache can be read directly (consistent with the memory data, no modification has occurred), we assign a status bit to each cache line: valid/invalid, and correct the corresponding Read/Write/Evict rule:

Read Rules :

  1. if NOT hit cache(Miss or in invalid state), read from memory and refill with valid state.
  2. if hit and cache line state is valid, read from cache directly.

Write Rules :

  1. send a write notify signal to all other cache
  2. if other cache contains such cache line with valid state, change state to invalid
  3. flush back to memory and
    • if hit and in valid state, could directly modify the cache line, keep state valid
    • if not hit or is in invalid state, refill cache line with state valid

Evict Rules :

  1. just remove the cache line no matter it is in valid or invalid state

Here you can think of the concept of Read/Write Lock to ensure the atomicity of the operation, so the interaction between multiple caches is inevitable:


  • write will block other write and read
  • read only block other read

The valid bit can improve the efficiency of the CPU to read data. If the cache is valid, it can be read directly from the cache. However, for write operations, it still directly triggers the interaction between Memory and cache, and the performance is not improved.


3. MSI (Modified-Shared-Invalid)

It can be found that if some core caches separately store a certain cache line, then when these cores modify the data in the cache line, they can directly write to the cache, and there is no need to write the Memory. However, when this cache line is evicted, you need to flush the data into Memory.

Based on this, you can add a new state to the two-dimensional state of the valid bit, called the MSI protocol , including three states (copy from wikipedia):

MSI States :

  • Modified (M) : The block has been modified in the cache. The data in the cache is then inconsistent with the backing store (e.g. memory). A cache with a block in the “M” state has the responsibility to write the block to the backing store when it is evicted.
  • Shared (S) : This block is unmodified and exists a read-only state in at least one cache. The cache can evict the data without writing it to the backing store.
  • Invalid (I) : This block is not present in the current cache, and must be fetched from memory or another cache if the block is to be stored in this cache.

Modification is exactly what we mentioned at the beginning of the section. The cache line is in an exclusive state. For the modification of the M state cache line, memory storage will not be triggered (as long as no evict occurs), and there is no need to interact with any other memory. S is similar to the valid state in the valid bit, and I is the invalid state.

The corresponding read/write/evict rules also need to be revised accordingly:

Read Rules :

  1. if NOT hit cache(Miss or in I state), ask other cache if there is M state cache line
    • if M state cache exists, ask it to flush back, then both refill with S state, current cache may get the value from M state cache directly or from memory.
    • if not, refill from memory with S state
  2. if hit cache and is in M or S state, read the value from cache directly

Write Rules :

  1. if NOT hit cache(Miss or in I state), ask other cache if there is M or S state cache line
    • if M state cache exists, then the cache line try to write must merge with the M state cache line, since it is unable to know if other block in the M cache line is modified. Current cache may get the value from the M state cache or from memory after the M state cache flush back. When getting the value, current cache merge it into and fill the cache with M state. The M state cache mark the cache line with I state.
    • if S state cache exists, aks those S state cache to modify state with I state, and fill cache with M state.
    • if NO M or S state exist, just fill the cache with M state
  2. if hit and is in M state, directly write the cache.
  3. if hit and is in S state, ask other S state cache to modify state with I state and fill cache with M state.

Evict Rules :

  1. if cache line is in M state, write back the cache line into memory
  2. if cache line is in S or I state, just remove the cache line

It is worth noting that in order to reduce the memory bus pressure, the operation of flush/refill can be reduced as much as possible. Therefore, in the above rules, we are biased to put the cache line in M ​​state (cache for write operations) or I state (for The cache line becomes an invalid cache) instead of flushing or refilling it and setting it to S state.

In addition, the M state cache line will only exist in at most one cache, and for a cache line, it is impossible to have both M and S states in several caches. The I state cache line can coexist with any other state cache. Specifically, the coexistence of state is as follows:



MSI reduces the number of flush operations and the data corresponding to the cache through M state. It does not need to ask other caches whether they have corresponding cache lines. On the other hand, it also reduces the number of refill operations (of course, the valid bits can also pass. The refill method achieves the same purpose).

We can notice that if a cache line of a cache is in S state, then there are two cases:

  • This cache monopolizes this cache line. In this case, the modification of this cache line can be done directly without notifying other caches.
  • This cache shares this cache line with other caches. In this case, the modification of this cache line must notify other caches.

Based on this, the MSI protocol can be further evolved.


4. MESI (Modified-Exclusive-Shared-Invalid)

For the exclusive S state mentioned above, the MESI protocol introduces a new state E, and the number of states is thus also changed to four, where I state remains unchanged, M state is adjusted, and S state is split (copy From wikipedia):

  • Modified (M): The cache line is present only in the current cache, and is dirty; it has been modified from the value in main memory. The cache is required to write the data back to main memory at some time in the future, before permitting any other read of the (no longer valid) main memory state. The write-back changes the line to the Exclusive state.
  • Exclusive (E): The cache line is present only in the current cache, but is clean; it matches main memory. It may be changed to the Shared state at any time, in response to a read request. Alternatively, it may be changed to the Modified state when writing to it.
  • Shared (S): Indicates that this cache line may be stored in other caches of the machine and is clean; it matches the main memory. The line may be discarded (changed to the Invalid state) at any time.
  • Invalid (I): Indicates that this cache line is invalid (unused).

The mutual conversion of M and E depends on whether the value write and flush back operations. The value write operation will cause the cache line to change from E state to M state; the flush back operation will cause the cache line to change from E state to M state. Of course, the value write on M state will hold Mstate; the flush back on E state will remain E state (however, this should not happen.)

The read/write/evict rules for MESI are as follows:

Read Rules :

  1. if “NOT” hit(Miss or in I state), ask other cache if there is cache line with M, E or S state
    • if M state cache exists, ask the M state cache to flush back, then both refill with S state, current cache may get the value from M state cache directly or from memory.
    • if E or S state cache exists, both refill with S state, current cache may get the value from E or S state cache directly or from memory.
    • if neither other M, E, S state cache exists, refill with state E
  2. if is in M, S or E states, directly read the value from cache.

Write Rules :

  1. if “NOT” hit(Miss or in I state), ask other cache if there is cache line with M, E or S state
    • if M state cache exists, ask the M state cache to flush back, current cache get the value from M state directly based on bus between caches. Then M state cache change the cache line state to I while current cache merge the cache line value from M and mark it with M state.
    • if E or S state cache exists, ask them to change into I state, then mark the cache line as M state
    • if none of above state exists, just mark the cache line as M state.
  2. if hit and is in M state, write value and just leave it as M state.
  3. if hit and is in E state, write value and change it to M state.
  4. if hit and is in S state, ask other S state cache to be I state, write value and change to M state.

Evict Rules :

  1. if cache line is in M state, write back the cache line into memory
  2. if cache line is in E, S or I state, just remove the cache line

The following figure is a flow chart of the above text description rules, which is clearer than the text (refer to the Wikipedia entry):


Although only one state has been added, it can be seen that MESI's Read/Write/Evict Rules has become very complicated with respect to MSI. The main purpose of this design is to minimize the Cache memory storage operation, and therefore requires a large number of cached message delivery broadcasts and the value of the transmission and reception (such as determining how many copies of the cache line, and the direct value of the transmission between the cache) This also drives mainstream vendors to gradually introduce complex cache coherence techniques to speed up cache interactions (even unified management of Memory interactions).

Take Intel and ARM as examples:

The above figure is a block diagram of the ARM A57/A53 big-LITTEL component interconnect. It can be seen that the design of the cache/memory part of modern multi-core CPUs has become more and more complicated in order to meet the requirements of cache coherence between multiple cores.

Finally, similar to MSI, for a cache line, some states are unlikely to coexist, such as M and E, that is, it is impossible to have a cache line containing M state in one cache, and another cache. It contains the same cache line, but its state is E. The specific state coexistence is as follows:

It can be considered that the coexistence relationship of MESI is to expand the MSI diagram one step further.

5. Further joining status

As you can see, following the MSI -> MESI step, you can further subdivide some states, or adjust some states, then you can get a more complicated (but not necessarily more efficient, cache interaction or A protocol that requires cost or a change, such as:

  • MOESI Prototcol : Subdivide S state, adding an Owner(O) state to indicate that a cache line is multi-cache share (S state), but only one cache has the right to modify (O state). All O-initiated modifications will then flush back (or not to be Mstate) and notify those S state caches to get the new value.
  • MOSI Prototcol : Replace E state with Owner(O) state. O state is similar to MO's O, and it is not exhaustive.
  • MESIF Prototcol & MERSI Prototcol : Added Foward(F) and Recent(R) states respectively, see wikipedia.

It is worth mentioning that different levels of Cache may use different protocols, such as ARM chip L1 using MOESI Protocol, and L2 using MESI Protocol. :


The SCU uses hybrid Modified Exclusive Shared Invalid (MESI) and Modified Owned Exclusive Shared Invalid (MOESI) protocols to maintain coherency between the individual L1 data caches and the L2 cache. The L1 data caches support the MESI protocol. The L2 memory system contains a snoop tag array that is a duplicate copy of each of the L1 data cache directories. The snoop tag array reduces the amount of snoop traffic between the L2 memory system and the L1 memory system. Any line that resides in the snoop tag array in the Modified/Exclusive state belongs to the L1 memory system. Any access that hits against a line in this state must be serviced by the L1 memory system and passed to the L2 memory system. If the line is invalid or in the shared state in the snoop tag array, then the L2 cache can supply the data.

In addition, the state does not necessarily lead to high efficiency, and at the same time, the entire cache / memory system design and implementation will be more complex. Adding a state introduces additional state transitions and situations that are exponentially increased, so vendors generally choose the most appropriate (such as the most cost-effective) implementation.

6. The last word

In physics, there is an interesting assumption about the intersection of particle physics (minimum) and the concept of the universe (maximum): the interior of an atom is a universe, and our universe is just the interior of another atom. . This is also the so-called genre of physics.
The serpent (Ouroboros transliteration Ulopolos, Greek : οὐροβόρος, also known as the biting snake) is a symbol that has been passed down from ancient times , roughly as a snake (or dragon ) is swallowing its own tail , resulting in the formation A ring (sometimes also shown in a twisted shape, the shape of the Arabic numeral "8"), whose name is "self-devourer."

For the computer science community, the same breeder also exists. For example, the Coherence described in this article is similar to the Coherence model in the distributed model: one is the kernel world (very small), and the other is a vast network (maximum), which is similar in some respects.

Reference link: