Memory Access Timing and System Speed

Using PowerPC 750 'G3' Upgrades

Newer Technology White Paper

 

OVERVIEW

Memory access timing is critical to overall system performance in G3-upgraded computers. There is a window of correct timing values that must be used to ensure the fastest performance.

Any access to DRAM involves a large number of wait states, so the trick is to use that time efficiently. To do this, one needs to select a bus frequency whose period (1/frequency) divides evenly into the time required to access the memory. If you pick an unfortunate value, for instance one where an access requires 4.1 clock periods, you must allow five clock periods to avoid violating the memory's timing. In this case 0.9 clock periods are wasted, resulting in slower system performance. There are certain "sweet spots" in bus timing that give the best performance; a faster or slower frequency reduces system performance.
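
As a rough illustration of the rounding effect described above, here is a minimal Python sketch (not from the original paper; the helper name, the 60ns access time and the candidate frequencies are illustrative assumptions) that counts the whole clock periods an access consumes and the time wasted in the last one:

    import math

    def clocks_and_waste(bus_mhz, access_ns):
        """Whole clock periods needed for a memory access, and the slack wasted in the last one."""
        period_ns = 1000.0 / bus_mhz                  # bus clock period in ns
        clocks = math.ceil(access_ns / period_ns)     # must round up, never down
        wasted_ns = clocks * period_ns - access_ns    # time the bus sits idle
        return clocks, wasted_ns

    # Example: on a bus whose period fits 4.1 times into a 60ns access (about 68.3 MHz),
    # the access must be stretched to 5 clocks, wasting roughly 0.9 of a period.
    for mhz in (40.0, 45.0, 50.0, 68.3):
        clocks, waste = clocks_and_waste(mhz, 60.0)
        print(f"{mhz:5.1f} MHz: {clocks} clocks, {waste:4.1f} ns wasted")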

This was not true in 604- and 604e-based systems with "look-aside" Level 2 caches. Since these caches run at a fixed ratio to the bus, faster was always better, and DRAM timing was only a secondary effect.

 

DETAILS

Fast system bus speeds may reduce both performance and stability when upgrading a PowerPC 601-, 603- or 604-based Power Mac with a G3 card based on the PowerPC 750 processor. Understanding why requires a basic familiarity with the dual-bus G3 architecture.

 

601 - 604e PROCESSORS AND CACHE

The PowerPC 601, 603 and 604 processors have one bus. The I/O, cache memory and system memory (SIMMs and DIMMs) all share this bus, using the same physical connections for communication with the processor. By far the fastest device on the bus is the L2 cache, a relatively small block of high-speed static memory. The cache controller predicts what data is likely to be needed from the slower DRAM-based system memory and preloads it into the faster cache memory.

A "cache hit" occurs when data requested by the processor is found in the cache. In the case of a cache hit, the data will be accessed at full bus speed, usually between 25ns (40 MHz bus) and 17ns (60 MHz bus). For a single bus system, increasing the bus speed gives better performance.

A "cache miss" occurs if the requested data is not found in cache memory and must be fetched from the slower system memory. In a single bus system (PowerPC 604e and earlier), the fast cache memory along with the slower system memory and other slower I/O devices are connected to a common bus. This mismatch in device speed and bus speed is handled by adding "wait states" when accesses are made to the slower devices. Once a bus cycle has started, wait states are added by holding the state of the bus for one or more additional bus clock cycles.

The likelihood of any particular access being a cache hit or miss will vary depending on the cache load algorithm and on the application itself. A typical system will usually average a 70% to 80% cache hit rate, making fast cache accesses the majority of bus activity.
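
A small back-of-the-envelope calculation shows why the hit rate dominates single-bus performance. The following Python sketch uses illustrative assumptions only, not figures from this paper: a 25ns hit (one 40 MHz bus cycle) and a 100ns miss (the same cycle plus three wait states).

    # Average access time = hit_rate * hit_time + (1 - hit_rate) * miss_time.
    # The 25ns and 100ns figures are illustrative assumptions, not measured values.
    def average_access_ns(hit_rate, hit_ns=25.0, miss_ns=100.0):
        return hit_rate * hit_ns + (1.0 - hit_rate) * miss_ns

    for rate in (0.70, 0.80):
        print(f"{rate:.0%} hit rate -> {average_access_ns(rate):.1f} ns average access")

With these assumptions, average access time falls from 47.5ns at a 70% hit rate to 40ns at 80%, so the bulk of bus activity is indeed governed by the fast cache accesses.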

 

G3 PROCESSORS AND CACHE

G3 processor upgrade cards based on the PowerPC 750 processor have two buses: one for the traditional I/O and system memory, and a second bus dedicated to the L2 cache alone. Because the second bus is dedicated to high-speed memory, it allows much faster L2 cache accesses than are possible in a single-bus system. This high-speed memory is soldered directly to the processor upgrade card, enabling access times of 3ns to 8ns. This eliminates the need for a fast system bus clock, because the L2 cache is no longer on the system bus.

 

MEMORY ACCESS

Typical system DRAM is designed for either 60ns or 70ns access times, and wait states are required to access this memory. A faster system bus clock may actually reduce performance because of the additional wait states it requires. For best performance with a dual-bus system, it is more important to match, or sync, the system bus frequency to the speed of the system memory. Wait states can only be added in increments equal to the period of the system bus clock. There is also a fixed overhead time of about 5ns, due to other system components, which must be accounted for. Optimum performance is achieved by selecting a system bus clock at which some whole number of clock periods, minus the 5ns overhead, comes as close as possible to the access time rating of the installed memory modules without dropping below it; otherwise the memory's timing margins are violated. For 60ns memory, the best bus clock is around 45 MHz with two wait states. Increasing the bus much beyond 46 MHz requires an additional wait state, reducing performance.
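
A minimal Python sketch of that selection rule follows, assuming the ~5ns overhead and 60ns rating quoted above, and assuming an access occupies one initial bus cycle plus its wait states (an interpretation chosen here because it reproduces the 45-46 MHz figures in the text; the helper name and candidate frequencies are illustrative):

    import math

    OVERHEAD_NS = 5.0   # fixed system overhead quoted in the text

    def wait_states_needed(bus_mhz, dram_ns):
        """Smallest wait-state count whose total bus time, less overhead, still covers the DRAM."""
        period_ns = 1000.0 / bus_mhz
        # Total cycles = 1 initial cycle + wait states (assumption, see lead-in).
        total_cycles = math.ceil((dram_ns + OVERHEAD_NS) / period_ns)
        return total_cycles - 1

    for mhz in (40, 45, 46, 47, 50):
        ws = wait_states_needed(mhz, 60.0)
        print(f"{mhz} MHz bus, 60ns DRAM -> {ws} wait states")

Under these assumptions, 45 MHz and 46 MHz both complete a 60ns access with two wait states, while 47 MHz and above need a third, which is why the mid-40 MHz range is the sweet spot for 60ns memory.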

Because wait states can be confusing, the MAXpowr G3 processor upgrade comes with a Control Panel that allows the user to specify the speed of the slowest memory installed. The hardware and software automatically configure the card to use motherboard system memory for best performance. Setting the system bus to something other than the mid-40 MHz factory default compromises the Control Panel's ability to optimize for best performance.

Operating the bus at higher clock rates not only reduces memory performance but may also cause other problems. The PowerPC 750 processor has narrower timing margins than the 601, 603 and 604 processors for which these systems were designed. There is much less timing margin available to push the system bus faster, and as noted above, there is no performance advantage in doing so.

New computers use a new kind of DRAM called Synchronous DRAM (SDRAM). These memory modules are optimized for moving blocks of data at full bus speed, taking full advantage of the faster clock. They do, however, require several wait states for random data access. (MAXpowr G3 upgrade cards are designed for the existing Macintosh and MacOS compatibles, which use either Fast Page Mode (FPM) or Extended Data Out (EDO) memory and therefore require wait states as explained above.)

All upgrade card manufacturers must optimize the bus speed and system memory timing for best performance. Inadvertently pushing the system bus without making the necessary corrections to the system memory timing can over-clock memory, resulting in unreliable operation and/or data corruption.

Not all benchmark tools show a performance increase from faster memory access, because of their high cache hit rates. Real-world applications, however, do benefit from optimized system memory timing.