Dual Core Processors
and Multiple Caches

What is the hype all about?

 

Doug Willoughby

February 20, 2007

 

Agenda

      What is Dual Core and what does it require

      Description of cache memory

      History of cache development

      Some recent developments

      Some Dual Core specs

 

What is Dual Core?

      A processor chip that has two processing units (cores) instead of one

      Built into a single package

      Can run two applications or two processes simultaneously

 

News Flash February 12, 2007

      Intel announced an experimental 80-core chip design intended to deliver enormous computing power with low power consumption

 

Applications/Processes

      Consist of instruction sequences and the data those instructions use

      Most are sequential, running as a single thread

      Some applications have multiple threads that can run simultaneously

      A single thread cannot take advantage of dual core; multiple threads can
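A minimal Python sketch of the single-thread versus multi-thread distinction, assuming a hypothetical CPU-bound job (summing squares over a range) whose two halves are independent. `single_thread` runs the whole instruction sequence in one thread; `two_threads` hands each half to its own thread, which is the structure an operating system needs before it can place work on both cores. One caveat: in CPython the global interpreter lock stops pure-Python threads from executing bytecode at the same time, so this shows how work is split into threads rather than demonstrating a dual-core speedup; compiled code, or separate processes, would be needed for that.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_squares(start: int, stop: int) -> int:
    """Hypothetical CPU-bound work: sum of i*i for start <= i < stop."""
    return sum(i * i for i in range(start, stop))


def single_thread(n: int) -> int:
    # One thread executes the entire instruction sequence.
    return sum_squares(0, n)


def two_threads(n: int) -> int:
    # Split the range into two independent halves, one per thread.
    mid = n // 2
    with ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(sum_squares, 0, mid)
        second = pool.submit(sum_squares, mid, n)
        # Combine the partial results once both threads have finished.
        return first.result() + second.result()


if __name__ == "__main__":
    n = 1_000_000
    print(single_thread(n) == two_threads(n))  # same answer, work split in two
```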

 

Operating System Requirement

      Must implement a task dispatcher that can handle:

      multiple threads of one application or

      multiple applications or

      multiple operating system processes or

      combinations of the above
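The dispatcher's job can be sketched as a toy that hands each runnable item to the core with the least work already queued. The `Task` fields, the work estimates and the largest-first, least-loaded rule are all assumptions for illustration; a real dispatcher also weighs priorities, cache affinity and tasks that block on I/O. The point it shares with the slide is that threads of one application, separate applications and operating system processes all go through the same dispatcher.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str  # hypothetical label, e.g. "editor-thread-1"
    kind: str  # "thread", "application" or "os-process"; the dispatcher treats all alike
    work: int  # estimated CPU time units (an assumption; real schedulers rarely know this)


def dispatch(tasks: list, num_cores: int = 2) -> list:
    """Assign each task to the least-loaded core; return the task names queued per core."""
    loads = [0] * num_cores
    queues = [[] for _ in range(num_cores)]
    # Place the largest tasks first so the cores finish with similar loads.
    for task in sorted(tasks, key=lambda t: t.work, reverse=True):
        core = loads.index(min(loads))  # least-loaded core; lowest index wins ties
        queues[core].append(task.name)
        loads[core] += task.work
    return queues


if __name__ == "__main__":
    mix = [
        Task("editor-thread-1", "thread", 40),  # two threads of one application
        Task("editor-thread-2", "thread", 40),
        Task("browser", "application", 30),     # a separate application
        Task("virus-scan", "os-process", 10),   # an operating system process
    ]
    print(dispatch(mix))
```

In this example the two threads of the same application land on different cores, and the remaining work fills in around them.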

 

Single Core Processing

      The task manager implements preemptive multitasking

      One process or application runs until a higher-priority task needs the processor; then a switch occurs

      A switch can also occur when a process must wait for data (I/O from CD/DVD or the Internet)
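A tick-by-tick simulation of one core makes both switching rules concrete. It is a sketch under simplifying assumptions: each hypothetical task has a priority (larger wins), an arrival tick and a fixed list of CPU and I/O phases, and the scheduler reconsiders its choice every tick. In the example, `editor` is preempted at tick 1 when the higher-priority `player` arrives, and gets the processor back at tick 2 because `player` is then waiting for I/O.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    name: str
    priority: int  # larger number = higher priority (an assumption of this sketch)
    arrival: int   # first tick at which the task may run
    phases: list   # e.g. [("cpu", 2), ("io", 3), ("cpu", 1)]; not modified by the simulation


def run_single_core(procs: list, max_ticks: int = 100) -> list:
    """Simulate one core tick by tick; return which task ran at each tick ("idle" if none)."""
    # Private mutable copy of each task's remaining phases as [kind, ticks_left].
    left = {p.name: [[kind, n] for kind, n in p.phases] for p in procs}
    timeline = []
    for t in range(max_ticks):
        if not any(left.values()):
            break  # every task has finished
        # Ready = arrived, not finished, and needing the CPU (not waiting on I/O).
        ready = [p for p in procs
                 if p.arrival <= t and left[p.name] and left[p.name][0][0] == "cpu"]
        # Preemptive: the highest-priority ready task is chosen afresh every tick,
        # so a newly arrived higher-priority task takes the processor immediately.
        running = max(ready, key=lambda p: p.priority) if ready else None
        timeline.append(running.name if running else "idle")
        for p in procs:
            phases = left[p.name]
            if p.arrival > t or not phases:
                continue
            # The running task uses its CPU tick; I/O waits elapse for everyone in parallel.
            if phases[0][0] == "io" or p is running:
                phases[0][1] -= 1
                if phases[0][1] == 0:
                    phases.pop(0)
    return timeline


if __name__ == "__main__":
    tasks = [
        Proc("editor", priority=1, arrival=0, phases=[("cpu", 3)]),
        # Hypothetical media player: computes, waits on the DVD drive, computes again.
        Proc("player", priority=2, arrival=1, phases=[("cpu", 1), ("io", 2), ("cpu", 1)]),
    ]
    print(run_single_core(tasks))
```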

 

Hyper-Threading Technology

      Running multiple single-thread applications on one processor, each using cycles the others leave idle

      A compromise technology between single core and dual core
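A deliberately simplified model of sharing unused cycles: the core issues at most one instruction per cycle, and each hypothetical thread is a string where `X` is a cycle in which it has an instruction to issue and `.` is a cycle it spends stalled (for example, waiting on memory). Real Hyper-Threading shares many execution resources between two logical processors at a much finer grain, and its arbitration is not the fixed "thread A first" rule assumed here.

```python
def run_one_thread_at_a_time(a: str, b: str) -> int:
    """Cycles without sharing: each thread's stalls leave the core idle."""
    return len(a) + len(b)


def run_hyper_threaded(a: str, b: str) -> int:
    """Cycles when thread B may issue in any cycle thread A leaves unused."""
    i = j = cycles = 0
    while i < len(a) or j < len(b):
        cycles += 1
        slot_free = True  # this toy core issues at most one instruction per cycle
        if i < len(a):
            if a[i] == "X":
                slot_free = False  # A has first claim on the issue slot
            i += 1  # A either issues or sits out one stall cycle
        if j < len(b):
            if b[j] == ".":
                j += 1  # B's stall elapses whether or not the slot is free
            elif slot_free:
                j += 1  # B issues in a cycle A left unused
            # otherwise B is ready but must wait for the next cycle
    return cycles


if __name__ == "__main__":
    stalls = "X..X..X.."  # issues on 1 cycle in 3, stalled on the other 2
    print(run_one_thread_at_a_time(stalls, stalls), run_hyper_threaded(stalls, stalls))
    busy = "XXXX"         # never stalls
    print(run_one_thread_at_a_time(busy, busy), run_hyper_threaded(busy, busy))
```

With two stall-heavy threads the shared core finishes in 10 cycles instead of 18; with two threads that never stall there are no idle cycles to share and the total stays at 8, which is why the technology sits between single core and dual core.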

 

      http://www.intel.com/products/processor_number/flash/demo.html

 

Dual Core Technology

      Running multiple single-thread applications on two processors (cores)

      http://www.intel.com/products/processor_number/flash/demo.html

 

Of What Use is Dual Core?

      None, if running only one single-thread application or process

      Not much, if running multiple applications or processes with low processor utilization

      Great, if you have multiple high-utilization applications that can run simultaneously
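A rough throughput model captures these cases. It rests on stated assumptions: every application is single-threaded, so it can use at most one core; its CPU demand is known as a fraction of one core's capacity; and the cores are shared perfectly, with no cache, memory or operating system overhead. The function returns how much more work gets done per second on two cores than on one; for the four examples it gives roughly 1.0, 1.0, 1.1 and 2.0.

```python
def dual_core_gain(demands: list, cores: int = 2) -> float:
    """Work done per second on `cores` cores, relative to a single core.

    Each entry is one single-threaded application's CPU demand as a fraction
    of one core (0 < u <= 1); a single thread can never use more than one core.
    """
    if not demands:
        return 1.0  # nothing to run: no difference between one core and two
    if any(not 0 < u <= 1 for u in demands):
        raise ValueError("each single-threaded demand must be in (0, 1]")
    total = sum(demands)
    one_core = min(total, 1.0)             # one core saturates at 100% busy
    many_cores = min(total, float(cores))  # n cores saturate at n x 100%
    return many_cores / one_core


if __name__ == "__main__":
    print(dual_core_gain([1.0]))            # one busy single-thread application
    print(dual_core_gain([0.1, 0.1, 0.1]))  # several lightly loaded applications
    print(dual_core_gain([1.0, 0.1]))       # one heavy and one light application
    print(dual_core_gain([1.0, 1.0]))       # two heavy applications
```

In this throughput-only model the light-load case gains nothing at all; any small benefit seen in practice, such as quicker response when two tasks wake at once, lies outside what it measures.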

 

News Flash February 19, 2007

      AMD announces its new Barcelona quad-core processor chip

      Includes four cores with supporting circuits on one chip

      Intel's quad core (Clovertown) instead puts two dual-core chips (Woodcrest) in the same module

 

L1 and L2 Cache

      Why include a cache at all?

      Cache: much smaller, much faster, more expensive per bit

      Interfaces between the processor and off-chip RAM

      RAM: much larger, much slower, cheaper per bit

      Because of locality of reference, the combination behaves like fast memory at close to the low cost of RAM
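The usual first-order way to state this, with symbols introduced here for illustration: h is the hit ratio (the fraction of references found in the cache), t_c and t_m are the cache and RAM access times, c_c and c_m the cost per bit, and S_c and S_m the sizes in bits. This simple form assumes a miss costs one RAM access; some treatments also add the cache lookup time to every miss.

```latex
\begin{aligned}
t_{\text{avg}} &= h\,t_c + (1 - h)\,t_m \\
c_{\text{avg}} &= \frac{c_c S_c + c_m S_m}{S_c + S_m}
\end{aligned}
```

As h approaches 1, t_avg approaches t_c; when S_m is much larger than S_c, c_avg approaches c_m. Locality of reference is what keeps h close to 1 in practice.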

News Flash February 14, 2007

      IBM announces a breakthrough that allows eDRAM (embedded DRAM) to replace SRAM on processor chips

      Dramatically reduces the chip area required for L2 cache, and also for L1 cache

 

Locality of Reference

      Principle that, while an application runs, only a small set of its instructions and data is needed at any one time

      If that set is kept in a small, fast buffer, and the larger, slower RAM is accessed only for items not in the buffer, the combination operates at an average speed close to that of the buffer, at a cost per bit close to that of RAM
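Locality can be checked directly by replaying an address trace through a small buffer and counting how often the next reference is already there. This sketch assumes a fully associative buffer with least-recently-used (LRU) replacement, one common policy among several, and uses two made-up traces: a tight loop that reuses the same 20 addresses, and references scattered at random.

```python
import random
from collections import OrderedDict


def hit_ratio(trace: list, buffer_size: int) -> float:
    """Fraction of references found in a small LRU buffer of `buffer_size` entries."""
    buffer = OrderedDict()
    hits = 0
    for addr in trace:
        if addr in buffer:
            hits += 1
            buffer.move_to_end(addr)        # mark as most recently used
        else:
            buffer[addr] = None             # fetch from "RAM" into the buffer
            if len(buffer) > buffer_size:
                buffer.popitem(last=False)  # evict the least recently used entry
    return hits / len(trace)


# Hypothetical traces: a loop over 20 instruction addresses, repeated 100 times,
# versus 2000 references scattered at random over a large address space.
loop_trace = list(range(100, 120)) * 100
rng = random.Random(1)
scattered_trace = [rng.randrange(1_000_000) for _ in range(2000)]

print(hit_ratio(loop_trace, buffer_size=32))       # only the first pass misses
print(hit_ratio(scattered_trace, buffer_size=32))  # almost every reference misses
```

The loop trace misses only on its first pass (hit ratio 0.99); the scattered trace almost never hits. A loop slightly larger than the buffer is a known bad case for LRU: `hit_ratio([1, 2, 3] * 3, 2)` is 0.0, because each address is evicted just before it is needed again.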

 

IBM Performance Evaluation

      Complex computer designs required detailed simulations driven by instruction streams

      Streams were created by tracing real benchmark workloads

      Traces included instruction addresses, the instructions themselves, data addresses, and the data itself
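The slide does not say how IBM's traces were laid out, so the record below is a hypothetical stand-in holding the four items listed: instruction address, instruction, data address and data. Flattening such records into the ordered stream of instruction-fetch and data references is the step a cache study starts from; the mnemonics in the example are invented, not System/360 assembler.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TraceRecord:
    instr_addr: int           # address the instruction was fetched from
    instruction: str          # the instruction itself (mnemonic text in this sketch)
    data_addr: Optional[int]  # operand address, or None if no memory operand
    data: Optional[int]       # value read or written, or None


def address_stream(trace: list) -> list:
    """Flatten a trace into the ordered ("I" or "D", address) references a cache sees."""
    refs = []
    for rec in trace:
        refs.append(("I", rec.instr_addr))     # every instruction is fetched
        if rec.data_addr is not None:
            refs.append(("D", rec.data_addr))  # plus its data reference, if any
    return refs


if __name__ == "__main__":
    trace = [
        TraceRecord(0x1000, "LOAD R1,A", 0x8000, 7),
        TraceRecord(0x1004, "ADD R1,R2", None, None),  # register-only: no data address
        TraceRecord(0x1008, "STORE R1,B", 0x8004, 9),
    ]
    print(address_stream(trace))
```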

 

Cache Development

      IBM Research extracted all the addresses from the streams

      Confirmed that a small, fast buffer could hold most of the instructions and data currently in use

      All instructions and data remained stored in the slower RAM

      With a buffer hit ratio of 96 to 99 percent, all data appeared to be accessible at close to buffer speed
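The hit-ratio claim can be checked with simple arithmetic, assuming the first-order model in which a hit costs one cache access and a miss costs one RAM access, and using the Model 85 figures that appear later in this talk (80 ns cache, 960 ns RAM). At 96 percent hits the average is about 115 ns, and at 99 percent about 89 ns, against 960 ns for RAM alone.

```python
def effective_access_ns(hit_ratio: float, cache_ns: float, ram_ns: float) -> float:
    """Average access time when hits cost cache_ns and misses cost ram_ns."""
    return hit_ratio * cache_ns + (1 - hit_ratio) * ram_ns


# Cache and RAM times quoted for the System/360 Model 85 later in this talk.
CACHE_NS, RAM_NS = 80, 960
for h in (0.96, 0.99):
    t = effective_access_ns(h, CACHE_NS, RAM_NS)
    print(f"hit ratio {h:.0%}: {t:.1f} ns average, {RAM_NS / t:.1f}x faster than RAM alone")
```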

 

Cache Development

      Cache (L1) was first implemented in the IBM System/360 Model 85 (circa 1968), then in smaller System/370 models

      Later, separate instruction and data caches were implemented

      Most later computer and chip designs incorporate L1 and L2 caches, as well as separate instruction and data caches
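A sketch of how split instruction and data L1 caches sit in front of a shared L2. It assumes tiny fully associative LRU caches, treats each address as a whole cache line, and ignores writes; the entry counts are hypothetical. Instruction fetches and data references go to separate L1s, so a burst of instruction fetches cannot evict data the program is about to reuse, and both L1s fall back to the same L2 before going to RAM.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny fully associative LRU cache that only tracks which addresses it holds."""

    def __init__(self, entries: int) -> None:
        self.entries = entries
        self.lines = OrderedDict()

    def access(self, addr: int) -> bool:
        """Return True on a hit; on a miss, load addr and evict the LRU entry if full."""
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return True
        self.lines[addr] = None
        if len(self.lines) > self.entries:
            self.lines.popitem(last=False)
        return False


class TwoLevelHierarchy:
    """Split L1 (instruction and data) in front of one unified L2 (hypothetical sizes)."""

    def __init__(self, l1_entries: int = 4, l2_entries: int = 16) -> None:
        self.l1i = LRUCache(l1_entries)
        self.l1d = LRUCache(l1_entries)
        self.l2 = LRUCache(l2_entries)
        self.ram_reads = 0

    def access(self, kind: str, addr: int) -> str:
        """kind is "I" for an instruction fetch, "D" for a data reference."""
        l1 = self.l1i if kind == "I" else self.l1d
        if l1.access(addr):
            return "L1"
        if self.l2.access(addr):
            return "L2"
        self.ram_reads += 1  # missed both levels: go to RAM
        return "RAM"


if __name__ == "__main__":
    mem = TwoLevelHierarchy(l1_entries=1, l2_entries=4)
    for kind, addr in [("D", 1), ("I", 2), ("D", 1), ("D", 3), ("D", 1)]:
        print(kind, addr, mem.access(kind, addr))
```

The third reference hits in L1 even though an instruction fetch came in between; with a single shared one-entry L1 it would have missed. The last reference misses L1 but is still found in L2.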

 

Cache 1968 vs Today

      System/360 Model 85: more than $1,000,000

      32 KB cache, 80 ns cycle time

      4 MB RAM, 960 ns access time

      Magnetic core memory

 

      Pentium 4 PC: less than $1,000

      32 KB L1 cache, 3-cycle access

      4 MB L2 cache, 12-cycle access

      1 GB RAM
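The slide's numbers support a little arithmetic. The size ratios and the 1968 RAM-to-cache speed ratio come straight from the slide; the two-level figure does not, because the slide gives no Pentium 4 RAM latency or hit rates, so the 200-cycle RAM latency and the miss rates below are hypothetical placeholders (with them the average comes to about 5.6 cycles). The formula charges the L1 lookup on every access, including misses, which is the usual convention for multi-level caches.

```python
# Sizes and latencies from the slide; RAM latency and miss rates below are hypothetical.
MODEL85 = {"cache_bytes": 32 * 1024, "ram_bytes": 4 * 1024**2, "cache_ns": 80, "ram_ns": 960}
PENTIUM4 = {"l1_bytes": 32 * 1024, "l2_bytes": 4 * 1024**2, "ram_bytes": 1024**3,
            "l1_cycles": 3, "l2_cycles": 12}


def two_level_amat(t_l1, t_l2, t_ram, miss_l1, miss_l2):
    """Average access time: every access pays L1, L1 misses add L2, L2 misses add RAM."""
    return t_l1 + miss_l1 * (t_l2 + miss_l2 * t_ram)


# How much larger main memory is than the fastest cache level.
print(MODEL85["ram_bytes"] // MODEL85["cache_bytes"])  # 128
print(PENTIUM4["ram_bytes"] // PENTIUM4["l1_bytes"])   # 32768
# How much slower a RAM access was than a cache access in 1968.
print(MODEL85["ram_ns"] / MODEL85["cache_ns"])         # 12.0

# HYPOTHETICAL: 200-cycle RAM, 5% of accesses miss L1, 20% of those also miss L2.
print(two_level_amat(PENTIUM4["l1_cycles"], PENTIUM4["l2_cycles"], 200, 0.05, 0.2))
```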

 

Examples of Use

      Adobe Photoshop and Photoshop Elements, which can run multiple threads

      Web page servers

      Gaming applications

      Applications with high computing requirements

      Grid computing applications