
IDF Spring 04: Day 3: Keynote

by David Ross on 19 February 2004, 00:00

Tags: Intel (NASDAQ:INTC)

Quick Link: HEXUS.net/qawj


Topic: Enabling the Era of Tera



The future problems facing the industry are processor power usage, memory latency and RC delay, all of which will curb the rate at which speeds can keep increasing.

Intel believes that there needs to be an architectural paradigm shift. The current approach is not working, and a fresh approach is needed. The next phase needs to borrow from high-performance computing architectures and bring that technology to the masses. The next evolution is going to be from dual cores to multiple cores, while understanding that the changes in the architecture will also require changes outside of the processor to best utilize the increased performance. This includes modifications to the size of the cache and how the cache is addressed, and also modifications to remove bottlenecks elsewhere in the system. One such bottleneck is memory latency.



Memory latency accounts for half of total execution time, so a key to increasing performance is reducing that latency. One tool that can be used is the implementation of 'helper threads'. A helper thread uses the memory bus while it is idle, fetching the data the program will soon need; by warming up the caches it reduces cache misses. Combined with the new compiler, this increases efficiency by up to 8.9%.
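As a rough illustration of the helper-thread idea (not Intel's actual compiler- and hyper-threading-based implementation), the C sketch below runs a second thread that prefetches data ahead of a worker thread, so the data is already warm in cache when the worker reaches it. The array size, run-ahead distance and prefetch stride are arbitrary assumptions for the example.

```c
/* Minimal helper-thread sketch: a second thread runs ahead of the worker and
 * touches data it will soon need, reducing the worker's cache misses.
 * Compile with: gcc -O2 -pthread helper.c */
#include <pthread.h>
#include <stdio.h>

#define N (1L << 24)       /* 16M doubles, larger than typical caches (assumed) */
#define RUN_AHEAD 4096     /* how far ahead the helper prefetches (assumed) */

static double data[N];
static volatile long progress = 0;   /* index the worker has reached */
static volatile int done = 0;

/* Helper thread: prefetch one cache line (8 doubles) at a time ahead of the
 * worker's current position. */
static void *helper(void *arg)
{
    (void)arg;
    while (!done) {
        long base = progress;
        for (long i = base; i < base + RUN_AHEAD && i < N; i += 8)
            __builtin_prefetch(&data[i], 0 /* read */, 1 /* low temporal locality */);
    }
    return NULL;
}

int main(void)
{
    for (long i = 0; i < N; i++)
        data[i] = (double)i;

    pthread_t t;
    pthread_create(&t, NULL, helper, NULL);

    /* Worker: a simple reduction that would otherwise stall on cache misses. */
    double sum = 0.0;
    for (long i = 0; i < N; i++) {
        sum += data[i];
        progress = i;               /* tell the helper where we are */
    }
    done = 1;
    pthread_join(t, NULL);

    printf("sum = %f\n", sum);
    return 0;
}
```

In Intel's scheme the helper thread is generated by the compiler and scheduled onto spare hyper-threading resources; the sketch simply shows the principle of trading idle memory-bus time for warmer caches.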

The important factor is that solutions need to be adaptable to the platform, including the architecture itself. Intel wants a re-configurable micro-architecture and the ability to configure systems to optimize performance.

In line with this, Intel has introduced a reconfigurable PLA; being able to handle GPRS and WiFi on the same silicon offers key advantages.

The next adaptive change concerns packet processing – as line rates increase, so do the processing requirements. For example, it takes 21,000 clocks to process a 1KB IP packet on a Windows system, with 55% of that spent on system overhead.

Intel has implemented network stack affinity, keeping TCP/IP processing on the core, to reduce software overheads. DMA introduces latency into the system; direct CPU access cuts down the 3 to 25 memory accesses needed per packet. As a result, IP packet processing has dropped to 2,100 clocks, a tenfold improvement over the previous configuration, which makes 10Gbit/s networking on a Xeon configuration possible today.
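To see how those clock counts translate into line rate, the short C sketch below does the back-of-envelope arithmetic. The 3GHz clock speed is an assumed figure for illustration, not one quoted in the keynote, and the result is per core.

```c
/* Back-of-envelope check of the quoted clock counts: sustained line rate for a
 * single core at 21,000 vs 2,100 clocks per 1KB packet. */
#include <stdio.h>

int main(void)
{
    const double clock_hz    = 3.0e9;     /* assumed Xeon clock speed */
    const double packet_bits = 1024 * 8;  /* 1KB packet */
    const double clocks_old  = 21000.0;   /* conventional software TCP/IP path */
    const double clocks_new  = 2100.0;    /* with stack affinity / direct CPU access */

    double rate_old = clock_hz / clocks_old * packet_bits / 1e9;
    double rate_new = clock_hz / clocks_new * packet_bits / 1e9;

    printf("Old path: %.2f Gbit/s per core\n", rate_old);  /* ~1.2 Gbit/s  */
    printf("New path: %.2f Gbit/s per core\n", rate_new);  /* ~11.7 Gbit/s */
    return 0;
}
```

Under those assumptions the tenfold drop in clocks per packet is what moves a single core from roughly 1Gbit/s territory to beyond 10Gbit/s, consistent with the claim above.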

The final solution will introduce ‘Multi-Everywhere’ processing – multi-core processors with multi-enabled board layouts that will allow for revolutionary performance increases as opposed to the ‘speed bump’ game of the past.