
Researchers boost multi-core CPU performance with better prefetching

Piling on cores is one way to lift performance, but it is not necessarily the best way: researchers at North Carolina State University have developed a new prefetching technique for processors that can boost performance by as much as 40 percent. As you may know, any data not stored in a CPU’s cache has to be pulled from RAM, but as more cores are added they can create a bottleneck by competing for memory access. To counter this, designers use prefetching to predict what data will be needed and fetch it ahead of time, but guessing wrong can hurt performance. The researchers tackled this problem on two fronts: first, by devising a better algorithm for divvying up bandwidth, and second, by selectively turning off prefetching when it would slow the CPU. The full PR and an abstract of the study, to be presented June 9th, are after the break.

New Bandwidth Management Techniques Boost Operating Efficiency In Multi-Core Chips
For Immediate Release

Release Date: 05.25.2011
Filed under Releases

Researchers from North Carolina State University have developed two new techniques to help maximize the performance of multi-core computer chips by letting them retrieve data more efficiently, which improves chip performance by 10 to 40 percent.

To do this, the new techniques allow multi-core chips to handle two things more efficiently: allocating bandwidth and “prefetching” data.

Multi-core chips are meant to make our computers run faster. Each core on a chip is its own central processing unit, or computer brain. However, there are things that can slow these cores down. For example, each core needs to retrieve data from memory that is not stored on its chip. There is a limited pathway, or bandwidth, that these cores can use to retrieve that off-chip data. As chips have incorporated more and more cores, the bandwidth has become increasingly congested, slowing down system performance.

One of the ways to expedite core performance is called prefetching. Each chip has its own small memory component, called a cache. In prefetching, the cache predicts what data a core will need in the future and retrieves that data from off-chip memory before the core needs it. Ideally, this improves the core’s performance. But if the cache’s prediction is wrong, it unnecessarily clogs the bandwidth while retrieving data the core does not need. This actually slows the chip’s performance.
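To make the trade-off concrete, here is a minimal Python sketch of a toy next-block prefetcher. It is not the mechanism studied in the paper: the `simulate` function, the unbounded cache, and the policy of always fetching the next block are assumptions chosen only to show how a correct guess saves demand misses while a wrong guess adds off-chip traffic.

```python
def simulate(accesses, prefetch):
    """Toy model (hypothetical): unbounded cache, next-block prefetcher.

    Returns (demand_misses, off_chip_transfers, unused_prefetches).
    """
    cache = set()             # blocks currently held on-chip
    unused = set()            # prefetched blocks the core has not touched yet
    demand_misses = 0
    off_chip_transfers = 0

    for block in accesses:
        if block in cache:
            # Hit: if this block arrived via prefetch, the guess paid off.
            unused.discard(block)
        else:
            # Miss: the core must wait for an off-chip fetch.
            demand_misses += 1
            off_chip_transfers += 1
            cache.add(block)

        # Guess that the core will want the next block and fetch it early.
        if prefetch and (block + 1) not in cache:
            cache.add(block + 1)
            unused.add(block + 1)
            off_chip_transfers += 1

    return demand_misses, off_chip_transfers, len(unused)


if __name__ == "__main__":
    sequential = list(range(8))       # the next-block guess is usually right
    strided = [0, 10, 20, 30]         # the next-block guess is always wrong
    for name, pattern in (("sequential", sequential), ("strided", strided)):
        print(name, "no prefetch:", simulate(pattern, False))
        print(name, "prefetch:   ", simulate(pattern, True))
```

In this toy model the sequential pattern drops from 8 demand misses to 1 when prefetching is on, while the strided pattern doubles its off-chip transfers from 4 to 8 and never uses a single prefetched block.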

“The first technique relies on criteria we developed to determine how much bandwidth should be allotted to each core on a chip,” says Dr. Yan Solihin, associate professor of electrical and computer engineering at NC State and co-author of a paper describing the research. Some cores require more off-chip data than others. The researchers use easily collected data from the hardware counters on each chip to determine which cores need more bandwidth. “By better distributing the bandwidth to the appropriate cores, the criteria are able to maximize system performance,” Solihin says.
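The release does not spell out the researchers’ criteria, so the following Python sketch uses a deliberately simple stand-in: splitting bandwidth in proportion to each core’s off-chip miss count. The function name `partition_bandwidth`, the counter values, and the proportional rule are all assumptions for illustration and should not be read as the paper’s method.

```python
def partition_bandwidth(miss_counts, total_bandwidth):
    """Split total_bandwidth across cores in proportion to off-chip misses.

    miss_counts: dict mapping a core name to a miss count read from a
                 hardware counter (hypothetical values).
    Proportional sharing is an illustrative stand-in, not the paper's rule.
    """
    if not miss_counts:
        return {}

    total_misses = sum(miss_counts.values())
    if total_misses == 0:
        # No core is asking for off-chip data, so share evenly.
        even_share = total_bandwidth / len(miss_counts)
        return {core: even_share for core in miss_counts}

    return {
        core: total_bandwidth * misses / total_misses
        for core, misses in miss_counts.items()
    }


if __name__ == "__main__":
    counters = {"core0": 300, "core1": 100}   # made-up counter readings
    print(partition_bandwidth(counters, 8.0))
```

With the made-up readings above, the core with three times as many misses receives three times the bandwidth share.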

“The second technique relies on a set of criteria we developed for determining when prefetching will boost performance and should be utilized,” Solihin says, “as well as when prefetching would slow things down and should be avoided.” These criteria also use data from each chip’s hardware counters. The prefetching criteria would allow manufacturers to make multi-core chips that operate more efficiently, because each of the individual cores would automatically turn prefetching on or off as needed.
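As a rough illustration of a per-core on/off decision driven by counters, the Python sketch below disables prefetching only when the memory link is busy and most prefetched data goes unused. The `CoreCounters` fields, the `should_prefetch` function, and both thresholds are hypothetical; the release does not describe the researchers’ actual criteria.

```python
from dataclasses import dataclass


@dataclass
class CoreCounters:
    """Hypothetical per-core hardware counter readings."""
    prefetches_issued: int
    prefetches_used: int


def should_prefetch(counters, bandwidth_utilization,
                    min_accuracy=0.5, busy_threshold=0.8):
    """Decide whether a core should keep prefetching.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    if counters.prefetches_issued == 0:
        # No evidence yet: keep prefetching so accuracy can be measured.
        return True

    if bandwidth_utilization < busy_threshold:
        # Spare bandwidth: wrong guesses cost little, so keep prefetching.
        return True

    # Congested link: prefetch only if the guesses are mostly right.
    accuracy = counters.prefetches_used / counters.prefetches_issued
    return accuracy >= min_accuracy


if __name__ == "__main__":
    poor_guesser = CoreCounters(prefetches_issued=100, prefetches_used=20)
    good_guesser = CoreCounters(prefetches_issued=100, prefetches_used=90)
    print(should_prefetch(poor_guesser, bandwidth_utilization=0.9))
    print(should_prefetch(good_guesser, bandwidth_utilization=0.9))
```

The design choice here is to treat inaccurate prefetching as harmful only under congestion, which mirrors the release’s point that wrong guesses hurt by clogging the shared bandwidth.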

Using both sets of criteria, the researchers were able to boost multi-core chip performance by 40 percent, compared with multi-core chips that do not prefetch data, and by 10 percent over multi-core chips that always prefetch data.

The paper, “Studying the Impact of Hardware Prefetching and Bandwidth Partitioning in Chip-Multiprocessors,” will be presented June 9 at the International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS) in San Jose, Calif. The paper was co-authored by Dr. Fang Liu, a former Ph.D. student at NC State. The research was supported, in part, by the National Science Foundation.

NC State’s Department of Electrical and Computer Engineering is part of the university’s College of Engineering.

-shipman-

Note to Editors: The study abstract follows.

“Studying the Impact of Hardware Prefetching and Bandwidth Partitioning in Chip-Multiprocessors”

Authors: Fang Liu and Yan Solihin, North Carolina State University

Presented: June 9, 2011, at the International Conference on Measurement and Modeling of Computer Systems, San Jose, Calif.

Abstract: Modern high performance microprocessors widely employ hardware prefetching to hide long memory access latency. While useful, hardware prefetching tends to aggravate the bandwidth wall, a problem where system performance is increasingly limited by the availability of off-chip pin bandwidth in Chip Multi-Processors (CMPs). In this paper, we propose an analytical model-based study to investigate how hardware prefetching and memory bandwidth partitioning impact CMP system performance and how they interact. The model includes a composite prefetching metric that can help determine under which conditions prefetching can improve system performance, a bandwidth partitioning model that takes into account prefetching effects, and a derivation of the weighted speedup-optimum bandwidth partition sizes for different cores. Through model-driven case studies, we find several interesting observations that can be valuable for future CMP system design and optimization. We also explore simulation-based empirical evaluation to validate the observations and show that maximum system performance can be achieved by selective prefetching, guided by the composite prefetching metric, coupled with dynamic bandwidth partitioning.
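For readers unfamiliar with the “weighted speedup” metric named in the abstract, the Python sketch below computes it under its commonly used definition: the sum, over all cores, of each core’s throughput when sharing the chip divided by its throughput when running alone. The function name and the IPC (instructions per cycle) figures are made up for illustration, and the paper’s own formulation may differ in detail.

```python
def weighted_speedup(ipc_shared, ipc_alone):
    """Sum of per-core IPC when sharing the chip over IPC when running alone.

    Uses the commonly cited definition of weighted speedup; the input
    values in the example below are invented for illustration.
    """
    if len(ipc_shared) != len(ipc_alone):
        raise ValueError("need one shared and one alone IPC value per core")
    return sum(shared / alone for shared, alone in zip(ipc_shared, ipc_alone))


if __name__ == "__main__":
    # Two cores, each running at half of its stand-alone speed.
    print(weighted_speedup([1.0, 0.5], [2.0, 1.0]))
```

A bandwidth partition is “weighted speedup-optimum” in this sense when no other split of the available bandwidth yields a higher value of this sum.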
