Logs and Lifeguards: Using Chip Multiprocessors to Help Software Behave Corre...

Posted in Conferences, Companies, Science, Development on September 16, 2008

While performance and power efficiency are both important, correctness is arguably more important still: if your software is misbehaving, it is little consolation that it is doing so quickly or power-efficiently. Google has already done a very impressive job of addressing one cause of misbehavior, namely failures in the underlying hardware. In the Log-Based Architectures (LBA) project, however, we are focusing on what may be an even more challenging cause: bugs in the application itself, including obscure bugs that only cause problems during security attacks. Software bugs are difficult to recognize, and they are particularly problematic because they may cause every node in the system to fail (unlike hardware failures, which tend to be more isolated).

To help detect and fix software bugs, we have been exploring techniques for accelerating dynamic program monitoring tools, which we call "lifeguards". Lifeguards are typically written today using dynamic binary instrumentation frameworks such as Valgrind or Pin. Because of the overheads of binary instrumentation, lifeguards that require instruction-grain information typically suffer 30X-100X slowdowns, so it is only practical to use them during explicit debugging cycles. Our goal is to reduce these overheads to the point where lifeguards can run continuously on deployed code. To accomplish this, we create a dynamic log of instruction-level events in the monitored application and stream this information to one or more lifeguards running on separate cores of the same chip multiprocessor (CMP).
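The logging pipeline described above can be pictured as a producer/consumer pair. What follows is a minimal Python sketch under several assumptions. The event types (`malloc`, `free`, `load`, `store`), the `Event` record and the allocation-checking lifeguard are hypothetical illustrations, not LBA's actual log format. The two Python threads connected by a bounded `queue.Queue` only stand in for two cores sharing an on-chip log, and CPython's global interpreter lock means they do not truly run in parallel.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str      # hypothetical event types: "malloc", "free", "load", "store"
    addr: int
    size: int = 1


def monitored_app(log: queue.Queue) -> None:
    """Stands in for the application core: it only appends events to the log."""
    log.put(Event("malloc", 0x1000, 16))
    log.put(Event("store", 0x1004))
    log.put(Event("free", 0x1000, 16))
    log.put(Event("load", 0x1004))   # use-after-free: should be flagged
    log.put(None)                    # hypothetical end-of-log marker


def addr_lifeguard(log: queue.Queue, errors: list) -> None:
    """Stands in for the lifeguard core: consumes the log and checks accesses."""
    allocated = set()  # shadow metadata: byte addresses currently allocated
    while (ev := log.get()) is not None:
        if ev.kind == "malloc":
            allocated.update(range(ev.addr, ev.addr + ev.size))
        elif ev.kind == "free":
            allocated.difference_update(range(ev.addr, ev.addr + ev.size))
        elif ev.addr not in allocated:  # load/store to unallocated memory
            errors.append(ev)


def run() -> list:
    log = queue.Queue(maxsize=1024)  # bounded, like a finite log buffer
    errors = []
    guard = threading.Thread(target=addr_lifeguard, args=(log, errors))
    guard.start()
    monitored_app(log)
    guard.join()
    return errors


if __name__ == "__main__":
    print(run())  # prints [Event(kind='load', addr=4100, size=1)]
```

The load from 0x1004 after the block was freed is the only access flagged. The application never waits for an individual check to finish (unless the bounded buffer fills), so in this sketch the checking work sits off the application's own execution path, which is the property that lets it be moved to another core.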

In our results so far, we have shown that the basic logging approach typically reduces the slowdown by roughly an order of magnitude, from about 30X to about 3X. In a recent ISCA paper, we demonstrated several hardware-based techniques that can eliminate redundancy in event-driven lifeguards and reduce the slowdown to just 20%. In our ongoing research, we are attempting to achieve similar performance through software-only techniques (by extending dynamic compiler optimization techniques to eliminate redundancy within the lifeguards), and we are extending our support to parallel and concurrent environments. We believe that our techniques are applicable to any event-driven lifeguard that processes streams of events, and that they are compatible with sampling-based techniques that can further reduce the power and performance impact of monitoring. This talk will describe the work that we have done so far, as well as our plans for future research.
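The abstract does not spell out how redundancy is eliminated, so the following is only an illustrative software sketch of one simple kind of redundancy. It is not the hardware techniques from the ISCA paper, nor the compiler-based approach mentioned above. It assumes hypothetical `malloc`/`free`/`load`/`store` events fed to an allocation-checking lifeguard. Once an address has passed a check, repeating that check gives the same answer until the allocation metadata changes. A small filter of recently passed addresses (the hypothetical `FilteredLifeguard.recent`) can therefore skip repeats, as long as it is flushed on every `malloc` or `free`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str      # hypothetical event types: "malloc", "free", "load", "store"
    addr: int
    size: int = 1


class FilteredLifeguard:
    """Hypothetical allocation-checking lifeguard with a redundant-check filter."""

    def __init__(self, filter_size: int = 64):
        self.allocated = set()      # shadow metadata: allocated byte addresses
        self.filter_size = filter_size
        self.recent = set()         # addresses whose check recently passed
        self.checks = 0             # full checks actually performed
        self.errors = []

    def handle(self, ev: Event) -> None:
        if ev.kind in ("malloc", "free"):
            span = range(ev.addr, ev.addr + ev.size)
            if ev.kind == "malloc":
                self.allocated.update(span)
            else:
                self.allocated.difference_update(span)
            self.recent.clear()     # metadata changed: cached results may be stale
            return
        if ev.addr in self.recent:
            return                  # redundant: same result as the cached check
        self.checks += 1
        if ev.addr in self.allocated:
            if len(self.recent) >= self.filter_size:
                self.recent.clear() # crude eviction keeps the filter bounded
            self.recent.add(ev.addr)
        else:
            self.errors.append(ev)  # failures are never cached, so always reported


if __name__ == "__main__":
    events = ([Event("malloc", 0x2000, 8)] + [Event("load", 0x2000)] * 1000
              + [Event("free", 0x2000, 8), Event("load", 0x2000)])
    g = FilteredLifeguard()
    for ev in events:
        g.handle(ev)
    print(g.checks, len(g.errors))  # prints "2 1"
```

In the usage example, 1,001 loads need only two full checks, yet the final use-after-free is still reported. Only passing results are cached, so a bad access is re-checked and reported every time it occurs.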

Google Tech Talks
September 12, 2008

Speaker: Todd Mowry
Todd C. Mowry is a Professor in the Computer Science Department at Carnegie Mellon University. He received his Ph.D. from Stanford University in 1994. He currently co-leads the Log-Based Architectures project and the Claytronics project. Prof. Mowry recently served as the Director of the Intel Research Pittsburgh lab, and he is currently on sabbatical at Stanford. He is an associate editor of ACM Transactions on Computer Systems.
