Backtrace provides a turn-key debugging platform for capturing application state at the time of an error, whether it is a run-time error or a fatal crash. One of the first areas we have focused on since starting the company is improving the foundational technology behind modern symbolic debuggers. We are achieving this with a new debugger designed for automated analysis, called ptrace.
This article provides a brief overview of the features of the Backtrace debugger. It is the recommended mechanism for integrating x86-64 applications and kernels running on Linux, FreeBSD and IllumOS. For other platforms, Backtrace provides native support for minidumps and integrates seamlessly with popular crash reporting libraries such as Google Breakpad and Google Crashpad.
Motivation
The Backtrace debugger improves root cause investigation of application errors in natively compiled software. Its design and development were motivated by several key problem areas in existing crash analysis and aggregation technologies: data access and size, performance, and mean time to resolution.
DATA ACCESS AND SIZE
Many modern systems rely on raw memory dumps to capture application state, and these dumps are a central part of the post-mortem workflow. A core dump includes vital information such as application memory, allowing a symbolic debugger to reconstitute variable values, call stacks and more.
Unfortunately, core dumps are large. For many workloads, it is not feasible to ship core dumps off to other systems, or even to generate them. Backtrace customers have workloads with resident set sizes (RSS) ranging from 24GB to over 500GB.
For a traditional symbolic debugger to reconstitute application state in human-readable form from a core dump, it also needs access to the matching debug symbols. Either infrastructure is required to manage and download these symbols, or developers need access to the particular machine holding the core dump.
This infrastructure takes engineering time to develop and maintain. Without it, developers require direct access to faulting machines, which is both a security risk and an inefficient process.
Backtrace solves this with a self-contained, structured snapshot format that does not rely on debug symbols once the snapshot has been generated. Snapshots are orders of magnitude smaller than a typical core dump. It is also possible to snapshot live processes, forgoing core dump generation entirely. The Backtrace debugger performs automated analysis on the state of the application and heuristically attempts to include the variables relevant to the fault.
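To illustrate why a snapshot can be read without debug symbols, here is a minimal sketch of a self-contained snapshot record. It assumes a simple JSON layout, and the Snapshot, Thread, Frame and Variable types and all of their fields are hypothetical. This is not the Backtrace snapshot format. The idea it shows is that function names, source locations, types and values are resolved while symbols are still available. The serialized record can then be read anywhere with nothing more than a JSON parser.

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical layout for illustration only; this is NOT the Backtrace
# snapshot format. Every field holds a value that was already resolved
# from debug information at the moment the snapshot was taken.

@dataclass
class Variable:
    name: str
    type_name: str  # type name resolved at capture time
    value: str      # value rendered as text at capture time

@dataclass
class Frame:
    function: str   # symbol name, not a raw instruction address
    source_file: str
    line: int
    variables: list[Variable] = field(default_factory=list)

@dataclass
class Thread:
    tid: int
    frames: list[Frame] = field(default_factory=list)

@dataclass
class Snapshot:
    pid: int
    signal: str
    threads: list[Thread] = field(default_factory=list)

def to_json(snapshot: Snapshot) -> str:
    """Serialize the snapshot; a reader needs only a JSON parser, not debug symbols."""
    return json.dumps(asdict(snapshot))

snap = Snapshot(pid=4242, signal="SIGSEGV", threads=[
    Thread(tid=4242, frames=[
        Frame("parse_header", "parser.c", 118, [Variable("len", "size_t", "4096")]),
        Frame("main", "main.c", 27),
    ]),
])

# Reading it back requires no access to the original binary or its symbols.
restored = json.loads(to_json(snap))
for frame in restored["threads"][0]["frames"]:
    print(f'{frame["function"]} ({frame["source_file"]}:{frame["line"]})')
```

A real format would also need to carry registers, memory regions of interest and analysis annotations. The sketch covers only the part of the idea that removes the dependency on symbols.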
PERFORMANCE
There are two components impacting performance of application recovery in a post-mortem state. One is memory dump generation and the other is symbolic analysis for extracting a callstack and application state.
Let us compare the performance of the Backtrace debugger with that of GDB and LLDB on a complex C++ project, Google Chrome. In this experiment, we use Chromium 35.0.1916.144 with 466 mapped segments and 1 thread. The single executable carries approximately 2.6GB of debug data. We request a backtrace of a running process.
GDB takes 2.6GB of resident memory and 54 seconds.
LLDB takes 3.0GB of resident memory and 130 seconds.
Backtrace takes 0.46GB of resident memory and 0.61 seconds.
This shows how debugger cost grows with the size of the debug information. Performance is also affected by the number of memory segments and threads. Below is a comparison of Backtrace with and without variables (bt and bt-nv respectively) against GDB, LLDB and Glider.
The memory dump generation process can be avoided altogether by having Backtrace snapshot live processes, generating a full dump on disk only if necessary. As for debugger performance, the Backtrace debugger is orders of magnitude faster than industry-standard debuggers such as GDB and LLDB. This performance allows for faster recovery times and leaves room for the debugger to perform additional analysis.
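The single-threaded Chromium measurements quoted in this article can be turned into relative factors with a little arithmetic. The snippet below uses only those three measurements. The results dictionary and the relative_to helper are illustrative names, and the output says nothing about other workloads.

```python
# Measurements quoted in this article for Chromium 35.0.1916.144 (1 thread).
results = {
    "GDB":       {"rss_gb": 2.6,  "seconds": 54.0},
    "LLDB":      {"rss_gb": 3.0,  "seconds": 130.0},
    "Backtrace": {"rss_gb": 0.46, "seconds": 0.61},
}

def relative_to(baseline: str, data: dict) -> dict:
    """Return how many times more time and memory each tool uses than `baseline`."""
    base = data[baseline]
    return {
        tool: {
            "time_x": m["seconds"] / base["seconds"],
            "memory_x": m["rss_gb"] / base["rss_gb"],
        }
        for tool, m in data.items()
        if tool != baseline
    }

ratios = relative_to("Backtrace", results)
for tool, r in ratios.items():
    print(f"{tool}: {r['time_x']:.0f}x slower, {r['memory_x']:.1f}x more memory")
# GDB: 89x slower, 5.7x more memory
# LLDB: 213x slower, 6.5x more memory
```

On these numbers, GDB takes roughly 89 times as long as Backtrace and LLDB roughly 213 times as long, which puts the gap at around two orders of magnitude. The two debuggers also use about 5.7 and 6.5 times as much resident memory.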
MEAN TIME TO RESOLUTION
A traditional symbolic debugger relies on the engineer asking the right questions to identify the root cause. Post-mortem state contains many important details that, if identified, greatly accelerate root cause investigation. Unfortunately, if application state is complex or domain expertise in an area is lacking, it is easy to miss these details.
The Backtrace debugger automatically analyzes variables, memory, executable code and more to highlight important details that reduce time to resolution. Learn more in the automated analysis article.
TRIAGE AND PRIORITIZATION
Backtrace automatically analyzes application memory, executable state and more to highlight important clues about variables, registers and other process state. These clues in turn inform classifiers, such as security, that help you understand impact beyond simple deduplicated error counts. For example, a unique crash may have only one occurrence in a month yet actually be a security problem. Classifiers help ensure this crucial signal is not missed.
By attaching classifiers to faults, Backtrace allows you to prioritize beyond simple deduplicated counts, factoring in the potential risk posed by a bug.
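One way to factor classifiers into prioritization is to weight each deduplicated count by the classifiers attached to it. This matches the example of a crash seen once in a month that turns out to be a security problem. The sketch below is hypothetical: the CrashGroup type, the priority function, the crash signatures and the weight of 100 for security are illustrative assumptions, not Backtrace's scoring.

```python
from dataclasses import dataclass, field

# Hypothetical weights: how much more urgent a fault with a given classifier
# is than an unclassified one. The actual policy is up to each team.
CLASSIFIER_WEIGHTS = {"security": 100.0}

@dataclass
class CrashGroup:
    signature: str     # deduplicated fingerprint of the crash (illustrative)
    occurrences: int   # count over the reporting window
    classifiers: set[str] = field(default_factory=set)

def priority(group: CrashGroup) -> float:
    """Scale the raw count by the largest weight among the group's classifiers."""
    weight = max(
        (CLASSIFIER_WEIGHTS.get(c, 1.0) for c in group.classifiers),
        default=1.0,  # unclassified groups keep their raw count
    )
    return group.occurrences * weight

groups = [
    CrashGroup("json_parse+0x4c", occurrences=40),
    CrashGroup("memcpy<-read_packet", occurrences=1, classifiers={"security"}),
    CrashGroup("assert_fail<-flush", occurrences=12),
]

# Highest priority first: the single security crash outranks the noisier ones.
ranked = sorted(groups, key=priority, reverse=True)
for g in ranked:
    print(f"{priority(g):>7.1f}  {g.occurrences:>3}  {g.signature}  {sorted(g.classifiers)}")
```

Ranking by count alone would place the security crash last. With the classifier weight applied, it moves to the top of the list.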