Platform Overview
This document gives a brief overview of the Backtrace Debugging Platform and its components. If you want to start integrating the platform, please see the Getting Started menu above.
Backtrace is a post-mortem crash debugging platform that helps you triage and fix bugs more effectively in applications and operating systems. When your application fails, Backtrace takes a snapshot of the faulting component and surrounding environment at blazing speeds, analyzes it and archives it in a centralized object store.
At a high level, Backtrace includes:
- A fast, extensible snapshot generator to capture critical data on demand, even at the time of fault.
- A set of analysis modules to extract and highlight anomalous behavior like stack overflows, null dereferences, heap mismanagement issues, and much more.
- An object store to aggregate and collaborate on data captured across your environments.
- Workflow integrations that plug the rich data set captured into workflow tools like Slack and JIRA.
- Web and Terminal interfaces to visualize and deep dive into the data captured.
Below is a simple architectural diagram of the Backtrace Platform.
These components bring together a system that allows teams to easily track, prioritize and act on incidents as they happen. Backtrace easily supports software whether it's provided as a service, shrink-wrapped and shipped, or bundled up with hardware as an appliance.
The following sections will give a deeper look into each of these components and how they come together.
Snapshot Generator
Backtrace's Snapshot Generator builds on top of our fast tracer to capture application state and the surrounding environment. The data is stored in a structured, self-contained format we call a snapshot. A snapshot contains:
- The stack trace across all threads.
- Regions of memory backing reachable objects on the stack and heap.
- Requested global variables.
- Environmental information like virtual memory stats, CPU stats, process state and more.
- Any contextual metadata you choose. This includes things like data center, customer, version, and environment. Our snapshot format doesn't impose any restrictions on your metadata.
- Annotations and classifiers added by analysis modules to highlight anomalous behavior. You can easily ship your own modules using our Lua or C API.
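To make the list above concrete, here is a minimal Python sketch of how such a snapshot could be modeled as a data structure. The class and field names are hypothetical and chosen for illustration only; they are not Backtrace's actual snapshot format or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Frame:
    """One stack frame (hypothetical model)."""
    function: str
    variables: Dict[str, str] = field(default_factory=dict)


@dataclass
class Thread:
    """One captured thread; frames are ordered innermost first."""
    tid: int
    frames: List[Frame]
    faulted: bool = False


@dataclass
class Snapshot:
    """Illustrative container for the kinds of data listed above."""
    threads: List[Thread]
    memory_regions: Dict[int, bytes] = field(default_factory=dict)  # start address -> bytes
    globals_: Dict[str, str] = field(default_factory=dict)          # requested global variables
    environment: Dict[str, str] = field(default_factory=dict)       # VM stats, CPU stats, ...
    metadata: Dict[str, str] = field(default_factory=dict)          # free-form user metadata
    classifiers: List[str] = field(default_factory=list)            # added by analysis modules

    def faulting_thread(self) -> Optional[Thread]:
        """Return the first thread marked as faulted, or None."""
        return next((t for t in self.threads if t.faulted), None)


# Example values are invented for the sketch.
snapshot = Snapshot(
    threads=[
        Thread(tid=1, frames=[Frame("main")]),
        Thread(tid=2, frames=[Frame("memcpy"), Frame("handle_request")], faulted=True),
    ],
    metadata={"datacenter": "us-east", "version": "1.4.2"},
    classifiers=["null-dereference"],
)
```

Keeping metadata as a free-form string map mirrors the point above that the format places no restrictions on what you attach.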
The Backtrace Snapshot Generator works across a variety of environments and targets. It can attach to live processes for on-demand snapshots, or generate a snapshot from a user-space or kernel coredump (kernel coredumps are currently supported only on FreeBSD). The Backtrace Snapshot Generator runs on Linux, FreeBSD and OmniOS.
The chart below compares our snapshot generator to GDB and LLDB generating a simple stack trace from Chromium. During this test, GDB and LLDB generate only a stack trace, while the Backtrace Snapshot Generator gathers all of the data listed above, performs automated analysis, and serializes this data to disk.
Analysis Modules
Backtrace's Snapshot Generator takes advantage of our fast core libraries to run automated analysis to assist in root-cause investigation. Our snapshot generator comes with a set of analysis modules to pinpoint the cause of the crash and in some cases peripheral bugs that can cause crashes down the road. For example, our snapshot generator will automatically disambiguate the direction of the fault, highlight aliases of the faulting operands across all threads, check function constraint violations in commonly used code like memcpy and realloc, and annotate common heap mismanagement issues for popular memory allocators like jemalloc and ptmalloc.
We expose the same Lua and C API our analysis modules use to all of our users. You can use this functionality to specialize our snapshot generator for your software, codify common and expert debugging practices, and bubble up application state crucial for incident response. Example output from our analysis modules appears in the Hydra section below.
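As a rough illustration of the kind of check an analysis module might perform, the Python sketch below flags a memcpy call whose source and destination ranges overlap (undefined behavior in C) and tags faults at very low addresses as likely null dereferences. The function names, the annotation strings, and the 4096-byte page size are assumptions made for this example; the real modules are written against the Lua and C API, which is not shown here.

```python
from typing import List, Optional


def check_memcpy_overlap(dest: int, src: int, n: int) -> Optional[str]:
    """Return an annotation if [dest, dest+n) and [src, src+n) overlap."""
    if n > 0 and dest < src + n and src < dest + n:
        return f"memcpy: source and destination overlap ({n} bytes)"
    return None


def classify_fault_address(addr: int, page_size: int = 4096) -> List[str]:
    """Tag faults within the first page as likely null dereferences."""
    return ["null-dereference"] if 0 <= addr < page_size else []


# Hypothetical arguments recovered from a faulting frame.
annotation = check_memcpy_overlap(dest=0x1000, src=0x1008, n=16)
classifiers = classify_fault_address(0x10)
```

A check like this is cheap enough to run on every snapshot, which is what makes it useful for catching peripheral bugs before they cause a crash.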
Object Store
Snapshots are sent to and aggregated in Backtrace's object store, also known as coroner. The coroner object store makes the wealth of information across your snapshots easily accessible. You can issue queries to gain insight into fault data across your entire system, such as which customers are affected by certain types of faults and which versions are currently causing incidents. You can even generate a histogram that shows how frequently certain functions appear in the faulted thread's stack trace.
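The queries described above amount to grouping and counting over fault records. The following Python sketch shows the idea on a small in-memory list; the record layout, field names, and sample values are invented for illustration and do not reflect coroner's actual query interface.

```python
from collections import Counter

# Hypothetical fault records; "stack" is the faulted thread's stack trace.
faults = [
    {"customer": "acme", "version": "1.4.2", "classifier": "null-dereference",
     "stack": ["memcpy", "parse_header", "handle_request"]},
    {"customer": "acme", "version": "1.4.2", "classifier": "stack-overflow",
     "stack": ["recurse", "recurse", "handle_request"]},
    {"customer": "globex", "version": "1.5.0", "classifier": "null-dereference",
     "stack": ["memcpy", "handle_request"]},
]


def customers_affected(records, classifier):
    """Which customers have seen a given type of fault?"""
    return sorted({r["customer"] for r in records if r["classifier"] == classifier})


def faults_per_version(records):
    """How many faults has each version produced?"""
    return Counter(r["version"] for r in records)


def function_histogram(records):
    """Count, per function, the number of faults whose stack contains it."""
    hist = Counter()
    for r in records:
        hist.update(set(r["stack"]))  # count each function once per fault
    return hist
```

Counting each function once per fault keeps deep recursion in a single stack from dominating the histogram.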
Web UI
Console, our web UI, provides a bird's-eye view of the data stored inside coroner.
As soon as you log in, you are greeted with an activity summary across all of your software projects. You can click on each software project to inspect the different types of faults affecting it.
On the left, you'll see gathered metadata and automated analysis statistics along with controls to define sort order, filters, etc. On the right, you'll see the time range for each fault group along with a color box indicating activity. Groups with darker colors have seen more faults than those with lighter colors.
More information about each group can be viewed by clicking on View Details. This page shows specific instances of the fault with the option of graphing these instances on a jitter plot, line graph (number of instances over time) or a time-based heat map.
The left pop-out menu shows metadata statistics associated with the group, and the right pop-out menu shows the first and last occurrence, added classifiers, and the faulted thread's stack trace (if one exists). Snapshot instances for each group are listed below the graph. To the right of each listing is a small icon; clicking it copies the command to pull down the corresponding snapshot.
Terminal UI
Backtrace's Hydra lets you view the internals of each snapshot in a powerful yet approachable terminal UI.
The initial view focuses immediately on the faulting context. Hydra's main layout has four panes (top to bottom):
- Thread pane: list of threads in the process captured.
- Stack pane: stack trace for the thread selected.
- Variable pane: variables for the frame selected.
- Peripheral pane: displays a variety of information based on a menu. This includes metadata and classifiers associated with the snapshot, kernel stack trace associated with the thread selected (Linux only), process memory map, registers, source code with integration into your favorite SCM, annotations from our automated analysis, system statistics and much more.
Below are some examples of what viewing a snapshot in Hydra looks like. A more in-depth walkthrough of Hydra can be found here.
Workflow Integrations
Backtrace's Object Store plugs into your existing workflow tools like Slack and JIRA. The rich data set and automated analysis stored in the object store can be sent to these services every time a new fault or fault type is seen. An example of our Slack integration can be seen below.
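Independently of the screenshot, the "notify when a new fault type is seen" behavior can be sketched in a few lines of Python. This is not Backtrace's integration code: the `FaultNotifier` class, the fingerprint string, and the message wording are hypothetical. The only external detail relied on is that Slack incoming webhooks accept a JSON body with a `text` field; the sketch builds that body but does not send it.

```python
import json
from typing import Optional


class FaultNotifier:
    """Builds a Slack-style message only the first time a fault type appears."""

    def __init__(self) -> None:
        self.seen = set()  # fingerprints of fault types already reported

    def payload_if_new(self, fingerprint: str, summary: str) -> Optional[str]:
        """Return a JSON body for a webhook POST, or None for a known fault type."""
        if fingerprint in self.seen:
            return None
        self.seen.add(fingerprint)
        return json.dumps({"text": f"New fault type {fingerprint}: {summary}"})


notifier = FaultNotifier()
first = notifier.payload_if_new("memcpy|handle_request", "null dereference in memcpy")
repeat = notifier.payload_if_new("memcpy|handle_request", "null dereference in memcpy")
```

Deduplicating on a fingerprint keeps a recurring crash from flooding a channel; notifying on every fault instead would simply skip the `seen` check.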
Conclusion
Backtrace improves system availability and software quality by bringing automation to incident detection, response and resolution. The Backtrace platform automatically snapshots faulting applications and their surrounding environment, then analyzes and archives them in a centralized object store. Our technology captures incidents in detail and makes this data accessible so that bugs aren't missed and are fixed faster.
If you've already signed up for a trial and received your license number, you can start by exploring our Getting Started menu above. Otherwise, please feel free to sign up for a trial on our website or reach out to us.