Debug Information is Huge and What to do About It
Debug information is used by symbolic debuggers and the Backtrace platform to reconcile the state of a running process with the source-code form that is familiar to most software engineers. This information is responsible for mapping memory addresses and register values to function names and source-code locations, and for describing variables.
Some environments choose to omit debug information. Of the many reasons given, the ones that typically hold up are disk utilization and, arguably, intellectual property protection.
This article is a pragmatic introduction to debug information and provides insights on how to manage and deploy debug information.
Background
On systems such as Linux, FreeBSD, IllumOS and Mac OS, the underlying executable formats either reference external files with debug information or have debug information directly embedded. The respective file formats for these platforms express debug information in the following sections.
- .debug_aranges A mapping from memory addresses to compilation units in .debug_info
- .debug_info The primary debug information section. This section contains function names, type, variable, inline information and more.
- .debug_abbrev Describes the shape of debug information in .debug_info
- .debug_line Source-code location information, described as a state machine mapping memory addresses to files and line numbers.
- .debug_line_str A new section introduced in DWARF 5 that contains strings that are used by .debug_line
- .debug_str Strings used by sections such as .debug_info and .debug_line
- .debug_loc Location lists used to describe the location of values across ranges of memory addresses.
- .debug_loclists A more efficient representation of .debug_loc introduced in DWARF 5.
- .debug_ranges A mapping from memory addresses to debug information.
- .debug_rnglists A more efficient representation of .debug_ranges, introduced in DWARF 5.
- .debug_types Type specifications with pre-computed type signatures, allowing for efficient merging of common types and omission of unused type information. Introduced in DWARF 4, merged back into .debug_info with DWARF 5.
- .debug_pubtypes Maps type names to compilation unit debug information.
- .debug_pubnames Maps function names to compilation unit debug information.
...and more.
To learn more about how these sections are used, refer to the additional reading section.
Performance Impact of Symbols
Debug information does not impact performance.
Some users are concerned that directly embedding debug information can result in performance degradation. This is not true. Debug information does not impact executable code or statically initialized memory size. The size of your debug information will generally not impact the performance of your application. Your operating system will only load sections of the executable that are required for execution.
For example, below is a list of sections loaded from our object store server executable during execution. This executable is a total size of 19MB.
$ pmap `pgrep coronerd|head -1`|egrep 'coronerd$'
0000000000400000 3244K r-x-- coronerd
000000000072c000 36K r---- coronerd
0000000000735000 252K rw--- coronerd
The size of the different sections of the file is presented in the graph below. Over 80% of the 19MB executable is just debug information. However, only approximately 3.6MB of the file is loaded, corresponding to executable code, read-only data and statically allocated data (which includes mappings that don't correspond to data on disk).
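To check this proportion on your own binaries, one option is to sum the section sizes reported by `size -A`. The sketch below assumes the System V layout printed by GNU binutils `size`, with the section name in the first column and its size in bytes in the second. `debug_share` is a hypothetical helper name, and the result is a share of section bytes, which only approximates the share of the file on disk.

```shell
# debug_share: read `size -A` output on stdin and report how many bytes
# belong to .debug_* (or legacy .zdebug_*) sections.
# Assumes the GNU binutils System V layout: "<section> <size> <addr>".
debug_share() {
    awk '
        $1 ~ /^\.z?debug_/ { debug += $2 }   # debug sections only
        $1 == "Total"      { total = $2 }    # summary row printed by size -A
        END {
            if (total > 0)
                printf "%d of %d bytes (%.1f%%) are debug information\n",
                       debug, total, 100 * debug / total
        }'
}
```

A typical invocation would be `size -A coronerd | debug_share`.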
Stripping the executable removes the debug information sections, symbol strings and more. This results in an executable size of 3.5MB. A process activated from this executable has the same virtual memory mappings as the one loaded from the unstripped executable.
$ pmap `pgrep coronerd|head -1`|egrep 'coronerd$'
0000000000400000 3244K r-x-- coronerd
000000000072c000 36K r---- coronerd
0000000000735000 252K rw--- coronerd
Some applications explicitly wire pages of memory using facilities such as mlockall. This also has no impact: sections related to debug information will not be loaded into memory.
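One way to confirm this for a given binary is to look at the section flags: only sections flagged ALLOC occupy memory at run time. The following is a minimal sketch, assuming the layout printed by GNU `objdump -h`, where each section's flags appear on the line after its name. `debug_alloc_report` is a hypothetical helper name.

```shell
# debug_alloc_report: read `objdump -h` output on stdin and, for every debug
# section, say whether it carries the ALLOC flag (i.e. is mapped at run time).
# Assumes GNU objdump prints the flags on the line after the section name.
debug_alloc_report() {
    awk '
        $2 ~ /^\.z?debug_/ {
            name = $2
            if ((getline flags) > 0)          # next line holds the flags
                print name, (flags ~ /ALLOC/ ? "ALLOC" : "not loaded")
        }'
}
```

For example, `objdump -h coronerd | debug_alloc_report` should list each debug section as not loaded if the claim above holds for your toolchain.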
Reducing the Size of your Debug Information
Compression
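Many debuggers, including our own, support compressed debug sections. In the older GNU convention, these sections are identified simply by a .z prefix on the debug section name, for example .zdebug_info rather than .debug_info. Newer toolchains generally keep the original section names and mark the sections with the ELF SHF_COMPRESSED flag instead.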
If you are using gcc, you can enable this functionality by passing the -gz flag. Learn more here. For other compilers, your objcopy facility may also expose a --compress-debug-sections option. Learn more here.
Below, we compress the debug sections of the coronerd executable above.
$ objcopy coronerd --compress-debug-sections=zlib
This reduces the size of the executable from 19MB to 8.9MB. Your compression ratio will vary. Decompression does add overhead in many debuggers, but it is usually a fraction of the cost of actually parsing the debug information.
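Since the ratio varies, it can help to measure it before committing to a change. The sketch below compresses a copy of the executable rather than the original and reports the reduction as a percentage. The function names and the `.z` output suffix are hypothetical choices for illustration. Only the `objcopy --compress-debug-sections=zlib` invocation comes from the text above.

```shell
# file_size: print the size of a file in bytes (portable via wc -c).
file_size() {
    wc -c < "$1" | tr -d ' '
}

# savings_percent: print how much smaller NEW is than ORIG, as a percentage.
savings_percent() {
    orig=$(file_size "$1")
    new=$(file_size "$2")
    [ "$orig" -gt 0 ] || return 1            # avoid dividing by zero
    awk -v o="$orig" -v n="$new" 'BEGIN { printf "%.1f\n", 100 * (o - n) / o }'
}

# compress_copy: write a copy of EXE with zlib-compressed debug sections to
# EXE.z, leaving the original untouched, then report the savings.
compress_copy() {
    objcopy --compress-debug-sections=zlib "$1" "$1.z" &&
        savings_percent "$1" "$1.z"
}
```

Running `compress_copy coronerd` would leave `coronerd` as it was and print the percentage saved in `coronerd.z`. For the figures quoted above (19MB down to 8.9MB), that works out to a reduction of roughly 53%.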
Debug Fission
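DWARF 5 and recent compilers support several extensions referred to as split DWARF, or debug fission. These move most debug information out of object files and into separate .dwo files, so that it never reaches the linked executable. Learn more here.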
Tools
The dwz tool can be used to further reduce the size of your debug information. It requires uncompressed debug information and works by replacing DWARF information with an equivalent, smaller representation where possible. Below, we invoke dwz on the coronerd executable from above.
$ dwz coronerd
This reduces the size of the coronerd file from 19MB to 16MB. This can be stacked with compressed debug sections.
$ objcopy coronerd --compress-debug-sections=zlib
This results in an executable size of 7.6MB, down from 8.9MB.
It is also possible to run dwz on multiple files at once to achieve significantly higher compression ratios with stand-alone debug files.
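As a rough sketch of the multiple-file case, dwz has a multifile mode (`-m`) that moves DWARF shared between the inputs into a separate common file. The wrapper below only checks its arguments and forwards them. `dwz_multifile` and the file names in the usage note are hypothetical, and the inputs are rewritten in place, so it is safest to try this on copies.

```shell
# dwz_multifile: run dwz in multifile mode, moving DWARF that is common to
# all inputs into COMMON. Inputs are rewritten in place, so work on copies.
dwz_multifile() {
    if [ "$#" -lt 3 ]; then
        echo "usage: dwz_multifile COMMON FILE FILE..." >&2
        return 2
    fi
    common=$1
    shift
    dwz -m "$common" "$@"
}
```

For example, `dwz_multifile common.debug coronerd.debug other.debug` would deduplicate debug information across the two stand-alone debug files, assuming both exist and contain uncompressed DWARF.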
Stand-Alone Debug Files
On Linux systems or systems with GNU strip, pass the --only-keep-debug flag to strip and use objcopy to add a debug link to the stand-alone file. This allows you to distribute your debug information independently of your executable. This has no impact on performance. Learn more here.
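The steps above can be sketched as a small helper, assuming GNU binutils. The `split_debug` name and the `.debug` suffix are illustrative choices, and the intermediate `strip --strip-debug` step is an assumption about the desired end state (an executable that no longer carries its own debug sections). The helper modifies the executable in place.

```shell
# split_debug: move the debug information of EXE into EXE.debug and leave a
# .gnu_debuglink in EXE so debuggers can find it. Modifies EXE in place.
split_debug() {
    exe=$1
    if [ ! -f "$exe" ]; then
        echo "split_debug: no such file: $exe" >&2
        return 1
    fi
    # 1. Copy only the debug sections into a stand-alone file.
    strip --only-keep-debug -o "$exe.debug" "$exe" || return 1
    # 2. Remove the debug sections from the executable itself.
    strip --strip-debug "$exe" || return 1
    # 3. Record the stand-alone file's name and checksum in .gnu_debuglink.
    objcopy --add-gnu-debuglink="$exe.debug" "$exe"
}
```

After `split_debug coronerd`, the `coronerd.debug` file can be shipped or archived separately from the executable.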
Removing Debug Information
Depending on your requirements, you may not need all debug information. For example, if you only want accurate unwinding and source-code locations, you can retain only .debug_frame and .debug_line. The following objcopy invocation removes debug information unrelated to unwinding and source-code locations from the coronerd executable.
$ objcopy -R .debug_info -R .debug_abbrev -R .debug_aranges -R .debug_ranges -R .debug_loc -R .debug_str coronerd
This results in a 4.6MB executable, down from 19MB. The debugger is still able to unwind correctly, but some inline functions may be missing, as well as some function names (specifically, if no symbol is emitted for the function). However, the source-code locations themselves will be accurate.
You can apply the other techniques described in this article to shrink the result further.