Fast Customization of Log Messages
Treating an integer as its individual bits is useful for manipulating a vector of booleans, a trick known as bit flags or bit fields.
Generally when this is done, the integer is used solely for this purpose and bit fields are masked out, checked, and updated individually. However, there are still uses for reinterpreting the integer as its whole numerical value. I will go over a practical example.
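As a minimal sketch of the usual masking operations, setting, clearing, and testing a single bit might look like the following. The names `Option`, `OptionA` through `OptionC`, `Set`, `Clear`, and `Has` are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>

// Hypothetical flags for illustration; each one occupies a single bit.
enum Option : std::uint32_t {
    OptionA = 1u << 0, // 001
    OptionB = 1u << 1, // 010
    OptionC = 1u << 2, // 100
};

// Set a flag: OR the bit in.
constexpr std::uint32_t Set(std::uint32_t bits, Option o) {
    return bits | o;
}

// Clear a flag: AND with the inverted mask.
constexpr std::uint32_t Clear(std::uint32_t bits, Option o) {
    return bits & ~static_cast<std::uint32_t>(o);
}

// Test a flag: mask out every other bit and compare against zero.
constexpr bool Has(std::uint32_t bits, Option o) {
    return (bits & o) != 0;
}
```

Read as a whole number instead, the value with `OptionA` and `OptionC` set is simply 5, which is the view the rest of this article relies on.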
Premise
One might have a logging API with options to customize the output: flags to include source line, file, severity, etc. Different configurations might look like:
// file, line, message
"parse.cpp(8): null argument\n"
// severity, message
"warning: implicit cast is lossy\n"
// file, severity, code, message
"math.cpp: error 4015: division by zero\n"
It is useful to list out the settings we want. Here are the ones I will use:
Bit | Usage |
---|---|
00001 | File name |
00010 | Line number |
00100 | Row number |
01000 | Severity |
10000 | Warning/error code |
We will try to implement this flexible logging schema ourselves, with the settings being a single unsigned integer whose bits are treated as the flags listed above. Imagine a report structure with the incoming data and corresponding flag enumeration:
struct Report {
    std::string Message;
    std::string FileName;
    std::string Severity;
    int LineNumber;
    int RowNumber;
    int Code;
    int Flags;
};

enum Flag {
    FileName   = 0x01,
    LineNumber = 0x02,
    RowNumber  = 0x04,
    Severity   = 0x08,
    Code       = 0x10,
};
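To make the flag values concrete, the three configurations from the premise can be written as combinations of these enumerators. This is only a sketch: the constant names `FileAndLine`, `SeverityOnly`, and `FileSeverityCode` are hypothetical, and the enumeration is repeated so the snippet stands alone.

```cpp
enum Flag {
    FileName   = 0x01,
    LineNumber = 0x02,
    RowNumber  = 0x04,
    Severity   = 0x08,
    Code       = 0x10,
};

// "parse.cpp(8): null argument\n"
constexpr unsigned FileAndLine = FileName | LineNumber;           // 00011

// "warning: implicit cast is lossy\n"
constexpr unsigned SeverityOnly = Severity;                       // 01000

// "math.cpp: error 4015: division by zero\n"
constexpr unsigned FileSeverityCode = FileName | Severity | Code; // 11001

// Read as plain numbers, the same three settings are 3, 8, and 25.
static_assert(FileAndLine == 3u, "file + line");
static_assert(SeverityOnly == 8u, "severity only");
static_assert(FileSeverityCode == 25u, "file + severity + code");
```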
First Attempt
An ad hoc implementation might use C++'s std::stringstream and std::cout, or C#'s StringBuilder class. Control statements and conditions would be used to build segments of the final message.
Start with the first setting listed, file name:
if (report.Flags & Flag::FileName) {
    std::cout << report.FileName;
    std::cout << ": ";
}
std::cout << report.Message << "\n";
// possible output:
// "foo.cpp: hello world\n"
Looks alright. Now, try to integrate line/row numbers:
if (report.Flags & Flag::FileName) {
    std::cout << report.FileName;
    std::cout << ": ";
}
std::cout << "(";
if (report.Flags & Flag::LineNumber) {
    std::cout << report.LineNumber;
    std::cout << ",";
}
if (report.Flags & Flag::RowNumber) {
    std::cout << report.RowNumber;
}
std::cout << "): ";
std::cout << report.Message << "\n";
// possible output:
// "foo.cpp: (0,0): hello world\n"
Obviously, this implementation is starting to run into some issues.
1. There is an extra colon between the file name and parenthesis.
2. If there is a line number but no row number, there will be a trailing comma.
3. If there is no line or row number, empty parentheses will be printed.
Problem #3 hints at a larger issue of missing or optional data. This will be addressed gracefully later. For now, we will focus on the rising complexity for each log message to support the custom schema.
Second Attempt
Conditionally building the punctuation around the report fields based on presence and settings is both tedious for the programmer and slow for the machine. Switching to modern C++20 output via std::format, we can achieve something much better.
Note: Before C++20, the library {fmt} may be used as a stand-in.
Similar to listing the settings, let us list the fields of the report structure that we will make available for substitution:
Index (n) | Field |
---|---|
0 | report.Message |
1 | report.FileName |
2 | report.LineNumber |
3 | report.RowNumber |
4 | report.Severity |
5 | report.Code |
With each field substituted with {n}, write out the different combinations in a list:
/* 00000 */ "{0}\n"
/* 00001 */ "{1}: {0}\n"
/* 00010 */ "({2}): {0}\n"
/* 00011 */ "{1}({2}): {0}\n"
/* 00100 */ "({3}): {0}\n"
/* 00101 */ "{1}({3}): {0}\n"
/* 00110 */ "({2},{3}): {0}\n"
/* 00111 */ "{1}({2},{3}): {0}\n"
/* 01000 */ "{4}: {0}\n"
/* 01001 */ "{1}: {4}: {0}\n"
/* 01010 */ "({2}): {4}: {0}\n"
/* 01011 */ "{1}({2}): {4}: {0}\n"
/* 01100 */ "({3}): {4}: {0}\n"
/* 01101 */ "{1}({3}): {4}: {0}\n"
/* 01110 */ "({2},{3}): {4}: {0}\n"
/* 01111 */ "{1}({2},{3}): {4}: {0}\n"
/* 10000 */ "{5}: {0}\n"
/* 10001 */ "{1}: {5}: {0}\n"
/* 10010 */ "({2}): {5}: {0}\n"
/* 10011 */ "{1}({2}): {5}: {0}\n"
/* 10100 */ "({3}): {5}: {0}\n"
/* 10101 */ "{1}({3}): {5}: {0}\n"
/* 10110 */ "({2},{3}): {5}: {0}\n"
/* 10111 */ "{1}({2},{3}): {5}: {0}\n"
/* 11000 */ "{4} {5}: {0}\n"
/* 11001 */ "{1}: {4} {5}: {0}\n"
/* 11010 */ "({2}): {4} {5}: {0}\n"
/* 11011 */ "{1}({2}): {4} {5}: {0}\n"
/* 11100 */ "({3}): {4} {5}: {0}\n"
/* 11101 */ "{1}({3}): {4} {5}: {0}\n"
/* 11110 */ "({2},{3}): {4} {5}: {0}\n"
/* 11111 */ "{1}({2},{3}): {4} {5}: {0}\n"
Reinterpret each combination of bit flags as its entire numerical value. If we then sort the combinations in ascending order, a lookup table is created. The idea is simple: interpret the integer not as a set of bit flags but as a single numerical value, and use that value as an index into the lookup table.
Let's call the list of schemas above Schemas. The implementation collapses to tidy terms:
std::cout << std::vformat(
    Schemas[report.Flags],
    std::make_format_args(
        report.Message,    // {0}
        report.FileName,   // {1}
        report.LineNumber, // {2}
        report.RowNumber,  // {3}
        report.Severity,   // {4}
        report.Code        // {5}
    )
);
Note: With C++26, this may be simplified even further with std::print (available since C++23) and std::runtime_format.
Handling Optional Data
As mentioned in problem #3, if the user allows a particular field to be reported yet that field is missing for this particular report, one might want to handle this case gracefully.
If this is desired, the simplest solution is to filter the flags integer before it is passed to lookup. Unset each flag if the corresponding field in the report is missing:
int flags = report.Flags;
if (report.FileName.empty()) {
    flags &= ~Flag::FileName;
}
if (report.LineNumber <= 0) {
    flags &= ~Flag::LineNumber;
}
// ....
std::cout << std::vformat(
    Schemas[flags],
    // ....
This adds some code, but the growth is linear: one check per field rather than one branch per combination of fields. The code remains flat and maintainable.
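Filled out for all five fields, the filter might look like the following sketch. `EffectiveFlags` is a hypothetical helper name. The tests for a missing row number, severity, and code are assumptions made here by analogy with the two checks shown above.

```cpp
#include <string>

struct Report {
    std::string Message;
    std::string FileName;
    std::string Severity;
    int LineNumber;
    int RowNumber;
    int Code;
    int Flags;
};

enum Flag {
    FileName   = 0x01,
    LineNumber = 0x02,
    RowNumber  = 0x04,
    Severity   = 0x08,
    Code       = 0x10,
};

// Hypothetical helper: returns report.Flags with the bit of every
// missing field cleared, ready to be used as the lookup index.
int EffectiveFlags(const Report& report) {
    int flags = report.Flags;
    if (report.FileName.empty()) flags &= ~Flag::FileName;
    if (report.LineNumber <= 0)  flags &= ~Flag::LineNumber;
    if (report.RowNumber <= 0)   flags &= ~Flag::RowNumber; // assumed: <= 0 means missing
    if (report.Severity.empty()) flags &= ~Flag::Severity;  // assumed: empty means missing
    if (report.Code <= 0)        flags &= ~Flag::Code;      // assumed: <= 0 means missing
    return flags;
}
```

The result can only have fewer bits set than `report.Flags`, so it is always a valid index into the same table.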
Conclusion
Using a bit-field integer as an index is a fast way to obtain data based on the combination of bit flags currently set in the integer.
Here, it is used to select the format of a log message in O(1) time. Another possible use is to calculate population counts.
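As a rough illustration of the population count idea, the sketch below indexes a 16-entry table with each 4-bit group of the value. The names `NibbleCount` and `PopCount` are hypothetical, and processing the value four bits at a time is just one possible table size.

```cpp
#include <cstdint>

// Number of set bits in every 4-bit value, indexed by that value.
constexpr std::uint8_t NibbleCount[16] = {
    0, 1, 1, 2, 1, 2, 2, 3,
    1, 2, 2, 3, 2, 3, 3, 4,
};

// Hypothetical helper: count the set bits of a 32-bit value by looking up
// one nibble at a time.
constexpr int PopCount(std::uint32_t value) {
    int count = 0;
    for (; value != 0; value >>= 4) {
        count += NibbleCount[value & 0xFu];
    }
    return count;
}
```

In practice, C++20's std::popcount from `<bit>` covers this particular case. The table version is shown only because it uses the same indexing trick as the schema lookup.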