The Perils of Parsing Integers in C

Parsing a string into an integer seems like one of the most trivial tasks in programming. In high-level languages like Python, a simple int("123") suffices. However, in C, this basic operation becomes a minefield of undefined behavior, silent failures, and counterintuitive results. For developers handling untrusted input, the standard library options are not just inconvenient—they are often dangerous.

The Standard Library's Broken Promises

To be considered "correct," an integer parser should ensure that the numerical value stored matches the string provided, that the entire string is consumed (no trailing garbage), and that any failure—whether due to an empty string or an overflow—is explicitly reportable to the caller.

C provides several ways to attempt this, but most fail these basic criteria.

The Failure of `atol()`

atol() is perhaps the most egregious example. It provides no way to signal an error. If you pass it "timmy", it returns 0. If you pass it an empty string, it returns 0. If you pass it a number that exceeds the limits of a long, the behavior is technically undefined according to POSIX and C standards.
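The ambiguity can be seen in a few lines. This is only an illustrative sketch: `atol_conflates` is a hypothetical helper name, not part of any library, and it is only meant to be called with inputs that fit in a long, since out-of-range input is undefined.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: reports whether atol() gives two inputs the same
   result, i.e. whether a caller could tell them apart afterwards. */
static bool atol_conflates(const char *a, const char *b)
{
  return atol(a) == atol(b);
}
```

With this helper, atol_conflates("0", "timmy") and atol_conflates("0", "") both return true: a valid zero, a non-number, and an empty string are indistinguishable to the caller.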

As one community member noted, the design of atol() defies basic sanity:

"How could an api for number parsing ever be designed to return 0 for invalid input, for a function where 0 is also a common... return value for a valid input?"

The Complexity of `strtol()`

strtol() is the only standard C function that can be used correctly, but only with significant boilerplate. To use it safely, a developer must:

Manually check for empty strings.
Pre-set errno to 0 to detect range errors.
Check the endptr to ensure no trailing non-numeric characters exist.
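The checks listed above can be sketched as follows. This is a minimal sketch rather than a definitive implementation: `parse_long` is a hypothetical name, base 10 is assumed, and rejecting leading whitespace is an extra assumption on top of the three steps, since strtol() would otherwise skip it silently.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parses a base-10 long from s into *out.
   Returns false on empty input, leading whitespace, no digits,
   trailing characters, or a value outside the range of long. */
static bool parse_long(const char *s, long *out)
{
  char *end;

  /* strtol() skips leading whitespace itself, so reject it up front. */
  if (*s == '\0' || isspace((unsigned char)*s)) return false;

  errno = 0;                         /* clear stale errors before the call */
  long value = strtol(s, &end, 10);

  if (end == s) return false;        /* no digits were consumed */
  if (*end != '\0') return false;    /* trailing characters remain */
  if (errno == ERANGE) return false; /* overflow or underflow */

  *out = value;
  return true;
}
```

Note that *out is only written on success, so a caller can keep a default value in place when parsing fails.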

While this works for signed longs, the unsigned equivalent, strtoul(), introduces a bizarre behavior: it accepts a leading minus sign and negates the parsed value in the unsigned type, which wraps it around to a large positive number. For example, passing "-1" to strtoul() on a platform where unsigned long is 64 bits wide returns 18446744073709551615 ($2^{64}-1$), with no error reported. This makes it impossible to tell from the return value alone whether the input was a legitimately large unsigned number or a negative number that was silently wrapped.
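A short sketch makes the problem concrete. `strtoul_accepts` is a hypothetical helper that applies the usual errno and endptr checks, assuming base 10; it is an illustration, not a recommended API.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: true when strtoul() consumes all of s without
   reporting an error; the parsed value is stored in *out. */
static bool strtoul_accepts(const char *s, unsigned long *out)
{
  char *end;

  errno = 0;
  *out = strtoul(s, &end, 10);
  return end != s && *end == '\0' && errno == 0;
}
```

Calling strtoul_accepts("-1", &v) returns true and leaves v equal to ULONG_MAX, even though every conventional check was performed.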

The Unreliability of `sscanf()`

sscanf() is often tempting due to its concise syntax, but it suffers from the same issues as strtoul(), and it cannot distinguish between a value that is exactly ULONG_MAX and a value that is out of range. The C standard goes further and leaves the behavior undefined when the converted value cannot be represented in the target object. This ambiguity makes it unsuitable for any application where data integrity is paramount.
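The following sketch shows how little the return value of sscanf() says about the input. `scan_ulong` is a hypothetical helper, and the snippet deliberately avoids out-of-range input.

```c
#include <stdio.h>

/* Hypothetical helper: parses with sscanf() and returns its result,
   the number of successful conversions (1 when %lu matched). */
static int scan_ulong(const char *s, unsigned long *out)
{
  return sscanf(s, "%lu", out);
}
```

Here scan_ulong("12abc", &v) returns 1 with v set to 12, so trailing characters go unnoticed, and scan_ulong("-1", &v) also returns 1, because %lu accepts the same optionally signed input as strtoul(); on common implementations v then holds ULONG_MAX.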

The C++ Alternative

C++ inherits many of these issues through std::stoul() and std::istringstream, which still allow negative numbers to be parsed as unsigned. However, C++17 introduced std::from_chars(), which finally provides a robust solution. It recognizes a minus sign only when the target type is signed, does not accept a leading plus sign or skip leading whitespace, and returns a structured result (std::from_chars_result) containing both a pointer to the first unparsed character and an error code. The caller still has to compare that pointer against the end of the input to reject trailing characters.

Practical Workarounds and Implementation

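If you are restricted to C and need to parse unsigned longs safely, you cannot rely on a single standard library call. A common workaround is to parse the string with strtol() first to check for a negative value. Casting that result to unsigned is not enough, because it only covers values up to LONG_MAX, so the actual value has to be parsed again with strtoul().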

For a more comprehensive solution, a custom wrapper is necessary. A proposed robust implementation, strtoul_noneg, follows this logic:

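#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parses an unsigned long, rejecting empty input, leading whitespace, negative
   values, trailing characters, and range errors. endptr must not be NULL. */
static inline bool strtoul_noneg(unsigned long *out, const char *nptr, char **restrict endptr, int base)
{
  /* Reject empty input and leading whitespace, which strtoul() would skip. */
  if (!*nptr || isspace((unsigned char)*nptr)) return false;
  /* Parse as signed first, only to detect a negative value. */
  if (strtol(nptr, endptr, base) < 0) return false;

  errno = 0; /* clear any stale error so a range error can be detected */
  *out = strtoul(nptr, endptr, base);
  /* Fail if unparsed characters remain or strtoul() reported an error. */
  if (**endptr || errno) return false;

  return true;
}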
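To show how the wrapper might be called, the sketch below adds a hypothetical convenience function, `parse_ulong`, that fixes the base at 10 and hides the end pointer. The wrapper itself is repeated, with an unsigned char cast added for isspace(), only so that the snippet compiles on its own.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Copy of the wrapper above, so this snippet is self-contained. */
static bool strtoul_noneg(unsigned long *out, const char *nptr,
                          char **restrict endptr, int base)
{
  if (!*nptr || isspace((unsigned char)*nptr)) return false;
  if (strtol(nptr, endptr, base) < 0) return false;

  errno = 0;
  *out = strtoul(nptr, endptr, base);
  if (**endptr || errno) return false;

  return true;
}

/* Hypothetical convenience wrapper: base 10, whole string must match. */
static bool parse_ulong(const char *s, unsigned long *out)
{
  char *end;
  return strtoul_noneg(out, s, &end, 10);
}
```

With this, parse_ulong("42", &v) succeeds, while "-1", " 42", "42abc", and the empty string are all rejected. Two quirks remain: "-0" and "+42" are still accepted, and *out may be overwritten even when the function returns false.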

Why This Matters

Some may argue that these edge cases are rare or that "C is not the C standard library." While it is true that C gives developers the freedom to write their own utilities, the presence of broken defaults in the standard library leads to systemic vulnerabilities.

Incorrect parsing isn't just a bug; it's a security risk. If a program parses a bit field or an Access Control List (ACL) and silently converts a negative number into a massive positive one, it can lead to privilege escalation or memory corruption.

As the saying goes, "Knives should have handles." A programming language can be sharp and powerful, but its basic utilities should provide a safe way to hold them. When the standard tools for a task as simple as integer parsing are broken, the responsibility falls entirely on the developer to build their own safety rails.
