Why is Python rounding wrong?

New Python programmers discover something strange the first time they use the built-in round function:

$ python -c 'print(round(0.5), round(1.5), round(2.5))'
0 2 2

Did you expect to see “1 2 3” instead? This isn’t a bug. Python 3 uses a different rounding method than the “half up” most of us were taught in school. There are a bunch of ways to round numbers, and Python chose the “half even” approach, which rounds a value exactly halfway between two whole numbers to whichever neighbor is even. So 0.5 becomes 0, while 1.5 and 2.5 both become 2, each landing on its nearest even number.

The argument for this as “the right round” is that it’s more statistically valid; i.e., where HALF_UP increases the discrepancy (as defined below) over longer lists of numbers, HALF_EVEN does not.

    discrepancy = | sum(round(x[1..n])) - round(sum(x[1..n])) |

This difference between the sum of the rounds and the round of the sum grows with more numbers for HALF_UP because every tie rounds up: each one nudges the total up only a little, but the nudges all point the same way and accumulate. With HALF_EVEN, on average half the 0.5’s will round up and half will round down, so they balance out over the long run. Looking at the numbers from our example code above, we can see the problem:
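A minimal sketch that recomputes the discrepancy for 0.5, 1.5, and 2.5 under both modes. It assumes the standard library's decimal module as a stand-in for HALF_UP and HALF_EVEN; round_with and discrepancy are hypothetical helper names, not anything built into Python.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

def round_with(value, mode):
    """Round a Decimal to a whole number using the given decimal rounding mode."""
    return int(value.quantize(Decimal("1"), rounding=mode))

def discrepancy(values, mode):
    """| sum(round(x)) - round(sum(x)) |, using the same mode for both rounds."""
    sum_of_rounds = sum(round_with(v, mode) for v in values)
    round_of_sum = round_with(sum(values), mode)
    return abs(sum_of_rounds - round_of_sum)

# The three numbers from the example above; their true sum is 4.5.
values = [Decimal("0.5"), Decimal("1.5"), Decimal("2.5")]

for name, mode in (("HALF_UP", ROUND_HALF_UP), ("HALF_EVEN", ROUND_HALF_EVEN)):
    rounds = [round_with(v, mode) for v in values]
    print(name, rounds, "discrepancy:", discrepancy(values, mode))
```

With HALF_UP the rounds are 1, 2, 3 (sum 6) against a rounded sum of 5, so the discrepancy is 1. With HALF_EVEN the rounds are 0, 2, 2 (sum 4) against a rounded sum of 4, so the discrepancy is 0.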

Note that the discrepancy calculation uses the same rounding type for the “round of the sum”. The problem is even more apparent when comparing the sums directly.
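To compare the sums directly on a longer list, here is a small sketch under the same assumption: decimal stands in for the two modes, and round_with is a hypothetical helper. The input is deliberately a worst case, 100 values that all end in .5, so real data would show a smaller effect.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

def round_with(value, mode):
    """Round a Decimal to a whole number using the given decimal rounding mode."""
    return int(value.quantize(Decimal("1"), rounding=mode))

# 0.5, 1.5, ..., 99.5: every single value is a tie.
values = [Decimal(k) + Decimal("0.5") for k in range(100)]

true_sum = sum(values)
half_up_sum = sum(round_with(v, ROUND_HALF_UP) for v in values)
half_even_sum = sum(round_with(v, ROUND_HALF_EVEN) for v in values)

print("true sum:     ", true_sum)
print("HALF_UP sum:  ", half_up_sum)
print("HALF_EVEN sum:", half_even_sum)
```

On this input the true sum is 5000, the HALF_UP sum drifts to 5050 (half a unit of bias per value), and the HALF_EVEN sum stays at 5000.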

Clearly HALF_EVEN is providing a better result for downstream calculations. Is this upward bias with HALF_UP an issue for most rounding use cases? Is it enough of a problem to change expected behavior of a function common across many tools and languages? For me personally, no.

If I reach for round() in Python, I’m probably displaying a number after a single calculation or aggregating a relatively small set of numbers. There’s an audience for those numbers, and just like me those people are expecting rounding as HALF_UP unless told otherwise.

When I’m aggregating numbers within a single program run, I’m much more likely to round the sum than sum the rounds to decrease error: rounding by definition loses information and is one of the last things you should do in a calculation chain.

I might hit the discrepancy problem when calculating lots of numbers and rounding them to store in a database, where others then perform aggregates on the stored results. However, those numbers are usually amounts of money, so the rounding rules are defined by currency and accounting standards instead of statistical concerns.

Surprise! It’s A Matter of Trust

If there’s no one “right” kind of rounding, and python had to choose a default behavior, why is HALF_EVEN controversial and–dare I say–WRONG? It violates the principle of least astonishment.

Our K12 education, most other programming languages, number factories like Excel and Google Sheets–they all round HALF_UP by default. How many people who discovered python’s choice in the middle of a late night programming marathon said “Wow, that’s just what I wanted!” versus “Wait. What? Why? WHY?!”. I was definitely in the latter camp.

I already have some issues with python because it does things differently. When switching among all the tools in my programming language toolbox, the vast majority of them are based on the C programming language. Python isn’t based on C, so there’s always some pain going back and forth–especially if I spent a long time in one language for a work project and switch for a new project or client.

Adding to my switching consternation are fashion-fad changes like ELIF (OMG I hate that as much as the admittedly older and didn’t-know-better IF/THEN/FI of shell scripting). ELIF never feels right. I look at it and think, what happened in Guido’s programming childhood that he felt he *had* to change this little bit of syntax?

So the fallout of the default round() decision for me is that I trust python less. It takes me out of solving the problem by having to struggle with the tool and ask myself, “Is this going to do what I expect?”

The Right Round() For You

Rounding is like any other tool: You need a good toolbox with enough options to choose the right one for the job. Java provides EIGHT kinds of rounding, originally defined as integer constants on BigDecimal and, since Java 1.5, available as the enum RoundingMode. My favorite option is UNNECESSARY: The rounding mode can be a property of the math context rather than an argument to a function, and with UNNECESSARY, trying to round a number that would actually lose digits in that context throws an ArithmeticException! No rounding for you!
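Python has no rounding mode named UNNECESSARY, but a rough analogue can be sketched with the standard library's decimal module by trapping its Inexact signal on a context. This is an approximation of the Java behavior under that assumption, not an equivalent; the name no_rounding is hypothetical.

```python
from decimal import Context, Decimal, Inexact

# A context that refuses to round: any operation that would discard
# non-zero digits raises decimal.Inexact instead of returning a result.
no_rounding = Context(traps=[Inexact])

whole = Decimal("1")  # quantize target: zero decimal places

# 2.0 -> 2 discards only a zero digit, so the result is exact and allowed.
print(no_rounding.quantize(Decimal("2.0"), whole))

try:
    # 2.5 -> whole number would have to drop the .5, so the trap fires.
    no_rounding.quantize(Decimal("2.5"), whole)
except Inexact:
    print("No rounding for you!")
```

As in the Java version, the refusal lives in the context, so every calculation routed through no_rounding gets the same guarantee.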

Being a “once burned, twice shy” kind of programmer, I tend to avoid round() in Python. I’m also usually neck-deep in Pandas DataFrame and Series objects, which have their own rounding methods. That creates its own level of distrust because now I have several methods with the same name in different contexts. Do they all do the same thing or different things? I can never remember, so I either write little tests or I just throw rounding over the fence by exporting the numbers unrounded and letting Excel round them for display.

If I have to round in python code, I might pinch off the rounding into a custom method where I TDD rounding cases so Present Me knows, Future Me knows, and maybe Future You knows if you end up in my code. “These numbers look wrong” is not the kind of surprise I want people to find when rummaging through my old bits.
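One way such a pinched-off helper and its test cases might look, as a sketch rather than a definitive implementation. The name round_half_up is hypothetical, and the helper assumes its input is a float or a numeric string that should be read the way it prints.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value, places=0):
    """Round to `places` decimal places, with ties going away from zero.

    The value goes through str() first so that a float like 1.005 is read
    as the decimal 1.005 rather than its nearby binary approximation.
    """
    exponent = Decimal(10) ** -places  # places=2 -> Decimal("0.01")
    return Decimal(str(value)).quantize(exponent, rounding=ROUND_HALF_UP)

# Rounding cases pinned down as plain assertions, so Future Me can rerun them.
assert round_half_up(0.5) == 1
assert round_half_up(1.5) == 2
assert round_half_up(2.5) == 3
assert round_half_up(2.4) == 2
assert round_half_up(1.005, 2) == Decimal("1.01")
```

The helper returns a Decimal rather than a float, so the rounded value survives until it is displayed or stored.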

When I care about numeric representation in Java, I’m probably using BigDecimal with a math context defining scale, precision, and rounding as a part of up-front data design. For python, I don’t have a go-to package for this kind of numeric control yet, so suggestions are welcome. Regardless of programming language, here are some things I consider when rounding:

  1. If I’m going to just display the rounded number, use HALF_UP because that’s what almost everybody looking at the numbers will expect.
  2. If I’m doing simple statistics with large numbers of rounded numbers, HALF_EVEN isn’t a bad choice. HALF_ODD works too. Hmm, I wonder if never having zero (even) as the last digit would help keep decimal places lining up in un-fancy display cases.
  3. If I’m expecting negative numbers, I’ll consider a few examples on either side of the number line just to be sure my code does what I expect–or at least what I think I’m telling it to do. Writing test cases forces me to think about it.
  4. If I’m working with dates and times, I’ll reach for ceiling or floor depending on how many weeks 8 days should be. (If those weeks are billable, I certainly have an upward bias!)
  5. If I’m going to show my python-rounded numbers and calculations to a bunch of mathematicians, quants, data scientists, or programmers with python-shaped axes to grind, I guess I’ll stare into the Rounding Wikipedia page for inspiration and wear my most comfy asbestos underwear.
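Items 3 and 4 are easy to get wrong, so here is a small sketch of the kind of checks I mean. Both helper names are hypothetical. The point is that two reasonable-looking “half up” implementations agree above zero and disagree below it.

```python
import math
from decimal import Decimal, ROUND_HALF_UP

def half_up_via_floor(x):
    """Ties go towards positive infinity: -2.5 -> -2."""
    return math.floor(x + 0.5)

def half_up_via_decimal(x):
    """decimal's ROUND_HALF_UP sends ties away from zero: -2.5 -> -3."""
    return int(Decimal(str(x)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))

# Item 3: examples on either side of the number line.
for x in (-2.5, -1.5, -0.5, 0.5, 1.5, 2.5):
    print(x, half_up_via_floor(x), half_up_via_decimal(x), round(x))

# Item 4: how many weeks is 8 days?
days = 8
print(math.floor(days / 7))  # 1 whole week elapsed
print(math.ceil(days / 7))   # 2 weeks started (the billable answer)
```

For -2.5 the three columns read -2, -3, and -2, so the choice of implementation matters as soon as negative numbers show up.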

Still not sure what round is right for you? Here’s the definitive RIGHT ROUND:

RIGHT ROUND baby RIGHT ROUND like a record baby RIGHT ROUND ROUND

The Other Side of the Argument

My friend Nicole over at Technically A Blog looks at the other side of this issue (link below). We decided to write contrasting blog posts after I felt obliged to vent at my very pythonic friend about it. Hopefully it will become a regular thing: let us know if there’s a particular (data/software-engineering-related) topic you’d like us to consider. You can always ping me on Twitter @thetmpfiles.

https://ntietz.com/blog/python-rounding/

Epilogue: Hearsain’t

FYI I had heard that rounding in Python format strings uses HALF_UP, which would have been extra egg on those scaled faces, but this code example shows format strings using HALF_EVEN too, at least with Python 3.10.7:

$ python -c 'print("%.0f %.0f %.0f" % (0.5, 1.5, 2.5))'
0 2 2
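The same check can be sketched for the other common formatting spellings. This assumes CPython 3 and only covers values like these that sit exactly on a tie in binary floating point.

```python
values = (0.5, 1.5, 2.5)

# Three ways to format a float with zero decimal places.
percent_style = " ".join("%.0f" % v for v in values)
format_builtin = " ".join(format(v, ".0f") for v in values)
f_string = " ".join(f"{v:.0f}" for v in values)

print(percent_style)
print(format_builtin)
print(f_string)
```

All three lines print 0 2 2, matching round().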

This is fine. Consistency builds trust–or at least the cold comfort of predictability.