Sunday, April 4, 2010

Perl and The Time It Owes Me

These days I'm working on my final Verilog project, in which I write a simulator, in Verilog, for a basic subset of Verilog syntax (netlists only).

The first stage in creating such a simulator is parsing the input .v file into two data structures: one holds the netlist and the other holds the list of cells in the file. This code was given to us by the instructor, and it's written in Perl.

To learn more about our design, I decided to write a Verilog simulator in Python. It is similar in concept to the one we're going to build. I wanted to verify that it behaves like the well-known software simulator we're working with.

After fixing some small bugs, I noticed that my code generates some events one time unit earlier than the real simulator. I checked my code again and verified that no bugs lurk there. Then I checked the tables created by my instructor's Perl script, and there it was: one line of badly parsed data.

The parser takes a line that looks like "bla bla bla 1.974 bla bla bla" and extracts the number with a regular expression. The number is always in the format d.ddd. The parser then converts this floating point number to an integer by multiplying it by 1000 and formatting the result with sprintf(). Casting with the int() function yields the same results.
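The conversion described above can be reproduced outside the Perl script. Below is a minimal Python sketch of it. The regular expression, the name `parse_delay_naive` and the sample lines are hypothetical stand-ins, not the instructor's actual code. Only the multiply-by-1000-then-truncate step mirrors the description.

```python
import re

# Hypothetical pattern for a number in the d.ddd format
DELAY_RE = re.compile(r"\b(\d\.\d{3})\b")


def parse_delay_naive(line):
    """Extract a d.ddd number and scale it to an integer by truncation."""
    match = DELAY_RE.search(line)
    if match is None:
        raise ValueError("no d.ddd number in line: %r" % line)
    value = float(match.group(1))  # stored as the nearest binary double, not the exact decimal
    return int(value * 1000)       # int() truncates toward zero, like a C cast


if __name__ == "__main__":
    for line in ["bla bla bla 1.974 bla bla bla", "bla bla bla 1.007 bla bla bla"]:
        print(line, "->", parse_delay_naive(line))
```

For 1.007 this returns 1006. The nearest double to 1.007 is slightly below it, and truncation throws away the remaining 0.9999... instead of rounding it up.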

Looks fine, doesn't it? That's what I thought too. But it seems that a floating point error causes a lot of trouble for specific numbers, like 1.007: after conversion you get 1006, plus some temporary madness.

If you don't believe me, check the following code:

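use strict;
use warnings;

my $num = 1.007;
print "Floating number: $num\n";       # prints 1.007

my $newNum = $num * 1000;
print "Possibly integer: $newNum\n";   # prints 1007 (rounded for display)

printf("Real integer: %d\n", $newNum); # prints 1006 (%d truncates)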

Say it's a bug, say it's a feature, I don't care.

This was tested with Perl 5.10.0 on Linux and Windows.

Update: to clear this up, the problem lies in the rounding of floating point numbers. It probably exists in every implementation that uses the IEEE 754 binary representation. The solution as I see it right now has two steps. First, format the floating point value as a string with "%.0f", which rounds rather than truncates (truncation is what casting to an integer does). Then convert that string to an integer.
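Here is a minimal Python sketch of that workaround, side by side with the truncating version. The function names are hypothetical. Python's "%.0f" string formatting plays the role of Perl's sprintf("%.0f").

```python
def to_int_truncated(value):
    """The buggy variant: int() truncates toward zero."""
    return int(value * 1000)


def to_int_rounded(value):
    """Scale a d.ddd float to thousandths, rounding to nearest instead of truncating."""
    scaled = value * 1000
    # "%.0f" rounds to the nearest integer while formatting; then parse the string back
    return int("%.0f" % scaled)


if __name__ == "__main__":
    print(to_int_truncated(1.007))  # 1006
    print(to_int_rounded(1.007))    # 1007
```

In Python 3, round(value * 1000) gives the same result and already returns an int. The string round trip is kept here only because it matches the Perl approach.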

Thanks go to lorg and Michael for clarifying the real facts.

Update 2: it seems that this "feature" doesn't show up in my C test. The reason isn't that these numbers can be represented exactly, though, because 1.007 has no exact IEEE 754 binary representation. Now it's a bit weird to me that Perl and Python act exactly the same (any suggestions?).
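One way to check whether 1.007 can be stored exactly in a double is to ask Python, whose float is an IEEE 754 double on common platforms. float.as_integer_ratio() and Decimal(float) both expose the exact stored value.

```python
from decimal import Decimal
from fractions import Fraction

x = 1.007

# The exact value stored in the double, as a fraction num / den with den a power of two
num, den = x.as_integer_ratio()
print(num, "/", den)

# Decimal(float) converts the stored binary value exactly, with no rounding
print(Decimal(x))

# Compare with the decimal value we meant to write
print(Fraction(num, den) < Fraction("1.007"))  # True: the stored double is slightly below 1.007
```

A power-of-two denominator can never equal 1007/1000 exactly. The stored value is therefore an approximation in any language that uses IEEE 754 binary doubles.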

6 comments:

lorg said...

Well, that's a property of floating point numbers and how they are represented. For example, in Python:

In [1]: x = 1.007

In [2]: print(x)
1.007

In [3]: y = x*1000

In [4]: print(int(y))
1006

The problem, it seems to me, is that it uses floating point numbers to parse something that doesn't behave like a floating point number. A decimal type would probably be better, or just an integer.
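As an illustration of the decimal suggestion, here is a minimal Python sketch using the standard decimal module. It parses the d.ddd text exactly and scales it to integer thousandths. The function name and the extra-digits check are hypothetical additions.

```python
from decimal import Decimal


def parse_delay_decimal(text):
    """Convert a d.ddd string to integer thousandths using exact decimal arithmetic."""
    # Decimal("1.007") keeps the value exactly as written; no binary rounding happens
    scaled = Decimal(text) * 1000
    if scaled != scaled.to_integral_value():
        raise ValueError("more than three decimal places: %r" % text)
    return int(scaled)


if __name__ == "__main__":
    print(parse_delay_decimal("1.007"))  # 1007
```

Because the string never passes through a binary float, there is nothing to round or truncate.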

StatusReport said...

So I should change the title of the post to "IEEE and The Time It Owes Me" instead. Still, I would expect at least some kind of conversion warning here (it's easy to spot when looking at the mantissa, I'm pretty sure). Don't you think so?

I solved it with a very ugly hack: I treated the number as a string, removed the dot and then converted the string to an integer. Ugly, but it works (and I don't need something good-looking here).
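Here is a minimal Python sketch of that string hack. It assumes the input is always exactly d.ddd, as the parser's format guarantees. The function name and the format check are hypothetical additions; the check is there so that a malformed number fails loudly instead of being scaled wrongly.

```python
def parse_delay_string(text):
    """Turn a d.ddd string into integer thousandths by dropping the dot.

    Only valid when the fractional part always has exactly three digits.
    """
    whole, dot, frac = text.partition(".")
    if not (dot and whole.isdigit() and frac.isdigit() and len(frac) == 3):
        raise ValueError("expected d.ddd, got %r" % text)
    return int(whole + frac)


if __name__ == "__main__":
    print(parse_delay_string("1.007"))  # 1007
```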

Michael said...

At which point did you expect a conversion warning? The loss of precision already happens at the "x = 1.007" point. I don't think a warning like "x cannot precisely represent the assigned value" would be very useful there, since that would be true for almost any assignment of that sort.

So I'm sorry, no sympathy here on this one. It's your responsibility to use data types as God (or IEEE) intended.

StatusReport said...

If not a conversion warning, then at least don't mask the floating point error. Perl hides it by displaying 1007 after the multiplication. Python, however, displays 1006.9999..., which is the appropriate floating point value.

Michael said...

It's not masking an error, as such. You need to ask yourself what a user (that is, programmer) usually wants when she prints out a floating-point value.

The modern convention is that the user wants a rounded value. So suppose the precise value is something like 1006.999999994267. As long as you're rounding to fewer than some number of digits after the decimal point, you'll get 1007.00..., which will be printed as 1007. Perl, by default, rounds to 15 significant digits (its default number format is "%.15g"). I believe that's the right thing: the number printed really ought to be 1007. If you want something else, like the precise value or rounding to some other number of digits, specify it explicitly.
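Python can show both views of the same product. Rounding to a fixed number of significant digits for display hides the error. repr() shows the shortest string that round-trips to the exact double. int() truncates the value.

```python
y = 1.007 * 1000

print(repr(y))      # shortest round-tripping form: 1006.9999999999999
print("%.15g" % y)  # rounded to 15 significant digits: 1007
print("%.6g" % y)   # rounded to 6 significant digits: 1007
print(int(y))       # truncated toward zero: 1006
```

Only the last line changes the value itself. The others differ only in how many digits are shown.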

The problems start when instead of using a function that rounds, you use one that truncates. For instance, that's what C's numeric type casts do. And that convention is naturally followed by int(). So, if you want to blame someone, blame K&R.

Regarding sprintf... well, Perl's sprintf also explicitly tries to emulate the C behaviour. But in C, if you try to print a floating point value with %d, you'll get nonsense. So the closest emulation they thought of was probably a C-style cast (as with int()) followed by C-style printing. I haven't checked, but I have a feeling that if you use the proper format specifier ("%.0f") instead of %d, you'll get the right thing.

StatusReport said...

I agree with what you said, and yes, "%.0f" does solve the problem.

What I actually needed was "%x", which presumably converts to an integer first and then to a hex representation. So the workaround here is to apply "%.0f" first and then convert the result to whatever format you'd like.
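Here is a minimal Python sketch of that two-step workaround: round with "%.0f", then format the integer as hex. The function name is hypothetical. Python's "%x" expects an integer, so the rounding step has to be explicit there anyway.

```python
def delay_to_hex(value):
    """Scale a d.ddd float to thousandths, round to nearest, then format as hex."""
    rounded = int("%.0f" % (value * 1000))  # round first, as with Perl's "%.0f"
    return "%x" % rounded                   # only then convert to hex


if __name__ == "__main__":
    print(delay_to_hex(1.007))  # 3ef (1007), not 3ee (1006)
```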