yoy.be "Why-o-Why"

About WideChar and CharInSet...

2015-01-08 23:58  i3097  delphi

I posted this on Google+ a while ago, but didn't think of posting it here, so here it is.

I'm sorry, but I have to vent this. (Attention: what follows is a rant about programming in Delphi.)
What's up with this?

[dcc32 Warning] W1050 WideChar reduced to byte char in set expressions.  Consider using 'CharInSet' function in 'SysUtils' unit.

CharInSet?! It contains Result:=C in CharSet; so it is actually just a tiny wrapper. But one that causes an extra jump, stack frame and return! Very wasteful of resources, and typically in places where I want iterations to go as fast as possible.

TSysCharSet, by the way, is apparently fixed to a set of byte chars. So I suppose some implicit conversion takes place. I wonder why SysUtils' CharInSet doesn't get a warning then? Not even one like this:
[dcc32 Warning] W1058 Implicit cast with potential data loss from 'WideChar' to 'Char'

But more importantly, I think it goes against the grain of the language. With Pascal being firmly rooted in the academic and mathematical, it saddens me deeply that I cannot describe a set of WideChar literals. They're ordinal constants like any other, or am I mistaken?

In other words: if I take some old code where s is still just string, is it too much to expect s[i] in ['0'..'9'] to compile to something that checks whether the 16-bit value falls between these two 16-bit limit values? As far as I know, that's easily done in both 32-bit and 64-bit machine code. (*1)
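To make concrete what I'd like the compiler to generate, here is a minimal sketch of the same test written as two plain comparisons. IsAsciiDigit is a hypothetical helper of my own naming, not something from SysUtils, and the inline directive assumes a Delphi version that supports it.

```pascal
program WideCharRange;

{$APPTYPE CONSOLE}

// Hypothetical helper (not part of SysUtils): compares the 16-bit
// character against two 16-bit bounds instead of using a set expression.
function IsAsciiDigit(c: WideChar): Boolean; inline;
begin
  Result := (c >= '0') and (c <= '9');
end;

var
  s: UnicodeString;
  i: Integer;
begin
  s := '2015-01-08';
  i := 1;
  // Skip the leading run of digits; the bounds check keeps i inside s.
  while (i <= Length(s)) and IsAsciiDigit(s[i]) do
    Inc(i);
  WriteLn(i); // index of the first non-digit, here the '-' at position 5
end.
```

This only covers the ASCII digits, which is all the original set expression asked for anyway.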

Oh, and one more thing. All of this is darkly overshadowed by http://utf8everywhere.org/, with which I most strongly agree. Regrettably, I do a lot of work on a platform that has firmly chosen the 16-bit-character way of handling text, so I have to work with it up to the point where I can decide to do otherwise. In practice this means my programming is a mix of WideString and UTF8String, complicated by the unnerving equivalence of the latter with AnsiString.

*1: I know, I know, I should be using Unicode's IsDigit, but I have a lot of existing code for parsing script that uses a lot of while s[i] in [..] do inc(i); (*2)
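For sets with more than one range, a case statement on the character is one way to keep such loops without the set expression, since case labels work on the full 16-bit ordinal value. What follows is only a sketch: IsIdentChar and SkipIdent are hypothetical names, and the explicit length check is an addition the original one-liner doesn't have.

```pascal
program ScanIdentifier;

{$APPTYPE CONSOLE}

// Hypothetical helper: True for ASCII letters, digits and underscore.
function IsIdentChar(c: WideChar): Boolean;
begin
  case c of
    '0'..'9', 'A'..'Z', 'a'..'z', '_': Result := True;
  else
    Result := False;
  end;
end;

// Hypothetical helper: returns the index of the first character at or
// after start that is not an identifier character.
function SkipIdent(const s: UnicodeString; start: Integer): Integer;
begin
  Result := start;
  while (Result <= Length(s)) and IsIdentChar(s[Result]) do
    Inc(Result);
end;

begin
  WriteLn(SkipIdent('foo_1 := 2', 1)); // stops at the space, index 6
end.
```

Whether the call overhead matters in practice depends on the compiler version and settings, so measure before rewriting a lot of existing code.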

*2: I know, I know, I should be using a lexer/tokenizer/compiler-compiler. See also: a lot of existing code.

*3: Skipping to the last paragraph are we? All right then:
tl;dr: I strongly regret that "WideChar reduced to byte char in set expressions"

https://plus.google.com/+StijnSanders/posts/LRqD794jd7q
