Hacker News — vinext + Cloudflare Workers

Zig's new bitCast semantics and LLVM back end improvements (ziglang.org)

161 points by kouosi 5 hours ago | 52 comments

zamadatix 3 hours ago [-]

This change + the existing packed struct logic will be great for working with bit packed binary headers w/o having to manually twiddle so much about the bit handling along the way.

allthetime 2 hours ago [-]

Zig is already great for this with ‘packed struct’ and arbitrary size ints. Allows for very clean protocol creation between systems with known properties. This is another great step in that direction.
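For illustration, a minimal sketch of the packed struct pattern mentioned above. The `Header` type and its field names are hypothetical, chosen only to show how arbitrary-width integers map onto a single header byte; the first field of a packed struct occupies the least significant bits.

```zig
const std = @import("std");

// Hypothetical one-byte header: a 4-bit length and a 4-bit version.
// In a packed struct the first field occupies the least significant bits.
const Header = packed struct(u8) {
    ihl: u4,
    version: u4,
};

test "decode a bit-packed header byte" {
    const raw: u8 = 0x45;
    const h: Header = @bitCast(raw);
    try std.testing.expectEqual(@as(u4, 4), h.version);
    try std.testing.expectEqual(@as(u4, 5), h.ihl);
    // Casting back yields the original byte.
    try std.testing.expectEqual(raw, @as(u8, @bitCast(h)));
}
```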

ulbu 5 minutes ago [-]

you need different packed structs for little- and big-endian data. and casting with little-endian data is a nightmare - you need to reverse-cascade your struct fields to be in accordance with the little-endian bit-pattern. (or have a comptime function that does it for you, of course. but then you lose all declarations for the struct). what should be a simple writing down of a protocol is now a pedantic and error-prone ordeal.

ozgrakkurt 4 hours ago [-]

> Consider, for instance, bitcasting a [2]u8 to a u16. Under the old semantics, the result of this operation depends on the target endian: on big-endian targets, the first array element became the 8 most significant bits, whereas on little-endian targets, the first array element became the 8 least significant bits. Under the new semantics, because we only care about logical bit representation (which is endian-agnostic), the operation behaves identically on every target:

This is a huge mistake. You would never expect something like bitCast to do this.

I don't understand this approach. Why change something so simple and low level to be complicated and high level?

Just don't allow casting to u24, as it makes no sense unless you define u24 to be u32 sized as I think c standard does.

I think this approach as an idea is bad but at least just add another built-in that implements this higher level idea to not break a simple expectation and current behavior?

jjmarr 2 hours ago [-]

> Just don't allow casting to u24, as it makes no sense unless you define u24 to be u32 sized as I think c standard does.

The reason u32->u24 casting must be well defined is because some hardware (e.g. many GPUs, microcontrollers) only have floating point multipliers. A 24 bit unsigned integer (stored in a 32 bit register) can be losslessly converted to a 32 bit float by the hardware, multiplied, then converted back.

This is much faster than doing 32 bit multiplication in software, however, you still need to tell the compiler about this constraint.
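A rough sketch of the property being described: an f32 has a 24-bit significand, so every u24 value converts to f32 and back without loss. The helper names `mulViaF32` and `roundTripU32` are hypothetical, and this only illustrates the arithmetic, not what any particular GPU or compiler does.

```zig
const std = @import("std");

// Hypothetical helper: multiply two 24-bit values using only f32 arithmetic.
// Exact only while the product itself stays below 2^24; a larger product
// does not fit in u24, and converting it back is illegal behavior
// (a panic in safe builds).
fn mulViaF32(a: u24, b: u24) u24 {
    const fa: f32 = @floatFromInt(a);
    const fb: f32 = @floatFromInt(b);
    return @intFromFloat(fa * fb);
}

// Hypothetical helper: send a full u32 through f32 and back.
fn roundTripU32(x: u32) u32 {
    const f: f32 = @floatFromInt(x);
    return @intFromFloat(f);
}

test "24-bit integers survive a trip through f32" {
    // The largest u24 (16_777_215) is preserved exactly.
    try std.testing.expectEqual(@as(u24, 16_777_215), mulViaF32(16_777_215, 1));

    // 4095 * 4095 = 16_769_025, which still fits in 24 bits.
    try std.testing.expectEqual(@as(u24, 16_769_025), mulViaF32(4095, 4095));

    // One past 2^24 no longer round-trips: f32 rounds it to 16_777_216.
    try std.testing.expectEqual(@as(u32, 16_777_216), roundTripU32(16_777_217));
}
```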

ozgrakkurt 29 minutes ago [-]

I am criticizing the part where they allowed [3]u8 to u24 bitCast in the first place. It doesn't make sense logically, as u24 is likely not 24 bits on any target, let alone portably on every target.

Interpreting u24 like it is actually 24 bits sounds like programming in crazy land since it is not 24 bits in any relevant architecture afaik.

They didn't allow []u24 with a similar rationale as far as I can remember. I agree with this, as someone programming at this level should be able to understand there is no real u24 layout and they should use []u32. Going with the same magical rationale they went with here, the compiler should generate unaligned u24 loading code when you use []u24 since it is "logically 24 bits".
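The distinction at issue here, logical bit width versus storage size, can be checked directly. This is a small sketch; the exact `@sizeOf(u24)` depends on the target ABI, so the test only assumes it is at least three bytes.

```zig
const std = @import("std");

test "u24: logical width versus storage size" {
    // The logical width is 24 bits for both types.
    try std.testing.expect(@bitSizeOf(u24) == 24);
    try std.testing.expect(@bitSizeOf([3]u8) == 24);

    // Storage is a separate question: a [3]u8 takes three bytes, while a
    // standalone u24 is padded up to its ABI size (commonly four bytes).
    try std.testing.expect(@sizeOf([3]u8) == 3);
    try std.testing.expect(@sizeOf(u24) >= 3);
}
```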

mathisfun123 39 minutes ago [-]

> many GPUs

Citation please - every single GPU in the literal world supports integer arithmetic for operating on tid, gid, etc.

kevin_thibedeau 2 hours ago [-]

GCC has had __int24 for the AVR backend for some time. Useful for larger integers than int16_t while saving 25% over a 32-bit value. C23 does not mandate padding for _BitInt types. It is wrong to assume that will happen or is the optimal implementation for portable code.

ozgrakkurt 38 minutes ago [-]

Thanks for the context, but what I am criticising is this part:

> it became allowed to use @bitCast to reinterpret a [3]u8 as a u24

This can't make sense unless u24 is defined to be 24 bits in the first place. It is just silly to allow something like this. It would make so much more sense to me if they started disallowing this, or even just printed a deprecation notice for it for one release version.

> Useful for larger integers than int16_t while saving 25% over a 32-bit value

You can't even do []u24 in zig as far as I can remember and understand anyway so this is only happening in a packed struct context.

C doesn't mandate padding but C compilers allow having pointers and arrays of irregular _BitInt types as far as I can understand.

In this document [1], the ABI considerations section says that it is defined to have a next-power-of-two layout size.

Also here (for RISCV) [2] it seems like it is defined with next-power-of-two layout.

Also the document here (for x86_64) defines it similarly [3]

[1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2709.pdf

[2] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/issues/...

[3] https://gitlab.com/x86-psABIs/x86-64-ABI/-/tree/master?ref_t...

tialaramex 3 hours ago [-]

> This is a huge mistake. You would never expect something like bitCast to do this.

Is there at least some sort of @transmute or something ? If Zig wants to say "bitCast" means this odd operation, but provides the thing most people actually want under some plausible name that's just an extra thing to learn which seems OK.

boricj 3 hours ago [-]

If I understand it correctly, it basically boils down to copying bits from the source to the destination, in order from the least significant bit to the most significant bit. It's not equivalent to C++'s reinterpret_cast.

I'm no Zig expert, but if you want endian-dependent semantics I'd assume either @ptrCast or a packed union would do the job.
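To make the two readings concrete, here is a sketch that models them with ordinary arithmetic instead of calling `@bitCast` itself. It assumes, as described above, that element 0 supplies the least significant bits under the logical reading; `logicalBits` and `memoryBits` are hypothetical helper names.

```zig
const std = @import("std");

// Hypothetical helper: the "logical bits" reading, where element 0
// supplies the 8 least significant bits on every target.
fn logicalBits(bytes: [2]u8) u16 {
    return @as(u16, bytes[0]) | (@as(u16, bytes[1]) << 8);
}

// Hypothetical helper: the memory-layout reading, whose result depends
// on the byte order passed in.
fn memoryBits(bytes: [2]u8, endian: std.builtin.Endian) u16 {
    return std.mem.readInt(u16, &bytes, endian);
}

test "logical versus memory interpretation of [2]u8" {
    const bytes = [2]u8{ 0xaa, 0x55 };
    try std.testing.expectEqual(@as(u16, 0x55aa), logicalBits(bytes));
    // The two readings agree for little-endian layout and differ for big-endian.
    try std.testing.expectEqual(@as(u16, 0x55aa), memoryBits(bytes, .little));
    try std.testing.expectEqual(@as(u16, 0xaa55), memoryBits(bytes, .big));
}
```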

tremon 3 hours ago [-]

But doesn't that show why this is a bad idea? If I understand correctly, this code:

  const MyUnion = packed union {
    full: u16,
    bytes: [2]u8,
  };
  const value: u16 = 0x55aa;
  const in_union: MyUnion = @bitCast(value);
  const without_union: [2]u8 = @bitCast(value);
  std.debug.assert(without_union[0] == in_union.bytes[0]);
  std.debug.assert(without_union[1] == in_union.bytes[1]);

...will now succeed or fail depending on the endianness of the target. That looks like the type of footgun that will bring decades of joy.

peesem 1 hours ago [-]

zig does not allow arrays in packed structs/unions specifically for endianness reasons (there may be other reasons as well but endianness is what i know of)

AlienRobot 2 hours ago [-]

I wonder if packed union also got/will get the same "logical bits" treatment?

SkiFire13 37 minutes ago [-]

My understanding is that the "logical bits" view breaks down for unions, because the nth logical bit could be at different offsets depending on the union variant that's considered active.

nvme0n1p1 3 hours ago [-]

You don't need to use @bitCast for the behavior you're talking about. @ptrCast still exists.

ozgrakkurt 1 hours ago [-]

@ptrCast,

> Converts a pointer of one type to a pointer of another type. [1]

[1] https://ziglang.org/documentation/master/#toc-ptrCast

So it is not the same.

You could use it to define a function that implements bitCast. Which defeats the purpose of having any @bitCast intrinsic instead of using @memcpy for everything.

nvme0n1p1 32 minutes ago [-]

Take the address and deref afterwards, and it's exactly the same. Or to say another way: if you want bits to be reinterpreted raw as if they're in memory, then... put them in memory, then reinterpret them.

> You could use it to define a function that implements bitCast. Which defeats the purpose of having any @bitCast intrinsic

Yes, and this is one reason @bitCast was changed to have different semantics that are not trivially achieved with @ptrCast.
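A sketch of the "put them in memory, then reinterpret them" approach, for illustration. The helper name `reinterpretInMemory` is hypothetical, and the `align(1)` pointer is an assumption made so that no extra alignment cast is needed for a byte array.

```zig
const std = @import("std");
const builtin = @import("builtin");

const native_endian = builtin.cpu.arch.endian();

// Hypothetical helper: reinterpret four bytes exactly as they sit in memory.
// align(1) is used because a [4]u8 carries no alignment guarantee for u32.
fn reinterpretInMemory(bytes: *const [4]u8) u32 {
    const p: *align(1) const u32 = @ptrCast(bytes);
    return p.*;
}

test "pointer cast follows the target's byte order" {
    const bytes = [4]u8{ 0x78, 0x56, 0x34, 0x12 };
    // Same result as reading the bytes with the native byte order.
    try std.testing.expectEqual(
        std.mem.readInt(u32, &bytes, native_endian),
        reinterpretInMemory(&bytes),
    );
}
```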

ozgrakkurt 12 minutes ago [-]

> Take the address and deref afterwards, and it's exactly the same.

It is significantly worse to take address and deref afterwards.

You have to do something like:

@as(*const u32, @ptrCast(&x)).*

instead of just

@bitCast(x)

> Yes, and this is one reason @bitCast was changed to have different semantics that are not trivially achieved with @ptrCast.

This makes sense, except for breaking existing code that properly handled endianness by doing a conditional @byteSwap. And what you end up with is a more complicated intrinsic compared to something that reinterprets values with the same layout size.

AlienRobot 2 hours ago [-]

To me it makes sense. If you don't know what endianness is, it doesn't make sense that a program you write in one programming language works for one target but doesn't work for the other.

I think endianness is the footgun that Zig is solving, rather than Zig being the one introducing a footgun when you deal with endianness.

ozgrakkurt 34 minutes ago [-]

> If you don't know what endianness is

It is not feasible for someone to write endian portable code in a language like Zig without understanding what endianness is imo. Regardless of how they change @bitCast there will be other cases that break this like doing @ptrCast + @memcpy.

Also this breaks currently written code that is endian portable and uses @byteSwap like it is done in most other programming languages that do these things.
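The conditional `@byteSwap` pattern referred to here looks roughly like the sketch below. `fromBigEndian` is a hypothetical name; the standard library's `std.mem.bigToNative` performs the same conversion.

```zig
const std = @import("std");
const builtin = @import("builtin");

const native_endian = builtin.cpu.arch.endian();

// Hypothetical helper: interpret a u32 that was loaded from big-endian data.
// The swap happens only on little-endian targets.
fn fromBigEndian(x: u32) u32 {
    return if (native_endian == .little) @byteSwap(x) else x;
}

test "conditional byte swap matches std.mem.bigToNative" {
    const raw: u32 = 0x11223344;
    try std.testing.expectEqual(std.mem.bigToNative(u32, raw), fromBigEndian(raw));
    // @byteSwap itself is unconditional and reverses the byte order.
    try std.testing.expectEqual(@as(u32, 0x44332211), @byteSwap(raw));
}
```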

fithisux 23 minutes ago [-]

These posts make you want not only to use Zig, but also to marry it.

No jokes aside, these posts are the best advertisements of the language.

And I truly like their AI stance.

simonask 4 hours ago [-]

Interesting read, even as someone who isn't using Zig.

I wonder, these arbitrary-width integers... Is it actually even really worth it? My intuition is to prefer manually packing/unpacking things instead (in any language, even C that has bit width for struct fields), because it gives me a better mental picture of the code that is actually generated. Particularly for something like a signed odd-bit integer - what kind of code gets generated for sign-extension, a presumably common operation?

Does anybody have other experiences with them, one way or the other?

hansvm 4 hours ago [-]

IIRC, for "normal" bit widths the codegen basically uses the next larger machine type and preserves zero bits on the high end. An i3 is an i8 with five MSB zeroes (with more custom behavior for "packed" i3 values). It's UB to fill those with non-zero values. For larger bit widths, like u729, you concatenate many large machine types, the compiler generates instructions in an unrolled loop, and the LLVM optimization pass usually doesn't clean that up (though, now that integers are apparently not using the LLVM u729 implementation, perhaps there are some more optimization opportunities).

They're situationally useful, especially when performance isn't an enormous concern. That u729 example above came from a variant sudoku solver I wrote to aid developing new puzzles (easy to check the rough magnitude of the solution space for whatever idea I was mulling over and examine how restricted the board actually was -- just an intermediate step in puzzle design). It's not optimal (hard on the icache, can be hard on registers, other issues abound), but it's dead simple to use, and the assembly isn't terrible, beating all the normal solvers I saw floating around. It's a nice point on the laziness/correctness/good-enough-perf pareto curve.

Another comment mentioned this, but they're great in packed structs for representing weird numeric entities (I think I have a logarithmic number system floating around which does that).

One thing the language does quite a lot is use them to guard against certain classes of human error at compile time. It doesn't perfectly make impossible actions unrepresentable, but shoving a full u32 into a shift argument usually doesn't make sense, so the types are constrained to be smaller.
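As a small illustration of the shift-argument point: the shift amount for a `u32` is typed as `u5`, so an out-of-range shift cannot be written. `shiftLeft` is a hypothetical helper name; `std.math.Log2Int` is the standard library function that computes the shift-amount type.

```zig
const std = @import("std");

// The shift amount for a u32 must be a u5, so a shift of 32 or more
// cannot be expressed at all.
fn shiftLeft(x: u32, amount: u5) u32 {
    return x << amount;
}

test "shift amounts are typed by the operand width" {
    try std.testing.expect(std.math.Log2Int(u32) == u5);
    try std.testing.expect(std.math.Log2Int(u64) == u6);
    try std.testing.expectEqual(@as(u32, 0x8000_0000), shiftLeft(1, 31));
}
```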

nvme0n1p1 3 hours ago [-]

I can't imagine any situation where I'd use a u729 instead of a StaticBitSet. For size 729, it would end up backed by a bit_set.Array, not a bit_set.Integer.

https://ziglang.org/documentation/master/std/#std.bit_set.St...

AlotOfReading 3 hours ago [-]

I don't program zig, so it's not clear to me if you can use zig's bitsets arithmetically.

Sometimes it's just more clear to work with integers than other representations. Most situations with a state space of N bits have meaningful integer representations, where arithmetic functions on those representations are also meaningful.

For example, CRCs can be written as the remainder from long division of the message by the polynomial. Defining nontrivial cyclic permutations is also much more straightforward as functions on integers than on bitsets.
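The long-division view of a CRC can be sketched bit by bit. This example uses the common CRC-8 polynomial 0x07 with a zero initial value as an assumption for illustration; it processes one byte at a time and does not use a wide integer type.

```zig
const std = @import("std");

// Bitwise CRC-8 with polynomial x^8 + x^2 + x + 1 (0x07), initial value 0.
// Each inner step is one round of long division over GF(2): shift out the
// top bit and, if it was set, subtract (XOR) the polynomial.
fn crc8(data: []const u8) u8 {
    var crc: u8 = 0;
    for (data) |byte| {
        crc ^= byte;
        for (0..8) |_| {
            const top_set = (crc & 0x80) != 0;
            crc <<= 1;
            if (top_set) crc ^= 0x07;
        }
    }
    return crc;
}

test "crc8 check value" {
    // 0xF4 is the standard check value for this parameter set.
    try std.testing.expectEqual(@as(u8, 0xF4), crc8("123456789"));
}
```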

nvme0n1p1 2 hours ago [-]

For other situations like a CRC on an arbitrarily-sized message, a big int would be better, surely? You can do long division on those. https://ziglang.org/documentation/0.16.0/std/#std.math.big.i...

I was talking about GP's u729, which is 9*9*9, the state space of a sudoku board. Can you come up with a situation where dividing that number by anything is meaningful?

hansvm 3 hours ago [-]

Old habits :)

If I had to steel-man the idea, I'm pretty sure the integer-based solution has better codegen with many kinds of sparse, comptime-known masks. I think you're right though, StaticBitSet looks better.

nvme0n1p1 2 hours ago [-]

For your specific case, even a simple `[9][9]u16` might perform better (where you make use of nine bits in each u16). For each entry, the nine mask bits would be in the same bit positions, so the compiler won't have to do a bunch of shifts to extract/align the bits. CPUs love consistency. I doubt it's worth the additional codegen complexity to save 70 bytes in your data model.

flohofwoe 2 hours ago [-]

It's pretty great in my toy emulator project (https://github.com/floooh/chipz) as 'system bus' where each bit is a 'wire' which is then mapped to chip input/output pins.

The bus-width is a generic parameter and can be below or above 64 bits (depending on the emulated system). With arbitrary-width integers the high level code remains the same no matter what the bus-width is, and from looking at the compiler output, as long as bit operations don't straddle the underlying 64-bit integer boundary, those bit operations are just as efficient as working on a simple 64-bit int.

Also AFAIK LLVM supports arbitrary-width integers since pretty much forever, Zig just 'exposed' them in the language (as later did Clang via _ExtInt(N), which is now deprecated in favour of C23's _BitInt(N)).

The other nice usage (also in emulators) is for chip registers and counters, those often have odd widths (like 5 bits), and writing those as u5 instead of u8 in the code is just nicer since it matches the chip documentation, and when reading the code it's immediately clear that this u5 is a 5-bit counter or register.
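A minimal sketch of the odd-width register idea, assuming a hypothetical 5-bit counter. The `Counter` type is illustrative only and is not taken from the linked project.

```zig
const std = @import("std");

// Hypothetical 5-bit chip counter; the type itself documents the width.
const Counter = struct {
    value: u5 = 0,

    fn tick(self: *Counter) void {
        self.value +%= 1; // wrapping add: 31 goes back to 0
    }
};

test "5-bit counter wraps at 32" {
    var c = Counter{ .value = 31 };
    c.tick();
    try std.testing.expectEqual(@as(u5, 0), c.value);
}
```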

ismailmaj 4 hours ago [-]

It's great for defining fancy floats used in machine learning

e.g. https://github.com/zml/zml/blob/33ced8fa078b3c7c8c709bd526ae...

y1n0 4 hours ago [-]

As an FPGA engineer, I find dealing with bit widths that are not byte multiples very normal, and when I end up writing software for various reasons, I often miss it. Usually when trying to slice and parse or construct messages.

Obviously there are ways around pretty much everything, but it’s nice to have first class language support for bit slices.

NooneAtAll3 4 hours ago [-]

except it isn't a bit slice, it isn't indexing within a range - it's just an integer type that only allows values below 2^width, with the same alignment rounding up as with the rest

hmry 3 hours ago [-]

It's a bit slice if you put it in a packed struct.

I like them, they're nicer than C's bitfields: The order isn't implementation-defined, and the types remember their range rather than being converted to a power-of-two size upon read. (Maybe that's possible with C23 _BitInt(n), I haven't tried if those work in bitfields)

grayhatter 4 hours ago [-]

> Quite long devlog coming up, apologies—I got a little carried away with this one!

mlugg, please don't apologize for creating something I actually want to read. I'm drowning in low effort garbage, the in depth technical explanation is a refreshing breath of fresh air.

Might as well apologize for creating a language without a garbage collector, sure most people are unwilling to think, but some of us like nice things and are actually willing to apply effort.

mlugg 3 hours ago [-]

I appreciate the kind words :)

jeffbee 4 hours ago [-]

It wasn't even long! It seemed much shorter than the typical LLM-expanded drivel that crosses the HN front page daily.

frail_figure 4 hours ago [-]

> sure most people are unwilling to think, but some of us like nice things and are actually willing to apply effort.

Sir, you seem to have dropped your fedora.

grayhatter 3 hours ago [-]

ryan_n 2 hours ago [-]

Think they're just implying that the quoted text comes off as a bit pretentious..

grayhatter 3 minutes ago [-]

oh, I think it's mostly frustration over how eager everyone is to delegate their thinking to literally anything else, accelerated by [gestures at reality]. Is frustration with apathy really pretentious?

epolanski 3 hours ago [-]

OT: I'm always surprised at how popular Zig discussions get here, or on YouTube and other media.

Don't get me wrong, I love Zig and I think it's a great C replacement, but I'm very confused on why C3 or Odin rarely get any attention at all, despite being in the same C-replacement crowd.

But still surprised at what Zig does better than these other projects? Is Andrew much better at marketing/promoting the language? He's very hard to dislike.

Iridescent_ 3 hours ago [-]

I think Andrew is a big part of it, and the people he surrounded himself with are the other part. What kind of pre-1.0 language hosts conventions? Crazy that they manage to do that. Andrew's vision has always been clear and inspiring to me. I think this got Zig its initial following, and they have capitalized extremely well on it to grow as a community.

nickmonad 3 hours ago [-]

Andrew doesn't strike me as someone who does any marketing at all. He just wants to make the language he wants to use, and does it well.

Sometimes it's just right time, right place. But also, Zig has received attention via projects like Ghostty, TigerBeetle, and Bun (prior to rewrite of course).

csb6 2 hours ago [-]

They have definitely done a lot of marketing through social media and forums like HN. There have been large numbers of posts here by Zig's developers for years, and a few releases of LLVM even mentioned Zig prominently in their release notes.

AlienRobot 2 hours ago [-]

With Zig, I can just import SDL.h and use it without writing a binding.

Can I do that in C3 or Odin?

pyrolistical 1 hours ago [-]

And then you can get AI to do a nicer port of SDL.zig and you get way better decls.

Proper enums, proper tagged unions, and often reading the docs can allow the AI to distinguish T * to one of

1. [*]T

2. [:0]T

3. ?T

4. *T

And these are just the most common ones. If you know it’s a read only pointer/array then you can add the const modifier

flumpcakes 1 hours ago [-]

Odin has SDL built into the language (shipped as a vendored library).

AlienRobot 48 minutes ago [-]

That's not what I mean...

There is a mountain of code written in C that you can simply include in Zig without a wrapper dependency and without having to create the wrapper yourself.

kI3RO 2 hours ago [-]

Yes, Andrew did a lot of internet cult marketing over the years, and then you have exponential free cult marketing.

QuaternionsBhop 2 hours ago [-]

Is pasting em-dashes everywhere some kind of inside joke?

mlugg 2 hours ago [-]

Uh, no? My writing style just happens to include a lot of em-dashes, as is very common. And it's not like I'm pasting a weird Unicode codepoint all over the place, that's just (rightly) how my Markdown gets rendered...
