@thecsb@mastodon.cloud the real issue is that the AI can also implant state-sponsored backdoors
Context
@DavidB@noauthority.social @thecsb@mastodon.cloud LLMs aren't open source unless they give you all the data needed to regenerate the model weights
@DavidB@noauthority.social @thecsb@mastodon.cloud that's like saying you don't need any source code when you can get a compiled binary but it's still open source somehow
@DavidB@noauthority.social @thecsb@mastodon.cloud a binary executable/library is a "compressed" rendition of its source code in the same way weights are a "compressed" rendition of their training data.
@DavidB@noauthority.social @thecsb@mastodon.cloud the training process "translates" the training data into optimized weights using gradient descent and backpropagation in the same way a compiler "translates" source code: lexing, parsing, building an AST, transforming the AST to IR, optimizing the IR, and transforming the IR into machine code.
both are "translating" and "compressing" fat, enriched source data into a thin, opaque binary output in a one-way manner.
@DavidB@noauthority.social @thecsb@mastodon.cloud if you genuinely think compilation is deterministic you clearly never used -O3
@DavidB@noauthority.social @thecsb@mastodon.cloud gcc uses the inode of intermediate object files in the final compiled executable, among other non-deterministic data sources (e.g. timestamps). you're just wrong.
@DavidB@noauthority.social @thecsb@mastodon.cloud additionally the final binary layout under -O3 is non deterministic
@jeff @thecsb if you give the compiler the same inputs, including flags, you will get the same output.
In the situation you're describing the inputs are being changed on each run. You can think of a compiler as a function. And your source code is not the only input.
Some compilers make controlling every input impossible, so you can't get a bit-for-bit identical binary on each compiler run.
Compilers are deterministic. For a given set of inputs you will get the same output.
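A toy sketch of the "compiler as a function" framing in the post above. `toy_compile` is a hypothetical stand-in, not a real compiler, and its `build_time` parameter is an assumption made for illustration: it represents any hidden input (a clock, an inode, a random seed) that affects the output unless the caller pins it.

```python
import hashlib
import time


def toy_compile(source: str, flags: tuple, build_time=None) -> bytes:
    """Hypothetical stand-in for a compiler: output depends on every input."""
    # If the caller does not pin the timestamp, read the clock.
    # That makes the clock a hidden input to the "compilation".
    if build_time is None:
        build_time = time.time_ns()
    payload = f"{source}|{' '.join(flags)}|{build_time}".encode()
    return hashlib.sha256(payload).digest()


# Same source, same flags, same pinned timestamp: identical output.
pinned_a = toy_compile("int main(){}", ("-O3",), build_time=0)
pinned_b = toy_compile("int main(){}", ("-O3",), build_time=0)

# Same source and flags, but the clock is left as an uncontrolled input,
# so these two outputs are not guaranteed to match.
unpinned_a = toy_compile("int main(){}", ("-O3",))
unpinned_b = toy_compile("int main(){}", ("-O3",))
```

Whether a real compiler lets you pin every such input is exactly what the rest of the thread argues about; the sketch only shows what "same inputs" has to include for the claim to hold.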
@DavidB@noauthority.social @thecsb@mastodon.cloud "if you give the compiler the same inputs, including flags, you will get the same output."
no, no it won't. compile a program with multiple compilation units using -O3, sha256sum it, remove all the compiled binaries and object files, and repeat. you'll find the hashes do not match.
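The experiment described above (build, hash, wipe, rebuild, hash again) can be sketched as a small harness. This is a minimal sketch under assumptions: `build` is a hypothetical callable that you supply, which writes the final binary into a fresh directory and returns its path. The harness makes no claim about which result a given compiler will produce.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash a file in chunks, like running sha256sum on it."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def build_twice_and_compare(build):
    """Run `build(out_dir)` in two fresh directories and hash each result.

    `build` is a hypothetical callable: it must write the final binary
    somewhere under out_dir and return the path to it.
    """
    digests = []
    for _ in range(2):
        # A fresh temporary directory per run means no stale object
        # files or binaries survive from the previous build.
        with tempfile.TemporaryDirectory() as out_dir:
            artifact = build(Path(out_dir))
            digests.append(sha256_file(artifact))
    return digests[0] == digests[1], digests
```

To try it against a real toolchain, `build` could shell out with `subprocess.run` to something like `gcc -O3` over your own source files; the exact command line and sources are left to the reader.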
We need a single threaded KVM implementation that is reproducible. And probably also should be able to create a log of inputs, so you can launch a VM and fiddle around in it, then when you hit a bug, you can re-launch and replay the same inputs and arrive at the same state.
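A toy sketch of the record-and-replay idea in the post above. `ReplayableMachine` is a hypothetical illustration, not a KVM interface: its state changes only through logged inputs, so replaying the log from a fresh start reaches the same state.

```python
import hashlib


class ReplayableMachine:
    """Toy deterministic machine: state changes only via logged inputs."""

    def __init__(self):
        self.state = hashlib.sha256(b"boot").hexdigest()
        self.log = []

    def feed(self, event: str) -> None:
        # Record the input first, then fold it into the state.
        self.log.append(event)
        self.state = hashlib.sha256((self.state + event).encode()).hexdigest()


def replay(log):
    """Rebuild a machine from scratch by replaying a recorded input log."""
    machine = ReplayableMachine()
    for event in log:
        machine.feed(event)
    return machine


# "Fiddle around" in a live session; every input lands in the log.
live = ReplayableMachine()
for event in ["key:a", "net:packet-1", "timer:tick"]:
    live.feed(event)

# Re-launch and replay the same inputs to arrive at the same state.
replayed = replay(live.log)
```

The hard part in a real VM is that every source of nondeterminism (interrupt timing, device input, thread scheduling) has to go through the log, which is why the post asks for a single-threaded implementation.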
@cjd@pkteerium.xyz @thecsb@mastodon.cloud @DavidB@noauthority.social debian has gone thru literal hell on earth to get deterministic builds working, to say that compilation is always deterministic is just so fucking ignorant of the actual reality of software infra.
that's why this thread exists
Replies
All software is deterministic - if you get rid of those pesky inputs...
You may enjoy reading about "haveged"
There are plenty of sources which are good enough that attacking them is unrealistic, and you don't really need to make something more impossible than impossible...
@cjd@pkteerium.xyz @DavidB@noauthority.social @thecsb@mastodon.cloud cloudflare uses a webcam pointed at a wall of lava lamps
@DavidB@noauthority.social @cjd@pkteerium.xyz @thecsb@mastodon.cloud that's factually wrong, it took a decade to get gcc to output deterministic builds.
@DavidB@noauthority.social @cjd@pkteerium.xyz @thecsb@mastodon.cloud compilers are not deterministic because the inputs CANNOT be made deterministic without doing insanity like mocking inode values, AND certain modes of operation are internally non-deterministic, see -O3