CW: AI fails
@icedquinn > While the compiler can output correct 16-bit x86 via the 66/67 opcode prefixes, the resulting compiled output is over 60kb, far exceeding the 32k code limit enforced by Linux. Instead, Claude simply cheats here and calls out to GCC for this phase

so that's just an x86 problem; RISC-V or ARM don't need that
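The size blowup quoted above comes from the 0x66/0x67 operand- and address-size override prefixes. A minimal sketch of how the 0x66 prefix works (the byte values are from the standard x86 encoding; the example itself is mine, not from the compiler in question):

```python
# How the 0x66 operand-size prefix toggles between 16- and 32-bit operands.
# In 32-bit mode, "mov eax, ebx" encodes as 89 D8; prefixing 0x66 turns the
# very same instruction into its 16-bit form, "mov ax, bx".
MOV_RM_R = 0x89            # MOV r/m, r opcode
MODRM_EBX_TO_EAX = 0xD8    # mod=11, reg=EBX (011), r/m=EAX (000)
OPSIZE_PREFIX = 0x66       # operand-size override prefix

mov_32bit = bytes([MOV_RM_R, MODRM_EBX_TO_EAX])
mov_16bit = bytes([OPSIZE_PREFIX, MOV_RM_R, MODRM_EBX_TO_EAX])

print(mov_32bit.hex())  # 89d8
print(mov_16bit.hex())  # 6689d8
```

Note the extra byte per prefixed instruction: emitting 16-bit code this way from a 32-bit-mode encoder bloats every affected instruction, which is one plausible reason the output overshoots a tight code-size budget.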
@newt @phnt @lain @sun call me when it does something other than launder stolen GPL code. :senko_sleep:

I've been saying for a while that these things only code well when what you're doing wasn't valuable to begin with.

Someone tried to send ChatGPT at me for some parsing tasks for Godot and it was like "have you tried pasting strings together" and it's like :neocat_gun:

I have learned the names of some algorithms from the bots though. Talking about architecture designs seems to be OK, which is incidentally a task that a very sophisticated search engine is the right solution for anyhow
@lain @icedquinn @phnt @sun @newt I just want to say that while I am on lain's side on this one, there technically is no Claude-written compiler, because in addition to using gcc as an "online oracle" (i.e. testing every output against gcc), they also used gcc's entire torture test suite.
That doesn't mean it's not interesting or we can't debate the merits but I think calling it a Claude-written compiler is a bit disingenuous in the first place.
You couldn't use this to write a compiler from scratch for something you need; you can still only use it to fill in the blanks between what very talented people already figured out.
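The "online oracle" workflow described here is essentially differential testing: run every input through both the candidate and the trusted reference, and flag any disagreement. A toy sketch of the idea (the stand-in "compilers" here are tiny expression evaluators of my own invention, not Anthropic's actual setup):

```python
# Differential testing against an oracle: the reference plays the role of
# gcc, the candidate is the implementation under test. Any input on which
# they disagree is a bug in the candidate.

def reference(expr: str) -> int:
    # Trusted oracle (stands in for gcc).
    return eval(expr)

def candidate(expr: str) -> int:
    # Implementation under test: sums '+'-separated integers, but with a
    # deliberate bug -- it silently ignores everything past three terms.
    return sum(int(tok) for tok in expr.split("+")[:3])

def differential_test(cases):
    """Return the inputs on which candidate and reference disagree."""
    return [expr for expr in cases if reference(expr) != candidate(expr)]

cases = ["1+2", "10+20+30", "1+2+3+4"]
print(differential_test(cases))  # ['1+2+3+4'] -> the oracle exposed the bug
```

The catch the thread is pointing at: this loop can only ever pull the candidate toward the oracle's existing behaviour, so it presupposes a finished, trusted implementation to compare against.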
@WandererUber @icedquinn @phnt @sun @newt okay, but i feel that this is going too far in the other direction. sure, having the test suite is a huge advantage. but if someone wrote a new fediverse server and tested it against mastodon's and pleroma's test suites, i wouldn't say they didn't write a fediverse server.

either way i think this is, at some point, splitting hairs. we went from "this can't even code a simple website" to "yeah, it can compile a linux kernel, but how hard is that, really?"
@lain @icedquinn @phnt @sun @newt >"yeah it can compile a linux kernel but how hard is that, really?"
I'm not saying that. But it can't.
I can compile a linux kernel by hand and output assembly. "It won't boot but what does that matter, really?"

> if someone wrote a new fediverse server and tested it against mastodon's and pleroma's test suite, i wouldn't have said they didn't write a fediverse server.
That's true, but THEY are obviously trying to say it can solve unsolved problems in the programming space, i.e. supplant or at least accelerate a human programmer. If it needs the entire compiler test suite AND the compiler itself, it isn't even doing a clean-room re-implementation, which is obviously the easier task.
It's a different thing from what they say it is.

Fedi server is similar. That *MIGHT* also break down once the docs and test suites are exhausted and it is asked to implement something the reference implementation doesn't have.

If it doesn't break down on such tasks, then why not make it do THOSE instead and prove it?
It's more a lies in marketing problem than anything else.
@WandererUber @icedquinn @phnt @sun @newt yeah, i don't disagree. i do think there's a huge difference between "it really can't do anything" and "anthropic lied", and many people take those for the same thing. frankly i don't even know why anthropic does this; for daily users of these systems, there's no question that they can 'do it all' by now, with the right setup. so why do such a weird publicity stunt that barely compiles hello world?
@lain @WandererUber @newt @phnt @sun it still overall boggles me how much leverage software people leave on the table (ignoring metamodeling for years, being absolutely terrible at explaining basic algorithms, celebrating terrible syntaxes with bad tools for decades) like

idk, software feels like it's largely stuck in the 1970s with one or two pockets of mild competence (jetbrains?), but for some reason the token jumbler gets uncritically worshipped and it's like, what
@lain @icedquinn @phnt @sun @newt Also, I am a shit programmer, but I am pretty sure that if they gave me unlimited access to the entire internet (including basically any book on compiler design), plus gcc and its test suite, plus a bunch of money, then I would produce a compiler whose linux binary actually boots.
Not for twenty thousand dollars though, and not in two weeks. But in some sense Claude also didn't, because they spent months training this thing beforehand. Then again, you can cp claude.safetensors claude2.safetensors, and you can't do that with a human.
idk
just a more nuanced economic problem than people want it to be.
@icedquinn @lain @sun @newt Everybody is making fun of Claude for failing in funny ways while writing a compiler from scratch in one of the worst languages, yet nobody seems to realize that if you forced senior engineers from a mid-sized software company to write a C compiler from scratch, the result would probably be similar. Very few people today know how to build a functioning compiler from scratch.
@newt @icedquinn @lain @sun Yeah, that's not reality.

And I don't mean a compiler just as a thing that generates assembly. I mean the whole stack: assembler, linker, and the compiler/code generator. Now make it support all the weird x86 extensions, the gcc extensions, and the whole C standard from C89 to C11. And have it build sqlite successfully. Because that's what Claude did. You have to compare against the same task, not pick some subset of it.
@lain @icedquinn @phnt @WandererUber @sun @newt
AFAIR, unlike those of Pleroma and Mastodon, GCC's test suite is pretty thorough: you can keep throwing noise at it, and once it all passes you get a compiler, which isn't even too far off from how this was actually carried out.
On paper it might seem impressive, because "Wow, a compiler!", but essentially this is even less impressive than creating a tiny website, as no one defines requirements for a website down to the tiniest detail, whereas for a compiler they are well formalised.
True, there is test-driven development, but I think the grandparent poster's point is, correct me if I'm wrong, that GCC's test suite wouldn't have existed if GCC never existed. That test suite isn't just a great help; I think it's the defining factor here. It's something a lot of work went into, and outside of it there isn't much to automate. Once something is clearly defined, you can proceed to the billion-monkeys-with-typewriters stage.
Again, their marketing department will be milking all the hype they can from it ("Look, it's a compiler that can build Linux, imagine what's coming next!"), but on a closer look it's the worst possible case, as neural networks' forte, fuzziness, is at a disadvantage here. Starting work on a website, its developer might not have a clear picture of what it should be; they keep making hints in natural language through continuous prompting. It's like navigating through fog: maybe not the shortest route, but at least you made it and managed to avoid some obstacles before actually bumping into them.
And this compiler is like taking a journey through a garden maze that has already been charted by people who know, if not the shortest, then at least a very good route, and still bumping into lots of walls. It's a lot like brute forcing, and given vast computing resources, not that impressive.

Their angle will be emphasising the complexity of the result combined with how little manual intervention it required, but if we take a step back and look at the bigger picture, we can see that all the babysitting was simply done beforehand. It's an astounding amount of work, describing the result in tiny detail *in code*, but the fact that it looks different from the iterative process most people have in mind when dealing with coding assistants might make one think it isn't there.
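The "billion monkeys with typewriters" point above can be made concrete as a toy model (entirely my own illustration, not anyone's actual process): once a test suite fully pins down the target behaviour, even a blind mutate-and-keep loop converges on something that passes, with no understanding involved.

```python
# Toy model: a thorough test suite turns "write the program" into blind
# search. Each "test" checks one byte of the target; random mutation plus
# "keep anything that scores at least as well" is enough to converge.
import random

TARGET = b"cc"  # stands in for "whatever passes the whole torture suite"
SUITE = [lambda c, i=i: len(c) > i and c[i] == TARGET[i]
         for i in range(len(TARGET))]  # one "test case" per byte

def score(cand: bytes) -> int:
    """Number of suite tests the candidate passes."""
    return sum(t(cand) for t in SUITE)

random.seed(0)
cand = bytes(len(TARGET))  # start from all zero bytes
while score(cand) < len(SUITE):
    i = random.randrange(len(cand))
    mut = cand[:i] + bytes([random.randrange(256)]) + cand[i + 1:]
    if score(mut) >= score(cand):  # hill-climb: never get worse
        cand = mut
print(cand)  # b'cc'
```

The loop never "knows" what it is building; the suite does all the specifying, which is exactly the poster's claim about where the real work lives.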