But really, relicensing a GPL codebase to MIT is uninteresting.

Let's do the interesting one, which is: vibe code a "clean room" reimplementation of an entire proprietary codebase! After all, Microsoft released a "shared source" proprietary version of Windows. Now try seeing what happens if you run THAT through the "turn it into public domain" machine.

Win-win outcome, no matter how it goes

@SprocketClown

The way I read it in this context is that an existing codebase has a license (whether GPL, LGPL, proprietary, or whatever), and that by "laundering" the codebase through an LLM, the output no longer retains the license terms. In the US at least, the Copyright Office and the federal courts have held that purely AI-generated output, with no human author, is uncopyrightable.

So as @cwebber highlights, either the licensewashing works, in which case LLMs can scrub licenses off proprietary codebases giving a leg up on "reproducing" proprietary codebases into the public domain; or it doesn't work, in which case LLM-produced code becomes subject to the licensing of the original code.

@cwebber

> Their claim that it is a "complete rewrite" is irrelevant, since they had ample exposure to the originally licensed code (i.e. this is not a "clean room" implementation). Adding a fancy code generator into the mix does not somehow grant them any additional rights.

The human didn't write the code, the LLM did. "They" which had "ample exposure to the originally licensed code" does not exist; "they" are ephemeral.

1. Start a fresh session with a clean context and have the LLM meticulously document the architecture, APIs, etc.

2. Keep those documents, throw away the code, then start a new session with an LLM that has a clean context and tell it to build from those documents.

That's clean room. If the original code was not in the LLM's context, it's not violating the license.
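
To make the two steps concrete, here is a rough sketch of that workflow in Python. Everything in it is hypothetical: `Model` stands in for whatever LLM call you use, and `Session` and `clean_room` are names invented for illustration. It only tracks what is placed in each session's context; it says nothing about what the model saw during training.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-in for an LLM call: takes the full context, returns text.
Model = Callable[[str], str]


@dataclass
class Session:
    """One isolated session; `seen` records every material put in its context."""
    model: Model
    seen: list[str] = field(default_factory=list)

    def ask(self, prompt: str, *materials: str) -> str:
        self.seen.extend(materials)
        return self.model("\n\n".join([prompt, *materials]))


def clean_room(original_code: str, model: Model) -> tuple[str, Session]:
    # Step 1: a fresh session reads the original and writes documentation only.
    analyst = Session(model)
    spec = analyst.ask("Document the architecture and APIs.", original_code)

    # Step 2: a second fresh session sees the documents, never the code.
    builder = Session(model)
    new_code = builder.ask("Implement from this specification.", spec)

    # The property the argument relies on: the original text never entered
    # the builder's context, not even embedded inside the spec.
    assert all(original_code not in material for material in builder.seen)
    return new_code, builder
```

Note that the final check fails if the step 1 output copies the original verbatim into the documents, which is exactly the leak a clean room is supposed to prevent.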

This is how you can do this. Proving beyond a reasonable doubt he didn't do it this way is going to require a lot of evidence nobody will have.

@vv @cwebber Proving that the model was trained on the original, or that the original is in the model, is quite difficult, and it is questionable whether it really matters anyway.

Chris Lattner was "trained on" GCC when he wrote LLVM. He studied it a lot. GCC compiles C/C++ code successfully, LLVM compiles C/C++ code successfully.

Both produce completely working machine code and generally you don't *need* one compiler over the other to get an end result that is acceptable.

Should LLVM be allowed to have an Apache license because of this?

These are tough questions.