Policies on AI-generated code

Trilby · Today 12:16:31

EDIT: for anyone else with similar goals / questions as mine, this thread would suggest that the archlinux distro is not appropriate for meeting these goals (at present), and the community - at least as represented in this thread - is even quite hostile to the goal. But Gentoo and NetBSD both have clear policies against LLM-generated code and may be good alternatives. Fedora reportedly also has policies on disclosing any AI-generated content. Original post below.

I'm curious whether there are any policies on "AI-generated" code used among arch devs, particularly whether there are guidelines about including / not including such content in main repo packages.

I am not (presently, in this thread) looking for discussion on the merits or concerns of such inclusion. But for transparency, my personal goal is to avoid any "AI" generated code or software from "AI-assisted" coding. I feel this goal could be analogous to some users preferring a purely free / libre system; for that goal we'd note that arch linux would not likely be a good choice, but Parabola could be a suitable alternative.

For those with the goal of avoiding "AI" content in their software are there any similar options / recommendations for using arch linux or alternative distros that do have rigorous policies on the topic?

---

Note I put "AI" in quotes for multiple reasons which are beyond the scope of this topic. Semantic arguments about that term are out of scope for this discussion. Feel free to replace any use of "AI" in this post with "LLM" or any / all specific examples of the LLM-backed coding assistant tools.

Last edited by Trilby (Today 17:31:45)

noesoespanol · Today 12:29:55

People really need to understand this thing isn't A.I. and it's just a glorified search engine. We definitely need to steer clear of fully generated code with no human in the middle to check it.

seth · Today 13:40:05

You mean the packaged software would be AI-free, not any archlinux projects?
Any promise of 100% made by real human fools would have to be an extension of such a promise made for every packaged project, which doesn't sound realistic (even if purely going by blind trust, you're not getting such commitments from each and every small project).
Any policy of "we don't tolerate LLM generated code" (ie. flag for removal) would get you into a tough spot when some critical project accepts LLM generated code (arch doesn't apply downstream patches, so the only fix would be a fork or complete substitution or feature loss) and you might have to make contentious calls about what's considered LLM-generated (eg. when some dev uses an LLM to write some bulky code and then manually checks and fixes it - is that still LLM generated?)

My *advice* on the matter is to be as agnostic as possible and have more faith in evolution.
If a project gets taken over by an AI the (reasonable) concern is that it will turn to shit and then people will just stop using it because it's shit as they would if it turned to shit for other reasons.
If otoh you have some high quality tool that does an excellent job and is actively maintained and has a responsive community - do you really want to drop it because there's somewhere some LLM involved and therefore it lost its "purity"?

papajoke · Today 15:11:40

0% Ai makes no sense nowadays!

The real issue is the proliferation of mini-projects where over 80% of the code is AI-generated; however, nothing compels us to use them—in my view, they are essentially proofs of concept. It is difficult to assign any meaningful copyright to such projects.

The Linux policy: https://docs.kernel.org/process/coding-assistants.html
Are you planning to exclude the kernel from Arch Linux?

Trilby wrote:

Not because of "purity" but because of security.

The same problem has always existed with an average developer.

Last edited by papajoke (Today 15:31:24)

Trilby · Today 15:22:48

seth wrote:

do you really want to drop it because there's somewhere some LLM involved and therefore it lost its "purity"?

Not because of "purity" but because of security. Countless projects with LLM-generated code have been found to be beyond-irresponsible with user data in many cases shipping it off to data brokers. When some slop coder gets an output from the LLM that "works" then ships it off to users without understanding what the code actually does, the potential for harm is immeasurable. But the harms of LLM use go far far beyond the impact on the product / user.

seth wrote:

Any policy of "we don't tolerate LLM generated code" (ie. flag for removal) would get you into a tough spot

The same could be said of any policy against closed-source binary blobs. That doesn't mean it can't be and hasn't been done. There are several software projects that have very clear and strictly-enforced prohibitions against any use of AI coding agents or other LLM code-generation tools.

papajoke wrote:

0% Ai makes no sense nowadays!

Perhaps not to you. Some people would argue that politicians not molesting children makes no sense nowadays. That doesn't mean we just accept what is popular as being okay.

papajoke wrote:

nothing compels us to use them

Of course nothing compels us to use software we don't want to use. Nothing compels anyone to use proprietary binary blobs either. That doesn't mean a project like Parabola has no place. The place of a project like Parabola is not that it is the only way to avoid being "compelled" to use closed-source material, but rather that it is a curated set of content that is designed to be only free / libre. Why is such a project for people who want to be free of LLM content so odd to you?

I'm not suggesting anyone is or should be compelled to do anything. I'm asking about resources for users to make informed decisions.

But it seems this community is all-in on the LLM bullshit. So I'm all out.

And before someone tries further moron-splaining I've taught machine learning classes for graduate students at MIT. I know precisely how these technologies work. I am ethically opposed to their use. I don't have any interest in preaching about my ethics - but this community has always been supportive of people having their own individual goals for their software use.

Last edited by Trilby (Today 15:34:04)

SimonJ · Today 15:33:52

It would be nice if anything that is AI generated had some note of this. The trouble, of course, as with everything, is that the ethical devs would do this, the unethical would hide it, and we would need some system to check.

Trilby · Today 15:44:19

FYI, apparently both Gentoo and NetBSD have implemented a ban on AI-generated content: https://www.tomshardware.com/software/l … rated-code

So for everyone saying this is impossible - you clearly are disconnected from reality, perhaps from too much chat bot usage.

mpan · Today 15:48:55

Trilby, focusing on your question, some non-authoritative information from observation/interaction:

So far I did not encounter any official policy on that. By that I don’t mean only one prominently advertised, but even one that would be simply circulated around.
Among contributors there is some concern about the negative impact of such content. This may not reflect the position of all contributors. This concern doesn’t mean that any use of “AI” algorithms is seen as automatically unwanted. But certainly, if Arch had to publish an official policy now, you would see in it a strong push-back against many uses.

Do you know how to tell replies in this thread aren’t from LLMs? Any LLM would follow this simple prompt: “I am not looking for discussion on the merits or concerns of such inclusion.” Humans failed to do so.

Scimmia · Today 16:03:43

Trilby wrote:

So for everyone saying this is impossible - you clearly are disconnected from reality, perhaps from too much chat bot usage.

Sure it's possible, just switch to a different kernel. Simple enough, right? Oh, but of course, you back it up with an article from over 2 years ago that has little to do with anything, so there's that.

Disconnected from reality is an apt phrase here, but the other posters aren't the ones guilty of it.

Last edited by Scimmia (Today 16:08:12)

Trilby · Today 16:34:40

Thanks mpan for an on-topic and useful reply.

Scimmia, you think an article from two years ago is no longer accurate? It's quite easy to confirm that these policies are still in place for both Gentoo and NetBSD. So I'm not sure what you're even getting at. But in any case, since you don't seem to want to contribute productively, but rather troll on the one thing I said this thread should not be about, please feel free to leave this thread and take your bullshit elsewhere.

Last edited by Trilby (Today 16:35:19)

Scimmia · Today 16:45:23

NetBSD's development is completely and totally different from a Linux distro, so no, no relevance there, and Gentoo's policy has nothing whatsoever to do with packaging AI generated code, so again, no relevance.

And now you've added "archlinux distro is not appropriate for meeting these goals". Have you really not realized yet that using linux at all cannot meet your goals? Disconnected from reality indeed.

Last edited by Scimmia (Today 16:46:04)

papajoke · Today 17:00:05

The reason you give is security...

To me, that's actually a weak argument, because no developer is perfect.

The real issue with open source is this:
> This is not about the quality but of the legality of AI contributions. If a model was trained on copyrighted code and/or code under a non-permissive license that is incompatible with the target project, then it may not be legal to include its output.

This is a huge issue. So yes, if that's your concern, then your topic is absolutely welcome.

My goal is to avoid any code or software generated by AI or created with AI-assisted programming.

The request needs to be clearly defined.
I'm not necessarily opposed to that. But do you really make no distinction between using AI and "vibe coding"? What is the minimum acceptable use? 1%? 0%?

Nowadays, after writing a script, I often ask my AI to review it. If it points out two or three minor issues, then yes, I used AI — but as a tool, not as a general-purpose code generator. In the end, it may have contributed only five lines of code. Should I be excluded?

In the past, developers used to copy code from a famous programming forum...
Recently, I wrote a small GUI application with more than 100 functions. For one of them, I had AI generate the code. It was just a visual effect that adds nothing essential to the application itself. I didn't want to spend an entire day on something so trivial. Should I be excluded? In terms of security, there is certainly zero risk; however, I cannot answer regarding the license.

For translations, I asked AI to generate 50 sentences in five languages. Should I be excluded?

I also asked AI to generate a man page. Should I be excluded?

Same for writing tests?

Last edited by papajoke (Today 17:20:35)

gromit · Today 17:20:15

There currently is no policy on this topic for the overall project, although some subprojects (i.e. buildbtw) have their own policies.
A few people are currently working towards an RFC on the topic; the exact wording & guidance is not yet clear.

Trilby · Today 17:29:32

Papajoke, I should not have mentioned the security point. I have quite a long list of reasons for my views - but I never wanted this thread to be about those reasons as good people could have different views on those reasons. The goal of the thread is to learn whether there were policies in place or if other distros / OSs had such policies.

The "percent" of generated content that I'd be okay with would be hard for me to answer, but it is also not actually relevant to the present thread. My own threshold for comfort is beside the point. The point is whether, or to what degree, the archlinux project / devs have guidelines or policies that they (aim to) adhere to on this topic.

Thanks gromit - that's the kind of answer I was seeking (it's not the answer I was hoping for, but it is an informative response to the question). I look forward to seeing any such RFC and where it goes.

Last edited by Trilby (Today 17:30:17)

5hridhyan · Today 17:29:45

distros that do have rigorous policies on the topic?

Since Gentoo and NetBSD got the spotlight in this thread: Fedora does allow AI-assisted contributions, but requires a disclosure tag (Assisted-by:) and makes the human submitter take full accountability/responsibility for the code
https://docs.fedoraproject.org/en-US/co … on-policy/
and Debian hasn't decided on or adopted a "rule" yet https://people.debian.org/~lucas/debian … esolution.
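
For anyone wondering what checking for such a disclosure tag could look like in practice, here is a minimal sketch. It assumes only that the tag appears as a line of the form "Assisted-by: <something>" in a commit message, as described above; the function name `assisted_by` and the sample message are made up for illustration and are not part of any Fedora tooling.

```python
import re

# Assumption: the disclosure is a line starting with "Assisted-by:".
# [ \t]* (rather than \s*) keeps the match on a single line.
TRAILER = re.compile(r"^Assisted-by:[ \t]*(\S.*)$", re.IGNORECASE | re.MULTILINE)


def assisted_by(commit_message: str) -> list[str]:
    """Return the value of every Assisted-by: line found in the message."""
    return [value.strip() for value in TRAILER.findall(commit_message)]


if __name__ == "__main__":
    # Hypothetical commit message, for illustration only.
    message = (
        "Fix typo in PKGBUILD\n"
        "\n"
        "Signed-off-by: A Dev <a.dev@example.org>\n"
        "Assisted-by: SomeTool\n"
    )
    print(assisted_by(message))
```

This only finds tags that a submitter actually wrote; it does nothing about undisclosed use, which is the problem SimonJ pointed out earlier.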

V1del · Today 17:39:59

I do think a lot of the misunderstandings in this thread could be cleared up if you mention what your scope is here. If it's about upstream code, then this is nigh impossible to guarantee without a huge effort to patch things. If it's just about developments within/for Arch's tooling, then I'd see such a policy as a more reasonable and more enforceable endeavor.

seth · Today 17:46:27

Trilby wrote:

When some slop coder gets an output from the LLM that "works" then ships it off to users without understanding what the code actually does, the potential for harm is immeasurable.

You're arguing the results, not the tools.

And wrt, seth wrote:

If a project gets taken over by an AI the (reasonable) concern is that it will turn to shit and then people will just stop using it because it's shit as they would if it turned to shit for other reasons.

Trilby, after a grand total of two replies, wrote:

But it seems this community is all-in on the LLM bullshit.

https://bbs.archlinux.org/viewtopic.php?id=313959
There's a difference between being all into something and pointing out obvious problems w/ the practical implementation of suggestions.

FYI, apparently both Gentoo and NetBSD have implemented a ban on AI-generated content

https://wiki.gentoo.org/wiki/Project:Council/AI_policy

This policy affects Gentoo contributions and the official Gentoo projects. It does not prohibit adding packages for AI-related software or software that is being developed with the help of such tools upstream.

Edit, @V1del

The OP wrote:

my personal goal is to avoid any "AI" generated code or software from "AI-assisted" coding. I feel this goal could be analogous to some users preferring a purely free / libre system

Last edited by seth (Today 17:47:57)

clfarron4 · Today 19:33:34

This might be a little out of scope of the discussion, but the Linux Kernel Developers recently published their stance on Coding Assistants, which is basically that the human submitter is accountable for what the Coding Assistants produce.
