
committers: add AI policy
Needs Revision · Public

Authored by dch on Mon, Jun 2, 1:27 PM.

Details

Reviewers
glebius
olivier
lwhsu
Group Reviewers
Core Team
Summary
Test Plan

open questions & comments

  • should we allow documentation translation via AI?
  • it's already permitted to contribute AI tools to ports
  • should we be more clear in the general committers guide that you need to be 100% clear and transparent about the origin of your code/patches/contributions?
  • can I *use* AI/LLM tooling to help me with commit messages, checking my language and style?

Diff Detail

Repository
R9 FreeBSD doc repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 64595
Build 61479: arc lint + arc unit

Event Timeline

dch requested review of this revision. Mon, Jun 2, 1:27 PM
dch created this revision.
dch edited the test plan for this revision.
This revision is now accepted and ready to land. Mon, Jun 2, 6:40 PM
olivier requested changes to this revision. (Edited) Mon, Jun 9, 10:47 AM
olivier added a subscriber: olivier.

About the documentation, comments in code or commit message: Is using AI to fix my English forbidden too ?

This first sentence was written by non-native-English me, but for documentation or commit message, I might ask the AI to "fix my English," and the AI result will be something like this:

"Am I also prohibited from using AI to correct my English?"

I'm asking because I have dyslexia, which is a serious issue when you need to write in French (where correct writing is mandatory in French culture). Therefore, I'm accustomed to using software to check for all grammar and orthographic errors. However, since these tools are now AI-based, does that mean we can't use them either?

This revision now requires changes to proceed. Mon, Jun 9, 10:47 AM
lwhsu requested changes to this revision. Mon, Jun 9, 3:48 PM
lwhsu added a subscriber: lwhsu.

I fully agree that the biggest issue, and the one that must be solved, is the license concern, but putting "expressly forbidden" on a tool because of its current limitations is too narrow. I believe the spirit of the project is to be more inclusive, as long as the contribution can meet the requirements, e.g., the license, quality, conventions, etc.

Rather than saying it in a negative way, I would prefer to draw a clear line for the requirements for any kind of contribution to the project: it is the contributor's responsibility to follow them, and the committer's responsibility to verify them.
It's not about the tool itself; it depends on whether the contributor (a committer is also a kind of contributor) can use it in a correct way. You must have full knowledge of, and responsibility for, what you commit to the project repository.

https://www.apache.org/legal/generative-tooling.html
https://www.linuxfoundation.org/legal/generative-ai

> About the documentation, comments in code or commit message: Is using AI to fix my English forbidden too ?
>
> This first sentence was written by non-native-English me, but for documentation or commit message, I might ask the AI to "fix my English," and the AI result will be something like this:
>
> "Am I also prohibited from using AI to correct my English?"
>
> I'm asking because I have dyslexia, which is a serious issue when you need to write in French (where correct writing is mandatory in French culture). Therefore, I'm accustomed to using software to check for all grammar and orthographic errors. However, since these tools are now AI-based, does that mean we can't use them either?

Me too! The core issue here is not "what tool am I using?" but:

  • does this change the provenance of this contribution?
  • can I still provide a personal commitment that the attribution is still mine?

I think orthographic, spelling, and similar "assistive" tooling is fair, assuming it still meets that bar.

dch edited the test plan for this revision.
dch edited the summary of this revision.

> I fully agree that the biggest issue, and the one that must be solved, is the license concern, but putting "expressly forbidden" on a tool because of its current limitations is too narrow. I believe the spirit of the project is to be more inclusive, as long as the contribution can meet the requirements, e.g., the license, quality, conventions, etc.

If you leave a crack open then that crack will be exploited, intentionally or otherwise, and then our "97% BSD licensed, fairly attributed" codebase becomes "YOLO AI-License".

Our lessons learned from 5600 words of GPL licensing are that clarity & simplicity matter a great deal.

When you read the 1100-word ASF one closely, you will find that in *every* case, it's still No, unless you can reasonably show that it's actually OK. It just takes more examples and many more words to say so:

https://www.apache.org/legal/generative-tooling.html

## Can contributions to ASF projects include AI generated content?
...

Given the above, code generated in whole or in part using AI can be contributed if
the contributor ensures that:

    1. The terms and conditions of the generative AI tool do not place any restrictions
        on use of the output that would be inconsistent with the Open Source Definition.
    2. At least one of the following conditions is met:
        2.1. The output is not copyrightable subject matter (and would not be even if
            produced by a human).
        2.2. No third party materials are included in the output.
        2.3. Any third party materials that are included in the output are being used with
            permission (e.g., under a compatible open-source license) of the third party
            copyright holders and in compliance with the applicable license terms.
    3. A contributor obtains reasonable certainty that conditions 2.2 or 2.3 are met if the AI
            tool itself provides sufficient information about output that may be similar to
            training data, or from code scanning results.
...
## What About Documentation?
The above text applies to documentation as well. 
...
## What About Images?
As with documentation, the above principles would still apply.

Same for the Linux Foundation one:

https://www.linuxfoundation.org/legal/generative-ai

If any pre-existing copyrighted materials (including pre-existing open source code) authored
or owned by third parties are included in the AI tool’s output, prior to contributing such
output to the project, the Contributor should confirm that they have permission from the
third party owners–such as in the form of an open source license or public domain declaration
that complies with the project’s licensing policies–to use and modify such pre-existing
materials and contribute them to the project. Additionally, the contributor should provide
notice and attribution of such third party rights, along with information about the applicable
license terms, with their contribution.

> Rather than saying it in a negative way, I would prefer to draw a clear line for the requirements for any kind of contribution to the project: it is the contributor's responsibility to follow them, and the committer's responsibility to verify them.

Yes, we should have this in the contributors / committers guide. I think in this case, clarity
and simplicity matter. Having a decent FAQ with a bunch of examples is fine, but we should end
up in the same place:

*if you can't be 100% certain of the provenance and attribution, then this is not suitable for inclusion*

> It's not about the tool itself; it depends on whether the contributor (a committer is also a kind of contributor) can use it in a correct way. You must have full knowledge of, and responsibility for, what you commit to the project repository.

I disagree. The *tool* is everything. If you didn't produce this content (code, docs, whatever) yourself then how can you *guarantee* the provenance & attribution? How can you present this content as under *your* copyright when you didn't even produce it?

If and when there are AI tools that can provide provenance & attribution, then we should revisit this position, but as of today, I am not aware of any. If somebody made an LLM entirely from the BSD-licensed history of this project, then arguably that would be fair play for inclusion & usage.