This commit introduces a new script `check_spdx.lua` to support automated SPDX license verification and matching for FreeBSD ports. The goal is to improve the accuracy and consistency of LICENSE_FILE entries by comparing their normalized content against official SPDX license templates.

Key features:
- Supports `LICENSE_FILE`, `LICENSE_FILE_<LICENSE>`, and fallback SPDX header scanning (`SPDX-License-Identifier`) within WRKSRC sources.
- Normalizes both the LICENSE_FILE and SPDX license templates using a Python-compatible preprocessing pipeline (implemented in flua) that:
  - Removes URLs, comments, copyright notices
  - Canonicalizes typographic and spelling variants (e.g., licence → license, organisation → organization)
  - Replaces fancy quotes and normalizes whitespace
- Computes similarity via the Sørensen–Dice coefficient and reports the top matches.
- Gracefully handles:
  - Missing or improperly defined LICENSE_FILE
  - Multiple LICENSE values with LICENSE_COMB and no per-license file mapping
  - Ports without any declared license metadata (using `-s` to search source files)

SPDX license templates are cached in /var/db/ports-licenses/normalized/ to avoid repeated downloads.

The matcher is modeled on prior work from the spdx-license-matcher[1] project and reimplemented here in flua for native use in the ports tree.

This tool relies on `libucl` for parsing the official SPDX license JSON index. FreeBSD 15 includes `libucl` in the base system with flua bindings at `/usr/lib/flua/ucl.so`. On earlier FreeBSD versions, users must install `textproc/libucl` from ports to provide the required Lua bindings (typically in `/usr/local/lib/lua/5.4/ucl.so`).
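To make the normalization and scoring steps from the feature list concrete, here is a minimal sketch in Lua. It is not the code from `check_spdx.lua`: the names `normalize`, `bigrams`, and `dice` are hypothetical, the variant and quote lists are small samples, comment stripping is left out, and computing the Sørensen–Dice coefficient over character bigrams is an assumption about how the matcher works.

```lua
-- Illustrative sketch only; names are hypothetical, not from check_spdx.lua.

-- Small sample of spelling variants (the real list is presumably longer).
local VARIANTS = {
  { "licence", "license" },
  { "organisation", "organization" },
}

-- Typographic quotes mapped to ASCII. Each entry is a multi-byte UTF-8
-- sequence, so it is replaced as a literal string, not as a character class.
local QUOTES = {
  { "“", '"' }, { "”", '"' }, { "‘", "'" }, { "’", "'" },
}

local function normalize(text)
  local s = text:lower()
  s = s:gsub("https?://%S+", " ")     -- remove URLs
  s = s:gsub("copyright[^\n]*", " ")  -- remove copyright notice lines
  for _, q in ipairs(QUOTES) do
    s = s:gsub(q[1], q[2])
  end
  for _, v in ipairs(VARIANTS) do
    s = s:gsub(v[1], v[2])
  end
  s = s:gsub("%s+", " ")              -- collapse runs of whitespace
  return (s:match("^%s*(.-)%s*$"))    -- trim both ends
end

-- Count character bigrams (assumption: the matcher compares bigrams).
local function bigrams(s)
  local counts, total = {}, 0
  for i = 1, #s - 1 do
    local bg = s:sub(i, i + 1)
    counts[bg] = (counts[bg] or 0) + 1
    total = total + 1
  end
  return counts, total
end

-- Sørensen–Dice coefficient: 2 * |X ∩ Y| / (|X| + |Y|) over bigram multisets.
local function dice(a, b)
  if a == b then return 1.0 end
  local ca, na = bigrams(a)
  local cb, nb = bigrams(b)
  if na == 0 or nb == 0 then return 0.0 end
  local overlap = 0
  for bg, n in pairs(ca) do
    overlap = overlap + math.min(n, cb[bg] or 0)
  end
  return 2 * overlap / (na + nb)
end
```

With these definitions, `dice(normalize(file_text), normalize(template_text))` yields a score between 0 and 1, where 1 means the normalized texts are identical.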
Makefile integration logic automatically:
- Invokes the checker based on LICENSE / LICENSE_FILE settings
- Falls back to scanning source files for SPDX headers if no license file is found
- Skips SPDX matching when LICENSE_COMB=dual/multi is used with a single LICENSE_FILE

This addition helps validate license declarations, identify misattributed or legacy licenses, and assist maintainers in migrating toward SPDX-aligned metadata.

If the similarity score is not exactly 1.000, the file is not a verbatim copy of any SPDX license template. Manual review is advised in such cases.

[1] https://github.com/spdx/spdx-license-matcher
Details
Run `make check-spdx-license` inside the directory of any port that has LICENSE_FILE*.
Diff Detail
- Repository
- R11 FreeBSD ports repository
- Lint
- Lint Skipped
- Unit
- Tests Skipped
Event Timeline
I've tested a few random packages in ports.
- It's a bit confusing that check-spdx-license compiles things when one might just want to check the license. What I mean is that it should just extract and check the license file in the current port directory, not try to compile the whole package.
- Another thing that might be easier for users to understand: if the match is 1.0 (which I understand to mean 100%), it shouldn't be necessary to show the other possible licenses.
- It would be nice to see a unified diff if the license differs from the license that is mentioned in the port's Makefile.
I think that is what this is doing. It's extracting and checking the license. There is no compiling involved.
- Another thing that might be easier for users to understand: if the match is 1.0 (which I understand to mean 100%), it shouldn't be necessary to show the other possible licenses.
Yes, I still have that code commented out somewhere in the scripts; I'm waiting for feedback from others.
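The behaviour requested here could look roughly like the following sketch. The result shape (`{ id = ..., score = ... }`) and the function name `select_matches` are hypothetical and not taken from the script.

```lua
-- Hypothetical sketch: report only the exact match when one exists,
-- otherwise fall back to the top_n best candidates.
local function select_matches(results, top_n)
  -- Copy first so the caller's table is not reordered.
  local sorted = {}
  for i, r in ipairs(results) do sorted[i] = r end
  table.sort(sorted, function(x, y) return x.score > y.score end)

  -- A score of 1.0 means the normalized texts are identical,
  -- so the remaining candidates add no information.
  if sorted[1] and sorted[1].score >= 1.0 then
    return { sorted[1] }
  end

  local out = {}
  for i = 1, math.min(top_n, #sorted) do
    out[i] = sorted[i]
  end
  return out
end
```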
- It would be nice to see a unified diff if the license differs from the license that is mentioned in the port's Makefile.
This is not easy to do. We currently normalize each entire document into a single line to avoid any ambiguity, so a unified diff is not helpful: it just prints two huge lines. To get something really readable we would have to break the text into lines again. I will look into it, but no promises.
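One possible way to break the text into lines again, sketched under the assumption that the normalized text is a single whitespace-separated line, is to emit one word per line before diffing. The helper name `one_word_per_line` is hypothetical. Unlike wrapping at a fixed width, a single inserted or removed word then changes only one line instead of shifting every following line.

```lua
-- Hypothetical helper: turn a normalized single-line text into one word
-- per line so that a line-based diff shows word-level changes.
local function one_word_per_line(text)
  local words = {}
  for word in text:gmatch("%S+") do
    words[#words + 1] = word
  end
  return table.concat(words, "\n") .. "\n"
end
```

Writing the output for the LICENSE_FILE and for the SPDX template to two temporary files and running `diff -u` on them would then give a readable, if long, word-level diff.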
Then it's just me. I find it confusing that some of the tested ports, like the Go ones, print so much output that it looks like they start compiling something.
I've commented on the code a little bit. If the comments don't seem relevant, just ignore them. The code is clean and seems to do what it should without much hassle. Some more commenting would not hurt, though.
Mk/Scripts/check_spdx.lua | ||
---|---|---|
41 | Where do these colors come from? They don't seem to be ANSI escape colors. | |
61 | Should there be a global dprint for every Lua script in Mk/Scripts? | |
167 | I didn't get the idea behind this one. | |
180 | Would this be faster as one big regex than as a for-loop? I admit it would be large, and I don't know if Lua can handle nested regexes, but it would take only one run to make every change. | |
236 | I understand this is only used here, but does having a nested function just for this add unnecessary complication? It could be a normal function. Did I get it right that it builds an array with every character of the string? | |
254 | This should be in a global library if one were to be created. It could be beneficial for other Lua scripts as well. | |
255 | There should be a check that this succeeds. Or does execution stop if it fails? | |
266 | For readability it would be easier to have something like an eprint function that prints in colors.red, rather than repeating it every time it's needed. Just like dprint. | |
273 | Should an error or nil be returned here? | |
296 | This is the second time we have mostly the same code. I would move it into a function. | |
326 | Why is this not used for the normalization of the licenses above? And why is the LICENSE file that is read not normalized the same way as the downloaded licenses? | |
396 | Would it be easier to make this case-insensitive? | |
402 | Same as with the red color. A function would be easier to maintain. | |
444 | Is there nothing similar to getopt available? |
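On the question at line 180: stock Lua patterns have no alternation operator, so a single large regex is not an option without an external library. A single pass is still possible because `string.gsub` accepts a table as the replacement and looks up each match in it. The sketch below uses a hypothetical `VARIANTS` table and function name. Unlike a substring loop, it only rewrites whole words.

```lua
-- Hypothetical variant table; keys are whole lower-case words.
local VARIANTS = {
  licence = "license",
  organisation = "organization",
}

-- One pass over the text: every run of letters is looked up in VARIANTS.
-- Words without an entry (nil lookup) are left unchanged by gsub.
local function canonicalize(text)
  -- gsub returns (string, count); the parentheses keep only the string.
  return (text:gsub("%a+", VARIANTS))
end
```

Whether this is measurably faster than the existing loop would need to be benchmarked against real license texts.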
My only other feedback from looking at the draft standard was that while we should have LicenseRef-Foo as described in the document, each time we have to do that because the matching score isn't high enough, we should encourage the maintainer to submit that to the SPDX legal team so that they can either tweak the markup for license Foo, or create a new Foo variant if the changes are legally different enough.
Beyond that, and my lack of time to give this a super-close look, I really like how the policy has evolved after the initial proposal. I was worried it would take several iterations. I must have had a good day describing the changes, or Moin is a good mind reader :). In all seriousness, I'm super glad this is being done. I can also introduce people around to the SPDX folks I have a relationship with.
Mk/Scripts/check_spdx.lua | ||
---|---|---|
41 | They match https://en.wikipedia.org/wiki/ANSI_escape_code. Coming from C, \27 looks wrong, but in Lua \ddd is a decimal escape, not an octal one. Lua is not quite the same as C in these details. |
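The escape difference can be checked directly in Lua 5.4. The snippet below also sketches the kind of `eprint` helper suggested in the comment on line 266. The `colors` table contents and both helper names are illustrative and not copied from the script.

```lua
-- In Lua, "\27" is decimal 27 (ESC), the same byte as C's "\033" or "\x1b".
assert("\27" == "\x1b")
assert("\27" == string.char(27))
-- Written C-style, "\033" in Lua is decimal 33, which is "!" and not ESC.
assert("\033" == "!")

-- Illustrative color table using standard ANSI SGR sequences.
local colors = {
  red = "\27[31m",
  reset = "\27[0m",
}

-- Wrap a message in a color and reset the terminal afterwards.
local function colorize(color, msg)
  return colors[color] .. msg .. colors.reset
end

-- Hypothetical eprint helper: print an error in red on stderr.
local function eprint(msg)
  io.stderr:write(colorize("red", msg), "\n")
end
```

A call such as `eprint("no LICENSE_FILE found")` would then replace the repeated manual use of `colors.red` at each call site.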