This commit introduces a new script `check_spdx.lua` to support
automated SPDX license verification and matching for FreeBSD ports. The
goal is to improve the accuracy and consistency of LICENSE_FILE entries
by comparing their normalized content against official SPDX license
templates.
Key features:
- Supports `LICENSE_FILE`, `LICENSE_FILE_<LICENSE>`, and fallback scanning
of source files under WRKSRC for `SPDX-License-Identifier` headers.
- Normalizes both the LICENSE_FILE and the SPDX license templates using a
preprocessing pipeline compatible with the Python spdx-license-matcher
(implemented in flua, with optional `sed` support) that:
  - Removes URLs, comments, and copyright notices
  - Canonicalizes typographic and spelling variants (e.g., licence →
    license, organisation → organization)
  - Replaces curly quotes with straight quotes and normalizes whitespace
- Computes similarity via the Sørensen–Dice coefficient and reports the
top matches.
- Gracefully handles:
  - Missing or improperly defined LICENSE_FILE
  - Multiple LICENSE values with LICENSE_COMB and no per-license file
    mapping
  - Ports without any declared license metadata (using `-s` to search
    source files)
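
The normalization and scoring steps above can be sketched in Python, the
language of the spdx-license-matcher project the tool is modeled on. This
is not the flua implementation: the spelling-variant table, the comment
and copyright regexes, and the use of character bigrams for the Dice
coefficient are illustrative assumptions, and `normalize`, `dice`, and
`top_matches` are hypothetical names.

```python
import re
from collections import Counter

# Illustrative subset of spelling variants (assumption, not the real table).
VARIANTS = {"licence": "license", "organisation": "organization"}
VARIANT_RE = re.compile(r"\b(" + "|".join(map(re.escape, VARIANTS)) + r")\b")

# Map curly quotes to their straight ASCII counterparts.
QUOTES = str.maketrans({"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'})

URL_RE = re.compile(r"https?://\S+")
# Comment markers (#, //, /*, *, */) at the start of a line (assumed set).
COMMENT_RE = re.compile(r"^[ \t]*(?:#+|//+|/\*+|\*+/?)", re.MULTILINE)
# Whole lines that begin with a copyright notice (assumed heuristic).
COPYRIGHT_RE = re.compile(
    r"^[ \t]*(?:copyright\b|\(c\)|\u00a9).*$", re.MULTILINE | re.IGNORECASE
)


def normalize(text: str) -> str:
    """Apply the simplified normalization steps in order."""
    text = text.translate(QUOTES)
    text = URL_RE.sub(" ", text)
    text = COMMENT_RE.sub("", text)
    text = COPYRIGHT_RE.sub("", text)
    text = text.lower()
    text = VARIANT_RE.sub(lambda m: VARIANTS[m.group(1)], text)
    return " ".join(text.split())  # collapse all whitespace runs


def bigrams(text: str) -> Counter:
    """Multiset of overlapping character bigrams."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def dice(a: str, b: str) -> float:
    """Sørensen–Dice coefficient 2|A & B| / (|A| + |B|) over bigram multisets."""
    ba, bb = bigrams(a), bigrams(b)
    total = sum(ba.values()) + sum(bb.values())
    if total == 0:
        return 0.0
    return 2.0 * sum((ba & bb).values()) / total


def top_matches(license_text: str, templates: dict, n: int = 3) -> list:
    """Rank SPDX templates (id -> raw text) by similarity to license_text."""
    norm = normalize(license_text)
    scores = [(spdx_id, dice(norm, normalize(raw))) for spdx_id, raw in templates.items()]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:n]
```

In this sketch a score of exactly 1.0 requires identical bigram multisets
after normalization, so any spelling or wording difference that survives
normalization pulls the score below 1.0.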
SPDX license templates are cached in /var/db/ports-licenses/normalized/
to avoid repeated downloads. The matcher is modeled on prior work from
the spdx-license-matcher[1] project and reimplemented here in flua for
native use in the ports tree.
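
A cache-first lookup of the kind described above might look like the
following Python sketch. Only the cache directory comes from this commit;
the `<SPDX-ID>.txt` file naming, the `cached_template` helper, and the
injected `fetch` and `normalize` callables are hypothetical.

```python
from pathlib import Path
from typing import Callable

# Cache directory named in this commit; per-file naming below is hypothetical.
DEFAULT_CACHE = Path("/var/db/ports-licenses/normalized")


def cached_template(
    spdx_id: str,
    fetch: Callable[[str], str],
    normalize: Callable[[str], str],
    cache_dir: Path = DEFAULT_CACHE,
) -> str:
    """Return the normalized template for spdx_id, fetching only on a cache miss."""
    path = cache_dir / f"{spdx_id}.txt"  # hypothetical "<SPDX-ID>.txt" layout
    if path.is_file():
        return path.read_text(encoding="utf-8")
    text = normalize(fetch(spdx_id))  # download and normalize once
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return text
```

Passing `fetch` and `normalize` in as parameters keeps the sketch
independent of any particular download tool or normalization pipeline.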
This tool relies on `libucl` for parsing the official SPDX license JSON
index. FreeBSD 15 includes `libucl` in the base system with flua
bindings at `/usr/lib/flua/ucl.so`. On earlier FreeBSD versions, users
must install `textproc/libucl` from ports to provide the required Lua
bindings (typically in `/usr/local/lib/lua/5.4/ucl.so`).
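
For reference, the SPDX license list index is a JSON document whose
`licenses` array holds one object per license, keyed by `licenseId` and
flagged with `isDeprecatedLicenseId`. The sketch below uses Python's
`json` module in place of libucl purely to show the fields of interest;
`license_ids` is a hypothetical helper, and whether the checker filters
deprecated identifiers is an assumption exposed as a parameter.

```python
import json


def license_ids(index_json: str, include_deprecated: bool = True) -> list:
    """Return sorted SPDX identifiers from the license list JSON index."""
    index = json.loads(index_json)
    ids = []
    for entry in index.get("licenses", []):
        # Deprecated identifiers can optionally be skipped (assumed policy knob).
        if entry.get("isDeprecatedLicenseId", False) and not include_deprecated:
            continue
        ids.append(entry["licenseId"])
    return sorted(ids)
```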
The Makefile integration automatically:
- Invokes the checker based on the LICENSE and LICENSE_FILE settings
- Falls back to scanning source files for SPDX headers if no license
file is found
- Skips SPDX matching when LICENSE_COMB=dual or LICENSE_COMB=multi is
used with a single LICENSE_FILE
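
The fallback scan for SPDX headers can be approximated by walking the
extracted sources and tallying `SPDX-License-Identifier` expressions. In
this Python sketch the directory argument stands in for WRKSRC; the regex,
the 4096-character per-file read limit, and the `scan_spdx_headers` name
are assumptions rather than details of check_spdx.lua.

```python
import os
import re
from collections import Counter

# Identifier expression up to end of line, minus a trailing "*/" if present.
SPDX_RE = re.compile(
    r"SPDX-License-Identifier:[ \t]*(.+?)[ \t]*(?:\*/)?[ \t]*$", re.MULTILINE
)


def scan_spdx_headers(root: str, max_chars: int = 4096) -> Counter:
    """Tally SPDX-License-Identifier expressions near the top of files under root."""
    found = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Headers normally sit at the top of a file, so read only the head.
                with open(path, encoding="utf-8", errors="replace") as fh:
                    head = fh.read(max_chars)
            except OSError:
                continue  # unreadable file: skip it
            for match in SPDX_RE.finditer(head):
                found[match.group(1)] += 1
    return found
```

Counting every expression, rather than stopping at the first match, shows
at a glance whether a source tree mixes several licenses.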
This addition helps validate license declarations, identify
misattributed or legacy licenses, and assist maintainers in migrating
toward SPDX-aligned metadata.
A similarity score below 1.000 means that, after normalization, the file
is not a verbatim copy of any SPDX license template. Manual review is
advised in such cases.
[1] https://github.com/spdx/spdx-license-matcher