
Doc toolchain PO translation support
Closed, Public

Authored by wblock on Jun 29 2015, 9:10 PM.

Details

Summary

Give the doc toolchain a way to create PO files, and generate translated XML files from them.

Translators generate PO files with 'make po', then enter translated strings with poedit or the like. After enough strings have been translated to make a useful document, 'make tran' is used to generate the translated DocBook XML file from the PO file.

Translators commit their PO files, and the generated XML is only used temporarily for creating the final output format.

devel/gettext-tools and textproc/itstool are required. These will be added as dependencies to textproc/docproj.
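As a rough illustration of the workflow above, the following sketch lists the kind of commands that 'make po' and 'make tran' might run. Nothing is executed. The function name, the file layout, and the exact option set are assumptions for illustration only; the real rules live in doc.docbook.mk and may differ.

```python
from pathlib import Path
import shlex


def translation_commands(english_xml, po_file, lang):
    """Return hypothetical command lines for the two translation steps.

    Assumed layout: the PO file sits in the translation directory and the
    compiled catalog and translated XML are written next to it.
    """
    po = Path(po_file)
    mo = po.with_suffix(".mo")          # compiled message catalog
    translated = po.with_suffix(".xml")  # temporary translated DocBook
    return {
        # Step 1: extract translatable strings from the English XML.
        "po": [["itstool", "-o", str(po), english_xml]],
        # Step 2: compile the PO file, then merge it back into the XML.
        "tran": [
            ["msgfmt", "-o", str(mo), str(po)],
            ["itstool", "-l", lang, "-m", str(mo),
             "-o", str(translated), english_xml],
        ],
    }


if __name__ == "__main__":
    cmds = translation_commands("../en_US.ISO8859-1/book.xml",
                                "fr_FR.po", "fr_FR")
    for target, lines in cmds.items():
        for argv in lines:
            print(f"{target}: {shlex.join(argv)}")
```

Only the PO file would be committed; the .mo and translated .xml files here are build products.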

Diff Detail

Repository: rD FreeBSD doc repository
Lint: Automatic diff as part of commit; lint not applicable.
Unit: Automatic diff as part of commit; unit tests not applicable.

Event Timeline

wblock updated this revision to Diff 6559. Jun 29 2015, 9:10 PM
wblock retitled this revision from to Doc toolchain PO translation support.
wblock updated this object.
wblock edited the test plan for this revision.
wblock added reviewers: doceng, bcr, tabthorpe, rene, kwm, feld.
wblock set the repository for this revision to rD FreeBSD doc repository.
hrs added a subscriber: hrs. Jun 29 2015, 11:35 PM

These are quick comments; I will probably make more concrete suggestions later. However, I feel that what is best depends on the expected translation workflow. Are PO files created and then left uncommitted? If possible, could you please summarize in the Summary section the typical procedure a translator will follow?

share/mk/doc.docbook.mk
275 ↗(On Diff #6559)

Please use $DOC_PREFIX instead of $PWD. The current doc.*.mk files are also broken in this regard, but the current directory must be handled carefully because make(1) uses OBJDIR.

283 ↗(On Diff #6559)

Please consider defining a proper makefile target for the PO file. Once we have such a target, make(1) can detect that a .po file is older than the XML file and trigger a merge target.
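The out-of-date check that make(1) would perform for such a target can be sketched as below. This is only an illustration of the timestamp rule (a target is stale if it is missing or older than its dependency); the function name and file names are hypothetical.

```python
import os


def po_is_stale(po_path, xml_path):
    """Mimic make's rule: rebuild if the target is missing or older.

    po_path is the target (.po file); xml_path is its dependency.
    """
    if not os.path.exists(po_path):
        return True
    return os.path.getmtime(po_path) < os.path.getmtime(xml_path)


if __name__ == "__main__":
    # Hypothetical file names; both must exist for a meaningful answer.
    if os.path.exists("book.xml"):
        print("merge needed:", po_is_stale("fr_FR.po", "book.xml"))
```

With a real ${DOC}.po target, make(1) does this comparison itself, so no helper like this would be needed in the doc tree.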

wblock updated this object. Jun 30 2015, 12:31 AM
wblock updated this revision to Diff 6564. Jun 30 2015, 1:53 AM

Add back ${DOC}.po target. Actually had that before and removed it in a flash of misinspiration.

wblock marked an inline comment as done. Jun 30 2015, 2:38 AM
wblock updated this revision to Diff 6586. Jun 30 2015, 2:35 PM

DTD modifications for <buildtarget> split off to https://reviews.freebsd.org/D2958.

wblock updated this object. Jun 30 2015, 2:37 PM
bcr edited edge metadata. Jul 4 2015, 2:19 PM

I think we need to consider a few cases when doing the msgmerge. I can think of the following:

  1. String is not present in the translation's PO file. This could be due to the following reasons in the en_US reference:

     a) The string was added (new sentence/paragraph). This is probably the default case and is handled well by msgmerge.
     b) The string was changed (can msgmerge detect these and map them in the translation PO file accordingly?). What happens when a string is split into two (or more) sentences? Are new IDs generated for both strings, or is the original one preserved and an additional one created for the second sentence?
     c) The string was deleted (and so should be removed from the translation's PO file as well).

I remember Kris Moore had to write a script to delete all strings in their Pootle database each time they imported a new en_US version for translators to avoid keeping old strings around.

These cases might be a bit too much for this review to cover. I think the effort should go into trying out the gettext tools, and when we do, we should determine how best to deal with these cases. I hope this can be automated so that translators can focus on the actual translation process and not have to ask "Is this string actually relevant anymore?".
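The three cases (added, changed, deleted) can be illustrated with a small sketch. This is not msgmerge's algorithm; it only uses difflib from the Python standard library to show how an updated set of English strings might be sorted into kept, fuzzy, untranslated, and obsolete groups. The function name, the sample strings, and the 0.6 cutoff are assumptions.

```python
import difflib


def classify_strings(old_translations, new_msgids, cutoff=0.6):
    """Sort updated English strings against an existing translation.

    old_translations: dict mapping old msgid -> msgstr
    new_msgids: list of msgids from the updated English document
    Returns (kept, fuzzy, untranslated, obsolete).
    """
    new_set = set(new_msgids)
    # Old strings that no longer appear verbatim are fuzzy-match candidates.
    candidates = [m for m in old_translations if m not in new_set]
    kept, fuzzy, untranslated = {}, {}, []
    used = set()
    for msgid in new_msgids:
        if msgid in old_translations:
            kept[msgid] = old_translations[msgid]      # unchanged string
            used.add(msgid)
            continue
        close = difflib.get_close_matches(msgid, candidates, n=1, cutoff=cutoff)
        if close:
            fuzzy[msgid] = old_translations[close[0]]  # case b: changed
            used.add(close[0])
        else:
            untranslated.append(msgid)                 # case a: added
    obsolete = [m for m in old_translations if m not in used]  # case c
    return kept, fuzzy, untranslated, obsolete


if __name__ == "__main__":
    old = {
        "Install the port.": "Installez le port.",
        "Remove the old file.": "Supprimez l'ancien fichier.",
        "Reboot.": "Redémarrez.",
    }
    new = ["Install the port.", "Remove the old files.", "See hier(7)."]
    for name, group in zip(("kept", "fuzzy", "untranslated", "obsolete"),
                           classify_strings(old, new)):
        print(name, group)
```

How the real tools treat a sentence that is split in two remains the open question above; this sketch would simply treat each half as a separate new or fuzzy string.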

wblock updated this revision to Diff 6794. Jul 8 2015, 11:20 PM
wblock edited edge metadata.
wblock removed rD FreeBSD doc repository as the repository for this revision.

Vastly reworked. PWD is not used. The normalized file is generated in the non-English directory where the user is working. Both itstool and po4a are supported, with po4a the default. Entities are not expanded. xmlns:xlink attributes are filtered out (crudely; that needs work) so that translators do not see them.

wblock updated this object. Jul 8 2015, 11:22 PM
wblock updated this object.
wblock marked an inline comment as done. Jul 8 2015, 11:36 PM

PWD is not needed with the latest version.

wblock updated this revision to Diff 6795. Jul 9 2015, 12:49 AM

Remove English parent doc dependencies that could not be satisfied. This was obscured in testing by previous copies of those files.

Also expand on notes on use of xmllint to remove redundant xmlns:xlink attributes. That is fragile and needs to be replaced with a better method.

wblock updated this revision to Diff 6876. Jul 12 2015, 3:05 PM

Add options to keep entities or expand them. Add a simple filter to remove redundant attributes so translators do not have to deal with them.

feld resigned from this revision. Jul 12 2015, 10:23 PM
feld removed a reviewer: feld.
hrs requested changes to this revision. Jul 13 2015, 7:56 PM
hrs added a reviewer: hrs.

I tried po4a-gettextize with 8-bit encodings such as EUC-JP and GB2312, but {article,book}.translate.xml seems to contain localized entity references expanded in the wrong encoding. ${XMLLINT} in a localized directory uses the XML_CATALOG_FILES for the localized versions, not for en_US.ISO8859-1. So the rendered results will have problems with entity reference expansion (i.e., localized entities are used instead of the English versions). This has to be fixed.

This revision now requires changes to proceed. Jul 13 2015, 7:56 PM
hrs added a comment. Jul 13 2015, 8:26 PM

The changes to doc.docbook.mk are not commit-ready because most of the rules are broken in terms of their dependency chain. I tried to fix them and put a diff at the following URL:

http://people.allbsd.org/~hrs/FreeBSD/doc.docbook.mk.20150714-1.diff

This still needs more improvement but should be much better than the current one.

hrs added inline comments. Jul 13 2015, 8:59 PM
share/mk/doc.docbook.mk
296 ↗(On Diff #6876)

Please do not edit XML files with sed or anything else that does not understand document structure. It can break DTD conformance and is generally not allowed in XML processing in the doc tree. Eliminating redundant namespace attributes can be done with the --nsclean option of xmllint, for example.
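To illustrate what "understands document structure" means here, the sketch below removes redundant xmlns:xlink declarations with an XML parser instead of a text substitution. It uses xml.dom.minidom from the Python standard library and is not a proposal for the doc tree, where xmllint --nsclean is the suggested tool. The function names and the sample document are made up for the example.

```python
from xml.dom import minidom


def strip_redundant_xlink(element, bound_uri=None):
    """Drop xmlns:xlink where an ancestor already binds the same URI."""
    declared = element.getAttribute("xmlns:xlink")
    if declared:
        if declared == bound_uri:
            # Same binding is already in scope, so this one is redundant.
            element.removeAttribute("xmlns:xlink")
        bound_uri = declared
    for child in element.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            strip_redundant_xlink(child, bound_uri)


def clean(xml_text):
    """Parse, strip redundant declarations, and serialize the root element."""
    doc = minidom.parseString(xml_text)
    strip_redundant_xlink(doc.documentElement)
    return doc.documentElement.toxml()


if __name__ == "__main__":
    sample = (
        '<book xmlns:xlink="http://www.w3.org/1999/xlink">'
        '<para xmlns:xlink="http://www.w3.org/1999/xlink">'
        '<link xlink:href="x">text</link></para></book>'
    )
    print(clean(sample))
```

Because the parser tracks which declaration is in scope, the outermost declaration is kept and the document stays well-formed, which a line-oriented sed expression cannot guarantee.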

307 ↗(On Diff #6876)

Why can we assume UTF-8 here?
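A small sketch of why the assumption matters for the encodings mentioned earlier in this review: bytes written as EUC-JP are generally not valid UTF-8, so a step that assumes UTF-8 either fails or produces garbage. The helper name and the sample text are made up for the example.

```python
def decodes_as_utf8(data):
    """Return True if the byte string is valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    text = "日本語"
    for codec in ("utf-8", "euc_jp"):
        raw = text.encode(codec)
        print(codec, "valid as UTF-8:", decodes_as_utf8(raw))
```

A check like this only detects the mismatch; the encoding to use would still have to come from the document or the language directory rather than being assumed.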

wblock updated this revision to Diff 7490. Jul 29 2015, 10:17 PM
wblock updated this object.
wblock edited edge metadata.

Remove the option to preserve entities. More than one source has said that is a mistake for translators.
Remove the option to use po4a; itstool works as long as there is no need to preserve entities.
Use ISO_LANG as the name of the two-character language code.

Rework patch based on hrs@'s suggestions.

wblock updated this revision to Diff 7886. Aug 11 2015, 10:49 PM
wblock edited edge metadata.

Rework.

This new diff addresses the worst problems. The part I don't like is that I had to wrap the translation targets in a .if to keep them from being built when non-translation FORMATS like html are being built. There might be a better way to do that.

If this diff is not close to being acceptable, please let me know and I'll just rewrite these functions as scripts to go in the Tools directory. Getting this Makefile to do the job is not worth much more time. We need to get translation teams involved in testing the translation functions. Thanks!

wblock updated this revision to Diff 8077. Aug 20 2015, 2:23 AM
wblock set the repository for this revision to rD FreeBSD doc repository.

Locale-style language codes (xx_YY) are used, and have been tested with the latest version of poedit (thanks to rodrigo@ for the updated port). The POSET_CMD script now correctly sets the language type in the header. Makefile changes have been reduced and simplified.
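What setting the language in the PO header amounts to can be sketched as follows. This is not the POSET_CMD script, whose contents are not shown in this review; the function name, the validation pattern, and the sample header are assumptions for illustration.

```python
import re

# Locale-style code: two lowercase letters, underscore, two uppercase letters.
LOCALE_RE = re.compile(r"^[a-z]{2}_[A-Z]{2}$")
# A PO header line such as: "Language: \n" (with a literal backslash-n).
LANGUAGE_LINE_RE = re.compile(r'^"Language:[^"\n]*\\n"$', re.MULTILINE)


def set_po_language(po_text, locale):
    """Return po_text with its Language header field set to locale."""
    if not LOCALE_RE.match(locale):
        raise ValueError(f"not an xx_YY language code: {locale!r}")
    new_line = f'"Language: {locale}\\n"'
    # A function replacement avoids backslash processing in the new text.
    updated, count = LANGUAGE_LINE_RE.subn(lambda m: new_line, po_text, count=1)
    if count == 0:
        raise ValueError("no Language header line found")
    return updated


if __name__ == "__main__":
    header = (
        'msgid ""\n'
        'msgstr ""\n'
        '"Project-Id-Version: PACKAGE VERSION\\n"\n'
        '"Language: \\n"\n'
        '"Content-Type: text/plain; charset=UTF-8\\n"\n'
    )
    print(set_po_language(header, "fr_FR"))
```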

This revision was automatically updated to reflect the committed changes.