[new port] biology/p5-trimgalore: Wrapper around Cutadapt and FastQC for adapter and quality trimming
ClosedPublic

Authored by jwb on Dec 6 2017, 9:35 PM.

Details

Summary

[new port] biology/p5-trimgalore: Wrapper around Cutadapt and FastQC for adapter and quality trimming
Approved by: jrm (mentor) or wen (mentor)
Differential to be added

Test Plan

portlint -C: looks fine
Passed poudriere on {10.3,11.1}-{amd64,i386}
Tested by researchers

Diff Detail

Repository
rP FreeBSD ports repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
jwb created this revision.Dec 6 2017, 9:35 PM
jwb updated this revision to Diff 36317.Dec 6 2017, 9:43 PM

Add missing USES=python arg

jrm added inline comments.Dec 6 2017, 11:06 PM
biology/p5-trimgalore/Makefile
4 ↗(On Diff #36317)

DISTVERSION

14 ↗(On Diff #36317)

Maybe RUN_DEPENDS=cutadapt:biology/py-cutadapt@${PY_FLAVOR} \ then you could remove python:env below. Let's see what @koobs says.

koobs requested changes to this revision.Dec 7 2017, 1:13 AM
koobs added inline comments.
biology/p5-trimgalore/Makefile
14 ↗(On Diff #36317)

Don't count on me (yet) for flavors syntax/compliance, I haven't caught up on any fallout/considerations/advice since it was committed.

Since @mat is subscribed to this review im sure he can provide advice on it if necessary

30 ↗(On Diff #36317)

${COPYTREE_SHARE} ${WRKSRC}/test_files ${STAGEDIR}${DATADIR}

This revision now requires changes to proceed.Dec 7 2017, 1:13 AM
jwb added a comment.Dec 7 2017, 1:49 AM

Thanks for the tips. Can't believe I missed DISTVERSION... ;-)

I tend to prefer dependencies on package names over plist entries, as we have control over it. It's conceivable, though I think very unlikely, that the cutadapt script could be renamed in a future package, breaking trimgalore if it depends on the script name. E.g. they develop an incompatible cutadapt 2.x interface and decide to rename the legacy version cutadapt1 someday. Easy to fix, but but means there's no binary package for at least a few days and somebody's research is delayed...

The COPY_TREE suggestion actually does this:

ls work/stage/usr/local/share/trimgalore/usr/ports/biology/p5-trimgalore/work/TrimGalore-0.4.5/test_files/
4_seqs_with_Ns.fastq.gz illumina_100K.fastq.gz truncated.fq.gz
colorspace_file.fastq nextera_100K.fastq.gz
empty_file.fastq smallRNA_100K.fastq.gz

Almost every example of COPY_TREE follows a cd ${WRKSRC}...

jwb updated this revision to Diff 36326.Dec 7 2017, 1:50 AM

Fix issues noted by mentors

jrm added a comment.Dec 7 2017, 3:23 AM
In D13400#280001, @jwb wrote:

Almost every example of COPY_TREE follows a cd ${WRKSRC}...

My guess is that @koobs is pointing out that you should either use parentheses or not use cd. The reason is explained in MAKE(1) in the paragraph starting with "Makefiles should be written so that the mode of make operation does not change their behavior."

jwb added a comment.Dec 7 2017, 5:00 AM

Actually, parens would have no effect in this case. The cd only affects other commands on the same line anyway, since each line is executed under a new shell process that's a direct child of the make process. The Makefile below will show the different shell processes for each line and different output from pwd:

all:

ps && cd /tmp && pwd
ps && pwd
ps && (cd /tmp) && pwd
mat added inline comments.Dec 7 2017, 12:02 PM
biology/p5-trimgalore/Makefile
3 ↗(On Diff #36326)

This should be trim_galore.

14 ↗(On Diff #36317)

USES=python is required for any port that depends on a Python port[1]. USES=python:env is there when you don't need python but only the environment. Here, USES=python:run would work too as the Python module you depend on needs it anyway.

1: unless you depend on a Python module that works with all versions of Python, so that the default flavor can always be used, but it gets a bit wonky to assume that it always will, and if one day it doesn't, it will break the build, so add USES=python, it will not hurt.

jwb added a comment.Dec 7 2017, 2:22 PM

Thanks for clarifying the python dependency issue. I wasn't sure about that one... I updated my checklist accordingly.

The primary script installed is "trim_galore", but the project is "TrimGalore". I've always named my ports after the project/dist name, reducing caps to lower case so people don't have to waste time looking up what to capitalize in "pkg install name" or "cd /usr/ports/category/name". Is this not the norm?

mat added a comment.Dec 7 2017, 2:39 PM
In D13400#280071, @jwb wrote:

Thanks for clarifying the python dependency issue. I wasn't sure about that one... I updated my checklist accordingly.

The primary script installed is "trim_galore", but the project is "TrimGalore". I've always named my ports after the project/dist name, reducing caps to lower case so people don't have to waste time looking up what to capitalize in "pkg install name" or "cd /usr/ports/category/name". Is this not the norm?

The port should be named PORTNAME=TrimGalore, the only exception is when it installs a single binary and that it makes more sense to have it referenced by the thing you want to run than then name of the project, so PORTNAME=trim_galore. pkg install is case insensitive so it is of no matter.

jwb updated this revision to Diff 36338.Dec 7 2017, 3:05 PM

Change PORTNAME to match upstream dist

jwb added a comment.Dec 7 2017, 3:13 PM

Good stuff, thanks! I reran it through poudriere just to be safe.

jrm added a comment.EditedDec 7 2017, 4:08 PM

Should the directory now be biology/TrimGalore? There are port directories that do not match the case of PORTNAME, but many more do match.

Correction: biology/p5-TrimGalore

jrm added inline comments.Dec 7 2017, 4:15 PM
biology/p5-trimgalore/Makefile
20 ↗(On Diff #36338)

Based on @mat's comment, python or python:run here?

jwb added a comment.Dec 7 2017, 6:54 PM

I think mat's comment about python was confirming that USES=python:env is correct, but python:run wouldn't do any harm in this case. Since this port doesn't use python, but needs PYTHON_PKGNAMEPREFIX and PY_FLAVOR, I think python:env is the best pick.

I was in a discussion about port naming years ago and landed on the side of using all lower case to make things easier to remember and type. I didn't realize that "pkg" is case insensitive until mat mentioned it, so it's irrelevant for PORTNAME, but I'd still prefer lower case for the directory name which is case sensitive.

If anyone has a strong opinion to the contrary, let me know.

Thanks again...

jrm accepted this revision.Dec 7 2017, 8:27 PM

I don't have a strong option for either naming scheme [1], and there are many examples of both, so we should be safe from abuse. :)

jwb added a comment.Dec 8 2017, 3:59 AM

Alrighty, so I have a powerful ally...

Approved as is? Feel free to sleep on it, as a tree service knocked out my Internet today and I won't be able to commit until late tomorrow morning at the earliest.

mat added a comment.Dec 8 2017, 11:01 AM
In D13400#280331, @jrm wrote:

He has his own rules in his head that are not the official rules. Please keep PKGBASE and the directory name consistent.

jwb updated this revision to Diff 36401.Dec 9 2017, 3:11 AM

Rename with caps to match PORTNAME.

This does seem to be the pattern for p5 ports.

jrm accepted this revision.Dec 9 2017, 3:28 AM
This revision was automatically updated to reflect the committed changes.
jrm added a comment.Jan 16 2018, 8:16 PM
In D13400#280026, @jwb wrote:

Actually, parens would have no effect in this case. The cd only affects other commands on the same line anyway, since each line is executed under a new shell process that's a direct child of the make process. The Makefile below will show the different shell processes for each line and different output from pwd:

all:

ps && cd /tmp && pwd
ps && pwd
ps && (cd /tmp) && pwd

I stumbled upon this issue again today and I think I re-discovered why parentheses should be used to avoid side effects. Using make -j2 with a similar Makefile demonstrates why.

jrm@phe ~ % cat Makefile
all:
        cd /tmp && pwd
        pwd
jrm@phe ~ % make
cd /tmp && pwd
/tmp
pwd
/usr/home/jrm
jrm@phe ~ % make -j2
--- all ---
cd /tmp && pwd
/tmp
pwd
/tmp
jrm@phe ~ % cat Makefile
all:
        (cd /tmp && pwd)
        pwd
jrm@phe ~ % make
(cd /tmp && pwd)
/tmp
pwd
/usr/home/jrm
jrm@phe ~ % make -j2
--- all ---
(cd /tmp && pwd)
/tmp
pwd
/usr/home/jrm
jwb added a comment.Jan 17 2018, 1:54 PM

Fascinating...

I think you've discovered a BSD make bug. Using -j should not change the results of the build no matter how the makefile is coded.

FreeBSD imacbsd.acadix bacon ~ 180: make
cd /tmp && pwd
/tmp
pwd
/usr/home/bacon
FreeBSD imacbsd.acadix bacon ~ 181: make -j2

  • all ---

cd /tmp && pwd
/tmp
pwd
/tmp
FreeBSD imacbsd.acadix bacon ~ 182: gmake
cd /tmp && pwd
/tmp
pwd
/usr/home/bacon
FreeBSD imacbsd.acadix bacon ~ 183: gmake -j2
cd /tmp && pwd
/tmp
pwd
/usr/home/bacon

mat added a comment.Jan 17 2018, 2:04 PM
In D13400#292567, @jwb wrote:

Fascinating...

I think you've discovered a BSD make bug. Using -j should not change the results of the build no matter how the makefile is coded.

Scratch my head. I guess this is why MAKE_JOBS_UNSAFE does not exist.

Of course -j can changes the result of the build, if the Makefile are badly coded.

For instance, when using -j, multiple shell commands in a target can be parallelized.

jwb added a comment.Jan 17 2018, 3:06 PM

Apparently this *is* by design:

-j max_jobs
        Specify the maximum number of jobs that make may have running at
        any one time.  The value is saved in .MAKE.JOBS.  Turns compati-
        bility mode off, unless the B flag is also specified.  When com-
        patibility mode is off, all commands associated with a target are
        executed in a single shell invocation as opposed to the tradi-
        tional one shell invocation per line.  This can break traditional
        scripts which change directories on each command invocation and
        then expect to start with a fresh environment on the next line.
        It is more efficient to correct the scripts rather than turn
        backwards compatibility on.

So speed was chosen over consistent behavior with regard to -j. I guess I'll be adding parens to my makefiles from now on...