#1 Fully modernize the packaging
Merged 2 years ago by zbyszek. Opened 2 years ago by music.
From fork music/python-hdfs, branch pyproject-rpm-macros, into rawhide

file added: 177.patch
+41
@@ -0,0 +1,41 @@ 

+ From 697c460a3a58da299fc7e7d257984988c032abc1 Mon Sep 17 00:00:00 2001

+ From: "Benjamin A. Beasley" <code@musicinmybrain.net>

+ Date: Sun, 10 Oct 2021 11:41:34 -0400

+ Subject: [PATCH] Use unittest.mock where available

+ MIME-Version: 1.0

+ Content-Type: text/plain; charset=UTF-8

+ Content-Transfer-Encoding: 8bit

+ 

+ In Python 3.3 and later, unittest.mock belongs to the standard library,

+ and the PyPI backport module “mock” is not needed.

+ ---

+  doc/conf.py          | 5 ++++-

+  doc/requirements.txt | 2 +-

+  2 files changed, 5 insertions(+), 2 deletions(-)

+ 

+ diff --git a/doc/conf.py b/doc/conf.py

+ index 1b1d44e..7726848 100644

+ --- a/doc/conf.py

+ +++ b/doc/conf.py

+ @@ -14,7 +14,10 @@

+  

+  import os

+  import sys

+ -import mock

+ +try:

+ +    from unittest import mock

+ +except ImportError:

+ +    import mock

+  

+  MOCK_MODULES = ['fastavro', 'pandas', 'requests_kerberos']

+  for mod_name in MOCK_MODULES:

+ diff --git a/doc/requirements.txt b/doc/requirements.txt

+ index 6d3b812..3f2873f 100644

+ --- a/doc/requirements.txt

+ +++ b/doc/requirements.txt

+ @@ -1,4 +1,4 @@

+  avro

+  docopt

+  requests>=2.0.1

+ -mock

+ +mock;python_version<"3.3"

file added: hdfscli-avro.1
+102
@@ -0,0 +1,102 @@ 

+ .TH HDFSCLI\-AVRO "1" "October 2021" "" "User Commands"

+ .SH NAME

+ .B hdfscli\-avro

+ \(en an Avro extension for HdfsCLI

+ .SH SYNOPSIS

+ .B hdfscli\-avro schema

+ .RB [ \-a\fR\ \fIALIAS ]

+ .RB [ \-v ...]

+ .I HDFS_PATH

+ .P

+ .B hdfscli\-avro read

+ .RB [ \-a\fR\ \fIALIAS ]

+ .RB [ \-v ...]

+ .RB [ \-F\fR\ \fIFREQ \ |\  \-n\fR\ \fINUM ]

+ .RB [ \-p\fR\ \fIPARTS ]

+ .I HDFS_PATH

+ .P

+ .B hdfscli\-avro write

+ .RB [ \-fa\fR\ \fIALIAS ]

+ .RB [ \-v ...]

+ .RB [ \-C\fR\ \fICODEC ]

+ .RB [ \-S\fR\ \fISCHEMA ]

+ .I HDFS_PATH

+ .P

+ .B hdfscli\-avro

+ .BR \-L \ |\  \-h

+ .SH OPTIONS

+ .SS COMMANDS

+ .TP

+ .B schema

+ Pretty print schema.

+ .TP

+ .B read

+ Read an Avro file from HDFS and output records as JSON to standard out.

+ .TP

+ .B write

+ Read JSON records from standard in and serialize them into a single Avro file

+ on HDFS.

+ .SS ARGUMENTS

+ .TP

+ .I HDFS_PATH

+ Remote path to Avro file or directory containing Avro part-files.

+ .SS OPTIONS

+ .TP

+ .BR \-C\fR\ \fICODEC \  \-\-codec=\fICODEC

+ Compression codec.

+ Available values include:

+ .BR null ,

+ .BR deflate ,

+ .BR snappy .

+ [default:

+ .BR deflate ]

+ .TP

+ .BR \-F\fR\ \fIFREQ \  \-\-freq=\fIFREQ

+ Probability of sampling a record.

+ .TP

+ .BR \-L \  \-\-log

+ Show path to current log file and exit.

+ .TP

+ .BR \-S\fR\ \fISCHEMA \  \-\-schema=\fISCHEMA

+ Schema for serializing records.

+ If not passed, it will be inferred from the first record.

+ .TP

+ .BR \-a \ \fIALIAS \-\-alias=\fIALIAS

+ Alias of namenode to connect to.

+ .TP

+ .BR \-f \  \-\-force

+ Overwrite any existing file.

+ .TP

+ .BR \-h \  \-\-help

+ Show a usage message and exit.

+ .TP

+ .BR \-n \ \fINUM \-\-num=\fINUM

+ Cap number of records to output.

+ .TP

+ .BR \-p \ \fIPARTS \-\-parts=\fIPARTS

+ Part-files to read.

+ Specify a number to randomly select that many, or a comma-separated list of

+ numbers to read only these.

+ Use a number followed by a comma (e.g.

+ .BR 1, )

+ to get a unique part-file.

+ The default is to read all part-files.

+ .TP

+ .BR \-v \  \-\-verbose

+ Enable log output.

+ Can be specified up to three times (increasing verbosity each time).

+ .SH EXAMPLES

+ .EX

+ .B hdfscli\-avro\ schema\ /data/impressions.avro

+ .EE

+ .EX

+ .B hdfscli\-avro\ read\ \-a\ dev\ snapshot.avro\ >snapshot.jsonl

+ .EE

+ .EX

+ .B hdfscli\-avro\ read\ \-F\ 0.1\ \-p\ 2,3\ clicks.avro

+ .EE

+ .EX

+ .B hdfscli\-avro\ write\ \-f\ positives.avro\ <positives.jsonl\ \-S\ "$(cat\ schema.avsc)"

+ .EE

+ .SH "SEE\ ALSO"

+ .BR hdfscli (1)

file added: hdfscli.1
+115
@@ -0,0 +1,115 @@ 

+ .TH HDFSCLI "1" "October 2021" "" "User Commands"

+ .SH NAME

+ .B hdfscli

+ \(en a command line interface for HDFS

+ .SH SYNOPSIS

+ .B hdfscli

+ .RB [ interactive ]

+ .RB [ \-a\fR\ \fIALIAS ]

+ .RB [ \-v ...]

+ .P

+ .B hdfscli download

+ .RB [ \-fsa\fR\ \fIALIAS ]

+ .RB [ \-v ...]

+ .RB [ \-t\fR\ \fITHREADS ]

+ .I HDFS_PATH

+ .I LOCAL_PATH

+ .P

+ .B hdfscli upload

+ .RB [ \-sa\fR\ \fIALIAS ]

+ .RB [ \-v ...]

+ .RB [ \-A \ |\  \-f ]

+ .RB [ \-t\fR\ \fITHREADS ]

+ .I LOCAL_PATH

+ .I HDFS_PATH

+ .P

+ .B hdfscli

+ .BR \-L \ |\  \-V \ |\  \-h

+ .SH OPTIONS

+ .SS COMMANDS

+ .TP

+ .B download

+ Download a file or folder from HDFS.

+ If a single file is downloaded,

+ .B \-

+ can be specified as

+ .I LOCAL_PATH

+ to stream it to standard out.

+ .TP

+ .B interactive

+ Start the client and expose it via the python interpreter (using

+ .BR ipython (1)

+ if available).

+ .TP

+ .B upload

+ Upload a file or folder to HDFS.

+ .B \-

+ can be specified as

+ .I LOCAL_PATH

+ to read from standard in.

+ .SS ARGUMENTS

+ .TP

+ .I HDFS_PATH

+ Remote HDFS path.

+ .TP

+ .I LOCAL_PATH

+ Path to local file or directory.

+ .SS OPTIONS

+ .TP

+ .BR \-A \  \-\-append

+ Append data to an existing file.

+ Only supported if uploading a single file or from standard in.

+ .TP

+ .BR \-L \  \-\-log

+ Show path to current log file and exit.

+ .TP

+ .BR \-V \  \-\-version

+ Show version and exit.

+ .TP

+ .BR \-a \ \fIALIAS \-\-alias=\fIALIAS

+ Alias of namenode to connect to.

+ .TP

+ .BR \-f \  \-\-force

+ Allow overwriting any existing files.

+ .TP

+ .BR \-h \  \-\-help

+ Show a usage message and exit.

+ .TP

+ .BR \-s \  \-\-silent

+ Don\(cqt display progress status.

+ .TP

+ .BR \-t \ \fITHREADS \-\-threads=\fITHREADS

+ Number of threads to use for parallelization.

+ .B 0

+ allocates a thread per file.

+ [default:

+ .BR 0 ]

+ .TP

+ .BR \-v \  \-\-verbose

+ Enable log output.

+ Can be specified up to three times (increasing verbosity each time).

+ .SH "EXIT\ STATUS"

+ HdfsCLI exits with return status

+ .B 1

+ if an error occurred and

+ .B 0

+ otherwise.

+ .SH EXAMPLES

+ .EX

+ .B hdfscli\ \-a\ prod\ /user/foo

+ .EE

+ .EX

+ .B hdfscli\ download\ features.avro\ dat/

+ .EE

+ .EX

+ .B hdfscli\ download\ logs/1987\-03\-23\ \-\ >>logs

+ .EE

+ .EX

+ .B hdfscli\ upload\ \-f\ \-\ data/weights.tsv\ <weights.tsv

+ .EE

+ .SH "SEE\ ALSO"

+ .BR hdfscli\-avro (1)

+ .P

+ .BR ipython (1)

file modified: python-hdfs.spec
+182 -126
@@ -1,158 +1,214 @@ 

- # https://fedoraproject.org/wiki/Packaging:DistTag?rd=Packaging/DistTag#Conditionals

- %if 0%{?fedora} < 30

- %global with_py2 1

- %else

- %global with_py2 0

+ # Sphinx-generated HTML documentation is not suitable for packaging; see

+ # https://bugzilla.redhat.com/show_bug.cgi?id=2006555 for discussion.

+ #

+ # We can generate PDF documentation as a lesser substitute.

+ %bcond_without doc_pdf

+ 

+ Name:           python-hdfs

+ Version:        2.5.8

+ Release:        10%{?dist}

+ Summary:        API and command line interface for HDFS

+ 

+ License:        MIT

+ URL:            https://github.com/mtth/hdfs

+ Source0:        %{url}/archive/%{version}/hdfs-%{version}.tar.gz

+ # Downstream man pages in groff_man(7) format. These were written for Fedora

+ # based on the tools’ --help output and should be updated if the command-line

+ # interface changes.

+ Source1:        hdfscli.1

+ Source2:        hdfscli-avro.1

+ 

+ # Use unittest.mock where available

+ # https://github.com/mtth/hdfs/pull/177

+ Patch0:         https://github.com/mtth/hdfs/pull/177.patch

+ 

+ # The base package is arched because extras metapackages requiring fastavro are

+ # not available on 32-bit architectures

+ # (https://bugzilla.redhat.com/show_bug.cgi?id=1943932).

+ %ifnarch %{arm32} %{ix86}

+ %global fastavro_arch 1

  %endif

- %global srcname     hdfs

- %global sum         HdfsCLI: API and command line interface for HDFS

+ # Of the binary RPMs, only the conditionally-enabled extras metapackages

+ # python3-hdfs+avro and python3-hdfs+dataframe are arched.

+ #

+ # Since there is no compiled code, there are no debugging symbols.

+ %global debug_package %{nil}

  

- Name:       python-%{srcname}

- Version:    2.5.8

- Release:    9%{?dist}

- Summary:    %{sum}

+ BuildRequires:  python3-devel

+ 

+ # Extra dependencies for documentation

+ %if %{with doc_pdf}

+ BuildRequires:  make

+ BuildRequires:  %{py3_dist sphinx}

+ BuildRequires:  python3-sphinx-latex

+ BuildRequires:  latexmk

+ %endif

+ 

+ %global _description %{expand:

+ %{summary}.

+ 

+ Features:

+ 

+ • Python bindings for the WebHDFS (and HttpFS) API, supporting both secure and

+   insecure clusters.

+ • Command line interface to transfer files and start an interactive client

+   shell, with aliases for convenient namenode URL caching.

+ • Additional functionality through optional extensions:

+   ○ avro, to read and write Avro files directly from HDFS.

+   ○ dataframe, to load and save Pandas dataframes.

+   ○ kerberos, to support Kerberos authenticated clusters.}

  

- License:    MIT

- URL:        https://github.com/mtth/%{srcname}

- Source0:    https://github.com/mtth/%{srcname}/archive/%{version}/%{srcname}-%{version}.tar.gz

+ %description %{_description}

+ 

+ 

+ %package -n python3-hdfs

+ Summary:        %{summary}

  

  BuildArch:      noarch

  

- %global _description Python (2 and 3) bindings for the WebHDFS (and HttpFS) \

- API, supporting both secure and insecure clusters.  Command line interface to \

- transfer files and start an interactive client shell, with aliases for \

- convenient name-node URL caching.  Additional functionality through optional \

- extensions: Avro, to read and write Avro files directly from HDFS.  data-frame, \

- to load and save Pandas data-frames.  Kerberos, to support Kerberos \

- authenticated clusters.

- 

- %description

- %{_description}

- 

- %if %{with_py2}

- %package -n python2-%{srcname}

- Summary:        %{sum}

- BuildRequires:  python2-devel

- BuildRequires:  %{py2_dist setuptools}

- BuildRequires:  %{py2_dist six}

- BuildRequires:  %{py2_dist fastavro}

- BuildRequires:  %{py2_dist pandas}

- BuildRequires:  %{py2_dist requests-kerberos}

- BuildRequires:  %{py2_dist nose}

- BuildRequires:  %{py2_dist mock}

- Requires:       %{py2_dist six}

- Requires:       %{py2_dist requests}

- Requires:       %{py2_dist docopt}

- Requires:       %{py2_dist fastavro}

- Requires:       %{py2_dist pandas}

- Requires:       %{py2_dist requests-kerberos}

- Requires:   %{py2_dist mock}

- 

- %{?python_provide:%python_provide python2-%{srcname}}

- 

- %description -n python2-%{srcname}

- %{_description}

- %endif

+ %description -n python3-hdfs %{_description}

  

- %package -n python3-%{srcname}

- Summary:        %{sum}

- BuildRequires:  python3-devel

- BuildRequires:  %{py3_dist setuptools}

- BuildRequires:  %{py3_dist six}

- BuildRequires:  %{py3_dist fastavro}

- BuildRequires:  %{py3_dist pandas}

- BuildRequires:  %{py3_dist requests-kerberos}

- BuildRequires:  %{py3_dist nose}

- BuildRequires:  %{py3_dist mock}

- Requires:       %{py3_dist six}

- Requires:       %{py3_dist requests}

- Requires:       %{py3_dist docopt}

- Requires:       %{py3_dist fastavro}

- Requires:       %{py3_dist pandas}

- Requires:       %{py3_dist requests-kerberos}

- Requires:   %{py3_dist mock}

- %{?python_provide:%python_provide python3-%{srcname}}

- 

- %description -n python3-%{srcname}

- %{_description}

  

  %package doc

- Summary:    Documentation for %{name}

- BuildRequires:  %{py3_dist sphinx}

- # Should docs require the main package?

+ Summary:    Documentation and examples for %{name}

+ 

+ BuildArch:      noarch

+ 

+ %description doc %{_description}

  

- %description doc

- %{_description}

  

  %prep

- %autosetup -n %{srcname}-%{version}

- rm -rf *.egg-info

+ %autosetup -n hdfs-%{version} -p1

  

- %build

- %if %{with_py2}

- %py2_build

- %endif

- %py3_build

+ # Remove shebangs from non-script sources. The find-then-modify pattern keeps

+ # us from discarding mtimes on sources that do not need modification.

+ find . -type f ! -perm /0111 \

+     -exec gawk '/^#!/ { print FILENAME }; { nextfile }' '{}' '+' |

+   xargs -r -t sed -r -i '1{/^#!/d}'

  

- pushd doc

-     PYTHONPATH=../ sphinx-build-3 . html

-     rm -fvr html/{.buildinfo,.doctrees}

- popd

+ cp -p '%{SOURCE1}' %{?fastavro_arch:'%{SOURCE2}'} .

  

- # Remove shebang from examples in doc

- for example in examples/*.py; do

-     sed '1{\@^#!/usr/bin/env python@d}' $example > $example.new &&

-     touch -r $example $example.new &&

-     mv $example.new $example

- done

  

- %install

- %if %{with_py2}

- %py2_install

+ %generate_buildrequires

+ %pyproject_buildrequires -x kerberos%{?fastavro_arch:,avro,dataframe}

+ 

+ 

+ # We manually write out the python3-hdfs+kerberos metapackage so that it (like

+ # python3-hdfs) can be noarch even though the base package is arched. The

+ # definition is based on:

+ #

+ #   rpm -E '%%pyproject_extras_subpkg -n python3-hdfs kerberos'

+ %package -n python3-hdfs+kerberos

+ Summary:        Metapackage for python3-hdfs: kerberos extras

+ 

+ BuildArch:      noarch

+ 

+ Requires:       python3-hdfs = %{version}-%{release}

+ 

+ %description -n python3-hdfs+kerberos

+ This is a metapackage bringing in kerberos extras requires for python3-hdfs.

+ It makes sure the dependencies are installed.

+ 

+ %files -n python3-hdfs+kerberos -f %{_pyproject_ghost_distinfo}

+ 

+ 

+ %if 0%{?fastavro_arch}

+ 

+ # Note that this subpackage is arched because it is not available on 32-bit

+ # architectures.

+ #

+ # We manually write out the python3-hdfs+avro subpackage so that it can contain

+ # the hdfscli-avro CLI entry point, and so that its summary and description can

+ # be tweaked to reflect this.  The definition is based on:

+ #

+ #   rpm -E '%%pyproject_extras_subpkg -n python3-hdfs avro'

+ %package -n python3-hdfs+avro

+ Summary:        Package for python3-hdfs: avro extras

+ 

+ Requires:       python3-hdfs = %{version}-%{release}

+ 

+ %description -n python3-hdfs+avro

+ This is a package bringing in avro extras requires for python3-hdfs.

+ It makes sure the dependencies are installed.

+ 

+ It also includes the avro-specific command-line tool, hdfscli-avro.

+ 

+ %files -n python3-hdfs+avro -f %{_pyproject_ghost_distinfo}

+ %{_bindir}/hdfscli-avro

+ %{_mandir}/man1/hdfscli-avro.1*

+ 

+ 

+ # Note that this metapackage is arched because it is not available on 32-bit

+ # architectures.

+ %pyproject_extras_subpkg -n python3-hdfs dataframe

+ 

  %endif

- %py3_install

- 

- # Remove shebang from libraries

- # probably easier to use find, but the wiki suggests a for loop

- %if %{with_py2}

- for lib in %{buildroot}%{python2_sitelib}/%{srcname}/*.py %{buildroot}%{python2_sitelib}/%{srcname}/ext/*.py %{buildroot}%{python2_sitelib}/%{srcname}/ext/avro/*.py;

- do

-     echo "Working on $lib"

-     sed '1{\@^#!/usr/bin/env python@d}' $lib > $lib.new &&

-     touch -r $lib $lib.new &&

-     mv $lib.new $lib

- done

+ 

+ 

+ %build

+ %pyproject_wheel

+ 

+ %if %{with doc_pdf}

+ PYTHONPATH="${PWD}" sphinx-build -b latex doc _latex %{?_smp_mflags}

+ %make_build -C _latex

  %endif

  

- for lib in %{buildroot}%{python3_sitelib}/%{srcname}/*.py %{buildroot}%{python3_sitelib}/%{srcname}/ext/*.py %{buildroot}%{python3_sitelib}/%{srcname}/ext/avro/*.py;

- do

-     echo "Working on $lib"

-     sed '1{\@^#!/usr/bin/env python@d}' $lib > $lib.new &&

-     touch -r $lib $lib.new &&

-     mv $lib.new $lib

- done

  

- # Ignore tests - require a hadoop cluster setup

+ %install

+ %pyproject_install

+ %pyproject_save_files hdfs

+ install -t '%{buildroot}%{_mandir}/man1' -D -p -m 0644 \

+     hdfscli.1 %{?fastavro_arch:hdfscli-avro.1}

+ 

+ 

+ %check

+ # Ignore upstream tests - require a hadoop cluster setup

  # https://github.com/mtth/hdfs/blob/master/.travis.yml#L10

  

- %if %{with_py2}

- %files -n python2-%{srcname}

- %license LICENSE

- %{python2_sitelib}/%{srcname}-%{version}-py?.?.egg-info

- %{python2_sitelib}/%{srcname}/

+ %{py3_check_import hdfs

+   hdfs.client

+   hdfs.config

+   hdfs.util

+   hdfs.ext

+   hdfs.ext.kerberos}

+ %if 0%{?fastavro_arch}

+ %{py3_check_import hdfs.ext.avro hdfs.ext.dataframe}

  %endif

  

- %files -n python3-%{srcname}

- %license LICENSE

- %{python3_sitelib}/%{srcname}-%{version}-py%{python3_version}.egg-info

- %{python3_sitelib}/%{srcname}/

- %{_bindir}/%{srcname}*

+ 

+ %files -n python3-hdfs -f %{pyproject_files}

+ # pyproject-rpm-macros handles the license file; verify with rpm -qL -p …

+ %{_bindir}/hdfscli

+ %{_mandir}/man1/hdfscli.1*

+ # This is packaged in python3-hdfs+avro on 64-bit architectures; it is not

+ # packaged at all on 32-bit architectures.

+ %exclude %{_bindir}/hdfscli-avro

+ 

  

  %files doc

- %doc examples AUTHORS CHANGES README.md doc/html

  %license LICENSE

+ %doc AUTHORS

+ %doc CHANGES

+ %doc README.md

+ %if %{with doc_pdf}

+ %doc _latex/hdfs.pdf

+ %endif

+ %doc examples

+ 

  

  %changelog

+ * Sun Oct 10 2021 Benjamin A. Beasley <code@musicinmybrain.net> - 2.5.8-10

+ - Fully modernize the packaging

+ - Switch to “new guidelines” / pyproject-rpm-macros

+ - Drop conditionals for Python 2 on obsolete Fedora releases

+ - Rely on Python dependency generator (no manual Requires)

+ - Build PDF instead of HTML documentation due to guideline issues

+ - Drop dependencies on deprecated nose and mock

+ - Properly handle extras metapackages and dependency on unported

+   python-fastavro for 32-bit architectures; move the hdfscli-avro entry point

+   into the new python3-hdfs+avro package

+ - Add man pages for command-line tools

+ 

  * Fri Jul 23 2021 Fedora Release Engineering <releng@fedoraproject.org> - 2.5.8-9

  - Rebuilt for https://fedoraproject.org/wiki/Fedora_35_Mass_Rebuild

  

  • Switch to “new guidelines” / pyproject-rpm-macros
  • Drop conditionals for Python 2 on obsolete Fedora releases
  • Rely on Python dependency generator (no manual Requires)
  • Build PDF instead of HTML documentation due to guideline issues
  • Drop dependencies on deprecated nose and mock
  • Properly handle extras metapackages and dependency on unported
    python-fastavro for 32-bit architectures; move the hdfscli-avro entry point
    into the new python3-hdfs+avro package
  • Add man pages for command-line tools

The work on separating the fastavro dependency into the appropriate extras metapackages and conditionalizing them will resolve the intermittent Koschei failures that occur whenever this package hits a 32-bit builder.
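
To illustrate the intended result (hypothetical commands; the package names are the ones this PR introduces), a user on a 64-bit machine would enable the optional pieces with:

# The extras metapackages carry the optional dependencies; on 32-bit
# architectures the avro and dataframe metapackages simply do not exist.
sudo dnf install python3-hdfs+kerberos   # noarch, available everywhere
sudo dnf install python3-hdfs+avro       # 64-bit only; also ships hdfscli-avro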

# Downstream man pages in groff_man(7) format:

It'd be nice to give a full url and/or comment where those came from, and how to update them. I have no idea what "downstream" means here.

Why not just:

find . -type f ! -perm /0111 \
    -exec grep -q '^#!' {} +  -exec sed -r -i '1{/^#!/d}' {} +

?

It'd be nice to give a full url and/or comment where those came from, and how to update them. I have no idea what "downstream" means here.

It means I wrote them for Fedora by hand, based on the output of hdfscli --help and hdfscli-avro --help. There are a few options here:

  • Keep as-is and clarify their origin with a more detailed comment, noting that they need to be updated if the command syntax changes.
  • Try to get the man pages upstream. Experience shows that only a tiny minority of Python upstreams are willing to maintain man pages in parallel with any existing documentation, and there is a lack of decent tooling to automatically generate them, so the upstream take rate on man pages for Python command-line tools is very low. Upstreams also often have a hard time accepting that there’s no generally-reliable way to install man pages with setuptools, and the best they can do is usually just to put them in the source distribution. However, I have very occasionally encountered Python upstreams that were willing to maintain contributed man pages.
  • Use man pages generated at build time by help2man instead (see the sketch after this list). These wouldn’t have to be maintained. They are of lower quality than the hand-written ones, but are generally legible, except that the examples all run together into one line.
  • Decide not to bother, and ship no man pages.
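
If we ever chose the help2man route, a rough sketch of what it might look like in the spec follows; this is untested here and assumes the console scripts installed into the buildroot can run with PYTHONPATH pointed at the installed module:

# Hypothetical help2man alternative (not what this PR does): generate the
# pages from --help output during %install, after %pyproject_install.
export PYTHONPATH='%{buildroot}%{python3_sitelib}'
for tool in hdfscli hdfscli-avro; do
    help2man --no-info --output="${tool}.1" "%{buildroot}%{_bindir}/${tool}"
done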

Why not just:

That’s a great idea. I had never thought about whether find would accept more than one command to execute, but it makes sense now that I see it.

That’s a great idea. I had never thought about whether find would accept more than one command to execute, but it makes sense now that I see it.

The suggested expression:

find . -type f ! -perm /0111 \
    -exec grep -q '^#!' {} +  -exec sed -r -i '1{/^#!/d}' {} +

…has to be changed a little to be correct. Here grep will get an arbitrarily-sized batch of arguments, and will exit with code zero if any of them has a line that looks like a shebang in it. In a practical application in a package, there’s probably only one grep invocation, so all the files get sed’ed. The following corrects this:

find . -type f ! -perm /0111 \
    -exec grep -q '^#!' {} \;  -exec sed -r -i '1{/^#!/d}' {} +

Another difference from my original version in this PR is that this version will match shebang-like constructions after the first line, as in the following entirely contrived file contents:

cat <<EOF
#!/usr/bin/python3
print("Hello world")
EOF

I have seen this sort of thing, but it’s rare, and since the sed expression is written to operate only on the first line the consequence of an error of this type is only to unnecessarily discard the upstream timestamp on a file that did not need to be modified. Still, it’s possible to reconstruct the original behavior exactly in the spirit of your suggestion:

find . -type f ! -perm /0111 \
    -exec awk '{exit !/^#!/}' '{}' ';' -exec sed -r -i '1{/^#!/d}' '{}' '+'

This should behave exactly the same as my original expression. It is a little shorter; it is arguably simpler, since it drops xargs and packs everything into the find command; and it should not require GNU awk, not that that matters here. The trade-off is that it does invoke awk once per candidate file rather than letting awk operate on batches of files, which I expect would make it slightly slower on a giant source tree.
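
To make the behavior concrete, here is a throwaway sandbox (file names invented, nothing from this package) exercising that final form:

tmp=$(mktemp -d)
printf '#!/usr/bin/env python\nprint("hi")\n' > "$tmp/has_shebang.py"
printf 'print("no shebang")\n' > "$tmp/plain.py"
chmod 0644 "$tmp"/*.py
touch -d 2001-02-03 "$tmp/plain.py"   # sentinel mtime that should survive

# awk runs once per file (';'); only files whose first line is a shebang
# evaluate true and are batched up for sed ('+'), so plain.py is untouched.
find "$tmp" -type f ! -perm /0111 \
    -exec awk '{exit !/^#!/}' '{}' ';' -exec sed -r -i '1{/^#!/d}' '{}' '+'

head -n 1 "$tmp"/*.py   # has_shebang.py has lost its shebang line
ls -l "$tmp/plain.py"   # mtime still 2001-02-03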

It means I wrote them for Fedora by hand, based on the output of hdfscli --help and hdfscli-avro --help. There are a few options here:

Keep as-is and clarify their origin with a more detailed comment, noting that they need to be updated if the command syntax changes.

Oh, OK. So maybe just add a little comment. "Downstream" suggests something like this, but it's not entirely obvious.

find . -type f ! -perm /0111 \
    -exec awk '{exit !/^#!/}' '{}' ';' -exec sed -r -i '1{/^#!/d}' '{}' '+'

Oh, that's quite complicated. Sorry for sending you on a wild goose chase ;) The original version wasn't bad, but I thought it could be made a bit better without much work.

1 new commit added

  • Clarify man page origin
2 years ago

Oh, OK. So maybe just add a little comment. "Downstream" suggests something like this, but it's not entirely obvious.

Done! Thanks for the feedback.

Oh, that's quite complicated. Sorry for sending you on a wild goose chase ;)

It’s fine; I did learn something useful from your suggestion, even if it didn’t end up helping much in this particular case.

The tests look good and CI passes; let's merge this.

Pull-Request has been merged by zbyszek

2 years ago

It's building for rawhide. Let me know if I should do more builds.

Thanks. I’m inclined to go ahead and roll this out to all Fedora releases since it includes some user-visible improvements and solves the noisy periodic Koschei failures. However, I’m able to handle that myself as a @neuro-sig member.

(Perhaps obviously, the use of a PR here was to allow for review of major packaging changes, even though I have group commit privileges.)