#12 Make %py_byte_compile terminate build on SyntaxErrors (#1616219)
Merged 5 years ago by tibbs. Opened 5 years ago by churchyard.
rpms/ churchyard/python-rpm-macros py_byte_compile_fatal  into  master

file modified
+12 -20
@@ -3,25 +3,17 @@ 

  # Python's compile_all module only works on directories, and requires a max

  # recursion depth

  

- # Note that the py_byte_compile macro should work for all Python versions

- # Which unfortunately makes the definition more complicated than it should be

+ # Usage:

+ #    %py_byte_compile <interpereter> <path>

+ # Example:

+ #    %py_byte_compile %{__python3} %{buildroot}%{_datadir}/spam/plugins/

+ 

+ # This will terminate build on SyntaxErrors, if you want to avoid that,

+ # use it in a subshell like this:

+ #    (%{py_byte_compile <interpereter> <path>}) || :

  

  %py_byte_compile()\

- py2_byte_compile () {\

-     python_binary="%1"\

-     bytecode_compilation_path="%2"\

-     find $bytecode_compilation_path -type f -a -name "*.py" -print0 | xargs -0 $python_binary -c 'import py_compile, sys; [py_compile.compile(f, dfile=f.partition("$RPM_BUILD_ROOT")[2]) for f in sys.argv[1:]]' || :\

-     find $bytecode_compilation_path -type f -a -name "*.py" -print0 | xargs -0 $python_binary -O -c 'import py_compile, sys; [py_compile.compile(f, dfile=f.partition("$RPM_BUILD_ROOT")[2]) for f in sys.argv[1:]]' || :\

- }\

- \

- py3_byte_compile () {\

-     python_binary="%1"\

-     bytecode_compilation_path="%2"\

-     find $bytecode_compilation_path -type f -a -name "*.py" -print0 | xargs -0 $python_binary -O -c 'import py_compile, sys; [py_compile.compile(f, dfile=f.partition("$RPM_BUILD_ROOT")[2], optimize=opt) for opt in range(2) for f in sys.argv[1:]]' || :\

- }\

- \

- # Get version without a dot (36 instead of 3.6), bash doesn't compare floats well \

- python_version=$(%1 -c "import sys; sys.stdout.write('{0.major}{0.minor}'.format(sys.version_info))") \

- # The bytecompilation syntax has changed between Python 3.4 and Python 3.5, so for 3.4 and earlier we use the "Python 2" syntax \

- [ "$python_version" -ge 35 ] && py3_byte_compile "%1" "%2" || py2_byte_compile "%1" "%2" \

- %{nil}

+ python_binary="%1"\

+ bytecode_compilation_path="%2"\

+ find $bytecode_compilation_path -type f -a -name "*.py" -print0 | xargs -0 $python_binary -O -m py_compile\

+ find $bytecode_compilation_path -type f -a -name "*.py" -print0 | xargs -0 $python_binary -m py_compile

file modified
+4 -1
@@ -1,6 +1,6 @@ 

  Name:           python-rpm-macros

  Version:        3

- Release:        36%{?dist}

+ Release:        37%{?dist}

  Summary:        The unversioned Python RPM macros

  

  License:        MIT
@@ -71,6 +71,9 @@ 

  

  

  %changelog

+ * Wed Aug 15 2018 Miro Hrončok <mhroncok@redhat.com> - 3-37

+ - Make %%py_byte_compile terminate build on SyntaxErrors (#1616219)

+ 

  * Wed Aug 15 2018 Miro Hrončok <mhroncok@redhat.com> - 3-36

  - Make %%py_build wokr if %%__python is defined to custom value

  

no initial comment

Since you asked in IRC about it, the %nil at the end just ensures that after the macro has been processed, a newline is emitted. RPM will strip trailing blank lines from a macro expansion unless something was expanded in them, even if that something just expands to nothing. Which is weird, but whatever.

This matters if someone does something like%{py_byte_compile a b} c. With the trailing %nil the c ends up on its own line. Without it, it ends up being appended to the last line of the shell code output by the macro. Depending on what the last line of code looks like, this can have either bizarre or potentially useful effects. In this particular case, the comment block you added absolutely relies on the %nil not being there, as || : on a line by itself is a syntax error.

However, note that this trick of appending ||: is rather narrowly applicable to the way the last line of the macro has been formatted.

[ "$python_version" -ge 35 ] && py3_byte_compile "a" "b" || py2_byte_compile "a" "b" || :

What's smarter is to use a subshell: ( %{py_byte_compile a b} ) || : as this now does not care about newlines or program flow.

Finally, now that py3_byte_compile() can fail, the last line of the macro really doesn't do the right thing. It will try to do python3 byte compilation and then try to do python2 byte compilation. Which will probably just fail as well, but the result will be confusing. And if that unwanted second compilation succeeds, then that would be pretty bad.

So I think the last line needs to get refactored to use proper control flow. That will break the ||: trick but wrapping in a subshell should work OK (unless someone somehow expects those two functions to remain defined after the macro is done).

Finally, now that py3_byte_compile() can fail, the last line of the
macro really doesn't do the right thing. It will try to do python3
byte compilation and then try to do python2 byte compilation. Which
will probably just fail as well, but the result will be confusing. And
if that unwanted second compilation succeeds, then that would be
pretty bad.

Ha, that's exactly what I stumbled upon even before this change
-- it used to be harmless, at least.

Well, there was another bug where the sense of the test was backwards. Before this change, the py3_byte_compile function simply doesn't fail so it's tough to imagine a situation where you would have seen what I've described.

But in any case, there's just no need to try to cram that into a single line.

so the proper way is to keep the nil thing and recommend ( %{py_byte_compile a b} ) || : for the "I don't care about failures, right?

Perhaps what I'd like to see is that SyntaxError are listed all
at once. Consider mass rebuild with new Python version that occupies
even more previously valid identifiers as reserved keywords
-- one would certainly want to see all these occurrences right away,
not just the first one. Definitely a time saver for the maintainers.

I.e., something like:

... | xargs -0 -I{} sh -c '{ %{__python3} -m py_compile {} && %{__python3} -O -m py_compile {}; echo "EC=$?"; } 2>&1
  | tee -a /dev/stderr | grep -E "^SyntaxError:|^EC=" | sed "/EC=0/q;/SyntaxError/q1;q255"'

EDIT: Hmm, perhaps all that's really needed is:

... | xargs -0 sh -c '%{__python3} -m py_compile "$@"
  && %{__python3} -O -m py_compile "$@"'

It shouldn't stop with the first failure, yet the summary exit status is per expectation
and the diagnostics reported (for more than SyntaxError but should be generally
fine).

%nil doesn't really matter at all.

Recommending wrapping the call in ( ) || : does appear to do the right thing for me.

@jpokorny: I'm not sure it's worth overcomplicating the thing but if there's something simple which would make it easier for packagers then great.

rebased onto def9ba6dc57397ec264764bcf8a5c23c734bd6e7

5 years ago

OK, I've pushed another version. It works the same for all Pythons (>= 2.7, 3.2), so no need for the double function again. It will compile everything and only fail at the end if there was a problem.

If I'm not missing something, this solution is elegant and removes a lot of cruft.

Thanks for leveraging the -m py_compile idea, though I see two issues
with your trimmed down variant:

  • negligible: inefficiency regarding repeated path traversal and
    file discovery (putting aside another thing, which may not be
    regulated in the packaging guidelines, but new lines in
    the file names are naturally not endorsed in the distro
    -- rpm would perhaps be the main choking point, anyway)

  • questionable: why the optimization round is attempted even if the
    plain bytecompilation failed? apparently it is possible to hide
    real syntax errors in the assert statements[*] that would then get
    optimized away, hence masking the real problem -- note that
    Python based programs are typically not run with -O, AFAIK

[*] see:

$ python3 -c 'assert (sdfs)' && echo OK
 python3 -c 'assert (sdfs)'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
NameError: name 'sdfs' is not defined

$ python3 -Oc 'assert (sdfs)' && echo OK
OK

inefficiency regarding repeated path traversal

well, do we really care? it was always the case for python2 anyway

new lines in the file names

the print0 thing was used before, we can go back to that

why the optimization round is attempted even if the plain bytecompilation failed

the main idea here is that if the packager does the || : thing, we want the Ok files to be bytecompiled with both optimized Python and not optimized.

apparently it is possible to hide real syntax errors in the assert statements[*] that would then get optimized away, hence masking the real problem

OK, we can do it in opposite order. Unless you can find and example where -O would fail while it would succeed without.

BTW assert(0boom) would not compile even with -O, assert(boom) would compile with both.

rebased onto d8f74272b40c67266637d75eafb2ed93ead08ecd

5 years ago

OK, switched the order and back to -print0 | xargs -0.

the main idea here is that if the packager does the || : thing, we
want the Ok files to be bytecompiled with both optimized Python and
not optimized.

Oh, I see now!

Since the macro is (at least currently, IIUIC) to be used in a naked
form, not as a part of another one, I'd actually propose to resolve
all these use cases better:

  1. standard use -> do not attempt optimizing run when non-optimizing
    one failed

  2. best-effort one (per above) -> self-explanatory

by the means of interpreter passed to %py_byte_compile:

  1. %{__python2} or %{__python3} as usual

  2. py2_compile_keepgoing or py3_compile_keepgoing that could
    be shell functions defined like py*_byte_compile previously
    were, with a body like this:

py3_compile_keepgoing () { %{__python3} "$@" || :; }

Alternatively (and even better, IMHO), the above could be just an
implementation detail related to a new, binary (O/1), optional,
3rd argument to %py_byte_compile macro. It would default to
a previous behaviour, indeed.

Inner command in meta syntax would then be:

<figured out failfast/keepgoing Python command> -m py_compile <input> \
&& <figured out failfast/keepgoing Python command> -O -m py_compile <input>

Over time, it would become apparent which packages are not satisfied
with recursive tree traversal granularity as they would be using
py3_compile_keepgoing. And those that would silence errors from
py_byte_compile would transition into "doing it wrong" position
-- they should either not use that macro at all or turn that into
an explicit keepgoing mode, preferably with a justification.

My 1 Kč :-)


BTW assert(0boom) would not compile even with -O, assert(boom) would
compile with both.

Sorry, jumped into conclusion about mere parsing based on the
full-fledged interpretation :-/

I'd rather not change the API very much. We are trying to fix a bug here, not save the world.

I honestly think the solution is good enough now.

Please note that if you wrap multiple shell statements in ( ... ) ||: then failure will be suppressed and all statements executed regardless of which statement fails. I think that's slightly counterintuitive but the presence of ||: on the end changes the value of -e in the subshell.

Just open a specfile and stick this at the beginning of %prep and do fedpkg prep:

echo XXXXXXXXXX; false; echo YYYYYYY; true; echo ZZZZZZZZZ

Then wrap that in parentheses. Still fails at the first false. Then append ||:. Now all statements in the subshell are executed.

So I would think that if you want to capture failures when they happen (unless things are wrapped in ( ... ) ||:), you would not use ||: inside the macro at all, or at least not at places where it should be allowed to fail.

The ||: in the macro is there not to end the execution on the failure, even if the packager does not use || : or (...) || : at all.

the dies here is: we don't care if the -O compilation failed or not. We do the non--O thing anyway and it will fail if -O failed.

%install
echo 'assert(0boom)' > %{buildroot}/bad.py
echo 'assert(0)' > %{buildroot}/good.py
%py_byte_compile python3 %{buildroot}
Executing(%install): /bin/sh -e /var/tmp/rpm-tmp.D4gCqD
+ umask 022
+ cd /home/churchyard/rpmbuild/BUILD
+ '[' /home/churchyard/rpmbuild/BUILDROOT/reproducer-2.0.0-1.fc28.x86_64 '!=' / ']'
+ rm -rf /home/churchyard/rpmbuild/BUILDROOT/reproducer-2.0.0-1.fc28.x86_64
++ dirname /home/churchyard/rpmbuild/BUILDROOT/reproducer-2.0.0-1.fc28.x86_64
+ mkdir -p /home/churchyard/rpmbuild/BUILDROOT
+ mkdir /home/churchyard/rpmbuild/BUILDROOT/reproducer-2.0.0-1.fc28.x86_64
+ echo 'assert(0boom)'
+ echo 'assert(0)'
+ python_binary=python3
+ bytecode_compilation_path=/home/churchyard/rpmbuild/BUILDROOT/reproducer-2.0.0-1.fc28.x86_64
+ find /home/churchyard/rpmbuild/BUILDROOT/reproducer-2.0.0-1.fc28.x86_64 -type f -a -name '*.py' -print0
+ xargs -0 python3 -O -m py_compile
  File "/home/churchyard/rpmbuild/BUILDROOT/reproducer-2.0.0-1.fc28.x86_64/bad.py", line 1
    assert(0boom)
            ^
SyntaxError: invalid token

+ :
+ find /home/churchyard/rpmbuild/BUILDROOT/reproducer-2.0.0-1.fc28.x86_64 -type f -a -name '*.py' -print0
+ xargs -0 python3 -m py_compile
  File "/home/churchyard/rpmbuild/BUILDROOT/reproducer-2.0.0-1.fc28.x86_64/bad.py", line 1
    assert(0boom)
            ^
SyntaxError: invalid token

error: Bad exit status from /var/tmp/rpm-tmp.D4gCqD (%install)


RPM build errors:
    Bad exit status from /var/tmp/rpm-tmp.D4gCqD (%install)
%install
echo 'assert(0boom)' > %{buildroot}/bad.py
echo 'assert(0)' > %{buildroot}/good.py
%{py_byte_compile python3 %{buildroot}} || :
Executing(%install): /bin/sh -e /var/tmp/rpm-tmp.jXqY0V
+ umask 022
+ cd /home/churchyard/rpmbuild/BUILD
+ '[' /home/churchyard/rpmbuild/BUILDROOT/reproducer-2.0.0-1.fc28.x86_64 '!=' / ']'
+ rm -rf /home/churchyard/rpmbuild/BUILDROOT/reproducer-2.0.0-1.fc28.x86_64
++ dirname /home/churchyard/rpmbuild/BUILDROOT/reproducer-2.0.0-1.fc28.x86_64
+ mkdir -p /home/churchyard/rpmbuild/BUILDROOT
+ mkdir /home/churchyard/rpmbuild/BUILDROOT/reproducer-2.0.0-1.fc28.x86_64
+ echo 'assert(0boom)'
+ echo 'assert(0)'
+ python_binary=python3
+ bytecode_compilation_path=/home/churchyard/rpmbuild/BUILDROOT/reproducer-2.0.0-1.fc28.x86_64
+ xargs -0 python3 -O -m py_compile
+ find /home/churchyard/rpmbuild/BUILDROOT/reproducer-2.0.0-1.fc28.x86_64 -type f -a -name '*.py' -print0
  File "/home/churchyard/rpmbuild/BUILDROOT/reproducer-2.0.0-1.fc28.x86_64/bad.py", line 1
    assert(0boom)
            ^
SyntaxError: invalid token

+ :
+ find /home/churchyard/rpmbuild/BUILDROOT/reproducer-2.0.0-1.fc28.x86_64 -type f -a -name '*.py' -print0
+ xargs -0 python3 -m py_compile
  File "/home/churchyard/rpmbuild/BUILDROOT/reproducer-2.0.0-1.fc28.x86_64/bad.py", line 1
    assert(0boom)
            ^
SyntaxError: invalid token

+ :
+ /usr/lib/rpm/check-rpaths
+ /usr/lib/rpm/check-buildroot
+ /usr/lib/rpm/redhat/brp-ldconfig
/sbin/ldconfig: Warning: ignoring configuration file that cannot be opened: /etc/ld.so.conf: No such file or directory
+ /usr/lib/rpm/brp-compress
+ /usr/lib/rpm/brp-strip /usr/bin/strip
+ /usr/lib/rpm/brp-strip-comment-note /usr/bin/strip /usr/bin/objdump
+ /usr/lib/rpm/brp-strip-static-archive /usr/bin/strip
+ /usr/lib/rpm/brp-python-hardlink
+ /usr/lib/rpm/redhat/brp-mangle-shebangs '' ''
/usr/share/lmod/lmod/init/bash: line 15: __lmod_vx: unbound variable
Processing files: reproducer-2.0.0-1.fc28.x86_64
Checking for unpackaged file(s): /usr/lib/rpm/check-files /home/churchyard/rpmbuild/BUILDROOT/reproducer-2.0.0-1.fc28.x86_64
error: Installed (but unpackaged) file(s) found:
   /__pycache__/good.cpython-36.opt-1.pyc
   /__pycache__/good.cpython-36.pyc
   /bad.py
   /good.py

I think this works as expected.

Ok, thanks to comments by @tibbs, I think I have an acceptable solution that
satisfies both 1. + 2. in my previous comment AND is free from the
complexity you want to (understandably!) avoid:

%global _python_bytecompile_extra 0

%define py_byte_compile()\
python_binary="%1"\
bytecode_compilation_path="%2"\
find $bytecode_compilation_path -type f -a -name "*.py" -print0 | xargs -0 $python_binary -O -m py_compile\
find $bytecode_compilation_path -type f -a -name "*.py" -print0 | xargs -0 $python_binary -m py_compile

Name:           foo
Version:        1
Release:        1%{?dist}
Summary:        foo
License:        GPLv2

%description

%install
echo 'assert(0boom)' > %{buildroot}/bad.py
echo 'assert(0)' > %{buildroot}/good.py
# see 1., standard, fail-fast solution, no obtuse redundancy of syntax errors
#{py_byte_compile python3 %{buildroot}}
# see 2., keep-going solution
( %{py_byte_compile python3 %{buildroot}} ) || :
echo "so far so good"

%files
/__pycache__/good.cpython-37.opt-1.pyc
/__pycache__/good.cpython-37.pyc
/bad.py
/good.py

Excerpt from build per 1.:

+ python_binary=python3
+ bytecode_compilation_path=/home/jpokorny/rpmbuild/BUILDROOT/foo-1-1.fc30.x86_64
+ find /home/jpokorny/rpmbuild/BUILDROOT/foo-1-1.fc30.x86_64 -type f -a -name '*.py' -print0
+ xargs -0 python3 -O -m py_compile
  File "/home/jpokorny/rpmbuild/BUILDROOT/foo-1-1.fc30.x86_64/bad.py", line 1
    assert(0boom)
            ^
SyntaxError: invalid token

error: Bad exit status from /var/tmp/rpm-tmp.7QYAES (%install)

and per 2.:

+ python_binary=python3
+ bytecode_compilation_path=/home/jpokorny/rpmbuild/BUILDROOT/foo-1-1.fc30.x86_64
+ find /home/jpokorny/rpmbuild/BUILDROOT/foo-1-1.fc30.x86_64 -type f -a -name '*.py' -print0
+ xargs -0 python3 -O -m py_compile
  File "/home/jpokorny/rpmbuild/BUILDROOT/foo-1-1.fc30.x86_64/bad.py", line 1
    assert(0boom)
            ^
SyntaxError: invalid token

+ find /home/jpokorny/rpmbuild/BUILDROOT/foo-1-1.fc30.x86_64 -type f -a -name '*.py' -print0
+ xargs -0 python3 -m py_compile
  File "/home/jpokorny/rpmbuild/BUILDROOT/foo-1-1.fc30.x86_64/bad.py", line 1
    assert(0boom)
            ^
SyntaxError: invalid token

+ :
+ echo 'so far so good'
so far so good

And if someone accidentally uses would-become-discouraged
%{py_byte_compile python3 %{buildroot}} || : ... they would
become the same as in case 1., which is not that bad, trivially
fixable when guidelines/appendix gets updated, and I figure this
keep-going cases should be very very exceptional, anyway.

So, to summarize:

  • get rid of the internal || :
  • document the "keep going" as ( %{py_byte_compile <interpreter> <path>} ) || :

Yes, that's TL;DR of that proposal.

rebased onto 8f067ff

5 years ago

Seems good to me. It's a nice simplification.

Where do we want this to land? I would say that it's a bugfix and that we'd want it everywhere but maybe we should let it soak in rawhide/F29 for a bit.

I'd put it everywhere where the macros are delivered from this package. That is rawhide, f29, epels. Would stick it in rawhide and f29 now to soak in, epels with next update or in couple of weeks.

Pull-Request has been merged by tibbs

5 years ago