Enable profile-guided optimization on all arches, not just x86
This increases the build time (to ~2 hours on armv7hl in Koji),
but should produce a better-optimized Python on architectures other than x86.
The build time overhead is modest on Python 3.8,
as only a limited subset of tests is used for profiling.