1634914 scripts/pythondistdeps: Notes from an attempted rewrite to importlib.metadata

Authored and Committed by torsava 4 years ago
    Notes from an attempted rewrite from pkg_resources to importlib.metadata in 2020:
    1. While pkg_resources can open metadata at a specified path
       (Distribution.from_location()), importlib provides access only to
       "installed package metadata", i.e. the dist-info or egg-info directory
       must be "discoverable", meaning it must be on sys.path.
       - Thankfully, only the dist/egg-info directory must exist; the
         corresponding Python module does not have to be present.
       - The problems this causes:
         (a) You have to manipulate the sys.path to add the specific location of
             the site-packages directory inside the buildroot
         (b) If you have package "foo" in this newly added directory on sys.path
             and there is some problem and its dist/egg-info metadata are not found,
             importlib.metadata continues searching the sys.path and may discover a
             package with the same name (possibly same version) outside the
             buildroot.
             To get around this, you can manipulate the sys.path to remove all
             other "site-packages" directories. But you have to leave the
             standard library there, because importlib may import other modules
             (in my testing: base64, quopri, random, socket, calendar, uu)
         (c) I have not tested how well it works if you're inspecting metadata of
             a different Python version than the one you run the script with
             (especially Python 2 vs Python 3). This might also cause problems with
             dependency specifiers (e.g. python_version != "3.4")
    2. Handling of dependencies (requires) is problematic in importlib.metadata
       - pkg_resources provides a way to list the standard requires separately
         from the requires of each "extras" category. importlib does not
         provide this; it only returns a list of strings, each in the format:
         - 'packaging>=14',
         - 'towncrier>=18.5.0; extra == "docs"', or
         - 'psutil<6,>=5.6.1; (python_version != "3.4") and extra == "testing"'
         You can either parse these with a regex (fragile) or use the external
         `packaging` Python module. `packaging`, however, also doesn't have great
         support for figuring out extra dependencies; it provides the Marker API:
         - <Marker('python_version != "3.4" and extra == "testing"')>
         You can use the Marker API to evaluate the condition, but not to parse it.
         For parsing you can access the private API Marker._markers:
         - marker._markers=[[(<Variable('python_version')>, <Op('!=')>, \
               <Value('3.4')>)], 'and', (<Variable('extra')>, <Op('==')>, \
               <Value('testing')>)]
         which beyond the problem of being private is also not very useful for
         parsing due to its structure.
       - pkg_resources also provides version parsing, which importlib does not,
         so `packaging` has to be used for that as well
       - importlib is part of the standard library, but packaging and its
         2 runtime dependencies (pyparsing and six) are not, and therefore we
         would go from 1 dependency to 3
    3. A few minor issues; more in the section about equivalents below.
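
    A minimal sketch of the `packaging` route from point 2 (not code from the
    attempted rewrite). It avoids the private Marker._markers by evaluating
    each marker once per extra, using only the public Requirement and
    Marker.evaluate() APIs. The helper name split_requires is hypothetical,
    and the list of extras is assumed to come from the Provides-Extra
    metadata fields.

```python
from collections import defaultdict

from packaging.requirements import Requirement  # external dependency


def split_requires(requires, extras):
    """Group requirement strings into base requirements and per-extra ones.

    `requires` is a list of strings as returned by dist.requires (may be None).
    `extras` is an iterable of extra names to test each marker against.
    """
    base = []
    by_extra = defaultdict(list)
    for line in requires or []:
        req = Requirement(line)
        # No marker, or a marker that holds with no extra selected:
        # treat it as a standard (base) requirement.
        if req.marker is None or req.marker.evaluate({"extra": ""}):
            base.append(req)
            continue
        # Otherwise find the extras for which the marker becomes true.
        # Caveat: the rest of the marker (e.g. python_version) is evaluated
        # against the interpreter running this code, not the target one.
        for extra in extras:
            if req.marker.evaluate({"extra": extra}):
                by_extra[extra].append(req)
    return base, dict(by_extra)


example_requires = [
    "packaging>=14",
    'towncrier>=18.5.0; extra == "docs"',
    'psutil<6,>=5.6.1; (python_version != "3.4") and extra == "testing"',
]
base, by_extra = split_requires(example_requires, ["docs", "testing"])
```

    Each returned object exposes .name and .specifier, so no regex is needed
    for the version constraints. A requirement whose marker is false for the
    running interpreter regardless of extra is silently dropped here, which
    is the cross-version problem from point 1(c).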
    
    importlib.metadata.distribution equivalents of pkg_resources.Distribution attributes:
    - pkg_resources: dist.py_version
      importlib: # not implemented (but can be guessed from the /usr/lib/pythonXX.YY/ path)
    - pkg_resources: dist.project_name
      importlib: dist.metadata['name']
    - pkg_resources: dist.key
      importlib: # not implemented
    - pkg_resources: dist.version
      importlib: dist.version
    - pkg_resources: dist.requires()
      importlib: dist.requires  # but returns strings with almost no parsing done, and also lists extras
    - pkg_resources: dist.requires(extras=dist.extras)
      importlib: # not implemented, has to be parsed from dist.requires
    - pkg_resources: dist.get_entry_map('console_scripts')
      importlib: [ep for ep in importlib.metadata.entry_points()['console_scripts'] if ep.name == pkg][0]
                 # I have not found a better way to get the console_scripts
    - pkg_resources: dist.get_entry_map('gui_scripts')
      importlib: # Presumably same as console_scripts, but untested
    
        
file modified
+7 -0