# The import name is pdfminer. The upstream project name (as specified in
# setup.py) is pdfminer.six, which results in a canonical project name of
# pdfminer-six.

%bcond_without python3_other

Name:           python-pdfminer
Version:        20160614
Release:        6%{?dist}
Summary:        Tool for extracting information from PDF documents

# The entire source is MIT except:
#
# Public Domain:
#   pdfminer/arcfour.py
#     - If this is a bundled library, its origin is unclear
#   pdfminer/ascii85.py
#     - If this is a bundled library, its origin is unclear
#   pdfminer/rijndael.py
#     - Based on https://www.efgh.com/software/rijndael.htm; however, we do not
#       treat it as a bundled dependency since it is a total rewrite from C to
#       Python
#
# APAFML:
#   pdfminer/fontmetrics.py
#     - Data extracted and converted from the AFM files:
#       https://www.ctan.org/tex-archive/fonts/adobe/afm/
#
# BSD:
#   pdfminer/cmap/*
#     - Both the original bundled data and the data generated from the
#       adobe-mappings-cmap package are BSD-licensed.
#
# Note that pdfminer/glyphlist.py contains data extracted and converted from
# https://partners.adobe.com/public/developer/en/opentype/glyphlist.txt under
# the Adobe Glyph List License, which is just an MIT variant
# (https://fedoraproject.org/wiki/Licensing:MIT?rd=Licensing/MIT#AdobeGlyph).
License:        MIT and Public Domain and APAFML and BSD
URL:            https://github.com/pdfminer/pdfminer.six
# This tarball has the samples/ directory stripped out. While upstream claims the
# sample PDFs are “freely distributable”, they have unclear or unspecified
# licenses, which makes them unsuitable for Fedora. This applies especially,
# but not exclusively, to the contents of samples/nonfree.
#
# Generated with ./get_source.sh %%{version}
Source0:        pdfminer.six-%{version}-filtered.tar.xz
# Script to generate Source0; see comments above.
Source1:        get_source.sh
# Downstream man pages (written for Fedora based on --help output) in
# groff_man(7) format
Source2:        dumppdf.1
Source3:        pdf2txt.1
Source10:       %{url}/raw/0cb13983f7975e1bbca42778882f3eaaca3420e1/LICENSE

BuildArch:      noarch

BuildRequires:  python2-devel
BuildRequires:  python2-setuptools
BuildRequires:  python2-six

%if %{with python3_other}
BuildRequires:  python%{python3_other_pkgversion}-devel
BuildRequires:  python%{python3_other_pkgversion}-setuptools
BuildRequires:  python%{python3_other_pkgversion}-six
%endif

BuildRequires:  python%{python3_pkgversion}-devel
BuildRequires:  python%{python3_pkgversion}-setuptools
BuildRequires:  python%{python3_pkgversion}-six

BuildRequires:  make
# We use the Japan1, Korea1, GB1, and CNS1 CMaps:
BuildRequires:  adobe-mappings-cmap-devel

# Helper tools and dependencies.
BuildRequires:  dos2unix

%global common_description %{expand: \
Pdfminer.six is a community-maintained fork of the original PDFMiner. It is a
tool for extracting information from PDF documents. It focuses on getting and
analyzing text data. Pdfminer.six extracts the text from a page directly from
the source code of the PDF. It can also be used to get the exact location, font
or color of the text.

It is built in a modular way such that each component of pdfminer.six can be
replaced easily. You can implement your own interpreter or rendering device
that uses the power of pdfminer.six for purposes other than text analysis.

Check out the full documentation on Read the Docs
(https://pdfminersix.readthedocs.io/).

Features:

  • Written entirely in Python.
  • Parse, analyze, and convert PDF documents.
  • PDF-1.7 specification support (well, almost).
  • CJK languages and vertical writing scripts support.
  • Various font types (Type1, TrueType, Type3, and CID) support.
  • Support for extracting images (JPG, JBIG2, Bitmaps).
  • Support for various compressions (ASCIIHexDecode, ASCII85Decode, LZWDecode,
    FlateDecode, RunLengthDecode, CCITTFaxDecode).
  • Support for RC4 and AES encryption.
  • Support for AcroForm interactive form extraction.
  • Table of contents extraction.
  • Tagged contents extraction.
  • Automatic layout analysis.}

%description
%{common_description}


%package -n     python2-pdfminer
Summary:        %{summary}

Requires:       python2-six

%{?python_provide:%python_provide python2-pdfminer}
%{?python_provide:%python_provide python2-pdfminer.six}

%description -n python2-pdfminer
%{common_description}


%if %{with python3_other}
%package -n python%{python3_other_pkgversion}-pdfminer
Summary:        %{summary}

# ...the package was called "pdfminer.six". Whoops.
Obsoletes:      python34-pdfminer.six <= 20160614-3.el7

Requires:       python%{python3_other_pkgversion}-six
Requires:       python%{python3_other_pkgversion}-chardet

%{?python_provide:%python_provide python%{python3_other_pkgversion}-pdfminer}
%{?python_provide:%python_provide python%{python3_other_pkgversion}-pdfminer.six}

%description -n python%{python3_other_pkgversion}-pdfminer
%{common_description}
%endif


%package -n python%{python3_pkgversion}-pdfminer
Summary:        %{summary}

# ...the package was called "pdfminer.six". Whoops.
Obsoletes:      python34-pdfminer.six <= 20160614-3.el7

Requires:       python%{python3_pkgversion}-six
Requires:       python%{python3_pkgversion}-chardet

%{?python_provide:%python_provide python%{python3_pkgversion}-pdfminer}
%{?python_provide:%python_provide python%{python3_pkgversion}-pdfminer.six}

%description -n python%{python3_pkgversion}-pdfminer
%{common_description}


%package doc
Summary:        Documentation for pdfminer
# See the base package License field for non-MIT sources; it appears that none
# of these contribute to the documentation.
License:        MIT

%description doc
%{common_description}


%prep
%autosetup -n pdfminer.six-%{version}
# Remove bundled egg-info
rm -rf pdfminer.six.egg-info

mkdir -p '_man'
cp -p '%{SOURCE2}' '%{SOURCE3}' '_man/'

cp -p '%{SOURCE10}' .

# Unbundle cmap data; it will be replaced in %%build.
rm -vf cmaprsrc/* pdfminer/cmap/*

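# Remove shebang lines from non-script sources: gawk prints the name of each
# file whose first line starts with #!, and sed deletes that first line.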
find pdfminer tests -type f -name '*.py' \
    -exec gawk '/^#!/ { print FILENAME }; { nextfile }' '{}' '+' |
  xargs -r sed -r -i '1{/^#!/d}'

# Fix some end-of-line and +x issues.
dos2unix tools/*
dos2unix docs/*


%build
# Symlink the unbundled CMap resources and convert to the pickled format.
for cmap in Japan1 Korea1 GB1 CNS1
do
  ln -s "%{adobe_mappings_rootpath}/${cmap}/cid2code.txt" \
      "cmaprsrc/cid2code_Adobe_${cmap}.txt"
done
%make_build cmap PYTHON='%{__python3}'

%py2_build
%if %{with python3_other}
%py3_other_build
%endif
%py3_build
cp -rp docs html


%install
# Fix some more end-of-line issues
dos2unix build/lib/pdfminer/*.py

# Must do the subpackages' install first because the scripts in /usr/bin are
# overwritten with every setup.py install.

# Also, ship symlinks of the scripts without the .py suffix.

%py2_install
for bin in 'pdf2txt' 'dumppdf' 'latin2ascii'
do
  cp -p "%{buildroot}%{_bindir}/${bin}.py" "%{buildroot}%{_bindir}/${bin}.py-2"
  ln -sf "${bin}.py-2" "%{buildroot}%{_bindir}/${bin}.py-%{python2_version}"
done

%if %{with python3_other}
%py3_other_install
for bin in 'pdf2txt' 'dumppdf' 'latin2ascii'
do
  cp -p "%{buildroot}%{_bindir}/${bin}.py" \
      "%{buildroot}%{_bindir}/${bin}.py-%{python3_other_version}"
  sed -r -i 's/python2 -s/python%{python3_other_version} -s/' \
      "%{buildroot}%{_bindir}/${bin}.py-%{python3_other_version}"
done
%endif

%py3_install
for bin in 'pdf2txt' 'dumppdf' 'latin2ascii'
do
  sed -r -i 's/python2 -s/python3 -s/' "%{buildroot}%{_bindir}/${bin}.py"
  cp -p "%{buildroot}%{_bindir}/${bin}.py" "%{buildroot}%{_bindir}/${bin}.py-3"
  ln -sf "${bin}.py-3" "%{buildroot}%{_bindir}/${bin}.py-%{python3_version}"
  ln -sf "${bin}.py-3" "%{buildroot}%{_bindir}/${bin}"
done

install -d '%{buildroot}%{_mandir}/man1'
install -t '%{buildroot}%{_mandir}/man1' -p -m 0644 _man/*


%files -n python2-pdfminer
%license LICENSE
%{_bindir}/pdf2txt.py-2
%{_bindir}/pdf2txt.py-%{python2_version}
%{_bindir}/dumppdf.py-2
%{_bindir}/dumppdf.py-%{python2_version}
%{_bindir}/latin2ascii.py-2
%{_bindir}/latin2ascii.py-%{python2_version}
%{python2_sitelib}/pdfminer
%{python2_sitelib}/pdfminer.six-%{version}-py%{python2_version}.egg-info


%if %{with python3_other}
%files -n python%{python3_other_pkgversion}-pdfminer
%license LICENSE
%{_bindir}/pdf2txt.py-%{python3_other_version}
%{_bindir}/dumppdf.py-%{python3_other_version}
%{_bindir}/latin2ascii.py-%{python3_other_version}
%{python3_other_sitelib}/pdfminer
%{python3_other_sitelib}/pdfminer.six-%{version}-py%{python3_other_version}.egg-info
%endif


%files -n python%{python3_pkgversion}-pdfminer
%license LICENSE
%{_bindir}/pdf2txt
%{_bindir}/pdf2txt.py
%{_bindir}/pdf2txt.py-3
%{_bindir}/pdf2txt.py-%{python3_version}
%{_mandir}/man1/pdf2txt.1*
%{_bindir}/dumppdf
%{_bindir}/dumppdf.py
%{_bindir}/dumppdf.py-3
%{_bindir}/dumppdf.py-%{python3_version}
%{_mandir}/man1/dumppdf.1*
%{_bindir}/latin2ascii
%{_bindir}/latin2ascii.py
%{_bindir}/latin2ascii.py-3
%{_bindir}/latin2ascii.py-%{python3_version}
%{python3_sitelib}/pdfminer
%{python3_sitelib}/pdfminer.six-%{version}-py%{python3_version}.egg-info


%files doc
%license LICENSE
%doc README.md
%doc html


%changelog
* Mon Oct 18 2021 Benjamin A. Beasley <code@musicinmybrain.net> - 20160614-6
- General packaging improvements
- Add a -doc subpackage
- Add a python3.4 (python3_other) version
- Add man pages for command-line tools
- Build with adobe-mappings-cmap instead of cmap-resources
- Improved summary/descriptions
- Filter questionably licensed sample PDFs from source tarball
- Correct License from “MIT” to “MIT and Public Domain and APAFML and BSD”

* Fri Mar 08 2019 Troy Dawson <tdawson@redhat.com> - 20160614-5
- Rebuilt to change main python from 3.4 to 3.6

* Sat Oct 22 2016 Ben Rosser <rosser.bjr@gmail.com> - 20160614-4
- Add missing Requires on python-six, python-chardet.
- Rename python34 subpackage to python34-pdfminer, which is what it should have been initially.

* Tue Jul 19 2016 Fedora Release Engineering <rel-eng@lists.fedoraproject.org> - 20160614-3
- https://fedoraproject.org/wiki/Changes/Automatic_Provides_for_Python_RPM_Packages

* Tue Jun 21 2016 Ben Rosser <rosser.bjr@gmail.com> 20160614-2
- I forgot to actually apply the patch to remove shebangs from library files. Apply said patch.

* Tue Jun 14 2016 Ben Rosser <rosser.bjr@gmail.com> 20160614-1
- Update to latest upstream version of package.
- Use local version of patch.

* Sat Feb 27 2016 Ben Rosser <rosser.bjr@gmail.com> 20160202-3
- Added a patch to remove the shebangs from all library files.
- Write correct sed command to make python3 scripts run with python3.

* Sat Feb 27 2016 Ben Rosser <rosser.bjr@gmail.com> 20160202-2
- Through the use of some gratuitous sed, the python2 package only depends on /usr/bin/python2.
- The python3 version is still a little weird; it pulls in /usr/bin/python and I'm not sure why.
- Also, make the python 3 scripts be the default ones.

* Fri Feb 26 2016 Ben Rosser <rosser.bjr@gmail.com> 20160202-1
- Update to latest upstream release.

* Thu Feb 04 2016 Fedora Release Engineering <releng@fedoraproject.org> - 20151013-6
- Rebuilt for https://fedoraproject.org/wiki/Fedora_24_Mass_Rebuild

* Fri Jan 1 2016 Ben Rosser <rosser.bjr@gmail.com> 20151013-5
- Version bump to silence rpmlint.

* Fri Jan 1 2016 Ben Rosser <rosser.bjr@gmail.com> 20151013-4
- Upgrade path; obsolete and provide the pdfminer-six package in the COPR.
- Now replace the original python-pdfminer package with this one.

* Fri Jan 1 2016 Ben Rosser <rosser.bjr@gmail.com> 20151013-3
- Upgrade path; obsolete and provide python-pdfminer up until rawhide.

* Sat Dec 19 2015 Ben Rosser <rosser.bjr@gmail.com> 20151013-2
- Ship symlinks of the pdfminer scripts without the .py suffix.

* Fri Dec 18 2015 Ben Rosser <rosser.bjr@gmail.com> - 20151013-1
- Initial package of the pdfminer.six fork using pyp2rpm.

* Thu Jun 18 2015 Fedora Release Engineering <rel-eng@lists.fedoraproject.org> - 20140328-3
- Rebuilt for https://fedoraproject.org/wiki/Fedora_23_Mass_Rebuild

* Sat Aug 23 2014 Ben Rosser <rosser.bjr@gmail.com> 20140328-2
- Replaced /usr/bin with bindir macro in install section.

* Sat Aug 16 2014 Ben Rosser <rosser.bjr@gmail.com> 20140328-1
- Updated to latest version of pdfminer.
- Changed specfile to depend on the correct cmap-* packages.

* Thu Sep 20 2012 Ben Rosser <rosser.bjr@gmail.com> 20110515-4
- Removed bundled cmap, changed to depend on cmap package instead

* Thu Jul 05 2012 Ben Rosser <rosser.bjr@gmail.com> 20110515-3
- Removed BuildRoot, clean, and first line of install
- Fixed issue with cmap data not being copied into package
- Fixed license (cmap is under BSD, not MIT)

* Tue May 22 2012 Ben Rosser <rosser.bjr@gmail.com> 20110515-2
- Fixed unowned directory issue and cleaned up the spec file

* Fri May 18 2012 Ben Rosser <rosser.bjr@gmail.com> 20110515-1
- Initial version of the package