#7 Use charset_normalizer instead of cchardet
Closed 2 years ago by churchyard. Opened 2 years ago by churchyard.
rpms/ churchyard/calibre charset-normalizer  into  rawhide

file added
+37
@@ -0,0 +1,37 @@
+ From a266f068c03fc404316f6f7d9c7607b1365efdf2 Mon Sep 17 00:00:00 2001
+ From: =?UTF-8?q?Miro=20Hron=C4=8Dok?= <miro@hroncok.cz>
+ Date: Sat, 16 Jul 2022 07:55:47 +0200
+ Subject: [PATCH] Low effort port to charset_normalizer
+ 
+ See https://github.com/PyYoshi/cChardet/issues/77
+ ---
+  src/calibre/ebooks/chardet.py | 2 +-
+  src/calibre/test_build.py     | 2 +-
+  2 files changed, 2 insertions(+), 2 deletions(-)
+ 
+ diff --git a/src/calibre/ebooks/chardet.py b/src/calibre/ebooks/chardet.py
+ index 53fe6c51087..92a8accdafb 100644
+ --- a/src/calibre/ebooks/chardet.py
+ +++ b/src/calibre/ebooks/chardet.py
+ @@ -103,7 +103,7 @@ def substitute_entites(raw):
+  
+  
+  def detect(bytestring):
+ -    from cchardet import detect as implementation
+ +    from charset_normalizer import detect as implementation
+      ans = implementation(bytestring)
+      enc = ans.get('encoding')
+      if enc:
+ diff --git a/src/calibre/test_build.py b/src/calibre/test_build.py
+ index c0f50292160..ba89dea28d0 100644
+ --- a/src/calibre/test_build.py
+ +++ b/src/calibre/test_build.py
+ @@ -73,7 +73,7 @@ def test_pychm(self):
+          del CHMFile, chmlib
+  
+      def test_chardet(self):
+ -        from cchardet import detect
+ +        from charset_normalizer import detect
+          raw = 'mūsi Füße'.encode()
+          data = detect(raw)
+          self.assertEqual(data['encoding'].lower(), 'utf-8')

file modified
+7 -2
@@ -18,6 +18,11 @@
  # This is so gnome-software only 'sees' calibre once.
  Patch3:         calibre-nodisplay.patch
  
+ # Use charset_normalizer instead of cchardet
+ # Downstream only, upstream plans to fork and maintain cchardet instead
+ # see https://bugzilla.redhat.com/show_bug.cgi?id=2021804
+ Patch4:         https://github.com/kovidgoyal/calibre/pull/1690.patch
+ 
  ExclusiveArch:  %{qt5_qtwebengine_arches}
  
  # https://fedoraproject.org/wiki/Changes/RetireARMv7
@@ -72,7 +77,7 @@
  BuildRequires:  python3dist(pyqt-builder)
  BuildRequires:  python3dist(pychm)
  BuildRequires:  python3dist(pycrypto)
- BuildRequires:  python3dist(cchardet)
+ BuildRequires:  python3dist(charset-normalizer)
  BuildRequires:  python3dist(sgmllib3k)
  BuildRequires:  python3-speechd
  BuildRequires:  python3-jeepney
@@ -131,7 +136,7 @@
  Requires:       python3dist(html2text)
  Requires:       python3dist(markdown) >= 3.0
  Requires:       python3dist(pychm)
- Requires:       python3dist(cchardet)
+ Requires:       python3dist(charset-normalizer)
  Requires:       python3dist(pyqt5-sip) >= 12.8, python3dist(pyqt5-sip) < 13
  Requires:       udisks2
  Requires:       /usr/bin/jpegtran

cchardet seems unmaintained and does not install or work with Python 3.11.
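
The patch works as a one-line change because both libraries expose a chardet-style `detect(bytes)` that returns a dict with an `encoding` key. As a rough sketch of that compatibility (not what the patch or calibre actually does), a caller could pick whichever module is importable. The helper names `load_detector` and `detect_encoding` are hypothetical and exist only for this illustration.

```python
import importlib


def load_detector(candidates=("cchardet", "charset_normalizer")):
    """Return (module_name, detect) for the first importable candidate.

    Hypothetical helper: returns (None, None) if no candidate can be
    imported or none of them exposes a callable detect().
    """
    for name in candidates:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # e.g. cchardet failing to install on Python 3.11
            continue
        detect = getattr(module, "detect", None)
        if callable(detect):
            return name, detect
    return None, None


def detect_encoding(bytestring, detect):
    """Call a chardet-style detect() and return the lowercased encoding, or None."""
    ans = detect(bytestring) or {}
    enc = ans.get("encoding")
    return enc.lower() if enc else None
```

With either library installed, `name, detect = load_detector()` followed by `detect_encoding('mūsi Füße'.encode(), detect)` mirrors the check in `test_chardet`.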

rebased onto 517bfa7

2 years ago

Note that upstream says they would rather fork and maintain cchardet than use a pure-Python implementation (IIRC, in the past they also preferred to keep maintaining Python 2 rather than port to Python 3).

I consider charset_normalizer quite fast (though admittedly not as fast as cchardet), and I consider a slightly slower calibre better than no calibre. However, I will leave the decision to the maintainers here and won't use my provenpackager powers to merge this.
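
No measurements are given in this thread. Anyone wanting to check the speed difference locally could use something like the sketch below, which times any chardet-style `detect()` callables on the same sample. The function name `time_detectors` and its parameters are made up for this example, and results will depend heavily on the input and the machine.

```python
import timeit


def time_detectors(detectors, sample, repeat=5, number=20):
    """Return {name: best observed seconds per call} for each detect() callable.

    `detectors` maps a label to a chardet-style detect(bytes) function;
    the caller supplies them (hypothetical harness, not part of calibre).
    """
    results = {}
    for name, detect in detectors.items():
        # timeit.repeat returns one total time per repetition of `number` calls
        timings = timeit.repeat(lambda: detect(sample), repeat=repeat, number=number)
        results[name] = min(timings) / number
    return results
```

For example, passing `{"charset_normalizer": charset_normalizer.detect}` (and `cchardet.detect` where it still installs) together with a few kilobytes of encoded text gives per-call timings that can be compared directly.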

Looks like upstream switched to uchardet? That seems to be already packaged, so I would prefer to stick with that so we don't diverge.

Many thanks for unsticking this though!

> Looks like upstream switched to uchardet? That seems to be already packaged, so I would prefer to stick with that so we don't diverge.

On it.

Pull-Request has been closed by churchyard

2 years ago