inject: Decode subject line before storing the email.
authorRichard W.M. Jones <rjones@redhat.com>
Fri, 7 Jul 2017 09:30:10 +0000 (10:30 +0100)
committerRichard W.M. Jones <rjones@redhat.com>
Fri, 7 Jul 2017 09:31:09 +0000 (10:31 +0100)
Decoding Subject lines is a PITA in Python.  Do it once when injecting
the email, and store the subject lines as UTF-8.  This saves a lot of
effort later on, even though it's not strictly RFC compliant.

inject-mbox.py

index 7a5539c..3faf3d6 100755 (executable)
@@ -28,6 +28,15 @@ processed = 0
 def inject(m):
     global processed
 
 def inject(m):
     global processed
 
+    # Decode the subject line and store it back in the email as UTF-8.
+    # This saves a lot of effort later on, even though it's not
+    # strictly RFC822 compliant.
+    # https://stackoverflow.com/questions/7331351/python-email-header-decoding-utf-8/7331577#7331577
+    subj = m['Subject']
+    subj = email.header.decode_header(subj)
+    subj = ''.join([ unicode(t[0], t[1] or 'ASCII') for t in subj ])
+    m['Subject'] = subj
+
     print("Injecting %s" % m['Subject'])
 
     channel.basic_publish(exchange = 'patchq_input',
     print("Injecting %s" % m['Subject'])
 
     channel.basic_publish(exchange = 'patchq_input',