    12 0 obj
    << /Type /Font
       /Subtype /CIDFontType0
       /BaseFont /AAAAAA+NotoSansCJK
       /CIDSystemInfo << /Registry (Adobe) /Ordering (Identity) /Supplement 0 >>
       /FontDescriptor 13 0 R
       /DW 1000
       /W [ 1 [500] 2 [600] ]
    >>
    endobj

Pitfall: Text extraction returns garbled CJK text.
Cause: The extractor applies the wrong CMap to cidfont+f1.
Fix: Ensure your extractor uses the CMaps the PDF actually references: the one named by the parent Type0 font's /Encoding entry (usually /Identity-H) and, where present, the /ToUnicode CMap.
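To illustrate the pitfall above: Identity-H maps each 2-byte character code to the CID with the same value, so it yields glyph identifiers rather than Unicode. Readable text has to come from a separate mapping, typically a /ToUnicode CMap. The following is a minimal Python sketch under that assumption. The function names (`decode_identity_h`, `parse_bfchar`, `codes_to_text`) are hypothetical, and the parser handles only `beginbfchar` blocks, not `beginbfrange`.

```python
import re
from typing import Dict, List


def decode_identity_h(data: bytes) -> List[int]:
    """Split a string shown with an Identity-H font into 2-byte big-endian codes.

    Under Identity-H each code equals its CID, so these values are glyph
    identifiers, not Unicode code points.
    """
    if len(data) % 2:
        raise ValueError("Identity-H strings must have an even number of bytes")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def parse_bfchar(cmap_text: str) -> Dict[int, str]:
    """Collect <code> <utf16be> pairs from the bfchar blocks of a ToUnicode CMap."""
    mapping: Dict[int, str] = {}
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.DOTALL):
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", block):
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def codes_to_text(data: bytes, mapping: Dict[int, str]) -> str:
    """Translate character codes to Unicode, marking unmapped codes with U+FFFD."""
    return "".join(mapping.get(code, "\ufffd") for code in decode_identity_h(data))
```

An extractor that treats the raw codes as Unicode would print control characters or unrelated glyphs here, while one that goes through the ToUnicode mapping recovers the intended text. Codes with no entry come back as U+FFFD, which makes gaps in the mapping visible instead of silently wrong.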
    qpdf --qdf --object-streams=disable document.pdf unpacked.pdf
    grep -A5 "/CIDFont" unpacked.pdf

You will see something like:

    cidfont+f1 f2 f3 f4 f5 f6
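When grep output is too noisy, the same lookup can be scripted. The sketch below is an assumption-laden illustration, not a PDF parser: it scans the text of a file already unpacked with qpdf's QDF mode and lists objects whose /Subtype is /CIDFontType0 or /CIDFontType2. The names `find_cidfonts` and `find_cidfonts_in_file` are hypothetical, and the regular expressions can be fooled by stream data that happens to contain `endobj`.

```python
import re
from typing import Dict, List

# Matches "N G obj ... endobj" in an uncompressed (QDF-style) PDF.
OBJ_RE = re.compile(r"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)
SUBTYPE_RE = re.compile(r"/Subtype\s*/(CIDFontType[02])\b")
BASEFONT_RE = re.compile(r"/BaseFont\s*/([^\s/<>\[\]()]+)")


def find_cidfonts(pdf_text: str) -> List[Dict[str, str]]:
    """Return object id, subtype and base font name for each CIDFont object."""
    found: List[Dict[str, str]] = []
    for match in OBJ_RE.finditer(pdf_text):
        body = match.group(3)
        subtype = SUBTYPE_RE.search(body)
        if subtype is None:
            continue  # not a CIDFont dictionary
        basefont = BASEFONT_RE.search(body)
        found.append({
            "object": f"{match.group(1)} {match.group(2)}",
            "subtype": subtype.group(1),
            "basefont": basefont.group(1) if basefont else "",
        })
    return found


def find_cidfonts_in_file(path: str) -> List[Dict[str, str]]:
    """Read a QDF-unpacked PDF as latin-1 so every byte maps to one character."""
    with open(path, "rb") as fh:
        return find_cidfonts(fh.read().decode("latin-1"))
```

Calling `find_cidfonts_in_file("unpacked.pdf")` on the file produced by the qpdf command above would return one entry per CIDFont dictionary, which can then be matched against the font names reported by a debugging tool.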
The notation cidfont+f1, cidfont+f2, etc., is specific to PDF internals and is usually observed in PDF stream dumps, PostScript printer logs, or extracted font debugging output.
Example simplified PDF object: