Shortcuts

Functions for dealing with byte strings of unknown encoding.

chardetng_py.shortcuts.detect(byte_str, *, allow_utf8=False, tld=None)

Detect the encoding of byte_str.

Parameters:
  • byte_str (bytes or bytearray) – Input buffer to decode.

  • allow_utf8 (bool) – If set to False, the return value of this method won’t be "UTF-8". When performing detection on "text/html" on non-file: URLs, Web browsers must pass False, unless the user has taken a specific contextual action to request an override. This way, Web developers cannot start depending on UTF-8 detection. Such reliance would make the Web Platform more brittle.

  • tld (bytes or bytearray or None) – If tld contains non-ASCII, period, or upper-case letters. The exception condition is intentionally limited to signs of failing to extract the label correctly, failing to provide it in its Punycode form, and failure to lower-case it. Full DNS label validation is intentionally not performed to avoid panics when the reality doesn’t match the specs.

Returns:

  • str

  • The encoding of byte_str, suitable for use with str.decode.

Return type:

str