PYnative

Python Programming

  • Learn Python
  • Exercises
  • Quizzes
  • Code Editor
  • Tricks
Home » Python » JSON » Python Encode Unicode and non-ASCII characters as-is into JSON

Python Encode Unicode and non-ASCII characters as-is into JSON

Updated on: May 14, 2021 | Leave a Comment

In this article, we will address the following frequently asked questions about working with Unicode JSON data in Python.

  • How to serialize Unicode or non-ASCII data into JSON as-is strings instead of \u escape sequence (Example, Store Unicode string ø as-is instead of \u00f8 in JSON)
  • Encode Unicode data in utf-8 format.
  • How to serialize all incoming non-ASCII characters escaped (Example, Store Unicode string ø as \u00f8 in JSON)

Further Reading:

  • Solve Python JSON Exercise to practice Python JSON skills

The Python RFC 7159 requires that JSON be represented using either UTF-8, UTF-16, or UTF-32, with UTF-8 being the recommended default for maximum interoperability.

The ensure_ascii parameter

Use Python’s built-in module json provides the json.dump() and json.dumps() method to encode Python objects into JSON data.

The json.dump() and json.dumps() has a ensure_ascii parameter. The ensure_ascii is by-default true so the output is guaranteed to have all incoming non-ASCII characters escaped. If ensure_ascii=False, these characters will be output as-is.

The json module always produces str objects. You get a string back, not a Unicode string. Because the escaping is allowed by JSON.

  • using a ensure_ascii=True, we can present a safe way of representing Unicode characters. By setting it to true we make sure the resulting JSON is valid ASCII characters (even if they have Unicode inside).
  • Using a ensure_ascii=False, we make sure resulting JSON store Unicode characters as-is instead of \u escape sequence.

Save non-ASCII or Unicode data as-is not as \u escape sequence in JSON

In this example, we will try to encode the Unicode Data into JSON. This solution is useful when you want to dump Unicode characters as characters instead of escape sequences.

Set ensure_ascii=False in json.dumps() to encode Unicode as-is into JSON

import json

unicodeData= {
    "string1": "明彦",
    "string2": u"\u00f8"
}
print("unicode Data is ", unicodeData)

encodedUnicode = json.dumps(unicodeData, ensure_ascii=False) # use dump() method to write it in file
print("JSON character encoding by setting ensure_ascii=False", encodedUnicode)

print("Decoding JSON", json.loads(encodedUnicode))

Output:

unicode Data is  {'string1': '明彦', 'string2': 'ø'}
JSON character encoding by setting ensure_ascii=False {"string1": "明彦", "string2": "ø"}
Decoding JSON {'string1': '明彦', 'string2': 'ø'}

Note: This example is useful to store the Unicode string as-is in JSON.

JSON Serialize Unicode Data and Write it into a file.

In the above example, we saw how to Save non-ASCII or Unicode data as-is not as \u escape sequence in JSON. Now, Let’s see how to write JSON serialized Unicode data as-is into a file.

import json

sampleDict= {
    "string1": "明彦",
    "string2": u"\u00f8"
}
with open("unicodeFile.json", "w", encoding='utf-8') as write_file:
    json.dump(sampleDict, write_file, ensure_ascii=False)
print("Done writing JSON serialized Unicode Data as-is into file")

with open("unicodeFile.json", "r", encoding='utf-8') as read_file:
    print("Reading JSON serialized Unicode data from file")
    sampleData = json.load(read_file)
print("Decoded JSON serialized Unicode data")
print(sampleData["string1"], sampleData["string1"])

Output:

Done writing JSON serialized Unicode Data as-is into file
Reading JSON serialized Unicode data from file
Decoded JSON serialized Unicode data
明彦 明彦
JSON file after writing Unicode data as-is
JSON file after writing Unicode data as-is

Serialize Unicode objects into UTF-8 JSON strings instead of \u escape sequence

You can also set JSON encoding to UTF-8. UTF-8 is the recommended default for maximum interoperability. set ensure_ascii=False to and encode Unicode data into JSON using ‘UTF-8‘.

import json

# encoding in UTF-8
unicodeData= {
    "string1": "明彦",
    "string2": u"\u00f8"
}
print("unicode Data is ", unicodeData)

print("Unicode JSON Data encoding using utf-8")
encodedUnicode = json.dumps(unicodeData, ensure_ascii=False).encode('utf-8')
print("JSON character encoding by setting ensure_ascii=False", encodedUnicode)

print("Decoding JSON", json.loads(encodedUnicode))

Output:

unicode Data is  {'string1': '明彦', 'string2': 'ø'}
Unicode JSON Data encoding using utf-8
JSON character encoding by setting ensure_ascii=False b'{"string1": "\xe6\x98\x8e\xe5\xbd\xa6", "string2": "\xc3\xb8"}'
Decoding JSON {'string1': '明彦', 'string2': 'ø'}

Encode both Unicode and ASCII (Mix Data) into JSON using Python

In this example, we will see how to encode Python dictionary into JSON which contains both Unicode and ASCII data.

import json

sampleDict = {"name": "明彦", "age": 25}
print("unicode Data is ", sampleDict)

# set ensure_ascii=True
jsonDict = json.dumps(sampleDict, ensure_ascii=True)
print("JSON character encoding by setting ensure_ascii=True")
print(jsonDict)

print("Decoding JSON", json.loads(jsonDict))

# set ensure_ascii=False
jsonDict = json.dumps(sampleDict, ensure_ascii=False)
print("JSON character encoding by setting ensure_ascii=False")
print(jsonDict)

print("Decoding JSON", json.loads(jsonDict))

# set ensure_ascii=False and encode using utf-8
jsonDict = json.dumps(sampleDict, ensure_ascii=False).encode('utf-8')
print("JSON character encoding by setting ensure_ascii=False and UTF-8")
print(jsonDict)

print("Decoding JSON", json.loads(jsonDict))

Output:

unicode Data is  {'name': '明彦', 'age': 25}
JSON character encoding by setting ensure_ascii=True
{"name": "\u660e\u5f66", "age": 25}
Decoding JSON {'name': '明彦', 'age': 25}

JSON character encoding by setting ensure_ascii=False
{"name": "明彦", "age": 25}
Decoding JSON {'name': '明彦', 'age': 25}

JSON character encoding by setting ensure_ascii=False and UTF-8
b'{"name": "\xe6\x98\x8e\xe5\xbd\xa6", "age": 25}'
Decoding JSON {'name': '明彦', 'age': 25}

Python Escape non-ASCII characters while encoding it into JSON

Let’ see how store all incoming non-ASCII characters escaped in JSON. It is a safe way of representing Unicode characters. By setting ensure_ascii=True we make sure resulting JSON is valid ASCII characters (even if they have Unicode inside).

import json

unicodeData= {
    "string1": "明彦",
    "string2": u"\u00f8"
}
print("unicode Data is ", unicodeData)

# set ensure_ascii=True
encodedUnicode = json.dumps(unicodeData, ensure_ascii=True)
print("JSON character encoding by setting ensure_ascii=True")
print(encodedUnicode)

print("Decoding JSON")
print(json.loads(encodedUnicode))

Output:

unicode Data is  {'string1': '明彦', 'string2': 'ø'}
JSON character encoding by setting ensure_ascii=True
{"string1": "\u660e\u5f66", "string2": "\u00f8"}

Decoding JSON
{'string1': '明彦', 'string2': 'ø'}

Filed Under: Python, Python JSON

Did you find this page helpful? Let others know about it. Sharing helps me continue to create free Python resources.

TweetF  sharein  shareP  Pin

About Vishal

Founder of PYnative.com I am a Python developer and I love to write articles to help developers. Follow me on Twitter. All the best for your future Python endeavors!

Related Tutorial Topics:

Python Python JSON

Python Exercises and Quizzes

Free coding exercises and quizzes cover Python basics, data structure, data analytics, and more.

  • 15+ Topic-specific Exercises and Quizzes
  • Each Exercise contains 10 questions
  • Each Quiz contains 12-15 MCQ
Exercises
Quizzes

Leave a Reply Cancel reply

your email address will NOT be published. all comments are moderated according to our comment policy.

Use <pre> tag for posting code. E.g. <pre> Your code </pre>

Posted In

Python Python JSON
TweetF  sharein  shareP  Pin

  Python JSON

  • Python JSON Guide
  • JSON Serialization
  • JSON Parsing
  • JSON Validation
  • PrettyPrint JSON
  • Make Python Class JSON serializable
  • Convert JSON Into Custom Python Object
  • Python JSON Unicode
  • Parse JSON using Python requests
  • Post JSON using Python requests
  • Serialize DateTime into JSON
  • Serialize Python Set into JSON
  • Serialize NumPy array into JSON
  • Parse multiple JSON objects from file
  • Parse Nested JSON
  • Python JSON Exercise

All Python Topics

Python Basics Python Exercises Python Quizzes Python File Handling Python OOP Python Date and Time Python Random Python Regex Python Pandas Python Databases Python MySQL Python PostgreSQL Python SQLite Python JSON

About PYnative

PYnative.com is for Python lovers. Here, You can get Tutorials, Exercises, and Quizzes to practice and improve your Python skills.

Explore Python

  • Learn Python
  • Python Basics
  • Python Databases
  • Python Exercises
  • Python Quizzes
  • Online Python Code Editor
  • Python Tricks

Follow Us

To get New Python Tutorials, Exercises, and Quizzes

  • Twitter
  • Facebook
  • Sitemap

Legal Stuff

  • About Us
  • Contact Us

We use cookies to improve your experience. While using PYnative, you agree to have read and accepted our Terms Of Use, Cookie Policy, and Privacy Policy.

Copyright © 2018–2022 pynative.com