In this article, we will address the following frequently asked questions about working with Unicode JSON data in Python.
- How to serialize Unicode or non-ASCII data into JSON as-is instead of as \u escape sequences (for example, store the Unicode string ø as-is instead of \u00f8 in JSON).
- How to encode Unicode JSON data in UTF-8 format.
- How to serialize all incoming non-ASCII characters as escape sequences (for example, store the Unicode string ø as \u00f8 in JSON).
RFC 7159, the JSON specification, requires that JSON text be encoded in UTF-8, UTF-16, or UTF-32. UTF-8 is the recommended default for maximum interoperability.
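Python's json module follows this rule when decoding. Since Python 3.6, json.loads() also accepts bytes and detects whether they are UTF-8, UTF-16, or UTF-32. The following is a minimal sketch, assuming Python 3.6 or later. The sample dictionary is only illustrative.

```python
import json

# Build a JSON document that keeps the non-ASCII characters as-is
document = json.dumps({"string1": "明彦"}, ensure_ascii=False)

decoded = {}
for codec in ("utf-8", "utf-16", "utf-32"):
    payload = document.encode(codec)  # bytes in one of the three RFC encodings
    # json.loads() detects the encoding of a bytes payload on its own
    decoded[codec] = json.loads(payload)
    print(codec, len(payload), "bytes ->", decoded[codec])
```

The payload sizes differ, and UTF-8 is the most compact of the three here. All three still decode to the same dictionary.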
The ensure_ascii parameter
Python’s built-in json module provides the json.dump() and json.dumps() functions to encode Python objects into JSON data. Both accept an ensure_ascii parameter. It is True by default, so the output is guaranteed to have all incoming non-ASCII characters escaped. If ensure_ascii=False, these characters are output as-is.
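The json module always produces str objects, not bytes objects. In Python 3, a str is already a Unicode string; if you need bytes, encode the result yourself, for example with .encode('utf-8'). Escaping is always safe because JSON allows any character to be written as a \uXXXX escape sequence.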
- Using ensure_ascii=True gives a safe way of representing Unicode characters: the resulting JSON contains only ASCII characters, even if the data has Unicode inside.
- Using ensure_ascii=False makes the resulting JSON store Unicode characters as-is instead of as \u escape sequences.
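The next snippet is a quick sketch of the default behaviour and of the str return type. It uses the single character ø from the questions above.

```python
import json

text = "ø"

# Default: ensure_ascii=True, so ø is written as the escape sequence \u00f8
escaped = json.dumps(text)
# ensure_ascii=False keeps the character itself
raw = json.dumps(text, ensure_ascii=False)

print(escaped)                   # "\u00f8"
print(raw)                       # "ø"
print(type(escaped), type(raw))  # both are str, not bytes
print(raw.encode("utf-8"))       # b'"\xc3\xb8"'
```

Both forms decode to the same Python string, so the choice only affects how the JSON text looks.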
Save non-ASCII or Unicode data as-is, not as \u escape sequences, in JSON
In this example, we encode Unicode data into JSON. This is useful when you want to dump Unicode characters as characters instead of escape sequences. Set ensure_ascii=False in json.dumps() to encode Unicode as-is into JSON.
import json
unicodeData= {
"string1": "明彦",
"string2": u"\u00f8"
}
print("unicode Data is ", unicodeData)
encodedUnicode = json.dumps(unicodeData, ensure_ascii=False) # use dump() method to write it in file
print("JSON character encoding by setting ensure_ascii=False", encodedUnicode)
print("Decoding JSON", json.loads(encodedUnicode))
Output:
unicode Data is  {'string1': '明彦', 'string2': 'ø'}
JSON character encoding by setting ensure_ascii=False {"string1": "明彦", "string2": "ø"}
Decoding JSON {'string1': '明彦', 'string2': 'ø'}
Note: Use this approach when you want to store Unicode strings as-is in JSON.
Serialize Unicode data to JSON and write it to a file
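In the above example, we saw how to save non-ASCII or Unicode data as-is, not as \u escape sequences, in JSON. Now, let’s see how to write JSON-serialized Unicode data as-is into a file. Open the file with encoding='utf-8' so the characters are written correctly regardless of the platform's default encoding.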
import json

sampleDict = {
    "string1": "明彦",
    "string2": u"\u00f8"
}

# Open the file with an explicit UTF-8 encoding so non-ASCII characters are written correctly
with open("unicodeFile.json", "w", encoding='utf-8') as write_file:
    json.dump(sampleDict, write_file, ensure_ascii=False)
    print("Done writing JSON serialized Unicode Data as-is into file")

# Read it back with the same encoding
with open("unicodeFile.json", "r", encoding='utf-8') as read_file:
    print("Reading JSON serialized Unicode data from file")
    sampleData = json.load(read_file)
    print("Decoded JSON serialized Unicode data")
    print(sampleData["string1"], sampleData["string2"])
Output:
Done writing JSON serialized Unicode Data as-is into file
Reading JSON serialized Unicode data from file
Decoded JSON serialized Unicode data
明彦 ø
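To check that the file really contains the characters as-is, you can read it back in binary mode. This sketch writes to a temporary file created with tempfile instead of the article's unicodeFile.json. That temporary path is an assumption made so the example cleans up after itself.

```python
import json
import os
import tempfile

sampleDict = {"string1": "明彦", "string2": "\u00f8"}

# Temporary file (assumption), so the sketch does not overwrite unicodeFile.json
fd, path = tempfile.mkstemp(suffix=".json")
os.close(fd)

with open(path, "w", encoding="utf-8") as write_file:
    json.dump(sampleDict, write_file, ensure_ascii=False)

# Read the raw bytes back without decoding them
with open(path, "rb") as raw_file:
    raw_bytes = raw_file.read()
os.remove(path)

print(raw_bytes)
print("明彦".encode("utf-8") in raw_bytes)  # True: the characters are stored as UTF-8 bytes
print(b"\\u" in raw_bytes)                 # False: no \u escape sequences were written
```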

Serialize Unicode objects into UTF-8-encoded JSON instead of \u escape sequences
You can also encode the JSON output as UTF-8, the recommended default for maximum interoperability. Set ensure_ascii=False, then call .encode('utf-8') on the string returned by json.dumps(). The result is a bytes object, and json.loads() can decode it directly.
import json
# encoding in UTF-8
unicodeData= {
"string1": "明彦",
"string2": u"\u00f8"
}
print("unicode Data is ", unicodeData)
print("Unicode JSON Data encoding using utf-8")
encodedUnicode = json.dumps(unicodeData, ensure_ascii=False).encode('utf-8')
print("JSON character encoding by setting ensure_ascii=False", encodedUnicode)
print("Decoding JSON", json.loads(encodedUnicode))
Output:
unicode Data is  {'string1': '明彦', 'string2': 'ø'}
Unicode JSON Data encoding using utf-8
JSON character encoding by setting ensure_ascii=False b'{"string1": "\xe6\x98\x8e\xe5\xbd\xa6", "string2": "\xc3\xb8"}'
Decoding JSON {'string1': '明彦', 'string2': 'ø'}
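The UTF-8 bytes are what you would typically send over a network or write to a file opened in binary mode. The sketch below wraps the call in a helper and compares its size with the ASCII-escaped form. The name to_utf8_json() is hypothetical and not part of the json module.

```python
import json

def to_utf8_json(obj):
    """Hypothetical helper: serialize obj to JSON bytes, keeping non-ASCII characters as UTF-8."""
    return json.dumps(obj, ensure_ascii=False).encode("utf-8")

data = {"name": "明彦"}

utf8_bytes = to_utf8_json(data)
# The escaped form contains only ASCII, so encoding it with the ascii codec always succeeds
escaped_bytes = json.dumps(data).encode("ascii")

print(len(utf8_bytes), utf8_bytes)        # 18 bytes: each of these CJK characters takes 3 bytes
print(len(escaped_bytes), escaped_bytes)  # 24 bytes: each becomes a 6-character \uXXXX escape
print(json.loads(utf8_bytes) == json.loads(escaped_bytes) == data)
```

For text that is mostly non-ASCII, keeping the characters as UTF-8 is usually more compact than escaping them.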
Encode both Unicode and ASCII (Mixed Data) into JSON using Python
In this example, we will see how to encode a Python dictionary that contains both Unicode and ASCII data into JSON.
import json
sampleDict = {"name": "明彦", "age": 25}
print("unicode Data is ", sampleDict)
# set ensure_ascii=True
jsonDict = json.dumps(sampleDict, ensure_ascii=True)
print("JSON character encoding by setting ensure_ascii=True")
print(jsonDict)
print("Decoding JSON", json.loads(jsonDict))
# set ensure_ascii=False
jsonDict = json.dumps(sampleDict, ensure_ascii=False)
print("JSON character encoding by setting ensure_ascii=False")
print(jsonDict)
print("Decoding JSON", json.loads(jsonDict))
# set ensure_ascii=False and encode using utf-8
jsonDict = json.dumps(sampleDict, ensure_ascii=False).encode('utf-8')
print("JSON character encoding by setting ensure_ascii=False and UTF-8")
print(jsonDict)
print("Decoding JSON", json.loads(jsonDict))
Output:
unicode Data is  {'name': '明彦', 'age': 25}
JSON character encoding by setting ensure_ascii=True
{"name": "\u660e\u5f66", "age": 25}
Decoding JSON {'name': '明彦', 'age': 25}
JSON character encoding by setting ensure_ascii=False
{"name": "明彦", "age": 25}
Decoding JSON {'name': '明彦', 'age': 25}
JSON character encoding by setting ensure_ascii=False and UTF-8
b'{"name": "\xe6\x98\x8e\xe5\xbd\xa6", "age": 25}'
Decoding JSON {'name': '明彦', 'age': 25}
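Escaping works character by character, not value by value. The sketch below shows this with a string that mixes ASCII and non-ASCII text: only the non-ASCII characters are escaped. The "greeting" key is made-up sample data.

```python
import json

# A single value mixing ASCII and non-ASCII characters (sample data)
mixed = {"greeting": "Hello, 明彦!"}

escaped = json.dumps(mixed)                    # ensure_ascii=True by default
as_is = json.dumps(mixed, ensure_ascii=False)

print(escaped)  # {"greeting": "Hello, \u660e\u5f66!"}
print(as_is)    # {"greeting": "Hello, 明彦!"}
```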
Escape non-ASCII characters while encoding them into JSON in Python
Let’s see how to store all incoming non-ASCII characters escaped in JSON. This is a safe way of representing Unicode characters: by setting ensure_ascii=True, we make sure the resulting JSON contains only ASCII characters, even if the data has Unicode inside.
import json
unicodeData= {
"string1": "明彦",
"string2": u"\u00f8"
}
print("unicode Data is ", unicodeData)
# set ensure_ascii=True
encodedUnicode = json.dumps(unicodeData, ensure_ascii=True)
print("JSON character encoding by setting ensure_ascii=True")
print(encodedUnicode)
print("Decoding JSON")
print(json.loads(encodedUnicode))
Output:
unicode Data is  {'string1': '明彦', 'string2': 'ø'}
JSON character encoding by setting ensure_ascii=True
{"string1": "\u660e\u5f66", "string2": "\u00f8"}
Decoding JSON
{'string1': '明彦', 'string2': 'ø'}
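With ensure_ascii=True, the output is plain ASCII even for characters outside the Basic Multilingual Plane. The encoder writes those as a UTF-16 surrogate pair of \u escapes. In this sketch the "emoji" entry is added sample data.

```python
import json

unicodeData = {"string1": "明彦", "string2": "\u00f8", "emoji": "\U0001F600"}

encoded = json.dumps(unicodeData, ensure_ascii=True)
print(encoded)

# Every character in the output is ASCII, so the ascii codec encodes it without errors
ascii_bytes = encoded.encode("ascii")

# A character outside the Basic Multilingual Plane becomes a surrogate pair
print(json.dumps("\U0001F600"))  # "\ud83d\ude00"

# Decoding joins the surrogate pair back into the original character
print(json.loads(ascii_bytes) == unicodeData)
```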