- Python Web Scraping Cookbook
- Michael Heydt
- 2021-06-30 18:44:04
Getting ready
We will read a file named unicode.html from our local web server, located at http://localhost:8080/unicode.html. This file is UTF-8 encoded and contains several sets of characters from different parts of the encoding space. The page looks as follows in a browser:
[Image: The Page in the Browser]
Opening the file in an editor that supports UTF-8 shows how the Cyrillic characters are rendered:
[Image: The HTML in an Editor]
Code for the sample is in 02/06_unicode.py.
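To make the idea of "different parts of the encoding space" concrete, here is a minimal sketch that is independent of 02/06_unicode.py. It shows how characters from different scripts take different numbers of bytes in UTF-8, and what happens when the same bytes are decoded with the wrong codec. The sample strings and the helper name `utf8_byte_lengths` are assumptions chosen for illustration; they are not the contents of unicode.html.

```python
# Illustrative sketch only: the strings below are assumed examples,
# not the actual contents of unicode.html.
samples = {
    "ascii": "Hello",
    "cyrillic": "Привет",
    "cjk": "你好",
}


def utf8_byte_lengths(text):
    """Return the number of UTF-8 bytes used by each character in text."""
    return [len(ch.encode("utf-8")) for ch in text]


for name, text in samples.items():
    raw = text.encode("utf-8")       # str -> bytes, the form sent over HTTP
    roundtrip = raw.decode("utf-8")  # bytes -> str using the matching codec
    assert roundtrip == text         # decoding as UTF-8 recovers the text
    print(name, len(text), "chars,", len(raw), "bytes,", utf8_byte_lengths(text))

# Decoding the same bytes with a different codec does not raise an error
# for latin-1; it silently produces different characters (mojibake).
wrong = "Привет".encode("utf-8").decode("latin-1")
print(repr(wrong))
```

The round trip succeeds only when the bytes are decoded with the encoding that produced them, which is why a scraper needs to know (or correctly detect) the encoding of a page before treating its content as text.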