Node.js Axios: Requested Webpage HTML Response Is Unreadable
Problem Description:
I’m trying to request a webpage and run it through cheerio, but the response HTML is not readable and is made up of characters like �a��{t��.
How do I receive a readable response?
const axios = require('axios');
const cheerio = require('cheerio');

axios.get('https://www.amazon.co.uk/Acer-XF243YPbmiiprx-Monitor-FreeSync-Adjustable/dp/B097F6DT45')
  .then((response) => {
    if (response.status === 200) {
      // Parse the returned HTML and print it back out
      const html = response.data;
      const $ = cheerio.load(html);
      console.log($.html());
    }
  }, (error) => console.log(error)); // log the rejection reason
What I receive:
�Sn����M�٠��g���+=�������&x��.�@!�Q36�%�[�H�+ݴ�|��_/��d�8_
K�b&E�_�}[U1�@u梅Y����{T6ǞOrt��q���ri����eJ(���w����~}S�
�4/��&�2�y���X, �Ǥ���0b�n��PS6O��kY��=�2k�Z��K�Z��r�t
A4�����c���vnY�{t���u�_���C��C&��u����W)���He���O�X��
�]P�
��Ѐ�(�?�,y�m����b��|6���B�|�l6��ݻ+&�6�/vH�O�oX�;�X�s
[����I��&��U �����v�=�vR{��֥����L�r��tG�l�ܓY�N)����(
�(����qX���Z=f�b�-�����At舮�^U6���ف{�h�w�p���m��ϝ
In my attempts to fix this, I tried re-encoding the response as utf-8. I also tried requesting over http instead of https.
Solution – 1
Amazon simply blocks basic web scraping by programs that send no browser-like headers. Several conditions can trigger its protections, and when they do, the response you get back is garbage.
For more detail, see Gidon Lev Eli’s answer on Quora.
You can still get around this by making your program interact with Amazon more "like a true browser" than "a mere headless script".
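As a minimal sketch of what "more like a true browser" could mean with axios, the request can carry the headers a browser normally sends. Everything specific here is an assumption for illustration: the helper names buildBrowserLikeConfig and fetchPage are hypothetical, the header values are examples rather than values known to satisfy Amazon, and the Accept-Encoding restriction is a guess at one cause of the unreadable body (a compressed response that was never decompressed). Amazon may still block the request.

```javascript
// Hypothetical helper: builds an axios request config with browser-like headers.
// All header values are illustrative assumptions, not values known to pass
// Amazon's checks.
function buildBrowserLikeConfig() {
  return {
    headers: {
      'User-Agent':
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
        '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
      Accept: 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
      'Accept-Language': 'en-GB,en;q=0.9',
      // Assumption: advertise only encodings that axios decompresses by
      // default in Node, so the body does not arrive as raw compressed bytes.
      'Accept-Encoding': 'gzip, deflate',
    },
    responseType: 'text', // ask axios for the body as a string
    timeout: 10000,       // give up after 10 seconds
  };
}

// Hypothetical helper: fetches a page and returns a cheerio handle for it.
async function fetchPage(url) {
  // Required lazily so the config helper can be used without these packages.
  const axios = require('axios');
  const cheerio = require('cheerio');
  const response = await axios.get(url, buildBrowserLikeConfig());
  return cheerio.load(response.data);
}
```

It could be used as fetchPage(url).then(($) => console.log($('title').text())).catch((error) => console.log(error.message)); if the output is still unreadable or a CAPTCHA page comes back, the request is probably being blocked on other signals.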