NodeJs Axios Website Requested Webpage HTML Response is Unreadable

Problem Description:

I’m trying to request a webpage and run it through cheerio, but the response HTML is not readable and is made up of characters like �a��{t��.

How do I receive a readable response?

const axios = require('axios');
const cheerio = require('cheerio');

axios.get('https://www.amazon.co.uk/Acer-XF243YPbmiiprx-Monitor-FreeSync-Adjustable/dp/B097F6DT45')
    .then((response) => {
        if (response.status === 200) {
            const html = response.data;
            const $ = cheerio.load(html);
            console.log($.html());
        }
    }, (error) => console.log(error)); // log the actual rejection value (was the undefined `err`)

What I receive:

�Sn����M�٠��g���+=�������&x��.�@!�Q36�%�[�H�+ݴ�|��_/��d�8_
K�b&E�_�}[U1�@u梅Y����{T6ǞOrt��q���ri����eJ(���w����~}S�
    �4/��&�2�y���X, �Ǥ���0b�n��PS6O��kY��=�2k�Z��K�Z��r�t
    A4�����c���vnY�{t���u�_���C��C&��u����W)���He���O�X��
        �]P�
    ��Ѐ�(�?�,y�m����b��|6���B�|�l6��ݻ+&�6�/vH�O�oX�;�X�s 
    [����I��&��U �����v�=�vR{��֥����L�r��tG�l�ܓY�N)����(
        �(����qX���Z=f�b�-�����At舮�^U6���ف{�h�w�p���m��ϝ

While trying to fix this, I tried re-encoding the response as UTF-8. I also tried requesting the page over http instead of https.

Solution – 1

Amazon blocks basic web scraping by programs that send requests without browser-like headers, among other signals. Several conditions can trigger its protections, which respond by sending garbage like the output above.
See Gidon Lev Eli’s answer on Quora for more.

You can still get around this by making your program interact with Amazon more like a real browser and less like a bare headless script.
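
As a rough illustration of what "more like a browser" could mean, here is a minimal sketch that sends browser-style request headers with axios. The header values, the BROWSER_LIKE_HEADERS constant, and the fetchHtml helper are all assumptions made up for this example, not something taken from the answer. Nothing here is guaranteed to get past Amazon's protections, which may look at more than headers.

```javascript
// Hypothetical header set imitating a desktop browser. The exact values are
// assumptions; adjust them to match whichever browser you want to mimic.
const BROWSER_LIKE_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-GB,en;q=0.9',
    // Ask only for encodings that axios decompresses by itself in Node.
    'Accept-Encoding': 'gzip, deflate',
};

// Hypothetical helper: requests `url` with the headers above and resolves to
// the response body. `client` defaults to axios, but any object with a
// compatible get(url, config) method can be passed in (handy for testing
// without network access).
async function fetchHtml(url, client = require('axios')) {
    const response = await client.get(url, { headers: BROWSER_LIKE_HEADERS });
    return response.data;
}

// Usage, where pageUrl is the product page from the question:
// const cheerio = require('cheerio');
// fetchHtml(pageUrl)
//     .then((html) => console.log(cheerio.load(html).html()))
//     .catch((error) => console.log(error));
```

If the output is still unreadable with headers like these, the site is probably reacting to something else (request rate, cookies, missing JavaScript execution), and a real browser driven by an automation tool may be the next thing to try.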
