Python
Quiab, 2021-04-11 13:49:01

How do I write a custom ContextFactory for Scrapy?

Hello!
When parsing one particular site, I always have to cycle through the DOWNLOADER_CLIENT_TLS_METHOD and DOWNLOADER_CLIENT_TLS_CIPHERS settings by hand.
I wrote my own ContextFactory and patched the Scrapy sources (scrapy.utils.misc) so that the spider is passed to the ContextFactory.
After several retries that end in a bad response from the server, my code changes _ssl_method and tls_ciphers, stepping through all the combinations produced by itertools.product, and getCertificateOptions returns a CertificateOptions() built with the new settings. However, the new settings seem to be ignored and are not applied.
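To make the rotation idea concrete, here is a minimal sketch of just the combination-stepping part, with no Scrapy or Twisted involved. The class name ComboRotator and the sample method and cipher strings are illustrative assumptions, not part of my project.

```python
from itertools import product


class ComboRotator:
    """Hands out (tls_method, cipher_string) pairs one at a time (hypothetical helper)."""

    def __init__(self, tls_methods, cipher_strings):
        # product() pairs every method name with every cipher string.
        self._remaining = list(product(tls_methods, cipher_strings))
        self.current = None

    def advance(self):
        """Move to the next pair; return None once every pair has been tried."""
        if not self._remaining:
            return None
        self.current = self._remaining.pop(0)
        return self.current


# Sample values only; the real lists come from the spider's settings.
rotator = ComboRotator(["TLSv1.1", "TLSv1.2"], ["DEFAULT", "DEFAULT:!DH"])
first = rotator.advance()
```

Returning None when the list is exhausted avoids the IndexError that a bare pop(0) raises on an empty list.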

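# Imports are not shown in the original post; these are the modules
# the names most likely come from.
from itertools import product

from OpenSSL import SSL
from scrapy.core.downloader.tls import ScrapyClientTLSOptions
from twisted.internet.ssl import AcceptableCiphers, CertificateOptions, ClientContextFactory
from twisted.web.iweb import IPolicyForHTTPS
from zope.interface import implementer


@implementer(IPolicyForHTTPS)
class FutusContext(ClientContextFactory):

    # Method names mapped to the numeric values of pyOpenSSL's SSL.*_METHOD constants.
    METHODS = {
        'SSLv2': 1,    # SSL.SSLv2_METHOD
        'SSLv3': 2,    # SSL.SSLv3_METHOD
        'SSLv23': 3,   # SSL.SSLv23_METHOD
        'TLSv1': 4,    # SSL.TLSv1_METHOD
        'TLSv1.1': 5,  # SSL.TLSv1_1_METHOD
        'TLSv1.2': 6,  # SSL.TLSv1_2_METHOD
    }

    def __init__(self, method=SSL.TLSv1_1_METHOD, *args, **kwargs):
        # 'spider' is supplied by the patched scrapy.utils.misc.
        self.spider = kwargs.pop('spider')
        super().__init__(*args, **kwargs)
        self._ssl_method = method
        # NOTE: CIPHERS is not defined in the posted snippet; it is expected
        # to be an OpenSSL cipher string defined elsewhere on the class.
        self.tls_ciphers = AcceptableCiphers.fromOpenSSLCipherString(self.CIPHERS)

        settings = self.spider.custom_settings
        self.combinations = list(product(settings.get('CUSTOM_TLS', []), settings.get('CUSTOM_CIPHER', [])))

    def getCertificateOptions(self):
        # After more than three bad responses, switch to the next
        # (method, cipher) combination, if any are left.
        if self.spider.count > 3 and self.combinations:
            tls_method, cipher_string = self.combinations.pop(0)
            self.spider.count = 0
            self._ssl_method = self.METHODS[tls_method]
            self.tls_ciphers = AcceptableCiphers.fromOpenSSLCipherString(cipher_string)

        return CertificateOptions(
            verify=False,
            # Read _ssl_method directly: with getattr(self, 'method', ...), a
            # class-level 'method' attribute on the base class, if present,
            # would take precedence and the updated value would never be used.
            method=self._ssl_method,
            fixBrokenPeers=True,
            acceptableCiphers=self.tls_ciphers,
        )

    def getContext(self, hostname=None, port=None):
        return self.getCertificateOptions().getContext()

    def creatorForNetloc(self, hostname, port):
        # Builds fresh TLS options (and a fresh context) for each new connection.
        return ScrapyClientTLSOptions(hostname.decode("ascii"), self.getContext())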
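
For completeness, this is roughly what the spider side looks like. It is a sketch under assumptions: the module path myproject.contextfactory is hypothetical, and the list values are examples. DOWNLOADER_CLIENTCONTEXTFACTORY is the standard Scrapy setting for plugging in a context factory; CUSTOM_TLS and CUSTOM_CIPHER are my own keys read in FutusContext.__init__.

```python
# Sketch of the spider's custom_settings (values are illustrative).
custom_settings = {
    # Hypothetical import path; point it at wherever FutusContext lives.
    "DOWNLOADER_CLIENTCONTEXTFACTORY": "myproject.contextfactory.FutusContext",
    # Custom keys consumed by FutusContext, not built-in Scrapy settings.
    "CUSTOM_TLS": ["TLSv1", "TLSv1.1", "TLSv1.2"],
    "CUSTOM_CIPHER": ["DEFAULT", "DEFAULT:!DH"],
}

# Names accepted by FutusContext.METHODS; any other name would raise KeyError
# at the moment the factory tries to switch to it.
KNOWN_METHODS = {"SSLv2", "SSLv3", "SSLv23", "TLSv1", "TLSv1.1", "TLSv1.2"}
unknown = [name for name in custom_settings["CUSTOM_TLS"] if name not in KNOWN_METHODS]
if unknown:
    raise ValueError(f"Unsupported TLS method names: {unknown}")
```

The spider also keeps a count attribute of consecutive bad responses, which the factory resets to 0 each time it moves to the next combination.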
