
pyppeteer.launch

I use pyppeteer to generate screenshots on a server. I have code that works locally:

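from pyppeteer import launch

# Disable pyppeteer's signal handlers: installing them fails when launch()
# is not called from the main thread (e.g. inside a web request).
browser = await launch(
    handleSIGINT=False,
    handleSIGTERM=False,
    handleSIGHUP=False
)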

But on the server (PythonAnywhere) it doesn't work. I've investigated and found that the problem is in:

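import json
import time
from urllib.error import URLError
from urllib.request import urlopen

from pyppeteer.errors import BrowserError


def get_ws_endpoint(url) -> str:
    url = url + '/json/version'
    timeout = time.time() + 30  # give Chrome up to 30 seconds to start listening
    while (True):
        if time.time() > timeout:
            raise BrowserError('Browser closed unexpectedly:\n')
        try:
            with urlopen(url) as f:
                data = json.loads(f.read().decode())
            break
        except URLError as e:
            # e.g. [Errno 111] Connection refused: nothing is listening yet, so retry
            continue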

The url value is http://127.0.0.1:46574/json/version (the port varies). As far as I can tell, the problem occurs in urlopen(url): it works on my local machine but not on PythonAnywhere. All packages have identical versions.

What errors does it generate?

The URLError is:

[Errno 111] Connection refused
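To see what Errno 111 means in isolation, here is a small standard-library sketch (the helper names are illustrative, not part of pyppeteer). It asks the OS for a free port the same way get_free_port does, releases it, and then requests /json/version on it. With nothing listening there, the request fails with the same "Connection refused". On its own, then, the error only says that nothing accepted a connection on that port at that moment (for example because Chrome had not started, or had already died).

```python
import socket
from urllib.error import URLError
from urllib.request import ProxyHandler, build_opener


def unused_local_port() -> int:
    """Ask the OS for a free port, then release it (the same idea as get_free_port)."""
    with socket.socket() as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


def probe(url: str) -> str:
    """Describe what happens when we GET `url`."""
    # Ignore proxy environment variables so the request really goes to the local port.
    opener = build_opener(ProxyHandler({}))
    try:
        with opener.open(url, timeout=2) as resp:
            return f"HTTP {resp.status}"
    except URLError as exc:
        # exc.reason holds the underlying OSError, e.g. ConnectionRefusedError.
        return f"{type(exc.reason).__name__}: {exc.reason}"


port = unused_local_port()
print(probe(f"http://127.0.0.1:{port}/json/version"))
```

On Linux this should report a ConnectionRefusedError with errno 111, matching the URLError above.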

and on timeout:

pyppeteer.errors.BrowserError: Browser closed unexpectedly:

Do you know what infrastructure pyppeteer is using behind the scenes? If it's using Chrome as its browser, perhaps you could try using our new virtualization system; I've switched that on for your account, so it's possible that in a new console it will start working.

It uses Chrome as the engine for virtual rendering; the Chrome browser itself isn't used. But it works locally on my laptop without any virtualization. I reloaded my web app, but it still doesn't work. The only difference is that urlopen("http://127.0.0.1:46574/json/version") works on my computer and raises [Errno 111] Connection refused on the server.

You are trying to connect to a port on localhost, and that is not possible on PythonAnywhere because we do not let users run things like that. Your process is unable to open a port.

This is the logic of the pyppeteer library: the code fragment above (containing urlopen) is part of that library. Does that mean this library is not compatible with PythonAnywhere out of the box? Could you suggest a workaround?

All code on PythonAnywhere runs in a virtualized environment -- that's how we support code from multiple people running on the same computers without interfering with each other or being able to see each other's code.

Have you tried running the script in a freshly-started console?

OK. I'm a little confused. We found a problem:

You are trying to connect to a port on localhost, and that is not possible on PythonAnywhere because we do not let users run things like that. Your process is unable to open a port.

But I can't understand how the virtualization system can solve it.

And I don't understand which script you are talking about:

Have you tried running the script in a freshly-started console?

Should I build a new script (from parts of my existing web app) that tests this functionality?

Filip was, I think, a bit confused about what was going on. If I understand correctly how pyppeteer is working, it's trying to start up a local server to communicate with the browser. Socket servers in general won't work on PythonAnywhere -- you can't start a socket server in a console and communicate with it from another console, or from a website. But in the specific case where one process starts up a socket server, and then spawns another process on the same machine and tries to talk to it, things should work OK. Likewise if a process starts a subprocess, the subprocess starts a socket server, and then the parent process connects to the child process's server; that's how Selenium works, and it should be fine.
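A minimal standard-library sketch of the parent/child pattern described above: the parent spawns a child on the same machine, the child runs a socket server on localhost, and the parent connects to it (roughly the relationship between pyppeteer and Chrome). Unlike pyppeteer, the child here picks its own port and reports it on stdout; that is a choice made for the sketch, and all names are illustrative. Running something like this in a console is one quick way to check whether local parent/child communication works in a given environment.

```python
import socket
import subprocess
import sys
import textwrap

# Child program: open a server socket on an ephemeral localhost port,
# report the port on stdout, then answer one connection and exit.
CHILD_CODE = textwrap.dedent("""
    import socket
    with socket.socket() as srv:
        srv.bind(("127.0.0.1", 0))
        srv.listen(1)
        print(srv.getsockname()[1], flush=True)
        conn, _ = srv.accept()
        with conn:
            conn.sendall(b"hello from child")
""")


def talk_to_child() -> bytes:
    """Spawn the child, connect to its server and return everything it sends."""
    with subprocess.Popen([sys.executable, "-c", CHILD_CODE],
                          stdout=subprocess.PIPE, text=True) as child:
        try:
            port = int(child.stdout.readline())  # printed only after listen()
            with socket.create_connection(("127.0.0.1", port), timeout=5) as sock:
                chunks = []
                while chunk := sock.recv(1024):
                    chunks.append(chunk)
            return b"".join(chunks)
        finally:
            # Reap the child (killing it first if it is stuck) so it cannot linger.
            try:
                child.wait(timeout=5)
            except subprocess.TimeoutExpired:
                child.kill()
                child.wait()


print(talk_to_child())
```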

The virtualization comes into the equation because the Chrome HTML rendering engine does not work on our old virtualization system, so we needed to get that sorted before any of this, because as you said, "It uses Chrome as the engine for virtual rendering".

If this is all running inside your website's code, then I would expect it to have reloaded by now, so you should be working with the new virtualization system, so hopefully that's sorted. So the next question is, where does the port number in the URL that you are connecting to come from?

Thanks a lot for the explanation!

As to your question about the port: all of this happens inside the pyppeteer library (a Python port of the puppeteer library). When I launch the virtual browser from my code with browser = await pyppeteer.launch(), it starts a Chrome process and, before the connection is established, retrieves the endpoint. While retrieving the endpoint, urlopen is called with f'http://127.0.0.1:{port}', where port comes from:

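import gc
import socket


def get_free_port() -> int:
    """Get free port."""
    sock = socket.socket()
    sock.bind(('localhost', 0))  # port 0: let the OS pick any free port
    port = sock.getsockname()[1]
    sock.close()  # released again; Chrome is expected to bind it later
    del sock
    gc.collect()
    return port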

The port is obtained correctly; for example, the URL is http://127.0.0.1:46574/

That code is certainly a good way to find an available port -- but how is the port number communicated to Chrome (actually, probably Chromedriver) so that it knows that that is the port it needs to use for its server?

The port is passed in the endpoint URL.

On pyppeteer browser launch:

  1. the endpoint URL is created with the port;
  2. a connection is created with this endpoint (the port is passed and parsed inside);
  3. a Browser instance is created with this connection:

self.browserWSEndpoint = get_ws_endpoint(self.url)
self.connection = Connection(self.browserWSEndpoint, self._loop, connectionDelay, )
browser = await Browser.create(self.connection, [], self.ignoreHTTPSErrors, self.defaultViewport,
                               self.proc, self.killChrome)
await self.ensureInitialPage(browser)
return browser
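A sketch of how a launcher can hand a pre-chosen port to Chrome: Chrome's --remote-debugging-port switch tells it which local port to serve /json/version and the DevTools WebSocket on. We believe pyppeteer's launcher passes the port from get_free_port this way, but that is worth confirming in its launcher.py. The helper names and the "chromium" executable name are hypothetical; the endpoint value is the one from the "Browser listening on" log line in this thread.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlparse


def chrome_command(executable: str, port: int) -> List[str]:
    """Hypothetical helper: argv for a headless Chrome serving DevTools on `port`."""
    return [
        executable,
        "--headless",
        f"--remote-debugging-port={port}",  # /json/version and the WebSocket live here
        "about:blank",
    ]


def endpoint_host_port(ws_endpoint: str) -> Tuple[Optional[str], Optional[int]]:
    """Read the host and port back out of a DevTools WebSocket URL."""
    parts = urlparse(ws_endpoint)
    return parts.hostname, parts.port


print(chrome_command("chromium", 37978))
print(endpoint_host_port(
    "ws://127.0.0.1:37978/devtools/browser/acba2dab-5450-48c5-9eb8-000f50ebfffa"))
# ('127.0.0.1', 37978)
```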

The earlier error that you reported: "pyppeteer.errors.BrowserError: Browser closed unexpectedly:" suggests that Chrome is failing to start and so there's nothing to connect to. Check the pyppeteer documentation to see if there's a way that you can get it to log from Chrome so you can see why Chrome might be crashing.

The error "pyppeteer.errors.BrowserError: Browser closed unexpectedly:" is raised as a consequence of [Errno 111] Connection refused (in urlopen), as I described in my initial message.

Yes, that is what I meant. That error does not give any indication of why Chrome failed, which is why we need more logs about why it closed unexpectedly. If pyppeteer has an option for getting additional logs to help debug it, that should be documented somewhere in the pyppeteer documentation.
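Along those lines, here is a sketch (not pyppeteer's API; the function name and the proc argument are hypothetical) of a more talkative version of the quoted get_ws_endpoint loop. It checks whether the browser process has already exited and keeps the last error, so the timeout message says more than "Browser closed unexpectedly:". It also treats http.client.HTTPException (BadStatusLine is one) as retryable; the quoted loop only catches URLError, so such an error escapes it.

```python
import http.client
import json
import subprocess
import time
from typing import Optional
from urllib.request import urlopen


def wait_for_ws_endpoint(base_url: str,
                         proc: Optional[subprocess.Popen] = None,
                         timeout: float = 30.0,
                         interval: float = 0.1) -> str:
    """Poll <base_url>/json/version and return Chrome's webSocketDebuggerUrl."""
    deadline = time.monotonic() + timeout
    last_error: Optional[BaseException] = None
    while time.monotonic() < deadline:
        # If the browser process already exited, further retries cannot succeed.
        if proc is not None and proc.poll() is not None:
            raise RuntimeError(f"browser exited with code {proc.returncode}; "
                               f"last error: {last_error!r}")
        try:
            with urlopen(base_url + "/json/version", timeout=1) as resp:
                return json.loads(resp.read().decode())["webSocketDebuggerUrl"]
        except (OSError, http.client.HTTPException) as exc:
            # URLError, ConnectionRefusedError and socket timeouts are OSErrors;
            # BadStatusLine is an HTTPException.
            last_error = exc
        time.sleep(interval)  # pause between attempts instead of spinning
    raise RuntimeError(f"no DevTools endpoint after {timeout}s; last error: {last_error!r}")
```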

Thank you very much for your attention to my problem!

The system's behavior is unstable today. Sometimes Chromium launches successfully, sometimes not. Sometimes Chromium launches and the page opens, but the code can't find an element on the page! On my local machine the same code works reliably.

My code (an error can happen randomly on any line):

browser = await launch()
page = await browser.newPage()
await page.goto('http://127.0.0.1:8000/sampleimage/' + str(sampleId))
await page.type('#id_username', "screenshooter")

In 30% of cases, an error is raised on the first line (launching Chromium) with:

2020-08-30 11:50:11,457: Internal Server Error: /ajax/slacksend/
Traceback (most recent call last):
  File "/home/kostbash/.virtualenvs/venv1/lib/python3.8/site-packages/django/core/handlers/exception.py", line 41, in inner
    response = get_response(request)
  File "/home/kostbash/.virtualenvs/venv1/lib/python3.8/site-packages/django/core/handlers/base.py", line 187, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "/home/kostbash/.virtualenvs/venv1/lib/python3.8/site-packages/django/core/handlers/base.py", line 185, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/home/kostbash/.virtualenvs/venv1/lib/python3.8/site-packages/django/contrib/auth/decorators.py", line 23, in _wrapped_view
    return view_func(request, *args, **kwargs)
  File "/home/kostbash/.virtualenvs/venv1/lib/python3.8/site-packages/django/contrib/auth/decorators.py", line 23, in _wrapped_view
    return view_func(request, *args, **kwargs)
  File "/home/kostbash/gen_site/catalog/views.py", line 1607, in slack_send
    slack.post_session(session_id)
  File "/home/kostbash/gen_site/catalog/slack.py", line 366, in post_session
    asyncio.run(sendSampleImage(
  File "/usr/lib/python3.8/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.8/asyncio/base_events.py", line 608, in run_until_complete
    return future.result()
  File "/home/kostbash/gen_site/catalog/slack.py", line 115, in sendSampleImage
    browser = await launch(
  File "/home/kostbash/.virtualenvs/venv1/lib/python3.8/site-packages/pyppeteer/launcher.py", line 306, in launch
    return await Launcher(options, **kwargs).launch()
  File "/home/kostbash/.virtualenvs/venv1/lib/python3.8/site-packages/pyppeteer/launcher.py", line 167, in launch
    self.browserWSEndpoint = get_ws_endpoint(self.url)
  File "/home/kostbash/.virtualenvs/venv1/lib/python3.8/site-packages/pyppeteer/launcher.py", line 226, in get_ws_endpoint
    raise BrowserError('Browser closed unexpectedly:\n')
pyppeteer.errors.BrowserError: Browser closed unexpectedly:

I described the cause of this error above: urlopen("http://127.0.0.1:46574/json/version") works on my computer but raises [Errno 111] Connection refused on PythonAnywhere.

In 10% of cases, Chromium launches successfully but an error is raised on the second line (new page) with:

2020-08-30 11:46:29,764: [I:pyppeteer.launcher] Browser listening on: ws://127.0.0.1:37978/devtools/browser/acba2dab-5450-48c5-9eb8-000f50ebfffa
2020-08-30 11:46:29,768: !!!!!lanuched!!!!
2020-08-30 11:46:31,035: [I:pyppeteer.connection] connection closed
2020-08-30 11:46:31,303: Internal Server Error: /ajax/slacksend/
Traceback (most recent call last):
  File "/home/kostbash/.virtualenvs/venv1/lib/python3.8/site-packages/django/core/handlers/exception.py", line 41, in inner
    response = get_response(request)
  File "/home/kostbash/.virtualenvs/venv1/lib/python3.8/site-packages/django/core/handlers/base.py", line 187, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "/home/kostbash/.virtualenvs/venv1/lib/python3.8/site-packages/django/core/handlers/base.py", line 185, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/home/kostbash/.virtualenvs/venv1/lib/python3.8/site-packages/django/contrib/auth/decorators.py", line 23, in _wrapped_view
    return view_func(request, *args, **kwargs)
  File "/home/kostbash/.virtualenvs/venv1/lib/python3.8/site-packages/django/contrib/auth/decorators.py", line 23, in _wrapped_view
    return view_func(request, *args, **kwargs)
  File "/home/kostbash/gen_site/catalog/views.py", line 1607, in slack_send
    slack.post_session(session_id)
  File "/home/kostbash/gen_site/catalog/slack.py", line 366, in post_session
    asyncio.run(sendSampleImage(
  File "/usr/lib/python3.8/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.8/asyncio/base_events.py", line 608, in run_until_complete
    return future.result()
  File "/home/kostbash/gen_site/catalog/slack.py", line 122, in sendSampleImage
    page = await browser.newPage()
  File "/home/kostbash/.virtualenvs/venv1/lib/python3.8/site-packages/pyppeteer/browser.py", line 202, in newPage
    return await self._defaultContext.newPage()
  File "/home/kostbash/.virtualenvs/venv1/lib/python3.8/site-packages/pyppeteer/browser.py", line 355, in newPage
    return await self._browser._createPageInContext(self._id)
  File "/home/kostbash/.virtualenvs/venv1/lib/python3.8/site-packages/pyppeteer/browser.py", line 216, in _createPageInContext
    page = await target.page()
  File "/home/kostbash/.virtualenvs/venv1/lib/python3.8/site-packages/pyppeteer/target.py", line 64, in page
    new_page = await Page.create(
  File "/home/kostbash/.virtualenvs/venv1/lib/python3.8/site-packages/pyppeteer/page.py", line 92, in create
    await client.send('Page.enable'),
pyppeteer.errors.NetworkError: Protocol error Page.enable: Target closed.

In 40% of cases, Chromium launches successfully and creates a new page, but an error is raised on the third line (opening the page) with:

2020-08-30 11:31:02,329: [I:pyppeteer.launcher] Browser listening on: ws://127.0.0.1:47045/devtools/browser/894128a5-3639-4851-92b8-fefcefd7a81a
2020-08-30 11:31:02,376: !!!!!lanuched!!!!
2020-08-30 11:31:03,419: Internal Server Error: /ajax/slacksend/
Traceback (most recent call last):
  File "/home/kostbash/.virtualenvs/venv1/lib/python3.8/site-packages/django/core/handlers/exception.py", line 41, in inner
    response = get_response(request)
  File "/home/kostbash/.virtualenvs/venv1/lib/python3.8/site-packages/django/core/handlers/base.py", line 187, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "/home/kostbash/.virtualenvs/venv1/lib/python3.8/site-packages/django/core/handlers/base.py", line 185, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/home/kostbash/.virtualenvs/venv1/lib/python3.8/site-packages/django/contrib/auth/decorators.py", line 23, in _wrapped_view
    return view_func(request, *args, **kwargs)
  File "/home/kostbash/.virtualenvs/venv1/lib/python3.8/site-packages/django/contrib/auth/decorators.py", line 23, in _wrapped_view
    return view_func(request, *args, **kwargs)
  File "/home/kostbash/gen_site/catalog/views.py", line 1607, in slack_send
    slack.post_session(session_id)
  File "/home/kostbash/gen_site/catalog/slack.py", line 366, in post_session
    asyncio.run(sendSampleImage(
  File "/usr/lib/python3.8/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.8/asyncio/base_events.py", line 608, in run_until_complete
    return future.result()
  File "/home/kostbash/gen_site/catalog/slack.py", line 123, in sendSampleImage
    await page.goto('http://127.0.0.1:8000/sampleimage/' + str(sampleId))
  File "/home/kostbash/.virtualenvs/venv1/lib/python3.8/site-packages/pyppeteer/page.py", line 879, in goto
    raise PageError(result)
pyppeteer.errors.PageError: net::ERR_INVALID_HTTP_RESPONSE at http://127.0.0.1:8000/sampleimage/262993

And in 10% of cases, Chromium launches successfully, creates a new page and opens the URL, but an error is raised on the 4th line (finding an element on the page) with:

2020-08-30 11:45:58,588: [I:pyppeteer.launcher] Browser listening on: ws://127.0.0.1:51167/devtools/browser/75e0ee66-a9e8-4693-a2f1-1b80750abbca
2020-08-30 11:45:58,592: !!!!!lanuched!!!!
2020-08-30 11:45:59,678: Internal Server Error: /ajax/slacksend/
Traceback (most recent call last):
  File "/home/kostbash/.virtualenvs/venv1/lib/python3.8/site-packages/django/core/handlers/exception.py", line 41, in inner
    response = get_response(request)
  File "/home/kostbash/.virtualenvs/venv1/lib/python3.8/site-packages/django/core/handlers/base.py", line 187, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "/home/kostbash/.virtualenvs/venv1/lib/python3.8/site-packages/django/core/handlers/base.py", line 185, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/home/kostbash/.virtualenvs/venv1/lib/python3.8/site-packages/django/contrib/auth/decorators.py", line 23, in _wrapped_view
    return view_func(request, *args, **kwargs)
  File "/home/kostbash/.virtualenvs/venv1/lib/python3.8/site-packages/django/contrib/auth/decorators.py", line 23, in _wrapped_view
    return view_func(request, *args, **kwargs)
  File "/home/kostbash/gen_site/catalog/views.py", line 1607, in slack_send
    slack.post_session(session_id)
  File "/home/kostbash/gen_site/catalog/slack.py", line 366, in post_session
    asyncio.run(sendSampleImage(
  File "/usr/lib/python3.8/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.8/asyncio/base_events.py", line 608, in run_until_complete
    return future.result()
  File "/home/kostbash/gen_site/catalog/slack.py", line 124, in sendSampleImage
    await page.type('#id_username', "screenshooter")
  File "/home/kostbash/.virtualenvs/venv1/lib/python3.8/site-packages/pyppeteer/page.py", line 1589, in type
    return await frame.type(selector, text, options, **kwargs)
  File "/home/kostbash/.virtualenvs/venv1/lib/python3.8/site-packages/pyppeteer/frame_manager.py", line 660, in type
    raise PageError('Cannot find {} on this page'.format(selector))
pyppeteer.errors.PageError: Cannot find #id_username on this page

You could be getting the "browser closed unexpectedly" error because you are not cleaning up the processes that you have previously started, so it cannot start because you are over the limit of the number of processes that you can have on a machine. It could also be caused by a timeout because you're in the tarpit when you're starting the browser.

The other errors are related to how you're interacting with the site. To debug them, you will need to inspect the response that the server is giving you. There may be clues in there about what is going wrong. Some possibilities are that the site is rate-limiting you or they are interpreting your usage as an attack and are responding to that.
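One way to follow that advice, sketched below: keep the Response returned by page.goto and the HTML from page.content(), and log them before the failing page.type call. This helps with the "Cannot find #id_username" case, where navigation succeeded but the page may not be the login page you expected; it does not help with ERR_INVALID_HTTP_RESPONSE, which goto itself raises. goto_and_report is a hypothetical helper, and page is assumed to be a pyppeteer Page.

```python
from typing import Any


async def goto_and_report(page: Any, url: str, limit: int = 500) -> str:
    """Navigate to `url` and summarize what the server actually returned."""
    response = await page.goto(url)  # a Response object, or None
    status = response.status if response is not None else None
    html = await page.content()  # the page's current HTML
    # The first few hundred characters are usually enough to recognize
    # a login form, an error page or a rate-limit message.
    return f"final URL: {page.url}\nstatus: {status}\n{html[:limit]}"
```

In the screenshot code it would replace the plain goto, e.g. print(await goto_and_report(page, url)) just before the first page.type.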

Thank you for your answer. I use only one process. The only parallelism is asyncio, but as far as I know it runs in a single process. As for the tarpit, I use only 2% of my CPU seconds. Did I understand your phrase

you're in the tarpit when you're starting the browser

correctly, i.e. that using Chromium automatically lowers the priority of my code?

I did a little research. I ran the part of my web application that starts Chromium several times in a row (with a 1-minute delay between runs) and recorded the errors. Results:

pyppeteer.errors.PageError: Cannot find #id_username on this page
http.client.BadStatusLine: GET /json/version HTTP/1.1#015
pyppeteer.errors.PageError: Cannot find #id_username on this page
BlockingIOError: [Errno 11] Resource temporarily unavailable
pyppeteer.errors.BrowserError: Browser closed unexpectedly:
pyppeteer.errors.BrowserError: Browser closed unexpectedly:
pyppeteer.errors.BrowserError: Browser closed unexpectedly:
----
http.client.BadStatusLine: GET /json/version HTTP/1.1#015
----
http.client.BadStatusLine: GET /json/version HTTP/1.1#015
http.client.BadStatusLine: GET /json/version HTTP/1.1#015

---- means a delay of approximately 10 minutes. From this list you can see that in the first two runs the browser launched and the page opened, and the errors occurred at later stages. The last runs almost always failed with the same error: http.client.BadStatusLine: GET /json/version HTTP/1.1#015. Details for this error:

2020-08-31 10:50:35,484: Internal Server Error: /ajax/slacksend/
Traceback (most recent call last):
  File "/home/kostbash/.virtualenvs/venv1/lib/python3.8/site-packages/django/core/handlers/exception.py", line 41, in inner
    response = get_response(request)
  File "/home/kostbash/.virtualenvs/venv1/lib/python3.8/site-packages/django/core/handlers/base.py", line 187, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "/home/kostbash/.virtualenvs/venv1/lib/python3.8/site-packages/django/core/handlers/base.py", line 185, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/home/kostbash/.virtualenvs/venv1/lib/python3.8/site-packages/django/contrib/auth/decorators.py", line 23, in _wrapped_view
    return view_func(request, *args, **kwargs)
  File "/home/kostbash/.virtualenvs/venv1/lib/python3.8/site-packages/django/contrib/auth/decorators.py", line 23, in _wrapped_view
    version, status, reason = self._read_status()
  File "/usr/lib/python3.8/http/client.py", line 285, in _read_status
    raise BadStatusLine(line)
http.client.BadStatusLine: GET /json/version HTTP/1.1#015

On my local machine everything works perfectly, even under load. How can I move forward with this issue?

I think Glenn is right that the problem is that Chrome processes are not being shut down properly. Am I right in thinking that your pyppeteer code is running as part of your website? I can see that there are a very large number of defunct Chrome processes running under your account on the server where your website is running.

That would explain why it sometimes works and sometimes doesn't -- the first few runs would start new Chrome processes, use them, and then leave them hanging around. Then later runs would have problems starting Chrome because you'd hit resource constraints. Periodically we clear down junk processes on web servers, so after we did that, things would start working again.
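To check for this yourself, here is a Linux-only sketch that lists zombie ("defunct") processes visible in /proc. A defunct process has finished, but its parent has not yet collected its exit status, which is what you get when a program starts Chrome processes and never cleanly closes or waits on them. defunct_processes is a hypothetical helper; in a shell, ps marks the same processes as <defunct>.

```python
import os
from typing import List, Tuple


def defunct_processes(name_fragment: str = "chrom") -> List[Tuple[int, str]]:
    """Return (pid, name) for zombie processes whose name contains name_fragment."""
    found: List[Tuple[int, str]] = []
    if not os.path.isdir("/proc"):  # not Linux, or /proc not mounted
        return found
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                stat = f.read()
        except OSError:  # the process exited meanwhile, or is not readable
            continue
        # Layout: "pid (name) state ..."; the name itself may contain spaces or ')'.
        close = stat.rindex(")")
        name = stat[stat.index("(") + 1:close]
        state = stat[close + 2:].split()[0]
        if state == "Z" and name_fragment in name:  # "Z" = zombie, shown as <defunct>
            found.append((int(entry), name))
    return found


for pid, name in defunct_processes():
    print(pid, name)
```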

What code do you have to shut down the Chrome instances when you are done with them?

Yes, the pyppeteer code runs as part of my website. A few words about my web application: on one of the pages there is a button; when you click it, a series of requests (currently only two, while I'm testing) is sent to another page, with parameters, in order to take a screenshot of it.

So what do we have? Two calls to take a screenshot. I test this about ten times a day (yesterday, for example, only to write my previous post ))) ). Could this load cause such errors (and leave a large number of Chrome processes running)? This pyppeteer code is only used by this button.

How do I close my Chrome instances? Below is all of my pyppeteer code:

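browser = await launch()
page = await browser.newPage()
await page.goto('http://127.0.0.1:8000/sampleimage/' + str(sampleId))
await page.type('#id_username', "screenshooter")

await page.type('#id_password', "*******")
# Click the login button and wait for the rendered formula concurrently.
# gather() replaces asyncio.wait(): passing bare coroutines to wait() is
# deprecated since Python 3.8, and gather() also re-raises errors.
await asyncio.gather(page.click('input.btn'),
                     page.waitForSelector(".MathJax_Preview"))
body = await page.J('body')
await body.screenshot({
    'path': img_file_name
})
await browser.close()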

As I wrote in previous comments, the errors occur in the first four lines, so browser.close() is never called. For example, just now I pressed that button three times and got an error on the 4th line in the first two attempts, and pyppeteer.errors.NetworkError: Protocol error Page.enable: Target closed. on the third attempt. It seems there is a logic error in how I find elements on the page: an exception is raised, the browser is never closed, the resource constraints are then hit, and all the following errors are caused by this limitation.

I'll use try/finally for further investigation and try to debug the selector error. But could you tell me more about the defunct processes you see, what those constraints are, and how often they are cleared? Thanks in advance.
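A sketch of the try/finally idea: whatever fails between launch and the screenshot, the finally clause still calls browser.close(), so a missing selector no longer leaves a Chrome process behind. The launch function is passed in as a parameter so the sketch can be exercised without Chrome; take_screenshot is a hypothetical name.

```python
from typing import Any, Awaitable, Callable


async def take_screenshot(launch: Callable[[], Awaitable[Any]], url: str, path: str) -> None:
    """Open `url`, screenshot its <body> into `path`, and always close the browser."""
    browser = await launch()
    try:
        page = await browser.newPage()
        await page.goto(url)
        # ... log in and wait for the content here, as in the code above ...
        body = await page.J('body')
        await body.screenshot({'path': path})
    finally:
        # Runs after success *and* after any exception above, so the Chrome
        # process started by launch() is shut down either way.
        await browser.close()
```

In the view it could be called along the lines of asyncio.run(take_screenshot(functools.partial(launch, handleSIGINT=False, handleSIGTERM=False, handleSIGHUP=False), url, img_file_name)), with launch imported from pyppeteer.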

I'm pretty sure that async code like that will not work in a web app. Also, there is nothing at 127.0.0.1:8000 when you're running on PythonAnywhere.

I'm pretty sure that async code like that will not work in a web app.

Why? It works for me on my local machine in the web app.

Also, there is nothing at 127.0.0.1:8000 when you're running on PythonAnywhere.

How can I make a request to my web app at the URL defined in Django's urls.py: url(r'^sampleimage/(\d+)', views.sampleimage, name='sampleimage')?

Async won't work, as we run your web app using uWSGI, which is not async.

To run your web app you need to configure it on your "Web" page on PythonAnywhere.

To run your web app you need to configure it on your "Web" page on PythonAnywhere.

Did you mean wsgi.py? That file is identical to my local one, and everything works locally. Please tell me the main differences between my local setup (the standard out-of-the-box Django runserver) and the PythonAnywhere environment.

Also, you didn't answer my previous question. You wrote:

Also, there is nothing at 127.0.0.1:8000 when you're running on PythonAnywhere.

And I asked: How can I make a request to my web app at the URL defined in Django's urls.py: url(r'^sampleimage/(\d+)', views.sampleimage, name='sampleimage')?

Also, there is nothing at 127.0.0.1:8000 when you're running on PythonAnywhere.

And I asked: How can I make a request to my web app at the URL defined in Django's urls.py: url(r'^sampleimage/(\d+)', views.sampleimage, name='sampleimage')?

If you've set up a website at www.yourdomain.com on the "Web" page, then you can access that URL as something like http://www.yourdomain.com/sampleimage/1234, where 1234 is the ID of the image that you'd expect to have pulled out from that URL mapping.

Please tell me the main differences between my local setup (the standard out-of-the-box Django runserver) and the PythonAnywhere environment.

Django's runserver runs Django's development server, which is designed for debugging. It runs on your local computer, is not accessible from the public Internet, and by default will run on port 8000.

On PythonAnywhere, you're deploying a site to the public Internet so that other people can see it, so you have a front-end webserver (nginx) handling incoming requests, and using uWSGI to run Django behind that -- a fairly standard production environment. It means that your site is running on the domain name that you specify on the "Web" page, on port 80 (for HTTP) and port 443 (for HTTPS).
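Following on from that, a sketch of one way to avoid hard-coding 127.0.0.1:8000 in the screenshot code: read the base URL from an environment variable (SCREENSHOT_BASE_URL is a hypothetical name), default it to the runserver address for local development, and set it on the server to your site's public address, such as https://www.yourdomain.com/.

```python
import os
from urllib.parse import urljoin

# Hypothetical setting: locally this falls back to Django's runserver address;
# on PythonAnywhere it would be set to the site's public address.
BASE_URL = os.environ.get("SCREENSHOT_BASE_URL", "http://127.0.0.1:8000/")


def sample_image_url(sample_id: int, base_url: str = BASE_URL) -> str:
    """Build the URL of the sampleimage view for one sample."""
    return urljoin(base_url, f"sampleimage/{sample_id}")


print(sample_image_url(262993))
```

Inside Django, the path part could instead come from django.urls.reverse('sampleimage', args=[sample_id]), so it stays in sync with urls.py.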

Thanks a lot, giles! Excuse my beginner questions ))) This is the first web application I've made; it's for teaching students math. During this conversation I also improved my English ))) Now I understand it better. I'll try to figure it out.

:) Happy coding!