
lxml.etree.XMLSyntaxError: b'failed to load external entity ""'

Hello, I'm getting two or three errors when my web app tries to request a page. The traceback I get is:

2020-06-30 22:10:23,230: Exception on /comparar [POST]
Traceback (most recent call last):
  File "/home/Franconline/.virtualenvs/flaskk/lib/python3.8/site-packages/pandas/io/html.py", line 718, in _build_doc
    r = parse(self.io, parser=parser)
  File "/home/Franconline/.virtualenvs/flaskk/lib/python3.8/site-packages/lxml/html/__init__.py", line 939, in parse
    return etree.parse(filename_or_url, parser, base_url=base_url, **kw)
  File "src/lxml/etree.pyx", line 3521, in lxml.etree.parse
  File "src/lxml/parser.pxi", line 1839, in lxml.etree._parseDocument
  File "src/lxml/parser.pxi", line 1865, in lxml.etree._parseDocumentFromURL
  File "src/lxml/parser.pxi", line 1769, in lxml.etree._parseDocFromFile
  File "src/lxml/parser.pxi", line 1163, in lxml.etree._BaseParser._parseDocFromFile
  File "src/lxml/parser.pxi", line 601, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 711, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 638, in lxml.etree._raiseParseError
OSError: Error reading file '': failed to load external entity ""
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/Franconline/.virtualenvs/flaskk/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/Franconline/.virtualenvs/flaskk/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/Franconline/.virtualenvs/flaskk/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/Franconline/.virtualenvs/flaskk/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/home/Franconline/.virtualenvs/flaskk/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/Franconline/.virtualenvs/flaskk/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/Franconline/version1/app_web.py", line 20, in comparar
    main.hacerTablas('primeraTabla.csv',plan1)
  File "/home/Franconline/version1/prueba.py", line 66, in hacerTablas
    hacerArchivoCsv(url,tabla)
  File "/home/Franconline/version1/prueba.py", line 18, in hacerArchivoCsv
    tabla1 = read_html(urlDeTabla,attrs={"class":"table-bordered"})
  File "/home/Franconline/.virtualenvs/flaskk/lib/python3.8/site-packages/pandas/io/html.py", line 1085, in read_html
    return _parse(
  File "/home/Franconline/.virtualenvs/flaskk/lib/python3.8/site-packages/pandas/io/html.py", line 895, in _parse
    tables = p.parse_tables()
  File "/home/Franconline/.virtualenvs/flaskk/lib/python3.8/site-packages/pandas/io/html.py", line 213, in parse_tables
    tables = self._parse_tables(self._build_doc(), self.match, self.attrs)
  File "/home/Franconline/.virtualenvs/flaskk/lib/python3.8/site-packages/pandas/io/html.py", line 726, in _build_doc
    r = fromstring(self.io, parser=parser)
  File "/home/Franconline/.virtualenvs/flaskk/lib/python3.8/site-packages/lxml/html/__init__.py", line 875, in fromstring
    doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)
  File "/home/Franconline/.virtualenvs/flaskk/lib/python3.8/site-packages/lxml/html/__init__.py", line 761, in document_fromstring
    value = etree.fromstring(html, parser, **kw)
  File "src/lxml/etree.pyx", line 3237, in lxml.etree.fromstring
  File "src/lxml/parser.pxi", line 1876, in lxml.etree._parseMemoryDocument
  File "src/lxml/parser.pxi", line 1757, in lxml.etree._parseDoc
  File "src/lxml/parser.pxi", line 1068, in lxml.etree._BaseParser._parseUnicodeDoc
  File "src/lxml/parser.pxi", line 601, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 711, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 649, in lxml.etree._raiseParseError
  File "<string>", line 0
lxml.etree.XMLSyntaxError: b'failed to load external entity ""'

I read in other posts that you have a kind of whitelist for sites, and I don't know whether this has something to do with it. Anyway, this is the problem I'm having; maybe someone can help me with it. Thanks.

If the external entity you're trying to load is on a site that isn't on the whitelist, that would cause this error.
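As a side note, the traceback reports `Error reading file ''`, so the value that reached `read_html` appears to have been an empty string, which would be consistent with a blocked request returning nothing. Below is a minimal sketch of a guard that fails with a clearer message in that case. The helper name `checked_html` is hypothetical, and it assumes your code fetches the page text itself before handing it to `read_html`; I can't see `prueba.py`, so treat it as an illustration rather than a fix.

```python
from io import StringIO


def checked_html(html_text, source):
    """Wrap fetched HTML for a parser, or raise if nothing was fetched.

    html_text: the page body as a str (or None if the fetch failed).
    source:    the URL or label it came from, used only in the message.
    """
    if html_text is None or not html_text.strip():
        # An empty body is what ends up as "failed to load external entity"
        # further down in lxml, so stop here with a readable error instead.
        raise ValueError(
            f"no HTML received from {source!r}; "
            "the request may have been blocked or may have failed"
        )
    # A file-like object tells the parser this is markup, not a path or URL.
    return StringIO(html_text)
```

With something like this in place, the call from the traceback could become `read_html(checked_html(page_text, urlDeTabla), attrs={"class": "table-bordered"})`, where `page_text` stands for whatever string your code downloaded.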

Yeah, it's not. The page is https://www.info.unlp.edu.ar/. Could it be added to the whitelist?

We can whitelist sites if they're part of an official, documented API, which I guess can reasonably include entities that XML documents pull in from that site. Is there any documentation on that site that describes the entities they provide?