etree treewalker infinite loop #217

Closed
@gsnedders

Description

This goes into an infinite loop:

import html5lib

frag = html5lib.parseFragment("<b><em><foo><foob><fooc><aside></b></em>")
walker = html5lib.getTreeWalker("etree")
print(list(walker(frag)))

Activity

gsnedders self-assigned this on Dec 3, 2015

gsnedders commented on Dec 3, 2015

@gsnedders (Member, Author)

This is weird.

(Pdb) print list(frag)
[<Element u'{http://www.w3.org/1999/xhtml}b' at 0x7fdd8eb42990>, <Element u'{http://www.w3.org/1999/xhtml}aside' at 0x7fdd8eb42e10>, <Element u'{http://www.w3.org/1999/xhtml}aside' at 0x7fdd8eb42e10>]

Note it's a fragment containing the same element twice.
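
For context, xml.etree.ElementTree elements carry no parent pointer, so nothing stops the same element object from being appended to one parent more than once. A minimal standard-library sketch of that situation (the names root and aside are illustrative and not taken from html5lib; the fragment here is built by hand rather than by the parser):

```python
import xml.etree.ElementTree as ET

# Hand-built stand-in for the parsed fragment: one <b>, then the
# *same* <aside> object appended twice.
root = ET.Element("fragment")
ET.SubElement(root, "b")
aside = ET.Element("aside")
root.append(aside)
root.append(aside)  # ElementTree accepts the duplicate without complaint

children = list(root)
same_object = children[1] is children[2]
print(len(children), same_object)  # prints: 3 True
```

The structure is still a tree in the sense that there are no cycles, but one child slot is no longer uniquely identified by the element stored in it.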

gsnedders commented on Dec 3, 2015

It's still strictly a tree, though, so I don't think the tree walker should break so badly?

added a commit that references this issue on Dec 3, 2015

gsnedders commented on Dec 3, 2015

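The cause is the list(parents[-1]).index(parent) lookup in the etree walker. index() returns the position of the first match, so when the same element appears twice among its siblings the walker always resumes from the first copy and never gets past the second. The above commit adds an assertion that the count is 1, which at least prevents the infinite loop.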
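
The effect of locating a node with index() can be reproduced without html5lib. The sketch below is not the walker's actual code: next_index_by_lookup and checked_index are hypothetical helpers, and the loop is capped at five steps so it terminates. It only illustrates why a duplicated sibling stalls an index()-based traversal and how a count check turns that into an immediate failure.

```python
import xml.etree.ElementTree as ET

root = ET.Element("fragment")
ET.SubElement(root, "b")
aside = ET.Element("aside")
root.append(aside)
root.append(aside)  # same object twice, as in the reported fragment
siblings = list(root)

def next_index_by_lookup(siblings, node):
    """Hypothetical helper: find node with index(), then step one to the right."""
    return siblings.index(node) + 1  # index() always finds the FIRST copy

visited = []
i = 1  # start at the first <aside>
for _ in range(5):  # bounded so the demonstration stops
    if i >= len(siblings):
        break
    visited.append(i)
    i = next_index_by_lookup(siblings, siblings[i])

print(visited)  # the position gets stuck at 2 and never reaches the end

def checked_index(siblings, node):
    """Hypothetical guard in the spirit of the assertion mentioned above."""
    assert siblings.count(node) == 1, "element appears more than once"
    return siblings.index(node)
```

With the guard in place, calling checked_index(siblings, aside) raises AssertionError instead of looping, which matches the "at least prevents the infinite loop" outcome described in the comment.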

added a commit that references this issue on Dec 3, 2015
modified the milestone: 0.99999999 on Dec 3, 2015
added a commit that references this issue on Dec 3, 2015
added a commit that references this issue on May 28, 2016

Fix html5lib#217: Fully remove element in removeChild in etree treebu…

d010144
added a commit that references this issue on May 28, 2016

fixup! Fix html5lib#217: Fully remove element in removeChild in etree…

cbc1b34

etree treewalker infinite loop · Issue #217 · html5lib/html5lib-python