Commit graph

9 commits

Author  SHA1        Date                        Message
Marcel  8d87bb4d91  2023-03-18 18:38:48 +01:00  [parsing] unify tag nesting
Marcel  65f91148fc  2023-03-18 18:38:48 +01:00  [parsing] search for case-insensitive tag names
Marcel  6169b3eca8  2023-03-18 18:38:47 +01:00  [parsing] replace HTMLCommentRanges with HTMLIgnoreRanges
                                                * ignore matches within CDATA elements and comments
Marcel  29278a3323  2023-03-18 18:38:46 +01:00  [parsing] fix return value
Marcel  7a67a2028f  2023-03-18 18:38:46 +01:00  [parsing] tweak tag regex
Marcel  dbf350c122  2023-03-18 18:38:45 +01:00  [parsing] return unclosed matched tags
Marcel  8451074b50  2023-03-18 18:38:45 +01:00  [parsing] fix: don't push unmatched void tags onto queue
Marcel  176a156c65  2023-03-18 18:38:44 +01:00  [parsing] rework interface, implemented all get_element(s) functions + extract_attributes() as MatchingElementParser class methods and improve performance
Marcel  5e3894df3f  2023-03-18 18:38:43 +01:00  [parsing] add new module containing various HTML parser classes as replacement for utils.get_html_... functions
                                                * performance is mostly better for large HTML data and on PyPy