If you are wondering where the data of this site comes from, please visit https://api.github.com/users/rushter/events. GitMemory does not store any data, but only uses NGINX to cache data for a period of time. The idea behind GitMemory is simply to give users a better reading experience.

rushter/MLAlgorithms 8642

Minimal and clean examples of machine learning algorithms implementations

rushter/data-science-blogs 5698

A curated list of data science blogs

rushter/heamy 527

A set of useful tools for competitive data science.

rushter/selectolax 433

Python binding to Modest and Lexbor engines (fast HTML5 parser with CSS selectors).

rushter/socks5 110

A toy socks 5 server written in Python

ProfoundNetworks/gzipi 24

Tools for indexing gzip files to support random-like access.

rushter/Facebook-Recruiting 22

Predict if an online bid is made by a machine or a human

rushter/blog_code 21

Code snippets from my blog

rushter/sqlinjast 12

Detecting SQL Injections using AST

delete branch ProfoundNetworks/selectolax

delete branch : dev2

delete time in an hour

push event ProfoundNetworks/selectolax

Artem Golubin

commit sha 7affc6f075dbc1e12f628c9c90e69fc574ad4b23

Improve text extraction

Artem Golubin

commit sha c8b96320dd111fb9df74cb5125883424e6f9fb1e

Merge branch 'master' into dev

BarryThrill

commit sha 5ce2a4f6b4954bbc4afa8a2e30a7db55305b54ff

Updated "Available backends" (#43). You have imported `from selectolax.lexbor import LexborHTMLParser`, so there is no need to call `selectolax.lexbor.LexborHTMLParser`; just use `LexborHTMLParser`.

Artem Golubin

commit sha f4090a6a695a953b737e486f8d0d41edea5ac9b2

Fix custom selector

Artem Golubin

commit sha a16fdfe18d744fe27aa1213f7e8b0c4be4ef1372

Update changes

Artem Golubin

commit sha 13603612fb1d7c8b43c2f134ef6dc53d926ff445

Bump version: 0.3.1 → 0.3.2

Artem Golubin

commit sha 6e8486fb4bedf6c8b2ac6239122c544ea1ae1a4c

Fix decompose method. Resolves #45

Artem Golubin

commit sha 5eb52c24cdaebd880478eed8dbf97595b3631bfd

Fix docstring

Artem Golubin

commit sha f498ef4ae36d75d403e68f861eb55ee21b262683

Update LICENSE

Artem Golubin

commit sha 84992b5caea88d2f64e777c33dcdefe8ed585426

Update lexbor

Artem Golubin

commit sha 393c9575f97c208c9d9b2ed9671b653d3e8dfb76

Use document root for HTML root property

Artem Golubin

commit sha 482fa5910a040a4400cf50cb356d75fcb4c4e29f

Bump version: 0.3.2 → 0.3.3

Artem Golubin

commit sha 243b205c7615f696e272ea0883c1e10899edf425

Fix HTML for root tag

Artem Golubin

commit sha 15c23046d838a28d158d95623f75c7c00072d0e9

Bump version: 0.3.3 → 0.3.4

Artem Golubin

commit sha 20ffc058ab81776015c057f1e87128ac2276ae0d

Update changelog

Artem Golubin

commit sha 09a228ea1850c2cd265e9546845de4b186281b11

Update lexbor

Artem Golubin

commit sha afba938d09a917fad9dad8fe18c28d93e7213cff

Merge branch 'master' into dev # Conflicts: # lexbor

Artem Golubin

commit sha 64b042fca9131c02dc255e9c44dbd1e4721e1d4d

Add lexbor

push time in an hour

push event ProfoundNetworks/selectolax

BarryThrill

commit sha 5ce2a4f6b4954bbc4afa8a2e30a7db55305b54ff

Updated "Available backends" (#43). You have imported `from selectolax.lexbor import LexborHTMLParser`, so there is no need to call `selectolax.lexbor.LexborHTMLParser`; just use `LexborHTMLParser`.

Artem Golubin

commit sha f4090a6a695a953b737e486f8d0d41edea5ac9b2

Fix custom selector

Artem Golubin

commit sha a16fdfe18d744fe27aa1213f7e8b0c4be4ef1372

Update changes

Artem Golubin

commit sha 13603612fb1d7c8b43c2f134ef6dc53d926ff445

Bump version: 0.3.1 → 0.3.2

Artem Golubin

commit sha 6e8486fb4bedf6c8b2ac6239122c544ea1ae1a4c

Fix decompose method. Resolves #45

Artem Golubin

commit sha 5eb52c24cdaebd880478eed8dbf97595b3631bfd

Fix docstring

Artem Golubin

commit sha f498ef4ae36d75d403e68f861eb55ee21b262683

Update LICENSE

Artem Golubin

commit sha 84992b5caea88d2f64e777c33dcdefe8ed585426

Update lexbor

Artem Golubin

commit sha 393c9575f97c208c9d9b2ed9671b653d3e8dfb76

Use document root for HTML root property

Artem Golubin

commit sha 482fa5910a040a4400cf50cb356d75fcb4c4e29f

Bump version: 0.3.2 → 0.3.3

Artem Golubin

commit sha 243b205c7615f696e272ea0883c1e10899edf425

Fix HTML for root tag

Artem Golubin

commit sha 15c23046d838a28d158d95623f75c7c00072d0e9

Bump version: 0.3.3 → 0.3.4

Artem Golubin

commit sha 20ffc058ab81776015c057f1e87128ac2276ae0d

Update changelog

Artem Golubin

commit sha 09a228ea1850c2cd265e9546845de4b186281b11

Update lexbor

push time in an hour

push event rushter/selectolax

Artem Golubin

commit sha 09a228ea1850c2cd265e9546845de4b186281b11

Update lexbor

push time in an hour

create branch ProfoundNetworks/selectolax

branch : dev2

created branch time in 3 days

issue comment rushter/selectolax

`<!DOCTYPE ...>` is not preserved during parsing

I made a simple fix, please test it.

vidhu

comment created time in 3 days

release rushter/selectolax

v0.3.4

released time in 3 days

push event rushter/selectolax

Artem Golubin

commit sha 243b205c7615f696e272ea0883c1e10899edf425

Fix HTML for root tag

Artem Golubin

commit sha 15c23046d838a28d158d95623f75c7c00072d0e9

Bump version: 0.3.3 → 0.3.4

Artem Golubin

commit sha 20ffc058ab81776015c057f1e87128ac2276ae0d

Update changelog

push time in 3 days

created tag rushter/selectolax

tag v0.3.4

Python binding to Modest and Lexbor engines (fast HTML5 parser with CSS selectors).

created time in 3 days

push event rushter/selectolax

Artem Golubin

commit sha f498ef4ae36d75d403e68f861eb55ee21b262683

Update LICENSE

Artem Golubin

commit sha 84992b5caea88d2f64e777c33dcdefe8ed585426

Update lexbor

Artem Golubin

commit sha 393c9575f97c208c9d9b2ed9671b653d3e8dfb76

Use document root for HTML root property

Artem Golubin

commit sha 482fa5910a040a4400cf50cb356d75fcb4c4e29f

Bump version: 0.3.2 → 0.3.3

push time in 3 days

created tag rushter/selectolax

tag v0.3.3

Python binding to Modest and Lexbor engines (fast HTML5 parser with CSS selectors).

created time in 3 days

issue comment rushter/selectolax

`<!DOCTYPE ...>` is not preserved during parsing

I will add doctype, but most likely, only next week.

vidhu

comment created time in 3 days

started ssh-mitm/ssh-mitm

started time in 5 days

issue comment rushter/selectolax

`<!DOCTYPE ...>` is not preserved during parsing

Since the parsing engines behind selectolax follow the standards, they omit the doctype. I am not sure whether the original doctype can be retrieved; I need to check. In HTML5, the doctype is the same for all websites.

You can test it in Chrome/Firefox devtools: `document.documentElement.outerHTML` returns no doctype.

vidhu

comment created time in 5 days
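
The behavior described in the comment above can be reproduced outside a browser. The sketch below is only an illustration: it uses `xml.dom.minidom` from Python's standard library as a stand-in, not selectolax, Modest, or Lexbor, and it assumes well-formed markup so that an XML parser accepts it. It shows that the doctype is a separate child of the document, so serializing only the root element leaves it out.

```python
from xml.dom import minidom

# Well-formed sample markup (an assumption; real-world HTML often is not).
SOURCE = "<!DOCTYPE html><html><body><p>hi</p></body></html>"

doc = minidom.parseString(SOURCE)

# The doctype is a child of the document, next to the root element,
# not a part of the root element itself.
doctype_name = doc.doctype.name

# Serializing only the root element is the analogue of
# document.documentElement.outerHTML in a browser.
root_markup = doc.documentElement.toxml()

print(doctype_name)
print(root_markup)
```

A serializer that wants the doctype back has to emit the document-level node separately and then the root element.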

issue comment lexbor/lexbor

Is it possible to provide a simple data extraction example?

There are examples in C https://github.com/lexbor/lexbor/blob/master/examples/lexbor/selectors/

diracsbracket

comment created time in 17 days

issue comment lexbor/lexbor

Reference counting of DOM nodes

I would avoid anything that requires tree traversal (up or down)

This is why I called it a smart ID: given just two IDs (any parent or root, and a child), it should be possible to check whether the child node is inside the parent without any traversal.

phoerious

comment created time in 20 days

issue comment lexbor/lexbor

Reference counting of DOM nodes

I think some kind of smart ID could fix this problem.

We need to have a numerical ID for each node, but such an ID must have one important property: it should be possible to check if a given ID is not inserted into any parent node with another ID.

So, if a user deletes a big subtree, we keep the root ID of the subtree. Next time, if a user tries to access any child node that has already been wrapped, we need to check that the ID of the child node does not depend on the removed parent.

I am not sure whether such IDs exist for tree data structures.

phoerious

comment created time in 20 days
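
One known labeling with the property asked for above is interval (enter/leave) numbering from a depth-first walk: a node lies inside a subtree exactly when its interval is nested in the subtree root's interval, so the check needs only the two labels. The sketch below is a generic illustration, not something lexbor or selectolax is known to implement; the `Node` class and the `label` and `is_inside` helpers are hypothetical names. Note the trade-off: the labels are assigned by one traversal up front and go stale when nodes are inserted, so they would need to be refreshed after such changes.

```python
class Node:
    """Hypothetical tree node carrying an (enter, leave) interval label."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self.enter = -1  # counter value when the walk enters the node
        self.leave = -1  # counter value when the walk leaves the node

    def append(self, child):
        self.children.append(child)
        return child


def label(root):
    """Assign (enter, leave) intervals with one iterative depth-first walk."""
    counter = 0
    stack = [(root, False)]
    while stack:
        node, finished = stack.pop()
        if finished:
            node.leave = counter
        else:
            node.enter = counter
            stack.append((node, True))
            # Reverse so that children are visited in document order.
            for child in reversed(node.children):
                stack.append((child, False))
        counter += 1


def is_inside(ancestor, node):
    """True if node is in the subtree rooted at ancestor (or is ancestor)."""
    return ancestor.enter <= node.enter and node.leave <= ancestor.leave


# Small sample tree: html -> body -> (div -> p), span
html = Node("html")
body = html.append(Node("body"))
div = body.append(Node("div"))
p = div.append(Node("p"))
span = body.append(Node("span"))
label(html)

# If `div` were removed, a wrapper around `p` could be rejected by
# comparing labels only, without walking the tree.
print(is_inside(div, p), is_inside(div, span))
```

Keeping only the interval of a removed subtree root is then enough to test whether an already wrapped node belonged to that subtree.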

issue comment lexbor/lexbor

Code duplication when parsing malformed HTML

@lexborisov Thanks!

BTW, is there a way to fix similar problems without patching the code? https://github.com/lexborisov/Modest/issues/86

I've fixed this by changing the parsing behavior of the iframe tag to that of div in the code. Can I somehow modify the handling of the iframe at runtime, e.g. by changing a value in the struct that stores metadata about the iframe tag?

Such problems are not related to your engine, but it would be cool to have a way to alter parsing behavior.

rushter

comment created time in 21 days

issue comment rushter/selectolax

Reverse logic in LexborNode.decompose()

Thank you for spotting the bug.

phoerious

comment created time in 22 days

issue closed rushter/selectolax

Reverse logic in LexborNode.decompose()

While browsing the Lexbor wrapper code, I found a logic bug in

https://github.com/rushter/selectolax/blob/13603612fb1d7c8b43c2f134ef6dc53d926ff445/selectolax/lexbor/node.pxi#L269-L272

The if branches should probably be swapped.

closed time in 22 days

phoerious

push event rushter/selectolax

Artem Golubin

commit sha 6e8486fb4bedf6c8b2ac6239122c544ea1ae1a4c

Fix decompose method. Resolves #45

Artem Golubin

commit sha 5eb52c24cdaebd880478eed8dbf97595b3631bfd

Fix docstring

push time in 22 days

started httpvoid/writeups

started time in 22 days

release rushter/selectolax

v0.3.2

released time in a month

push event rushter/selectolax

Artem Golubin

commit sha f4090a6a695a953b737e486f8d0d41edea5ac9b2

Fix custom selector

Artem Golubin

commit sha a16fdfe18d744fe27aa1213f7e8b0c4be4ef1372

Update changes

Artem Golubin

commit sha 13603612fb1d7c8b43c2f134ef6dc53d926ff445

Bump version: 0.3.1 → 0.3.2

push time in a month

created tag rushter/selectolax

tag v0.3.2

Python binding to Modest and Lexbor engines (fast HTML5 parser with CSS selectors).

created time in a month

issue closed rushter/selectolax

Fix text extraction for lexbor

In some cases, it segfaults

closed time in a month

rushter

issue comment rushter/selectolax

Fix text extraction for lexbor

@rushter

Is this a problem on my side?

I don't think so, but I need to check. I don't use the text extraction function from lexbor because it lacks a separator (whitespace or newline) parameter, which is very useful for real-world HTML, where whitespace can be added via CSS styles.

rushter

comment created time in a month
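
To illustrate why a separator parameter matters for text extraction, here is a minimal sketch built on Python's standard `html.parser` module. It is not selectolax's or lexbor's implementation; `TextCollector` and `extract_text` are hypothetical names, and stripping each text chunk is a simplification made for the example.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the non-empty text chunks found between tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def extract_text(html, separator=""):
    """Join all text chunks of `html` with `separator`."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return separator.join(collector.chunks)


# Adjacent inline elements: a browser may render a visual gap between
# them via CSS, but the markup itself contains no whitespace.
SAMPLE = "<div><span>Price:</span><span>10 USD</span></div>"

print(extract_text(SAMPLE))
print(extract_text(SAMPLE, separator=" "))
```

Without a separator the two spans run together as `Price:10 USD`; with `separator=" "` the result is `Price: 10 USD`.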

issue opened rushter/selectolax

Fix text extraction for lexbor

In some cases, it segfaults

created time in a month

pull request comment rushter/selectolax

Updated "Available backends"

Thanks!

BarryThrill

comment created time in a month

push event rushter/selectolax

BarryThrill

commit sha 5ce2a4f6b4954bbc4afa8a2e30a7db55305b54ff

Updated "Available backends" (#43). You have imported `from selectolax.lexbor import LexborHTMLParser`, so there is no need to call `selectolax.lexbor.LexborHTMLParser`; just use `LexborHTMLParser`.

push time in a month