Jeff Widman jeffwidman Bellingham, WA http://jeffwidman.com/ I enjoy data pipeline / distributed systems engineering. Projects I help maintain: kafka-python, kazoo, FactoryBoy, Prezto, Flask (alum), etc.

dpkp/kafka-python 4010

Python client for Apache Kafka

FactoryBoy/factory_boy 2268

A test fixtures replacement for Python

flask-debugtoolbar/flask-debugtoolbar 763

A toolbar overlay for debugging Flask applications

jeffwidman/bitbucket-issue-migration 293

A small script for migrating repo issues from Bitbucket to GitHub

belak/prezto-contrib 83

A set of additional plugins designed to work easily with prezto

jeffwidman/sqlalchemy-postgresql-materialized-views 73

A SQLAlchemy recipe for managing PostgreSQL Materialized Views

jeffwidman/dotfiles 32

Works on *nix, optimized for macOS. Managed using Stow

jeffwidman/ansible-centminmod 12

Ansible role for installing/configuring centminmod on CentOS 7

jeffwidman/ansible-yum-cron 5

Ansible role for installing/configuring yum-cron on CentOS/RHEL

jeffwidman/ansible-centos-bootstrap 3

Ansible role for handling miscellaneous CentOS tasks when bootstrapping a new server

delete branch jeffwidman/crc32c

delete branch : patch-1

delete time in 6 hours

issue closed dpkp/kafka-python

Get event time stamp in consumer

I am using kafka-python==2.0.1

producer.py

    def queue(self, key, value):
        self.producer.send(
            MESSAGES_TOPIC,
            timestamp_ms=int(time.mktime(datetime.utcnow().timetuple())),
            key=key,
            value=value,
        )

In timestamp_ms, I am adding the time at which the message gets queued. In consumer, I am using:

json.loads(event.body.decode('utf-8'))

How can I get the timestamp_ms value in the consumer?

closed time in 15 hours

jyotisachdeva57

issue comment dpkp/kafka-python

Get event time stamp in consumer

I don't recall for sure, but I have a hunch it's part of the consumer message/record metadata... meaning it's not going to be part of the event.body at all. Inspect the raw result of what poll() gives you back and you can probably see it, as long as your API protocol is set high enough and your broker supports it. Otherwise if the api protocol is too low, then the broker will downgrade the record metadata and omit it.
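To make the record-metadata point concrete, here is a minimal sketch using a stand-in namedtuple that mirrors a subset of kafka-python's ConsumerRecord fields (the values and helper name are illustrative, not from the issue):

```python
import time
from collections import namedtuple

# Stand-in mirroring (a subset of) kafka-python's ConsumerRecord fields;
# in real code these records come back from consumer.poll().
ConsumerRecord = namedtuple(
    'ConsumerRecord',
    ['topic', 'partition', 'offset', 'timestamp', 'timestamp_type', 'key', 'value'],
)

def extract_timestamps(poll_result):
    """poll() returns {TopicPartition: [ConsumerRecord, ...]};
    the timestamp lives on the record, not inside record.value."""
    return [rec.timestamp
            for records in poll_result.values()
            for rec in records]

# Side note: timestamp_ms expects epoch *milliseconds*, while the producer
# snippet above passes seconds (and time.mktime treats the struct as local
# time). A safer way to populate it:
timestamp_ms = int(time.time() * 1000)
```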

jyotisachdeva57

comment created time in 15 hours

issue comment dpkp/kafka-python

Topic title validator / error checker?

Does the Java client do any validation like this? It certainly sounds like it'd be useful...

amkearns-usgs

comment created time in 15 hours

issue closed dpkp/kafka-python

use https

We are using this library to consume from Azure Event Hub, which uses ports 9000+ for communication. Due to a firewall, is there any way to use https/REST calls through the http port for communication?

closed time in 15 hours

cometta

issue comment dpkp/kafka-python

use https

No, this uses the Kafka protocol, which is an entirely different thing... you would want to use a different library, perhaps the requests lib, for http consumption (if Event Hub even supports that).

cometta

comment created time in 15 hours

issue comment dpkp/kafka-python

Does kafka-python support incremental cooperative rebalancing?

Nobody has taken the time to put together a PR adding support for it... we'd certainly be open to it!

jen6

comment created time in 15 hours

issue comment dpkp/kafka-python

KafkaConsumer creates topics

does your broker's server.properties allow auto-creating topics?

solebox

comment created time in 15 hours

issue closed dpkp/kafka-python

stray print not deleted

KafkaAdminClient(bootstrap_servers=['']).list_consumer_groups()

In the list_consumer_groups method, around row 1038, there is a leftover print.

closed time in 15 hours

whuizhe

issue comment dpkp/kafka-python

stray print not deleted

Not sure what you're referring to? Can't find any print calls in https://github.com/dpkp/kafka-python/blob/master/kafka/admin/client.py

whuizhe

comment created time in 15 hours

issue closed dpkp/kafka-python

Error with 'api_version' parameter in KafkaProducer

My KafkaProducer construct

producer = KafkaProducer(
    bootstrap_servers=['10.1.25.111:9092'],
    compression_type='gzip',
    acks='all',
    batch_size=50000,
    max_request_size=1048576,
    api_version=2
    )

Error:

site-packages\kafka\producer\kafka.py in _max_usable_produce_magic(self)
    508 
    509     def _max_usable_produce_magic(self):
--> 510         if self.config['api_version'] >= (0, 11):
    511             return 2
    512         elif self.config['api_version'] >= (0, 10):

TypeError: '>=' not supported between instances of 'int' and 'tuple'
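The TypeError above is Python refusing to order an int against a tuple: kafka-python compares the configured `api_version` against version tuples like `(0, 11)`, so the parameter must itself be a tuple. A minimal illustration:

```python
# api_version must be a version tuple such as (0, 10, 2) or (2, 0, 0),
# because kafka-python compares it lexicographically against tuples.
api_version = (2, 0, 0)           # instead of api_version=2
assert api_version >= (0, 11)     # element-wise tuple comparison works

try:
    2 >= (0, 11)                  # what api_version=2 triggers internally
except TypeError:
    pass                          # '>=' not supported between int and tuple
```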

closed time in 15 hours

bukreevai

issue comment dpkp/kafka-python

Error with 'api_version' parameter in KafkaProducer

Thanks @kvfi and @DavidLin3!

bukreevai

comment created time in 15 hours

issue closed dpkp/kafka-python

time.sleep blocks heartbeat

I am using version 1.4.7 and Python 3.7. In my code that runs while consuming a message, I used "time.sleep(10)" in a certain flow. Because of that sleep the heartbeat stopped, and an infinite loop was created of rebalancing, leaving and re-joining the group, and re-consuming the same Kafka message. Shouldn't the heartbeat be in a different thread and keep going?

Log lines:

"Heartbeat failed for group <my_group_name> because it is rebalancing" Every 3 seconds Along with "Heartbeat poll expired, leaving group"

closed time in 15 hours

assaf-shechter

issue comment dpkp/kafka-python

time.sleep blocks heartbeat

Even with the heartbeat thread, you still have to call poll() frequently enough to not drop out of the consumer group. There are two timeouts... essentially one is "alive" and the other is "making progress"...

Closing as this is most likely not a bug but a misunderstanding of how Kafka works.
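The two timeouts map onto these consumer settings (a sketch; the broker address and numeric values are placeholders, but the parameter names are the kafka-python ones):

```python
# "alive": the background heartbeat thread must check in within
# session_timeout_ms. "making progress": successive poll() calls must be
# no more than max_poll_interval_ms apart.
consumer_config = dict(
    bootstrap_servers='localhost:9092',  # placeholder
    session_timeout_ms=10_000,
    heartbeat_interval_ms=3_000,         # how often the heartbeat thread pings
    max_poll_interval_ms=300_000,
)
# A time.sleep(10) inside the message loop counts against
# max_poll_interval_ms, not the heartbeat, so slow per-message work needs
# a larger max_poll_interval_ms (or offloading to another thread).
```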

assaf-shechter

comment created time in 15 hours

issue comment dpkp/kafka-python

KafkaConsumer.pause() always raises `partitions must be TopicPartition namedtuples`

Looks like a valid bug, PR welcome!

francoisfernando

comment created time in 15 hours

issue comment dpkp/kafka-python

Exception in thread kafka-python-producer-XXX (most likely raised during interpreter shutdown)

What version of kafka-python / python 2?

Can you run this under python 3 and see if you still hit it? Others may feel differently, but I'm not interested in spending time supporting python 2 given that it's EOL'd and the threading implementation has changed a bit under python 3.

huqd

comment created time in 15 hours

issue closed dpkp/kafka-python

Cannot use in jupyter

closed time in 15 hours

DachuanZhao

issue comment dpkp/kafka-python

Cannot use in jupyter

Sure you can, I've used it in Jupyter myself many times. You just have to be aware that if you're using high-level consumer groups, they have timeouts, and you can hit those if you're doing ad-hoc experimenting.

DachuanZhao

comment created time in 15 hours

issue comment dpkp/kafka-python

TypeError: 'NoneType' object is not iterable

Can you include example code and the full traceback? I realize the code works 99% of the time, and only intermittently throws the error and only with a new auto-created topic, but it'd still be helpful for stepping through what's happening.

avloss

comment created time in 15 hours

issue comment dpkp/kafka-python

Support for sasl_mechanism="DMS"

There's not really a roadmap right now; we maintainers are basically in bugfix mode / approving feature patches that others submit. So you are welcome to submit a PR and we'll happily take a look at it.

mamilov

comment created time in 15 hours

issue comment dpkp/kafka-python

BrokerConnection | Error receiving network data closing socket

This may be related to #1985... not sure, I haven't looked closely, but at first glance they sound similar.

takwas

comment created time in 15 hours

issue comment dpkp/kafka-python

kafka.errors.UnrecognizedBrokerVersion: UnrecognizedBrokerVersion on kafka==2.5.0

Looks like a new broker version landed... PR welcome if someone wants to add it to the map of "broker version --> supported calls"

loveJasmine

comment created time in 15 hours

issue comment dpkp/kafka-python

AWS lambda cannot send message to Kafka running on EC2 instance

There's nothing we can really go on here, you'd need to provide a lot more info. It's most likely a mismatch between your config and your environment.

hiennmhd

comment created time in 15 hours

issue comment dpkp/kafka-python

Kafka + OAuth + Azure EventHub - unexpected issues

No idea, unfortunately this is something you'll need to debug further... Can you run it under pdb and step through it?

One thing to check is how eventhub responds when we probe for the broker api version... you can bypass the probing by manually pinning the version.

MichalKosowski

comment created time in 15 hours

issue closed dpkp/kafka-python

Support for .poll()

I have this code:

consumer = KafkaConsumer(bootstrap_servers=servers)
topic_partition  = TopicPartition(topic, 0)
consumer.assign([topic_partition])
print (consumer.position(topic_partition))
print (consumer.end_offsets([topic_partition]))
#for msg in consumer:
#    print (msg)
messages = consumer.poll(5000)
print (messages)

The prints show the right data about the topic, but consumer.poll(5000) does not return messages.

closed time in 15 hours

bukreevai

issue comment dpkp/kafka-python

Support for .poll()

poll() is async... you may need to call it several times before it returns data, depending on how fast your broker responds.
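A small retry wrapper makes that concrete (the helper name is hypothetical; `consumer` is any assigned/subscribed KafkaConsumer):

```python
def poll_until_messages(consumer, timeout_ms=1000, max_tries=5):
    """Call poll() repeatedly; early calls often return {} while the
    client is still fetching metadata and establishing fetch sessions."""
    for _ in range(max_tries):
        records = consumer.poll(timeout_ms=timeout_ms)
        if records:
            return records  # {TopicPartition: [ConsumerRecord, ...]}
    return {}
```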

bukreevai

comment created time in 15 hours

delete branch dpkp/kafka-python

delete branch : bump-dev-requirements.txt

delete time in 15 hours

push event dpkp/kafka-python

Jeff Widman

commit sha cb96a1a6c79c17ac9b3399b7a33bbaea7ad8886f

Bump dev requirements (#2129) Also re-order lexicographically. Note that I did not exhaustively test this... there could be edge cases depending on the python version. But I think we should be okay because `tox.ini` is currently testing with unpinned versions, so I think we're already running these versions in our test suite.

view details

push time in 15 hours

PR merged dpkp/kafka-python

Bump dev requirements

Also re-order lexicographically.

Note that I did not exhaustively test this... there could be edge cases depending on the python version. But I think we should be okay because tox.ini/.travis.yml is currently testing with unpinned versions, so I think we're already running these versions in our test suite.


+15 -15

0 comment

1 changed file

jeffwidman

pr closed time in 15 hours

pull request comment dpkp/kafka-python

Check client version if not given

Possibly related to #2107 / #2108... as I noted in https://github.com/dpkp/kafka-python/pull/2108#pullrequestreview-490178998, we should probably step back and re-examine the way this gets handled so that it's consistent everywhere.

jacopofar

comment created time in 16 hours

push event dpkp/kafka-python

Jeff Widman

commit sha 098ecbfd79ce8919c1d3dec50a207bbbe62c894c

Merge _find_coordinator_id methods (#2127) Previously there were two methods: * `_find_coordinator_id()` * `_find_many_coordinator_ids()` But they do basically the same thing internally. And we need the plural two places, but the singular only one place. So merge them, and change the function signature to take a list of `group_ids` and return a dict of `group_id: coordinator_id`s. As a result of this, the `describe_groups()` command should scale better because the `_find_coordinator_ids()` command issues all the requests async, instead of sequentially blocking as the `described_groups()` used to do.

view details

Jeff Widman

commit sha 16f48671e6c821c1174acc8fe27eee58a2316156

Fix crc32c deprecation warning (#2128) Fix a deprecation warning in the newest version.

view details

Jeff Widman

commit sha c7a6c76458364af3e0a65ac25bf83bcfb5fbed41

Bump dev requirements Also re-order lexicographically. Note that I did not exhaustively test this... there could be edge cases depending on the python version. But I think we should be okay because `tox.ini` is currently testing with unpinned versions, so I think we're already running these versions in our test suite.

view details

push time in 16 hours

delete branch dpkp/kafka-python

delete branch : bump-crc32c-version

delete time in 16 hours

push event dpkp/kafka-python

Jeff Widman

commit sha 16f48671e6c821c1174acc8fe27eee58a2316156

Fix crc32c deprecation warning (#2128) Fix a deprecation warning in the newest version.

view details

push time in 16 hours

PR merged dpkp/kafka-python

Fix crc32c deprecation warning

Fix a deprecation warning in the newest version.


+2 -2

0 comment

2 changed files

jeffwidman

pr closed time in 16 hours

PR opened dpkp/kafka-python

Bump dev requirements

Also re-order lexicographically.

Note that I did not exhaustively test this... there could be edge cases depending on the python version. But I think we should be okay because tox.ini is currently testing with unpinned versions, so I think we're already running these versions in our test suite.

I didn't bump crc32c as that's handled in https://github.com/dpkp/kafka-python/pull/2128.

+15 -15

0 comment

1 changed file

pr created time in 16 hours

push event dpkp/kafka-python

Jeff Widman

commit sha caab3c536303e5caa70eb562fb5ec4a4a301461d

Bump dev requirements Also re-order lexicographically. Note that I did not exhaustively test this... there could be edge cases depending on the python version. But I think we should be okay because `tox.ini` is currently testing with unpinned versions, so I think we're already running these versions in our test suite.

view details

push time in 16 hours

create branch dpkp/kafka-python

branch : bump-dev-requirements.txt

created branch time in 16 hours

delete branch dpkp/kafka-python

delete branch : merge-_find_coordinator_ids-methods

delete time in 16 hours

push event dpkp/kafka-python

Jeff Widman

commit sha 098ecbfd79ce8919c1d3dec50a207bbbe62c894c

Merge _find_coordinator_id methods (#2127) Previously there were two methods: * `_find_coordinator_id()` * `_find_many_coordinator_ids()` But they do basically the same thing internally. And we need the plural two places, but the singular only one place. So merge them, and change the function signature to take a list of `group_ids` and return a dict of `group_id: coordinator_id`s. As a result of this, the `describe_groups()` command should scale better because the `_find_coordinator_ids()` command issues all the requests async, instead of sequentially blocking as the `described_groups()` used to do.

view details

push time in 16 hours

PR merged dpkp/kafka-python

Merge _find_coordinator_id methods

Previously there were two methods:

  • _find_coordinator_id()
  • _find_many_coordinator_ids()

But they do basically the same thing internally. And we need the plural two places, but the singular only one place.

So merge them, and change the function signature to take a list of group_ids and return a dict of group_id: coordinator_id's.

As a result of this, the describe_groups() command should scale better because the _find_coordinator_ids() command issues all the requests async, instead of sequentially blocking as the described_groups() used to do.

IIRC, this deviates slightly from the Java client, as they only take a singular group ID... mostly because it gets tricky to handle retriable vs non-retriable errors when you send a bunch at once. However, we don't handle errors quite the same way--we just raise if the futures don't complete, and also raise if we encounter any problems in the returned FindCoordinatorResponse. Given that, IMO this is more useful/scalable...

Fix #2124


+27 -42

1 comment

1 changed file

jeffwidman

pr closed time in 16 hours

issue closed dpkp/kafka-python

Switch describe_consumer_groups() to use _find_many_coordinator_ids()

Once #2040 is merged, it'd be more performant/scalable if we updated describe_consumer_groups() to use the new _find_many_coordinator_ids() to issue the find coordinator requests async.

closed time in 16 hours

jeffwidman

PR opened dpkp/kafka-python

Fix crc32c deprecation warning

Fix a deprecation warning in the newest version.

+2 -2

0 comment

2 changed files

pr created time in 17 hours

create branch dpkp/kafka-python

branch : bump-crc32c-version

created branch time in 17 hours

PR opened ICRAR/crc32c

Fix typos
+4 -4

0 comment

1 changed file

pr created time in 17 hours

push event jeffwidman/crc32c

Jeff Widman

commit sha 4a0fca4959247db2498903a92e6ca0fd878084d0

Fix typos

view details

push time in 17 hours

fork jeffwidman/crc32c

A python package implementing the crc32c algorithm in hardware and software

fork in 17 hours

push event dpkp/kafka-python

Jeff Widman

commit sha 80664a55bfafb243e89640986b8d53748b996df6

Merge _find_coordinator_id methods Previously there were two methods: * `_find_coordinator_id()` * `_find_many_coordinator_ids()` But they do basically the same thing internally. And we need the plural two places, but the singular only one place. So merge them, and change the function signature to take a list of `group_ids` and return a dict of `group_id: coordinator_id`s. As a result of this, the `describe_groups()` command should scale better because the `_find_coordinator_ids()` command issues all the requests async, instead of sequentially blocking as the `described_groups()` used to do.

view details

push time in 17 hours

push event dpkp/kafka-python

Jeff Widman

commit sha 6cfe706d1ab4eaa7c970f19ce102f65625affb96

Lint cleanup (#2126) Small cleanup leftover from https://github.com/dpkp/kafka-python/pull/2035

view details

Jeff Widman

commit sha 57ccb4d28acf02fabd6b2baf979d8c63652174a5

Merge _find_coordinator_id methods Previously there were two methods: * `_find_coordinator_id()` * `_find_many_coordinator_ids()` But they do basically the same thing internally. And we need the plural two places, but the singular only one place. So merge them, and change the function signature to take a list of `group_ids` and return a dict of `group_id: coordinator_id`s. As a result of this, the `describe_groups()` command should scale better because the `_find_coordinator_ids()` command issues all the requests async, instead of sequentially blocking as the `described_groups()` used to do.

view details

push time in 18 hours

pull request comment dpkp/kafka-python

Merge _find_coordinator_id methods

IIRC, this deviates slightly from the Java client, as they only take a singular group ID... mostly because it gets tricky to handle retriable vs non-retriable errors when you send a bunch at once. However, we don't handle errors quite the same way--we just raise if the futures don't complete, and also raise if we encounter any problems in the returned FindCoordinatorResponse. Given that, IMO this is more useful/scalable...

jeffwidman

comment created time in 18 hours

delete branch dpkp/kafka-python

delete branch : minor-cleanup

delete time in 18 hours

push event dpkp/kafka-python

Jeff Widman

commit sha 6cfe706d1ab4eaa7c970f19ce102f65625affb96

Lint cleanup (#2126) Small cleanup leftover from https://github.com/dpkp/kafka-python/pull/2035

view details

push time in 18 hours

PR merged dpkp/kafka-python

Lint cleanup

Small lint cleanup leftover from https://github.com/dpkp/kafka-python/pull/2035


+3 -4

0 comment

1 changed file

jeffwidman

pr closed time in 18 hours

PR opened dpkp/kafka-python

Merge _find_coordinator_id methods

Previously there were two methods:

  • _find_coordinator_id()
  • _find_many_coordinator_ids()

But they do basically the same thing internally. And we need the plural two places, but the singular only one place.

So merge them, and change the function signature to take a list of group_ids and return a dict of group_id: coordinator_ids.

As a result of this, the describe_groups() command should scale better because the _find_coordinator_ids() command issues all the requests async, instead of sequentially blocking as the described_groups() used to do.

+27 -42

0 comment

1 changed file

pr created time in 18 hours

PR opened dpkp/kafka-python

Lint cleanup

Small lint cleanup leftover from https://github.com/dpkp/kafka-python/pull/2035

+3 -4

0 comment

1 changed file

pr created time in 18 hours

create branch dpkp/kafka-python

branch : merge-_find_coordinator_ids-methods

created branch time in 18 hours

create branch dpkp/kafka-python

branch : minor-cleanup

created branch time in 18 hours

push event dpkp/kafka-python

Swen Wenzel

commit sha 16a0b3155fdeebe80295fcfb0f32d75af74dcb1a

Feature: delete consumergroups (#2040) * Add consumergroup related errors * Add DeleteGroups to protocol.admin * Implement delete_groups feature on KafkaAdminClient

view details

push time in 19 hours

PR merged dpkp/kafka-python

Feature delete consumergroups

I didn't find any issue requesting this, but I could use it so I just went ahead and implemented it. It's kind of a work in progress, just wanted to get it out for early feedback. Tests are running fine and it looks good but what's still missing is testing if group coordinator discovery works properly when there are different brokers. Also I'm not quite sure how to handle errors. Right now they are ignored and it's left for the caller to inspect the result but we could also just raise any errors. WDYT?


+219 -5

6 comments

4 changed files

swenzel

pr closed time in 19 hours

pull request comment dpkp/kafka-python

Feature delete consumergroups

Thanks @swenzel!

swenzel

comment created time in 19 hours

Pull request review comment dpkp/kafka-python

Feature delete consumergroups

[review diff: tests `test_delete_consumergroups` and `test_delete_consumergroups_with_errors` added to the KafkaAdminClient test suite]

Good call... PR welcome: https://github.com/dpkp/kafka-python/issues/2125

swenzel

comment created time in 19 hours


issue opened dpkp/kafka-python

Add a .editorconfig file

This is an easy one if someone's looking for a way to get involved with open source: https://github.com/dpkp/kafka-python/pull/2040#discussion_r490231719

Add an .editorconfig file to specify newlines at end of file, spaces rather than tabs, etc.
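A minimal sketch of what such a file might contain (the specific settings here are illustrative, not a decided-on configuration for the repo):

```ini
# .editorconfig
root = true

[*]
end_of_line = lf
insert_final_newline = true
indent_style = space
indent_size = 4
trim_trailing_whitespace = true
```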

created time in 19 hours

Pull request review comment dpkp/kafka-python

Add delete records method support for kafka admin api

[review diff: new `delete_records()` method and its docstring added to KafkaAdminClient]

Please include a note that errors must be checked as they are not raised (at least not currently).

10101010

comment created time in 19 hours


Pull request review comment dpkp/kafka-python

Add delete records method support for kafka admin api

[review diff: trailing newline removed after the `CreatePartitionsResponse` list in `protocol/admin.py`]

Why delete the newline at the end?

10101010

comment created time in 19 hours


Pull request review comment dpkp/kafka-python

Add delete records method support for kafka admin api

[review diff: `delete_records()` implementation in KafkaAdminClient, still followed by the old "delete records protocol not yet implemented" note]

Also, this note should be deleted given that this PR implements it. 😄

10101010

comment created time in 20 hours


issue opened dpkp/kafka-python

Switch describe_consumer_groups() to use _find_many_coordinator_ids()

Once #2040 is merged, it'd be more performant/scalable if we updated describe_consumer_groups() to use the new _find_many_coordinator_ids() to issue the find coordinator requests async.

created time in a day

pull request comment dpkp/kafka-python

Feature delete consumergroups

@swenzel can you rebase to fix conflicts?

Take a look at my feedback above... the comment and newline are nits, but if you're already rebasing it'd be nice to fix these.

swenzel

comment created time in a day

pull request comment dpkp/kafka-python

Feature delete consumergroups

Any opinion on how to handle failed deletes? Raise exception vs. leave it for the caller to check?

That's really tough... in list_consumer_groups() and describe_consumer_groups() we immediately raise if any group throws an error... but those are non-destructive operations, so re-running them is fine.

Here I'm hesitant to raise since everything is sent async so some groups may complete and others may fail... it is idempotent to simply re-run, but if some will always fail with errors like "group unknown/ doesn't exist" etc, then you want to know what happened to the others in the list... did they delete successfully? So I'd probably lean (slightly) toward just returning the error codes and letting the caller inspect to see which completed and which failed.

swenzel

comment created time in a day

pull request comment dpkp/kafka-python

Feature delete consumergroups

Considering how much work is required on the test suite to make the tests use multiple brokers I'd just leave it as it is now unless you really want me to extend the tests.

I am fine with leaving as-is... agreed that doing multiple brokers is a lot of extra work. It's probably better done as a pytest fixture that spins up multiple brokers and multiple consumer groups that are guaranteed to have their coordinators spread across the brokers, as we could leverage that for several unit tests...

swenzel

comment created time in a day

Pull request review comment dpkp/kafka-python

Feature delete consumergroups

[review diff: `delete_consumer_groups()` method, its docstring, and the `_convert_delete_groups_response()` helper in KafkaAdminClient]

This is outside the scope of this PR, but this client code is a little inconsistent currently between _convert_X(response) and _X_process_response(response)... it'd be nice to make these more consistent.

swenzel

comment created time in a day

Pull request review commentdpkp/kafka-python

Feature delete consumergroups

 def _find_coordinator_id(self, group_id):
         response = future.value
         return self._find_coordinator_id_process_response(response)

+    def _find_many_coordinator_ids(self, group_ids):

Looking at this more, I certainly think it'd be convenient to have this method... we could also use it within the describe_consumer_groups() method to emit all the group coordinator requests in parallel.
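The fan-out pattern being discussed — one coordinator lookup per group, gathered in parallel — can be sketched generically. This uses concurrent.futures as a stand-in for kafka-python's internal future machinery, and `lookup` is a hypothetical callable standing in for the FindCoordinatorRequest round trip:

```python
from concurrent.futures import ThreadPoolExecutor

def find_many_coordinator_ids(group_ids, lookup):
    """Fan out one coordinator lookup per group and block until all complete.

    `lookup` maps group_id -> coordinator node_id; in kafka-python the real
    implementation sends FindCoordinatorRequests and waits on their futures.
    """
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Submit all lookups first so they run concurrently...
        futures = [(gid, pool.submit(lookup, gid)) for gid in group_ids]
        # ...then gather results, preserving the input order.
        return [(gid, f.result()) for gid, f in futures]

coordinators = find_many_coordinator_ids(["g1", "g2", "g3"], lambda gid: 0)
```

The same shape works for describe_consumer_groups(): submit all the coordinator requests up front, then resolve, rather than one blocking round trip per group.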

swenzel

comment created time in a day

Pull request review commentdpkp/kafka-python

Feature delete consumergroups

 def test_describe_configs_invalid_broker_id_raises(kafka_admin_client):
     with pytest.raises(ValueError):
         configs = kafka_admin_client.describe_configs([ConfigResource(ConfigResourceType.BROKER, broker_id)])
+
+
+@pytest.mark.skipif(env_kafka_version() < (1, 1), reason="Delete consumer groups requires broker >=1.1")
+def test_delete_consumergroups(kafka_admin_client, kafka_consumer_factory, send_messages):
+    send_messages(range(0, 100), partition=0)
+    consumer1 = kafka_consumer_factory(group_id="test1-group1")
+    next(consumer1)
+    consumer1.close()
+
+    consumer2 = kafka_consumer_factory(group_id="test1-group2")
+    next(consumer2)
+    consumer2.close()
+
+    consumer3 = kafka_consumer_factory(group_id="test1-group3")
+    next(consumer3)
+    consumer3.close()
+
+    consumergroups = {group_id for group_id, _ in kafka_admin_client.list_consumer_groups()}
+    assert "test1-group1" in consumergroups
+    assert "test1-group2" in consumergroups
+    assert "test1-group3" in consumergroups
+
+    delete_results = {
+        group_id: error
+        for group_id, error in kafka_admin_client.delete_consumer_groups(["test1-group1", "test1-group2"])
+    }
+    assert delete_results["test1-group1"] == NoError
+    assert delete_results["test1-group2"] == NoError
+    assert "test1-group3" not in delete_results
+
+    consumergroups = {group_id for group_id, _ in kafka_admin_client.list_consumer_groups()}
+    assert "test1-group1" not in consumergroups
+    assert "test1-group2" not in consumergroups
+    assert "test1-group3" in consumergroups
+
+
+@pytest.mark.skipif(env_kafka_version() < (1, 1), reason="Delete consumer groups requires broker >=1.1")
+def test_delete_consumergroups_with_errors(kafka_admin_client, kafka_consumer_factory, send_messages):
+    send_messages(range(0, 100), partition=0)
+    consumer1 = kafka_consumer_factory(group_id="test2-group1")
+    next(consumer1)
+    consumer1.close()
+
+    consumer2 = kafka_consumer_factory(group_id="test2-group2")
+    next(consumer2)
+
+    consumergroups = {group_id for group_id, _ in kafka_admin_client.list_consumer_groups()}
+    assert "test2-group1" in consumergroups
+    assert "test2-group2" in consumergroups
+    assert "test2-group3" not in consumergroups
+
+    delete_results = {
+        group_id: error
+        for group_id, error in kafka_admin_client.delete_consumer_groups(["test2-group1", "test2-group2", "test2-group3"])
+    }
+
+    assert delete_results["test2-group1"] == NoError
+    assert delete_results["test2-group2"] == NonEmptyGroupError
+    assert delete_results["test2-group3"] == GroupIdNotFoundError
+
+    consumergroups = {group_id for group_id, _ in kafka_admin_client.list_consumer_groups()}
+    assert "test2-group1" not in consumergroups
+    assert "test2-group2" in consumergroups
+    assert "test2-group3" not in consumergroups

missing newline at end

swenzel

comment created time in a day

PullRequestReviewEvent
PullRequestReviewEvent

pull request commentdpkp/kafka-python

Add delete records method support for kafka admin api

Can you add a basic unit test of this?

10101010

comment created time in a day

pull request commentdpkp/kafka-python

Avoid 100% CPU usage while socket is closed

Agreed, can you look at where this negative FD might be coming from? If you can consistently repro it, then the eBPF tools would probably make it a lot easier to track down...

Or if you have a way to consistently repro it in a test case, I'd be willing to take a look...
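For anyone digging into this: one easy-to-verify source of a negative FD is simply a closed socket — in Python 3, fileno() returns -1 once close() has been called, and registering such an object with a selector is one way to end up busy-looping. A tiny illustration (not a repro of the busy-loop itself):

```python
import socket

s = socket.socket()
fd_open = s.fileno()   # a real, non-negative file descriptor
s.close()
fd_closed = s.fileno() # -1 once the socket is closed
```

So a defensive check like `sock.fileno() < 0` before select/register is a plausible guard, though finding where the close races the poll loop is the real fix.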

orange-kao

comment created time in a day

Pull request review commentdpkp/kafka-python

KIP-54: Implement sticky partition assignment strategy

 def metadata(self, topics):
         pass

     @abc.abstractmethod
-    def on_assignment(self, assignment):
+    def on_assignment(self, assignment, generation):

I haven't looked deeply into KIP-429, but if it also leverages generations (and I suspect it might given that it includes incremental shifts), then we may want to consider keeping this and bumping the kafka-python version to 3.x for the breaking change... I have a feeling it may make our lives easier down the road. But if they don't need it there, then agree it would be nicer not to add it just for the sticky assignor case...

aynroot

comment created time in a day

PullRequestReviewEvent

Pull request review commentdpkp/kafka-python

KIP-54: Implement sticky partition assignment strategy

 class ConsumerCoordinator(BaseCoordinator):
         'enable_auto_commit': True,
         'auto_commit_interval_ms': 5000,
         'default_offset_commit_callback': None,
-        'assignors': (RangePartitionAssignor, RoundRobinPartitionAssignor),
+        'assignors': (RangePartitionAssignor, RoundRobinPartitionAssignor, StickyPartitionAssignor),

I looked into this a little more, and based on the work in KIP-429, it's probably not a great idea to set this as the preferred option...

But I do still support including it in the list of default options. Additionally, if this is joining a mixed-language consumer group, it's more likely that it'd be cross-compatible with the Java clients etc if we include this in the default list.

aynroot

comment created time in a day

PullRequestReviewEvent

pull request commentdpkp/kafka-python

KIP-54: Implement sticky partition assignment strategy

Does this also include the updates described in KIP-341?

If you ported the latest Java code, you should be fine, but if you based your implementation off the original KIP-54 it does have a bug that KIP-341 solves...

aynroot

comment created time in a day

Pull request review commentdpkp/kafka-python

Feature delete consumergroups

 def _find_coordinator_id(self, group_id):
         response = future.value
         return self._find_coordinator_id_process_response(response)

+    def _find_many_coordinator_ids(self, group_ids):
+        """Find the broker node_id of the coordinator for each of the given groups.
+
+        Sends a FindCoordinatorRequest message to the cluster for each group_id.
+        Will block until the FindCoordinatorResponse is received for all groups.
+        Any errors are immediately raised.
+
+        :param group_ids: A list of consumer group IDs. This is typically the group
+            name as a string.
+        :return: A list of tuples (group_id, node_id) where node_id is the id
+            of the broker that is the coordinator for the corresponding group.
+        """
+        # Note: Java may change how this is implemented in KAFKA-6791.

This comment may be outdated, see https://github.com/apache/kafka/pull/4902#issuecomment-390830255

swenzel

comment created time in a day

PullRequestReviewEvent

push eventdpkp/kafka-python

Pedro Calleja

commit sha e485a6ee2a1f05f2333e22b0fbdbafb12badaf3f

Fix initialization order in KafkaClient (#2119)

view details

push time in a day

PR merged dpkp/kafka-python

Fix initialization order in KafkaClient

this resolves #2118


+6 -3

0 comment

1 changed file

pecalleja

pr closed time in a day

issue closeddpkp/kafka-python

Weird error after bad producer initialization

Hi, I'm getting this weird error when the GC cleans up memory and triggers the __del__ method of KafkaClient. When the producer is not successfully initialized, the _closed attribute is never set on the KafkaClient object.

This snippet illustrates the point:

from kafka import KafkaProducer
import os
import gc

def test_bad_kafka_initialization():
    connect_str = os.getenv('BROKER_URL')
    try:
        producer = KafkaProducer(bootstrap_servers=connect_str)
    except Exception as e:
        gc.collect()
        assert isinstance(e, TypeError)
        raise e

If the BROKER_URL env variable is not defined, it's normal that a TypeError exception is raised, but after that another ignored exception shows up:

Exception ignored in: <function KafkaClient.__del__ at 0x10ecc04c0>
Traceback (most recent call last):
  File "..../python3.8/site-packages/kafka/client_async.py", line 443, in __del__
  File "..../python3.8/site-packages/kafka/client_async.py", line 417, in _close
AttributeError: 'KafkaClient' object has no attribute '_closed'

This happens because in KafkaClient.__init__ these attributes are only defined after the rest of the initialization:

 self.cluster = ClusterMetadata(**self.config)   -> here the initialization broke
 .....
 self._closed = False
 self._wake_r, self._wake_w = socket.socketpair()
 self._selector = self.config['selector']()

closed time in a day

pecalleja
PullRequestReviewEvent

Pull request review commentdpkp/kafka-python

Fix initialization order in KafkaClient

 def __init__(self, **configs):
             if key in configs:
                 self.config[key] = configs[key]

+        # these properties need to be set on top of the initialization pipeline
+        # because they are used when __del__ method is called
+        self._closed = False
+        self._wake_r, self._wake_w = socket.socketpair()
+        self._selector = self.config['selector']()
+

Keep this newline as part of keeping the "required initialization section" separate

pecalleja

comment created time in a day

PullRequestReviewEvent

Pull request review commentdpkp/kafka-python

Fix initialization order in KafkaClient

 def __init__(self, **configs):
             if key in configs:
                 self.config[key] = configs[key]

+        # these properties need to be set on top of the initialization pipeline
+        # because they are used when __del__ method is called
+        self._closed = False
+        self._wake_r, self._wake_w = socket.socketpair()
+        self._selector = self.config['selector']()
+
         self.cluster = ClusterMetadata(**self.config)
         self._topics = set()  # empty set will fetch all topic metadata
         self._metadata_refresh_in_progress = False
-        self._selector = self.config['selector']()
+
         self._conns = Dict()  # object to support weakrefs
         self._api_versions = None
         self._connecting = set()
         self._sending = set()
         self._refresh_on_disconnects = True
         self._last_bootstrap = 0
         self._bootstrap_fails = 0
-        self._wake_r, self._wake_w = socket.socketpair()
+

Wouldn't you want to delete these new lines?

pecalleja

comment created time in a day

PullRequestReviewEvent

Pull request review commentdpkp/kafka-python

Fix initialization order in KafkaClient

 def __init__(self, **configs):

         self._selector.register(self._wake_r, selectors.EVENT_READ)
         self._idle_expiry_manager = IdleConnectionManager(self.config['connections_max_idle_ms'])
-        self._closed = False
+

Wouldn't you want to delete these new lines?

pecalleja

comment created time in a day

PullRequestReviewEvent

Pull request review commentdpkp/kafka-python

Fix initialization order in KafkaClient

 def __init__(self, **configs):
             if key in configs:
                 self.config[key] = configs[key]

+        # these properties need to be set on top of the initialization pipeline
+        # because they are used when __del__ method is called
+        self._closed = False
+        self._wake_r, self._wake_w = socket.socketpair()
+        self._selector = self.config['selector']()
+
         self.cluster = ClusterMetadata(**self.config)
         self._topics = set()  # empty set will fetch all topic metadata
         self._metadata_refresh_in_progress = False
-        self._selector = self.config['selector']()
+

Wouldn't you want to delete these new lines?

pecalleja

comment created time in a day

PullRequestReviewEvent

Pull request review commentdpkp/kafka-python

KIP-54: Implement sticky partition assignment strategy

 class ConsumerCoordinator(BaseCoordinator):
         'enable_auto_commit': True,
         'auto_commit_interval_ms': 5000,
         'default_offset_commit_callback': None,
-        'assignors': (RangePartitionAssignor, RoundRobinPartitionAssignor),
+        'assignors': (RangePartitionAssignor, RoundRobinPartitionAssignor, StickyPartitionAssignor),

This assignment algorithm isn't just sticky, it's also better balanced than either of the other two... and it favors better balancing over stickiness... at least that's what I recall from when I helped drive that KIP forward.

So I think in that case, it'd be nice to add to the default list, and I'd even support moving it up the list to the primary default so that users get the most optimized balancing algorithm.

Be good to know what the Java client does here, I'd check but my wife just said dinner is ready so I need to roll...
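For intuition on why list position matters here: the group ends up using the first strategy in preference order that every member supports, so moving the sticky assignor earlier changes what mixed-version groups negotiate. A simplified sketch of that selection (not the actual group-protocol negotiation code; names are illustrative):

```python
def select_assignor(preference, member_supported):
    """Pick the first assignor name in `preference` supported by all members.

    `preference` models the ordered 'assignors' tuple; `member_supported`
    is one set of supported assignor names per group member.
    """
    for name in preference:
        if all(name in supported for supported in member_supported):
            return name
    raise ValueError("no assignor supported by all members")

# With sticky first, a group where everyone supports it negotiates sticky...
chosen = select_assignor(
    ["sticky", "roundrobin", "range"],
    [{"sticky", "range"}, {"sticky", "roundrobin"}],
)
```

If one member doesn't support sticky, the selection falls back down the list, which is why adding it to the defaults is safe even for mixed groups.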

aynroot

comment created time in a day

PullRequestReviewEvent