Almog Gavra agavra Confluent Mountain View, California working @confluentinc to build a streaming db

agavra/ksql 1

KSQL - the Streaming SQL Engine for Apache Kafka

agavra/agavra.github.io 0

Personal Website for Almog Gavra

agavra/kafka 0

Mirror of Apache Kafka

agavra/schema-registry 0

Confluent Schema Registry for Kafka

issue opened confluentinc/ksql

Time based version of `LATEST_BY_OFFSET`

LATEST_BY_OFFSET and EARLIEST_BY_OFFSET are useful UDAFs. However, they don't handle out-of-order data as, by design, they are offset based, not time based.

We need similar aggregate functions that capture the latest value(s) by time, not offset.

We could have LATEST_BY_TIME. However, aggregate functions currently do not have implicit access to pseudo / system columns like ROWTIME (though maybe they should!), so the user would need to pass in ROWTIME as a parameter, e.g. LATEST_BY_TIME(myColumn, ROWTIME). This would allow any time value for the second parameter.

Such a LATEST_BY_TIME method is really just tracking the myColumn value with the highest ROWTIME. Such a method can be generalised. I think @agavra suggested an appropriate name for such an aggregate function elsewhere: some kind of maxAgg function. A variant that tracks the max 'N' values, and earliest variants, should also be supported.

We'd need the UDAF framework enhanced to implement this: https://github.com/confluentinc/ksql/issues/5747
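
To make the idea concrete, here is a minimal Python sketch of the semantics described above. It is not ksqlDB code and says nothing about how the UDAF framework would implement it; the function name `latest_by_time`, the `(value, rowtime)` row shape, and the tie-breaking rule (later arrival wins on equal timestamps) are all assumptions made for illustration.

```python
import heapq
from typing import Any, Iterable, List, Tuple


def latest_by_time(rows: Iterable[Tuple[Any, int]], n: int = 1) -> List[Any]:
    """Return the n values with the highest timestamps, newest first.

    rows yields (value, rowtime) pairs in arrival (offset) order.
    Hypothetical helper: models the proposed semantics only.
    """
    # Min-heap of (rowtime, arrival_index, value). The arrival index is
    # unique, so tuple comparison never falls through to the value itself.
    heap: List[Tuple[int, int, Any]] = []
    for arrival_index, (value, rowtime) in enumerate(rows):
        item = (rowtime, arrival_index, value)
        if len(heap) < n:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            # Newer than the oldest retained entry: swap it in.
            heapq.heapreplace(heap, item)
    return [value for _, _, value in sorted(heap, reverse=True)]
```

With out-of-order input such as `[("a", 100), ("c", 300), ("b", 200)]`, an offset-based aggregate would report `"b"` (the last row to arrive), whereas this sketch reports `"c"` (the row with the highest timestamp). Passing `n=2` keeps the two newest values.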

created time in 26 minutes

created tag confluentinc/ksql

tag v6.1.0-135

The event streaming database purpose-built for stream processing applications

created time in 3 hours

created tag confluentinc/ksql

tag v6.1.0-134

The event streaming database purpose-built for stream processing applications

created time in 4 hours

issue comment confluentinc/ksql

Add NULLIF function

Hi, just raised a PR on this one, happy to be assigned. Thanks!

rmoff

comment created time in 8 hours

pull request comment confluentinc/ksql

feat: added NULLIF function (#6567)

@confluentinc It looks like @fjbecerra just signed our Contributor License Agreement. :+1:

Always at your service,

clabot

fjbecerra

comment created time in 9 hours

PR opened confluentinc/ksql

feat: added NULLIF function (#6567)

Description

#6567

The NULLIF function returns NULL if and only if value1 and value2 are equal. Otherwise it returns value1.
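
As a rough illustration of the behaviour described above (not the code in this PR), the rule can be modelled in a few lines of Python. Using Python's `None` to stand in for SQL NULL is an assumption of this sketch.

```python
from typing import Any, Optional


def nullif(value1: Any, value2: Any) -> Optional[Any]:
    """Model of NULLIF: None when both arguments are equal, else value1."""
    # None stands in for SQL NULL here (an assumption of this sketch).
    return None if value1 == value2 else value1
```

For example, `nullif(5, 5)` yields `None`, while `nullif(5, 0)` yields `5`.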

Testing done

  • Engine. Unit test covering NULLIF function.
  • Functional-tests. Added test case in null.json and generated historical plan.

Reviewer checklist

  • [ ] Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • [ ] Ensure relevant issues are linked (description should include text like "Fixes #<issue number>")
+444 -0

0 comment

7 changed files

pr created time in 9 hours

PR opened confluentinc/ksql

feat: Move code from physical to logical plan for Projection

Description

What behavior do you want to change, why, how does your patch achieve the changes?

Testing done

Describe the testing strategy. Unit and integration tests are expected for any behavior changes.

Reviewer checklist

  • [ ] Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • [ ] Ensure relevant issues are linked (description should include text like "Fixes #<issue number>")
+424 -261

0 comment

27 changed files

pr created time in 20 hours

created tag confluentinc/ksql

tag v6.2.0-beta201126193631

The event streaming database purpose-built for stream processing applications

created time in a day

issue opened confluentinc/ksql

KSQLDB rest API server unexpectedly closes connection with streaming client

Describe the bug Making a request to the ksqlDB REST API to get a stream of data sometimes does not work, terminating with the error RemoteDisconnected('Remote end closed connection without response'). (This particular error message comes from Python's urllib3 library.) If you immediately retry the same request, it works the second time around.

To Reproduce Steps to reproduce the behavior, include:

  1. The version of KSQL: KSQLDB 0.9.0
  2. Sample source data: Plain json
  3. Any SQL statements you ran: SELECT * FROM topic_name_table EMIT CHANGES;

Expected behavior I expect a response for the executed query. Even if the underlying table is empty, it should (and does, when it works) return a string describing the structure of subsequent results to come, if any.

Actual behaviour A clear and concise description of what actually happens, including:

  1. CLI output: Not CLI related
  2. Error messages: RemoteDisconnected('Remote end closed connection without response')
  3. KSQL logs: ksql hangs on the following log message
[2020-11-26 14:28:17,905] INFO stream-thread [_confluent-ksql-cp-kafkatransient_8637031053715531820_1606400891369-1f312418-b661-4b1f-b28c-6531ff75de9b-StreamThread-1] Setting topic 'topic_name' to consume from earliest offset (org.apache.kafka.streams.processor.internals.StreamThread:907)

Additional context None
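
Since the report notes that an immediate retry succeeds, a client-side workaround could look like the sketch below. It is only a stopgap under stated assumptions, not a fix for the server behaviour. The helper name `call_with_retry` and its parameters are hypothetical. `http.client.RemoteDisconnected` is a subclass of `ConnectionResetError`, so the default `ConnectionError` filter covers it; urllib3 typically surfaces it wrapped in `urllib3.exceptions.ProtocolError`, which a caller would need to pass in `retry_on`.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    send: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    attempts: int = 2,
    delay_s: float = 0.0,
) -> T:
    """Call send(), retrying when it raises one of the retry_on exceptions.

    Hypothetical helper: send is any zero-argument callable that issues
    the streaming request and returns the response object.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except retry_on:
            if attempt == attempts:
                raise  # Out of attempts: let the caller see the error.
            time.sleep(delay_s)
    raise AssertionError("unreachable")
```

The request itself stays outside the helper, so the same wrapper works whichever HTTP library issues it.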

created time in a day

PR opened confluentinc/ksql

feat: optional `KAFKA_TOPIC`

Description

implements [KLIP-34]: #6065

This change makes KAFKA_TOPIC optional in the WITH clause of CREATE TABLE and CREATE STREAM statements, where the table/stream is being created on top of an existing topic, for example:

CREATE STREAM pageviews (
    page_id BIGINT,
    viewtime BIGINT,
    user_id VARCHAR
  );

The above statement creates a new stream called PAGEVIEWS. As no KAFKA_TOPIC property is specified, ksqlDB will attempt to find a matching existing and accessible topic in the Kafka cluster. By default, ksqlDB will match any topic with the same name as the source, ignoring case. Where multiple matches are found, any exact match is used or an error is returned.

Users can control this behaviour via the ksql.persistence.source.topic.naming.strategy configuration, with three options supplied out of the box:

  • CaseInsensitiveSourceTopicNamingStrategy (default): case-insensitive matching
  • CaseSensitiveSourceTopicNamingStrategy: exact name matching
  • ExplicitSourceTopicNamingStrategy: forces users to explicitly provide KAFKA_TOPIC in the statement.
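
The matching rules described above can be sketched roughly as follows. This is a Python model of the described behaviour, not the strategy classes from this PR. The function name `resolve_topic` and the use of `LookupError` are hypothetical, and the sketch assumes that "exact match" means a topic whose name equals the source name as passed in.

```python
from typing import Iterable


def resolve_topic(
    source_name: str, topics: Iterable[str], case_sensitive: bool = False
) -> str:
    """Pick the existing topic backing a source created without KAFKA_TOPIC."""
    if case_sensitive:
        matches = [t for t in topics if t == source_name]
    else:
        wanted = source_name.lower()
        matches = [t for t in topics if t.lower() == wanted]

    if not matches:
        raise LookupError(f"no topic matches source {source_name!r}")
    if len(matches) == 1:
        return matches[0]
    # Several case-insensitive matches: prefer an exact one, else give up.
    if source_name in matches:
        return source_name
    raise LookupError(f"ambiguous topics for {source_name!r}: {matches}")
```

Under these assumptions, `resolve_topic("PAGEVIEWS", ["pageviews", "orders"])` resolves to `"pageviews"`, while a cluster holding both `pageviews` and `PageViews` raises an error because neither is an exact match.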

Testing done

Usual.

Reviewer checklist

  • [ ] Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • [ ] Ensure relevant issues are linked (description should include text like "Fixes #<issue number>")
+2406 -117

0 comment

55 changed files

pr created time in a day

push event confluentinc/ksql

Andy Coates

commit sha 6e4e24949f1ff1d62780cd7cb7d09d19510d065c

Revert "test: historical plans" This reverts commit 9ae171eb6dbd421a4bf727472f021ba933354b94.

Andy Coates

commit sha b465d73ce53c97a16962dfaba0d7d1f04060c07a

chore: revert unintentional change

Andy Coates

commit sha 61bb0850149966a42abeaa3136a09598ae16efb9

Revert "test: test files" This reverts commit a0930274

Andy Coates

commit sha a5fac03dddbd09b13e29007f7d957f56b1cde889

Revert "feat: optional `KAFKA_TOPIC`" This reverts commit 862c59e9

push time in a day

push event confluentinc/ksql

Andy Coates

commit sha 862c59e9c7eaa3ed06a9e75055db3cebe0ba0d89

feat: optional `KAFKA_TOPIC`

implements [KLIP-34]: https://github.com/confluentinc/ksql/pull/6065

This change makes `KAFKA_TOPIC` optional in the `WITH` clause of `CREATE TABLE` and `CREATE STREAM` statements, where the table/stream is being created on top of an existing topic, for example:

```sql
CREATE STREAM pageviews (
    page_id BIGINT,
    viewtime BIGINT,
    user_id VARCHAR
  );
```

The above statement creates a new stream called `PAGEVIEWS`. As no `KAFKA_TOPIC` property is specified, ksqlDB will attempt to find a matching existing and accessible topic in the Kafka cluster. By default, ksqlDB will match any topic with the same name as the source, ignoring case. Where multiple matches are found, any exact match is used or an error is returned.

Users can control this behaviour via the `ksql.persistence.source.topic.naming.strategy` configuration, with three options supplied out of the box:

* `CaseInsensitiveSourceTopicNamingStrategy` (default): case-insensitive matching
* `CaseSensitiveSourceTopicNamingStrategy`: exact name matching
* `ExplicitSourceTopicNamingStrategy`: forces users to explicitly provide `KAFKA_TOPIC` in the statement.

Andy Coates

commit sha a09302745b7a89c47de3f2adfc4caf120d80b0d4

test: test files

Andy Coates

commit sha 9523f2bf289e5896a29db20ca12caca67abb1501

docs: doc updates

Andy Coates

commit sha 9ae171eb6dbd421a4bf727472f021ba933354b94

test: historical plans

push time in a day

created tag confluentinc/ksql

tag v6.1.0-133

The event streaming database purpose-built for stream processing applications

created time in a day

created tag confluentinc/ksql

tag v6.2.0-269

The event streaming database purpose-built for stream processing applications

created time in 2 days

push event confluentinc/ksql

Rohan

commit sha e3f750da7fbd7102f215c9e968039c5b7b9e78b3

fix: fix error categorization on NPE from streams (#6655) (#6668)

* fix: fix error categorization on NPE from streams

Fixes error categorization when streams exits due to an internal NPE

- fixes the regex categorizer to correctly handle NPEs. NPEs have null descriptions, so this patch changes the categorizer to ignore the description if it's null.
- fixes the uncaught handler to categorize errors as UNKNOWN if the categorizer throws

push time in 2 days

PR merged confluentinc/ksql

fix: fix error categorization on NPE from streams (#6655)

Fixes error categorization when streams exits due to an internal NPE

  • fixes the regex categorizer to correctly handle NPEs. NPEs have null descriptions, so this patch changes the categorizer to ignore the description if it's null.
  • fixes the uncaught handler to categorize errors as UNKNOWN if the categorizer throws
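
The two fixes can be illustrated with a small Python model. This is a sketch of the described behaviour, not the patched Java code. The rule format `(type_pattern, description_pattern, category)`, the function names, and the reading of "ignore the description if it's null" as "match on the exception type alone" are all assumptions. Only the `UNKNOWN` category comes from the description above.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical rule shape: (type regex, description regex or None, category).
Rule = Tuple[str, Optional[str], str]


def categorize(type_name: str, description: Optional[str], rules: List[Rule]) -> str:
    """Return the category of the first matching rule, else UNKNOWN."""
    for type_pattern, description_pattern, category in rules:
        if not re.search(type_pattern, type_name):
            continue
        # A missing description (as with an NPE) is ignored, not matched.
        if description is None or description_pattern is None:
            return category
        if re.search(description_pattern, description):
            return category
    return "UNKNOWN"


def safe_categorize(
    categorizer: Callable[[str, Optional[str]], str],
    type_name: str,
    description: Optional[str],
) -> str:
    """Fall back to UNKNOWN if the categorizer itself raises."""
    try:
        return categorizer(type_name, description)
    except Exception:
        return "UNKNOWN"
```

The second function mirrors the handler change: a failure inside the categorizer should downgrade the result to `UNKNOWN` rather than propagate.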
+98 -9

0 comment

4 changed files

rodesai

pr closed time in 2 days

created tag confluentinc/ksql

tag v6.2.0-268

The event streaming database purpose-built for stream processing applications

created time in 2 days

created tag confluentinc/ksql

tag v6.2.0-267

The event streaming database purpose-built for stream processing applications

created time in 2 days

created tag confluentinc/ksql

tag v6.1.0-132

The event streaming database purpose-built for stream processing applications

created time in 2 days

created tag confluentinc/ksql

tag v6.2.0-266

The event streaming database purpose-built for stream processing applications

created time in 2 days

created tag confluentinc/ksql

tag v6.1.0-131

The event streaming database purpose-built for stream processing applications

created time in 2 days

created tag confluentinc/ksql

tag v6.2.0-265

The event streaming database purpose-built for stream processing applications

created time in 2 days

issue comment confluentinc/ksql

confluent-6.0.0 ksql cli throw an `Code has to be at least ad-hoc signed.` error on SELECT from TABLE statement

Can you try with java 8?

Hello @rodesai, thanks for the response and sorry for the late reply!

It works as expected with Java 8

 $> java -version
openjdk version "1.8.0_275"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_275-b01)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.275-b01, mixed mode)

Also, it works without KSQL_OPTS="-Djava.io.tmpdir=/Users/bojanche/confluent-6.0.0/tmp/"!

Any idea whether it is possible to make it work for Java >8?

Regards, Bojanche S.

bokjo

comment created time in 2 days

created tag confluentinc/ksql

tag v6.2.0-beta201125193552

The event streaming database purpose-built for stream processing applications

created time in 2 days

created tag confluentinc/ksql

tag v6.2.0-264

The event streaming database purpose-built for stream processing applications

created time in 2 days

push event confluentinc/ksql

Andy Coates

commit sha 6c7a4381b1ad389fd8ab6955ca792fc0b1b3f60f

swap `Struct` key to `GenericKey` (#6667)

* refactor: swap `Struct` key to `GenericKey`

Part of the work to remove Connect `Struct` from the code base. This commit changes the type used to represent the key from Connect's `Struct` to a new `GenericKey` class, to match our existing `GenericRow` class for the value.

`GenericKey`:

* is a simple wrapper around a `List` of values.
* is immutable
* does not perform type validation. (`Struct` does, which is expensive and unnecessary).

Co-authored-by: Andy Coates <big-andy-coates@users.noreply.github.com>

push time in 2 days

PR merged confluentinc/ksql

swap `Struct` key to `GenericKey`

Description

Part of the work to remove Connect Struct from the code base. This commit changes the type used to represent the key from Connect's Struct to a new GenericKey class, to match our existing GenericRow class for the value.

A lot of files changed, but not much functionality.

GenericKey:

  • is a simple wrapper around a List of values, indexed by the position of the key column in the key schema.
  • is immutable, (keys should be immutable, but Struct is mutable)
  • does not perform type validation. (Struct does, which is expensive and unnecessary).
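
The three properties listed above can be modelled with a short Python sketch. This is not the Java `GenericKey` class from this PR; the method names `of` and `get` are hypothetical, and a frozen dataclass around a tuple is simply one way to get an immutable, position-indexed wrapper without validation.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class GenericKey:
    """Immutable, position-indexed holder for key column values."""

    values: Tuple[Any, ...]

    @classmethod
    def of(cls, *values: Any) -> "GenericKey":
        # No type validation: values are stored exactly as given.
        return cls(tuple(values))

    def get(self, index: int) -> Any:
        # index is the position of the key column in the key schema.
        return self.values[index]

    def __len__(self) -> int:
        return len(self.values)
```

Because the instance is frozen and backed by a tuple, two keys with the same values compare equal and hash alike, and attempts to reassign `values` raise an error.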

Reviewing notes.

Three main commits to make it easier to review:

  1. (88 files) First contains any files that simply change Struct with GenericKey. Not worth reviewing IMHO.
  2. (45 files) Second contains non-production files, mainly test, where there are slightly more changes, e.g. changing tests to create and expect GenericKey rather than Struct. Not worth reviewing IMHO.
  3. (27 files) Third contains the meat of the change, including associated test code changes. Review this!

Main changes of interest:

  1. GenericKey.
  2. GenericKeySerDe - changed to work with GenericKey rather than Struct.
  3. GenericSerializer & GenericDeserializer -> code pulled out from GenericRowSerde as it's now common across key & value serde
  4. Changes in the ksqldb-streams module, which is where the key is manipulated.

Testing done

usual

Reviewer checklist

  • [ ] Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • [ ] Ensure relevant issues are linked (description should include text like "Fixes #<issue number>")
+1948 -1943

1 comment

161 changed files

big-andy-coates

pr closed time in 2 days

pull request comment confluentinc/ksql

swap `Struct` key to `GenericKey`

Merging without review as the PR touches many files and will be a PITA to keep up to date with merges. Plus there's not actually much functionality change. Mainly a find-and-replace.

big-andy-coates

comment created time in 2 days

issue closed confluentinc/ksql

Unable to execute CREATE OR REPLACE STREAM

Describe the bug My statement starting with CREATE OR REPLACE STREAM fails. If I use CREATE STREAM instead, it works correctly.

To Reproduce

  1. KSQL version 0.13.0
  2. Run statement similar to this example:
CREATE OR REPLACE STREAM teststream (connection BIGINT, value STRUCT<value BIGINT >) WITH ( kafka_topic = 'existing-topic', value_format = 'json');

The format of data within the topic or schema of a stream doesn't matter. If you remove OR REPLACE from the statement it will work.

Expected behavior I am able to use CREATE OR REPLACE syntax as described in the reference

Actual behaviour KSQL returns: line 1:8: no viable alternative at input 'CREATE OR'

closed time in 2 days

TomaszWegrzyn

issue comment confluentinc/ksql

Unable to execute CREATE OR REPLACE STREAM

Turns out I was simply using an incorrect version of ksql-server

TomaszWegrzyn

comment created time in 2 days
