A tiny compiler for a small language, written entirely in JavaScript.
An HTCPCP implementation using the nitcorn framework
itsWill/NitSyntaxHighlighter
Sublime Text 3 package that adds syntax highlighting for the Nit language
The song for RubyConf Colombia 2019
GitHub mirror of Alembic
The blog I made for myself, powered by Ruby on Rails
Repository for the ongoing development of the Brown-UMBC Reinforcement Learning And Planning (BURLAP) Java library
A simple, fast, and fun package for building command line apps in Go
CoffeeScript adapter for the Rails asset pipeline. Also adds support for .coffee views.
push event cwida/duckdb
commit sha 8f468d429ec11653202b2cb8dfe87bb01008f001
Add support for Lambda functions to parser
commit sha f8a702564d54682401ef3805d577298348724bba
Remove redundant blank line
commit sha d818a8b9abff27902bed93bfcedfb4d81d1e029e
Add support for lambda functions with multiple parameters
commit sha 2c8b106da531b1dff76d5e90efc33eb1eeeb7c4d
Merge branch 'master' into lambdas
commit sha 80d723ad404771c6dc9fa028af5a254b60165e43
Remove print and increase lambda operator precedence further so lambdas such as x -> x > 10 AND z < 20 are correctly parsed
commit sha 132ac405e5d966c83da86bb7d12a4ec6beb8fa02
Fix for single file compilation
commit sha e5a64a52f56083f989e2675a287d28f2f96adf69
Automatically replace generated calls to fprintf and exit in src_backend_parser_scan.cpp to avoid triggering R CRAN warnings
commit sha f79660c66b8d97e598e30390f6b638dd5ffd6ad2
Merge pull request #1313 from Mytherin/lambdas Add support for Lambda functions to parser
push time in 14 hours
PR merged cwida/duckdb
This PR adds basic support for lambda functions to the parser. They are not yet supported anywhere else and are not yet bound, but the plan is to use them later on in functions that can apply to lists.
Lambda expressions look like this:
class LambdaExpression : public ParsedExpression {
public:
	string capture_name;
	unique_ptr<ParsedExpression> expression;
};
Example syntax:
SELECT map(i, x -> x + 1) FROM (VALUES (list_value(1, 2, 3))) tbl(i);
pr closed time in 14 hours
pull request comment cwida/duckdb
Parser clean up: no longer transform multi-node CASE and NULLIF in the transformer
Ha, I think I may have been responsible for those in the first place. All that hard work 😭. Great that this is cleaned up!
comment created time in 16 hours
PR opened cwida/duckdb
Previously, a CASE statement with multiple WHEN ... THEN ... nodes would be transformed into a chain of CASE statements in the transformer. In this PR we change this so that the step is only performed during binding. The CASE statement now looks like this after the transformer phase:
struct CaseCheck {
	unique_ptr<ParsedExpression> when_expr;
	unique_ptr<ParsedExpression> then_expr;
};

//! The CaseExpression represents a CASE expression in the query
class CaseExpression : public ParsedExpression {
	vector<CaseCheck> case_checks;
	unique_ptr<ParsedExpression> else_expr;
};
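For illustration, a query of this shape (column names assumed) is now kept as a single CaseExpression after parsing, with one CaseCheck per WHEN ... THEN ... pair plus the else_expr:
-- hypothetical example: two CaseChecks and an else_expr
SELECT CASE WHEN a = 1 THEN 'one' WHEN a = 2 THEN 'two' ELSE 'other' END FROM tbl;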
NULLIF(a, b) used to be transformed into CASE WHEN a = b THEN NULL ELSE a END in the transformer phase. In this PR I have changed this to instead be transformed into a regular function call nullif(a, b), and have created a macro that performs this transformation during binding instead.
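As a sketch of the equivalence the macro relies on (table and column names assumed):
-- NULLIF(a, b) returns NULL if a = b, and a otherwise,
-- so the two queries below are equivalent
SELECT NULLIF(a, b) FROM tbl;
SELECT CASE WHEN a = b THEN NULL ELSE a END FROM tbl;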
pr created time in 16 hours
pull request comment cwida/duckdb
Add support for Lambda functions to parser
Nice. Thank you, Mark.
comment created time in 18 hours
pull request comment cwida/duckdb
Add support for Lambda functions to parser
All the changes are implemented now; lambda functions now look like this:
class LambdaExpression : public ParsedExpression {
	vector<string> parameters;
	unique_ptr<ParsedExpression> expression;
};
I also fixed several operator precedence rules so that lambda arrows take priority over other operators, which causes e.g. x -> x + 1 AND y + 1 to be correctly parsed as a lambda with body x + 1 AND y + 1, without requiring brackets.
select map(i, (x, y) -> x + y) from tbl;
-- lambda: parameters { x, y }, function: x + y
select map(i, x -> x + 1) from (values (list_value(1, 2, 3))) tbl(i);
-- lambda: parameters { x }, function: x + 1
select map(i, x -> x + 1 AND y + 1) from (values (list_value(1, 2, 3))) tbl(i);
-- lambda: parameters { x }, function: x + 1 AND y + 1
comment created time in 18 hours
issue comment cwida/duckdb
Data conversion: Virtual table streaming and exporting with EXPORT DATABASE
Excellent. We missed that this is possible with COPY, but have found that all the info we needed is here. We can probably close this issue.
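For reference, a minimal sketch of the COPY-based export discussed here (table name and paths assumed):
-- export a single table to CSV
COPY tbl TO 'tbl.csv' (HEADER);
-- or export the whole database, schema plus data
EXPORT DATABASE 'target_directory';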
comment created time in 18 hours
pull request comment cwida/duckdb
Refactor and nested types support for Parquet Reader
Performance went down ~10%, so I need to investigate what's going on there.
comment created time in a day
PR opened cwida/duckdb
This PR refactors and extends the Parquet reader. A major feature addition is support for nested types in Parquet files, which are mapped to DuckDB's STRUCT and LIST types. Under the hood, the Parquet reader now reads strings zero-copy, which should increase performance.
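As a hedged sketch of what the nested-type support enables (file and column names assumed):
-- 'nested.parquet' is assumed to contain a struct column s and a list column l;
-- s is read as a DuckDB STRUCT and l as a LIST
SELECT s.field_a, l[1] FROM parquet_scan('nested.parquet');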
pr created time in a day
pull request comment cwida/duckdb
Add support for Lambda functions to parser
That makes a lot of sense; will do. Thanks for the feedback!
comment created time in 2 days
pull request comment cwida/duckdb
Add support for Lambda functions to parser
Mark, thanks for explaining. To clarify the terminology: the lambda in filter(array_column, x -> x > int_column) has a single capture, int_column, and a single argument, x. In map_filter(m, (k, v) -> k > 10 and v < 0) we have a lambda with no captures and two arguments: k and v. Hence, I think we should rename capture_name above to something like argument_names and make it a vector, not a single value.
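For reference, the two shapes being distinguished (table names assumed):
-- x is the argument; int_column is captured from the enclosing query
SELECT filter(array_column, x -> x > int_column) FROM t1;
-- k and v are both arguments; nothing is captured
SELECT map_filter(m, (k, v) -> k > 10 AND v < 0) FROM t2;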
comment created time in 2 days
pull request comment cwida/duckdb
Add support for Lambda functions to parser
+optional type specification for lambda
comment created time in 2 days
pull request comment cwida/duckdb
Add support for Lambda functions to parser
As for captures, the parser will not do anything besides transforming the expression; it is up to the binder to actually resolve columns. For example, filter(array_column, x -> x > int_column) will pass the parser just fine and generate a lambda expression containing the following:
capture_name: x
expression: `Comparison(Column(x), Column(int_column), GREATER_THAN)`
The binder is then in charge of resolving "x" back to the lambda, and "int_column" to another data source (e.g. a table).
comment created time in 2 days
pull request comment cwida/duckdb
Add support for Lambda functions to parser
It supports only single-argument captures right now, e.g. filter(array_column, x -> x > column) works, but map_filter(m, (k, v) -> k > 10 and v < 0) does not. I can have a look at extending the lambdas to support multiple captures.
comment created time in 2 days
pull request comment cwida/duckdb
Add support for Lambda functions to parser
Mark, this is great. To confirm: does this PR include support for multiple arguments for lambdas, e.g. map_filter(m, (k, v) -> k > 10 AND v < 0), and does it include support for captures, e.g. filter(array_column, x -> x > int_column)?
comment created time in 2 days
pull request comment cwida/duckdb
Add support for Lambda functions to parser
CC @mbasmanova
comment created time in 2 days
PR opened cwida/duckdb
This PR adds basic support for lambda functions to the parser. They are not yet supported anywhere else and are not yet bound, but the plan is to use them later on in functions that can apply to lists.
Lambda expressions look like this:
class LambdaExpression : public ParsedExpression {
public:
	string capture_name;
	unique_ptr<ParsedExpression> expression;
};
Example syntax:
SELECT map(i, x -> x + 1) FROM (VALUES (list_value(1, 2, 3))) tbl(i);
pr created time in 2 days
push event cwida/duckdb
commit sha c7cd7bcee3b3c0213afbe1cb5c635507ef03d8e5
Avoid using arithmetic on strings in dbgen
commit sha d47baa52f1618bbf7a6f7dd97c0b658a959ffd72
Merge pull request #1312 from Mytherin/dbgenfix Avoid using arithmetic on strings in dbgen (minor compilation fix)
push time in 2 days
pull request comment cwida/duckdb
Avoid using arithmetic on strings in dbgen (minor compilation fix)
Thank you, Mark.
comment created time in 2 days
Pull request review comment cwida/duckdb
Implementing Filter Clause for aggregates
PhysicalPlanGenerator::ExtractAggregateExpressions(unique_ptr<PhysicalOperator>
 	vector<unique_ptr<Expression>> expressions;
 	vector<LogicalType> types;
-	for (idx_t group_idx = 0; group_idx < groups.size(); group_idx++) {
-		auto &group = groups[group_idx];
+	for (auto &group : groups) {
 		auto ref = make_unique<BoundReferenceExpression>(group->return_type, expressions.size());
 		types.push_back(group->return_type);
 		expressions.push_back(move(group));
-		groups[group_idx] = move(ref);
+		group = move(ref);
 	}
 	for (auto &aggr : aggregates) {
 		auto &bound_aggr = (BoundAggregateExpression &)*aggr;
-		for (idx_t child_idx = 0; child_idx < bound_aggr.children.size(); child_idx++) {
-			auto &child = bound_aggr.children[child_idx];
-			auto ref = make_unique<BoundReferenceExpression>(child->return_type, expressions.size());
-			types.push_back(child->return_type);
-			expressions.push_back(move(child));
-			bound_aggr.children[child_idx] = move(ref);
+		for (auto &child_ : bound_aggr.children) {
+			bool already_in = false;
+			for (size_t i = 0; i < expressions.size(); i++) {
+				auto *base_expr = (BaseExpression *)expressions[i].get();
+				if (child_->Equals(base_expr)) {
Is this necessary for correctness purposes, or just an optimization? Not that I disagree with adding this, just for clarification. I would like to move it to a function, use an expression_map_t instead of a vector, and also use it for the bound_aggr.filter.
comment created time in 2 days
Pull request review comment cwida/duckdb
Implementing Filter Clause for aggregates
+# name: test/sql/filter/test_filter_clause.test
+# description: Test aggregation with filter clause
I would like some more test cases (a sketch follows below):
- Query with many different filter clauses (e.g. 5 aggregates, 5 different filters)
- Filter with some more complex aggregates: COVAR_POP (multiple input columns), STRING_AGG (strings) and ARRAY_AGG (lists)
- DISTINCT aggregates
Also, looking at these tests I would not be surprised if all of them use the perfect hash aggregate. You can force the regular hash aggregate to be used by using very spaced out groups (e.g. [0, 10000000, 20000000, ...]).
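For illustration, hedged sketches of the kinds of tests being requested (table and column names assumed):
-- several aggregates with different filters: multi-input (COVAR_POP),
-- string (STRING_AGG), list (ARRAY_AGG) and DISTINCT aggregates
SELECT COVAR_POP(x, y) FILTER (WHERE z > 0),
       STRING_AGG(s, ',') FILTER (WHERE z < 0),
       ARRAY_AGG(x) FILTER (WHERE z = 0),
       COUNT(DISTINCT x) FILTER (WHERE z <> 1)
FROM tbl
GROUP BY grp;
-- widely spaced groups to force the regular (non-perfect) hash aggregate
SELECT SUM(x) FILTER (WHERE z > 0) FROM tbl GROUP BY grp * 10000000;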
comment created time in 2 days
Pull request review comment cwida/duckdb
Implementing Filter Clause for aggregates
void RemoveUnusedColumns::VisitOperator(LogicalOperator &op) {
 		auto &aggr = (LogicalAggregate &)op;
 		ClearUnusedExpressions(aggr.expressions, aggr.aggregate_index);
-		if (aggr.expressions.size() == 0 && aggr.groups.size() == 0) {
-			// removed all expressions from the aggregate: push a COUNT(*)
-			auto count_star_fun = CountStarFun::GetFunction();
-			aggr.expressions.push_back(
-			    AggregateFunction::BindAggregateFunction(context, count_star_fun, {}, false));
-		}
-	}
+		if (aggr.expressions.size() != 0 || aggr.groups.size() != 0) {
The change here to an empty if followed by an else doesn't make much sense. Perhaps a leftover from earlier code?
comment created time in 2 days
Pull request review comment cwida/duckdb
Implementing Filter Clause for aggregates
void PerfectAggregateHashTable::AddChunk(DataChunk &groups, DataChunk &payload)
 	// after finding the group location we update the aggregates
 	idx_t payload_idx = 0;
-	for (idx_t aggr_idx = 0; aggr_idx < aggregates.size(); aggr_idx++) {
-		auto &aggr = aggregates[aggr_idx];
-		auto input_count = (idx_t)aggr.child_count;
-		aggr.function.update(input_count == 0 ? nullptr : &payload.data[payload_idx], input_count, addresses,
-		                     payload.size());
+	for (auto &aggregate : aggregates) {
+		auto input_count = (idx_t)aggregate.child_count;
+		if (aggregate.filter) {
+			ExpressionExecutor filter_execution(aggregate.filter);
+			SelectionVector true_sel(STANDARD_VECTOR_SIZE);
This seems like the exact same code as the regular AggregateHashtable. I would unify it with that code by using a static function (AggregateHashtable::UpdateAggregate(...)).
comment created time in 2 days
Pull request review comment cwida/duckdb
Implementing Filter Clause for aggregates
class PhysicalHashAggregate : public PhysicalSink {
 	//! Pointers to the aggregates
 	vector<BoundAggregateExpression *> bindings;
+	//! Map between payload index and input index for filters
+	unordered_map<Expression *, std::pair<bool, unordered_map<size_t, size_t>>> filter_map;
This seems overly complicated; why not just add the filter right after the regular payload of an aggregate?
comment created time in 2 days
Pull request review comment cwida/duckdb
Implementing Filter Clause for aggregates
string PhysicalPerfectHashAggregate::ParamsToString() const {
 		result += groups[i]->GetName();
 	}
 	for (idx_t i = 0; i < aggregates.size(); i++) {
-		if (i > 0 || groups.size() > 0) {
+		if (i > 0 || !groups.empty()) {
 			result += "\n";
 		}
 		result += aggregates[i]->GetName();
+		auto &aggregate = (BoundAggregateExpression &)*aggregates[i];
+		if (aggregate.filter) {
+			result += aggregate.filter->GetName();
"FILTER " + ...
comment created time in 2 days
Pull request review comment cwida/duckdb
Implementing Filter Clause for aggregates
void PhysicalPerfectHashAggregate::Sink(ExecutionContext &context, GlobalOperato
 		group_chunk.data[group_idx].Reference(input.data[bound_ref_expr.index]);
 	}
 	idx_t aggregate_input_idx = 0;
-	for (idx_t i = 0; i < aggregates.size(); i++) {
-		auto &aggr = (BoundAggregateExpression &)*aggregates[i];
+	for (auto &aggregate : aggregates) {
+		auto &aggr = (BoundAggregateExpression &)*aggregate;
 		for (auto &child_expr : aggr.children) {
 			D_ASSERT(child_expr->type == ExpressionType::BOUND_REF);
 			auto &bound_ref_expr = (BoundReferenceExpression &)*child_expr;
 			aggregate_input_chunk.data[aggregate_input_idx++].Reference(input.data[bound_ref_expr.index]);
 		}
+		if (aggr.filter) {
+			vector<LogicalType> types;
+			vector<vector<Expression *>> bound_refs;
+			BoundAggregateExpression::GetColumnRef(aggr.filter.get(), bound_refs, types);
Same here; can't this just refer to the filter as computed in the projection above it?
comment created time in 2 days
Pull request review comment cwida/duckdb
Implementing Filter Clause for aggregates
void PhysicalSimpleAggregate::GetChunkInternal(ExecutionContext &context, DataCh
 string PhysicalSimpleAggregate::ParamsToString() const {
 	string result;
 	for (idx_t i = 0; i < aggregates.size(); i++) {
+		auto &aggregate = (BoundAggregateExpression &)*aggregates[i];
 		if (i > 0) {
 			result += "\n";
 		}
 		result += aggregates[i]->GetName();
+		if (aggregate.filter) {
+			result += aggregate.filter->GetName();
"FILTER " + ...
comment created time in 2 days
Pull request review comment cwida/duckdb
Implementing Filter Clause for aggregates
string PhysicalHashAggregate::ParamsToString() const {
 		result += groups[i]->GetName();
 	}
 	for (idx_t i = 0; i < aggregates.size(); i++) {
-		if (i > 0 || groups.size() > 0) {
+		auto &aggregate = (BoundAggregateExpression &)*aggregates[i];
+		if (i > 0 || !groups.empty()) {
 			result += "\n";
 		}
 		result += aggregates[i]->GetName();
+		if (aggregate.filter) {
+			result += aggregate.filter->GetName();
Maybe add "FILTER " before this to the output, to make it clear that this is a filter op (similar to how it appears in a SQL statement).
comment created time in 2 days
Pull request review comment cwida/duckdb
Implementing Filter Clause for aggregates
void PhysicalHashAggregate::Sink(ExecutionContext &context, GlobalOperatorState
 		group_chunk.data[group_idx].Reference(input.data[bound_ref_expr.index]);
 	}
 	idx_t aggregate_input_idx = 0;
-	for (idx_t i = 0; i < aggregates.size(); i++) {
-		auto &aggr = (BoundAggregateExpression &)*aggregates[i];
+	for (auto &aggregate : aggregates) {
+		auto &aggr = (BoundAggregateExpression &)*aggregate;
 		for (auto &child_expr : aggr.children) {
 			D_ASSERT(child_expr->type == ExpressionType::BOUND_REF);
 			auto &bound_ref_expr = (BoundReferenceExpression &)*child_expr;
 			aggregate_input_chunk.data[aggregate_input_idx++].Reference(input.data[bound_ref_expr.index]);
 		}
+		if (aggr.filter) {
+			vector<LogicalType> types;
+			vector<vector<Expression *>> bound_refs;
+			BoundAggregateExpression::GetColumnRef(aggr.filter.get(), bound_refs, types);
This seems very complicated; can't this just refer to the filter as computed in the projection above it?
comment created time in 2 days