cwida/duckdb 2485
DuckDB is an in-process SQL OLAP Database Management System
MonetDBLite as a Python Package
MonetDB as a shared library with a C API
TPC-H Query 01 Implementation Optimized for CPU-GPU co-processing
Panther is an open-source, highly efficient text editor written from scratch in C++.
Repository with extra data for DuckDB benchmarking
Detecting Logic Bugs in DBMS
Mytherin/MonetDBLiteBenchmarks 2
Benchmarks for the paper MonetDBLite: An Embedded Analytical Database
Fork of cwida/duckdb
pull request comment cwida/duckdb
Parser clean up: no longer transform multi-node CASE and NULLIF in the transformer
Ha, I think I may have been responsible for those in the first place. All that hard work 😭. Great that this is cleaned up!
comment created time in 11 hours
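For context, a hedged sketch of the desugaring being moved out of the parser (table/column names hypothetical): NULLIF and the multi-node CASE form are both standard-SQL sugar over the plain searched CASE, so they can be handled later in binding instead of in the transformer.

```sql
-- NULLIF(a, b) is sugar for a searched CASE:
SELECT NULLIF(a, b),
       CASE WHEN a = b THEN NULL ELSE a END   -- equivalent spelling
FROM t;

-- the multi-node CASE form likewise desugars to explicit comparisons:
SELECT CASE a WHEN 1 THEN 'one' WHEN 2 THEN 'two' ELSE 'other' END,
       CASE WHEN a = 1 THEN 'one' WHEN a = 2 THEN 'two' ELSE 'other' END
FROM t;
```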
pull request comment cwida/duckdb
Add support for Lambda functions to parser
Nice. Thank you, Mark.
comment created time in 12 hours
issue comment cwida/duckdb
Data conversion: Virtual table streaming and exporting with EXPORT DATABASE
Excellent. We missed that it is possible with COPY, but we have found that all the info we needed is here. We can probably close this issue.
comment created time in 13 hours
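For reference, a minimal sketch of the COPY approach referred to above (file names hypothetical); DuckDB's COPY ... TO accepts an arbitrary query as its source:

```sql
-- stream a CSV into a Parquet file without materializing a table first
COPY (SELECT * FROM read_csv_auto('test.csv'))
TO 'test.parquet' (FORMAT PARQUET);
```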
pull request comment cwida/duckdb
Refactor and nested types support for Parquet Reader
Performance went down ~10%, so I need to investigate what's going on there.
comment created time in 17 hours
PR opened cwida/duckdb
This PR refactors and extends the Parquet reader. A major feature addition is support for nested types in Parquet files, which are mapped to DuckDB's STRUCT and LIST types. Under the hood, the Parquet reader now reads strings zero-copy, which should increase performance.
pr created time in a day
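A hedged sketch of what the nested-type support enables (file and column names hypothetical): nested Parquet columns surface as STRUCT and LIST values, and struct fields can be dereferenced with dot syntax:

```sql
SELECT s.field_a,   -- STRUCT field access via dot notation
       list_col     -- LIST column comes back as a list value
FROM parquet_scan('nested.parquet');
```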
pull request comment cwida/duckdb
Add support for Lambda functions to parser
Mark, thanks for explaining. To clarify the terminology: the lambda in filter(array_column, x -> x > int_column) has a single capture (int_column) and a single argument (x). In map_filter(m, (k, v) -> k > 10 AND v < 0) we have a lambda with no captures and two arguments: k and v. Hence, I think we should rename capture_name above into something like argument_names and make it a vector, not a single value.
comment created time in a day
pull request comment cwida/duckdb
Add support for Lambda functions to parser
Added optional type specification for lambdas.
comment created time in a day
pull request comment cwida/duckdb
Add support for Lambda functions to parser
Mark, this is great. To confirm: does this PR include support for multiple arguments for a lambda, e.g. map_filter(m, (k, v) -> k > 10 AND v < 0), and does it include support for captures, e.g. filter(array_column, x -> x > int_column)?
comment created time in a day
pull request comment cwida/duckdb
Add support for Lambda functions to parser
CC @mbasmanova
comment created time in a day
pull request comment cwida/duckdb
Avoid using arithmetic on strings in dbgen (minor compilation fix)
Thank you, Mark.
comment created time in 2 days
issue comment cwida/duckdb
Data conversion: Virtual table streaming and exporting with EXPORT DATABASE
That is extremely interesting, but leads to some questions.
- I am assuming the above would be single-threaded; is that correct? Are there portions of the workload that run in parallel?
- Could the "SELECT * FROM read_csv_auto('test.csv')" include extended SQL syntax (joins, filters, aggregates, expressions, ordering)?
comment created time in 2 days
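On the second question above, a hedged sketch (file and column names hypothetical): read_csv_auto is an ordinary table function, so it composes with joins, filters, aggregates, and ordering like any other table:

```sql
SELECT t.category, count(*) AS n, avg(t.price) AS avg_price
FROM read_csv_auto('test.csv') AS t
JOIN read_csv_auto('categories.csv') AS c ON t.category = c.name
WHERE t.price > 0
GROUP BY t.category
ORDER BY n DESC;
```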
issue opened cwida/duckdb
Data conversion: Virtual table streaming and exporting with EXPORT DATABASE
DuckDB is powerful in that it can write Parquet as well as read it, among other formats. This naturally raises the question: can DuckDB be used to stream data from source to destination while doing "local" transformations that do not require the full table to be present (unlike, say, sorting without an index)?
In our tests (@jupiter) we have noticed that DuckDB uses quite a lot of memory (compared to e.g. a Node.js streaming solution) when reading from a CSV file (read_csv_auto) and exporting it to Parquet without any transformations or sorting. Does this mean that DuckDB's memory consumption scales with the source data size? What if the CSV file were "infinite", a never-ending file?
created time in 2 days
issue comment cwida/duckdb
regexp_matches() does not recognise "(?!"
Yes, I think RE2 is POSIX-oriented and differs from PCRE in this respect. Some negative-lookahead regexps can be converted to positive ones by adding a NOT at the SQL level, so that helps us. But I think it would be nicer to have PCRE supported.
comment created time in 2 days
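A hedged sketch of the NOT rewrite mentioned above, applied to the regexp from this issue (the table name requests is hypothetical):

```sql
-- RE2 rejects '(?!...)'; negate a positive match at the SQL level instead:
SELECT *
FROM requests
WHERE NOT regexp_matches(header_content_type,
                         '^ *[tT][eE][xX][tT]/[Hh][Tt][Mm][Ll]');
```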
issue comment cwida/duckdb
Syntax error for WITH RECURSIVE query
@lnkuiper, if you look at this, consider cleaning up https://github.com/cwida/duckdb/tree/master/test/ldbc, which has an old version of the LDBC queries.
comment created time in 2 days
pull request comment cwida/duckdb
Read-only mode and shutdown for R client
Hi, not sure if this is still relevant. I managed to properly shut down a DuckDB database on Windows, so that I did not get an error when reopening it (as in issue #323):

```r
library(duckdb)
# FAILS
con <- dbConnect(duckdb(), dbdir = "test.duckdb")
dbWriteTable(con, "iris", iris, overwrite = TRUE)
dbDisconnect(con)
con <- dbConnect(duckdb(), dbdir = "test.duckdb")
# Error in initialize(value, ...) :
#   duckdb_startup_R: Failed to open database

# REMEDIES FAILURE, THEN SUCCEEDS
dbDisconnect(con, shutdown = TRUE)
# Warning message:
# Connection already closed.
con <- dbConnect(duckdb(), dbdir = "test.duckdb")
dbWriteTable(con, "iris", iris, overwrite = TRUE)
dbDisconnect(con, shutdown = TRUE)

# SUCCEEDS :)
con <- dbConnect(duckdb(), dbdir = "test.duckdb")
dbWriteTable(con, "iris", iris, overwrite = TRUE)
dbDisconnect(con, shutdown = TRUE)
con <- dbConnect(duckdb(), dbdir = "test.duckdb")  ## no error
dbWriteTable(con, "iris", iris, overwrite = TRUE)
dbDisconnect(con, shutdown = TRUE)
```
comment created time in 2 days
issue opened cwida/duckdb
regexp_matches() does not recognise "(?!"
We have this kind of regular expression, which works e.g. with NodeJS string.match(), but not with DuckDB.
regexp_matches(header_content_type, ' *(?![tT][eE][xX][tT]/[Hh][Tt][Mm][Ll]).*')
invalid perl operator: (?!
Is this a bug or a feature? :)
created time in 3 days
started jasonge27/fastQuantile
started time in 3 days
issue closed cwida/duckdb
Auto Increment Primary Key And/or Serial
While auto-incrementing ideas are more useful, common, and idiomatic in an OLTP store, they can be very useful for tracking changesets (especially for caching) in OLAP analytical tasks. Toward that end, it would be great to have the ability to specify an AUTO INCREMENT policy on a column (or something more advanced, like PostgreSQL's SERIAL flag). While it's easy enough to do this manually with a prior COUNT(*) query, a write-lock, and bulk insert statements, the only way to add such a column when using a scanner/reader like read_csv is to add a new column and manually UPDATE into that column (thereby ~defeating the purpose of those fast import mechanisms). Thoughts?
closed time in 3 days
willium issue comment cwida/duckdb
In general, we should provide Android packages. It appears that different SDKs need to be used to create those builds, and there are some CPU differences in the resulting builds. If possible, those binaries could be integrated into the normal JDBC driver, but I have my doubts.
comment created time in 3 days
issue closed cwida/duckdb
I'm a C++ beginner. I am trying to run the C++ example in CLion and debug it (Win10, 64-bit).
I got an error like "-lduckdb failed", so I downloaded the duckdb library from the website and put it in my MinGW directory.
Now CLion can find the library, but the file is not recognized:
"D:\JetBrains\CLion 2020.3.1\bin\cmake\win\bin\cmake.exe" --build D:\code\duckdb\examples\embedded-c++\cmake-build-debug-mingw --target all -- -j 6
[ 50%] Linking CXX executable example.exe
D:/mingw/mingw32/bin/../lib/gcc/i686-w64-mingw32/8.1.0/../../../../lib/duckdb.dll: file not recognized: File format not recognized
collect2.exe: error: ld returned 1 exit status
mingw32-make.exe[2]: *** [CMakeFiles\example.dir\build.make:106: example.exe] Error 1
mingw32-make.exe[1]: *** [CMakeFiles\Makefile2:95: CMakeFiles/example.dir/all] Error 2
mingw32-make.exe: *** [Makefile:103: all] Error 2
Are there any suggestions to solve this problem?
closed time in 3 days
BowenXiao1999 issue closed cwida/duckdb
Does DuckDB provide regression analysis functions?
closed time in 3 days
waynelapierre PR opened cwida/duckdb
This PR gives an implementation of the FILTER clause for aggregates (#896); the binding process is potentially over-complicated, so please check that.
pr created time in 3 days
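A minimal sketch of the syntax this PR implements (table/column names hypothetical); FILTER restricts which rows feed a given aggregate:

```sql
SELECT count(*)                        AS all_rows,
       count(*) FILTER (WHERE b > 0)   AS positive_rows,
       avg(b)   FILTER (WHERE b < 100) AS avg_small_b
FROM t;
```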
Pull request review comment cwida/duckdb
Pre-filtering data in zonemaps and #1303
```diff
+#include "duckdb/execution/expression_executor.hpp"
+#include "duckdb/optimizer/rule/in_clause_simplification.hpp"
+#include "duckdb/planner/expression/list.hpp"
+#include "duckdb/planner/expression/bound_operator_expression.hpp"
+
+namespace duckdb {
+
+InClauseSimplificationRule::InClauseSimplificationRule(ExpressionRewriter &rewriter) : Rule(rewriter) {
+	// match on InClauseExpression that has a ConstantExpression as a check
+	auto op = make_unique<InClauseExpressionMatcher>();
+	op->policy = SetMatcher::Policy::SOME;
+	root = move(op);
+}
+
+unique_ptr<Expression> InClauseSimplificationRule::Apply(LogicalOperator &op, vector<Expression *> &bindings,
+                                                         bool &changes_made) {
+	D_ASSERT(bindings[0]->expression_class == ExpressionClass::BOUND_OPERATOR);
+	auto expr = (BoundOperatorExpression *)bindings[0];
+	if (expr->children[0]->expression_class != ExpressionClass::BOUND_CAST) {
+		return nullptr;
+	}
+	auto cast_expression = (BoundCastExpression *)expr->children[0].get();
+	if (cast_expression->child->expression_class != ExpressionClass::BOUND_COLUMN_REF) {
+		return nullptr;
+	}
+	//! Here we check if we can apply the expression on the constant side
+	auto target_type = cast_expression->source_type();
+	if (!BoundCastExpression::CastIsInvertible(target_type, cast_expression->return_type)) {
+		return nullptr;
+	}
+	for (size_t i{1}; i < expr->children.size(); i++) {
+		if (expr->children[i]->expression_class != ExpressionClass::BOUND_CONSTANT) {
+			return nullptr;
+		}
+		D_ASSERT(expr->children[i]->IsFoldable());
+		auto constant_value = ExpressionExecutor::EvaluateScalar(*expr->children[i]);
+		auto new_constant = constant_value.TryCastAs(target_type);
+		if (new_constant) {
+			//! We can cast, so we move the new constant
+			auto new_constant_expr = make_unique<BoundConstantExpression>(constant_value);
+			expr->children[i] = move(new_constant_expr);
```
good catch
comment created time in 3 days
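A hedged illustration of the rewrite the reviewed rule performs (table/column names hypothetical):

```sql
-- Without the rule, comparing a narrow column against wider constants casts
-- the column side, which blocks zonemap pre-filtering:
--     WHERE CAST(small_col AS INTEGER) IN (1, 2)
-- When the cast is invertible, the rule casts the constants instead:
--     WHERE small_col IN (CAST(1 AS SMALLINT), CAST(2 AS SMALLINT))
SELECT * FROM t WHERE small_col IN (1, 2);
```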
issue comment cwida/duckdb
Auto Increment Primary Key And/or Serial
Certainly:
echo -e '42\n43\n44' > /tmp/dummy
COPY a(b) FROM '/tmp/dummy';
SELECT * FROM a;
┌───┬────┐
│ i │ b  │
├───┼────┤
│ 1 │ 42 │
│ 2 │ 43 │
│ 3 │ 44 │
└───┴────┘
comment created time in 4 days
issue comment cwida/duckdb
Auto Increment Primary Key And/or Serial
oh neat! is there any way to use this alongside read_csv/COPY?
comment created time in 4 days
issue comment cwida/duckdb
Return empty json array in case of no results returned
Again, it is the SQLite shell that does this, not DuckDB.
comment created time in 4 days
issue comment cwida/duckdb
Two options: 1) pull those columns into R and run lm there; 2) implement a recursive CTE that computes the fit.
comment created time in 4 days
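For option 2, a recursive CTE is one route, but a simple OLS fit can also be computed in closed form with plain aggregates; a hedged sketch (table/column names hypothetical, assuming DOUBLE columns):

```sql
-- slope b = (n*Σxy - Σx*Σy) / (n*Σx² - (Σx)²), intercept a = ȳ - b*x̄
SELECT (count(*) * sum(x * y) - sum(x) * sum(y))
         / (count(*) * sum(x * x) - sum(x) * sum(x)) AS slope,
       avg(y) - (count(*) * sum(x * y) - sum(x) * sum(y))
         / (count(*) * sum(x * x) - sum(x) * sum(x)) * avg(x) AS intercept
FROM points;
```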
issue comment cwida/duckdb
Auto Increment Primary Key And/or Serial
How about using a sequence? For example:
CREATE SEQUENCE seq;
CREATE TABLE a (i INTEGER DEFAULT NEXTVAL('seq'), b INTEGER);
INSERT INTO a (b) VALUES (42), (43);
SELECT * FROM a;
Result:
┌───┬────┐
│ i │ b  │
├───┼────┤
│ 1 │ 42 │
│ 2 │ 43 │
└───┴────┘
comment created time in 4 days
issue opened cwida/duckdb
Auto Increment Primary Key And/or Serial
While auto-incrementing ideas are more useful, common, and idiomatic in an OLTP store, they can be very useful for tracking changesets (especially for caching) in OLAP analytical tasks. Toward that end, it would be great to have the ability to specify an AUTO INCREMENT policy on a column (or something more advanced, like PostgreSQL's SERIAL flag). While it's easy enough to do this manually with a prior COUNT(*) query, a write-lock, and bulk insert statements, the only way to add such a column when using a scanner/reader like read_csv is to add a new column and manually UPDATE into that column (thereby ~defeating the purpose of those fast import mechanisms). Thoughts?
created time in 4 days