Guodong Jin (ray6080, @dbiir), Beijing, China. http://iir.ruc.edu.cn/~guodong/ I am a Ph.D. candidate in the @dbiir Lab at Renmin University of China (RUC), working on database systems.

dbiir/rainbow 63

A data layout optimization framework for wide tables stored on HDFS. See rainbow's webpage

pixelsdb/pixels 3

A storage engine designed and optimized for big data analysis, especially on wide tables.

dbiir/mvn-repo 1

The maven repository of our group.

ray6080/duckdb 1

DuckDB is an embeddable SQL OLAP Database Management System

hagen666/pard 0

Parallel Database Running like a Leopard

Jan-zou/HackingPython 0

Reading Python 2.7.12 source code.

ray6080/6.824 0

https://pdos.csail.mit.edu/6.824/schedule.html

ray6080/agensgraph 0

AgensGraph, a transactional graph database based on PostgreSQL

ray6080/aircompressor 0

A port of Snappy, LZO and LZ4 to Java

ray6080/alluxio 0

Alluxio, formerly Tachyon, Memory Speed Virtual Distributed Storage System

push event postgres/postgres

Tom Lane

commit sha 0ab177bec1c546d2b98b5934a817b23bcc335ca9

Remove faulty support for MergeAppend plan with WHERE CURRENT OF.

Somebody extended search_plan_tree() to treat MergeAppend exactly like Append, which is 100% wrong, because unlike Append we can't assume that only one input node is actively returning tuples. Hence a cursor using a MergeAppend across a UNION ALL or inheritance tree could falsely match a WHERE CURRENT OF query at a row that isn't actually the cursor's current output row, but coincidentally has the same TID (in a different table) as the current output row.

Delete the faulty code; this means that such a case will now return an error like 'cursor "foo" is not a simply updatable scan of table "bar"', instead of silently misbehaving. Users should not find that surprising though, as the same cursor query could have failed that way already depending on the chosen plan. (It would fail like that if the sort were done with an explicit Sort node instead of MergeAppend.)

Expand the clearly-inadequate commentary to be more explicit about what this code is doing, in hopes of forestalling future mistakes.

It's been like this for awhile, so back-patch to all supported branches.

Discussion: https://postgr.es/m/482865.1611075182@sss.pgh.pa.us

view details

push time in 5 minutes

push event postgres/postgres

Tom Lane

The same commit was back-patched to the other supported branches, under commit SHAs 188cd4f440ed6bb2b3120ade9a2277c91d79215c, 6253159965d563dd0e416e064b760e381a60b8e8, 794562d0770ae0ba4096c57c116e80a2be043fbf, a0efda88a679edaee9855628cb05b2ab00d80a15, fac54bd5e216c18d921b7ba18b30e8f8139034b6, and fe8edbb8267adf24ba3b392ac6229b96c4287f93; the commit message is identical in each push.

push time in 5 minutes

issue comment dbiir/UER-py

I want to build a binary classifier for "does this pair of phrases have the same meaning?"; what approach would work best?

Supervised binary classification on a single sentence is easy enough to understand, but what I need to classify is whether a pair of sentences means the same thing. It seems I could merge the two sentences into one and run binary classification on the merged sentence, but I am not sure what kind of merging the model supports. Can I simply join the two phrases into one sentence with a semicolon and label it like this?

label text

0 张三的高端电脑;张三的低端电脑

0 张三的高端电脑;张八的高端电脑

1 张三的高端电脑;张三的高端的电脑

1 张三的高端电脑;张三的高端计算机

Label 0 means "different meaning", i.e. the key information differs; label 1 means "same meaning", i.e. the key information is the same. I also came across unsupervised methods called sentence embeddings (Sentence Embedding); would embedding the two sentences and then comparing their similarity be feasible? Does UER have components for that?

Maybe try this? https://www.sbert.net/examples/applications/cross-encoder/README.html

Thanks, I will look into this as well.

srhouyu

comment created time in an hour

issue comment dbiir/UER-py

I want to build a binary classifier for "does this pair of phrases have the same meaning?"; what approach would work best?

Hi, run_classifier.py supports text-pair classification. The dataset file needs three columns, label, text_a, and text_b, separated by \t.

For details, see the run_classifier.py text-pair classification example on the downstream task fine-tuning wiki page, https://github.com/dbiir/UER-py/wiki/%E4%B8%8B%E6%B8%B8%E4%BB%BB%E5%8A%A1%E5%BE%AE%E8%B0%83 , which classifies the LCQMC text-pair dataset. Get the LCQMC example running, then swap in your own dataset.
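The three-column, tab-separated format described above can be sketched as follows. This is a minimal illustration, not taken from UER-py itself; the file name and example rows are hypothetical, and whether a header line is expected may depend on the UER-py version.

```python
# Hypothetical sketch: write a text-pair dataset in the
# label \t text_a \t text_b layout that run_classifier.py reads.
import csv
import os
import tempfile

pairs = [
    (0, "张三的高端电脑", "张三的低端电脑"),    # different meaning
    (1, "张三的高端电脑", "张三的高端计算机"),  # same meaning
]

path = os.path.join(tempfile.gettempdir(), "train_pairs.tsv")
with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(["label", "text_a", "text_b"])  # column names
    for label, a, b in pairs:
        writer.writerow([label, a, b])

with open(path, encoding="utf-8") as f:
    rows = f.read().splitlines()
print(rows[0])  # the header line
```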

Thanks, this is exactly what I need.

srhouyu

comment created time in 2 hours

Pull request review comment cwida/duckdb

Pre-filtering data in zonemaps and #1303

+#include "duckdb/execution/expression_executor.hpp"
+#include "duckdb/optimizer/rule/in_clause_simplification.hpp"
+#include "duckdb/planner/expression/list.hpp"
+#include "duckdb/planner/expression/bound_operator_expression.hpp"
+
+namespace duckdb {
+
+InClauseSimplificationRule::InClauseSimplificationRule(ExpressionRewriter &rewriter) : Rule(rewriter) {
+	// match on InClauseExpression that has a ConstantExpression as a check
+	auto op = make_unique<InClauseExpressionMatcher>();
+	op->policy = SetMatcher::Policy::SOME;
+	root = move(op);
+}
+
+unique_ptr<Expression> InClauseSimplificationRule::Apply(LogicalOperator &op, vector<Expression *> &bindings,
+                                                         bool &changes_made) {
+	D_ASSERT(bindings[0]->expression_class == ExpressionClass::BOUND_OPERATOR);
+	auto expr = (BoundOperatorExpression *)bindings[0];
+	if (expr->children[0]->expression_class != ExpressionClass::BOUND_CAST) {
+		return nullptr;
+	}
+	auto cast_expression = (BoundCastExpression *)expr->children[0].get();
+	if (cast_expression->child->expression_class != ExpressionClass::BOUND_COLUMN_REF) {
+		return nullptr;
+	}
+	//! Here we check if we can apply the expression on the constant side
+	auto target_type = cast_expression->source_type();
+	if (!BoundCastExpression::CastIsInvertible(target_type, cast_expression->return_type)) {
+		return nullptr;
+	}
+	for (size_t i{1}; i < expr->children.size(); i++) {
+		if (expr->children[i]->expression_class != ExpressionClass::BOUND_CONSTANT) {
+			return nullptr;
+		}
+		D_ASSERT(expr->children[i]->IsFoldable());
+		auto constant_value = ExpressionExecutor::EvaluateScalar(*expr->children[i]);
+		auto new_constant = constant_value.TryCastAs(target_type);
+		if (new_constant) {
+			//! We can cast, so we move the new constant
+			auto new_constant_expr = make_unique<BoundConstantExpression>(constant_value);
+			expr->children[i] = move(new_constant_expr);

good catch

pdet

comment created time in 3 hours

started alibaba/libvineyard

started time in 5 hours

started mars-project/mars

started time in 5 hours

started popey/sosumi-snap

started time in 6 hours

push event postgres/postgres

Peter Eisentraut

commit sha f18aa1b203930ed28cfe42e82d3418ae6277576d

pageinspect: Change block number arguments to bigint

Block numbers are 32-bit unsigned integers. Therefore, the smallest SQL integer type that they can fit in is bigint. However, in the pageinspect module, most input and output parameters dealing with block numbers were declared as int. The behavior with block numbers larger than a signed 32-bit integer was therefore dubious. Change these arguments to type bigint and add some more explicit error checking on the block range. (Other contrib modules appear to do this correctly already.)

Since we are changing argument types of existing functions, in order to not misbehave if the binary is updated before the extension is updated, we need to create new C symbols for the entry points, similar to how it's done in other extensions as well.

Reported-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/d8f6bdd536df403b9b33816e9f7e0b9d@G08CNEXMBPEKD05.g08.fujitsu.local

view details

push time in 8 hours

Pull request review comment cwida/duckdb

Pre-filtering data in zonemaps and #1303

+#include "duckdb/execution/expression_executor.hpp"
+#include "duckdb/optimizer/rule/in_clause_simplification.hpp"
+#include "duckdb/planner/expression/list.hpp"
+#include "duckdb/planner/expression/bound_operator_expression.hpp"
+
+namespace duckdb {
+
+InClauseSimplificationRule::InClauseSimplificationRule(ExpressionRewriter &rewriter) : Rule(rewriter) {
+	// match on InClauseExpression that has a ConstantExpression as a check
+	auto op = make_unique<InClauseExpressionMatcher>();
+	op->policy = SetMatcher::Policy::SOME;
+	root = move(op);
+}
+
+unique_ptr<Expression> InClauseSimplificationRule::Apply(LogicalOperator &op, vector<Expression *> &bindings,
+                                                         bool &changes_made) {
+	D_ASSERT(bindings[0]->expression_class == ExpressionClass::BOUND_OPERATOR);
+	auto expr = (BoundOperatorExpression *)bindings[0];
+	if (expr->children[0]->expression_class != ExpressionClass::BOUND_CAST) {
+		return nullptr;
+	}
+	auto cast_expression = (BoundCastExpression *)expr->children[0].get();
+	if (cast_expression->child->expression_class != ExpressionClass::BOUND_COLUMN_REF) {
+		return nullptr;
+	}
+	//! Here we check if we can apply the expression on the constant side
+	auto target_type = cast_expression->source_type();
+	if (!BoundCastExpression::CastIsInvertible(target_type, cast_expression->return_type)) {
+		return nullptr;
+	}
+	for (size_t i{1}; i < expr->children.size(); i++) {

size_t i = 1 please

pdet

comment created time in 9 hours

Pull request review comment cwida/duckdb

Pre-filtering data in zonemaps and #1303

+#include "duckdb/execution/expression_executor.hpp"
+#include "duckdb/optimizer/rule/in_clause_simplification.hpp"
+#include "duckdb/planner/expression/list.hpp"
+#include "duckdb/planner/expression/bound_operator_expression.hpp"
+
+namespace duckdb {
+
+InClauseSimplificationRule::InClauseSimplificationRule(ExpressionRewriter &rewriter) : Rule(rewriter) {
+	// match on InClauseExpression that has a ConstantExpression as a check
+	auto op = make_unique<InClauseExpressionMatcher>();
+	op->policy = SetMatcher::Policy::SOME;
+	root = move(op);
+}
+
+unique_ptr<Expression> InClauseSimplificationRule::Apply(LogicalOperator &op, vector<Expression *> &bindings,
+                                                         bool &changes_made) {
+	D_ASSERT(bindings[0]->expression_class == ExpressionClass::BOUND_OPERATOR);
+	auto expr = (BoundOperatorExpression *)bindings[0];
+	if (expr->children[0]->expression_class != ExpressionClass::BOUND_CAST) {
+		return nullptr;
+	}
+	auto cast_expression = (BoundCastExpression *)expr->children[0].get();
+	if (cast_expression->child->expression_class != ExpressionClass::BOUND_COLUMN_REF) {
+		return nullptr;
+	}
+	//! Here we check if we can apply the expression on the constant side
+	auto target_type = cast_expression->source_type();
+	if (!BoundCastExpression::CastIsInvertible(target_type, cast_expression->return_type)) {
+		return nullptr;
+	}
+	for (size_t i{1}; i < expr->children.size(); i++) {
+		if (expr->children[i]->expression_class != ExpressionClass::BOUND_CONSTANT) {
+			return nullptr;
+		}
+		D_ASSERT(expr->children[i]->IsFoldable());
+		auto constant_value = ExpressionExecutor::EvaluateScalar(*expr->children[i]);
+		auto new_constant = constant_value.TryCastAs(target_type);
+		if (new_constant) {
+			//! We can cast, so we move the new constant
+			auto new_constant_expr = make_unique<BoundConstantExpression>(constant_value);
+			expr->children[i] = move(new_constant_expr);

Shouldn't we first check if all children can be cast before actually modifying the IN operator? What if we have e.g.

SELECT x::VARCHAR IN ('1', y) FROM (VALUES (1, 2), (2, 3)) tbl(x, y);

The first element contains an invertible cast ('1' -> 1), but the second element is not invertible. Could you add a test that verifies this does not give problems?
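The reviewer's suggested pattern, validate every element before mutating any of them so a late failure cannot leave the expression half-rewritten, can be sketched like this. The helper names below are hypothetical Python stand-ins, not DuckDB API (`try_invert_cast` plays the role of the CastIsInvertible/TryCastAs checks):

```python
def try_invert_cast(value, target_type):
    """Return value cast to target_type, or None if the cast fails."""
    try:
        return target_type(value)
    except (TypeError, ValueError):
        return None

def rewrite_in_list(children, target_type):
    # First phase: attempt every cast; abandon the rewrite entirely
    # if any single element is not castable.
    casted = []
    for value in children:
        new_value = try_invert_cast(value, target_type)
        if new_value is None:
            return children  # original list untouched
        casted.append(new_value)
    # Second phase: commit only after every cast succeeded.
    return casted

print(rewrite_in_list(["1", "2"], int))  # all castable: replaced
print(rewrite_in_list(["1", "y"], int))  # "y" fails: left unchanged
```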

pdet

comment created time in 9 hours

Pull request review comment cwida/duckdb

Pre-filtering data in zonemaps and #1303

 FilterPropagateResult StatisticsPropagator::PropagateComparison(BaseStatistics &
 	default:
 		return FilterPropagateResult::NO_PRUNING_POSSIBLE;
 	}
+	switch (right.type.InternalType()) {

Any reason for adding this check? The left and right types should be identical, no?

pdet

comment created time in 9 hours

issue comment cwida/duckdb

Auto Increment Primary Key And/or Serial

Certainly:

echo -e '42\n43\n44' > /tmp/dummy
COPY a(b) FROM '/tmp/dummy';
SELECT * FROM a;
┌───┬────┐
│ i │ b  │
├───┼────┤
│ 1 │ 42 │
│ 2 │ 43 │
│ 3 │ 44 │
└───┴────┘
willium

comment created time in 9 hours

issue comment cwida/duckdb

Auto Increment Primary Key And/or Serial

oh neat! is there any way to use this alongside read_csv/COPY?

willium

comment created time in 10 hours

create branch dbiir/UER-py

branch : t5_dev

created branch time in 10 hours

delete branch dbiir/UER-py

delete branch : t5_dev

delete time in 10 hours

issue comment cwida/duckdb

Return empty json array in case of no results returned

This is again the SQLite shell that does this, not DuckDB

burtgulash

comment created time in 10 hours

issue comment cwida/duckdb

Regression Analysis

Two options: 1) pull those columns into R and run lm there; 2) implement a recursive CTE that computes the fit.
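For option 2, the quantity such a query would compute is just the closed-form least-squares fit. A minimal Python sketch of that computation, assuming simple one-variable regression (in SQL this reduces to a few aggregates over the column pair):

```python
# Closed-form simple linear regression: slope = Sxy / Sxx,
# intercept = mean(y) - slope * mean(x).
def linear_fit(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

# Points on the line y = 2x + 1 recover slope 2 and intercept 1.
print(linear_fit([1, 2, 3, 4], [3, 5, 7, 9]))  # (2.0, 1.0)
```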

waynelapierre

comment created time in 10 hours

issue comment cwida/duckdb

Auto Increment Primary Key And/or Serial

How about using a sequence? For example

CREATE SEQUENCE seq;
CREATE TABLE a (i INTEGER DEFAULT NEXTVAL('seq'), b INTEGER);
INSERT INTO a (b) VALUES (42), (43);
SELECT * FROM a;

Result:

┌───┬────┐
│ i │ b  │
├───┼────┤
│ 1 │ 42 │
│ 2 │ 43 │
└───┴────┘
willium

comment created time in 10 hours

push event postgres/postgres

Fujii Masao

commit sha ee79a548e746da9a99df0cac70a3ddc095f2829a

doc: Add note about the server name of postgres_fdw_get_connections() returns.

Previously the document didn't mention the case where postgres_fdw_get_connections() returns NULL in server_name column. Users might be confused about why NULL was returned. This commit adds the note that, in postgres_fdw_get_connections(), the server name of an invalid connection will be NULL if the server is dropped.

Suggested-by: Zhijie Hou
Author: Bharath Rupireddy
Reviewed-by: Zhijie Hou, Fujii Masao
Discussion: https://postgr.es/m/e7ddd14e96444fce88e47a709c196537@G08CNEXMBPEKD05.g08.fujitsu.local

view details

push time in 12 hours

issue opened cwida/duckdb

Auto Increment Primary Key And/or Serial

While auto-incrementing ids are more useful, common, and idiomatic in an OLTP store, they can be very useful for tracking changesets (especially for caching) in OLAP analytical tasks. To that end, it would be great to be able to specify an AUTO INCREMENT policy on a column (or something more advanced, like PostgreSQL's SERIAL type). It's easy enough to do this manually with a prior COUNT(*) query, a write lock, and bulk INSERT statements, but the only way to add such a column when using a scanner/reader like read_csv is to add a new column and manually UPDATE into it, thereby largely defeating the purpose of those fast import mechanisms. Thoughts?
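The manual workaround described above (count the existing rows, take a write lock, then bulk-insert with explicit ids) can be sketched roughly as follows. This is an illustration only; the function name and data are hypothetical, not DuckDB API:

```python
# Sketch of manual id assignment during a bulk load: the current row
# count serves as an offset, and each incoming row gets the next id.
def assign_ids(rows, existing_count):
    """Prepend an auto-incrementing id to each incoming row tuple."""
    return [(existing_count + i, *row) for i, row in enumerate(rows, start=1)]

# Loading three CSV-like rows into an empty table yields ids 1..3.
new_rows = assign_ids([("42",), ("43",), ("44",)], existing_count=0)
print(new_rows)  # [(1, '42'), (2, '43'), (3, '44')]
```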

created time in 13 hours

push event postgres/postgres

Amit Kapila

commit sha ed43677e20369040ca4e50c698010c39d5ac0f47

pgindent worker.c.

This is a leftover from commit 0926e96c49. Changing this separately because this file is being modified for upcoming patch logical replication of 2PC.

Author: Peter Smith
Discussion: https://postgr.es/m/CAHut+Ps+EgG8KzcmAyAgBUi_vuTps6o9ZA8DG6SdnO0-YuOhPQ@mail.gmail.com

view details

push time in 16 hours

issue comment cwida/duckdb

Maven Central Java package failed in Android

It looks like the version of Java might be too old. But even if you got past that, the binary inside the jar isn't compiled for ARM, so it wouldn't work anyway.

Grufy

comment created time in 16 hours

issue opened cwida/duckdb

Maven Central Java package failed in Android

I tried to adopt DuckDB in Android via Maven Central. Compilation is fine, but I hit an error at runtime.

Could a fix be provided in the Gradle/Maven Central packaging?

Module gradle:

dependencies {
    implementation 'org.duckdb:duckdb_jdbc:0.2.3'
}

Example Java:

url = "jdbc:duckdb:/sdcard/app/test.db";
(DuckDBConnection) DriverManager.getConnection(url);

Error:

java.lang.NoClassDefFoundError: Failed resolution of: [Ljava/nio/file/attribute/FileAttribute;
	at org.duckdb.DuckDBNative.<clinit>(DuckDBNative.java:32)
	at org.duckdb.DuckDBDatabase.<init>(DuckDBDatabase.java:22)
	at org.duckdb.DuckDBDriver.connect(DuckDBDriver.java:35)

created time in 16 hours

issue comment dbiir/UER-py

Could the evaluate statistics that the trainer constructs in the source code be factored out into a standalone configuration class?

The same applies to the various run scripts.

svjack

comment created time in 17 hours
