Uwe L. Korn (xhochy), @Quantco, Karlsruhe, http://uwekorn.com
Data Engineering @Quantco; Apache Arrow and Parquet PMC; conda-forge core; PyData Südwest organizer.

blue-yonder/turbodbc 434

Turbodbc is a Python module to access relational databases via the Open Database Connectivity (ODBC) interface. The module complies with the Python Database API Specification 2.0.

conda/conda-pack 222

Package conda environments for redistribution

lorenzhs/ssssort 17

Super Scalar Sample Sort in modern C++

xhochy/altair-vue-vega-example 13

An example web app that displays data using Altair, Vega and VueJS

lorenzhs/instavpn 10

ABANDONED :: Create and set up an instant VPN using DigitalOcean and sshuttle

lorenzhs/hubot-tell 8

We've moved! Head over to our repo on the organisation page:

leonhandreke/configdich 6

A configuration management system for medium to large-scale deployments of OpenWRT machines.

leonhandreke/turboolaf 5

Cutting-edge, hardware accelerated Olaf!

MichaelAquilina/SpamFilter 5

Classification of emails using machine learning and natural language processing techniques in Java

wesm/arrow 5

Mirror of Apache Arrow

PR opened apache/orc

ORC-739: Use Maven Wrapper in java/CMakeLists.txt

What changes were proposed in this pull request?

This PR aims to replace mvn with mvnw in java/CMakeLists.txt.

Why are the changes needed?

Some old OSes like CentOS 7 may have an outdated Maven installation. Using the Maven Wrapper will prevent build and test failures.

How was this patch tested?

Pass the CIs.

+3 -3

0 comment

1 changed file

pr created time in 17 minutes

Pull request review comment snowflakedb/snowflake-connector-python

relax constraint on pyjwt version (fixes #586)

 def _get_arrow_lib_as_linker_input(self):
         'pyOpenSSL>=16.2.0,<20.0.0',
         'cffi>=1.9,<2.0.0',
         'cryptography>=2.5.0,<4.0.0',
-        'pyjwt<2.0.0',
+        'pyjwt',

This is the specific reason why I introduced the PEP-517 build, and I was hoping that having a production-grade build openly documented in our GitHub Actions would further aid users of this group.

Can you link me to the docs on this? I wasn't aware of an alternative build of snowflake-connector-python that has looser constraints on dependencies.

I don't see an alternative wheel / sdist tarball at https://pypi.org/project/snowflake-connector-python/#files or on the latest release. That isn't mentioned in https://docs.snowflake.com/en/user-guide/python-connector-install.html or https://docs.snowflake.com/en/user-guide/python-connector-dependencies.html.

please add the next major pin

Sure, thanks for hearing me out. Added in https://github.com/snowflakedb/snowflake-connector-python/pull/604/commits/e8ff9a2d2dcec0f1dd75de2814b9665ab809808a

jameslamb

comment created time in 19 minutes

issue comment JuliaStrings/utf8proc

OSS-Fuzz integration

Sounds good!

randy408

comment created time in 37 minutes

push event conda-forge/admin-migrations

cf-blacksmithy

commit sha 53974dc7160a249de41f75df29ff356e1e8cda6e

[ci skip] data for admin migration run

view details

push time in 41 minutes

Pull request review comment apache/arrow

ARROW-11270: [Rust] Array slice accessors

 impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
     }

     /// Returns the primitive value at index `i`.
-    ///
-    /// Note this doesn't do any bound checking, for performance reason.
-    /// # Safety
-    /// caller must ensure that the passed in offset is less than the array len()
+    #[inline]
     pub fn value(&self, i: usize) -> T::Native {
-        let offset = i + self.offset();
-        unsafe { *self.raw_values.as_ptr().add(offset) }
+        self.values()[i]

#9291 is good progress towards eliminating use of the function.

And certainly, we could 'split' the macro for primitives as a quick fix to get rid of the call to the function. I've been experimenting with an alternative approach that might be a bit more flexible to multiple use cases, described at the bottom of this comment.

I am quite torn about whether I think value should or should not be in the interface.

Reasons to drop value(i) -> T::Native

I think that even if value(i) was dropped from the PrimitiveArray impl's, efficient random access to items without a bounds check can still be achieved through unsafe{*primitive_array.values().get_unchecked(i)} (the extra * because get_unchecked() returns a ref to the value).

I'm not sure I have any example code or measurements on hand to demonstrate it, but I am certain I saw the silently-unsafe implementation x.values().iter().zip(y.values().iter()) (slightly) outperform (0..x.len()).map(|i| (x.value(i), y.value(i))). I believe it was when I was playing with non-simd arithmetic kernels. So the root of my hesitancy is that I'm worried value(i) doesn't actually escape any overhead, and unintentionally leads people away from a more reliable/performant way, unless there is some context where unsafe{x.value(i)} beats the performance of unsafe{*x.values().get_unchecked(i)}.

Reasons to keep value(i) -> T::Native

All other array implementations have value functions as far as I recall, so it is a nice 'consistency'.

In the back of my mind, the biggest argument to keep value(i) is for api consistency... so long term, a 'trait' may be the place where it might fit best? Very roughly, I'm thinking:

trait TypedArrowArray : ArrowArray {
   type RefType;
   fn is_valid(&self, i: usize) -> bool; // bounds check
   unsafe fn is_valid_unchecked(&self, i: usize) -> bool; // no bounds check
   fn value(&self, i: usize) -> Self::RefType; // bounds check
   unsafe fn value_unchecked(&self, i: usize) -> Self::RefType; // no bounds check
   fn iter(&self) -> impl Iterator<Item = Option<Self::RefType>>;
   fn iter_values(&self) -> impl Iterator<Item = Self::RefType>;
}
impl<T: ArrowPrimitiveType> TypedArrowArray for PrimitiveArray<T> { type RefType = T::Native; ... }
impl<T> TypedArrowArray for GenericListArray<T> { type RefType = ArrayRef; ... }
// and similar for string/binary. ... I am not sure whether struct arrays could fit... Dictionary would not give access to 'keys', only to the values referenced by each key? Union would require some kind of RefType that can downcast into the actual value?

Of course, I am uncertain how much overhead the 'standardization' such a trait impl implies would bring... would any kernels actually benefit from using generic implementations against such an api, or will they always go down to the concrete type to squeeze out little short-cuts that don't fit in the generic interface? I'm unsure, so very (very, very) slowly experimenting...

Summary

So in short, my thoughts are:

  • I think that leaving the value(i) safety consideration out of this PR makes sense. I've rebased to drop that out, although I did leave the additional values() test code.
  • Marking it unsafe in the near future is absolutely better than leaving it silently-unsafe. The argument that adding bounds checks could silently impact external users is reasonable; making it unsafe gives the larger 'warning' so that the change isn't missed.
  • Longer term, the options of deprecating it, or explicitly moving it into a trait impl, are both contenders in my mind... but neither option is directly relevant to this PR.

Let me know if that seems reasonable.

tyrelr

comment created time in 42 minutes

pull request comment conda-forge/staged-recipes

WIP: Add RStudio

The rebase onto 1.3.1073 didn't apply cleanly; I had to rebase some patches manually. I suggest reviewing https://github.com/conda-forge/staged-recipes/pull/13760/commits/9c2d23d93c0296f5ab8098c4696ddf847a67e8a6 with whitespace changes ignored. Very likely a line-ending thing, which I can fix if absolutely necessary (but these are just the output of git format-patch tags/v1.3.1073; on windows, no less...)

h-vetinari

comment created time in an hour

pull request comment conda-forge/staged-recipes

Add cppbktree via greyskull

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipes/cppbktree) and found it was in an excellent condition.

asford

comment created time in an hour

PR opened conda-forge/staged-recipes

Add cppbktree via greyskull

<!-- Thank you very much for putting in this recipe PR!

This repository is very active, so if you need help with a PR or once it's ready for review, please let the right people know. There are language-specific teams for reviewing recipes.

Currently available teams are:

  • python @conda-forge/help-python
  • python/c hybrid @conda-forge/help-python-c
  • r @conda-forge/help-r
  • java @conda-forge/help-java
  • nodejs @conda-forge/help-nodejs
  • c/c++ @conda-forge/help-c-cpp
  • perl @conda-forge/help-perl
  • Julia @conda-forge/help-julia
  • ruby @conda-forge/help-ruby

If your PR doesn't fall into those categories please contact the full review team @conda-forge/staged-recipes.

Due to GitHub limitations first time contributors to conda-forge are unable to ping these teams. You can ping the team using a special command in a comment on the PR to get the attention of the staged-recipes team. You can also consider asking on our Gitter channel or on our Keybase chat if your recipe isn't reviewed promptly. -->

Checklist

  • [ ] Title of this PR is meaningful: e.g. "Adding my_nifty_package", not "updated meta.yaml"
  • [ ] License file is packaged (see here for an example)
  • [ ] Source is from official source
  • [ ] Package does not vendor other packages. (If a package uses the source of another package, they should be separate packages or the licenses of all packages need to be packaged)
  • [ ] If static libraries are linked in, the license of the static library is packaged.
  • [ ] Build number is 0
  • [ ] A tarball (url) rather than a repo (e.g. git_url) is used in your recipe (see here for more details)
  • [ ] GitHub users listed in the maintainer section have posted a comment confirming they are willing to be listed there
  • [ ] When in trouble, please check our knowledge base documentation before pinging a team.
+44 -0

0 comment

1 changed file

pr created time in an hour

Pull request review comment conda-forge/staged-recipes

WIP: Add RStudio

 source:
       - patches/0039-Add-support-for-Qt-5.10.patch
       - patches/0040-Revert-disable-macOS-specific-Cmd-Shift-handling-clo.patch
       - patches/0041-Fix-missing-boost-placeholders.patch
-      - patches/0042-Conda-We-use-a-shared-soci-library.patch

There was nothing about SOCI_LIBRARIES in 1.3.1073, particularly not in src/cpp/CMakeLists.txt. Skipped this patch.

h-vetinari

comment created time in an hour

pull request comment conda-forge/staged-recipes

Adding whiteboxgui recipe

@conda-forge/help-python Can someone help with the Linux build error? Python 3.9 comes with tkinter, but it throws an ImportError: libX11.so.6: cannot open shared object file: No such file or directory. Thanks.

import: 'whiteboxgui'
Traceback (most recent call last):
  File "/home/conda/staged-recipes/build_artifacts/whiteboxgui_1611356481727/test_tmp/run_test.py", line 2, in <module>
    import whiteboxgui
  File "/home/conda/staged-recipes/build_artifacts/whiteboxgui_1611356481727/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/lib/python3.9/site-packages/whiteboxgui/__init__.py", line 7, in <module>
    from .whiteboxgui import show
  File "/home/conda/staged-recipes/build_artifacts/whiteboxgui_1611356481727/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/lib/python3.9/site-packages/whiteboxgui/whiteboxgui.py", line 8, in <module>
    import whitebox
  File "/home/conda/staged-recipes/build_artifacts/whiteboxgui_1611356481727/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/lib/python3.9/site-packages/whitebox/__init__.py", line 10, in <module>
    from .wb_runner import Runner
  File "/home/conda/staged-recipes/build_artifacts/whiteboxgui_1611356481727/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/lib/python3.9/site-packages/whitebox/wb_runner.py", line 24, in <module>
    import tkinter as tk
  File "/home/conda/staged-recipes/build_artifacts/whiteboxgui_1611356481727/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/lib/python3.9/tkinter/__init__.py", line 37, in <module>
    import _tkinter # If this fails your Python may not be configured for Tk
ImportError: libX11.so.6: cannot open shared object file: No such file or directory
Tests failed for whiteboxgui-0.1.2-pyh44b312d_0.tar.bz2 - moving package to /home/conda/staged-recipes/build_artifacts/broken
WARNING conda_build.build:tests_failed(2955): Tests failed for whiteboxgui-0.1.2-pyh44b312d_0.tar.bz2 - moving package to /home/conda/staged-recipes/build_artifacts/broken
TESTS FAILED: whiteboxgui-0.1.2-pyh44b312d_0.tar.bz2
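One way this kind of failure is commonly avoided (a sketch only, not a change anyone in the thread has proposed; the helper name is hypothetical): defer the tkinter import so that merely importing the package works on headless machines where libX11.so.6 is unavailable.

```python
def import_gui_toolkit():
    # Import tkinter lazily and return the module if available, else
    # None, instead of letting the ImportError escape at package-import
    # time on systems without an X11 installation.
    try:
        import tkinter
        return tkinter
    except ImportError:
        return None

toolkit = import_gui_toolkit()
if toolkit is None:
    print("GUI unavailable; continuing in headless mode")
```

With this pattern, only code paths that actually open the GUI need libX11; the recipe's import test would then pass without pulling in X libraries.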
giswqs

comment created time in 2 hours

pull request comment conda-forge/dask_labextension-feedstock

dask_labextension v5.0.0

Thank you @ocefpaf!

regro-cf-autotick-bot

comment created time in 2 hours

push event snowflakedb/snowflake-connector-python

Sophie Tan

commit sha e5e4979ca373fb6a639f5981e63e04c3aea18c0b

SNOW 266042 add logging for request id guid and query (#606) Co-authored-by: Mark Keller <63477823+sfc-gh-mkeller@users.noreply.github.com>


push time in 2 hours

delete branch snowflakedb/snowflake-connector-python

delete branch : SNOW-266042-add-logging-for-request-id-guid-and-query-id

delete time in 2 hours

PR merged snowflakedb/snowflake-connector-python

SNOW 266042 add logging for request id guid and query

SNOW-266042 Description Added logging for Query ID before ping-pong of query result retrieval. This made debugging easier for the case when query result retrieval fails after a query succeeded. Added logging for Request ID and Request GUID.

+22 -19

0 comment

4 changed files

sfc-gh-stan

pr closed time in 2 hours

issue comment apache/iceberg

Rewrite metrics during schema transformation

So I'm thinking that rather than rewriting the metrics during schema transformation, when the schema is loaded we convert from column names to references. What do folks think?
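The idea can be sketched as a one-time resolution step at schema-load time. All names and shapes below are hypothetical illustrations, not Iceberg's actual API:

```python
def resolve_metrics_to_field_ids(metrics_by_name, name_to_field_id):
    # Re-key name-addressed column metrics by stable field id once,
    # when the schema is loaded, so later schema transformations
    # (renames, reorders) don't require rewriting the metrics.
    return {
        name_to_field_id[name]: stats
        for name, stats in metrics_by_name.items()
        if name in name_to_field_id
    }

# After a column rename, only the name-to-id mapping changes; the
# id-keyed metrics produced here remain valid as-is.
resolved = resolve_metrics_to_field_ids({"ts": {"min": 1}}, {"ts": 3})
```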

holdenk

comment created time in 2 hours

push event conda-forge/admin-migrations

cf-blacksmithy

commit sha 4721c7b31b8764b2869945834133f063e62b1f3d

[ci skip] data for admin migration run


push time in 2 hours

push event dask/dask

James Bourbeau

commit sha 0b33708f4627c3f9c7613c9554d1033db147d4b0

Add cytoolz back to CI environment (#7103)


push time in 3 hours

PR merged dask/dask

Add cytoolz back to CI environment

Following up on https://github.com/dask/dask/pull/7069, now that https://github.com/conda-forge/cytoolz-feedstock/issues/36 has been resolved we can add cytoolz back to the CI environment.

  • [ ] Tests added / passed
  • [ ] Passes black dask / flake8 dask
+0 -5

0 comment

1 changed file

jrbourbeau

pr closed time in 3 hours

PR opened apache/iceberg

Bump ORC to 1.6.7
+1 -1

0 comment

1 changed file

pr created time in 3 hours

push event conda-forge/admin-migrations

cf-blacksmithy

commit sha 9eac61940484111e1c5ba94f4c069d4431f17c54

[ci skip] data for admin migration run


push time in 3 hours

pull request comment snowflakedb/snowflake-connector-python

SNOW-232777 Subclass-ed HTTPAdapter to add the header to proxy headers

Can we add a small test for this? Have you manually tested this?

This has been manually tested.

sfc-gh-hchaturvedi

comment created time in 3 hours

push event snowflakedb/snowflake-connector-python

sfc-gh-hchaturvedi

commit sha 363bfd03955062c89636f2d2d9add8a5db8696ee

Addressed comments on using vendored versions


push time in 3 hours

pull request comment conda-forge/staged-recipes

add bpytop

Sorry for the mistake.

The win build failed, since the upstream doesn't support Windows (it calls external dependencies). But this is noarch. Is it necessary to add skip: true # [win] in this case?

Thanks.

ickc

comment created time in 3 hours

Pull request review comment snowflakedb/snowflake-connector-python

relax constraint on pyjwt version (fixes #586)

 def _get_arrow_lib_as_linker_input(self):
         'pyOpenSSL>=16.2.0,<20.0.0',
         'cffi>=1.9,<2.0.0',
         'cryptography>=2.5.0,<4.0.0',
-        'pyjwt<2.0.0',
+        'pyjwt',

I agree with you James and this is something that many customers have been asking for (there are many other issues open with this request already).

Short answer of why this is that historically our library has been treated as an application, so now many of our customers demand that we minimize the possibility of errors, but this is only possible if we control dependency versions.

I wasn't here when this decision was made, so I can't comment on the specifics of why, but if I had to take a guess, our library broke way too often. I have definitely seen this myself many times. For example, this new pyjwt release had the possibility of breaking customer deployments, as you found. More than once we had to get online over the weekend to pin against certain newly released dependencies and do an unscheduled release of the connector to fix things. Over the years this led to the state we are in now. I'm trying to make a better effort to keep things up to date, but I'd be lying if I said that I'm on top of everything; this is why we introduced dependabot. It kicks off tests for every dependency bump before actually moving our pin.

My other view is that if there are people who would like our library with less tightly coupled dependencies, they can really easily build it themselves. This is the specific reason why I introduced the PEP-517 build, and I was hoping that having a production-grade build openly documented in our GitHub Actions would further aid users of this group.

I hope that this insight makes at least some sense to you. That being said, please add the next major pin; we should at least test for breaking public API changes.
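To illustrate the difference between the old hard pin and the requested next-major pin, a minimal stdlib-only sketch (the helper functions are illustrative, not the connector's actual packaging code, and real tools also handle pre-releases and epochs):

```python
def parse_version(version):
    # Turn "2.0.1" into (2, 0, 1) so versions compare numerically.
    return tuple(int(part) for part in version.split("."))

def satisfies_upper_pin(version, upper_bound):
    # True if `version` is strictly below the exclusive upper bound.
    return parse_version(version) < parse_version(upper_bound)

# The old pin 'pyjwt<2.0.0' rejects the pyjwt 2.x line entirely:
assert not satisfies_upper_pin("2.0.1", "2.0.0")

# A next-major pin 'pyjwt<3.0.0' admits 2.x while still guarding
# against breaking public-API changes in a future 3.0 release:
assert satisfies_upper_pin("2.0.1", "3.0.0")
```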

jameslamb

comment created time in 3 hours

PR opened dask/dask

Add cytoolz back to CI environment

Following up on https://github.com/dask/dask/pull/7069, now that https://github.com/conda-forge/cytoolz-feedstock/issues/36 has been resolved we can add cytoolz back to the CI environment.

  • [ ] Tests added / passed
  • [ ] Passes black dask / flake8 dask
+0 -5

0 comment

1 changed file

pr created time in 3 hours

pull request comment apache/arrow

ARROW-11350: [C++] Bump dependency versions

@kszucs that didn't trigger a Travis build for some reason

nealrichardson

comment created time in 3 hours

pull request comment apache/arrow

ARROW-11340: [C++] Add vcpkg.json manifest to cpp project root

FYI, the version of vcpkg that is currently preinstalled on the Github Actions Windows images is 2020.11.12 (as noted here). This version has a bug (https://github.com/awslabs/aws-c-common/issues/734) that causes the installation of aws-cpp-sdk to fail. When running vcpkg in Github Actions on Windows, remove the preinstalled vcpkg and install the newest version from source.

ianmcook

comment created time in 3 hours

issue comment conda-forge/poppler-feedstock

Empty file while creating Tiff-files from PDF

Yes. Tested with 21.01.0 and the aforementioned PDF and got an empty tiff.

oschwartz10612

comment created time in 3 hours

pull request comment conda-forge/staged-recipes

Add Yaspin

@dopplershift, sorry to bother you, I think this is the same case as in https://github.com/conda-forge/staged-recipes/pull/13527#pullrequestreview-574572704 , could you help merging this? Thanks.

ickc

comment created time in 3 hours

Pull request review comment snowflakedb/snowflake-connector-python

SNOW-232777 Subclass-ed HTTPAdapter to add the header to proxy headers

 include_trailing_comma = True
 force_grid_wrap = 0
 line_length = 120
 known_first_party =snowflake,parameters,generate_test_files
-known_third_party =Cryptodome,OpenSSL,asn1crypto,azure,boto3,botocore,certifi,chardet,cryptography,dateutil,idna,jwt,mock,ntlm,numpy,pendulum,pkg_resources,pytest,pytz,requests,setuptools
+known_third_party =Cryptodome,OpenSSL,asn1crypto,azure,boto3,botocore,certifi,chardet,cryptography,dateutil,idna,jwt,mock,ntlm,numpy,pendulum,pkg_resources,pytest,pytz,requests,setuptools,urllib3

This line shouldn't change without adding urllib3 as a direct dependency of ours.

sfc-gh-hchaturvedi

comment created time in 3 hours
