Maximilian Roos max-sixty California

max-sixty/pytest-accept 1

quick poc of an inline expect test plugin for pytest

jab/pandas 0

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

max-sixty/argo 0

Argo Workflows: Get stuff done with Kubernetes.

max-sixty/arrow 0

Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication. Languages currently supported include C, C++, Java, JavaScript, Python, and Ruby.

max-sixty/ballista 0

PoC of distributed compute platform using Rust, Apache Arrow, and Kubernetes!

max-sixty/beam 0

Mirror of Apache Beam

max-sixty/broot 0

A new way to see and navigate directory trees : https://dystroy.org/broot

max-sixty/dbt 0

dbt (data build tool) enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.

max-sixty/edition-guide 0

A guide to changes between various editions of Rust

Pull request review comment pydata/xarray

rolling keep_attrs & default True

 def pipe(
         >>> def adder(data, arg):
         ...     return data + arg
-        ...

@keewis any thoughts on these? I was getting similar corrections from blackdoc. The proposed code may look somewhat bunched?

mathause

comment created time in 2 hours


pull request comment pydata/xarray

rolling keep_attrs & default True

Wow, this was quite the herculean effort @mathause. Thanks. That resolves any questions I had about whether the global configs are being honored...

To the extent we have to do more of these, we could consider whether we need to add keep_attrs kwargs, or whether we should recommend users remove attrs separately where needed. We'd still need to go through the global defaults, though.
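For context, a minimal sketch of the two approaches (illustrative only — exactly which methods honor keep_attrs varies by xarray version):

import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(4.0), dims="x", attrs={"units": "degC"})

# global default — what the tests above check is being honored
with xr.set_options(keep_attrs=True):
    print(da.mean().attrs)  # {'units': 'degC'}

# per-call kwarg — the alternative to adding more of these everywhere
print(da.mean(keep_attrs=True).attrs)  # {'units': 'degC'}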

mathause

comment created time in 2 hours

Pull request review comment pydata/xarray

numpy_groupies

 def __init__(
         self._groups = None
         self._dims = None
+    # TODO: is this correct? Should we be returning the dims of the result? This
+    # will use the original dim where we're grouping by a coord.

Is the existing code for this property correct? Currently x.groupby(foo).dims != x.groupby(foo).sum(...).dims when we're grouping by a non-indexed coord
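For illustration, a minimal sketch of the discrepancy (hypothetical data; dims here is the property touched by the diff above):

import numpy as np
import xarray as xr

x = xr.DataArray(
    np.arange(4.0),
    dims="dim_0",
    coords={"foo": ("dim_0", ["a", "a", "b", "b"])},  # non-indexed coord
)

grouped = x.groupby("foo")
print(grouped.dims)        # reports the original dim, ('dim_0',)
print(grouped.sum().dims)  # the reduced result is indexed by ('foo',)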

max-sixty

comment created time in 2 hours


pull request comment pydata/xarray

GH4228 Clearer error on scalar to dataframe

Thanks @PGijsbers !

PGijsbers

comment created time in 11 hours

push event pydata/xarray

PGijsbers

commit sha 79df665ae77b0e01822bdf158eb27b91b8ac0591

GH4228 Clearer error on scalar to dataframe (#4533)

* GH4228 Clearer error on scalar to dataframe
* GH4228 Add change to Documentation

view details

push time in 11 hours

PR merged pydata/xarray

GH4228 Clearer error on scalar to dataframe

When attempting to convert a scalar to a dataframe, e.g.:

xarray.DataArray([1], coords=[('onecoord', [2])]).sel(onecoord=2).to_dataframe(name='name')

raises

ValueError: cannot convert a scalar to a Dataframe

instead of

ValueError: no valid index for a 0-dimensional object
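A self-contained reproduction, for reference (a sketch — the printed message is the new error this PR introduces):

import xarray as xr

da = xr.DataArray([1], coords=[("onecoord", [2])])
try:
    da.sel(onecoord=2).to_dataframe(name="name")
except ValueError as err:
    print(err)  # with this PR: cannot convert a scalar to a Dataframe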

  • [x] Closes #4228
  • [x] Tests added
  • [x] Passes isort . && black . && mypy . && flake8
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst

I was not sure if this was "notable". Please let me know if I should add it.

  • [ ] New functions/methods are listed in api.rst

n/a

I was not able to run all tests, but was able to run the changed one (more on that in the issue).

+7 -0

3 comments

3 changed files

PGijsbers

pr closed time in 11 hours

issue closed pydata/xarray

to_dataframe: no valid index for a 0-dimensional object

What happened: xr.DataArray([1], coords=[('onecoord', [2])]).sel(onecoord=2).to_dataframe(name='name') raises an exception: ValueError: no valid index for a 0-dimensional object

What you expected to happen:

the same behavior as: xr.DataArray([1], coords=[('onecoord', [2])]).to_dataframe(name='name')

Anything else we need to know?:

I see that the array after the selection has no "dims" anymore, and this is what causes the error. But it still has one "coords", which is confusing. Is there any documentation about this difference?
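A minimal sketch of the asymmetry described above:

import xarray as xr

da = xr.DataArray([1], coords=[("onecoord", [2])])
scalar = da.sel(onecoord=2)

print(scalar.dims)    # () — the selection dropped the dimension
print(scalar.coords)  # "onecoord" survives as a scalar coordinate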

Environment:

<details> INSTALLED VERSIONS

commit: None
python: 3.7.6 | packaged by conda-forge | (default, Jun 1 2020, 18:57:50) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 4.19.0-9-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.4

xarray: 0.15.1
pandas: 1.0.4
numpy: 1.18.5
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.1.3
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.18.1
distributed: 2.18.0
matplotlib: 3.2.1
cartopy: None
seaborn: 0.10.1
numbagg: None
setuptools: 47.3.1.post20200616
pip: 20.1.1
conda: 4.8.3
pytest: 5.4.3
IPython: 7.15.0
sphinx: 3.1.1

</details>

closed time in 11 hours

ghislainp

PR opened pydata/xarray

numpy_groupies

<!-- Feel free to remove check-list items that aren't relevant to your change -->

  • [x] Closes https://github.com/pydata/xarray/issues/4473
  • [ ] Tests added
  • [x] Passes isort . && black . && mypy . && flake8
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst

Very early effort — I found this harder than I expected. I was trying to use the existing groupby infra, but I think I may need to start afresh. The result of the numpy_groupies operation is a fully formed array, whereas we're used to handling an iterable of results which need to be concatenated.
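For anyone unfamiliar with the difference being described, a rough sketch (hypothetical data; npg.aggregate is numpy_groupies' main entry point):

import numpy as np
import numpy_groupies as npg

values = np.arange(6.0)
group_idx = np.array([0, 0, 1, 1, 2, 2])

# numpy_groupies hands back the fully formed result in one call...
out = npg.aggregate(group_idx, values, func="sum")  # array([1., 5., 9.])

# ...whereas the existing groupby infra conceptually reduces each group
# separately and then concatenates the pieces:
pieces = [values[group_idx == g].sum() for g in np.unique(group_idx)]
out_concat = np.stack(pieces)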

I also added some type signatures / notes while going through the existing code, mostly for my own understanding.

If anyone has any thoughts, feel free to comment — otherwise I'll resume this soon

+144 -18

0 comments

3 changed files

pr created time in 16 hours

create branch max-sixty/xarray

branch : npg

created branch time in 16 hours

issue comment VSCodeVim/Vim

Transfer vim control to "peek definition/reference" popup window without having to click

Duplicate of https://github.com/VSCodeVim/Vim/issues/2675?

ecotner

comment created time in 18 hours

Pull request review comment pydata/xarray

Improve Dataset and DataArray docstrings

[Review diff context — DataArray docstring, Parameters and Examples sections; the lines under discussion:]

+    Find out where the coldest temperature was:
+
+    >>> coldest_temp = da.min() == da
+    >>> da.where(cond=coldest_temp, drop=True)

I do think these are particularly canonical docs — which is why the PR is valuable — so we should ensure we have the canonical approach.

So I think it's fine to change to a different example if there's disagreement on this one, but if we are going to get the argmin, we should use argmin. And then maybe there's a separate discussion on whether the argmin signature is correct (I happen to think it is).
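For reference, a sketch of the argmin-based form (assuming the dict-of-indexers behavior of DataArray.argmin in recent xarray, applied to the da from the diff above):

# select the position of the minimum along all three dimensions
indexers = da.argmin(dim=["x", "y", "time"])
da.isel(indexers)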

Does that make sense @Illviljan?

Illviljan

comment created time in 19 hours


push event Illviljan/xarray

Maximilian Roos

commit sha cac512b05863c00e9a1eef04f3e4a8e049c84cb9

Update doc/whats-new.rst

view details

push time in 21 hours

Pull request review comment pydata/xarray

Improve Dataset and DataArray docstrings

 Documentation
   By `Sander van Rijn <https://github.com/sjvrijn>`_.
 - Removed skipna argument from :py:meth:`DataArray.count`, :py:meth:`DataArray.any`, :py:meth:`DataArray.all`. (:issue:`755`)
   By `Sander van Rijn <https://github.com/sjvrijn>`_
+- update the contributing guide to use merges instead of rebasing and state
+  that we squash-merge. (:pull:`4355`) By `Justus Magin <https://github.com/keewis>`_.

This is my fault! Sorry

Illviljan

comment created time in 21 hours


pull request comment pydata/xarray

Improve Dataset and DataArray docstrings

OK.

Tangentially — one weirdness is that DataArray.units doesn't exist, I think...

Illviljan

comment created time in 21 hours

pull request comment pydata/xarray

fix upstream-dev CI

Ah, thanks @keewis !

keewis

comment created time in a day

pull request comment pydata/xarray

GH4228 Clearer error on scalar to dataframe

Good question — I would say documentation

PGijsbers

comment created time in a day

pull request comment pydata/xarray

Improve Dataset and DataArray docstrings

@Illviljan I moved the whatsnew to 0.16.2 — you should pull from your branch to get the changes. You don't need to do anything else.

Illviljan

comment created time in a day

push event Illviljan/xarray

Maximilian Roos

commit sha bf573c41663399fe46dace8f7d4cb232c2a69b90

Move whatsnew to 0.16.2

view details

push time in a day

push event Illviljan/xarray

keewis

commit sha 2bc8e33b319d54f9a6e89a88ac3161f4fb569fcf

use the fallback_version option to avoid errors on source checkouts (#4358)

view details

Maximilian Roos

commit sha 34aa056320654868e1f4591e1c8f466bae85efb7

Run `pyupgrade --py36-plus **/*.py` (#4368)

view details

Mathias Hauser

commit sha ece8f4a7b356e8e63598c9e17ec82be8dc4be80f

mention all ignored flake8 errors (#4371)

view details

Mathias Hauser

commit sha a75248a499f9445fee9b994b0ce688e377712086

fix apply_ufunc with exclude_dims and vectorize (#4130)

* enumerate exclude_dims
* add tests
* xfail dask test (depends on 4060)
* add whats new
* add tests again (removed in merge)
* move exclude_dims to to_gufunc_string
* adapt tests
* move whats new to 16.1
* update docstring

Co-authored-by: Keewis <keewis@posteo.de>

view details

Mathias Hauser

commit sha d3536b9a6e92f97401865d9daf5d48cee52e40da

Silence plot warnings (#4365)

* silence plot warnings (matplotlib 3.3)
* whats new
* updates from review
* suggestions from code review
* use assert_array_equal

view details

Maximilian Roos

commit sha 1a11d249a8338dad7c533f2ea7c365a823022d15

Allow cov & corr to handle missing values (#4351)

* Allow cov & corr to handle missing values
* Remove artifacts
* Fix floating assert
* Update xarray/tests/test_computation.py
  Co-authored-by: keewis <keewis@users.noreply.github.com>
* Add test for multiple explicit dims
* Use np.ma rather than drop=True
* Add whatsnew
* reformat

Co-authored-by: keewis <keewis@users.noreply.github.com>

view details

keewis

commit sha a36d0a1d4657c848dcdd76d0ecb9c783ad464057

per-variable fill values (#4237)

* implement the fill_value mapping
* get per-variable fill_values to work in DataArray.reindex
* Update xarray/core/dataarray.py
  Co-authored-by: Stephan Hoyer <shoyer@google.com>
* check that the default value is used
* check that merge works with multiple fill values
* check that concat works with multiple fill values
* check that combine_nested works with multiple fill values
* check that Dataset.reindex and DataArray.reindex work
* check that aligning Datasets works
* check that Dataset.unstack works
* allow passing multiple fill values to full_like with datasets
* also allow overriding the dtype by variable
* document the dict fill values in Dataset.reindex
* document the changes to DataArray.reindex
* document the changes to unstack
* document the changes to align
* document the changes to concat and merge
* document the changes to Dataset.shift
* document the changes to combine_*

Co-authored-by: Stephan Hoyer <shoyer@google.com>

view details

Aleksandar Jelenak

commit sha 9c85dd5f792805bea319f01f08ee51b83bde0f3b

Allow chunk_store argument when opening Zarr datasets (#3804)

* Allow chunk store for Zarr datasets
* Add test for open_zarr() chunk_store argument
* Add "chunk_store" argument to to_zarr()
* Simplify chunk_store argument handling
* blacken
* Add minimum zarr version requirement in docstring
* Update xarray/tests/test_backends.py
  Co-authored-by: Ryan Abernathey <ryan.abernathey@gmail.com>

Co-authored-by: dcherian <deepak@cherian.net>
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>

view details

jenssss

commit sha ffce4ec93c2e401a37dba1f0bf33dbfc648aeec9

linear interp with NaNs in nd indexer (#4233)

* Added test for nd interpolation with nan
* Now ignoring NaNs in missing._localize
  When interpolating with an nd indexer that contains NaN's, the code previously threw a KeyError from the missing._localize function. This commit fixes this by swapping `np.min` and `np.max` with `np.nanmin` and `np.nanmax`, ignoring any NaN values.
* Added `@requires_scipy` to test. Also updated what's new.
* Added numpy>=1.18 checks with `LooseVersion`
* Added checks for np.datetime64 type
  This means the PR now also works for numpy < 1.18, as long as index is not with datetime
* Removed `raise ValueError` from previous commit
  It seems that np.min/max works in place of nanmin/nanmax for datetime types for numpy < 1.18, see https://github.com/pydata/xarray/pull/3924/files
* Added datetime `NaT` test. Also added a test for `Dataset` to `test_interpolate_nd_with_nan`, and "Missing values are skipped." to the docstring of `interp` and `interp_like` methods of `DataArray` and `Dataset`.

view details

keewis

commit sha ce153852771fe6b0a45534df20b061a6f559842e

run black and blackdoc (#4381)

view details

darikg

commit sha 4aa7622b6ff16647df64fe69f39438b7cbe9576c

Use deepcopy recursively on numpy arrays (#4379)

Closes #4362

view details

keewis

commit sha 13caf96efb3f121e232a35aafceed80c832a9876

remove the spurious trailing comma (#4384)

view details

Kai Mühlbauer

commit sha ac38d191c1898a5e73cef13b8bb925a6c88af728

move kwarg's `output_sizes` and `meta` to `dask_gufunc_kwargs` for in… (#4391)

* move kwarg's `output_sizes` and `meta` to `dask_gufunc_kwargs` for internal use of `apply_ufunc` (follow-up to #4060, fixes #4385)
* add pull request reference to `whats-new.rst`

view details

Maximilian Roos

commit sha 385dc15e75c80984bc4398c52a2d42ac3333fcc2

Remove deprecated usages of drop (#4387)

* Remove deprecated usages of drop
* Formatting

view details

Maximilian Roos

commit sha bea4d618678c2f54d3dc625dd9ab581317d566c6

Remove null pytest env option (#4357)

view details

Maximilian Roos

commit sha 55480de69096cc5ae003f639c2c953066e829120

Pin pre-commit versions (#4388)

* Pin pre-commit versions
* whatsnew
* Update doc/whats-new.rst
  Co-authored-by: keewis <keewis@users.noreply.github.com>
* Update doc/whats-new.rst
  Co-authored-by: keewis <keewis@users.noreply.github.com>

view details

keewis

commit sha 2acd0fc6563c3ad57f16e6ee804d592969419938

update the isort and blackdoc pre-commit hooks (#4396)

view details

Samnan Rahee

commit sha 9ee0f018aac45a83df7a65b1499263412dab9bed

Expose use_cftime option in open_zarr #2886 (#3229)

* Expose use_cftime option in open_zarr #2886
* Add test for open_zarr w/ use_cftime
* Formatting only
* Add entry in `whats-new.rst`
* Remove space

Co-authored-by: Anderson Banihirwe <axbanihirwe@ualr.edu>

view details

Russell Manser

commit sha dc2dd89b999b16e08ba51e9cf623896b01be7297

Change isinstance checks to duck Dask Array checks #4208 (#4221)

* Change isinstance checks to duck Dask Array checks #4208
* Use is_dask_collection in is_duck_dask_array
* Use is_dask_collection in is_duck_dask_array
* Revert to isinstance checks according to review discussion
* Move is_duck_dask_array to pycompat.py and use tokenize for comparisons
* isort
* Implement `is_duck_array` to replace `is_array_like`
* Rename `is_array_like` to `is_duck_array`
* `is_duck_array` checks for `__array_function__` and `__array_ufunc__` in addition to previous checks
* Replace checks for `is_duck_dask_array` and `__array_function__` with `is_duck_array`
* Skip numpy duck array tests when NEP18 is not active
* Use utils.is_duck_array in xarray/core/formatting.py
* Replace locally defined `is_duck_array` in _diff_mapping_repr
* Replace `"__array_function__"` and `is_duck_dask_array` check in `short_data_repr`
* Revert back to isinstance check for iris cube
* Add is_duck_array_or_ndarray function to utils
* Use is_duck_array_or_ndarray for duck array checks without NEP18
* Remove is_duck_dask_array_or_ndarray, replace checks with is_duck_array
* Add explicit check for NumPy array to is_duck_array
* Replace is_duck_array_or_ndarray checks with is_duck_array
* Remove is_duck_array check for deep copy
  Co-authored-by: keewis <keewis@users.noreply.github.com>
* Use is_duck_array check in load
* Move duck dask array tokenize tests from test_units.py to test_dask.py
* Use _importorskip to require pint >=0.15 instead of pytest.mark.skipif

Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
Co-authored-by: keewis <keewis@users.noreply.github.com>

view details

Deepak Cherian

commit sha 9756e51a32c81d5e3c2d2b9e19b581d99427db4e

Dask/cleanup (#4383)

* Remove meta_from_array
* Switch to dask.array.map_blocks
* No need to vendor median anymore.

view details

push time in a day

issue opened pydata/xarray

Failing main branch — test_save_mfdataset_compute_false_roundtrip

<!-- Please include a self-contained copy-pastable example that generates the issue if possible.

Please be concise with code posted. See guidelines below on how to provide a good bug report:

  • Craft Minimal Bug Reports: http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports
  • Minimal Complete Verifiable Examples: https://stackoverflow.com/help/mcve

Bug reports that follow these guidelines are easier to diagnose, and so are often handled much more quickly. -->

We had the main branch passing for a while, but unfortunately there's another test failure — now in our new Linux py38-backend-api-v2 test case, in test_save_mfdataset_compute_false_roundtrip:

link

self = <xarray.tests.test_backends.TestDask object at 0x7f821a0d6190>

    def test_save_mfdataset_compute_false_roundtrip(self):
        from dask.delayed import Delayed
    
        original = Dataset({"foo": ("x", np.random.randn(10))}).chunk()
        datasets = [original.isel(x=slice(5)), original.isel(x=slice(5, 10))]
        with create_tmp_file(allow_cleanup_failure=ON_WINDOWS) as tmp1:
            with create_tmp_file(allow_cleanup_failure=ON_WINDOWS) as tmp2:
                delayed_obj = save_mfdataset(
                    datasets, [tmp1, tmp2], engine=self.engine, compute=False
                )
                assert isinstance(delayed_obj, Delayed)
                delayed_obj.compute()
                with open_mfdataset(
                    [tmp1, tmp2], combine="nested", concat_dim="x"
                ) as actual:
>                   assert_identical(actual, original)
E                   AssertionError: Left and right Dataset objects are not identical
E                   
E                   
E                   Differing data variables:
E                   L   foo      (x) float64 dask.array<chunksize=(5,), meta=np.ndarray>
E                   R   foo      (x) float64 dask.array<chunksize=(10,), meta=np.ndarray>

/home/vsts/work/1/s/xarray/tests/test_backends.py:3274: AssertionError

AssertionError: Left and right Dataset objects are not identical

Differing data variables:
L   foo      (x) float64 dask.array<chunksize=(5,), meta=np.ndarray>
R   foo      (x) float64 dask.array<chunksize=(10,), meta=np.ndarray>

@aurghs & @alexamici — are you familiar with this? Thanks in advance

created time in a day

pull request comment pydata/xarray

fix upstream-dev CI

Is this fixed now?

keewis

comment created time in a day

pull request comment pydata/xarray

Improve Dataset and DataArray docstrings

Hmmm, nor do I.

@keewis might you recognize that error?

Illviljan

comment created time in a day

pull request comment pydata/xarray

Improve Dataset and DataArray docstrings

Docs are raising a warning: /home/docs/checkouts/readthedocs.org/user_builds/xray/checkouts/4532/xarray/core/dataarray.py:docstring of xarray.DataArray:121: WARNING: duplicate object description of xarray.DataArray.units, other instance in generated/xarray.DataArray, use :noindex: for one of them — lmk if you need a hand figuring it out.

Illviljan

comment created time in a day

Pull request review comment pydata/xarray

Improve Dataset and DataArray docstrings

[Review diff context — same DataArray docstring diff as above; the example under discussion:]

+    >>> da = xr.DataArray(
+    ...     data = temperature,
+    ...     dims = ["x", "y", "time"],
+    ...     coords={
+    ...         "lon": (["x", "y"], lon),
+    ...         "lat": (["x", "y"], lat),
+    ...         "time": time,
+    ...         "reference_time": reference_time,
+    ...     },
+    ... )

That's fine — we actually disable the PEP8 line checks in favor of the black ones (your editor / IDE should be picking these configs up — lmk if you get warnings that you'd like to disable).

If it causes any checks to fail, we can skip the checks for this section — it's a good tradeoff!

Illviljan

comment created time in a day


pull request comment pydata/xarray

Improve Dataset and DataArray docstrings

Excellent @Illviljan !

Would you like to add a line under Documentation in the whatsnew, giving yourself credit? Then I'll merge

Illviljan

comment created time in a day

pull request comment pydata/xarray

GH4228 Clearer error on scalar to dataframe

Thanks @PGijsbers and welcome to xarray!

I was not sure if this was "notable". Please let me know if I should add it.

Please feel free to!

PGijsbers

comment created time in a day

pull request comment pydata/xarray

Add isocalendar to dt fields

Do you think we should keep supporting week & weekofyear? (using isocalendar under the hood)

I would low-confidence vote to align with other libraries — though only because I don't know any better — and if anyone has an actual view then I defer to them...
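For concreteness, the pandas replacement looks roughly like this (pandas >= 1.1; a sketch):

import pandas as pd

times = pd.date_range("2000-01-01", periods=3)
iso = times.isocalendar()  # DataFrame with year / week / day columns
week = iso.week            # replacement for the deprecated times.week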

max-sixty

comment created time in a day

Pull request review comment pydata/xarray

Improve Dataset and DataArray docstrings

[Review diff context — same DataArray docstring diff as above; the line under discussion:]

+    ...     coords=dict(

Nice! Thanks for joining the movement 😁

Illviljan

comment created time in a day


Pull request review comment pydata/xarray

Improve Dataset and DataArray docstrings

[Review diff context — same DataArray docstring diff as above; the example output under discussion:]

+    >>> coldest_temp = da.min() == da
+    >>> da.where(cond=coldest_temp, drop=True)
+    <xarray.DataArray (x: 1, y: 1, time: 1)>
+    array([[[7.18177696]]])
+    Coordinates:
+        lon             (x, y) float64 -99.32
+        lat             (x, y) float64 42.21
+      * time            (time) datetime64[ns] 2014-09-08
+        reference_time  datetime64[ns] 2014-09-05
+    Dimensions without coordinates: x, y

Unless anyone knows off-hand, no stress at all!

Illviljan

comment created time in a day


Pull request review comment pydata/xarray

Improve Dataset and DataArray docstrings

[Review diff context — same DataArray docstring diff as above; the example under discussion:]

+    >>> da = xr.DataArray(
+    ...     data = temperature,
+    ...     dims = ["x", "y", "time"],
+    ...     coords={
+    ...         "lon": (["x", "y"], lon),
+    ...         "lat": (["x", "y"], lat),
+    ...         "time": time,
+    ...         "reference_time": reference_time,
+    ...     },
+    ... )

Could we have: >>> da to print what the array looks like?

Illviljan

comment created time in a day


issue opened mtkennerly/poetry-dynamic-versioning

Installing with --no-root still requires vcs

We install dependencies in a docker build step prior to installing the python package. This allows docker to cache the dependency installations, which is useful given their cost and frequency of change.

I've tried using --no-root to ensure only the dependencies are installed, hoping the .git path wouldn't be required by poetry-dynamic-versioning. But it still raises an error:

Step 17/22 : RUN poetry config virtualenvs.create false     && poetry config experimental.new-installer false     && poetry install --no-root --no-interaction --no-ansi     && poetry check
 ---> Running in 2cf1426e52dd

  RuntimeError

  Unable to detect version control system.

  at /usr/local/lib/python3.8/site-packages/dunamai/__init__.py:145 in _detect_vcs
      141│             if shutil.which(command.split()[0]):
      142│                 code, _ = _run_cmd(command, codes=[])
      143│                 if code == 0:
      144│                     return vcs
    → 145│         raise RuntimeError("Unable to detect version control system.")
      146│
      147│
      148│ @total_ordering
      149│ class Version:
The command '/bin/sh -c poetry config virtualenvs.create false     && poetry config experimental.new-installer false     && poetry install --no-root --no-interaction --no-ansi     && poetry check' returned a non-zero code: 1

created time in 2 days

PR opened pydata/xarray

Remove unused kwarg

<!-- Feel free to remove check-list items that aren't relevant to your change -->

  • [x] Passes isort . && black . && mypy . && flake8

This doesn't seem to be used? Unless it's required for compatibility with an interface?

+1 -1

0 comments

1 changed file

pr created time in 2 days

create branch max-sixty/xarray

branch : kwarg

created branch time in 2 days

push event max-sixty/xarray

Maximilian Roos

commit sha 7484546d4ab8eb8d49309b75b222f1462509220e

Add another ignore location

view details

push time in 2 days

PR opened pydata/xarray

Adjust tests to use updated pandas syntax for offsets

<!-- Feel free to remove check-list items that aren't relevant to your change -->

  • [x] Closes #4535 (somewhat)
  • [x] Tests added
  • [x] Passes isort . && black . && mypy . && flake8
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
+14 -8

0 comments

2 changed files

pr created time in 2 days

push event max-sixty/xarray

Maximilian Roos

commit sha d25e98398e7f37ce72b5c968a79e0035c9b4a6bf

Revert irrelevant change

view details

push time in 2 days

create branch max-sixty/xarray

branch : loffset

created branch time in 2 days

push event max-sixty/xarray

Maximilian Roos

commit sha 9f467502965f635bb47f23de1477d6e7d32e1f3c

Update xarray/tests/test_dataarray.py Co-authored-by: keewis <keewis@users.noreply.github.com>

view details

push time in 2 days

Pull request review comment pydata/xarray

Remove numpy warnings, add more complete tests

     source_ndarray,
 )
+pytestmark = pytest.mark.filterwarnings("error:Mean of empty slice")
+pytestmark = pytest.mark.filterwarnings("error:All-NaN slice encountered")

Great point, thanks!
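Presumably the point raised: pytestmark is a plain module attribute, so the second assignment in the diff above overwrites the first. A list applies both filters (a minimal sketch):

import pytest

pytestmark = [
    pytest.mark.filterwarnings("error:Mean of empty slice"),
    pytest.mark.filterwarnings("error:All-NaN slice encountered"),
]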

max-sixty

comment created time in 2 days


issue comment pydata/xarray

Support operations with pandas Offset objects

Yes, good point, that would be easy and probably sufficient for most use-cases.

max-sixty

comment created time in 2 days

PR opened pydata/xarray

Remove numpy warnings, add more complete tests

<!-- Feel free to remove check-list items that aren't relevant to your change -->

  • [x] Tests added
  • [x] Passes isort . && black . && mypy . && flake8
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst

Found a couple more gremlins. I've added better test coverage so we catch any of these in the future.

+10 -3

0 comments

4 changed files

pr created time in 2 days

create branch max-sixty/xarray

branch : numpy-warnings

created branch time in 2 days

issue opened pydata/xarray

Support operations with pandas Offset objects

<!-- Please do a quick search of existing issues to make sure that this has not been asked before. -->

Is your feature request related to a problem? Please describe.

Currently xarray objects containing datetimes don't operate with pandas' offset objects:

import numpy as np
import pandas as pd
import xarray as xr
from pandas.tseries.frequencies import to_offset

times = pd.date_range("2000-01-01", freq="6H", periods=10)
ds = xr.Dataset(
    {
        "foo": (["time", "x", "y"], np.random.randn(10, 5, 3)),
        "bar": ("time", np.random.randn(10), {"meta": "data"}),
        "time": times,
    }
)
ds.attrs["dsmeta"] = "dsdata"
ds.resample(time="24H").mean("time").time + to_offset("8H")

raises:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-29-f9de46fe6c54> in <module>
----> 1 ds.resample(time="24H").mean("time").time + to_offset("8H")

/usr/local/lib/python3.8/site-packages/xarray/core/dataarray.py in func(self, other)
   2763 
   2764             variable = (
-> 2765                 f(self.variable, other_variable)
   2766                 if not reflexive
   2767                 else f(other_variable, self.variable)

/usr/local/lib/python3.8/site-packages/xarray/core/variable.py in func(self, other)
   2128             with np.errstate(all="ignore"):
   2129                 new_data = (
-> 2130                     f(self_data, other_data)
   2131                     if not reflexive
   2132                     else f(other_data, self_data)

TypeError: unsupported operand type(s) for +: 'numpy.ndarray' and 'pandas._libs.tslibs.offsets.Hour'

This is an issue because pandas resampling has deprecated loffset — from our test suite:

xarray/tests/test_dataset.py::TestDataset::test_resample_loffset
  /Users/maximilian/workspace/xarray/xarray/tests/test_dataset.py:3844: FutureWarning: 'loffset' in .resample() and in Grouper() is deprecated.

  >>> df.resample(freq="3s", loffset="8H")

  becomes:

  >>> from pandas.tseries.frequencies import to_offset
  >>> df = df.resample(freq="3s").mean()
  >>> df.index = df.index.to_timestamp() + to_offset("8H")

    ds.bar.to_series().resample("24H", loffset="-12H").mean()

...and so we'll need to support something like this in order to maintain existing behavior.
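Mirroring the pandas advice above, a hypothetical workaround sketch for the xarray case (reusing the ds from the snippet above — shift the index on the pandas side, where offset arithmetic works):

import pandas as pd

resampled = ds.resample(time="24H").mean("time")
# shift the time index after resampling, rather than operating on the DataArray
resampled["time"] = resampled.get_index("time") + pd.Timedelta("8H")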

Describe the solution you'd like

I'm not completely sure; probably supporting operations between xarray objects containing datetimes and pandas' offset objects.

created time in 2 days

PR opened pydata/xarray

Add isocalendar to dt fields

<!-- Feel free to remove check-list items that aren't relevant to your change -->

  • [x] Tests added
  • [x] Passes isort . && black . && mypy . && flake8
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] New functions/methods are listed in api.rst

This currently fails tests — IIUC because our infra for this expects scalars rather than tuples, and isocalendar returns a tuple.
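To illustrate the shape mismatch (a sketch; pandas >= 1.1): Series.dt.isocalendar() yields a whole DataFrame rather than one scalar per element, which is what the accessor infra expects:

import pandas as pd

s = pd.Series(pd.date_range("2000-01-01", periods=3))
print(s.dt.isocalendar())
#    year  week  day
# 0  1999    52     6
# 1  1999    52     7
# 2  2000     1     1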

The existing week & weekofyear are going to stop working at some point, so if we can move off those that would be ideal:

xarray/tests/test_accessor_dt.py::TestDatetimeAccessor::test_field_access[weekofyear]
xarray/tests/test_accessor_dt.py::TestDatetimeAccessor::test_field_access[week]
  /Users/maximilian/workspace/xarray/xarray/tests/test_accessor_dt.py:72: FutureWarning: weekofyear and week have been deprecated, please use DatetimeIndex.isocalendar().week instead, which returns a Series.  To exactly reproduce the behavior of week and weekofyear and return an Index, you may call pd.Int64Index(idx.isocalendar().week)
    getattr(self.times, field), name=field, coords=[self.times], dims=["time"]

xarray/tests/test_accessor_dt.py::TestDatetimeAccessor::test_field_access[weekofyear]
xarray/tests/test_accessor_dt.py::TestDatetimeAccessor::test_field_access[week]
  /Users/maximilian/workspace/xarray/xarray/core/accessor_dt.py:44: FutureWarning: Series.dt.weekofyear and Series.dt.week have been deprecated.  Please use Series.dt.isocalendar().week instead.
    field_values = getattr(values_as_series.dt, name).values

-- Docs: https://docs.pytest.org/en/stable/warnings.html

I'd be very happy for someone else to take this on...

+9 -0

0 comments

3 changed files

pr created time in 2 days

create branch max-sixty/xarray

branch : isocalendar

created branch time in 2 days

issue opened mtkennerly/poetry-dynamic-versioning

Is it possible to choose the behavior when not installed?

Currently, if poetry-dynamic-versioning isn't installed on a system, the version doesn't change from 0.0.0.

I can't immediately see how PEP 518 handles the case when something in requires isn't present.

Is it possible to elect to raise an error? Or even install poetry-dynamic-versioning?

created time in 2 days

issue comment argoproj/argo

Retry Strategy doesn't work in GKE preemptible node (pod deleted)

Would anyone have any information on whether argo will retry tasks based on retryStrategy that are interrupted by GCE preemptible / EC2 spot nodes shutting down?

If I delete the pod manually, argo seems not to retry the task — but that's an imperfect replication of a node shutting down.

fabio-rigato

comment created time in 2 days

Pull request review comment pydata/xarray

Improve Dataset and DataArray docstrings

[Review diff context — same DataArray docstring diff as above (Parameters and Examples sections, ending with the coldest_temp example and its output).]

Could we add references to http://xarray.pydata.org/en/stable/data-structures.html#dataarray?

I'm actually not sure we can add references to normal doc pages, as opposed to API objects. Does anyone know?

Illviljan

comment created time in 2 days

Pull request review commentpydata/xarray

Improve Dataset and DataArray docstrings

[Same DataArray docstring diff as above; this comment is anchored on the >>> import numpy as np / >>> import pandas as pd / >>> import xarray as xr lines of the new Examples section.]

Generally we don't include these, and they're in the standard prelude

Illviljan

comment created time in 2 days

Pull request review commentpydata/xarray

Improve Dataset and DataArray docstrings

[Same DataArray docstring diff as above; this comment is anchored on the coords={"lon": (["x", "y"], lon), ...} lines of the construction example.]

I guess I'm in the minority who prefers the lesser punctuation in coords=dict(lon=..., lat=...)?

Illviljan

comment created time in 2 days

Pull request review commentpydata/xarray

Improve Dataset and DataArray docstrings

[Same DataArray docstring diff as above; this comment is anchored on the full xr.DataArray(...) construction example.]

Can we show da here?

Illviljan

comment created time in 2 days

PullRequestReviewEvent
PullRequestReviewEvent

issue closedpydata/xarray

Unexpected warning when taking mean of all-NaN slice in chunked DataArray

Problem description

When taking the mean of a DataArray with an all-NaN axis, I get the following warning:

import numpy as np
import xarray as xr

data = np.array([[np.nan, 0], [np.nan, 1]])
xc = xr.DataArray(data).chunk()
xc.mean('dim_0').compute()

/nbhome/xrc/anaconda2/envs/py361/lib/python3.6/site-packages/dask/array/numpy_compat.py:28: RuntimeWarning: invalid value encountered in true_divide
  x = np.divide(x1, x2, out)

<xarray.DataArray (dim_1: 2)>
array([ nan,  0.5])
Dimensions without coordinates: dim_1

This confused me because the warning suggests a 0/0 division and/or issues with typecasting. Furthermore, the warning is different when the data is not chunked:



x = xr.DataArray(data)
x.mean('dim_0')

/nbhome/xrc/anaconda2/envs/py361/lib/python3.6/site-packages/xarray/core/nanops.py:161: RuntimeWarning: Mean of empty slice
  return np.nanmean(a, axis=axis, dtype=dtype)

<xarray.DataArray (dim_1: 2)>
array([ nan,  0.5])
Dimensions without coordinates: dim_1

Also, using pure dask does not raise this warning.

import dask.array as da

xd = da.from_array(data, chunks=(1, 1))

xd.mean(axis=0).compute()

array([ nan,  0.5])

Expected Output

Either the warning from a non-chunked DataArray or no warning would be preferred.
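In the meantime, a minimal sketch of a workaround using the standard library (this silences all RuntimeWarnings inside the block, so apply it narrowly; it may not catch warnings raised from other threads by some dask schedulers):

import warnings

with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=RuntimeWarning)
    result = xc.mean('dim_0').compute()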

Output of xr.show_versions()

<details> commit: None python: 3.6.6.final.0 python-bits: 64 OS: Linux OS-release: 2.6.32-696.30.1.el6.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US LOCALE: en_US.ISO8859-1

xarray: 0.11.0 pandas: 0.23.4 numpy: 1.13.1 scipy: 0.19.1 netCDF4: 1.3.1 h5netcdf: 0.5.0 h5py: 2.7.1 Nio: None zarr: None cftime: 1.0.2.1 PseudonetCDF: None rasterio: None iris: None bottleneck: 1.2.0 cyordereddict: None dask: 0.20.1 distributed: 1.24.0 matplotlib: 3.0.1 cartopy: 0.16.0 seaborn: 0.8.1 setuptools: 35.0.2 pip: 18.1 conda: None pytest: 3.0.7 IPython: 6.5.0 sphinx: None

</details>

closed time in 3 days

chuaxr

delete branch max-sixty/xarray

delete branch : clean

delete time in 3 days

push eventpydata/xarray

Maximilian Roos

commit sha 1597e3a91eaf96626725987d23bbda2a80d2bae7

Remove unused config files (#4519)

view details

push time in 3 days

PR merged pydata/xarray

Remove unused config files


Root-level config files that we no longer use

+0 -39

1 comment

3 changed files

max-sixty

pr closed time in 3 days

issue commentGoogleCloudPlatform/gsutil

TypeError: cannot pickle '_io.TextIOWrapper' object

That's an excellent workaround, thanks @dilipped!

lorddaedra

comment created time in 3 days

pull request commentpydata/xarray

Remove unused config files

I'll hit the green button later unless anyone has comments

max-sixty

comment created time in 3 days

issue commentpydata/xarray

Inconsistency between sel and isel when working with slice

What would your ideal design be, @shoyer? That sel operates the same as isel?

Honestly, I think this is a design mistake in pandas

Honestly, I think this is a design mistake in C — 0 based indexing! 😀

we'd have come up with our own replacement for slice (xarray.Slice?) and deal with breaking lots of old code in subtle ways.

One elegant way that Rust deals with this is with multiple range types — Range vs RangeInclusive — one for each permutation of open / closed.

It wouldn't be impossible to do this — wrap slice (it can't be subclassed directly) and implement these, without backward-incompatible changes. But it would be lots of work for (I think) modest benefits.
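For illustration only, a minimal Python sketch of the Rust-style split (hypothetical names, not a proposal for xarray's API):

from dataclasses import dataclass

@dataclass
class Range:
    """Half-open, like Rust's Range or numpy positional slicing."""
    start: int
    stop: int

    def __contains__(self, x):
        return self.start <= x < self.stop

@dataclass
class RangeInclusive:
    """Closed on both ends, like Rust's RangeInclusive or label-based sel."""
    start: int
    stop: int

    def __contains__(self, x):
        return self.start <= x <= self.stop

So 10 in Range(0, 10) is False, while 10 in RangeInclusive(0, 10) is True.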

vincentchabot

comment created time in 3 days

push eventpydata/xarray

Maximilian Roos

commit sha 15da7eb27153600ac35c09df2073b973a2be4b0e

Request a reproducible example for SO questions (#4528)

view details

push time in 4 days

delete branch max-sixty/xarray

delete branch : so

delete time in 4 days

PR merged pydata/xarray

Request a reproducible example for SO questions


I've been trying to answer most questions on SO over the past month or so. The quality of questions is not great — is there a way we could encourage people to post reproducible examples? I think our push to do that on GH issues has been broadly successful.

I occasionally reply asking for a reproducible example, but it's not that friendly an approach.

There are some SO docs: https://stackoverflow.com/help/minimal-reproducible-example, but they're not explicitly linked to from SO.

+4 -2

1 comment

1 changed file

max-sixty

pr closed time in 4 days

PR opened pydata/xarray

Request a reproducible example for SO questions


I've been trying to answer most questions on SO over the past month or so. The quality of questions is not great — is there a way we could encourage people to post reproducible examples? I think our push to do that on GH issues has been broadly successful.

I occasionally reply asking for a reproducible example, but it's not that friendly an approach.

There are some SO docs: https://stackoverflow.com/help/minimal-reproducible-example, but they're not explicitly linked to from SO.

+4 -2

0 comment

1 changed file

pr created time in 5 days

create branchmax-sixty/xarray

branch : so

created branch time in 5 days

issue commentGoogleCloudPlatform/gsutil

TypeError: cannot pickle '_io.TextIOWrapper' object

While I recognize comments like "Is tHiS fiXEd??" are not helpful — would it be possible for someone on the Google side to acknowledge this is a bug in gsutil and plan to resolve it?

Currently, IIUC, gsutil breaks on Python 3.8 — a version released a year ago, and the default Homebrew version. Workarounds like installing another version of Python are not small adjustments, and are difficult for less technical colleagues. There are 49 :+1:s on the issue.

lorddaedra

comment created time in 5 days

startedcxed/FAERSFix

started time in 6 days

issue commentfish-shell/fish-shell

Flat to set which appends if not present?

Thanks @faho — sorry I missed that. Great to see!

max-sixty

comment created time in 6 days

issue openedfish-shell/fish-shell

Flat to set which appends if not present?


To preface — thank you for building such a brilliant tool, I use fish everyday and it makes my life easier and my work more productive.

It's great we have a canonical way to set paths as defined, now at the top of https://github.com/fish-shell/fish-shell/issues/527

There was some discussion at the bottom of the thread about having a more succinct way of adding paths in a script that's run repeatedly. For example, consistent with the canonical approach, I currently have a bunch of lines in config.fish like:

    contains "$HOME/.cargo/bin" $fish_user_paths; or set -Ua fish_user_paths "$HOME/.cargo/bin"
    contains "$HOME/.local/bin" $fish_user_paths; or set -Ua fish_user_paths "$HOME/.local/bin"
    contains "$HOME/.poetry/bin" $fish_user_paths; or set -Ua fish_user_paths "$HOME/.poetry/bin"

One idea I wanted to raise is a flag for set which would append the value to the variable only if the value is not already present in the variable (call it "overwrite"). If the flag were -o, the above lines would become:

    set -Uao fish_user_paths "$HOME/.cargo/bin"
    set -Uao fish_user_paths "$HOME/.local/bin"
    set -Uao fish_user_paths "$HOME/.poetry/bin"

This would also be more flexible than a custom solution for paths, like fish_add_path, which was also suggested on the linked issue.
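For comparison, a sketch of the fish_add_path variant mentioned above (it has since shipped as a stock function in fish 3.2+ and is idempotent by default):

    fish_add_path $HOME/.cargo/bin
    fish_add_path $HOME/.local/bin
    fish_add_path $HOME/.poetry/bin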

Any thoughts?

created time in 6 days

issue commentpydata/pandas-gbq

Import error with pandas_gbq

Thanks @aoliu95, that's v helpful

It's confusing that we have an error on collections_abc, since that should only be present on Py2, per @tswast's link:

Add six.moves.collections_abc, which aliases the collections module on Python 2-3.2 and the collections.abc on Python 3.3 and greater.

It's unfortunate we lose the stack trace of the original import error; that would be helpful to work out what's going on...

This doesn't repro on a new image:

docker run -it python:3.8 bash

pip install six pandas-gbq

root@a51e72da80a0:/# python -c "import pandas_gbq" # works

root@a51e72da80a0:/# pip list
Package                       Version
----------------------------- ---------
cachetools                    4.1.1
certifi                       2020.6.20
cffi                          1.14.3
chardet                       3.0.4
google-api-core               1.23.0
google-auth                   1.22.1
google-auth-oauthlib          0.4.1
google-cloud-bigquery         2.2.0
google-cloud-bigquery-storage 2.0.0
google-cloud-core             1.4.3
google-crc32c                 1.0.0
google-resumable-media        1.1.0
googleapis-common-protos      1.52.0
grpcio                        1.32.0
idna                          2.10
libcst                        0.3.13
mypy-extensions               0.4.3
numpy                         1.19.2
oauthlib                      3.1.0
pandas                        1.1.3
pandas-gbq                    0.14.0
pip                           20.2.3
proto-plus                    1.11.0
protobuf                      3.13.0
pyarrow                       1.0.1
pyasn1                        0.4.8
pyasn1-modules                0.2.8
pycparser                     2.20
pydata-google-auth            1.1.0
python-dateutil               2.8.1
pytz                          2020.1
PyYAML                        5.3.1
requests                      2.24.0
requests-oauthlib             1.3.0
rsa                           4.6
setuptools                    50.3.0
six                           1.15.0
typing-extensions             3.7.4.3
typing-inspect                0.6.0
urllib3                       1.25.11
wheel                         0.35.1

What happens if you upgrade google-cloud-bigquery, @aoliu95?

winsonhys

comment created time in 6 days

issue commentpydata/pandas-gbq

Import error with pandas_gbq

@aoliu95 can you upgrade pandas-gbq? The latest is 0.14.0, I see yours is 0.12.0?

winsonhys

comment created time in 6 days

issue commentpydata/pandas-gbq

Import error with pandas_gbq

Thanks @aoliu95, I'll look at this. Do you have a stack trace to hand?

winsonhys

comment created time in 6 days

issue commentpydata/xarray

Inconsistency between sel and isel when working with slice

I understand this can be surprising at first glance, but the alternatives are more confusing: with an exclusive stop, a label wouldn't select itself, so selecting a range up to and including a label would require knowing which label follows the end of the range.

From the docs:

Like pandas, label based indexing in xarray is inclusive of both the start and stop bounds.
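A quick sketch of the difference on a toy array:

import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(5), dims="x", coords={"x": [10, 20, 30, 40, 50]})

# label-based slicing is inclusive of both bounds: values at labels 20, 30, 40
da.sel(x=slice(20, 40))

# positional slicing follows numpy's half-open semantics: positions 1 and 2 only
da.isel(x=slice(1, 3))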

While the design is fixed, we'd take a PR to make the documentation clearer, if you have a view for how to improve it

vincentchabot

comment created time in 6 days

issue commentpydata/pandas-gbq

Import error with pandas_gbq

If anyone is on python3 and still has this problem, please post the versions of this library, google-cloud-bigquery, and six.
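For example, something like this should print them (a sketch; each of these packages exposes a __version__ attribute in recent releases):

import google.cloud.bigquery
import pandas_gbq
import six

for mod in (pandas_gbq, google.cloud.bigquery, six):
    print(mod.__name__, mod.__version__)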

winsonhys

comment created time in 6 days

issue commentpydata/pandas-gbq

Import error with pandas_gbq

my python version is 2.7, how can I solve this?

This library requires Python 3

winsonhys

comment created time in 7 days

issue commentgoogleapis/google-api-python-client

`socket.timeout: timed out` when using full IPv6 stack.

This was fixed with the rollback and is no longer an issue for us; I should have made that clearer above.

There was an issue related to API discovery service between 2020-10-06 to 2020-10-08 from GCE VMs which may have caused this timeout issue as well and the changes were rolled back to mitigate the issue.

I can't see this in the incident history; I'm guessing you see this internally

n8felton

comment created time in 7 days

issue commentpython-poetry/poetry

Poetry picks up the wrong version of python

Yes, I had some separate issues with PATH when using that approach, which using Brew takes care of. But dealing with those PATH issues is potentially less work than dealing with these.

ericriff

comment created time in 7 days

issue commentpython-poetry/poetry

Poetry picks up the wrong version of python

Unfortunately poetry env use 3.8 is fickle and doesn't work on a colleague's system; they still see:

[screenshot: poetry still selecting the wrong Python interpreter]

Is this expected? Am I doing something wrong to make this happen or is this a bug? Is there anything I can do to fix this?
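One thing that may be worth trying (a guess, not a verified fix; the interpreter path below is illustrative): give poetry env use the full path to the interpreter rather than a bare version, then check what it picked up:

poetry env use /usr/local/bin/python3.8
poetry env info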

ericriff

comment created time in 7 days

issue commentpydata/xarray

Wrap numpy-groupies to speed up Xarray's groupby aggregations

Here's a very quick POC:

import numpy as np
import pandas as pd
import xarray as xr
from numpy_groupies.aggregate_numba import aggregate  # numba backend

def npg_groupby(da: xr.DataArray, dim, func='sum'):
    # factorize the index along `dim` into integer group ids
    group_idx, labels = pd.factorize(da.indexes[dim])
    axis = da.get_axis_num(dim)
    array = aggregate(group_idx=group_idx, a=da.values, func=func, axis=axis)
    return array

Run on this array:

size_factor = 1000

da = xr.DataArray(
    np.arange(1440 * size_factor).reshape(45 * size_factor, 8, 4),
    dims=("x", "y", "z"),
    coords=dict(x=list(range(45)) * size_factor, y=[1, 2, 3, 4] * 2, z=[1, 2] * 2),
)

It's about 2x as fast, though it only generates the numpy array:

%%timeit 
npg_groupby(da, 'x')
# 15 ms ± 130 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
da.groupby('x').sum()
# 37.6 ms ± 244 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Any thoughts on any of:

  • What's the best way of reconstituting the coords etc, after npg produces the array? (rough sketch below)
  • Presumably we're going to have a fairly different design for this than the existing groupby operations — that design is very nested — wrapping functions and eventually calling .map to loop over each group in python.
  • Presumably we're going to need to keep the existing logic around for dask — is it reasonable for an initial version to defer to the existing logic for all dask arrays? (+ @shoyer 's thoughts above on this)
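On the first point, a rough sketch of reconstituting the result (npg_groupby_da is a hypothetical helper; it assumes no other coords depend on the grouped dim):

def npg_groupby_da(da: xr.DataArray, dim, func='sum'):
    group_idx, labels = pd.factorize(da.indexes[dim])
    axis = da.get_axis_num(dim)
    array = aggregate(group_idx=group_idx, a=da.values, func=func, axis=axis)
    # drop coords that involve the grouped dim, then reattach the unique
    # labels as the new index along that dim
    coords = {k: v for k, v in da.coords.items() if dim not in v.dims}
    coords[dim] = labels
    return xr.DataArray(array, dims=da.dims, coords=coords, attrs=da.attrs)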
shoyer

comment created time in 8 days

issue commentpydata/xarray

Wrap numpy-groupies to speed up Xarray's groupby aggregations

Here's a very quick POC:

import numpy as np
import pandas as pd
import xarray as xr
from numpy_groupies.aggregate_numba import aggregate  # numba backend

def npg_groupby(da: xr.DataArray, dim, func='sum'):
    # factorize the index along `dim` into integer group ids
    group_idx, labels = pd.factorize(da.indexes[dim])
    axis = da.get_axis_num(dim)
    array = aggregate(group_idx=group_idx, a=da.values, func=func, axis=axis)
    return array

Run on this array:

size_factor = 1000

da = xr.DataArray(
    np.arange(1440 * size_factor).reshape(45 * size_factor, 8, 4),
    dims=("x", "y", "z"),
    coords=dict(x=list(range(45)) * size_factor, y=[1, 2, 3, 4] * 2, z=[1, 2] * 2),
)

It's about 2x as fast, though it only generates the numpy array:

%%timeit 
npg_groupby(da, 'x')
# 15 ms ± 130 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
da.groupby('x').sum()
# 37.6 ms ± 244 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Any thoughts on any of:

  • What's the best way of reconstituting the coords etc, after npg produces the array?
  • Presumably we're going to have a fairly different design for this than the existing groupby operations — that design is very nested — wrapping functions and eventually calling .map to loop over each group in python.
  • Presumably we're going to need to keep the existing infra around for dask, is that right?
shoyer

comment created time in 8 days

delete branch max-sixty/xarray

delete branch : isort

delete time in 8 days

push eventpydata/xarray

Maximilian Roos

commit sha 97e26257e81b0ba35af4a34be43a3e9cc666b9bc

Use black profile for isort (#4518)

view details

push time in 8 days

PR merged pydata/xarray

Use black profile for isort


  • [x] Passes isort . && black . && mypy . && flake8

Purely aesthetic simplification
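Concretely, the change is along these lines in the isort config (a sketch; the exact options dropped may differ):

[isort]
# `profile = black` implies the black-compatible settings that were
# previously spelled out (multi_line_output, include_trailing_comma, ...)
profile = black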

+3 -5

0 comment

1 changed file

max-sixty

pr closed time in 8 days

Pull request review commentpydata/xarray

Allow fsspec/zarr/mfdataset

 Cloud Storage Buckets

 It is possible to read and write xarray datasets directly from / to cloud
 storage buckets using zarr. This example uses the `gcsfs`_ package to provide
-a ``MutableMapping`` interface to `Google Cloud Storage`_, which we can then
-pass to xarray::
+an interface to `Google Cloud Storage`_.
+
+From v0.16.2: general `fsspec`_ URLs are parsed and the store set up for you
+automatically when reading, such that you can open a dataset ina  single
+call. You should include any arguments to the storage backend as the
+key ``storage_options``, part of ``backend_kwargs``.
+
+.. code:: python
+
+    ds_gcs = xr.open_dataset(
+        "gcs://<bucket-name>/path.zarr",
+        backend_kwargs={"storage_options": {"project":  '<project-name>', "token": None}},
+        engine="zarr"
+    )
+
+This also works with ``open_mfdataset``, allowing you to pass a list of paths or
+a URL to be interpreted as a glob string.
+
+For older versions, and for writing, you must explicitly set up a ``MutibleMapping``
For older versions, and for writing, you must explicitly set up a ``MutableMapping``
martindurant

comment created time in 8 days

PullRequestReviewEvent
PullRequestReviewEvent

issue commentmtkennerly/poetry-dynamic-versioning

Is the `version` needed in pyproject.toml?

Thank you v much @mtkennerly

2m

comment created time in 8 days

pull request commentpydata/xarray

doc.yml: pin eccodes

Thanks @mathause

mathause

comment created time in 8 days

PR opened pydata/xarray

Remove unused config files


Root-level config files that we no longer use

+0 -39

0 comment

3 changed files

pr created time in 9 days

create branchmax-sixty/xarray

branch : clean

created branch time in 9 days

PR opened pydata/xarray

Use black profile for isort


  • [x] Passes isort . && black . && mypy . && flake8

Purely aesthetic simplification

+3 -5

0 comment

1 changed file

pr created time in 9 days
