viewpoint

innobi/pantab 44

Read/Write pandas DataFrames with Tableau Hyper Extracts

WillAyd/BOESDKParser 4

Sample Web Server and Module to Showcase how you can interact with the BusinessObjects RESTful Web Services SDK

WillAyd/datadev-challenge2 1

2nd Challenge for Tableau Datadev

WillAyd/thumbs_up 1

Dead-Simple Vote and Karma Management

WillAyd/async-phish 0

Asynchronous Library for Interacting with Phish.NET API

WillAyd/BOATParser 0

Business Objects AdminTools Parser

WillAyd/cpython 0

The Python programming language

WillAyd/cython 0

The most widely used Python to C compiler

WillAyd/numpy 0

The fundamental package for scientific computing with Python.

WillAyd/pandas 0

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

push event pandas-dev/pandas

attack68

commit sha ff11c056f0a3d5a7ee2a20b52f85c9c04af95e38

PERF: styler uuid control and security (#36345)

push time in 2 days

pull request comment pandas-dev/pandas

PERF: styler uuid control and security

Thanks @attack68

attack68

comment created time in 2 days

PR merged pandas-dev/pandas

PERF: styler uuid control and security Styler
  • [x] tests added / passed
  • [x] passes black pandas
  • [x] passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • [x] whatsnew entry

The Styler uuid is 16 bytes (128 bits) long, which is wasteful for data transmitted over the web, since the id is repeated throughout the generated HTML. In addition, Styler uses uuid1, which is superseded by uuid4; uuid4 is more secure on a network because it does not embed the host's MAC address or a timestamp.

This PR addresses both items by switching to uuid4 and defaulting the uuid length to 5 hexadecimal characters (20 bits of entropy). That should be more than sufficient to avoid HTML table id collisions on a single webpage, while yielding substantial data-transfer savings for large tables.

uuid length remains configurable.

+34 -3

1 comment

3 changed files

attack68

pr closed time in 2 days
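
For illustration, a minimal sketch of the id scheme described above, assuming the uuid is just a prefix of uuid4().hex; make_table_uuid and collision_probability are hypothetical helpers, not part of the Styler API. Each hex character carries 4 bits, so 5 characters give 20 bits (16**5 = 1,048,576 possible ids), and the birthday approximation estimates the chance that two tables on the same page collide.

```python
import math
from uuid import uuid4


def make_table_uuid(uuid_len: int = 5) -> str:
    """Return a short random id for an HTML table (hypothetical helper)."""
    # uuid4().hex has 32 hex characters; the first 12 are fully random,
    # so a 5-character prefix carries 20 random bits. Length is capped at 32.
    return uuid4().hex[: min(32, uuid_len)]


def collision_probability(n_tables: int, uuid_len: int = 5) -> float:
    """Birthday-bound estimate that at least two of n_tables ids collide."""
    space = 16 ** uuid_len  # number of distinct ids of this length
    pairs = n_tables * (n_tables - 1) / 2
    return 1 - math.exp(-pairs / space)


print(make_table_uuid())          # a random 5-character hex string
print(collision_probability(10))  # about 4.3e-05 for 10 tables on one page
```

Even with 10 tables on one page the estimate stays well below 0.01%, which supports the claim that 5 characters are enough for typical pages; pages with far more tables could raise uuid_len.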

pull request comment pandas-dev/pandas

BUG: Concat typing

Within the scope of this PR, I think it makes sense not to try to use the Union type of the two objects in the parent.

It's probably fine to have that Union from an API perspective, but it doesn't need to be forced internally.

rhshadrach

comment created time in 3 days

pull request comment pandas-dev/pandas

REF: dont set ndarray.data in libreduction

This thing is super thorny... nice effort here in any case. We will figure it out one of these days.

jbrockmendel

comment created time in 3 days

Pull request review comment pandas-dev/pandas

Document Tips for Debugging C Extensions

+.. _debugging_c_extensions:
+
+{{ header }}
+
+======================
+Debugging C extensions
+======================
+
+Pandas uses select C extensions for high performance IO operations. In case you need to debug segfaults or general issues with those extensions, the following steps may be helpful. These steps are geared towards using lldb as a debugger, though the steps for gdb will be similar.
+
+First, be sure to compile the extensions with the appropriate flags to generate debug symbols and remove optimizations. This can be achieved as follows:
+
+.. code-block:: sh
+
+   python setup.py build_ext --inplace -j4 --with-debugging-symbols
+
+Using a debugger
+================
+
+You can create a script that hits the extension module you are looking to debug and place it in the project root. Thereafter launch a Python process under lldb:

To be clear, I don't recommend one over the other; you should just use whatever comes with your build system.

I'll try to reword

WillAyd

comment created time in 3 days

Pull request review comment pandas-dev/pandas

PERF: styler uuid control and security

 class Styler:
         List of {selector: (attr, value)} dicts; see Notes.
     uuid : str, default None
         A unique identifier to avoid CSS collisions; generated automatically.
+    uuid_len : int, default 5

Minor, but this should be appended to the end of the signature in case people are calling it positionally.

attack68

comment created time in 4 days
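
The concern about positional callers can be shown with three hypothetical, simplified signatures (init_old, init_inserted and init_appended are illustrative only, not the actual Styler.__init__ parameter list). A caller that passes uuid positionally silently binds it to uuid_len once a new parameter is inserted before it; appending the parameter avoids this.

```python
# Hypothetical, simplified signatures; not the real Styler.__init__.
def init_old(data, precision=None, table_styles=None, uuid=None):
    return {"uuid": uuid}


def init_inserted(data, precision=None, table_styles=None, uuid_len=5, uuid=None):
    # New parameter placed before uuid: shifts every later positional slot.
    return {"uuid": uuid, "uuid_len": uuid_len}


def init_appended(data, precision=None, table_styles=None, uuid=None, uuid_len=5):
    # New parameter appended at the end: existing positional calls are unaffected.
    return {"uuid": uuid, "uuid_len": uuid_len}


# A caller written against the old signature passes uuid positionally.
args = ([[1, 2]], 2, None, "my_table")
print(init_old(*args))       # {'uuid': 'my_table'}
print(init_inserted(*args))  # {'uuid': None, 'uuid_len': 'my_table'}
print(init_appended(*args))  # {'uuid': 'my_table', 'uuid_len': 5}
```

Making the new argument keyword-only (after a bare *) would be a stricter alternative, though that is a separate design choice from what the review asked for.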

pull request comment pandas-dev/pandas

Fix documentation for new float_precision on read_csv

Thanks @Dr-Irv

Dr-Irv

comment created time in 4 days

push event pandas-dev/pandas

Irv Lustig

commit sha a607bd7de51b12bc7b77f9cb54b7514b5759cdef

Fix documentation for new float_precision on read_csv (#36358)

push time in 4 days

PR merged pandas-dev/pandas

Fix documentation for new float_precision on read_csv Docs
  • [x] closes https://github.com/pandas-dev/pandas/pull/36228#discussion_r487587534
  • [ ] tests added / passed N/A
  • [x] passes black pandas
  • [x] passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • [ ] whatsnew entry N/A

A follow-up to PR #36228 to fix the versionchanged note in the docstring (it was in the wrong place and missing a blank line).

+11 -7

3 comments

1 changed file

Dr-Irv

pr closed time in 4 days

pull request comment pandas-dev/pandas

BUG: Concat typing

It is worth noting that FrameOrSeriesUnion reports errors if concat is called from an abstract/base class.

Unrelated to this PR, but what is the point of FrameOrSeriesUnion vs. just using NDFrame? Is it just more user-friendly from an API perspective?

rhshadrach

comment created time in 4 days
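
One way to see the difference raised above, sketched with stand-in classes rather than the real pandas types (OtherSubclass is hypothetical): a Union names exactly the public types, while annotating with the base class admits any subclass.

```python
from typing import Union, get_args


# Stand-in classes mirroring the pandas hierarchy; not the real pandas types.
class NDFrame:
    pass


class Series(NDFrame):
    pass


class DataFrame(NDFrame):
    pass


class OtherSubclass(NDFrame):  # hypothetical subclass that is neither public type
    pass


FrameOrSeriesUnion = Union[DataFrame, Series]


def matches_union(obj: object) -> bool:
    # Runtime mirror of the annotation: only the two named types qualify.
    return isinstance(obj, get_args(FrameOrSeriesUnion))


def matches_base(obj: object) -> bool:
    # Annotating with the base class admits any subclass.
    return isinstance(obj, NDFrame)


print(matches_union(DataFrame()), matches_base(DataFrame()))          # True True
print(matches_union(OtherSubclass()), matches_base(OtherSubclass()))  # False True
```

A static type checker applies the same logic: a value typed as the base class is not assignable to Union[DataFrame, Series], which is consistent with the errors reported when concat is called from base-class code.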

pull request comment pandas-dev/pandas

CI/BLD: Restrict ci/code_checks.sh to tracked repo files

This happens because the CI script is much more complex than a simple black or flake8 run, and is configured in a very specific way.

These checks in particular are managed through the configuration file, so their results won't differ when run through pre-commit, which has the added bonus of being cross-platform.

plammens

comment created time in 4 days

pull request comment pandas-dev/pandas

BLD/CI fix arm64 build #36397

I think we should backport unless we are really sure that 1.1.3 is the last bug-fix release in the 1.1.x series.

VirosaLi

comment created time in 4 days

Pull request review comment pandas-dev/pandas

Truncate columns list to match tr_frame for correct dict formatters lookup

 def test_to_string_with_formatters(self):
         )
         assert result == result2
 
+    def test_to_string_with_truncated_formatters(self):
+        df = DataFrame(
+            {
+                "int": [1, 2, 3],
+                "float": [1.0, 2.0, 3.0],
+                "object": [(1, 2), True, False],
+            },
+            columns=["int", "float", "object"],
+        )
+
+        formatters = [
+            ("int", lambda x: f"[1] {x}"),
+            ("float", lambda x: f"[2] {x}"),
+            ("object", lambda x: f"[3] {x}"),
+        ]
+        result = df.to_string(formatters=dict(formatters), max_cols=2)

The suggestion was to parametrize the formatters argument: one case for the dict input and one for the list.

kesmit13

comment created time in 4 days
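
A sketch of the parametrized version suggested above, feeding the same formatters once as a dict and once as a list through pytest.mark.parametrize. The assertions are deliberately loose (the exact expected string from the PR is not reproduced here), and the FORMATTERS constant and test ids are illustrative.

```python
import pytest
from pandas import DataFrame

FORMATTERS = [
    ("int", lambda x: f"[1] {x}"),
    ("float", lambda x: f"[2] {x}"),
    ("object", lambda x: f"[3] {x}"),
]


@pytest.mark.parametrize(
    "formatters",
    [dict(FORMATTERS), [func for _, func in FORMATTERS]],
    ids=["dict", "list"],
)
def test_to_string_with_truncated_formatters(formatters):
    df = DataFrame(
        {
            "int": [1, 2, 3],
            "float": [1.0, 2.0, 3.0],
            "object": [(1, 2), True, False],
        },
        columns=["int", "float", "object"],
    )
    result = df.to_string(formatters=formatters, max_cols=2)
    # max_cols=2 truncates the middle column, so a "..." separator appears.
    assert "..." in result
    # The first column must always be rendered with its own formatter.
    assert "[1] 1" in result
```

Running pytest on this produces two test cases, one per input type, without duplicating the test body.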

pull request comment pandas-dev/pandas

BLD: Restrict ci/code_checks.sh to tracked repo files

I don't think it's worth adding complexity here. This script is intended for our CI processes. For local development you can use pre-commit:

https://pandas.pydata.org/pandas-docs/stable/development/contributing.html#pre-commit

plammens

comment created time in 5 days

pull request comment pandas-dev/pandas

Py39 Support - Remove Deprecated Unicode Length Check

Also removed pre per https://github.com/pandas-dev/pandas/issues/36296#issuecomment-693106050

WillAyd

comment created time in 6 days

push event WillAyd/pandas

Will Ayd

commit sha 76f499e33526ed45b2d342b300487011eab520ae

remove pre

push time in 6 days

push event WillAyd/pandas

Maxim Ivanov

commit sha 4dc58879161bf8eb09d51d488e02e1725be86dbd

CLN: pandas/io/parsers.py (#36269)

Felix Claessen

commit sha 06b3f5d9815b8545264c4726a6fa015766da5b03

Resample fix dst transition (#36264)

Richard Shadrach

commit sha 39c5e29ad1be85d2ee19006d4f7eb9e151ff48f1

CLN: _wrap_applied_output (#36260)

jbrockmendel

commit sha 7e0bf1c8ca62c402f937d7b31472d22d2854aac3

REF: implement Categorical._validate_listlike (#36274)

jbrockmendel

commit sha 3a6aedc535c4becb5f25c3022a220f279a8f73b9

CLN: simplify Categorical comparisons (#36250)

jbrockmendel

commit sha 65f78c7628f69e42aea61b4e26ccafe4e5a09741

searchsorted numpy compat for Period dtype (#36254)

Asish Mahapatra

commit sha a9f8d3c0de14cfbe4c76a695c995fab632cbcfad

BUG: na parameter for str.startswith and str.endswith not propagating for Series with categorical dtype (#36249)

jbrockmendel

commit sha cb58dbbb4adbce80a82fa30b853300310617122f

PERF: JoinUnit.is_na (#36312)

Terji Petersen

commit sha 6100425aae528542291a5da5864dde0299d0f31a

PERF: creating string Series/Arrays from sequence with many strings (#36304)

Yanxian Lin

commit sha c6e3af7170f87d387fa11c8cda22aa6da28de495

TST: add test case for sort_index on multiindexed Frame with sparse columns (#36236)

jbrockmendel

commit sha c10462245a388655145b9d2fdd2d1765b8057aeb

REF: use BlockManager.apply in csv code (#36150)

jbrockmendel

commit sha 4729d8f6e6e7c0e71fee3808ea8dee5106a6b02d

STY/WIP: check for private imports/lookups (#36055)

Sam Ezebunandu

commit sha bed96566728b642572bb0880aee35c03734667e9

DOC: Fix DataFrame.query contradiction on use of Python keywords as identifiers (#36311)

Richard Shadrach

commit sha ab5b38d560e266ac0d813ffe83d025420fbf98af

BUG/CLN: Decouple Series/DataFrame.transform (#35964)

Avinash Pancham

commit sha b8f22ad3b980cd7a3687f55586a25e10ad951329

DEPR: Deprecate pandas/io/date_converters.py (#35741)

Richard Shadrach

commit sha 822dc6f901fafd646257de2fc5ea918bbec82f93

REGR: Series access with Index of tuples/frozenset (#36147)

jbrockmendel

commit sha 229722e6a24b9658a27fc06b617701ad02aaa435

ENH: consistently cast strings for DTA/TDA/PA.__setitem__ (#36261)
* ENH: consistently cast strings for DTA/TDA/PA.__setitem__
* whatsnew

Fangchen Li

commit sha e47d5ebbe0f714d199aa08eda68cb4fbe42503a2

CI: install numpy from pip #36296 (#36323)

jbrockmendel

commit sha 5647251706075dc20507cdd9b408e5a74311e4fb

REF: _convert_for_op -> _validate_fill_value (#36318)

jbrockmendel

commit sha 28aab6535e67279e5038490d97ee88ac422874d6

REF: separate out helpers from iLoc._setitem_with_indexer (#36315)

push time in 6 days

pull request comment pandas-dev/pandas

CI: xfail failing parquet test

@jbrockmendel OK by you?

jbrockmendel

comment created time in 6 days

Pull request review comment pandas-dev/pandas

BUG: Python Parser skipping over items if BOM present in first element of header

 def _check_for_bom(self, first_row):
             return [new_row] + first_row[1:]
 
         elif len(first_row_bom) > 1:
-            return [first_row_bom[1:]]
+            return [first_row_bom[1:]] + first_row[1:]

Hmm, it isn't very clear to me why we would do this. Can you try refactoring the code above to better suit the requirement?

asishm

comment created time in 6 days
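
A possible simplified refactor along the lines requested, assuming the only job is to strip a leading byte order mark from the first header cell while keeping the remaining cells. check_for_bom is a hypothetical stand-in; it ignores the quoted-field handling the real _check_for_bom performs.

```python
BOM = "\ufeff"


def check_for_bom(first_row: list) -> list:
    """Strip a leading BOM from the first header cell (simplified sketch)."""
    if not first_row:
        return first_row
    first = first_row[0]
    if not isinstance(first, str) or not first.startswith(BOM):
        return first_row
    # Keep every other cell; dropping first_row[1:] was the reported bug.
    return [first[len(BOM):]] + list(first_row[1:])


print(check_for_bom(["\ufeffa", "b", "c"]))  # ['a', 'b', 'c']
```

Handling the BOM in one early-return branch makes it explicit that the rest of the row always survives, which is the behavior the fix in the diff restores.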

Pull request review comment pandas-dev/pandas

Document Tips for Debugging C Extensions

+.. _debugging_c_extensions:
+
+{{ header }}
+
+**********************
+Debugging C extensions
+**********************
+
+Pandas uses select C extensions for high performance IO operations. In case you need to debug segfaults or general issues with those extensions, the following steps may be helpful. These steps are geared towards using lldb as a debugger, though the steps for gdb will be similar.
+
+First, be sure to compile the extensions with the appropriate flags to generate debug symbols and remove optimizations. This can be achieved as follows:
+
+.. code-block:: sh
+
+   python setup.py build_ext --inplace -j4 --with-debugging-symbols
+
+Next, create a script that hits the extension module you are looking to debug and place it in the project root. Then launch a Python process under lldb:
+
+.. code-block:: sh
+
+   lldb python
+
+If desired, set breakpoints at various file locations using the syntax below:
+
+.. code-block:: sh
+
+   breakpoint set --file pandas/_libs/src/ujson/python/objToJSON.c --line 1547
+
+At this point you may get *WARNING:  Unable to resolve breakpoint to any actual locations.* If you have not yet executed anything, it is possible that the module has not been loaded into memory, which is why the location cannot be resolved. You can safely ignore this for now; the breakpoint will bind once the code actually executes.
+
+Finally, execute your script:
+
+.. code-block:: sh
+
+   run <the_script>.py
+
+Code execution will halt at the defined breakpoint or at the occurrence of any segfault. LLDB's `GDB to LLDB command map <https://lldb.llvm.org/use/map.html>`_ provides a listing of debugger commands that you can execute using either debugger.
+
+Another option is to execute the entire test suite under the debugger by running the following:
+
+.. code-block:: sh
+
+   lldb -- python -m pytest
+
+Or for gdb:
+
+.. code-block:: sh
+
+   gdb --args python -m pytest

I'm trying to avoid adding too much detail here since this issue is more of a pyenv thing than a debugger issue

WillAyd

comment created time in 7 days
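
As an example of the "script that hits the extension module" described in the doc above, a minimal sketch that exercises the JSON serializer; the file name debug_json.py is hypothetical. In the pandas versions discussed here, DataFrame.to_json goes through the ujson-based C extension, so a breakpoint in objToJSON.c would be expected to bind when this runs.

```python
"""debug_json.py: a tiny script to run under lldb or gdb (hypothetical name).

Following the steps above:
    lldb python
    (lldb) run debug_json.py
"""
import pandas as pd


def exercise_json_extension() -> str:
    # DataFrame.to_json serializes through the C JSON extension, so this
    # call is what reaches any breakpoints set in objToJSON.c.
    df = pd.DataFrame({"a": [1, 2], "b": ["x", None]})
    return df.to_json(orient="records")


if __name__ == "__main__":
    print(exercise_json_extension())
```

Keeping the script tiny means the debugger session reaches the extension quickly and any segfault is easy to attribute to a single call.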

Pull request review comment pandas-dev/pandas

Document Tips for Debugging C Extensions

+.. _debugging_c_extensions:
+
+{{ header }}
+
+======================
+Debugging C extensions
+======================
+
+Pandas uses select C extensions for high performance IO operations. In case you need to debug segfaults or general issues with those extensions, the following steps may be helpful. These steps are geared towards using lldb as a debugger, though the steps for gdb will be similar.
+
+First, be sure to compile the extensions with the appropriate flags to generate debug symbols and remove optimizations. This can be achieved as follows:
+
+.. code-block:: sh
+
+   python setup.py build_ext --inplace -j4 --with-debugging-symbols
+
+Using a debugger
+================
+
+You can create a script that hits the extension module you are looking to debug and place it in the project root. Thereafter launch a Python process under lldb:

I have only used it on macOS, where it comes bundled with the Xcode tools. Not sure about other systems.

WillAyd

comment created time in 7 days

PR closed pandas-dev/pandas

Clarify docs for df.to_sql `chunksize` Docs Needs Discussion
  • [x] closes #35891
  • [ ] tests added / passed
  • [x] passes black pandas
  • [x] passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • [ ] whatsnew entry

Note: this is a docs PR, so I'm not adding tests or a whatsnew entry.

+2 -1

1 comment

1 changed file

gbrova

pr closed time in 7 days

pull request commentpandas-dev/pandas

Clarify docs for df.to_sql `chunksize`

Thanks for the PR, but it looks like more discussion is needed in the original issue before doing anything here.

gbrova

comment created time in 7 days

push event WillAyd/pandas

jbrockmendel

commit sha 2067d7e306ae720d455f356e4da21f282a8a762e

CLN: typo cleanups (#36276)
* typo cleanups
* typo fixup

jbrockmendel

commit sha 15fd0e74a4a188332d2484539101ebd171f40d6e

REF: de-duplicate _wrap_joined_index (#36282)

jbrockmendel

commit sha 21fe97204227df0a641bd363b2a7e6732dad9091

REF: de-duplicate sort_values (#36301)

jbrockmendel

commit sha 2da7c343abf0c3b94004c537e53fa368051eefdd

PERF: get_dtype_kinds (#36309)

Will Ayd

commit sha 0a4925a63332aa5d153928c842996c2eb9cea6bd

Merge remote-tracking branch 'upstream/master' into py39-depr-fix

push time in 9 days

pull request comment pandas-dev/pandas

Py39 Support - Remove Deprecated Unicode Length Check

/azp run

WillAyd

comment created time in 10 days

PR opened pandas-dev/pandas

Py39 Support - Remove Deprecated Unicode Length Check
  • [X] closes #36279
  • [ ] tests added / passed
  • [ ] passes black pandas
  • [ ] passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • [ ] whatsnew entry
+2 -2

0 comments

1 changed file

pr created time in 10 days

create branch WillAyd/pandas

branch : py39-depr-fix

created branch time in 10 days

issue comment pandas-dev/pandas

QST: Python 3.9 testing in CI

Strange: I made the edits above and that seemed to get rid of those warnings. I'll push up a PR soon.

wumpus

comment created time in 10 days

issue comment pandas-dev/pandas

QST: Python 3.9 testing in CI

We have warnings turned on, but presumably these functions don't raise a deprecation warning until 3.9.

Can you post the full error list?

wumpus

comment created time in 10 days

issue comment pandas-dev/pandas

QST: Python 3.9 testing in CI

@wumpus if you'd like to try, I think you just need to update this line and any other occurrence within the module:

https://github.com/pandas-dev/pandas/blob/ddf2f05e25ca94794e295cf173e3fbc351581a78/pandas/_libs/writers.pyx#L5

wumpus

comment created time in 10 days

issue comment pandas-dev/pandas

QST: Python 3.9 testing in CI

I think the problem is that we import PyUnicode_GET_SIZE in writers.pyx, which is deprecated. We might be able to swap it out for PyUnicode_GET_LENGTH, assuming that is available in Cython.

https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_GET_SIZE

wumpus

comment created time in 10 days
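
To illustrate why the swap is more than a rename: per the Python C API docs, PyUnicode_GET_SIZE reports the length of the deprecated Py_UNICODE representation in code units (counting a surrogate pair as two), whereas PyUnicode_GET_LENGTH reports code points. The sketch below emulates both at the Python level, assuming a 16-bit wchar_t as on Windows; the helper names are hypothetical.

```python
def code_point_length(s: str) -> int:
    # What PyUnicode_GET_LENGTH reports: the number of code points.
    return len(s)


def utf16_unit_length(s: str) -> int:
    # Emulates a 16-bit Py_UNICODE buffer: characters outside the Basic
    # Multilingual Plane need a surrogate pair, i.e. two code units.
    return len(s.encode("utf-16-le")) // 2


for text in ("abc", "\u00e9", "\U0001F600"):
    print(repr(text), code_point_length(text), utf16_unit_length(text))
```

On platforms with a 32-bit wchar_t the two counts coincide, so a difference would mainly show up on Windows for characters outside the Basic Multilingual Plane.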

Pull request review comment pandas-dev/pandas

CLN: pandas/io/parsers.py

 def __init__(self, f, engine=None, **kwds):
         self.nrows = options.pop("nrows", None)
         self.squeeze = options.pop("squeeze", False)
 
-        # might mutate self.engine
-        self.engine = self._check_file_or_buffer(f, engine)
+        self._check_file_or_buffer(f, engine)
         self.options, self.engine = self._clean_options(options, engine)
 
         if "has_index_names" in kwds:
             self.options["has_index_names"] = kwds["has_index_names"]
 
-        self._make_engine(self.engine)
+        self._engine = self._make_engine(self.engine)

Ah OK I see - thanks for clarifying

ivanovmg

comment created time in 10 days

Pull request review comment pandas-dev/pandas

PERF: pd.to_datetime, unit='s' much slower for float64 than for int64

 def array_with_unit_to_datetime(
 
     assert is_ignore or is_coerce or is_raise
 
-    if unit == 'ns':
-        if issubclass(values.dtype.type, np.integer):
-            result = values.astype('M8[ns]')
+    if unit == "ns":
+        if issubclass(values.dtype.type, np.integer) or issubclass(
        if issubclass(values.dtype.type, (np.integer, np.floating)):

arw2019

comment created time in 10 days
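
Which abstract NumPy type the check tests against matters here. np.floating is the base class of every NumPy float width, whereas np.float was only an alias for the builtin float (deprecated in NumPy 1.20 and removed in 1.24); since np.float64 subclasses the builtin float but np.float32 does not, an issubclass check against float would skip float32 arrays. A sketch of the check, with takes_fast_path as a hypothetical name:

```python
import numpy as np


def takes_fast_path(values: np.ndarray) -> bool:
    # np.integer is the abstract base of all signed and unsigned integer
    # scalar types; np.floating is the abstract base of all float widths.
    return issubclass(values.dtype.type, (np.integer, np.floating))


for arr in (
    np.array([1, 2], dtype="int64"),
    np.array([1, 2], dtype="uint8"),
    np.array([1.5], dtype="float32"),
    np.array([1.5], dtype="float64"),
    np.array(["a"], dtype=object),
    np.array([True]),
):
    print(arr.dtype, takes_fast_path(arr))

# Why the builtin float is the wrong target for the check:
print(issubclass(np.float64, float), issubclass(np.float32, float))
```

Boolean arrays are excluded because np.bool_ is not a subclass of np.integer, so they still take the slower path.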

Pull request review comment pandas-dev/pandas

PERF: pd.to_datetime, unit='s' much slower for float64 than for int64

 def array_with_unit_to_datetime(
 
     assert is_ignore or is_coerce or is_raise
 
-    if unit == 'ns':
-        if issubclass(values.dtype.type, np.integer):
-            result = values.astype('M8[ns]')
+    if unit == "ns":
+        if issubclass(values.dtype.type, np.integer) or issubclass(
+            values.dtype.type, np.float_
+        ):
+            result = values.astype("M8[ns]")
         else:
             result, tz = array_to_datetime(values.astype(object), errors=errors)
         return result, tz
 
-    m = cast_from_unit(None, unit)
+    m, p = precision_from_unit(unit)
 
     if is_raise:
-
-        # try a quick conversion to i8
+        # try a quick conversion to i8/f8
         # if we have nulls that are not type-compat
         # then need to iterate
-        if values.dtype.kind == "i":
-            # Note: this condition makes the casting="same_kind" redundant
-            iresult = values.astype('i8', casting='same_kind', copy=False)
-            # fill by comparing to NPY_NAT constant
+
+        if values.dtype.kind == "i" or values.dtype.kind == "f":
+            iresult = values.astype("i8", copy=False)
+            # fill missing values by comparing to NPY_NAT
             mask = iresult == NPY_NAT
             iresult[mask] = 0
-            fvalues = iresult.astype('f8') * m
+            fvalues = values.astype("f8") * m
             need_to_iterate = False
 
-        # check the bounds
         if not need_to_iterate:
-
-            if ((fvalues < Timestamp.min.value).any()
-                    or (fvalues > Timestamp.max.value).any()):
+            # check the bounds
+            if (fvalues < Timestamp.min.value).any() or (
+                fvalues > Timestamp.max.value
                (fvalues > Timestamp.max.value).any()

Shouldn't the .any() call go here instead of outside the parentheses?

arw2019

comment created time in 10 days
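
A small sketch of the bounds check in isolation, assuming a seconds-to-nanoseconds multiplier of 1e9 for unit="s" (NS_PER_SECOND and out_of_bounds are illustrative names). It also bears on the question above: in Python, .any() applies to whatever expression immediately precedes it, so black's layout, with .any() after the closing parenthesis on its own line, calls it on the parenthesized comparison exactly as the one-line form would.

```python
import numpy as np
import pandas as pd

NS_PER_SECOND = 1_000_000_000  # multiplier assumed for unit="s"


def out_of_bounds(fvalues: np.ndarray) -> bool:
    """True if any nanosecond value falls outside the Timestamp range."""
    lower = pd.Timestamp.min.value  # integer nanoseconds since the epoch
    upper = pd.Timestamp.max.value
    # Each comparison yields a boolean array; .any() reduces it to one bool.
    return bool((fvalues < lower).any() or (fvalues > upper).any())


seconds = np.array([1.5, 2.0, 1e11])  # 1e11 seconds is thousands of years
print(out_of_bounds(seconds.astype("f8") * NS_PER_SECOND))  # True
```

Working in float64 before the bounds check means fractional seconds survive the unit conversion, which is the point of the f8 path in this PR.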

pull request comment pandas-dev/pandas

DOC: Update groupby.rst

Thanks @Nikhil1O1

Nikhil1O1

comment created time in 10 days

push event pandas-dev/pandas

Nikhil Choudhary

commit sha ddf2f05e25ca94794e295cf173e3fbc351581a78

DOC: Update groupby.rst (#36238)

view details

push time in 10 days

PR merged pandas-dev/pandas

DOC: Update groupby.rst Docs

The previous description still presented a string as a column name in one sentence and as an index level in the next, which read as a contradiction.

  • [x] closes #16870
  • [x] tests added / passed
  • [x] passes black pandas
  • [x] passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • [x] whatsnew entry
+3 -5

0 comments

1 changed file

Nikhil1O1

pr closed time in 10 days

issue closed pandas-dev/pandas

(DOC) A `string` passed to `groupby` is hard to understand based on current doc

Code Sample, a copy-pastable example if possible

From Here

  • For DataFrame objects, a string indicating a column to be used to group. Of course df.groupby('A') is just syntactic sugar for df.groupby(df['A']), but it makes life simpler.
  • For DataFrame objects, a string indicating an index level to be used to group.

Problem description

These two sentences appear to conflict with each other until one reads the note below.

Expected Output

Reword to make it clear that a string may indicate column or index level

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Darwin
OS-release: 16.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.21.0.dev+193.gb2b5dc32e
pytest: 3.1.2
pip: 9.0.1
setuptools: 36.0.1
Cython: 0.25.2
numpy: 1.13.0
scipy: 0.19.0
xarray: None
IPython: 6.0.0
sphinx: 1.6.2
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

closed time in 10 days

BranYang
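
A short example of the two meanings a string can have in groupby, using made-up data: "A" names a column and "idx" names an index level. When a name refers to both, recent pandas versions raise a ValueError rather than guessing.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": ["x", "y", "x"], "B": [1, 2, 3]},
    index=pd.Index(["k1", "k1", "k2"], name="idx"),
)

by_column = df.groupby("A")["B"].sum()     # "A" is a column label
by_sugar = df.groupby(df["A"])["B"].sum()  # the long form of the same grouping
by_level = df.groupby("idx")["B"].sum()    # "idx" is an index level name

print(by_column.to_dict())
print(by_level.to_dict())

# If a name is both a column and an index level, pandas refuses to guess.
ambiguous = df.rename_axis("A")
raised_ambiguous = False
try:
    ambiguous.groupby("A")
except ValueError as exc:
    raised_ambiguous = True
    print("ambiguous:", exc)
```

Summing B per group gives 4 for "x" and 2 for "y" by column, and 3 for each of "k1" and "k2" by index level.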

push event parkdj1/pandas

jbrockmendel

commit sha a09259b5699b0f2667761356cb3b22ef43ef37f0

BUG: DataFrame.any with axis=1 and bool_only=True (#36106)

Justin Essert

commit sha 6c8b923db4a5de5caf3fc05ab861e2c52b0ab8c5

BUG: instantiation using a dict with a period scalar (#35966)

jbrockmendel

commit sha f8d5fba2eb2962b5bd4d65daeffecca37700ecc5

REF: share more EA methods (#36209)

jbrockmendel

commit sha 49b342b90d6a8a940d260803e31a5f189a336909

CLN: simplify Categorical comparisons (#36237)

Will Ayd

commit sha 9189cfd37039b05aa5e27b196ba1f2fc61da11d7

Merge remote-tracking branch 'upstream/master' into parkdj1-master

Will Ayd

commit sha 4cadb0f832ba4dab8c0131102e660ea4b140a4bf

removed space

push time in 10 days

pull request comment pandas-dev/pandas

[TST]: Add test for 24768

The test itself looks good; I'm just not sure about its location. @jreback should this go in pandas/tests/extension/test_integer.py::TestReshaping, or is that intended for something else?

phofl

comment created time in 11 days

pull request comment pandas-dev/pandas

ENH: add peek function

This has been discussed previously in #18691; I'm not sure we want to add this to the API.

Quetzalcohuatl

comment created time in 11 days

Pull request review comment pandas-dev/pandas

CLN: pandas/io/parsers.py

 def __init__(self, f, engine=None, **kwds):
         self.nrows = options.pop("nrows", None)
         self.squeeze = options.pop("squeeze", False)
 
-        # might mutate self.engine
-        self.engine = self._check_file_or_buffer(f, engine)
+        self._check_file_or_buffer(f, engine)
         self.options, self.engine = self._clean_options(options, engine)
 
         if "has_index_names" in kwds:
             self.options["has_index_names"] = kwds["has_index_names"]
 
-        self._make_engine(self.engine)
+        self._engine = self._make_engine(self.engine)

Does assigning this back to self._engine buy us anything? IIUC it would be better to just rename this method to _check_engine and have it raise for an invalid engine without returning anything.

ivanovmg

comment created time in 11 days
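
A sketch of the suggested shape, assuming the engine names below (VALID_ENGINES and check_engine are hypothetical; the real parser code may validate differently). Because the function only raises, there is nothing to assign back, which is the point of the review comment.

```python
# Assumed set of engine names, for illustration only.
VALID_ENGINES = ("c", "python", "python-fwf")


def check_engine(engine: str) -> None:
    """Raise for an unknown engine; return nothing (hypothetical helper).

    Modeled on the suggested _check_engine: validation has no side effects,
    so the caller keeps sole responsibility for setting self.engine.
    """
    if engine not in VALID_ENGINES:
        raise ValueError(
            f"Unknown engine: {engine!r} (valid options are {VALID_ENGINES})"
        )


check_engine("c")  # passes silently
try:
    check_engine("bogus")
except ValueError as exc:
    print(exc)
```

Separating validation from construction keeps the attribute assignments in __init__ readable: each one happens in exactly one place.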

push event parkdj1/pandas

Souris Ash

commit sha 767f2ab92a10e97f56575c05a8096b6290d8a516

Removed outdated examples for pd.Interval (pandas-dev#36002) (#36026)

Simon Hawkins

commit sha fd6a55f0689a5e62b61e5e673619a0d754183f81

TYP: misc typing cleanup in core/indexes/multi.py (#36007)
* TYP: misc typing cleanup in core/indexes/multi.py
* update per comments

Anshoo Rajput

commit sha 329e1c7d3333931222dc84241d8b2bc84ff8a6c8

remove trailing commas for #35925 (#36029)

jbrockmendel

commit sha 303e40b378b0b3e264a9f55c7f25747cb50485ef

TYP: annotate plotting._matplotlib.misc (#36017)

jbrockmendel

commit sha 6c3c695664f8567477686a610e85cd0367929208

TYP: Annotate plotting stacker (#36016)

Fangchen Li

commit sha ecc5015cd2c6e7461a031d08f3672803c181ae70

TYP/CLN: cleanup `_openpyxl.py`, add type annotation #36021 (#36022)

Richard Shadrach

commit sha 2a624dc674166607eea14c62ffff198148318f9a

CLN: _wrap_applied_output (#36053)

jbrockmendel

commit sha 4f2251c6e6f5fb71a330d0318f96ce338671f746

REF: implement Block._replace_list (#36020)

jbrockmendel

commit sha bdd5d4c13bccc8fe2c4e4444136119ad69c2efcc

BUG: PeriodIndex.get_loc incorrectly raising ValueError instead of KeyError (#36015)

Simon Hawkins

commit sha 67429d4acf08f6355888879591dcffeaf9958004

CI: Unpin MyPy (#36012)

jbrockmendel

commit sha 79919db18f2569151a68fa414059e0f6cbd35617

ENH: vendor typing_extensions (#36000)

Erfan Nariman

commit sha 0bc407ab82f186a3f37512264570078779aa175f

Added numba as an argument (#35778)

jbrockmendel

commit sha 075ed8b35880a649f6d1297770c388bd380ff1a0

REF: handle axis=None case inside DataFrame.any/all to simplify _reduce (#35899)
* REF: remove unnecesary try/except
* TST: add test for agg on ordered categorical cols (#35630)
* TST: resample does not yield empty groups (#10603) (#35799)
* revert accidental rebase
* REF: handle axis=None cases inside DataFrame.all/any
* annotate
* dummy commit to force Travis
Co-authored-by: Karthik Mathur <22126205+mathurk1@users.noreply.github.com>
Co-authored-by: tkmz-n <60312218+tkmz-n@users.noreply.github.com>

jbrockmendel

commit sha 20569007ea256d3126214ab845190d08f88c73a9

BUG: BlockSlider not clearing index._cache (#35937)
* REF: remove unnecesary try/except
* TST: add test for agg on ordered categorical cols (#35630)
* TST: resample does not yield empty groups (#10603) (#35799)
* revert accidental rebase
* BUG: BlockSlider not clearing index._cache
* update whatsnew
Co-authored-by: Karthik Mathur <22126205+mathurk1@users.noreply.github.com>
Co-authored-by: tkmz-n <60312218+tkmz-n@users.noreply.github.com>

jbrockmendel

commit sha 73c1d3269830d787c8990de8f02bf4279d2720ab

BUG: NDFrame.replace wrong exception type, wrong return when size==0 (#36045)
* REF: remove unnecesary try/except
* TST: add test for agg on ordered categorical cols (#35630)
* TST: resample does not yield empty groups (#10603) (#35799)
* revert accidental rebase
* BUG: NDFrame.replace wrong exception type, wrong return when size==0
* bool->bool_t
* whatsnew
Co-authored-by: Karthik Mathur <22126205+mathurk1@users.noreply.github.com>
Co-authored-by: tkmz-n <60312218+tkmz-n@users.noreply.github.com>

Jonathan Shreckengost

commit sha 1fc244f89108347804ea33c7e912f933c2aa63da

Comma cleanup for #35925 (#36058)

Kaiqi Dong

commit sha 19c3d4013d41198d1fa40dbaea3140bd11e0564a

API: replace dropna=False option with na_sentinel=None in factorize (#35852)
* remove \n from docstring
* fix issue 17038
* revert change
* revert change
* add dropna doc for factorize
* rephrase the doc
* flake8
* fixup
* use NaN
* add dropna in series.factorize
* black
* add test
* linting
* linting
* doct
* fix black
* fixup
* fix doctest
* add whatsnew
* linting
* fix test
* try one time
* hide dropna and use na_sentinel=None
* update whatsnew
* rename test function
* remove dropna from factorize
* update doc
* docstring
* update doc
* add comment
* code change on review
* update doc
* code change on review
* minor move in whatsnew
* add default example
* doc
* one more try
* explicit doc
* add space

Simon Hawkins

commit sha 7d047c24748be6d7efaaa7054fd581922d3c1781

TYP: update setup.cfg (#36067)

Simon Hawkins

commit sha 09c6124caae58b51617e723b9926e749715f3d36

TYP: statically define attributes in plotting._matplotlib.core (#36068)
pandas\plotting\_matplotlib\core.py:231: error: "MPLPlot" has no attribute "style" [attr-defined]
pandas\plotting\_matplotlib\core.py:232: error: "MPLPlot" has no attribute "style" [attr-defined]
pandas\plotting\_matplotlib\core.py:233: error: "MPLPlot" has no attribute "style" [attr-defined]
pandas\plotting\_matplotlib\core.py:235: error: "MPLPlot" has no attribute "style" [attr-defined]
pandas\plotting\_matplotlib\core.py:385: error: "MPLPlot" has no attribute "label"; maybe "ylabel" or "xlabel"? [attr-defined]
pandas\plotting\_matplotlib\core.py:553: error: "MPLPlot" has no attribute "mark_right" [attr-defined]
pandas\plotting\_matplotlib\core.py:732: error: "MPLPlot" has no attribute "style" [attr-defined]
pandas\plotting\_matplotlib\core.py:733: error: "MPLPlot" has no attribute "style" [attr-defined]
pandas\plotting\_matplotlib\core.py:735: error: "MPLPlot" has no attribute "style" [attr-defined]
pandas\plotting\_matplotlib\core.py:738: error: "MPLPlot" has no attribute "style" [attr-defined]
pandas\plotting\_matplotlib\core.py:739: error: "MPLPlot" has no attribute "style" [attr-defined]
pandas\plotting\_matplotlib\core.py:741: error: "MPLPlot" has no attribute "style" [attr-defined]
pandas\plotting\_matplotlib\core.py:1008: error: "ScatterPlot" has no attribute "label" [attr-defined]
pandas\plotting\_matplotlib\core.py:1075: error: "LinePlot" has no attribute "stacked" [attr-defined]
pandas\plotting\_matplotlib\core.py:1180: error: "LinePlot" has no attribute "stacked" [attr-defined]
pandas\plotting\_matplotlib\core.py:1269: error: "AreaPlot" has no attribute "stacked" [attr-defined]
pandas\plotting\_matplotlib\core.py:1351: error: "BarPlot" has no attribute "stacked" [attr-defined]
pandas\plotting\_matplotlib\core.py:1427: error: "BarPlot" has no attribute "stacked" [attr-defined]

jbrockmendel

commit sha c41e500b2e145dcd1a07196daff14181c88fc379

BUG: frame._item_cache not cleared when Series is altered (#36051)

push time in 11 days

Pull request review comment pandas-dev/pandas

#34640: CLN: remove 'private_key' and 'verbose' from gbq

 ExtensionArray  Other ^^^^^--++- Removed ``private_key`` and ``verbose`` from :func:`pandas.read_gbq` (:issue:`34654` :issue: `30200`)

Great - thanks for the callout

parkdj1

comment created time in 11 days

PullRequestReviewEvent

Pull request review commentpandas-dev/pandas

Fix issue #36271 to disambiguate json string

 def is_fsspec_url(url: FilePathOrBuffer) -> bool:
     return (
         isinstance(url, str)
         and "://" in url
+        and not " " in url

@martindurant

tbachlechner

comment created time in 11 days
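The diff above treats a string as an fsspec URL only if it contains `"://"` and no space. Because `json.dumps` uses `", "` and `": "` as separators by default, JSON text that merely embeds a URL usually contains spaces and is no longer mistaken for a path. A minimal standalone sketch of the predicate follows; `looks_like_fsspec_url` is a hypothetical name, not the pandas function, and it is written with `" " not in url`, the form pycodestyle prefers over `not " " in url` (check E713).

```python
import json


def looks_like_fsspec_url(url) -> bool:
    # Hypothetical copy of the check in the diff: a URL must contain a
    # protocol separator and must not contain any spaces
    return isinstance(url, str) and "://" in url and " " not in url


json_text = json.dumps({"url": "https://example.com", "value": 1})
print(looks_like_fsspec_url("s3://bucket/data.json"))  # True
print(looks_like_fsspec_url(json_text))  # False: the JSON contains spaces
```

Note the heuristic is not airtight: compact JSON (for example `json.dumps(obj, separators=(",", ":"))`) has no spaces and would still pass the check.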

PullRequestReviewEvent

Pull request review commentpandas-dev/pandas

Truncate columns list to match tr_frame for correct dict formatters lookup

 def test_to_string_with_formatters(self):
         )
         assert result == result2
 
+    def test_to_string_with_truncated_formatters(self):
+        df = DataFrame(
+            {
+                "int": [1, 2, 3],
+                "float": [1.0, 2.0, 3.0],
+                "object": [(1, 2), True, False],
+            },
+            columns=["int", "float", "object"],
+        )
+
+        formatters = [
+            ("int", lambda x: f"[1] {x}"),
+            ("float", lambda x: f"[2] {x}"),
+            ("object", lambda x: f"[3] {x}"),
+        ]
+        result = df.to_string(formatters=dict(formatters), max_cols=2)

Check out the pytest.mark.parametrize decorator - it is used by a few other tests in this module already

kesmit13

comment created time in 11 days
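As a rough illustration of the parametrize suggestion, the sketch below folds two input variants into one test. It assumes the variants of interest are formatters passed as a dict versus as a positional list (the existing test compares exactly those two results); the reviewer may well have meant other inputs such as `max_cols`, and the test name is hypothetical.

```python
import pytest
from pandas import DataFrame

FORMATTERS = [
    ("int", lambda x: f"[1] {x}"),
    ("float", lambda x: f"[2] {x}"),
    ("object", lambda x: f"[3] {x}"),
]


@pytest.mark.parametrize(
    "formatters",
    [dict(FORMATTERS), [func for _, func in FORMATTERS]],
    ids=["dict", "list"],
)
def test_to_string_formatters_parametrized(formatters):
    df = DataFrame(
        {
            "int": [1, 2, 3],
            "float": [1.0, 2.0, 3.0],
            "object": [(1, 2), True, False],
        },
        columns=["int", "float", "object"],
    )
    result = df.to_string(formatters=formatters)
    # Each column's formatter should have been applied to its values
    for expected in ("[1] 1", "[2] 1.0", "[3] (1, 2)"):
        assert expected in result
```

pytest then reports one test per entry in the list (`[dict]` and `[list]`), so a failure points at the specific input.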

PullRequestReviewEvent

Pull request review commentpandas-dev/pandas

BUG: read_excel for ods files raising UnboundLocalError in certain cases

 def _get_cell_string_value(self, cell) -> str:
         Find and decode OpenDocument text:s tags that represent
         a run length encoded sequence of space characters.
         """
-        from odf.element import Element, Text
+        from odf.element import Element
         from odf.namespaces import TEXTNS
-        from odf.text import P, S
+        from odf.text import S
 
-        text_p = P().qname
         text_s = S().qname
 
-        p = cell.childNodes[0]
-
         value = []
-        if p.qname == text_p:
-            for k, fragment in enumerate(p.childNodes):
-                if isinstance(fragment, Text):
-                    value.append(fragment.data)
-                elif isinstance(fragment, Element):
-                    if fragment.qname == text_s:
-                        spaces = int(fragment.attributes.get((TEXTNS, "c"), 1))
+
+        for fragment in cell.childNodes:
+            if isinstance(fragment, Element):
+                if fragment.qname == text_s:
+                    spaces = int(fragment.attributes.get((TEXTNS, "c"), 1))
                     value.append(" " * spaces)
+                else:
+                    # recursive impl needed in case of nested fragments

As a reader it isn't clear to me what this means; is this a bug that needs to be fixed upstream?

asishm

comment created time in 11 days
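The "recursive impl" comment in the diff presumably refers to text that can sit inside nested elements, for example a span inside a paragraph inside a cell. A minimal recursive sketch of that idea follows. So that it runs without odfpy, it uses hypothetical dataclass stand-ins that only mimic the attributes the diff relies on (`qname`, `attributes`, `childNodes`, `data`); it is not the pandas implementation, and it simply concatenates all paragraphs in a cell, which is a simplifying assumption.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Namespace URI that odfpy exposes as odf.namespaces.TEXTNS
TEXTNS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
TEXT_S = (TEXTNS, "s")


@dataclass
class TextNode:
    """Hypothetical stand-in for an odfpy text node."""

    data: str


@dataclass
class ElementNode:
    """Hypothetical stand-in for an odfpy Element."""

    qname: Tuple[str, str]
    attributes: dict = field(default_factory=dict)
    childNodes: list = field(default_factory=list)


def get_cell_string_value(node) -> str:
    value: List[str] = []
    for fragment in node.childNodes:
        if isinstance(fragment, TextNode):
            value.append(fragment.data)
        elif fragment.qname == TEXT_S:
            # text:s encodes a run of ``c`` spaces (one if the attribute is absent)
            value.append(" " * int(fragment.attributes.get((TEXTNS, "c"), 1)))
        else:
            # Any other element (paragraph, span, ...) may hold its own text
            # and text:s children, so recurse instead of dropping it
            value.append(get_cell_string_value(fragment))
    return "".join(value)
```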

PullRequestReviewEvent

pull request commentpandas-dev/pandas

Value counts normalize

@DataInformer is this still active? If so can you merge master and try to get CI green?

DataInformer

comment created time in 11 days

pull request commentpandas-dev/pandas

solve "Int64 with null value mangles large-ish integers" problem

@rushabh-v can you see if you can get this green? If so someone can review

rushabh-v

comment created time in 11 days

Pull request review commentpandas-dev/pandas

Truncate columns list to match tr_frame for correct dict formatters lookup

 def test_to_string_with_formatters(self):
         )
         assert result == result2
 
+    def test_to_string_with_truncated_formatters(self):
+        df = DataFrame(
+            {
+                "int": [1, 2, 3],
+                "float": [1.0, 2.0, 3.0],
+                "object": [(1, 2), True, False],
+            },
+            columns=["int", "float", "object"],
+        )
+
+        formatters = [
+            ("int", lambda x: f"[1] {x}"),
+            ("float", lambda x: f"[2] {x}"),
+            ("object", lambda x: f"[3] {x}"),
+        ]
+        result = df.to_string(formatters=dict(formatters), max_cols=2)

Rather than this can you just parametrize the inputs?

kesmit13

comment created time in 11 days

PullRequestReviewEvent

PR closed pandas-dev/pandas

API: read_csv, to_csv line_terminator keyword inconsistency API - Consistency IO CSV
  • [x] closes #9568
  • [x] tests added / passed
  • [x] passes black pandas
  • [x] passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • [ ] whatsnew entry (pending 1.2)

I followed the discussion in #9568. The aim of this PR is to allow both the line_terminator and lineterminator keyword args in read_csv and to_csv, but to document only line_terminator. As per @jorisvandenbossche's suggestion, we preserve lineterminator to stay compatible with csv dialects.

+85 -11

17 comments

5 changed files

arw2019

pr closed time in 11 days
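The PR description above proposes accepting two spellings of the same keyword. A minimal sketch of that pattern, assuming a conflicting pair should raise and that `"\n"` is the default, might look like this; `write_rows` is a hypothetical helper, not how pandas implements `read_csv` or `to_csv`. The standard library's `csv` module (including `csv.Dialect`) spells the option `lineterminator`, which is the compatibility concern mentioned above.

```python
import csv
import io
from typing import Iterable, Optional, Sequence


def write_rows(
    rows: Iterable[Sequence],
    line_terminator: Optional[str] = None,
    lineterminator: Optional[str] = None,
) -> str:
    """Hypothetical writer that accepts both keyword spellings."""
    if line_terminator is not None and lineterminator is not None:
        raise ValueError("pass only one of 'line_terminator' and 'lineterminator'")
    terminator = line_terminator if line_terminator is not None else lineterminator
    if terminator is None:
        terminator = "\n"  # assumed default for this sketch
    buf = io.StringIO()
    # csv.writer takes the csv-dialect spelling of the option
    writer = csv.writer(buf, lineterminator=terminator)
    writer.writerows(rows)
    return buf.getvalue()
```

Raising on any combination of both keywords is the simplest rule; an alternative design would allow both when they agree.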

pull request commentpandas-dev/pandas

API: read_csv, to_csv line_terminator keyword inconsistency

Closing this PR as I think it still needs some discussion. If this is of interest, please comment back on the original issue to get alignment on the design approach.

arw2019

comment created time in 11 days

PR closed pandas-dev/pandas

On traces of #33982

To draw attention to https://github.com/pandas-dev/pandas/pull/33982#issuecomment-643246205. Sorry, I don't know of another way right now.

/cc @mproszewska , @jreback

Thanks.

+1 -1

4 comments

1 changed file

kuraga

pr closed time in 11 days

pull request commentpandas-dev/pandas

On traces of #33982

Closing as it's not clear what this is or what it solves; if you can clarify further and add a test case, we can certainly reopen.

kuraga

comment created time in 11 days

pull request commentpandas-dev/pandas

CLN: replaced Appender with doc

@smartvinnetou can you fix conflicts?

smartvinnetou

comment created time in 11 days

pull request commentpandas-dev/pandas

Coding Style Guidline.rst

@Stockfoot still active? If so can you simplify and address comments?

Stockfoot

comment created time in 11 days

pull request commentpandas-dev/pandas

ENH:column-wise DataFrame.fillna and duplicated DataFrame.fillna with Series and Dict

@proost can you fix merge conflict and try to fix CI failure?

proost

comment created time in 11 days

pull request commentpandas-dev/pandas

Added test test_datetimeField_after_setitem for issue #6942

@anirudnits can you merge master and fix conflicts?

anirudnits

comment created time in 11 days

pull request commentpandas-dev/pandas

[WIP] PERF: pd.to_datetime, unit='s' much slower for float64 than for int64

@arw2019 is this still active?

arw2019

comment created time in 11 days

pull request commentpandas-dev/pandas

Document Tips for Debugging C Extensions

Any other comments here? Otherwise I plan on merging in a few days.

WillAyd

comment created time in 11 days

pull request commentpandas-dev/pandas

BUG: Check for duplicate names columns and index in crosstab

@cuchoi can you fix the merge conflict and see if you can get CI green?

cuchoi

comment created time in 11 days

pull request commentpandas-dev/pandas

ENH: Fix `by` in .plot.hist

@charlesdong1991 still active? Can you move the note?

charlesdong1991

comment created time in 11 days

PR closed pandas-dev/pandas

[#16737] Index type for Series with empty data Deprecate Dtypes Index
  • [x] closes #16737
  • [x] tests added / passed
  • [x] passes black pandas
  • [x] passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • [x] whatsnew entry

I picked up all the notes from #16737 where it was suggested to use Index over RangeIndex for empty data.

+373 -145

16 comments

64 changed files

SaturnFromTitan

pr closed time in 11 days

pull request commentpandas-dev/pandas

[#16737] Index type for Series with empty data

Closing as I think this is stale, but @SaturnFromTitan ping me if you'd like to pick it back up and can address the merge conflicts.

SaturnFromTitan

comment created time in 11 days

PR closed pandas-dev/pandas

Series repr html only IO HTML
  • [x] closes #5563
  • [x] tests added / passed
  • [x] passes black pandas
  • [x] passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • [x] whatsnew entry

+667 -84

15 comments

7 changed files

big-o

pr closed time in 11 days

pull request commentpandas-dev/pandas

Series repr html only

Closing as I think this is stale, but ping me if you'd like to give the suggested design approach a shot.

big-o

comment created time in 11 days

pull request commentpandas-dev/pandas

DEPR: DataFrame.lookup

@erfannariman can you move the note to 1.2?

erfannariman

comment created time in 11 days

PR closed pandas-dev/pandas

ENH: Add NDFrame.format for easier conversion to string dtype API Design Output-Formatting
  • [x] closes #17211
  • [x] tests added / passed
  • [x] passes black pandas
  • [x] passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • [x] whatsnew entry

This adds a format method to DataFrame and Series. This is useful for data transformation.

This method makes it easier to do more complex conversions from arbitrary dtypes to string Series, including combining several columns of a DataFrame into a single string Series. For example, we can now do this conversion quite easily:

>>> df = pd.DataFrame({
...     'state_name': ['California', 'Texas', 'Florida'],
...     'state_abbreviation': ['CA', 'TX', 'FL'],
...     'population': [39_512_223, 28_995_881, 21_477_737],
...     }, index=[1, 2, 3])
>>> df
   state_name state_abbreviation  population
1  California                 CA    39512223
2       Texas                 TX    28995881
3     Florida                 FL    21477737

>>> df.format("{state_name:<10} ({state_abbreviation}): {population:,}")
1    California (CA): 39,512,223
2    Texas      (TX): 28,995,881
3    Florida    (FL): 21,477,737
dtype: string

I still need to update text.rst, but would like feedback on this first, as it is a bit different from what was discussed in #17211. In that issue, for example, we only discussed a format method for Series, while this PR also adds it for DataFrame. In #17211 I also aired the idea of allowing Series methods in the format string. I think that is technically quite difficult, so it is not part of this PR.

+295 -1

5 comments

7 changed files

topper-123

pr closed time in 11 days
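Since this PR was closed, the proposed `df.format` method should not be assumed to exist. A rough equivalent of the example in the description can be built from existing pieces: `DataFrame.to_dict("records")` to get one mapping per row, `str.format` to apply the template, and the `"string"` dtype for the result. This is a sketch using a different technique from the PR, not its implementation.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "state_name": ["California", "Texas", "Florida"],
        "state_abbreviation": ["CA", "TX", "FL"],
        "population": [39_512_223, 28_995_881, 21_477_737],
    },
    index=[1, 2, 3],
)

template = "{state_name:<10} ({state_abbreviation}): {population:,}"

# Format each row's column values by name, keeping the original index
formatted = pd.Series(
    [template.format(**record) for record in df.to_dict("records")],
    index=df.index,
    dtype="string",
)
print(formatted)
```

The row-wise Python loop is simple but not vectorized, so it may be slow on large frames.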

pull request commentpandas-dev/pandas

ENH: Add NDFrame.format for easier conversion to string dtype

Closing as I think this is stale, but @topper-123 ping me if you'd like to pick it back up.

topper-123

comment created time in 11 days
