
energy-modelling-toolkit/Dispa-SET 51

The Dispa-SET unit-commitment and optimal dispatch model, developed at the JRC

flowirtz/complat 1

#eehack2017 winner for "most convincing innovative solution for the challenge 'Making energy consumption data actionable for industrial managers'"

matzech/labelme 1

Image Polygonal Annotation with Python (polygon, rectangle, circle, line, point and image-level flag annotation).

matzech/segmentation_models 1

Segmentation models with pretrained backbones. Keras and TensorFlow Keras.

matzech/100-times-faster-nlp 0

🚀100 Times Faster Natural Language Processing in Python - iPython notebook

matzech/atlite 0

Atlite: Light-weight version of Aarhus RE Atlas for converting weather data to power systems data

matzech/celery-dashboard 0

A dashboard to monitor your celery app

matzech/datasets 0

Datasets used in Plotly examples and documentation

started jhuckaby/Cronicle

started time in 3 days

started jump-dev/Pavito.jl

started time in 5 days

created repository guidocioni/gfs

created time in 10 days

started microsoft/UST

started time in 11 days

push event jhamman/scikit-downscale

Joseph Hamman

commit sha 1064a598db9452696c914e9eef33bd355b4072b8

small comment fix

view details

push time in 11 days

pull request comment jhamman/scikit-downscale

Feature/add spatial disaggregation recipe

@dgergel - I just merged some changes in that should fix the CI for you. Go ahead and pull those in and then I'll take a look here.

dgergel

comment created time in 11 days

issue closed jhamman/scikit-downscale

importError

Dear jhamman: When I run the command "from utils import get_sample_data" from the file "2020ECAHM-scikit-downscale.ipynb", I get the following error: ImportError: cannot import name 'get_sample_data' from 'utils' (/root/anaconda3/lib/python3.7/site-packages/utils/__init__.py)

Why? I had already installed the module 'utils' with "pip install utils". I'm so confused. Help me!

closed time in 11 days

Damonfruit
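The ImportError in the issue above is most likely a module-shadowing problem: "pip install utils" installs an unrelated PyPI package, which hides the utils.py helper module that ships alongside the notebook. A minimal, illustrative sketch of the mechanism (the temporary file and directory here are hypothetical stand-ins for the notebook's own utils.py):

```python
import os
import sys
import tempfile

# Python imports whichever "utils" it finds first on sys.path. Placing the
# directory that contains the notebook's own utils.py at the front of
# sys.path makes the local module win over any installed "utils" package.
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, 'utils.py'), 'w') as f:
    f.write("def get_sample_data():\n    return 'sample'\n")

sys.path.insert(0, workdir)  # local directory takes precedence
from utils import get_sample_data

print(get_sample_data())
```

In the reported setup, uninstalling the PyPI package ("pip uninstall utils") and running the notebook from the directory that contains the repo's utils.py would likely have the same effect.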

push event jhamman/scikit-downscale

Joseph Hamman

commit sha cc8d6fe995fdb390007f0b72f929b9e06d6dca25

initial commit of arrm feature

view details

Joseph Hamman

commit sha c3fa80947194aa644b07b367a8f135858cac9783

Merge branch 'master' of github.com:jhamman/scikit-downscale into feature/arrm

view details

Joe Hamman

commit sha c15fc5033e690c0dd4cb1cbee76972869953d219

Merge remote-tracking branch 'origin/feature/arrm' into feature/arrm

view details

Joe Hamman

commit sha 88409113ba755e417bb3b20bdec27452f661520e

add draft of QuantileMappingReressor + minor fixes to arrm model

view details

Joseph Hamman

commit sha e6e2b3c421d25eb65e3c3f5149b3c56cd933e7e3

Merge branch 'master' of github.com:jhamman/scikit-downscale into feature/arrm

view details

Joseph Hamman

commit sha d45ff6df81aebe17997ea4841e88ed57383e659c

Merge branch 'feature/arrm' of github.com:jhamman/scikit-downscale into feature/arrm

view details

Joseph Hamman

commit sha 5f21d9da879c5156eb8b398a14e962a89b35bbce

udpate quantile mapping regressor

view details

Joseph Hamman

commit sha b1a387357aefdc5560ad5c9f7e99899c04032911

update to sklearn 0.24

view details

Joseph Hamman

commit sha 98115767f04fc645daa7538029b6f248dc9877cf

update docs

view details

Joe Hamman

commit sha 779fa501f699278eba964ad1b29feb447217e70e

Merge pull request #42 from jhamman/feature/arrm

Add asynchronous regional regression model

view details

push time in 11 days

PR merged jhamman/scikit-downscale

Add asynchronous regional regression model

This PR adds the asynchronous regional regression model from Stoner et al. (2012) to the pointwise models.

The first commit only includes the piecewise linear regression and ARRM breakpoint components. I'll follow up soon with the rest of the model.
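The core pointwise operation this PR builds on, empirical quantile mapping, can be sketched with plain NumPy: map each input value through its empirical CDF, then through the inverse empirical CDF of the target sample. This is an illustrative sketch only, not the package's API; the quantile_map helper below is hypothetical:

```python
import numpy as np

def plotting_positions(n, alpha=0.4, beta=0.4):
    # Cunnane-style plotting positions, as in the PR's helper function.
    return (np.arange(1, n + 1) - alpha) / (n + 1.0 - alpha - beta)

def quantile_map(x, x_ref, y_ref):
    """Illustrative empirical quantile mapping (hypothetical helper).

    x     : values to transform
    x_ref : reference sample defining the input CDF
    y_ref : reference sample defining the target CDF
    """
    pp_x = plotting_positions(len(x_ref))
    pp_y = plotting_positions(len(y_ref))
    # value -> non-exceedance probability under x_ref's empirical CDF
    probs = np.interp(x, np.sort(x_ref), pp_x)
    # probability -> value under y_ref's empirical CDF
    return np.interp(probs, pp_y, np.sort(y_ref))

rng = np.random.default_rng(0)
x_ref = rng.normal(0.0, 1.0, 1000)
y_ref = rng.normal(5.0, 2.0, 1000)  # shifted/scaled target distribution
mapped = quantile_map(x_ref, x_ref, y_ref)
```

Mapping x_ref through its own CDF and into y_ref's CDF should reproduce the target distribution's mean and spread, which is the behavior the ARRM and quantile-mapping estimators generalize.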

+1094 -215

1 comment

18 changed files

jhamman

pr closed time in 11 days

Pull request review comment jhamman/scikit-downscale

Add asynchronous regional regression model

import collections

import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import QuantileTransformer, quantile_transform
from sklearn.utils import check_array
from sklearn.utils.validation import check_is_fitted

from .trend import LinearTrendTransformer
from .utils import check_max_features, default_none_kwargs

SYNTHETIC_MIN = -1e20
SYNTHETIC_MAX = 1e20

Cdf = collections.namedtuple('CDF', ['pp', 'vals'])


def plotting_positions(n, alpha=0.4, beta=0.4):
    '''Returns a monotonic array of plotting positions.

    Parameters
    ----------
    n : int
        Length of plotting positions to return.
    alpha, beta : float
        Plotting positions parameter. Default is 0.4.

    Returns
    -------
    positions : ndarray
        Quantile mapped data with shape from `input_data` and probability
        distribution from `data_to_match`.

    See Also
    --------
    scipy.stats.mstats.plotting_positions
    '''
    return (np.arange(1, n + 1) - alpha) / (n + 1.0 - alpha - beta)


class QuantileMapper(TransformerMixin, BaseEstimator):
    """Transform features using quantile mapping.

    Parameters
    ----------
    detrend : boolean, optional
        If True, detrend the data before quantile mapping and add the trend
        back after transforming. Default is False.
    lt_kwargs : dict, optional
        Dictionary of keyword arguments to pass to the LinearTrendTransformer
    qm_kwargs : dict, optional
        Dictionary of keyword arguments to pass to the QuantileMapper

    Attributes
    ----------
    x_cdf_fit_ : QuantileTransformer
        QuantileTranform for fit(X)
    """

    def __init__(self, detrend=False, lt_kwargs=None, qt_kwargs=None):

        self.detrend = detrend
        self.lt_kwargs = lt_kwargs
        self.qt_kwargs = qt_kwargs

    def fit(self, X, y=None):
        """Fit the quantile mapping model.

        Parameters
        ----------
        X : array-like, shape  [n_samples, n_features]
            Training data.
        """
        # TO-DO: fix validate data fctn
        X = self._validate_data(X)

        qt_kws = default_none_kwargs(self.qt_kwargs, copy=True)

        if 'n_quantiles' not in qt_kws:
            qt_kws['n_quantiles'] = len(X)

        # maybe detrend the input datasets
        if self.detrend:
            lt_kwargs = default_none_kwargs(self.lt_kwargs)
            x_to_cdf = LinearTrendTransformer(**lt_kwargs).fit_transform(X)
        else:
            x_to_cdf = X

        # calculate the cdfs for X
        # TODO: replace this transformer with something that uses robust
        # empirical cdf plotting positions
        qt = QuantileTransformer(**qt_kws)

        self.x_cdf_fit_ = qt.fit(x_to_cdf)

        return self

    def transform(self, X):
        """Perform the quantile mapping.

        Parameters
        ----------
        X : array_like, shape [n_samples, n_features]
            Samples.
        """
        # validate input data
        check_is_fitted(self)
        # TO-DO: fix validate_data fctn
        X = self._validate_data(X)

        # maybe detrend the datasets
        if self.detrend:
            lt_kwargs = default_none_kwargs(self.lt_kwargs)
            x_trend = LinearTrendTransformer(**lt_kwargs).fit(X)
            x_to_cdf = x_trend.transform(X)
        else:
            x_to_cdf = X

        # do the final mapping
        qt_kws = default_none_kwargs(self.qt_kwargs, copy=True)
        if 'n_quantiles' not in qt_kws:
            qt_kws['n_quantiles'] = len(X)

        x_quantiles = quantile_transform(x_to_cdf, copy=True, **qt_kws)
        x_qmapped = self.x_cdf_fit_.inverse_transform(x_quantiles)

        # add the trend back
        if self.detrend:
            x_qmapped = x_trend.inverse_transform(x_qmapped)

        return x_qmapped

    def _more_tags(self):
        return {'_xfail_checks': {'check_methods_subset_invariance': 'because'}}


class QuantileMappingReressor(RegressorMixin, BaseEstimator):

    _fit_attributes = ['_X_cdf', '_y_cdf']

    def __init__(self, extrapolate=None, n_endpoints=10):
        self.extrapolate = extrapolate
        self.n_endpoints = n_endpoints

        if self.n_endpoints < 2:
            raise ValueError('Invalid number of n_endpoints, must be >= 2')

    def fit(self, X, y, **kwargs):
        X = check_array(
            X, dtype='numeric', ensure_min_samples=2 * self.n_endpoints + 1, ensure_2d=True
        )
        y = check_array(
            y, dtype='numeric', ensure_min_samples=2 * self.n_endpoints + 1, ensure_2d=False
        )

        X = check_max_features(X, n=1)

        self._X_cdf = self._calc_extrapolated_cdf(X, sort=True, extrapolate=self.extrapolate)
        self._y_cdf = self._calc_extrapolated_cdf(y, sort=True, extrapolate=self.extrapolate)

        return self

    def predict(self, X, **kwargs):

        check_is_fitted(self, self._fit_attributes)
        X = check_array(X, ensure_2d=True)

        X = X[:, 0]

        sort_inds = np.argsort(X)

        X_cdf = self._calc_extrapolated_cdf(X[sort_inds], sort=False, extrapolate=self.extrapolate)

        left = -np.inf if self.extrapolate in ['min', 'both'] else None
        right = np.inf if self.extrapolate in ['max', 'both'] else None
        X_cdf.pp[:] = np.interp(
            X_cdf.vals, self._X_cdf.vals, self._X_cdf.pp, left=left, right=right
        )

        # Extrapolate the tails beyond 1.0 to handle "new extremes"
        if np.isinf(X_cdf.pp).any():
            lower_inds = np.nonzero(-np.inf == X_cdf.pp)[0]
            upper_inds = np.nonzero(np.inf == X_cdf.pp)[0]
            model = LinearRegression()
            if len(lower_inds):
                s = slice(lower_inds[-1] + 1, lower_inds[-1] + 1 + self.n_endpoints)
                model.fit(X_cdf.pp[s].reshape(-1, 1), X_cdf.vals[s].reshape(-1, 1))
                X_cdf.pp[lower_inds] = model.predict(X_cdf.vals[lower_inds].reshape(-1, 1))
            if len(upper_inds):
                s = slice(upper_inds[0] - self.n_endpoints, upper_inds[0])
                model.fit(X_cdf.pp[s].reshape(-1, 1), X_cdf.vals[s].reshape(-1, 1))
                X_cdf.pp[upper_inds] = model.predict(X_cdf.vals[upper_inds].reshape(-1, 1))

It is similar but this is only being applied to the X data in the predict method.

jhamman

comment created time in 11 days

push event jhamman/scikit-downscale

Joseph Hamman

commit sha 98115767f04fc645daa7538029b6f248dc9877cf

update docs

view details

push time in 11 days

Pull request review comment jhamman/scikit-downscale

Add asynchronous regional regression model

        # do the full quantile mapping
        y_hat = np.full_like(X, np.nan)
        y_hat[sort_inds] = np.interp(X_cdf.pp, self._y_cdf.pp, self._y_cdf.vals)[1:-1]

        # If extrapolate is 1to1, apply the offset between ref and like to the
        # tails of y_hat
        if self.extrapolate == '1to1':
            X_fit_len = len(self._X_cdf.vals)
            X_fit_min = self._X_cdf.vals[0]
            X_fit_max = self._X_cdf.vals[-1]

            y_fit_len = len(self._y_cdf.vals)
            y_fit_min = self._y_cdf.vals[0]
            y_fit_max = self._y_cdf.vals[-1]

            # adjust values over fit max
            inds = X > X_fit_max
            if inds.any():
                if X_fit_len == y_fit_len:
                    y_hat[inds] = y_fit_max + (X[inds] - X_fit_max)
                elif X_fit_len > y_fit_len:
                    X_fit_at_y_fit_max = np.interp(
                        self._y_cdf.pp[-1], self._X_cdf.pp, self._X_cdf.vals
                    )
                    y_hat[inds] = y_fit_max + (X[inds] - X_fit_at_y_fit_max)
                elif X_fit_len < y_fit_len:
                    y_fit_at_X_fit_max = np.interp(
                        self._X_cdf.pp[-1], self._y_cdf.pp, self._y_cdf.vals
                    )
                    y_hat[inds] = y_fit_at_X_fit_max + (X[inds] - X_fit_max)

            # adjust values under fit min
            inds = X < X_fit_min
            if inds.any():
                if X_fit_len == y_fit_len:
                    y_hat[inds] = y_fit_min + (X[inds] - X_fit_min)
                elif X_fit_len > y_fit_len:
                    X_fit_at_y_fit_min = np.interp(
                        self._y_cdf.pp[0], self._X_cdf.pp, self._X_cdf.vals
                    )
                    y_hat[inds] = X_fit_min + (X[inds] - X_fit_at_y_fit_min)
                elif X_fit_len < y_fit_len:
                    y_fit_at_X_fit_min = np.interp(
                        self._X_cdf.pp[0], self._y_cdf.pp, self._y_cdf.vals
                    )
                    y_hat[inds] = y_fit_at_X_fit_min + (X[inds] - X_fit_min)

        return y_hat

    def _calc_extrapolated_cdf(
        self, data, sort=True, extrapolate=None, pp_min=SYNTHETIC_MIN, pp_max=SYNTHETIC_MAX
    ):

        n = len(data)

        # plotting positions
        pp = np.empty(n + 2)
        pp[1:-1] = plotting_positions(n)

        # extended data values (sorted)
        if data.ndim == 2:
            data = data[:, 0]
        if sort:
            data = np.sort(data)
        vals = np.full(n + 2, np.nan)
        vals[1:-1] = data
        vals[0] = data[0]
        vals[-1] = data[-1]

        # Add endpoints to the vector of plotting positions
        if extrapolate in [None, '1to1']:
            pp[0] = pp[1]
            pp[-1] = pp[-2]
        elif extrapolate == 'both':
            pp[0] = pp_min
            pp[-1] = pp_max
        elif extrapolate == 'max':
            pp[0] = pp[1]
            pp[-1] = pp_max
        elif extrapolate == 'min':
            pp[0] = pp_min
            pp[-1] = pp[-2]
        else:
            raise ValueError('unknown value for extrapolate: %s' % extrapolate)

        if extrapolate in ['min', 'max', 'both']:

            model = LinearRegression()

            # extrapolate lower end point
            if extrapolate in ['min', 'both']:
                s = slice(1, self.n_endpoints + 1)
                # fit linear model to first n_endpoints
                model.fit(pp[s].reshape(-1, 1), vals[s].reshape(-1, 1))
                # calculate the data value pp[0]
                vals[0] = model.predict(pp[0].reshape(-1, 1))

            # extrapolate upper end point
            if extrapolate in ['max', 'both']:
                s = slice(-self.n_endpoints - 1, -1)
                # fit linear model to last n_endpoints
                model.fit(pp[s].reshape(-1, 1), vals[s].reshape(-1, 1))
                # calculate the data value pp[-1]
                vals[-1] = model.predict(pp[-1].reshape(-1, 1))

        return Cdf(pp, vals)

    def _more_tags(self):
        return {
            '_xfail_checks': {
                'check_estimators_dtypes': 'QuantileMappingReressor only suppers 1 feature',
                'check_fit_score_takes_y': 'QuantileMappingReressor only suppers 1 feature',
                'check_estimators_fit_returns_self': 'QuantileMappingReressor only suppers 1 feature',
                'check_estimators_fit_returns_self(readonly_memmap=True)': 'QuantileMappingReressor only suppers 1 feature',
                'check_dtype_object': 'QuantileMappingReressor only suppers 1 feature',
                'check_pipeline_consistency': 'QuantileMappingReressor only suppers 1 feature',
                'check_estimators_nan_inf': 'QuantileMappingReressor only suppers 1 feature',
                'check_estimators_overwrite_params': 'QuantileMappingReressor only suppers 1 feature',
                'check_estimators_pickle': 'QuantileMappingReressor only suppers 1 feature',
                'check_fit2d_predict1d': 'QuantileMappingReressor only suppers 1 feature',
                'check_methods_subset_invariance': 'QuantileMappingReressor only suppers 1 feature',
                'check_fit2d_1sample': 'QuantileMappingReressor only suppers 1 feature',
                'check_dict_unchanged': 'QuantileMappingReressor only suppers 1 feature',
                'check_dont_overwrite_parameters': 'QuantileMappingReressor only suppers 1 feature',
                'check_fit_idempotent': 'QuantileMappingReressor only suppers 1 feature',
                'check_n_features_in': 'QuantileMappingReressor only suppers 1 feature',
                'check_estimators_empty_data_messages': 'skip due to odd sklearn string matching in unit test',
                'check_regressors_train': 'QuantileMappingReressor only suppers 1 feature',
                'check_regressors_train(readonly_memmap=True)': 'QuantileMappingReressor only suppers 1 feature',
                'check_regressors_train(readonly_memmap=True,X_dtype=float32)': 'QuantileMappingReressor only suppers 1 feature',
                'check_regressor_data_not_an_array': 'QuantileMappingReressor only suppers 1 feature',
                'check_regressors_no_decision_function': 'QuantileMappingReressor only suppers 1 feature',
                'check_supervised_y_2d': 'QuantileMappingReressor only suppers 1 feature',
                'check_regressors_int': 'QuantileMappingReressor only suppers 1 feature',
                'check_methods_sample_order_invariance': 'QuantileMappingReressor only suppers 1 feature',
            },
        }


class TrendAwareQuantileMappingRegressor(RegressorMixin, BaseEstimator):
    def __init__(self, qm_estimator=None):
        self.qm_estimator = qm_estimator

    def fit(self, X, y):
        self._X_mean_fit = X.mean()
        self._y_mean_fit = y.mean()

        y_trend = LinearTrendTransformer()
        y_detrend = y_trend.fit_transform(y)

        X_trend = LinearTrendTransformer()
        x_detrend = X_trend.fit_transform(X)

        self.qm_estimator.fit(x_detrend, y_detrend)

    def predict(self, X):

        X_trend = LinearTrendTransformer()
        x_detrend = X_trend.fit_transform(X)

        y_hat = self.qm_estimator.predict(x_detrend).reshape(-1, 1)

        # add the trend back
        # slope from X (predict)
        # delta: X (predict) - X (fit) + y
        trendline = X_trend.lr_model_.coef_[0, 0] * np.arange(len(y_hat)).reshape(-1, 1)

Just the slope. cleaned up and added comments.

jhamman

comment created time in 11 days
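The trend handling discussed in this review thread (remove a linear trend, apply the mapping, add the trend back) can be sketched independently with plain NumPy. This is a hedged illustration of the idea, not the package's LinearTrendTransformer; the helper names are hypothetical:

```python
import numpy as np

def detrend_linear(y):
    """Remove a least-squares linear trend; return (residuals, slope, intercept)."""
    t = np.arange(len(y), dtype=float)
    slope, intercept = np.polyfit(t, y, 1)  # fit y ~ slope * t + intercept
    return y - (slope * t + intercept), slope, intercept

def retrend_linear(resid, slope, intercept):
    """Add a previously removed linear trend back onto the residuals."""
    t = np.arange(len(resid), dtype=float)
    return resid + slope * t + intercept

# A series with a known 0.05/step trend plus a periodic signal.
y = 0.05 * np.arange(100) + np.sin(np.arange(100) / 5.0)
resid, m, b = detrend_linear(y)
roundtrip = retrend_linear(resid, m, b)
```

Detrending before quantile mapping and retrending afterwards keeps a secular trend from being folded into the fitted CDFs; the round trip itself is lossless.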

Pull request review comment jhamman/scikit-downscale

Add asynchronous regional regression model

        # do the full quantile mapping
        y_hat = np.full_like(X, np.nan)
        y_hat[sort_inds] = np.interp(X_cdf.pp, self._y_cdf.pp, self._y_cdf.vals)[1:-1]

        # If extrapolate is 1to1, apply the offset between ref and like to the
        # tails of y_hat
        if self.extrapolate == '1to1':
            X_fit_len = len(self._X_cdf.vals)
            X_fit_min = self._X_cdf.vals[0]

Yes, they are sorted by definition.

jhamman

comment created time in 11 days

Pull request review comment jhamman/scikit-downscale

Add asynchronous regional regression model

+class QuantileMappingReressor(RegressorMixin, BaseEstimator):
+
+    _fit_attributes = ['_X_cdf', '_y_cdf']
+
+    def __init__(self, extrapolate=None, n_endpoints=10):
+        self.extrapolate = extrapolate
+        self.n_endpoints = n_endpoints
+
+        if self.n_endpoints < 2:
+            raise ValueError('Invalid number of n_endpoints, must be >= 2')
+
+    def fit(self, X, y, **kwargs):
+        X = check_array(
+            X, dtype='numeric', ensure_min_samples=2 * self.n_endpoints + 1, ensure_2d=True
+        )
+        y = check_array(
+            y, dtype='numeric', ensure_min_samples=2 * self.n_endpoints + 1, ensure_2d=False
+        )
+
+        X = check_max_features(X, n=1)

Good question. But yes, this is actually the right place to do this. The model doesn't see any data before fit(), so all data validation must happen here, before the rest of the fit steps.

jhamman

comment created time in 11 days
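A minimal sketch of the sklearn convention being described, using a made-up `MinimalRegressor` (not part of scikit-downscale): `__init__` only stores parameters, and all input validation happens at the top of `fit()`, the first point where the estimator ever sees data:

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.utils import check_array

class MinimalRegressor(RegressorMixin, BaseEstimator):
    """Illustrative estimator: validate inputs in fit(), not __init__()."""

    def fit(self, X, y):
        X = check_array(X, ensure_2d=True)       # validation lives here
        y = check_array(y, ensure_2d=False)
        self.mean_ = float(np.mean(y))           # fitted attributes get trailing underscores
        return self

    def predict(self, X):
        X = check_array(X, ensure_2d=True)
        return np.full(len(X), self.mean_)

model = MinimalRegressor().fit([[1.0], [2.0]], [3.0, 5.0])
pred = model.predict([[10.0]])
```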

Pull request review comment jhamman/scikit-downscale

Add asynchronous regional regression model

+class TrendAwareQuantileMappingRegressor(RegressorMixin, BaseEstimator):
+    def __init__(self, qm_estimator=None):
+        self.qm_estimator = qm_estimator
+
+    def fit(self, X, y):
+        self._X_mean_fit = X.mean()
+        self._y_mean_fit = y.mean()
+
+        y_trend = LinearTrendTransformer()
+        y_detrend = y_trend.fit_transform(y)
+
+        X_trend = LinearTrendTransformer()
+        x_detrend = X_trend.fit_transform(X)
+
+        self.qm_estimator.fit(x_detrend, y_detrend)
+
+    def predict(self, X):
+
+        X_trend = LinearTrendTransformer()
+        x_detrend = X_trend.fit_transform(X)
+
+        y_hat = self.qm_estimator.predict(x_detrend).reshape(-1, 1)
+
+        # add the trend back
+        # slope from X (predict)
+        # delta: X (predict) - X (fit) + y
+        trendline = X_trend.lr_model_.coef_[0, 0] * np.arange(len(y_hat)).reshape(-1, 1)

or just the intercept?

jhamman

comment created time in 11 days
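Context for the question: `LinearRegression` stores the slope in `coef_` and the constant offset in `intercept_`, so rebuilding a trendline from `coef_` alone drops the intercept. A small sketch (toy data, assumed here just to illustrate the two attributes):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Fit a trend line y = slope * t + intercept to a toy series
t = np.arange(5).reshape(-1, 1)
y = 2.0 * np.arange(5) + 7.0          # slope 2, intercept 7

lr = LinearRegression().fit(t, y.reshape(-1, 1))
slope = lr.coef_[0, 0]        # multiplies the time index -> what the diff's trendline uses
intercept = lr.intercept_[0]  # constant offset, absent from a coef_-only reconstruction

trendline = slope * np.arange(5)  # reconstruction without the intercept
```

Whether dropping `intercept_` matters depends on whether the constant offset is added back separately elsewhere, which is exactly what the reviewer is probing.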

Pull request review comment jhamman/scikit-downscale

Add asynchronous regional regression model

+    def predict(self, X):
+
+        X_trend = LinearTrendTransformer()
+        x_detrend = X_trend.fit_transform(X)
+
+        y_hat = self.qm_estimator.predict(x_detrend).reshape(-1, 1)
+
+        # add the trend back
+        # slope from X (predict)
+        # delta: X (predict) - X (fit) + y
+        trendline = X_trend.lr_model_.coef_[0, 0] * np.arange(len(y_hat)).reshape(-1, 1)
+        y_hat += trendline - trendline.mean() + self._y_mean_fit + (X.mean() - self._X_mean_fit)

This line could use a comment; there's a lot going on and honestly I'm a little confused reading it. Maybe some variable definitions elsewhere could fix it all? ''' create the appropriately trending bias-corrected values: add back in (1) the trendline while (2) subtracting the mean of that trendline to center the trend for the timeseries. Add that to (3) the quantile-mapped (but detrended) data and then add in (4) the anomalies from ______ <--- okay, that's where I get lost!

jhamman

comment created time in 11 days
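One possible reading of the line being discussed, with each term named and toy numbers substituted (the values, and any variable names beyond those in the diff, are hypothetical):

```python
import numpy as np

# Hypothetical unpacking of:
#   y_hat += trendline - trendline.mean() + self._y_mean_fit + (X.mean() - self._X_mean_fit)
y_hat = np.zeros(4)                          # (3) quantile-mapped but detrended values
trendline = np.array([0.0, 1.0, 2.0, 3.0])   # slope * time index, from the predict-period fit
y_mean_fit = 10.0                            # mean of y over the fit period
X_mean_fit = 5.0                             # mean of X over the fit period
X_mean_predict = 6.0                         # mean of X over the predict period

# (1) the trendline, (2) centered by subtracting its own mean
centered_trend = trendline - trendline.mean()
# (4) baseline: y's fit-period mean, shifted by the change in X's mean (fit -> predict)
offset = y_mean_fit + (X_mean_predict - X_mean_fit)

y_hat = y_hat + centered_trend + offset
```

If this reading is right, the "(4) anomalies" piece the reviewer gets lost on is `X.mean() - self._X_mean_fit`: the shift in the predictors' mean between the fit and predict periods.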

Pull request review comment jhamman/scikit-downscale

Add asynchronous regional regression model

+import collections
+
+import numpy as np
+from sklearn.base import BaseEstimator, RegressorMixin, TransformerMixin
+from sklearn.linear_model import LinearRegression
+from sklearn.preprocessing import QuantileTransformer, quantile_transform
+from sklearn.utils import check_array
+from sklearn.utils.validation import check_is_fitted
+
+from .trend import LinearTrendTransformer
+from .utils import check_max_features, default_none_kwargs
+
+SYNTHETIC_MIN = -1e20
+SYNTHETIC_MAX = 1e20
+
+Cdf = collections.namedtuple('CDF', ['pp', 'vals'])
+
+
+def plotting_positions(n, alpha=0.4, beta=0.4):
+    '''Returns a monotonic array of plotting positions.
+
+    Parameters
+    ----------
+    n : int
+        Length of plotting positions to return.
+    alpha, beta : float
+        Plotting positions parameter. Default is 0.4.
+
+    Returns
+    -------
+    positions : ndarray
+        Quantile mapped data with shape from `input_data` and probability
+        distribution from `data_to_match`.
+
+    See Also
+    --------
+    scipy.stats.mstats.plotting_positions
+    '''
+    return (np.arange(1, n + 1) - alpha) / (n + 1.0 - alpha - beta)
+
+
+class QuantileMapper(TransformerMixin, BaseEstimator):
+    """Transform features using quantile mapping.
+
+    Parameters
+    ----------
+    detrend : boolean, optional
+        If True, detrend the data before quantile mapping and add the trend
+        back after transforming. Default is False.
+    lt_kwargs : dict, optional
+        Dictionary of keyword arguments to pass to the LinearTrendTransformer
+    qm_kwargs : dict, optional
+        Dictionary of keyword arguments to pass to the QuantileMapper
+
+    Attributes
+    ----------
+    x_cdf_fit_ : QuantileTransformer
+        QuantileTranform for fit(X)
+    """
+
+    def __init__(self, detrend=False, lt_kwargs=None, qt_kwargs=None):
+
+        self.detrend = detrend
+        self.lt_kwargs = lt_kwargs
+        self.qt_kwargs = qt_kwargs
+
+    def fit(self, X, y=None):
+        """Fit the quantile mapping model.
+
+        Parameters
+        ----------
+        X : array-like, shape  [n_samples, n_features]
+            Training data.
+        """
+        # TO-DO: fix validate data fctn
+        X = self._validate_data(X)
+
+        qt_kws = default_none_kwargs(self.qt_kwargs, copy=True)
+
+        if 'n_quantiles' not in qt_kws:
+            qt_kws['n_quantiles'] = len(X)
+
+        # maybe detrend the input datasets
+        if self.detrend:
+            lt_kwargs = default_none_kwargs(self.lt_kwargs)
+            x_to_cdf = LinearTrendTransformer(**lt_kwargs).fit_transform(X)
+        else:
+            x_to_cdf = X
+
+        # calculate the cdfs for X
+        # TODO: replace this transformer with something that uses robust
+        # empirical cdf plotting positions
+        qt = QuantileTransformer(**qt_kws)
+
+        self.x_cdf_fit_ = qt.fit(x_to_cdf)
+
+        return self
+
+    def transform(self, X):
+        """Perform the quantile mapping.
+
+        Parameters
+        ----------
+        X : array_like, shape [n_samples, n_features]
+            Samples.
+        """
+        # validate input data
+        check_is_fitted(self)
+        # TO-DO: fix validate_data fctn
+        X = self._validate_data(X)
+
+        # maybe detrend the datasets
+        if self.detrend:
+            lt_kwargs = default_none_kwargs(self.lt_kwargs)
+            x_trend = LinearTrendTransformer(**lt_kwargs).fit(X)
+            x_to_cdf = x_trend.transform(X)
+        else:
+            x_to_cdf = X
+
+        # do the final mapping
+        qt_kws = default_none_kwargs(self.qt_kwargs, copy=True)
+        if 'n_quantiles' not in qt_kws:
+            qt_kws['n_quantiles'] = len(X)
+
+        x_quantiles = quantile_transform(x_to_cdf, copy=True, **qt_kws)
+        x_qmapped = self.x_cdf_fit_.inverse_transform(x_quantiles)
+
+        # add the trend back
+        if self.detrend:
+            x_qmapped = x_trend.inverse_transform(x_qmapped)
+
+        return x_qmapped
+
+    def _more_tags(self):
+        return {'_xfail_checks': {'check_methods_subset_invariance': 'because'}}
+
+
+class QuantileMappingReressor(RegressorMixin, BaseEstimator):
+
+    _fit_attributes = ['_X_cdf', '_y_cdf']
+
+    def __init__(self, extrapolate=None, n_endpoints=10):
+        self.extrapolate = extrapolate
+        self.n_endpoints = n_endpoints
+
+        if self.n_endpoints < 2:
+            raise ValueError('Invalid number of n_endpoints, must be >= 2')
+
+    def fit(self, X, y, **kwargs):
+        X = check_array(
+            X, dtype='numeric', ensure_min_samples=2 * self.n_endpoints + 1, ensure_2d=True
+        )
+        y = check_array(
+            y, dtype='numeric', ensure_min_samples=2 * self.n_endpoints + 1, ensure_2d=False
+        )
+
+        X = check_max_features(X, n=1)
+
+        self._X_cdf = self._calc_extrapolated_cdf(X, sort=True, extrapolate=self.extrapolate)
+        self._y_cdf = self._calc_extrapolated_cdf(y, sort=True, extrapolate=self.extrapolate)
+
+        return self
+
+    def predict(self, X, **kwargs):
+
+        check_is_fitted(self, self._fit_attributes)
+        X = check_array(X, ensure_2d=True)
+
+        X = X[:, 0]
+
+        sort_inds = np.argsort(X)
+
+        X_cdf = self._calc_extrapolated_cdf(X[sort_inds], sort=False, extrapolate=self.extrapolate)
+
+        left = -np.inf if self.extrapolate in ['min', 'both'] else None
+        right = np.inf if self.extrapolate in ['max', 'both'] else None
+        X_cdf.pp[:] = np.interp(
+            X_cdf.vals, self._X_cdf.vals, self._X_cdf.pp, left=left, right=right
+        )
+
+        # Extrapolate the tails beyond 1.0 to handle "new extremes"

Oh, maybe not - I guess above it's just setting it up, and below you're filling in those values.
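For future readers, that setup-then-fill pattern can be sketched in isolation (toy arrays, and a single hand-rolled slope standing in for the LinearRegression fit over n_endpoints points):

```python
import numpy as np

# Step 1: np.interp flags values outside the fitted CDF's range with +/- inf.
# fit_vals / fit_pp are a toy fitted CDF (made-up numbers).
fit_vals = np.array([1.0, 2.0, 3.0, 4.0])
fit_pp = np.array([0.2, 0.4, 0.6, 0.8])

new_vals = np.array([0.5, 2.5, 5.0])  # 0.5 and 5.0 are "new extremes"
pp = np.interp(new_vals, fit_vals, fit_pp, left=-np.inf, right=np.inf)
# pp is now [-inf, 0.5, inf]: only the in-range value got a plotting position

# Step 2: fill the flagged slots by extending a line through the fitted tail
# (a stand-in for fitting a LinearRegression to the nearest n_endpoints points)
slope = (fit_pp[-1] - fit_pp[0]) / (fit_vals[-1] - fit_vals[0])
lo, hi = np.isneginf(pp), np.isposinf(pp)
pp[lo] = fit_pp[0] + slope * (new_vals[lo] - fit_vals[0])
pp[hi] = fit_pp[-1] + slope * (new_vals[hi] - fit_vals[-1])
print(pp)  # filled: 0.1, 0.5, 1.0 (the upper tail can land at or beyond 1.0)
```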

jhamman

comment created time in 11 days

Pull request review comment jhamman/scikit-downscale

Add asynchronous regional regression model

+        # add the trend back
+        # slope from X (predict)
+        # delta: X (predict) - X (fit) + y
+        trendline = X_trend.lr_model_.coef_[0, 0] * np.arange(len(y_hat)).reshape(-1, 1)

Is .coef_[0, 0] the intercept or the slope?
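For what it's worth, in sklearn's LinearRegression the slope(s) live in coef_ and the intercept is stored separately in intercept_. A quick generic check (not this PR's code):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# y = 3 + 2 * x, fit with a 2-D y so coef_ has shape (1, 1) as in the PR
x = np.arange(10, dtype=float).reshape(-1, 1)
y = 3.0 + 2.0 * x

model = LinearRegression().fit(x, y)
print(model.coef_[0, 0])    # the slope (~2.0); coef_ never holds the intercept
print(model.intercept_[0])  # the intercept (~3.0), kept in its own attribute
```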

jhamman

comment created time in 11 days

Pull request review comment jhamman/scikit-downscale

Add asynchronous regional regression model

+class TrendAwareQuantileMappingRegressor(RegressorMixin, BaseEstimator):
+    def __init__(self, qm_estimator=None):
+        self.qm_estimator = qm_estimator
+
+    def fit(self, X, y):

I'll do the same thing here (offering some explanatory comments to add to the code, but also as a test of my understanding): ''' Detrend the two input datasets, then create the quantile mapping between the two detrended datasets. '''

jhamman

comment created time in 11 days
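The docstring proposed above can be sketched end to end. Everything below (the helper names and the rank-based mapping) is illustrative, not the PR's API: detrend both series with a least-squares line, quantile-map the detrended values, then re-add the trend.

```python
import numpy as np

def detrend(series):
    # fit a linear trend and return (residuals, trend)
    t = np.arange(len(series))
    slope, intercept = np.polyfit(t, series, 1)
    trend = slope * t + intercept
    return series - trend, trend

def quantile_map(x, y):
    # map each x value onto the empirical distribution of y by rank
    pp = (np.argsort(np.argsort(x)) + 0.5) / len(x)
    return np.quantile(y, pp)

rng = np.random.default_rng(0)
t = np.arange(100)
y = 10 + 0.05 * t + rng.normal(0, 1, 100)   # "reference" series with trend
x = 12 + 0.08 * t + rng.normal(0, 2, 100)   # "modeled" series, biased

x_det, x_trend = detrend(x)
y_det, _ = detrend(y)
y_hat = quantile_map(x_det, y_det) + x_trend  # re-add the modeled trend
```

The key property is that the mapping is fit on detrended values, so a long-term trend in the inputs does not distort the transfer function.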

Pull request review comment: jhamman/scikit-downscale

Add asynchronous regional regression model

    def fit(self, X, y):
        self._X_mean_fit = X.mean()
        self._y_mean_fit = y.mean()

        y_trend = LinearTrendTransformer()
        y_detrend = y_trend.fit_transform(y)

        X_trend = LinearTrendTransformer()
        x_detrend = X_trend.fit_transform(X)

        self.qm_estimator.fit(x_detrend, y_detrend)

    def predict(self, X):

''' Detrend the input dataset; then, given that detrended dataset, apply the quantile mapper fitted in fit() to produce an estimate y_hat. Finally, add the trend back from the reference dataset to produce the final bias-corrected time series. '''

jhamman

comment created time in 11 days
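The predict path this docstring describes reduces to two `np.interp` lookups: value to probability on the fitted X CDF, then probability to value on the fitted y CDF. A toy sketch (the data values and plotting-position choice here are illustrative only):

```python
import numpy as np

n = 9
# plotting positions with alpha = beta = 0.4, as in the PR's helper
pp = (np.arange(1, n + 1) - 0.4) / (n + 0.2)

x_fit = np.sort([3., 1., 4., 1., 5., 9., 2., 6., 5.])   # fitted X CDF values
y_fit = np.sort([2., 7., 1., 8., 2., 8., 1., 8., 3.])   # fitted y CDF values

x_new = np.array([2.5, 5.0])
probs = np.interp(x_new, x_fit, pp)   # step 1: value -> quantile on X's CDF
y_hat = np.interp(probs, pp, y_fit)   # step 2: quantile -> value on y's CDF
```

Because both lookup tables are sorted, the composed mapping is monotonic in the input values.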

Pull request review comment: jhamman/scikit-downscale

Add asynchronous regional regression model

        # If extrapolate is 1to1, apply the offset between ref and like to the
        # tails of y_hat
        if self.extrapolate == '1to1':
            X_fit_len = len(self._X_cdf.vals)
            X_fit_min = self._X_cdf.vals[0]

self._X_cdf and self._y_cdf are definitely sorted, right? Might it be better to use min/max instead of selecting by index? I guess the indexing way is faster.

jhamman

comment created time in 11 days
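On the indexing-vs-min/max question: for a sorted array the two agree exactly, and endpoint indexing skips the O(n) scan that `min()`/`max()` perform. A quick self-contained check:

```python
import numpy as np

vals = np.sort(np.random.default_rng(1).normal(size=1000))

# on a sorted array, endpoint indexing and min()/max() agree exactly;
# vals[0]/vals[-1] are O(1), while min()/max() scan all 1000 elements
assert vals[0] == vals.min()
assert vals[-1] == vals.max()
```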

Pull request review comment: jhamman/scikit-downscale

Add asynchronous regional regression model

        # If extrapolate is 1to1, apply the offset between ref and like to the

what is "like"?

jhamman

comment created time in 11 days

Pull request review comment: jhamman/scikit-downscale

Add asynchronous regional regression model

        # Extrapolate the tails beyond 1.0 to handle "new extremes"

is this comment not more appropriately placed in line 172?

jhamman

comment created time in 11 days

Pull request review comment jhamman/scikit-downscale

Add asynchronous regional regression model

+        left = -np.inf if self.extrapolate in ['min', 'both'] else None
+        right = np.inf if self.extrapolate in ['max', 'both'] else None
+        X_cdf.pp[:] = np.interp(
+            X_cdf.vals, self._X_cdf.vals, self._X_cdf.pp, left=left, right=right
+        )
+
+        # Extrapolate the tails beyond 1.0 to handle "new extremes"
+        if np.isinf(X_cdf.pp).any():
+            lower_inds = np.nonzero(-np.inf == X_cdf.pp)[0]
+            upper_inds = np.nonzero(np.inf == X_cdf.pp)[0]
+            model = LinearRegression()
+            if len(lower_inds):
+                s = slice(lower_inds[-1] + 1, lower_inds[-1] + 1 + self.n_endpoints)
+                model.fit(X_cdf.pp[s].reshape(-1, 1), X_cdf.vals[s].reshape(-1, 1))
+                X_cdf.pp[lower_inds] = model.predict(X_cdf.vals[lower_inds].reshape(-1, 1))
+            if len(upper_inds):
+                s = slice(upper_inds[0] - self.n_endpoints, upper_inds[0])
+                model.fit(X_cdf.pp[s].reshape(-1, 1), X_cdf.vals[s].reshape(-1, 1))
+                X_cdf.pp[upper_inds] = model.predict(X_cdf.vals[upper_inds].reshape(-1, 1))

I'm a little unclear on how this is different from what happens in lines 285-298?

jhamman

comment created time in 11 days
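For context, the block the comment refers to uses `np.interp`'s `left`/`right` sentinels to flag query points outside the fitted range, then fills those flagged points with a linear model fit through the nearest `n_endpoints` points. A minimal standalone sketch of that idea (the `vals`, `pp`, and `query` arrays are made-up toy data, and the fit direction is simplified relative to the PR's exact code):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy empirical CDF: Cunnane plotting positions for sorted values 0..9
vals = np.arange(10, dtype=float)
pp = (np.arange(1, 11) - 0.4) / (10 + 0.2)

# Query points outside the fitted range get inf sentinels from np.interp
query = np.array([-2.0, -1.0, 4.5, 11.0])
q_pp = np.interp(query, vals, pp, left=-np.inf, right=np.inf)

# Replace the sentinels by linearly extending the nearest n_endpoints points
n_endpoints = 3
model = LinearRegression()
lower = np.nonzero(np.isneginf(q_pp))[0]
if len(lower):
    model.fit(vals[:n_endpoints].reshape(-1, 1), pp[:n_endpoints])
    q_pp[lower] = model.predict(query[lower].reshape(-1, 1))
upper = np.nonzero(np.isposinf(q_pp))[0]
if len(upper):
    model.fit(vals[-n_endpoints:].reshape(-1, 1), pp[-n_endpoints:])
    q_pp[upper] = model.predict(query[upper].reshape(-1, 1))

print(q_pp)  # finite everywhere; the tails follow the local linear trend
```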

Pull request review comment jhamman/scikit-downscale

Add asynchronous regional regression model

+    def fit(self, X, y, **kwargs):
+        X = check_array(
+            X, dtype='numeric', ensure_min_samples=2 * self.n_endpoints + 1, ensure_2d=True
+        )
+        y = check_array(
+            y, dtype='numeric', ensure_min_samples=2 * self.n_endpoints + 1, ensure_2d=False
+        )
+
+        X = check_max_features(X, n=1)

Should all these checks be done before running the fit, e.g. in a separate test? Or is this the best place for them? (This isn't a suggestion - it's really just curiosity. Perhaps these check_array uses aren't formal tests, just informal ones?)

jhamman

comment created time in 11 days
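On the question above: sklearn's convention is to validate inputs inside `fit`/`predict` themselves rather than in a separate step, and `check_array` both coerces and validates, raising `ValueError` immediately on bad input. A small sketch of the `ensure_min_samples` behavior (`X_too_small` and `n_endpoints` are toy values, not from the PR):

```python
import numpy as np
from sklearn.utils import check_array

n_endpoints = 10
X_too_small = np.random.rand(5, 1)  # 5 samples < 2 * n_endpoints + 1 = 21

# check_array rejects the input before any fitting work happens
try:
    check_array(
        X_too_small, dtype='numeric', ensure_min_samples=2 * n_endpoints + 1, ensure_2d=True
    )
    ok = True
    message = ''
except ValueError as err:
    ok = False
    message = str(err)

print(ok)  # False: sklearn raises ValueError up front
```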

Pull request review comment jhamman/scikit-downscale

Add asynchronous regional regression model

+        # do the full quantile mapping
+        y_hat = np.full_like(X, np.nan)
+        y_hat[sort_inds] = np.interp(X_cdf.pp, self._y_cdf.pp, self._y_cdf.vals)[1:-1]
+
+        # If extrapolate is 1to1, apply the offset between ref and like to the
+        # tails of y_hat
+        if self.extrapolate == '1to1':
+            X_fit_len = len(self._X_cdf.vals)
+            X_fit_min = self._X_cdf.vals[0]
+            X_fit_max = self._X_cdf.vals[-1]
+
+            y_fit_len = len(self._y_cdf.vals)
+            y_fit_min = self._y_cdf.vals[0]
+            y_fit_max = self._y_cdf.vals[-1]
+
+            # adjust values over fit max
+            inds = X > X_fit_max
+            if inds.any():
+                if X_fit_len == y_fit_len:
+                    y_hat[inds] = y_fit_max + (X[inds] - X_fit_max)
+                elif X_fit_len > y_fit_len:
+                    X_fit_at_y_fit_max = np.interp(
+                        self._y_cdf.pp[-1], self._X_cdf.pp, self._X_cdf.vals
+                    )
+                    y_hat[inds] = y_fit_max + (X[inds] - X_fit_at_y_fit_max)
+                elif X_fit_len < y_fit_len:
+                    y_fit_at_X_fit_max = np.interp(
+                        self._X_cdf.pp[-1], self._y_cdf.pp, self._y_cdf.vals
+                    )
+                    y_hat[inds] = y_fit_at_X_fit_max + (X[inds] - X_fit_max)
+
+            # adjust values under fit min
+            inds = X < X_fit_min
+            if inds.any():
+                if X_fit_len == y_fit_len:
+                    y_hat[inds] = y_fit_min + (X[inds] - X_fit_min)
+                elif X_fit_len > y_fit_len:
+                    X_fit_at_y_fit_min = np.interp(
+                        self._y_cdf.pp[0], self._X_cdf.pp, self._X_cdf.vals
+                    )
+                    y_hat[inds] = X_fit_min + (X[inds] - X_fit_at_y_fit_min)
+                elif X_fit_len < y_fit_len:
+                    y_fit_at_X_fit_min = np.interp(
+                        self._X_cdf.pp[0], self._y_cdf.pp, self._y_cdf.vals
+                    )
+                    y_hat[inds] = y_fit_at_X_fit_min + (X[inds] - X_fit_min)
+
+        return y_hat
+
+    def _calc_extrapolated_cdf(

brief comment describing what this does? I'll take a stab at it to save you time, and maybe it can also serve as a test of my own understanding: ''' This function takes the tails of the cdf and extrapolates them further via a linear model using the specified number of endpoints, n_endpoints. '''

jhamman

comment created time in 11 days
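To make the suggested description concrete, here is a hypothetical, much-simplified stand-in for `_calc_extrapolated_cdf`: it sorts the data, assigns Cunnane plotting positions (the `plotting_positions` helper from the diff), and pads the empirical CDF with synthetic endpoints so it spans probabilities 0 and 1. The real method's tail handling is more involved; this only sketches the shape of the returned `Cdf` namedtuple:

```python
import collections

import numpy as np

Cdf = collections.namedtuple('CDF', ['pp', 'vals'])


def plotting_positions(n, alpha=0.4, beta=0.4):
    # Cunnane plotting positions: (i - alpha) / (n + 1 - alpha - beta)
    return (np.arange(1, n + 1) - alpha) / (n + 1.0 - alpha - beta)


def calc_extrapolated_cdf(data):
    # Hypothetical simplification: pad the sorted data with its own min/max
    # as synthetic endpoints carrying probabilities 0 and 1
    vals = np.sort(np.asarray(data, dtype=float))
    pp = plotting_positions(len(vals))
    vals = np.concatenate([[vals[0]], vals, [vals[-1]]])
    pp = np.concatenate([[0.0], pp, [1.0]])
    return Cdf(pp=pp, vals=vals)


cdf = calc_extrapolated_cdf(np.random.rand(100))
print(cdf.pp[0], cdf.pp[-1])  # 0.0 1.0
```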

Pull request review comment jhamman/scikit-downscale

Add asynchronous regional regression model

+    def fit(self, X, y, **kwargs):
+        X = check_array(

didn't know about this function - I like it.

jhamman

comment created time in 11 days

Pull request review comment jhamman/scikit-downscale

Add asynchronous regional regression model

+    def __init__(self, extrapolate=None, n_endpoints=10):
+        self.extrapolate = extrapolate
+        self.n_endpoints = n_endpoints

maybe some documentation/explanation about what endpoints are?

jhamman

comment created time in 11 days
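For the record, `n_endpoints` sets how many points at each tail of the CDF are used to fit the linear extrapolation model. A toy sketch of the trade-off (made-up data, not the PR's code): few endpoints track the single most extreme value closely, while more endpoints give a smoother, more robust tail fit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
vals = np.sort(rng.normal(size=200))
pp = (np.arange(1, 201) - 0.4) / 200.2  # Cunnane plotting positions

# Slope of the upper tail, estimated from the last n_endpoints points
slopes = {}
for n_endpoints in (2, 10, 50):
    m = LinearRegression().fit(vals[-n_endpoints:].reshape(-1, 1), pp[-n_endpoints:])
    slopes[n_endpoints] = float(m.coef_[0])

print(slopes)  # slope estimates vary with how much of the tail is used
```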

push event jhamman/scikit-downscale

Joseph Hamman

commit sha b1a387357aefdc5560ad5c9f7e99899c04032911

update to sklearn 0.24

view details

push time in 11 days

push event jhamman/scikit-downscale

Joseph Hamman

commit sha 5f21d9da879c5156eb8b398a14e962a89b35bbce

udpate quantile mapping regressor

view details

push time in 11 days
