If you are wondering where this site's data comes from, please visit https://api.github.com/users/rtyler/events. GitMemory does not store any data; it only uses NGINX to cache data for a period of time. The idea behind GitMemory is simply to give users a better reading experience.

mitchellh/vagrant-aws 2610

Use Vagrant to manage your EC2 and VPC instances.

ninjudd/drip 1520

Fast JVM launching without the hassle of persistent JVMs.

cheetahtemplate/cheetah 134

Cheetah, the Python-powered templating engine

jenkins-infra/evergreen 95

An automatically self-updating Jenkins distribution

pzim/reaktor 91

Reaktor is a modular post-receive hook designed to work with r10k

reiseburo/hotdog 37

Hotdog is a syslog-to-Kafka forwarder which aims to get log entries into Apache Kafka as quickly as possible.

jenkinsci/jenkins-charm 15

Juju charm to deploy and scale Jenkins

reiseburo/offtopic 13

Offtopic is a simple web application built with Ratpack for inspecting and consuming events from Kafka

CodeValet/codevalet 12

Radically transparent

rtyler/ada-playground 11

Collection of experiments with the Ada language

push event delta-incubator/riverbank

Kris Geusebroek

commit sha 83214689bbf391e7d299aae077a849c74302bce8

Fix broken stuff. Add Dockerfile and docker-compose for running the riverbank server

R. Tyler Croy

commit sha 975ce3400bbf7f50019324ebb532112d32ebb169

fmt

R. Tyler Croy

commit sha 50fa313278289aec79f29fef2994fc3bd4838254

Merge pull request #5 from krisgeus/kg-docker-based-server Fix broken stuff. Add Dockerfile ...

pushed 3 days ago

PR merged delta-incubator/riverbank

Fix broken stuff. Add Dockerfile and docker-compose for running the riverbank server

Solves #4

+199 -19

0 comments

10 changed files

krisgeus

PR closed 3 days ago

PullRequestReviewEvent

push event krisgeus/riverbank

R. Tyler Croy

commit sha 975ce3400bbf7f50019324ebb532112d32ebb169

fmt

pushed 3 days ago

PullRequestReviewEvent

push event jenkins-infra/repository-permissions-updater

Mark Waite

commit sha 1d62cf2bee7555da6af42b747c9d7181192429cc

Disallow releases to artifactory (#2068) co-authored-by: rtyler@brokenco.de

pushed 21 days ago

PR merged jenkins-infra/repository-permissions-updater

Disallow releases to artifactory

Description

Block releases

co-authored-by: rtyler@brokenco.de

+1 -1

0 comments

1 changed file

MarkEWaite

PR closed 21 days ago

PullRequestReviewEvent

PR closed delta-io/delta-rs

add MANIFEST.in (labels: enhancement, binding/python)

This includes the license in the build (cc. https://github.com/conda-forge/staged-recipes/pull/16045)

+1 -0

9 comments

1 changed file

raybellwaves

PR closed 21 days ago

pull request comment delta-io/delta-rs

add MANIFEST.in

Superseded by #422

raybellwaves

comment created 21 days ago

PullRequestReviewEvent

pull request comment delta-io/delta-rs

add MANIFEST.in

Thanks for all the details! Considering the packaging of wheels in python/ using pyo3, I'm actually not sure whether this will solve the desired problem. Perhaps @fvaleye has thoughts to share here.

raybellwaves

comment created 23 days ago

pull request comment delta-io/delta-rs

Read a DeltaTable using a Data Catalog

Interesting!

fvaleye

comment created 23 days ago

pull request comment delta-io/delta-rs

add MANIFEST.in

@raybellwaves would you please add some comments into the file about its purpose/etc? As somebody who barely understands what or why conda is, it would be helpful to understand the purpose before we merge this small addition :smile:

raybellwaves

comment created 23 days ago

PullRequestReviewEvent

push event delta-io/delta-rs

Brandon Ogle

commit sha c64e311dfe5f9af25e53d76771b831c87b393b82

BUGFIX: writes to gcs must include the content length header

Brandon Ogle

commit sha 5f8f44e92cd827ff9dcf4a1378e40eaa38187214

BUGFIX: gcs rewrite already removes the src file, no need to subsequently delete it

Brandon Ogle

commit sha ba3e7ea53e10143fe723ebdc6af9a88c87903984

Revert "BUGFIX: gcs rewrite already removes the src file, no need to subsequently delete it" This reverts commit 888fe107231f165450e6ef65d6949c2ddd5b2e06.

pushed 24 days ago

PR merged delta-io/delta-rs

Gcs writer bugs (label: binding/rust)

Description

After putting together some code to actually write into gcs, I encountered a few issues.

  • Content-Length header must be included on POST requests, else a 411 status is returned (see the sketch after this entry)
  • ~~The storage backend rename method was attempting to clean up the src file, but it turns out this is unnecessary~~ EDIT: nope, that delete in rename is needed

+1 -0

1 comment

1 changed file

blogle

PR closed 24 days ago
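To make the first bullet above concrete, here is a minimal sketch of setting the header explicitly. The helper below is hypothetical and assumes a reqwest-based upload path; it is not the client delta-rs actually uses for GCS, only an illustration of why the 411 appears when Content-Length is missing.

```rust
use reqwest::header::CONTENT_LENGTH;

// Hypothetical helper, not delta-rs code: GCS answers 411 Length Required
// when an upload POST arrives without Content-Length, so the header is set
// explicitly from the body size before sending.
async fn upload_object(upload_url: &str, body: Vec<u8>) -> Result<(), reqwest::Error> {
    let client = reqwest::Client::new();
    client
        .post(upload_url)
        .header(CONTENT_LENGTH, body.len().to_string())
        .body(body)
        .send()
        .await?
        .error_for_status()?;
    Ok(())
}
```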

PullRequestReviewEvent

issue comment nextcloud/calendar

Date picker partly hidden behind calendar view, invisible UI elements

Unfortunately I can also confirm that 2.3.3 in Firefox 91.0.1 presents the same behavior as before :cry:

TheRealCrusher

comment created 24 days ago

Pull request review comment delta-io/delta-rs

Add S3StorageOptions to allow configuring S3 backend explicitly

 use uuid::Uuid;

 pub mod dynamodb_lock;

-const AWS_S3_ASSUME_ROLE_ARN: &str = "AWS_S3_ASSUME_ROLE_ARN";
-const AWS_S3_ROLE_SESSION_NAME: &str = "AWS_S3_ROLE_SESSION_NAME";
+const AWS_ENDPOINT_URL: &str = "AWS_ENDPOINT_URL";
 const AWS_WEB_IDENTITY_TOKEN_FILE: &str = "AWS_WEB_IDENTITY_TOKEN_FILE";

AWS_ENDPOINT_URL will be required for anybody working with an S3-compatible API but not using S3 directly, which is why I think it should be included. Yes, we're using it for localstack, but excluding it from the options limits the potential for users to use non-S3 storage providers with this code :shrug:

xianwill

comment created 24 days ago
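As a hedged sketch of the point made in the comment above (not the code from the PR), an S3-compatible endpoint could be honored via AWS_ENDPOINT_URL using rusoto's Region::Custom; the helper name and fallback behavior are illustrative assumptions.

```rust
use rusoto_core::Region;
use std::env;

// Illustrative helper, not part of the PR under review: when AWS_ENDPOINT_URL
// is set, build a custom Region pointing at the S3-compatible endpoint
// (MinIO, localstack, etc.); otherwise fall back to the normal region lookup.
fn region_from_env() -> Region {
    match env::var("AWS_ENDPOINT_URL") {
        Ok(endpoint) => Region::Custom {
            name: env::var("AWS_REGION").unwrap_or_else(|_| "custom".to_string()),
            endpoint,
        },
        Err(_) => Region::default(),
    }
}
```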

PullRequestReviewEvent

pull request comment rtyler/zap

Build static MUSL target with vendored openssl

@sempervictus thanks for the pull request; would going full rustls be more useful? (not that I have any problems with the pull request as written, just curious)

sempervictus

comment created 25 days ago

Pull request review comment delta-io/kafka-delta-ingest

[WIP] Add DeadLetterQueue to handle bad messages and failed parquet writes

+use async_trait::async_trait;
+use chrono::prelude::*;
+use core::fmt::Debug;
+use log::{error, info};
+use parquet::errors::ParquetError;
+use serde::{Deserialize, Serialize};
+use serde_json::Value;
+
+use crate::{deltalake_ext::*, transforms::TransformError};
+
+#[derive(Clone, Debug, Serialize, Deserialize)]
+pub struct DeadLetter {
+    pub base64_bytes: Option<String>,
+    pub json_string: Option<String>,
+    pub error: Option<String>,
+    pub timestamp: String,
+    pub date: String,
+}
+
+impl DeadLetter {
+    pub fn from_failed_deserialization(bytes: &[u8], err: serde_json::Error) -> Self {
+        let timestamp = Utc::now();
+        Self {
+            base64_bytes: Some(base64::encode(bytes)),
+            json_string: None,
+            error: Some(err.to_string()),
+            timestamp: timestamp.to_rfc3339(),
+            date: timestamp.date().to_string(),
+        }
+    }
+
+    pub fn from_failed_transform(value: &Value, err: TransformError) -> Self {
+        let timestamp = Utc::now();
+        match serde_json::to_string(value) {
+            Ok(s) => Self {
+                base64_bytes: None,
+                json_string: Some(s),
+                error: Some(err.to_string()),
+                timestamp: timestamp.to_rfc3339(),
+                date: timestamp.date().to_string(),
+            },
+            _ => unreachable!(),
+        }
+    }
+
+    pub fn from_failed_parquet_row(value: &Value, err: ParquetError) -> Self {
+        let timestamp = Utc::now();
+        match serde_json::to_string(value) {
+            Ok(s) => Self {
+                base64_bytes: None,
+                json_string: Some(s),
+                error: Some(err.to_string()),
+                timestamp: timestamp.to_rfc3339(),
+                date: timestamp.date().to_string(),
+            },
+            _ => unreachable!(),
+        }
+    }
+}
+
+#[derive(thiserror::Error, Debug)]
+pub enum DeadLetterQueueError {
+    #[error("JSON serialization failed: {source}")]
+    SerdeJson {
+        #[from]
+        source: serde_json::Error,
+    },
+
+    #[error("Delta write failed: {source}")]
+    DeltaWriter {
+        #[from]
+        source: DeltaWriterError,
+    },
+}
+
+pub struct DeadLetterQueueOptions {
+    /// Table URI of the delta table to write dead letters to. Implies usage of the DeltaSinkDeadLetterQueue.
+    pub delta_table_uri: Option<String>,
+}
+
+#[async_trait]
+pub trait DeadLetterQueue: Send + Sync {
+    async fn write_dead_letter(
+        &mut self,
+        dead_letter: DeadLetter,
+    ) -> Result<(), DeadLetterQueueError> {
+        self.write_dead_letters(vec![dead_letter]).await
+    }
+
+    async fn write_dead_letters(
+        &mut self,
+        dead_letters: Vec<DeadLetter>,
+    ) -> Result<(), DeadLetterQueueError>;
+}
+
+pub async fn dlq_from_opts(
+    options: DeadLetterQueueOptions,
+) -> Result<Box<dyn DeadLetterQueue>, DeadLetterQueueError> {
+    if let Some(table_uri) = options.delta_table_uri {
+        Ok(Box::new(
+            DeltaSinkDeadLetterQueue::for_table_uri(table_uri.as_str()).await?,
+        ))
+    } else {
+        Ok(Box::new(NoopDeadLetterQueue {}))
+    }
+}
+
+pub struct NoopDeadLetterQueue {}
+
+#[async_trait]
+impl DeadLetterQueue for NoopDeadLetterQueue {
+    async fn write_dead_letters(
+        &mut self,
+        _dead_letters: Vec<DeadLetter>,
+    ) -> Result<(), DeadLetterQueueError> {
+        // noop
+        Ok(())
+    }
+}
+
+pub struct LoggingDeadLetterQueue {}
+
+#[async_trait]
+impl DeadLetterQueue for LoggingDeadLetterQueue {
+    async fn write_dead_letters(
+        &mut self,
+        dead_letters: Vec<DeadLetter>,
+    ) -> Result<(), DeadLetterQueueError> {
+        for dead_letter in dead_letters {
+            info!("DeadLetter: {:?}", dead_letter);
+        }
+
+        Ok(())
+    }
+}
+
+pub struct DeltaSinkDeadLetterQueue {
+    delta_writer: DeltaWriter,
+}
+
+impl DeltaSinkDeadLetterQueue {
+    pub async fn for_table_uri(table_uri: &str) -> Result<Self, DeadLetterQueueError> {
+        Ok(Self {
+            delta_writer: DeltaWriter::for_table_path(table_uri).await?,
+        })
+    }
+}
+
+#[async_trait]
+impl DeadLetterQueue for DeltaSinkDeadLetterQueue {
+    async fn write_dead_letters(
+        &mut self,
+        dead_letters: Vec<DeadLetter>,
+    ) -> Result<(), DeadLetterQueueError> {
+        let values: Result<Vec<Value>, _> = dead_letters
+            .iter()
+            .map(|dl| serde_json::to_value(dl))
+            .collect();
+        let values = values?;
+
+        info!("Starting insert_all");
+        let version = self.delta_writer.insert_all(values).await?;
+
+        // TODO: take opt for checkpoint creation
+        if version % 10 == 0 {
+            // TODO: create checkpoint on every 10th version

I presume this TODO is going to be addressed inside of this change?

But also, doesn't the delta writer handle that for us already?

xianwill

comment created 25 days ago
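As a hedged sketch of the checkpointing TODO being discussed (not code from the PR), the interval check can be factored into a small helper; the closure parameter stands in for whatever checkpoint API kafka-delta-ingest ends up calling, and the interval would come from configuration rather than a hard-coded 10.

```rust
use std::future::Future;

// Hypothetical helper for the checkpoint TODO above: after a successful write,
// run the supplied checkpoint routine whenever the new table version lands on
// the configured interval. The `checkpoint` closure is a stand-in for the real API.
async fn maybe_checkpoint<F, Fut, E>(version: i64, interval: i64, checkpoint: F) -> Result<(), E>
where
    F: FnOnce(i64) -> Fut,
    Fut: Future<Output = Result<(), E>>,
{
    if interval > 0 && version % interval == 0 {
        checkpoint(version).await?;
    }
    Ok(())
}
```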

Pull request review comment delta-io/kafka-delta-ingest

[WIP] Add DeadLetterQueue to handle bad messages and failed parquet writes

+pub struct NoopDeadLetterQueue {}
+
+#[async_trait]
+impl DeadLetterQueue for NoopDeadLetterQueue {
+    async fn write_dead_letters(
+        &mut self,
+        _dead_letters: Vec<DeadLetter>,
+    ) -> Result<(), DeadLetterQueueError> {
+        // noop
+        Ok(())
+    }
+}

For what purpose does this Noop thing exist?

xianwill

comment created 25 days ago
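One way to read the Noop question: based on the dlq_from_opts shown in the diff, the no-op queue keeps call sites unconditional. A small usage sketch, assuming the items from the reviewed module are in scope:

```rust
// Sketch only, assuming dlq_from_opts, DeadLetterQueueOptions and
// DeadLetterQueueError from the reviewed module are in scope.
async fn example() -> Result<(), DeadLetterQueueError> {
    // No table URI configured: dlq_from_opts hands back the no-op queue,
    // so callers always hold a Box<dyn DeadLetterQueue> and never branch
    // on whether a dead-letter table was configured.
    let mut dlq = dlq_from_opts(DeadLetterQueueOptions {
        delta_table_uri: None,
    })
    .await?;

    // Writing to the no-op queue is simply Ok(()).
    dlq.write_dead_letters(vec![]).await?;
    Ok(())
}
```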

PullRequestReviewEvent

Pull request review comment delta-io/kafka-delta-ingest

[WIP] Add DeadLetterQueue to handle bad messages and failed parquet writes

+#[derive(Clone, Debug, Serialize, Deserialize)]
+pub struct DeadLetter {
+    pub base64_bytes: Option<String>,
+    pub json_string: Option<String>,
+    pub error: Option<String>,
+    pub timestamp: String,
+    pub date: String,

I'm confused as to why both timestamp and date are necessary, but also why they're both stored as Strings, rather than letting serde do the serialize/deserialize for us?

xianwill

comment created 25 days ago
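A hedged sketch of what the suggestion in the comment above could look like, assuming chrono's `serde` feature is enabled; this is not the shape the PR settled on, only an illustration of letting serde handle the conversion.

```rust
use chrono::{DateTime, NaiveDate, Utc};
use serde::{Deserialize, Serialize};

// Illustrative alternative to the String fields questioned above: keep chrono
// types in the struct and let serde (with chrono's "serde" feature) produce the
// RFC 3339 timestamp and YYYY-MM-DD date strings at serialization time.
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct DeadLetter {
    pub base64_bytes: Option<String>,
    pub json_string: Option<String>,
    pub error: Option<String>,
    pub timestamp: DateTime<Utc>,
    // Kept only if a separate date partition column is really needed;
    // otherwise it can be derived from `timestamp` when writing.
    pub date: NaiveDate,
}
```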

PullRequestReviewEvent

Pull request review comment delta-io/delta-rs

Add S3StorageOptions to allow configuring S3 backend explicitly

 use uuid::Uuid;

 pub mod dynamodb_lock;

-const AWS_S3_ASSUME_ROLE_ARN: &str = "AWS_S3_ASSUME_ROLE_ARN";
-const AWS_S3_ROLE_SESSION_NAME: &str = "AWS_S3_ROLE_SESSION_NAME";
+const AWS_ENDPOINT_URL: &str = "AWS_ENDPOINT_URL";
 const AWS_WEB_IDENTITY_TOKEN_FILE: &str = "AWS_WEB_IDENTITY_TOKEN_FILE";

+mod s3_storage_options {
+    pub const AWS_REGION: &str = "AWS_REGION";
+    pub const AWS_S3_ASSUME_ROLE_ARN: &str = "AWS_S3_ASSUME_ROLE_ARN";
+    pub const AWS_S3_LOCKING_PROVIDER: &str = "AWS_S3_LOCKING_PROVIDER";
+    pub const AWS_S3_ROLE_SESSION_NAME: &str = "AWS_S3_ROLE_SESSION_NAME";
+
+    pub const S3_OPTS: &[&str] = &[
+        AWS_REGION,
+        AWS_S3_ASSUME_ROLE_ARN,
+        AWS_S3_LOCKING_PROVIDER,
+        AWS_S3_ROLE_SESSION_NAME,
+    ];
+}
+
+/// Options used to configure the S3StorageBackend.
+///
+/// Available options are described below.
+///
+/// The same key shown in the table below should be used whether passing a key in the hashmap or setting it as an environment variable.
+/// Provided keys may include configuration for the S3 backend and also the optional DynamoDb lock used for atomic rename.
+///
+/// | name/key                 | description |
+/// | ======================== | =========== |
+/// | AWS_REGION               | The AWS region. |
+/// | AWS_S3_ASSUME_ROLE_ARN   | The role to assume for S3 writes. |
+/// | AWS_S3_ROLE_SESSION_NAME | The role session name to use for assume role. If not provided a random session name is generated. |
+/// | AWS_S3_LOCKING_PROVIDER  | The locking provider to use. For safe atomic rename, this should be `dynamodb`. If empty, no locking provider is used and safe atomic rename is not available. |
+///
+/// Unconsumed `extra_opts` are passed as a `HashMap` to the `dynamodb_lock` module for configuring the dynamodb client used for safe atomic rename.
+///
+/// [dynamodb_lock::DynamoDbOptions] describes the available options.
+///
+/// Two environment variables are not included as options (and not described in the table above).
+/// These must be set as environment variables when desired and are described below:
+///
+/// * AWS_ENDPOINT_URL - This variable is used specifically for testing against localstack and should be specified in the environment.
+/// * AWS_WEB_IDENTITY_TOKEN_FILE - file describing k8s configuration.
+///   env vars to satisfy https://docs.rs/rusoto_sts/0.47.0/rusoto_sts/struct.WebIdentityProvider.html#method.from_k8s_env should be provided
+pub struct S3StorageOptions {
+    region: Region,
+    assume_role_arn: Option<String>,
+    role_session_name: Option<String>,
+    use_web_identity: bool,
+    locking_provider: Option<String>,
+    extra_opts: HashMap<String, String>,
+}
+
+impl S3StorageOptions {
+    /// Creates an instance of S3StorageOptions from environment variables.
+    pub fn from_env() -> S3StorageOptions {

I might recommend moving this to a Default trait implementation.

xianwill

comment created 25 days ago
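A minimal sketch of that Default suggestion, assuming the S3StorageOptions and from_env shown in the diff above:

```rust
// Sketch of the review suggestion: route environment-based construction
// through the standard Default trait so callers can write
// S3StorageOptions::default() or use ..Default::default() in struct updates.
impl Default for S3StorageOptions {
    fn default() -> Self {
        Self::from_env()
    }
}
```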

Pull request review comment delta-io/delta-rs

Add S3StorageOptions to allow configuring S3 backend explicitly

+impl S3StorageOptions {
+    /// Creates an instance of S3StorageOptions from environment variables.
+    pub fn from_env() -> S3StorageOptions {
+        let empty_opts = HashMap::new();
+        Self::from_map(&empty_opts)
+    }
+
+    /// Creates an instance of S3StorageOptions from the given HashMap
+    pub fn from_map(options: &HashMap<String, String>) -> S3StorageOptions {
+        fn str_or_default(map: &HashMap<String, String>, key: &str, default: String) -> String {

What's with the functions in your functions, as opposed to non-public functions within the impl?

xianwill

comment created 25 days ago
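And a hedged sketch of the alternative the comment describes: the same helper as a private associated function on the impl instead of a function nested inside from_map. The body shown here is an assumption (prefer the map, fall back to the environment, then to the default); the diff above only shows the helper's signature.

```rust
use std::collections::HashMap;

impl S3StorageOptions {
    // Private associated function instead of a fn nested in from_map.
    // Body is assumed for illustration: prefer the explicit option map,
    // then the environment, then the provided default.
    fn str_or_default(map: &HashMap<String, String>, key: &str, default: String) -> String {
        map.get(key)
            .map(|value| value.to_owned())
            .unwrap_or_else(|| std::env::var(key).unwrap_or(default))
    }
}
```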

PullRequestReviewEvent