diesel-rs/diesel 5982

A safe, extensible ORM and Query Builder for Rust

penelopezone/rubyfmt 702

Ruby Autoformatter!

byroot/activerecord-typedstore 252

ActiveRecord::Store but with type definition

rust-lang/crates.io-index 217

Registry index for crates.io

jferris/effigy 209

Ruby views without a templating language

penelopezone/descriptor 11

Descriptor is a spec-style test structuring library for rust

penelopezone/expector 4

Expector is a matcher library for rust.

durhamka/jquery-dotdotdot-rails 3

jQuery.dotdotdot library for the Rails asset pipeline

push event diesel-rs/diesel

Siân Griffin

commit sha 4b87e64a1fdd0163c9c05652de7a5921345f51da

Make `Statement::prepare` async

I really *really* don't want to have to maintain two copies of the PG backend. For `Statement::prepare`, this means at minimum having it call `PQsendPrepare` and `PQgetResult` instead of `PQprepare`. The only difference between the sync and async versions is whether or not we do the `PQflush`, `PQconsumeInput`, and `PQisBusy` loop before calling `PQgetResult`.

I've gone back and forth on this a ton, and I think the best option is to just have the async version be canonical, and provide a really shitty form of `block_on` that the blocking connection uses.

There are a few reasons for this, but the biggest is that I want to prevent `RawConnection` from being misused. Any time a command is sent, we need to grab all the results from that command, or otherwise have some form of multiplexing, which isn't reasonable to implement. So if we exposed `send_prepare` and `get_result` on `RawConnection` separately, it would be easy for callers to misuse them. These functions are internal and unsafe, so maybe it's fine, but Rust is capable of encoding this, so I'd prefer to do so if possible.

The second reason is that I don't want to have to maintain a second version of `StatementCache`, which means it will need to be aware of the fact that the function to get the value is async. This commit does not yet implement that, but starts to move in that direction.

If we're going to make `StatementCache` async, that's going to affect all backends, async or not, so we *really* need to make sure it's zero cost. I care more about compile time than runtime here, but I think the impact is relatively negligible as long as we don't pull in `tokio` or something. As far as I'm aware, the actual implementation of `async`/`await` itself for functions that never yield is zero cost now that TLS is no longer used.

So I think that if we combine this with the world's worst implementation of `block_on` that specifically knows our futures never yield, we can share most of the code with a negligible impact on compile times or run times for sync users.

The main cost here is that `RawConnection` now needs a `RefCell`, but we need to move a `RefCell` for `StatementCache` up a level anyway, so I think we'll end up having a "this is the real connection" struct that just gets a `RefCell` around the whole thing for PG.

push time in a month
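The commit message above leans on a `block_on` that only works because the futures it drives never yield. A minimal sketch of that idea, using only the standard library, could look like the following. The names `block_on_never_yields`, `NoopWake`, and `prepare_statement` are hypothetical and are not taken from Diesel; the real code may be structured quite differently.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// A waker that does nothing. It is never expected to be called, because the
// futures driven by this executor are assumed to complete on the first poll.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Hypothetical helper: polls the future exactly once and panics if it is not
// ready, which encodes the "our futures never yield" assumption.
fn block_on_never_yields<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(value) => value,
        Poll::Pending => panic!("future yielded, but the blocking path assumes it never does"),
    }
}

// Stand-in for an async operation that happens to complete immediately.
async fn prepare_statement(sql: &str) -> usize {
    sql.len()
}
```

A blocking caller would then write something like `block_on_never_yields(prepare_statement("SELECT 1"))`, while an async connection awaits the same function directly, so only one copy of the logic has to be maintained.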

push event diesel-rs/diesel

Siân Griffin

commit sha bfd14ab89ef94b32c6fe497c85f5098d40fe203f

wip

push time in a month

create branch diesel-rs/diesel

branch : sg-async-refactor-wip

created branch time in a month

issue comment diesel-rs/diesel

Async I/O

The issue is not that we don't know how to write the impl, it's that compiler bugs regarding projection through associated types prevent closures from being passed in.

On Tue, Sep 1, 2020 at 5:09 AM Martin Algesten notifications@github.com wrote:

@weiznich https://github.com/weiznich @mehcode https://github.com/mehcode and it's possible to do a blanket impl to accept both async fn and async closures.

https://github.com/algesten/hreq/blob/master/src/server/handler.rs#L44-L62

View it on GitHub: https://github.com/diesel-rs/diesel/issues/399#issuecomment-684773560

--

Thanks, Siân Griffin

killercup

comment created time in a month

issue comment rust-lang/crates.io

Can't remove invalid users from crate owners

#1586 would also let us manually resolve this problem when we're aware of it (but that PR does not currently automatically mark users as deleted in the cases where we could detect it -- which in this case we could have)

carols10cents

comment created time in a month

issue comment rust-lang/crates.io

Can't remove invalid users from crate owners

This is slightly different framing, but almost the same as #1585

carols10cents

comment created time in a month

Pull request review comment penelopezone/rubyfmt

Refactor `FileComments` to better demonstrate intent

 impl ParserState {
     }
 
     pub fn insert_comment_collection(&mut self, comments: CommentBlock) {
-        self.comments_to_insert
-            .merge(comments.apply_spaces(self.spaces_after_last_newline));
+        self.comments_to_insert += comments.apply_spaces(self.spaces_after_last_newline);

It was just the most convenient way to get it onto `Option`, but we can do a trait with a named method.

sgrif

comment created time in a month

Pull request review comment penelopezone/rubyfmt

Refactor `FileComments` to better demonstrate intent

 impl FileComments {
         fc
     }
 
+    /// Add a new comment. If the beginning of this file is a comment block,
+    /// each of those comment lines must be pushed before any other line, or
+    /// the end of the "start of file sled" will be incorrectly calculated.
     fn push_comment(&mut self, line_number: u64, l: String) {
-        if self.lowest_key == 0 {
-            self.lowest_key = line_number;
-        }
-
-        let last_line = self.contiguous_starting_indices.last();
-
-        let should_push =
-            line_number == 1 || (last_line.is_some() && last_line.unwrap() == &(line_number - 1));
-        if should_push {
-            self.contiguous_starting_indices.push(line_number);
+        match (&mut self.start_of_file_sled, line_number) {
+            (None, 1) => {
+                debug_assert!(
+                    self.other_comments.is_empty(),
+                    "If we have a start of file sled, it needs to come first,
+                     otherwise we won't know where the last line is",
+                );
+                self.start_of_file_sled = Some(CommentBlock::new(1..2, vec![l]));
+            }
+            (Some(sled), _) if sled.following_line_number() == line_number => {
+                sled.add_line(l);

(I would pick a different variable name but I don't want to pointlessly churn the line declaring it. GH plz pick a less bad font)

sgrif

comment created time in a month

Pull request review comment penelopezone/rubyfmt

Refactor `FileComments` to better demonstrate intent

 impl FileComments {
         fc
     }
 
+    /// Add a new comment. If the beginning of this file is a comment block,
+    /// each of those comment lines must be pushed before any other line, or
+    /// the end of the "start of file sled" will be incorrectly calculated.
     fn push_comment(&mut self, line_number: u64, l: String) {
-        if self.lowest_key == 0 {
-            self.lowest_key = line_number;
-        }
-
-        let last_line = self.contiguous_starting_indices.last();
-
-        let should_push =
-            line_number == 1 || (last_line.is_some() && last_line.unwrap() == &(line_number - 1));
-        if should_push {
-            self.contiguous_starting_indices.push(line_number);
+        match (&mut self.start_of_file_sled, line_number) {
+            (None, 1) => {
+                debug_assert!(
+                    self.other_comments.is_empty(),
+                    "If we have a start of file sled, it needs to come first,
+                     otherwise we won't know where the last line is",
+                );
+                self.start_of_file_sled = Some(CommentBlock::new(1..2, vec![l]));
+            }
+            (Some(sled), _) if sled.following_line_number() == line_number => {
+                sled.add_line(l);

l looks like 1 here :\

sgrif

comment created time in a month

PR opened penelopezone/rubyfmt

Refactor `FileComments` to better demonstrate intent

I'm not particularly happy with how this came out, but I do think it's a clear improvement over what was there before. Previously this had three obviously interacting fields, with a lot of subtle, undocumented, unmaintained invariants (or at least in the case of lowest_key it was unmaintained). I do believe that this structure at least better represents intent.

CommentBlock now keeps track of what lines it's on, and the "start of file sled" (what even is that?!) is kept separately from the rest of the comments. This reduces (but does not eliminate) the amount of state that this type has to keep, and makes the intent clearer both in the type itself and in its state at any single point. At absolute minimum, I have at least removed all panics from this type's implementation.

That said, this interface has a ton of pitfalls. I really dislike how much it needs to mutate itself. While developing this patch, it became clear to me why this is the case. The parser that we've written only retains the line number and the comment on that line. This means we have no way to determine if two comments are part of a contiguous block or not, and it seems we leave the formatter to determine whether that's the case. We do treat the first comment block specially though, and assume it's a contiguous block (though I'm not entirely clear why yet).

This is a problem for a number of reasons. First of all, it means that we will just delete comments if there is more than one on a given line, potentially breaking comments on method arguments that are significant to rdoc.

This also misbehaves if the first non-comment line has a trailing comment (most likely for rdoc, but could also potentially happen if the first comment is for the file itself, and the trailing comment is a poorly placed, very short comment for the item it's on).

This type very clearly wants to be Map<Range<LineNumber>, Vec<String>> or even better Vec<(Span, String)>, since the formatter should be able to choose/change where newlines are inserted, and really shouldn't be coupled to line numbers in the source. I'm sure Ripper must give us a way to determine if non-comment tokens appear between two comments, but I do not know enough about that layer of the code to fix it in this patch. The fact that FileComments is constructed outside of the main deserializer is surprising, and I would hope that we can perhaps at least handle this in a Deserialize impl.

But short of what I'd consider to be a more complete fix, I have at least tried to make the invariants of this type better upheld, and those that aren't enforced are now documented (mostly in push_comment, which is still extremely subtle).

I don't think push_comment is the right interface for this type. However, since it was marked as public, I've assumed that it is an interface intended to be externally consumed and thus needs to be maintained. If that's not the case, I think we should replace it with something that only operates when we have a full view of the data, such as FromIterator. In that case we can either manually sort the data before consuming it, or at least include a debug assertion that it's already sorted.
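To make the `FromIterator` idea concrete, here is a rough sketch of what a "full view of the data" constructor could look like. `Comments` and `Block` are hypothetical stand-ins, not rubyfmt's actual `FileComments` or `CommentBlock`, and the sketch takes the "manually sort the data" option. It still groups purely by adjacent line numbers, so it shares the limitation described above: it cannot tell whether non-comment tokens sit between two comments.

```rust
use std::ops::Range;

// Hypothetical stand-in for a block of comments covering a range of lines.
#[derive(Debug, PartialEq)]
struct Block {
    lines: Range<u64>,
    text: Vec<String>,
}

// Hypothetical stand-in for the comment collection.
#[derive(Debug, Default, PartialEq)]
struct Comments {
    blocks: Vec<Block>,
}

impl FromIterator<(u64, String)> for Comments {
    fn from_iter<I: IntoIterator<Item = (u64, String)>>(iter: I) -> Self {
        // With the whole input available, sort once up front instead of
        // relying on callers to push lines in the right order.
        let mut items: Vec<(u64, String)> = iter.into_iter().collect();
        items.sort_by_key(|(line, _)| *line);

        let mut blocks: Vec<Block> = Vec::new();
        for (line, text) in items {
            match blocks.last_mut() {
                // The line directly follows the previous block: extend it.
                Some(block) if block.lines.end == line => {
                    block.lines.end = line + 1;
                    block.text.push(text);
                }
                // Otherwise start a new block.
                _ => blocks.push(Block {
                    lines: line..line + 1,
                    text: vec![text],
                }),
            }
        }
        Comments { blocks }
    }
}
```

With this shape, callers would write `pairs.into_iter().collect::<Comments>()` and there would be no ordering requirement left to document.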

Speaking of debug assertions, one thing that stood out to me while writing this is that the test suite is run in release mode. I'd like to learn more about why that is. Previous iterations of this patch included debug assertions for the invariants that were implied but not upheld. While none of these landed in the final patch, if they did I feel like they should be run as part of the test suite.

+77 -75

0 comment

3 changed files

pr created time in a month

push event sgrif/rubyfmt

Siân Griffin

commit sha e2a27521596fc7e56f012c3a9863d444661ffe00

Refactor `FileComments` to better demonstrate intent

push time in a month

push event sgrif/rubyfmt

Siân Griffin

commit sha 52ad80f411a812f693b102b570edf358381f2605

Refactor `FileComments` to better demonstrate intent

push time in a month

create branch sgrif/rubyfmt

branch : sg-refactor-line-comments

created branch time in a month

issue opened penelopezone/rubyfmt

Have the test suite run with `cargo test`

By having a bunch of custom shell scripts to run the test suite, we make the test suite harder to find, and also increase the barrier to entry for new contributors. I don't see anything about the suite that can't be done in Rust. We should seriously consider moving this into Rust, either by manually writing the boilerplate "check this file" when a new example is added, or by generating it with a proc macro. https://github.com/diesel-rs/diesel/blob/master/diesel_cli/tests/print_schema.rs is an example of a similar "run the binary associated with this crate on this input and diff stdout against this expected value" type of suite.

created time in a month

issue opened penelopezone/rubyfmt

Create a cargo workspace containing librubyfmt

I keep running cargo fmt and cargo clippy in the main directory, forgetting that most of the code is in a subdirectory. Having the outer directory be a workspace would fix this.

created time in a month

Pull request review comment penelopezone/rubyfmt

Refactor ruby interface

 use crate::line_tokens::LineToken;
 use crate::types::{ColNumber, LineNumber};
 use std::collections::HashSet;
 
+fn insert_at<T>(idx: usize, target: &mut Vec<T>, input: &mut Vec<T>) {
+    let drain = input.drain(..);
+    let mut idx = idx;
+    for item in drain {
+        target.insert(idx, item);
+        idx += 1;
+    }

The code as written will perform as many as log(N) allocations and 2N memcpys. The new code will perform at most 2 and at least 1 allocation, and 2 memcpys.

penelopezone

comment created time in a month
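The suggested replacement referred to in the review comment above is not shown in the feed. One standard-library way to insert a whole batch at once is `Vec::splice` with an empty range; the sketch below assumes that approach and is not necessarily what was proposed in the review.

```rust
// Inserts every element of `input` into `target` starting at `idx`,
// leaving `input` empty. Panics if `idx > target.len()`, like `Vec::insert`.
fn insert_at<T>(idx: usize, target: &mut Vec<T>, input: &mut Vec<T>) {
    // An empty range `idx..idx` removes nothing and inserts the new items
    // at `idx`. The returned `Splice` iterator finishes its work when it is
    // dropped, so it is dropped immediately.
    drop(target.splice(idx..idx, input.drain(..)));
}
```

Because the tail of `target` is shifted once for the whole batch rather than once per inserted element, this avoids the per-item `memmove` of the loop version.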

Pull request review comment penelopezone/rubyfmt

Refactor ruby interface

 extern "C" {
     pub fn rb_id2name(id: ID) -> *const libc::c_char;
     pub fn rb_ary_entry(arr: VALUE, idx: libc::c_long) -> VALUE;
     pub fn rb_raise(cls: VALUE, msg: *const libc::c_char);
+    pub fn rb_block_call(
+        obj: VALUE,
+        method_id: ID,
+        argc: libc::c_int,
+        argv: *const VALUE,
+        block: extern "C" fn(_: VALUE, _: VALUE, _: libc::c_int, _: *const VALUE) -> VALUE,
+        outer_scope: VALUE,
+    ) -> VALUE;
 }
 
 pub fn current_exception_as_rust_string() -> String {
+    let ruby_string = unsafe { eval_str("$!.inspect") }.expect("Error evaluating `$!.inspect`");
+    unsafe { ruby_string_to_str(ruby_string) }.to_owned()
+}
+
+macro_rules! intern {
+    ($s:literal) => {
+        rb_intern(concat!($s, "\0").as_ptr() as _)
+    };
+}
+
+// Safety: This function expects an initialized Ruby VM capable of evaling code
+pub unsafe fn eval_str(s: &str) -> Result<VALUE, ()> {
+    let rubyfmt_program_as_c = CString::new(s).expect("unexpected nul byte in Ruby code");
+    let mut state = 0;
+    let v = rb_eval_string_protect(rubyfmt_program_as_c.as_ptr(), &mut state);
+    if state != 0 {
+        Err(())
+    } else {
+        Ok(v)
+    }
+}
+
+extern "C" fn real_debug_inspect(v: VALUE) -> VALUE {
     unsafe {
-        let res = eval_str("$!.inspect").expect("this can't fail");
-        let ptr = rubyfmt_rstring_ptr(res);
-        let length = rubyfmt_rstring_len(res);
-        String::from_raw_parts(ptr as _, length as _, length as _)
+        let inspect = rb_funcall(v, intern!("inspect"), 0, std::ptr::null() as *const VALUE);
        let inspect = rb_funcall(v, intern!("inspect"), 0, std::ptr::null());
penelopezone

comment created time in a month

Pull request review comment penelopezone/rubyfmt

Refactor ruby interface

 impl BreakableEntry {
         tokens
     }
 
+    fn last_token_is_a_newline(&self) -> bool {
+        match self.tokens.last() {
+            Some(x) => x.is_newline(),
+            _ => false,
+        }
+    }
+
+    fn index_of_prev_hard_newline(&self) -> Option<usize> {
+        self.tokens
+            .iter()
+            .rposition(|v| v.is_newline() || v.is_comment())
+    }
+}
+
+impl BreakableEntry {
+    pub fn new(spaces: ColNumber, delims: BreakableDelims) -> Self {
+        BreakableEntry {
+            spaces,
+            tokens: vec![],

Seeing vec![] next to HashSet::new() feels weird to me

            tokens: Vec::new(),
penelopezone

comment created time in a month

Pull request review comment penelopezone/rubyfmt

Refactor ruby interface

 extern "C" {
     pub fn rb_id2name(id: ID) -> *const libc::c_char;
     pub fn rb_ary_entry(arr: VALUE, idx: libc::c_long) -> VALUE;
     pub fn rb_raise(cls: VALUE, msg: *const libc::c_char);
+    pub fn rb_block_call(
+        obj: VALUE,
+        method_id: ID,
+        argc: libc::c_int,
+        argv: *const VALUE,
+        block: extern "C" fn(_: VALUE, _: VALUE, _: libc::c_int, _: *const VALUE) -> VALUE,
+        outer_scope: VALUE,
+    ) -> VALUE;
 }
 
 pub fn current_exception_as_rust_string() -> String {
+    let ruby_string = unsafe { eval_str("$!.inspect") }.expect("Error evaluating `$!.inspect`");
+    unsafe { ruby_string_to_str(ruby_string) }.to_owned()
+}
+
+macro_rules! intern {
+    ($s:literal) => {
+        rb_intern(concat!($s, "\0").as_ptr() as _)
+    };
+}
+
+// Safety: This function expects an initialized Ruby VM capable of evaling code
+pub unsafe fn eval_str(s: &str) -> Result<VALUE, ()> {
+    let rubyfmt_program_as_c = CString::new(s).expect("unexpected nul byte in Ruby code");
+    let mut state = 0;
+    let v = rb_eval_string_protect(rubyfmt_program_as_c.as_ptr(), &mut state);
+    if state != 0 {
+        Err(())
+    } else {
+        Ok(v)
+    }
+}
+
+extern "C" fn real_debug_inspect(v: VALUE) -> VALUE {
     unsafe {
-        let res = eval_str("$!.inspect").expect("this can't fail");
-        let ptr = rubyfmt_rstring_ptr(res);
-        let length = rubyfmt_rstring_len(res);
-        String::from_raw_parts(ptr as _, length as _, length as _)
+        let inspect = rb_funcall(v, intern!("inspect"), 0, std::ptr::null() as *const VALUE);
+        let char_pointer = rb_string_value_cstr(&inspect) as *mut i8;
+        let cstr = CStr::from_ptr(char_pointer);
+        let s = cstr.to_str().expect("it's utf8");
+        debug!("{}", s);
+        Qnil
     }
 }
 
-pub fn eval_str(s: &str) -> Result<VALUE, ()> {
+pub fn debug_inspect(v: VALUE) {
     unsafe {
-        let rubyfmt_program_as_c = CString::new(s).expect("it should become a c string");
         let mut state = 0;
-        let v = rb_eval_string_protect(
-            rubyfmt_program_as_c.as_ptr(),
-            &mut state as *mut libc::c_int,
-        );
+        rb_protect(real_debug_inspect as _, v, &mut state);
         if state != 0 {
-            Err(())
-        } else {
-            Ok(v)
+            let s = current_exception_as_rust_string();
+            panic!("blew us: {}", s);

Are you sure panicking if we failed to perform a debug inspection is the right call?

penelopezone

comment created time in a month

Pull request review comment penelopezone/rubyfmt

Refactor ruby interface

 use std::collections::BTreeMap;
-use std::io::{self, BufRead, Read};
 
 use crate::comment_block::CommentBlock;
+use crate::ruby::*;
 use crate::types::LineNumber;
 
-use regex::Regex;
-
-#[derive(Debug)]
+#[derive(Debug, Default)]
 pub struct FileComments {
     comment_blocks: BTreeMap<LineNumber, String>,
     contiguous_starting_indices: Vec<LineNumber>,
     lowest_key: LineNumber,
 }
 
 impl FileComments {
-    pub fn new() -> Self {
-        FileComments {
-            comment_blocks: BTreeMap::new(),
-            contiguous_starting_indices: vec![],
-            lowest_key: 0,
+    pub fn from_ruby_hash(h: VALUE) -> Self {
+        let mut fc = FileComments::default();
+        let keys;
+        let values;
+        unsafe {
+            keys = ruby_array_to_slice(rb_funcall(h, intern!("keys"), 0));
+            values = ruby_array_to_slice(rb_funcall(h, intern!("values"), 0));
+        }
+        if keys.len() != values.len() {
+            raise("expected keys and values to have same length, indicates error");
         }
+        for (ruby_lineno, ruby_comment) in keys.iter().zip(values) {
+            let lineno = unsafe { rubyfmt_rb_num2ll(*ruby_lineno) };
+            if lineno < 0 {
+                raise("line number negative");
+            }
+            let comment = unsafe { ruby_string_to_str(*ruby_comment) }
+                .trim()
+                .to_owned();
+            fc.push_comment(lineno as _, comment);
+        }
+        fc
     }
 
-    pub fn from_buf<R: Read>(r: io::BufReader<R>) -> io::Result<Self> {
-        lazy_static! {
-            static ref RE: Regex = Regex::new("^ *#").unwrap();
+    fn push_comment(&mut self, line_number: u64, l: String) {
+        if self.lowest_key == 0 {
+            self.lowest_key = line_number;
         }
-        let mut res = Self::new();
-        for (idx, line) in r.lines().enumerate() {
-            let l = line?;
-            if RE.is_match(&l) {
-                let line_number = (idx + 1) as LineNumber;
-                if res.lowest_key == 0 {
-                    res.lowest_key = line_number;
-                }
 
-                let last_line = res.contiguous_starting_indices.last();
+        let last_line = self.contiguous_starting_indices.last();
 
-                let should_push = line_number == 1
-                    || (last_line.is_some() && last_line.unwrap() == &(line_number - 1));
-                if should_push {
-                    res.contiguous_starting_indices.push(line_number);
-                }
-                res.comment_blocks.insert(line_number, l);
-            }
+        let should_push =
+            line_number == 1 || (last_line.is_some() && last_line.unwrap() == &(line_number - 1));

Or if my assumption that last_line is always None when line_number == 1 is incorrect,

        let should_push =
            line_number == 1 || last_line.copied() == Some(line_number - 1);
penelopezone

comment created time in a month
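The suggestion in the review comment above swaps an `is_some()` / `unwrap()` pair for a comparison between two `Option` values. A small sketch of the two forms side by side, lifted into free functions with hypothetical names so they can be compared directly:

```rust
// The condition as written in the diff under review.
fn should_push_original(last_line: Option<&u64>, line_number: u64) -> bool {
    line_number == 1 || (last_line.is_some() && last_line.unwrap() == &(line_number - 1))
}

// The suggested form: `copied()` turns `Option<&u64>` into `Option<u64>`,
// and `None` never compares equal to `Some(_)`, so no unwrap is needed.
fn should_push_suggested(last_line: Option<&u64>, line_number: u64) -> bool {
    line_number == 1 || last_line.copied() == Some(line_number - 1)
}
```

Both forms assume `line_number` is at least 1; with an unsigned type, `line_number - 1` would overflow for a line number of 0.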

Pull request review comment penelopezone/rubyfmt

Refactor ruby interface

 extern "C" {
     pub fn rb_id2name(id: ID) -> *const libc::c_char;
     pub fn rb_ary_entry(arr: VALUE, idx: libc::c_long) -> VALUE;
     pub fn rb_raise(cls: VALUE, msg: *const libc::c_char);
+    pub fn rb_block_call(
+        obj: VALUE,
+        method_id: ID,
+        argc: libc::c_int,
+        argv: *const VALUE,
+        block: extern "C" fn(_: VALUE, _: VALUE, _: libc::c_int, _: *const VALUE) -> VALUE,
+        outer_scope: VALUE,
+    ) -> VALUE;
 }
 
 pub fn current_exception_as_rust_string() -> String {
+    let ruby_string = unsafe { eval_str("$!.inspect") }.expect("Error evaluating `$!.inspect`");
+    unsafe { ruby_string_to_str(ruby_string) }.to_owned()
+}
+
+macro_rules! intern {
+    ($s:literal) => {
+        rb_intern(concat!($s, "\0").as_ptr() as _)
+    };
+}
+
+// Safety: This function expects an initialized Ruby VM capable of evaling code
+pub unsafe fn eval_str(s: &str) -> Result<VALUE, ()> {
+    let rubyfmt_program_as_c = CString::new(s).expect("unexpected nul byte in Ruby code");
+    let mut state = 0;
+    let v = rb_eval_string_protect(rubyfmt_program_as_c.as_ptr(), &mut state);
+    if state != 0 {
+        Err(())
+    } else {
+        Ok(v)
+    }
+}
+
+extern "C" fn real_debug_inspect(v: VALUE) -> VALUE {
     unsafe {
-        let res = eval_str("$!.inspect").expect("this can't fail");
-        let ptr = rubyfmt_rstring_ptr(res);
-        let length = rubyfmt_rstring_len(res);
-        String::from_raw_parts(ptr as _, length as _, length as _)
+        let inspect = rb_funcall(v, intern!("inspect"), 0, std::ptr::null() as *const VALUE);
+        let char_pointer = rb_string_value_cstr(&inspect) as *mut i8;
+        let cstr = CStr::from_ptr(char_pointer);
+        let s = cstr.to_str().expect("it's utf8");
+        debug!("{}", s);
        debug!("{}", unsafe { ruby_string_to_str(inspect) });
penelopezone

comment created time in a month

Pull request review comment penelopezone/rubyfmt

Refactor ruby interface

 pub const Qnil: VALUE = VALUE(8);
 
 extern "C" {
     // stuff that we need to compile out rubyfmt
-    pub fn ruby_init();
+    pub fn ruby_setup() -> libc::c_int;
     pub fn ruby_cleanup(_: libc::c_int);
     pub fn rb_eval_string_protect(_: *const libc::c_char, _: *mut libc::c_int) -> VALUE;
     pub fn rb_funcall(_: VALUE, _: ID, _: libc::c_int, ...) -> VALUE;
     pub fn rb_utf8_str_new(_: *const libc::c_char, _: libc::c_long) -> VALUE;
     pub fn rb_str_new_cstr(_: *const libc::c_char) -> VALUE;
-    pub fn rb_string_value_cstr(_: VALUE) -> *const libc::c_char;
+    pub fn rb_string_value_cstr(_: *const VALUE) -> *const libc::c_char;

I don't see a corresponding change to the definition. Are you sure this is correct?

penelopezone

comment created time in a month

Pull request review comment penelopezone/rubyfmt

Refactor ruby interface

 extern "C" {
     pub fn rb_id2name(id: ID) -> *const libc::c_char;
     pub fn rb_ary_entry(arr: VALUE, idx: libc::c_long) -> VALUE;
     pub fn rb_raise(cls: VALUE, msg: *const libc::c_char);
+    pub fn rb_block_call(
+        obj: VALUE,
+        method_id: ID,
+        argc: libc::c_int,
+        argv: *const VALUE,
+        block: extern "C" fn(_: VALUE, _: VALUE, _: libc::c_int, _: *const VALUE) -> VALUE,
+        outer_scope: VALUE,
+    ) -> VALUE;
 }
 
 pub fn current_exception_as_rust_string() -> String {
+    let ruby_string = unsafe { eval_str("$!.inspect") }.expect("Error evaluating `$!.inspect`");
+    unsafe { ruby_string_to_str(ruby_string) }.to_owned()
+}
+
+macro_rules! intern {
+    ($s:literal) => {
+        rb_intern(concat!($s, "\0").as_ptr() as _)
+    };
+}
+
+// Safety: This function expects an initialized Ruby VM capable of evaling code
+pub unsafe fn eval_str(s: &str) -> Result<VALUE, ()> {
+    let rubyfmt_program_as_c = CString::new(s).expect("unexpected nul byte in Ruby code");
+    let mut state = 0;
+    let v = rb_eval_string_protect(rubyfmt_program_as_c.as_ptr(), &mut state);
+    if state != 0 {
+        Err(())
+    } else {
+        Ok(v)
+    }
+}
+
+extern "C" fn real_debug_inspect(v: VALUE) -> VALUE {
     unsafe {
-        let res = eval_str("$!.inspect").expect("this can't fail");
-        let ptr = rubyfmt_rstring_ptr(res);
-        let length = rubyfmt_rstring_len(res);
-        String::from_raw_parts(ptr as _, length as _, length as _)
+        let inspect = rb_funcall(v, intern!("inspect"), 0, std::ptr::null() as *const VALUE);
+        let char_pointer = rb_string_value_cstr(&inspect) as *mut i8;
+        let cstr = CStr::from_ptr(char_pointer);
+        let s = cstr.to_str().expect("it's utf8");
+        debug!("{}", s);
+        Qnil
     }
 }
 
-pub fn eval_str(s: &str) -> Result<VALUE, ()> {
+pub fn debug_inspect(v: VALUE) {
     unsafe {
-        let rubyfmt_program_as_c = CString::new(s).expect("it should become a c string");
         let mut state = 0;
-        let v = rb_eval_string_protect(
-            rubyfmt_program_as_c.as_ptr(),
-            &mut state as *mut libc::c_int,
-        );
+        rb_protect(real_debug_inspect as _, v, &mut state);
         if state != 0 {
-            Err(())
-        } else {
-            Ok(v)
+            let s = current_exception_as_rust_string();
+            panic!("blew us: {}", s);
         }
     }
 }
+
+pub fn raise(s: &str) {
+    let cstr = CString::new(s).expect("it's not null");

"it's not null" doesn't accurately describe this error condition. Can we just handle a nul byte being contained in the string gracefully, or at least give a useful error message?

penelopezone

comment created time in a month

Pull request review comment penelopezone/rubyfmt

Refactor ruby interface

 use crate::line_tokens::LineToken;
 use crate::types::{ColNumber, LineNumber};
 use std::collections::HashSet;
 
+fn insert_at<T>(idx: usize, target: &mut Vec<T>, input: &mut Vec<T>) {
+    let drain = input.drain(..);
+    let mut idx = idx;
+    for item in drain {
+        target.insert(idx, item);
+        idx += 1;
+    }
+}
+
+#[derive(Copy, Clone, Debug)]
 pub enum ConvertType {
     MultiLine,
     SingleLine,
 }
 
+pub trait LineTokenTarget {
+    fn push(&mut self, lt: LineToken);
+    fn insert_at(&mut self, idx: usize, tokens: &mut Vec<LineToken>);
+    fn into_tokens(self, ct: ConvertType) -> Vec<LineToken>;
+    fn last_token_is_a_newline(&self) -> bool;
+    fn index_of_prev_hard_newline(&self) -> Option<usize>;
+}

The impl of every method on this trait is exactly the same (or could be). How about this instead?

pub trait LineTokenTarget {
    fn tokens(&self) -> &[LineToken];
    fn tokens_mut(&mut self) -> &mut Vec<LineToken>;
    
    fn push(&mut self, lt: LineToken) {
        self.tokens_mut().push(lt);
    }
    
    // etc
}
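
A fuller sketch of that trait shape, assuming a much-reduced stand-in for `LineToken` (the real enum has more variants) and guessing at the bodies of the default methods, which the PR does not show:

```rust
// Hypothetical, reduced stand-in for rubyfmt's `LineToken`.
#[derive(Debug, Clone, PartialEq)]
pub enum LineToken {
    HardNewLine,
    Comment { contents: String },
}

pub trait LineTokenTarget {
    // The only two methods an implementor has to write.
    fn tokens(&self) -> &[LineToken];
    fn tokens_mut(&mut self) -> &mut Vec<LineToken>;

    // Everything else is a default method built on the two accessors.
    fn push(&mut self, lt: LineToken) {
        self.tokens_mut().push(lt);
    }

    // Assumed semantics: an empty queue does not end in a newline.
    fn last_token_is_a_newline(&self) -> bool {
        matches!(self.tokens().last(), Some(LineToken::HardNewLine))
    }

    fn index_of_prev_hard_newline(&self) -> Option<usize> {
        self.tokens()
            .iter()
            .rposition(|t| *t == LineToken::HardNewLine)
    }
}

#[derive(Debug, Default)]
pub struct BaseQueue {
    tokens: Vec<LineToken>,
}

impl LineTokenTarget for BaseQueue {
    fn tokens(&self) -> &[LineToken] {
        &self.tokens
    }

    fn tokens_mut(&mut self) -> &mut Vec<LineToken> {
        &mut self.tokens
    }
}
```

With this shape, a second implementor only needs the two accessors, and any behavior that genuinely differs can still override a default method.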
penelopezone

comment created time in a month

Pull request review commentpenelopezone/rubyfmt

Refactor ruby interface

 extern "C" {
     pub fn rb_id2name(id: ID) -> *const libc::c_char;
     pub fn rb_ary_entry(arr: VALUE, idx: libc::c_long) -> VALUE;
     pub fn rb_raise(cls: VALUE, msg: *const libc::c_char);
+    pub fn rb_block_call(
+        obj: VALUE,
+        method_id: ID,
+        argc: libc::c_int,
+        argv: *const VALUE,
+        block: extern "C" fn(_: VALUE, _: VALUE, _: libc::c_int, _: *const VALUE) -> VALUE,
+        outer_scope: VALUE,
+    ) -> VALUE;
 }

 pub fn current_exception_as_rust_string() -> String {
+    let ruby_string = unsafe { eval_str("$!.inspect") }.expect("Error evaluating `$!.inspect`");
+    unsafe { ruby_string_to_str(ruby_string) }.to_owned()
+}
+
+macro_rules! intern {
+    ($s:literal) => {
+        rb_intern(concat!($s, "\0").as_ptr() as _)
+    };
+}
+
+// Safety: This function expects an initialized Ruby VM capable of evaling code
+pub unsafe fn eval_str(s: &str) -> Result<VALUE, ()> {
+    let rubyfmt_program_as_c = CString::new(s).expect("unexpected nul byte in Ruby code");

Can we return an error instead of panicking?
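
One way this could look, as a sketch only: the `EvalError` type and the helper name are hypothetical, since the real `eval_str` returns `Result<VALUE, ()>`. `CString::new` already reports an interior nul byte as a `NulError`, so the conversion can be propagated with `?` instead of `expect`.

```rust
use std::ffi::{CString, NulError};

// Hypothetical error type for illustration; not part of rubyfmt.
#[derive(Debug)]
pub enum EvalError {
    // The Ruby source contained an interior nul byte at this byte offset.
    InteriorNul(usize),
}

impl From<NulError> for EvalError {
    fn from(e: NulError) -> Self {
        EvalError::InteriorNul(e.nul_position())
    }
}

// Converts Ruby source into a C string, returning an error rather than
// panicking when the source contains a nul byte.
pub fn ruby_source_to_c_string(s: &str) -> Result<CString, EvalError> {
    Ok(CString::new(s)?)
}
```

The caller then decides whether a nul byte is fatal, and the error carries enough detail for a useful message.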

penelopezone

comment created time in a month

Pull request review commentpenelopezone/rubyfmt

Refactor ruby interface

 extern "C" {
     pub fn rb_id2name(id: ID) -> *const libc::c_char;
     pub fn rb_ary_entry(arr: VALUE, idx: libc::c_long) -> VALUE;
     pub fn rb_raise(cls: VALUE, msg: *const libc::c_char);
+    pub fn rb_block_call(
+        obj: VALUE,
+        method_id: ID,
+        argc: libc::c_int,
+        argv: *const VALUE,
+        block: extern "C" fn(_: VALUE, _: VALUE, _: libc::c_int, _: *const VALUE) -> VALUE,
+        outer_scope: VALUE,
+    ) -> VALUE;
 }

 pub fn current_exception_as_rust_string() -> String {
+    let ruby_string = unsafe { eval_str("$!.inspect") }.expect("Error evaluating `$!.inspect`");

Rather than panicking, how about we return a string indicating that there was an error even trying to get the value out of Ruby. Or even better, is there a function in the Ruby C API that we can call instead of evaling?

penelopezone

comment created time in a month

Pull request review commentpenelopezone/rubyfmt

Refactor ruby interface

 use crate::line_tokens::LineToken;
 use crate::types::{ColNumber, LineNumber};
 use std::collections::HashSet;

+fn insert_at<T>(idx: usize, target: &mut Vec<T>, input: &mut Vec<T>) {
+    let drain = input.drain(..);
+    let mut idx = idx;
+    for item in drain {
+        target.insert(idx, item);
+        idx += 1;
+    }
+}
+
+#[derive(Copy, Clone, Debug)]
 pub enum ConvertType {
     MultiLine,
     SingleLine,
 }

+pub trait LineTokenTarget {
+    fn push(&mut self, lt: LineToken);
+    fn insert_at(&mut self, idx: usize, tokens: &mut Vec<LineToken>);
+    fn into_tokens(self, ct: ConvertType) -> Vec<LineToken>;
+    fn last_token_is_a_newline(&self) -> bool;
+    fn index_of_prev_hard_newline(&self) -> Option<usize>;
+}
+
+#[derive(Debug, Default, Clone)]
+pub struct BaseQueue {
+    tokens: Vec<LineToken>,
+}
+
+impl LineTokenTarget for BaseQueue {
+    fn push(&mut self, lt: LineToken) {
+        self.tokens.push(lt)
+    }
+
+    fn insert_at(&mut self, idx: usize, tokens: &mut Vec<LineToken>) {

It looks like this is mostly called with a single element, should this just take that element instead of a vec?

penelopezone

comment created time in a month

Pull request review commentpenelopezone/rubyfmt

Refactor ruby interface

 use crate::line_tokens::LineToken;
 use crate::types::{ColNumber, LineNumber};
 use std::collections::HashSet;

+fn insert_at<T>(idx: usize, target: &mut Vec<T>, input: &mut Vec<T>) {
+    let drain = input.drain(..);
+    let mut idx = idx;
+    for item in drain {
+        target.insert(idx, item);
+        idx += 1;
+    }
    let mut tail = target.split_off(idx);
    target.append(input);
    target.append(&mut tail);
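
As a standalone sketch, the `split_off`/`append` version moves the tail of `target` once, rather than shifting it on every `insert` call as the loop does:

```rust
// Moves every element of `input` into `target` starting at `idx`,
// leaving `input` empty. Panics if `idx > target.len()`, which is
// the documented behavior of `Vec::split_off`.
fn insert_at<T>(idx: usize, target: &mut Vec<T>, input: &mut Vec<T>) {
    // Detach everything from `idx` onward...
    let mut tail = target.split_off(idx);
    // ...append the new elements where the tail used to start...
    target.append(input);
    // ...and put the tail back after them.
    target.append(&mut tail);
}
```

`target.splice(idx..idx, input.drain(..))` is another standard-library way to express the same operation.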
penelopezone

comment created time in a month

Pull request review commentpenelopezone/rubyfmt

Refactor ruby interface

 impl ParserState {
             }
         }
     }
+
+    fn current_target(&self) -> &dyn LineTokenTarget {
+        if self.breakable_entry_stack.is_empty() {
+            &self.render_queue
+        } else {
+            self.breakable_entry_stack
+                .last()
+                .expect("we checked it's not empty")
+        }
+    }
+
+    fn current_target_mut(&mut self) -> &mut dyn LineTokenTarget {
+        if self.breakable_entry_stack.is_empty() {
+            &mut self.render_queue
+        } else {
+            self.breakable_entry_stack
+                .last_mut()
+                .expect("we checked it's not empty")
+        }
        self.breakable_entry_stack
            .last_mut()
            .unwrap_or(&mut self.render_queue)
penelopezone

comment created time in a month

Pull request review commentpenelopezone/rubyfmt

Refactor ruby interface

 use std::collections::BTreeMap;
-use std::io::{self, BufRead, Read};

 use crate::comment_block::CommentBlock;
+use crate::ruby::*;
 use crate::types::LineNumber;

-use regex::Regex;
-
-#[derive(Debug)]
+#[derive(Debug, Default)]
 pub struct FileComments {
     comment_blocks: BTreeMap<LineNumber, String>,
     contiguous_starting_indices: Vec<LineNumber>,
     lowest_key: LineNumber,
 }

 impl FileComments {
-    pub fn new() -> Self {
-        FileComments {
-            comment_blocks: BTreeMap::new(),
-            contiguous_starting_indices: vec![],
-            lowest_key: 0,
+    pub fn from_ruby_hash(h: VALUE) -> Self {
+        let mut fc = FileComments::default();
+        let keys;
+        let values;
+        unsafe {
+            keys = ruby_array_to_slice(rb_funcall(h, intern!("keys"), 0));
+            values = ruby_array_to_slice(rb_funcall(h, intern!("values"), 0));
+        }
+        if keys.len() != values.len() {
+            raise("expected keys and values to have same length, indicates error");
         }
+        for (ruby_lineno, ruby_comment) in keys.iter().zip(values) {
+            let lineno = unsafe { rubyfmt_rb_num2ll(*ruby_lineno) };
+            if lineno < 0 {
+                raise("line number negative");
+            }
+            let comment = unsafe { ruby_string_to_str(*ruby_comment) }
+                .trim()
+                .to_owned();
+            fc.push_comment(lineno as _, comment);
+        }
+        fc
     }

-    pub fn from_buf<R: Read>(r: io::BufReader<R>) -> io::Result<Self> {
-        lazy_static! {
-            static ref RE: Regex = Regex::new("^ *#").unwrap();
+    fn push_comment(&mut self, line_number: u64, l: String) {
+        if self.lowest_key == 0 {
+            self.lowest_key = line_number;
         }
-        let mut res = Self::new();
-        for (idx, line) in r.lines().enumerate() {
-            let l = line?;
-            if RE.is_match(&l) {
-                let line_number = (idx + 1) as LineNumber;
-                if res.lowest_key == 0 {
-                    res.lowest_key = line_number;
-                }

-                let last_line = res.contiguous_starting_indices.last();
+        let last_line = self.contiguous_starting_indices.last();

-                let should_push = line_number == 1
-                    || (last_line.is_some() && last_line.unwrap() == &(line_number - 1));
-                if should_push {
-                    res.contiguous_starting_indices.push(line_number);
-                }
-                res.comment_blocks.insert(line_number, l);
-            }
+        let should_push =
+            line_number == 1 || (last_line.is_some() && last_line.unwrap() == &(line_number - 1));
        let should_push = last_line.copied().unwrap_or(0) == line_number - 1
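
For comparison, here are both conditions as free functions (the function names are made up for this sketch). They agree as long as line numbers start at 1 and arrive in increasing order, which seems to be the intent. They differ if line 1 is pushed after some other line, and the suggested form underflows on a line number of 0.

```rust
// The condition as written in the diff.
fn should_push_original(last_line: Option<&u64>, line_number: u64) -> bool {
    line_number == 1 || (last_line.is_some() && last_line.unwrap() == &(line_number - 1))
}

// The suggested condition: "no previous line" is treated as line 0,
// so line 1 counts as contiguous with the start of the file.
fn should_push_suggested(last_line: Option<&u64>, line_number: u64) -> bool {
    last_line.copied().unwrap_or(0) == line_number - 1
}
```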
penelopezone

comment created time in a month

Pull request review commentpenelopezone/rubyfmt

Refactor ruby interface

 impl ParserState {
             }
         }
     }
+
+    fn current_target(&self) -> &dyn LineTokenTarget {
+        if self.breakable_entry_stack.is_empty() {
+            &self.render_queue
+        } else {
+            self.breakable_entry_stack
+                .last()
+                .expect("we checked it's not empty")
+        }
        self.breakable_entry_stack
            .last()
            .unwrap_or(&self.render_queue)
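
If the stack entries and the render queue turn out to be different concrete types, the `Option` needs to be mapped to the trait object before `unwrap_or`, otherwise the two branches have mismatched types. A sketch with placeholder types (`BreakableEntry`, the `name` method, and the field types are assumptions made for illustration):

```rust
trait LineTokenTarget {
    fn name(&self) -> &'static str;
}

struct BaseQueue;
struct BreakableEntry;

impl LineTokenTarget for BaseQueue {
    fn name(&self) -> &'static str {
        "render_queue"
    }
}

impl LineTokenTarget for BreakableEntry {
    fn name(&self) -> &'static str {
        "breakable_entry"
    }
}

struct ParserState {
    render_queue: BaseQueue,
    breakable_entry_stack: Vec<BreakableEntry>,
}

impl ParserState {
    fn current_target(&self) -> &dyn LineTokenTarget {
        self.breakable_entry_stack
            .last()
            // Coerce to the trait object first so both branches share one type.
            .map(|entry| entry as &dyn LineTokenTarget)
            .unwrap_or(&self.render_queue)
    }
}
```

The `_mut` variant can use the same pattern with `last_mut()` and `&mut dyn LineTokenTarget`, since the two fields are borrowed separately.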
penelopezone

comment created time in a month

Pull request review commentpenelopezone/rubyfmt

Refactor ruby interface

 impl ParserState {
                 None => 0,
             };

+            let spaces = self.spaces_after_last_newline;
+            debug!("spaces: {} comments: {:?}", spaces, self.comments_to_insert);
+            let bt = Backtrace::new();
+            debug!("{:?}", bt);

If we're at the point where we're printing backtraces into the debug logger, we should probably use the tracing library instead

penelopezone

comment created time in a month

Pull request review commentpenelopezone/rubyfmt

Refactor ruby interface

 impl ParserState {
     pub fn render_heredocs(&mut self, skip: bool) {
         while !self.heredoc_strings.is_empty() {
             let mut next_heredoc = self.heredoc_strings.pop().expect("we checked it's there");
-            let want_newline = match self.render_queue.last() {
-                Some(x) => !x.is_newline(),
-                None => true,
-            };
+            let want_newline = !self.current_target_mut().last_token_is_a_newline();
            let want_newline = !self.current_target().last_token_is_a_newline();
penelopezone

comment created time in a month

Pull request review commentpenelopezone/rubyfmt

Refactor ruby interface

 impl ParserState {
         // valid utf8 and also we're only using this to newline match
         // which should be very hard to break. The unsafe conversion
         // here skips a utf8 check which is faster.
+        // FIXME: Is the overhead of utf8 checking actually a bottleneck here?
+        // The comment above even admits there are circumstances where it will
+        // not be UTF-8
         unsafe {
             let s = str::from_utf8_unchecked(&data).to_string();
             s.trim().chars().any(|v| v == '\n')
            }
        s.trim().contains('\n')

(It won't let me include the closing brace that I'm moving in this suggestion comment, sorry.)
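
If the UTF-8 check turns out not to be a bottleneck, a fully safe version is short. This is a sketch: the function name is invented, and treating invalid UTF-8 as "not multiline" is an assumption, not rubyfmt's behavior.

```rust
use std::str;

// Returns whether the trimmed text still contains a newline.
// Invalid UTF-8 is reported as `false` here (an assumed fallback).
fn is_multiline(data: &[u8]) -> bool {
    match str::from_utf8(data) {
        // `contains` takes a `char` pattern directly; no `chars().any(..)` needed.
        Ok(s) => s.trim().contains('\n'),
        Err(_) => false,
    }
}
```

This also avoids the `to_string()` allocation, since `trim` and `contains` both work on the borrowed `&str`.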

penelopezone

comment created time in a month

Pull request review commentpenelopezone/rubyfmt

Refactor ruby interface

 pub enum FormatError {
     OtherRubyError = 4,
 }

+// FIXME: Why does this need to be a string? We're just printing it to stdout
 pub fn format_buffer(buf: &str) -> Result<String, RichFormatError> {
-    let tree = run_parser_on(buf)?;
+    let (tree, file_comments) = run_parser_on(buf)?;
     let out_data = vec![];
     let mut output = Cursor::new(out_data);
-    let data = buf.as_bytes();
-    toplevel_format_program(&mut output, data, tree)?;
-    output.flush().expect("flushing works");
-    Ok(unsafe { String::from_utf8_unchecked(output.into_inner()) })
+    toplevel_format_program(&mut output, tree, file_comments)?;
+    output.flush().expect("flushing to a vec should never fail");
+    Ok(String::from_utf8(output.into_inner()).expect("we never write invalid UTF-8"))
 }

 #[no_mangle]
 pub extern "C" fn rubyfmt_init() -> libc::c_int {
     init_logger();
-    unsafe {
-        ruby::ruby_init();
+    let res = ruby_ops::setup_ruby();
+    if res.is_err() {
+        return InitStatus::ERROR as libc::c_int;

Should the C API provide a way to get a human readable description of the error?

static mut LAST_ERROR: Option<CString> = None;

// on this line
if let Err(e) = res {
    // If we think there could be multi-threading here we should use `lazy_static!` and a mutex
    unsafe { LAST_ERROR = Some(CString::new(e.to_string()).expect("don't actually expect in prod")) }
    return InitStatus::ERROR as libc::c_int;
}

// later
#[no_mangle]
pub extern "C" fn rubyfmt_last_error() -> *const libc::c_char {
    // Safety: assumes `LAST_ERROR` is only ever accessed from a single thread
    match unsafe { &LAST_ERROR } {
        Some(e) => e.as_ptr(),
        None => ptr::null(),
    }
}
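
If multi-threading is a possibility, a mutex-guarded variant avoids `static mut` entirely. The following is a sketch under assumptions: `set_last_error` is a hypothetical helper, and the export attribute is left out so the snippet compiles on any edition.

```rust
use std::ffi::CString;
use std::os::raw::c_char;
use std::ptr;
use std::sync::Mutex;

// `Mutex::new` is usable in a `static` since Rust 1.63, so `lazy_static!`
// is not required on recent toolchains.
static LAST_ERROR: Mutex<Option<CString>> = Mutex::new(None);

// Records a message. Interior nul bytes are replaced with spaces so the
// conversion to `CString` cannot fail.
fn set_last_error(msg: &str) {
    let sanitized = msg.replace('\0', " ");
    let cstr = CString::new(sanitized).expect("nul bytes were replaced above");
    *LAST_ERROR.lock().unwrap() = Some(cstr);
}

// Returns a pointer to the last error message, or null if none was recorded.
// The pointer stays valid only until the next call to `set_last_error`.
// To export it from the library, add `#[no_mangle]`.
pub extern "C" fn rubyfmt_last_error() -> *const c_char {
    let guard = LAST_ERROR.lock().unwrap();
    match guard.as_ref() {
        Some(e) => e.as_ptr(),
        None => ptr::null(),
    }
}
```

The returned pointer is still only as safe as the caller's discipline: a concurrent `set_last_error` would free the string it points at, so a C caller would need to copy the message before calling back into the library.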
penelopezone

comment created time in a month

Pull request review commentpenelopezone/rubyfmt

Refactor ruby interface

 use std::collections::BTreeMap;
-use std::io::{self, BufRead, Read};

 use crate::comment_block::CommentBlock;
+use crate::ruby::*;
 use crate::types::LineNumber;

-use regex::Regex;
-
-#[derive(Debug)]
+#[derive(Debug, Default)]
 pub struct FileComments {
     comment_blocks: BTreeMap<LineNumber, String>,
     contiguous_starting_indices: Vec<LineNumber>,
     lowest_key: LineNumber,
 }

 impl FileComments {
-    pub fn new() -> Self {
-        FileComments {
-            comment_blocks: BTreeMap::new(),
-            contiguous_starting_indices: vec![],
-            lowest_key: 0,
+    pub fn from_ruby_hash(h: VALUE) -> Self {
+        let mut fc = FileComments::default();
+        let keys;
+        let values;
+        unsafe {
+            keys = ruby_array_to_slice(rb_funcall(h, intern!("keys"), 0));
+            values = ruby_array_to_slice(rb_funcall(h, intern!("values"), 0));
+        }
+        if keys.len() != values.len() {
+            raise("expected keys and values to have same length, indicates error");
         }
+        for (ruby_lineno, ruby_comment) in keys.iter().zip(values) {
+            let lineno = unsafe { rubyfmt_rb_num2ll(*ruby_lineno) };
+            if lineno < 0 {
+                raise("line number negative");
+            }
+            let comment = unsafe { ruby_string_to_str(*ruby_comment) }
+                .trim()
+                .to_owned();
+            fc.push_comment(lineno as _, comment);
+        }
+        fc
     }

-    pub fn from_buf<R: Read>(r: io::BufReader<R>) -> io::Result<Self> {
-        lazy_static! {
-            static ref RE: Regex = Regex::new("^ *#").unwrap();
+    fn push_comment(&mut self, line_number: u64, l: String) {
+        if self.lowest_key == 0 {
+            self.lowest_key = line_number;
         }
-        let mut res = Self::new();
-        for (idx, line) in r.lines().enumerate() {
-            let l = line?;
-            if RE.is_match(&l) {
-                let line_number = (idx + 1) as LineNumber;
-                if res.lowest_key == 0 {
-                    res.lowest_key = line_number;
-                }

-                let last_line = res.contiguous_starting_indices.last();
+        let last_line = self.contiguous_starting_indices.last();

-                let should_push = line_number == 1
-                    || (last_line.is_some() && last_line.unwrap() == &(line_number - 1));
-                if should_push {
-                    res.contiguous_starting_indices.push(line_number);
-                }
-                res.comment_blocks.insert(line_number, l);
-            }
+        let should_push =
+            line_number == 1 || (last_line.is_some() && last_line.unwrap() == &(line_number - 1));
+        if should_push {
+            self.contiguous_starting_indices.push(line_number);

I need to give a higher level overview of what this is for, but my gut tells me this is sidestepping the abstraction we need or that what this is could be made more clear

penelopezone

comment created time in a month

Pull request review commentpenelopezone/rubyfmt

Refactor ruby interface

 impl FileComments {

         let mut sled = Vec::with_capacity(self.contiguous_starting_indices.len());
         for key in self.contiguous_starting_indices.iter() {
-            sled.push(self.comment_blocks.remove(key).expect("we tracked it"));
+            sled.push(
+                self.comment_blocks
+                    .remove(key)
+                    .unwrap_or_else(|| panic!("we tracked it: {} {:?}", key, self.comment_blocks)),

This panic is also making me think there's a better structure for this

penelopezone

comment created time in a month

Pull request review commentpenelopezone/rubyfmt

Refactor ruby interface

 impl CommentBlock {
     pub fn into_line_tokens(self) -> Vec<LineToken> {
         self.comments
             .into_iter()
-            .map(|v| LineToken::Comment { contents: v })
+            .map(|c| LineToken::Comment { contents: c })
             .collect()
     }

+    pub fn apply_spaces(self, indent_depth: ColNumber) -> Self {
+        let new_strings = self
+            .comments
+            .into_iter()
+            .map(|c| {
+                let spaces = (0..indent_depth)
+                    .map(|_| " ".to_string())
+                    .collect::<Vec<String>>()
+                    .join("");
                let spaces = str::repeat(" ", indent_depth)
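
In isolation, the `repeat` form looks like the sketch below. The helper name is made up, and `usize` for the depth is an assumption: `str::repeat` takes a `usize`, so if `ColNumber` is a different integer type a cast would be needed.

```rust
// Prefixes a comment with `indent_depth` spaces.
fn indent_comment(comment: &str, indent_depth: usize) -> String {
    // One allocation for the padding instead of one `String` per space.
    let spaces = " ".repeat(indent_depth);
    format!("{}{}", spaces, comment)
}
```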
penelopezone

comment created time in a month

Pull request review commentpenelopezone/rubyfmt

Refactor ruby interface

+use crate::file_comments::FileComments;
+use crate::ruby::*;
+
+pub fn setup_ruby() -> Result<(), ()> {
+    unsafe {
+        let res = ruby_setup();
+        if res == 0 {
+            Ok(())
+        } else {
+            Err(())
+        }
+    }
+}
+
+// Safety: This function expects an initialized Ruby VM
+pub unsafe fn load_rubyfmt() -> Result<(), ()> {
+    let rubyfmt_program = include_str!("../rubyfmt_lib.rb");
+    eval_str(rubyfmt_program)?;
+    Ok(())
+}
+
+#[derive(Debug, Copy, Clone)]
+pub struct Parser(VALUE);
+
+#[derive(Debug, Clone)]
+pub enum ParseError {
+    SyntaxError,
+    OtherRubyError(String),
+}
+
+impl Parser {
+    unsafe extern "C" fn real_run_parser(parser_instance: VALUE) -> VALUE {
+        rb_funcall(parser_instance, intern!("parse"), 0)
+    }
+
+    pub fn new(buf: &str) -> Self {
+        unsafe {
+            let buffer_string = rb_utf8_str_new(buf.as_ptr() as _, buf.len() as i64);
+            let parser_class = rb_const_get_at(rb_cObject, intern!("Parser"));
+            let parser_instance = rb_funcall(parser_class, intern!("new"), 1, buffer_string);
+            Parser(parser_instance)
+        }
+    }
+
+    pub fn parse(self) -> Result<(RipperTree, FileComments), ParseError> {
+        let mut state = 0;
+        let maybe_tree_and_comments =
+            unsafe { rb_protect(Parser::real_run_parser as _, self.0 as _, &mut state) };
+        if state == 0 {
+            if maybe_tree_and_comments != Qnil {
+                let tree_and_comments = unsafe { ruby_array_to_slice(maybe_tree_and_comments) };
+                if let [tree, comments] = tree_and_comments {
+                    let fc = FileComments::from_ruby_hash(*comments);
+                    Ok((RipperTree::new(*tree), fc))
+                } else {
+                    panic!(
+                        "expected tree to contain two elements, actually got: {}",
+                        tree_and_comments.len(),
+                    )
+                }
+            } else {
+                Err(ParseError::SyntaxError)
+            }
+        } else {
+            let s = current_exception_as_rust_string();
+            Err(ParseError::OtherRubyError(s))
+        }
+    }
+}
+
+#[derive(Clone, Copy, Debug)]
+pub struct RipperTree(VALUE);

Given that this just has a safe constructor and never asserts that the inner type is actually what is expected, I'm not entirely sure what value this brings

penelopezone

comment created time in a month

PullRequestReviewEvent
PullRequestReviewEvent

push eventpenelopezone/rubyfmt

Siân Griffin

commit sha c638384f7a132391286f0f541f9c70ac30efccea

make it compile

view details

push time in a month

Pull request review commentpenelopezone/rubyfmt

Partial audit of unsafe code, refactor to reduce its usage

 pub fn raise(s: &str) {
         rb_raise(rb_eRuntimeError, cstr.as_ptr());
     }
 }
+
+// Safety: The given VALUE must be a valid Ruby array. The lifetime is not
+// checked by the compiler. The returned slice must not outlive the given
+// Ruby array. The Ruby array must not be modified while the returned slice
+// is live.
+pub unsafe fn ruby_array_to_slice<'a>(ary: VALUE) -> &'a [VALUE] {
+    std::slice::from_raw_parts(rubyfmt_rb_ary_ptr(ary), rubyfmt_rb_ary_len(ary) as _)

Yes, that's an additional step I'd like to take if we can guarantee that VALUE is always a valid pointer, but if we're requiring the caller to verify invariants anyway we may as well allow this one as well

sgrif

comment created time in a month

PullRequestReviewEvent

push eventsgrif/rubyfmt

Siân Griffin

commit sha 1e8fb96949824fb867cefe126ad382cc0a92f3ba

cargo fmt

view details

push time in a month

Pull request review commentpenelopezone/rubyfmt

Partial audit of unsafe code, refactor to reduce its usage

 extern "C" {
 }

 pub fn current_exception_as_rust_string() -> String {
-    unsafe {
-        let res = eval_str("$!.inspect").expect("this can't fail");
-        let ptr = rubyfmt_rstring_ptr(res);
-        let length = rubyfmt_rstring_len(res);
-        String::from_raw_parts(ptr as _, length as _, length as _)

This was the double free mentioned in the PR description. Clearly this branch isn't executed on any files we currently test on, but this would result in the same segfault seen earlier

sgrif

comment created time in a month

PullRequestReviewEvent

PR opened penelopezone/rubyfmt

Partial audit of unsafe code, refactor to reduce its usage

This is the result of a partial audit of all unsafe code in rubyfmt. I found one double free, and many instances where unsafe was used more than it was needed. In some cases this was caused by omitting extremely cheap safety checks like UTF-8 validation, and in some cases it was redundant code converting arrays and strings to Rust versions. I've added UTF-8 checking where it's clearly not going to cause a performance problem, commented where it's not clear, and moved array and string conversion to a common function that's easier to audit.


+81 -57

0 comment

6 changed files

pr created time in a month

push eventsgrif/rubyfmt

Siân Griffin

commit sha ebb8115be6743e0d5d546ca07993afda60feb9d2

Fix double free in `debug_inspect` (#231)

This was the cause of the segfault that earlier commits in this branch tried to fix. They actually fixed it by not calling `debug_inspect`, the way it was performing iteration was fine (other than the signature of the callback missing an argument, which could fuck things up depending on the ABI of the target).

The source of this bug is that `CString` is being constructed with a pointer we don't own. `CString` is an owned string, which frees its memory on drop. This results in a double free at best.

The reason this manifested as it did was because we're not just freeing a pointer we don't own, we're freeing a pointer that was allocated by a different allocator. Since we didn't build ruby with jemalloc, and we aren't compiling our jemalloc with unprefixed symbols (we should totally do that), jemalloc would end up corrupting its own book keeping when we freed it.

This is also why the segfault disappeared if we used the system allocator in Rust, or unprefixed symbols in jemalloc. When we do either of those, it becomes "just" a double free, since the string is never read again. Even the double free was unlikely to manifest since GC almost certainly never runs again after Ripper finishes parsing.

view details

Siân Griffin

commit sha 5ee4195b3b06b868f8f27395f55488958be2c8bb

Partial audit of unsafe code, refactor to reduce its usage

This is the result of a partial audit of all unsafe code in rubyfmt. I found one double free, and many instances where unsafe was used more than it was needed. In some cases this was caused by omitting extremely cheap safety checks like UTF-8 validation, and in some cases it was redundant code converting arrays and strings to Rust versions. I've added UTF-8 checking where it's clearly not going to cause a performance problem, commented where it's not clear, and moved array and string conversion to a common function that's easier to audit.

view details

push time in a month

create branch sgrif/rubyfmt

branch : sg-refactor-unsafe

created branch time in a month

issue openedrust-lang/rust-clippy

Lint for using the same expression for both length and capacity when constructing a vec/string

What it does

This lint checks for calls to String::from_raw_parts and Vec::from_raw_parts, and triggers when the same expression is used for both the length and capacity. If folks are interested in a pull request for this lint, I am happy to do the work.

Categories (optional)

  • Kind: pedantic

What is the advantage of the recommended code over the original code

When code is written this way, the author likely intended to use a borrowed type instead. The idea for this lint came after noticing a friend of mine consistently reaching for String and not &str, which requires inventing a capacity. The author will likely use the length in that case. Unless the string/vec was created using with_capacity or had shrink_to_fit called on it, it's unlikely that the length and capacity are the same.

Drawbacks

This is trying to catch a beginner mistake, and can potentially trigger on valid code. It's unclear how common this mistake actually is

Example

Vec::from_raw_parts(ptr, len, len)

Could be written as:

slice::from_raw_parts(ptr, len)
// or
Vec::from_raw_parts(ptr, len, cap)
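
To make the borrowed alternative concrete, here is a small sketch (the function is invented for illustration). A slice built with `slice::from_raw_parts` only borrows the buffer, so nothing is freed when it goes out of scope and no capacity has to be invented.

```rust
use std::slice;

// Sums `len` integers starting at `ptr` without taking ownership of the buffer.
//
// Safety: `ptr` must point to `len` initialized, properly aligned `i32`s
// that stay valid and unmodified for the duration of the call.
unsafe fn sum_borrowed(ptr: *const i32, len: usize) -> i32 {
    // Borrow only: the owner of the buffer remains responsible for freeing it.
    let values: &[i32] = unsafe { slice::from_raw_parts(ptr, len) };
    values.iter().sum()
}
```

Had this used `Vec::from_raw_parts(ptr, len, len)` instead, the `Vec` would free the buffer on drop, and the real owner would free it a second time.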

created time in a month

issue openedrust-lang/rust-clippy

Restriction lint for constructing a type with a Drop impl from a raw pointer

What it does

This lint would trigger whenever a function is called that returns an owned type from a raw pointer such as CString::from_raw, String::from_raw_parts, Vec::from_raw_parts, or Box::from_raw. I believe the exact conditions that this lint should trigger are:

  • Does not take &self
  • Takes a *const T or *mut T as any of its arguments
  • Returns a type that is !Copy

But I'd love some feedback on that (especially the third bit: is !Copy the right way to express both a type that impls Drop and a type that has meaningful drop glue?).

This would be useful in code bases such as Diesel or rubyfmt, where there is a significant amount of FFI, but it is rare that data owned by Rust is getting passed as a raw pointer. In those code bases, constructing an owned type like String instead of &str is almost always wrong. This is a mistake that's very easy to make, especially for newer Rust programmers, and I would love to require an explicit "no I'm actually sure I own this pointer" in the very few cases where that is actually the case.

I am happy to do the work of implementing this if folks are interested in receiving a PR for this.

Categories (optional)

  • Kind: Restriction lint

The advantage of the recommended code over the original is that the original will introduce undefined behavior.

Drawbacks

As it is a restriction lint, it's unlikely to be useful to many people.

Example

CString::from_raw(ptr)

Could be written as:

CStr::from_ptr(ptr)

This lint would not be able to make a recommendation in 100% of cases. Special cases would be made for the applicable types in std (Vec, Box, CString, String). This lint would still trigger for Arc and Rc, but I'm not sure if it should recommend & instead or just note that constructing it will result in decreasing the ref count and possibly freeing the value.

created time in a month

PR opened penelopezone/rubyfmt

Fix double free in `debug_inspect`

This was the cause of the segfault that earlier commits in this branch tried to fix. They actually fixed it by not calling debug_inspect, the way it was performing iteration was fine (other than the signature of the callback missing an argument, which could fuck things up depending on the ABI of the target).

The source of this bug is that CString is being constructed with a pointer we don't own. CString is an owned string, which frees its memory on drop. This results in a double free at best.
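
As an illustration of the borrowed alternative (a sketch, not the code from this PR), `CStr::from_ptr` reads a foreign string without ever taking ownership of it. Copying into a `String` then yields data that Rust owns and can free safely.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

// Copies a foreign C string into an owned Rust `String` without taking
// ownership of the pointer.
//
// Safety: `ptr` must be non-null and point to a nul-terminated string that
// stays valid for the duration of the call.
unsafe fn copy_foreign_string(ptr: *const c_char) -> String {
    // `CStr` only borrows; dropping it never frees `ptr`.
    let borrowed = unsafe { CStr::from_ptr(ptr) };
    // The copy lives in memory allocated by Rust's own allocator.
    borrowed.to_string_lossy().into_owned()
}
```

The original owner (here, Ruby's GC) stays solely responsible for freeing the pointer, so neither a double free nor a cross-allocator free can happen.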

The reason this manifested as it did was because we're not just freeing a pointer we don't own, we're freeing a pointer that was allocated by a different allocator. Since we didn't build ruby with jemalloc, and we aren't compiling our jemalloc with unprefixed symbols (we should totally do that), jemalloc would end up corrupting its own book keeping when we freed it.

This is also why the segfault disappeared if we used the system allocator in Rust, or unprefixed symbols in jemalloc. When we do either of those, it becomes "just" a double free, since the string is never read again. Even the double free was unlikely to manifest since GC almost certainly never runs again after Ripper finishes parsing.


+2 -2

0 comment

1 changed file

pr created time in a month

create branch sgrif/rubyfmt

branch : sg-ub

created branch time in a month

issue commentdiesel-rs/diesel

Async I/O

@GopherJ If you're interested in helping us add support, feel free to come work with us on adding support. But your comment reeks of entitlement and is not welcome here.

killercup

comment created time in a month

push eventsgrif/talks

Siân Griffin

commit sha 85ea644d0475bfa02f70f27b6851040295560a62

wip

view details

Siân Griffin

commit sha 852e380d66f73b833cd48ae01ce059263ada9f54

Finished pokemon slides

view details

Siân Griffin

commit sha 9f50c241129cad08c18d0b564ee9945077c905b4

Improve zubat image

view details

push time in 2 months

push eventsgrif/talks

Siân Griffin

commit sha 3cc98cab7acb2de7ea4b9216cf3dad8ea4cdb318

wip

view details

push time in 2 months

PR opened rust-lang/team

Update sgrif.toml
+1 -1

0 comment

1 changed file

pr created time in 2 months

push eventsgrif/team

Siân Griffin

commit sha 8f1e5c1524437c5510c15604edbbdebaafc92307

Update sgrif.toml

view details

push time in 2 months

fork sgrif/team

Rust teams structure

fork in 2 months

issue closedrust-lang/crates.io

Crate page for async-stdio shows incorrect version number and dependencies

https://crates.io/crates/async-stdio shows a version of 0.0.0.0 and no dependencies, while the versions list shows a list of published versions and https://gitlab.com/jrobsonchase/async-stdio/-/blob/master/Cargo.toml shows multiple dependencies.

closed time in 2 months

jdm

issue commentrust-lang/crates.io

Crate page for async-stdio shows incorrect version number and dependencies

John's comment is correct. The crate page shows the last non-prerelease version. This behavior is expected. Once you release a non-prerelease version > 0.0.0, that page will update

jdm

comment created time in 2 months

push eventrust-lang/crates.io

Siân Griffin

commit sha 380be53592e527620539bc4371d9a3504500414f

Fix the build while keeping clippy happy

view details

push time in 2 months

push eventrust-lang/crates.io

Siân Griffin

commit sha 7d7b71880b2d37aac06d74ea8f9a46ee1d574225

Make clippy happy

view details

push time in 2 months

push eventrust-lang/crates.io

Pietro Albini

commit sha d0dd084ae7260cf885c5a7c88d42ee3e622d7f29

Generate API tokens with a secure RNG, store hashed

This addresses a security issue with the way crates.io generates API tokens. Prior to this commit, they were generated using the PostgreSQL `random` function, which is not a secure PRNG. Additionally, the tokens were stored in plain text, which would give an attacker who managed to compromise our database access to all users' API tokens. This commit addresses both issues.

An advisory about the problem was posted at https://blog.rust-lang.org/2020/07/14/crates-io-security-advisory.html

The tokens are now generated using the OS's random number generator, which maps to C's `getrandom` function, which is secure and unpredictable.

The tokens are hashed using sha256. The choice to use a fast hashing function such as this one instead of one typically used for passwords (such as bcrypt or argon2) was intentional. Unlike passwords, our API tokens are known to be 32 characters and are truly random, giving us 192 bits of entropy. This means that even with a fast hashing function, actually finding a token from that hash before the death of human civilization is infeasible.

Additionally, unlike passwords, API tokens need to be checked on every request where they're used, instead of once at sign in. This means that a slower hashing function would put significantly more load on our server than it would when used for passwords.

We opted to use sha256 instead of bcrypt with a lower cost due to the mandatory salt in bcrypt. If we salt the values before hashing them, the tokens can no longer directly be used to identify themselves, and we would need to include another identifier in the token given to the user. While this is feasible, it leads to a very obtuse looking token, and more complex code.

push time in 2 months

PR opened rust-lang/crates.io

Generate API tokens with a secure RNG, store hashed

Note: The contents of this PR have already been deployed to production, as this was a security issue. The code in this PR is the minimum number of changes needed to rebase what was deployed today onto master. Unfortunately, since this was done in parallel with some refactoring to the errors module, this means some of the added code doesn't look like the rest of the code base. However, master currently does not reflect what is in production, so any cleanup should come as a followup PR. The purpose of this is mainly to get master in sync with prod after going through bors.

This addresses a security issue with the way crates.io generates API tokens. Prior to this commit, they were generated using the PostgreSQL random function, which is not a secure PRNG. Additionally, the tokens were stored in plain text, which would give an attacker who managed to compromise our database access to all users' API tokens. This commit addresses both issues.

An advisory about the problem was posted at https://blog.rust-lang.org/2020/07/14/crates-io-security-advisory.html

The tokens are now generated using the OS's random number generator, which maps to C's getrandom function, which is secure and unpredictable.

The tokens are hashed using sha256. The choice to use a fast hashing function such as this one instead of one typically used for passwords (such as bcrypt or argon2) was intentional. Unlike passwords, our API tokens are known to be 32 characters and are truly random, giving us 192 bits of entropy. This means that even with a fast hashing function, actually finding a token from that hash before the death of human civilization is infeasible.
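The entropy argument can be checked with a little arithmetic. The sketch below is illustrative only: it assumes a 64-symbol alphabet (which is what 192 bits over 32 characters implies, at 6 bits per character) and a hypothetical attacker rate of 10^12 hashes per second. Neither number comes from the PR; the function names are made up for this example.

```rust
/// Bits of entropy in a uniformly random token of `length` symbols
/// drawn from an alphabet of `alphabet_size` symbols.
fn entropy_bits(length: u32, alphabet_size: u32) -> f64 {
    f64::from(length) * f64::from(alphabet_size).log2()
}

/// Years needed to try every value in a `bits`-bit space at the given
/// rate. The rate is an assumption supplied by the caller.
fn years_to_exhaust(bits: f64, hashes_per_second: f64) -> f64 {
    const SECONDS_PER_YEAR: f64 = 31_557_600.0; // 365.25 days
    2f64.powf(bits) / hashes_per_second / SECONDS_PER_YEAR
}
```

With the assumed inputs, `entropy_bits(32, 64)` gives 192 bits, and `years_to_exhaust(192.0, 1e12)` comes out on the order of 10^38 years, which is why a fast hash is acceptable for tokens of this kind but not for human-chosen passwords.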

Additionally, unlike passwords, API tokens need to be checked on every request where they're used, instead of once at sign in. This means that a slower hashing function would put significantly more load on our server than they would when used for passwords.

We opted to use sha256 instead of bcrypt with a lower cost due to the mandatory salt in bcrypt. If we salt the values before hashing them, the tokens can no longer directly be used to identify themselves, and we would need to include another identifier in the token given to the user. While this is feasible, it leads to a very obtuse looking token, and more complex code.
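To illustrate why an unsalted hash lets a token "identify itself", here is a minimal sketch of lookup by hash. It is not the crates.io code. `TokenStore` and `stand_in_hash` are hypothetical names, and because SHA-256 is not in the standard library the example substitutes `DefaultHasher`, which is not cryptographically secure and only stands in for the shape of the lookup.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// ASSUMPTION: stand-in for SHA-256. DefaultHasher is deterministic when
// built with `new()`, but it is NOT a secure hash. Illustration only.
fn stand_in_hash(token: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    token.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical store that keeps only hashes, never plain-text tokens.
struct TokenStore {
    user_by_hash: HashMap<u64, u32>,
}

impl TokenStore {
    fn new() -> Self {
        Self { user_by_hash: HashMap::new() }
    }

    fn insert(&mut self, token: &str, user_id: u32) {
        self.user_by_hash.insert(stand_in_hash(token), user_id);
    }

    // No salt means the same token always hashes to the same key, so the
    // presented token alone is enough to find its row.
    fn authenticate(&self, token: &str) -> Option<u32> {
        self.user_by_hash.get(&stand_in_hash(token)).copied()
    }
}
```

With a per-token salt, `authenticate` could not compute the key without first knowing which row's salt to use, which is the extra identifier the description mentions.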

r? @jtgeibel

+246 -55

0 comment

21 changed files

pr created time in 2 months

create branch rust-lang/crates.io

branch : sg-security-advisory-2020-07-14

created branch time in 2 months

push eventsgrif/talks

Siân Griffin

commit sha 85d90c575bb899a81e78057db6c53680ff4b5e3f

wip

push time in 3 months

Pull request review commentpenelopezone/rubyfmt

Embed ruby

+#![deny(warnings, missing_copy_implementations)]+use std::ffi::CString;+#[macro_use]+extern crate lazy_static;++use std::io::{BufReader, Cursor, Write};+use std::slice;++#[global_allocator]+static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;++pub type RawStatus = i64;++mod breakable_entry;+mod comment_block;+mod de;+mod delimiters;+mod file_comments;+mod format;+mod intermediary;+mod line_metadata;+mod line_tokens;+mod parser_state;+mod render_queue_writer;+mod ripper_tree_types;+mod ruby;+mod types;++use file_comments::FileComments;+use parser_state::ParserState;+use ruby::VALUE;++#[cfg(debug_assertions)]+use log::debug;+#[cfg(debug_assertions)]+use simplelog::{Config, LevelFilter, TermLogger, TerminalMode};++type RubyfmtResult<T = ()> = std::result::Result<T, Box<dyn std::error::Error>>;++extern "C" {+    pub fn Init_ripper();+}++#[repr(C)]+#[derive(Debug, Copy, Clone)]+pub struct FormatBuffer {+    pub bytes: *const libc::c_char,+    pub count: i64,+}++impl FormatBuffer {+    pub fn into_buf(self) -> &'static [u8] {+        unsafe { slice::from_raw_parts(self.bytes as *const u8, self.count as usize) }+    }++    pub fn into_string(self) -> String {+        unsafe {+            let vec = Vec::from_raw_parts(+                self.bytes as *mut u8,+                self.count as usize,+                self.count as usize,+            );+            String::from_utf8_unchecked(vec)+        }+    }+}++#[derive(Debug, Copy, Clone)]+pub enum InitStatus {+    OK = 0,+    ERROR = 1,+}++#[no_mangle]+pub extern "C" fn rubyfmt_init() -> libc::c_int {+    init_logger();+    unsafe {+        ruby::ruby_init();+    }+    let res = load_ripper();+    if res.is_err() {+        return InitStatus::ERROR as libc::c_int;+    }++    let res = load_rubyfmt();+    if res.is_err() {+        return InitStatus::ERROR as libc::c_int;+    }++    InitStatus::OK as libc::c_int+}++pub fn format_buffer(buf: String) -> String {+    eprintln!("format 1");+    let bytes: 
Vec<libc::c_char> = buf+        .into_bytes()+        .into_iter()+        .map(|v| v as libc::c_char)+        .collect();+    let len = bytes.len();+    eprintln!("format 2");+    let fb = rubyfmt_format_buffer(FormatBuffer {+        bytes: bytes.as_ptr(),+        count: len as i64,+    });+    eprintln!("format 3");+    fb.into_string()+}++#[no_mangle]+pub extern "C" fn rubyfmt_format_buffer(buf: FormatBuffer) -> FormatBuffer {+    let output_data = Vec::with_capacity(buf.count as usize);+    let mut output = Cursor::new(output_data);+    let tree = run_parser_on(buf);+    if tree.is_err() {+        unsafe {+            let cstr = CString::new("oh no").expect("we just made it");+            ruby::rb_raise(ruby::rb_eRuntimeError, cstr.as_ptr());+        }+    }+    let tree = tree.expect("we raised");+    let data = buf.into_buf();+    let res = toplevel_format_program(&mut output, data, tree);+    raise_if_error(res);+    let output_data = output.into_inner().into_boxed_slice();+    let ptr = output_data.as_ptr();+    let len = output_data.len();+    let fb = FormatBuffer {+        bytes: ptr as *const libc::c_char,+        count: len as i64,+    };+    std::mem::forget(output_data);+    fb+}++fn load_rubyfmt() -> Result<VALUE, ()> {+    let rubyfmt_program = include_str!("../rubyfmt_lib.rb");+    eval_str(rubyfmt_program)+}++fn load_ripper() -> Result<(), ()> {+    // trick ruby in to thinking ripper is already loaded+    eval_str(+        r#"+    $LOADED_FEATURES << "ripper.bundle"+    $LOADED_FEATURES << "ripper.so"+    $LOADED_FEATURES << "ripper.rb"+    $LOADED_FEATURES << "ripper/core.rb"+    $LOADED_FEATURES << "ripper/sexp.rb"+    $LOADED_FEATURES << "ripper/filter.rb"+    $LOADED_FEATURES << "ripper/lexer.rb"+    "#,+    )?;++    // init the ripper C module+    unsafe { Init_ripper() };++    //load each ripper program+    eval_str(include_str!(+        "../ruby_checkout/ruby-2.6.6/ext/ripper/lib/ripper.rb"+    ))?;+    eval_str(include_str!(+        
"../ruby_checkout/ruby-2.6.6/ext/ripper/lib/ripper/core.rb"+    ))?;+    eval_str(include_str!(+        "../ruby_checkout/ruby-2.6.6/ext/ripper/lib/ripper/lexer.rb"+    ))?;+    eval_str(include_str!(+        "../ruby_checkout/ruby-2.6.6/ext/ripper/lib/ripper/filter.rb"+    ))?;+    eval_str(include_str!(+        "../ruby_checkout/ruby-2.6.6/ext/ripper/lib/ripper/sexp.rb"+    ))?;++    Ok(())+}++fn eval_str(s: &str) -> Result<VALUE, ()> {+    unsafe {+        let rubyfmt_program_as_c = CString::new(s).expect("it should become a c string");+        let mut state = 0;+        let v = ruby::rb_eval_string_protect(+            rubyfmt_program_as_c.as_ptr(),+            &mut state as *mut libc::c_int,+        );+        if state != 0 {+            Err(())+        } else {+            Ok(v)+        }+    }+}++fn toplevel_format_program<W: Write>(writer: &mut W, buf: &[u8], tree: VALUE) -> RubyfmtResult {+    let line_metadata = FileComments::from_buf(BufReader::new(buf))+        .expect("failed to load line metadata from memory");+    let mut ps = ParserState::new(line_metadata);+    let v: ripper_tree_types::Program = de::from_value(tree)?;++    format::format_program(&mut ps, v);++    ps.write(writer)?;+    writer.flush().expect("it flushes");+    Ok(())+}++fn raise_if_error(value: RubyfmtResult) {+    if let Err(e) = value {+        unsafe {+            // If the string contains nul, just display the error leading up to+            // the nul bytes+            let c_string = CString::from_vec_unchecked(e.to_string().into_bytes());+            ruby::rb_raise(ruby::rb_eRuntimeError, c_string.as_ptr());+        }+    }+}++fn intern(s: &str) -> ruby::ID {+    unsafe {+        let ruby_string = CString::new(s).expect("it's a string");+        ruby::rb_intern(ruby_string.as_ptr())+    }+}++fn run_parser_on(buf: FormatBuffer) -> Result<VALUE, ()> {+    unsafe {+        let buffer_string = ruby::rb_utf8_str_new(buf.bytes, buf.count);+        let parser_class = 
eval_str("Parser")?;+        let parser_instance = ruby::rb_funcall(parser_class, intern("new"), 1, buffer_string);+        let tree = ruby::rb_funcall(parser_instance, intern("parse"), 0);+        Ok(tree)+    }+}
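macro_rules! intern {
    // `literal` is required: `concat!` only accepts literals, and `str` is not a valid fragment specifier
    ($s:literal) => {
        ruby::rb_intern(concat!($s, "\0").as_ptr() as _)
    }
}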

fn run_parser_on(buf: FormatBuffer) -> Result<VALUE, ()> {
    unsafe {
        let buffer_string = ruby::rb_utf8_str_new(buf.bytes, buf.count);
        let parser_class = eval_str("Parser")?;
        let parser_instance = ruby::rb_funcall(parser_class, intern!("new"), 1, buffer_string);
        let tree = ruby::rb_funcall(parser_instance, intern!("parse"), 0);
        Ok(tree)
    }
}
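A minimal, self-contained sketch of the trick this macro relies on: `concat!` appends the nul terminator at compile time, so no `CString` allocation is needed for literals. The names `c_str_ptr!` and `read_back` are hypothetical and exist only for this illustration; `read_back` stands in for a C function such as `rb_intern` that reads the string.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

// Produces a pointer to a nul-terminated copy of a string literal that
// lives in static memory. Only literals are accepted, so the terminator
// is always present.
macro_rules! c_str_ptr {
    ($s:literal) => {
        concat!($s, "\0").as_ptr() as *const c_char
    };
}

// Stand-in for a C callee: reads the bytes up to the nul terminator.
fn read_back(ptr: *const c_char) -> String {
    // safety: callers in this sketch only pass pointers from `c_str_ptr!`,
    // which are nul-terminated and valid for the whole program.
    let c_str = unsafe { CStr::from_ptr(ptr) };
    c_str.to_str().expect("literal is valid UTF-8").to_owned()
}
```

One caveat: a literal containing an interior `\0` would be silently truncated by the C side, which `CString::new` would have rejected.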
penelopezone

comment created time in 3 months

Pull request review commentpenelopezone/rubyfmt

Embed ruby

+#![deny(warnings, missing_copy_implementations)]+use std::ffi::CString;+#[macro_use]+extern crate lazy_static;++use std::io::{BufReader, Cursor, Write};+use std::slice;++#[global_allocator]+static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;++pub type RawStatus = i64;++mod breakable_entry;+mod comment_block;+mod de;+mod delimiters;+mod file_comments;+mod format;+mod intermediary;+mod line_metadata;+mod line_tokens;+mod parser_state;+mod render_queue_writer;+mod ripper_tree_types;+mod ruby;+mod types;++use file_comments::FileComments;+use parser_state::ParserState;+use ruby::VALUE;++#[cfg(debug_assertions)]+use log::debug;+#[cfg(debug_assertions)]+use simplelog::{Config, LevelFilter, TermLogger, TerminalMode};++type RubyfmtResult<T = ()> = std::result::Result<T, Box<dyn std::error::Error>>;++extern "C" {+    pub fn Init_ripper();+}++#[repr(C)]+#[derive(Debug, Copy, Clone)]+pub struct FormatBuffer {+    pub bytes: *const libc::c_char,+    pub count: i64,+}++impl FormatBuffer {+    pub fn into_buf(self) -> &'static [u8] {+        unsafe { slice::from_raw_parts(self.bytes as *const u8, self.count as usize) }+    }++    pub fn into_string(self) -> String {+        unsafe {+            let vec = Vec::from_raw_parts(+                self.bytes as *mut u8,+                self.count as usize,+                self.count as usize,+            );+            String::from_utf8_unchecked(vec)+        }+    }+}

So if you were to try to keep this type, here's how I would write it to eliminate the unsoundness. I've noted the differences where they occur.

#[repr(C)]
#[derive(Debug)] // This represents an owned, heap allocated pointer. To ensure it doesn't leak, it must have a `Drop` impl. It may also have methods that return similar owned types like `Vec`, `String`, or `Box`. It absolutely should not impl `Copy`, and a `Clone` impl would need to be manual and copy the memory being pointed to.

struct FormatBuffer {
    ptr: NonNull<u8>, // Encode as much as we can here. Using `u8`, C can cast the pointer all they want.
    len: usize, // Again, using the Rust type that's convenient. This is size_t or uintptr_t in C.
}

impl FormatBuffer {
    // Taking Into<Vec<u8>> since you're calling with String and Vec<u8>. `into_boxed_slice` copies, we can avoid
    // that if we keep the capacity around as well. Most importantly though, we provide a constructor which consumes
    // the input, leaving no room for one caller to accidentally use `.as_ptr()` instead of into_raw or similar. This
    // struct should also live in its own file so its fields are private (C can of course see them, but we Rust callers
    // shouldn't).
    fn new<T: Into<Vec<u8>>>(bytes: T) -> Self {
        let bytes = bytes.into().into_boxed_slice();
        let len = bytes.len();
        Self {
            // safety: Box::into_raw never returns null. Use Box::into_raw_nonnull when stable
            ptr: unsafe { NonNull::new_unchecked(Box::into_raw(bytes) as _) },
            len,
        }
    }

    // Does not return 'static. Instead returns a proper lifetime.
    pub fn as_bytes(&self) -> &[u8] {
        // safety: We know we were constructed with a valid pointer
        unsafe { slice::from_raw_parts(self.ptr.as_ptr(), self.len) }
    }

    // Returns `&str` instead of `String`. We own the pointer, we don't want to be returning
    // other types which own it.
    pub fn as_str(&self) -> &str {
        // Uses checked, we didn't validate UTF-8 on input. Even if we did, let's avoid unsafe
        str::from_utf8(self.as_bytes()).expect("Invalid UTF-8")
    }
}

impl Drop for FormatBuffer {
    fn drop(&mut self) {
        unsafe {
            let ptr = ptr::slice_from_raw_parts_mut(self.ptr.as_ptr(), self.len);
            drop(Box::from_raw(ptr));
        }
    }
}
penelopezone

comment created time in 3 months

Pull request review commentpenelopezone/rubyfmt

Embed ruby

+#![deny(warnings, missing_copy_implementations)]+use std::ffi::CString;+#[macro_use]+extern crate lazy_static;++use std::io::{BufReader, Cursor, Write};+use std::slice;++#[global_allocator]+static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;++pub type RawStatus = i64;++mod breakable_entry;+mod comment_block;+mod de;+mod delimiters;+mod file_comments;+mod format;+mod intermediary;+mod line_metadata;+mod line_tokens;+mod parser_state;+mod render_queue_writer;+mod ripper_tree_types;+mod ruby;+mod types;++use file_comments::FileComments;+use parser_state::ParserState;+use ruby::VALUE;++#[cfg(debug_assertions)]+use log::debug;+#[cfg(debug_assertions)]+use simplelog::{Config, LevelFilter, TermLogger, TerminalMode};++type RubyfmtResult<T = ()> = std::result::Result<T, Box<dyn std::error::Error>>;++extern "C" {+    pub fn Init_ripper();+}++#[repr(C)]+#[derive(Debug, Copy, Clone)]+pub struct FormatBuffer {+    pub bytes: *const libc::c_char,+    pub count: i64,+}++impl FormatBuffer {+    pub fn into_buf(self) -> &'static [u8] {+        unsafe { slice::from_raw_parts(self.bytes as *const u8, self.count as usize) }+    }++    pub fn into_string(self) -> String {+        unsafe {+            let vec = Vec::from_raw_parts(+                self.bytes as *mut u8,+                self.count as usize,+                self.count as usize,+            );+            String::from_utf8_unchecked(vec)+        }+    }+}++#[derive(Debug, Copy, Clone)]+pub enum InitStatus {+    OK = 0,+    ERROR = 1,+}++#[no_mangle]+pub extern "C" fn rubyfmt_init() -> libc::c_int {+    init_logger();+    unsafe {+        ruby::ruby_init();+    }+    let res = load_ripper();+    if res.is_err() {+        return InitStatus::ERROR as libc::c_int;+    }++    let res = load_rubyfmt();+    if res.is_err() {+        return InitStatus::ERROR as libc::c_int;+    }++    InitStatus::OK as libc::c_int+}++pub fn format_buffer(buf: String) -> String {+    eprintln!("format 1");+    let bytes: 
Vec<libc::c_char> = buf+        .into_bytes()+        .into_iter()+        .map(|v| v as libc::c_char)+        .collect();

This is moot since you're getting rid of FormatBuffer, but you could just cast the pointer for the same effect but without a copy
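As a sketch of what "just cast the pointer" could look like: `u8` and `c_char` have the same size and alignment, so the existing bytes can be viewed as `c_char` without building a second `Vec`. The helper name `as_c_chars` is hypothetical.

```rust
use std::os::raw::c_char;

// Borrows the bytes and returns a C-friendly (pointer, length) pair.
// No allocation or copy happens; the pointer is only valid while the
// borrowed slice is alive.
fn as_c_chars(bytes: &[u8]) -> (*const c_char, usize) {
    (bytes.as_ptr() as *const c_char, bytes.len())
}
```

The returned pointer aliases the original buffer, so the caller has to keep the `String` or `Vec` alive for as long as C may read from it.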

penelopezone

comment created time in 3 months

Pull request review commentpenelopezone/rubyfmt

Embed ruby

+#![deny(warnings, missing_copy_implementations)]+use std::ffi::CString;+#[macro_use]+extern crate lazy_static;++use std::io::{BufReader, Cursor, Write};+use std::slice;++#[global_allocator]+static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;++pub type RawStatus = i64;++mod breakable_entry;+mod comment_block;+mod de;+mod delimiters;+mod file_comments;+mod format;+mod intermediary;+mod line_metadata;+mod line_tokens;+mod parser_state;+mod render_queue_writer;+mod ripper_tree_types;+mod ruby;+mod types;++use file_comments::FileComments;+use parser_state::ParserState;+use ruby::VALUE;++#[cfg(debug_assertions)]+use log::debug;+#[cfg(debug_assertions)]+use simplelog::{Config, LevelFilter, TermLogger, TerminalMode};++type RubyfmtResult<T = ()> = std::result::Result<T, Box<dyn std::error::Error>>;++extern "C" {+    pub fn Init_ripper();+}++#[repr(C)]+#[derive(Debug, Copy, Clone)]+pub struct FormatBuffer {+    pub bytes: *const libc::c_char,+    pub count: i64,+}++impl FormatBuffer {+    pub fn into_buf(self) -> &'static [u8] {+        unsafe { slice::from_raw_parts(self.bytes as *const u8, self.count as usize) }+    }++    pub fn into_string(self) -> String {+        unsafe {+            let vec = Vec::from_raw_parts(+                self.bytes as *mut u8,+                self.count as usize,+                self.count as usize,+            );+            String::from_utf8_unchecked(vec)+        }+    }+}++#[derive(Debug, Copy, Clone)]+pub enum InitStatus {+    OK = 0,+    ERROR = 1,+}++#[no_mangle]+pub extern "C" fn rubyfmt_init() -> libc::c_int {+    init_logger();+    unsafe {+        ruby::ruby_init();+    }+    let res = load_ripper();+    if res.is_err() {+        return InitStatus::ERROR as libc::c_int;+    }++    let res = load_rubyfmt();+    if res.is_err() {+        return InitStatus::ERROR as libc::c_int;+    }++    InitStatus::OK as libc::c_int+}++pub fn format_buffer(buf: String) -> String {+    eprintln!("format 1");+    let bytes: 
Vec<libc::c_char> = buf+        .into_bytes()+        .into_iter()+        .map(|v| v as libc::c_char)+        .collect();+    let len = bytes.len();+    eprintln!("format 2");+    let fb = rubyfmt_format_buffer(FormatBuffer {+        bytes: bytes.as_ptr(),+        count: len as i64,+    });+    eprintln!("format 3");+    fb.into_string()+}++#[no_mangle]+pub extern "C" fn rubyfmt_format_buffer(buf: FormatBuffer) -> FormatBuffer {+    let output_data = Vec::with_capacity(buf.count as usize);+    let mut output = Cursor::new(output_data);+    let tree = run_parser_on(buf);+    if tree.is_err() {+        unsafe {+            let cstr = CString::new("oh no").expect("we just made it");+            ruby::rb_raise(ruby::rb_eRuntimeError, cstr.as_ptr());+        }+    }+    let tree = tree.expect("we raised");+    let data = buf.into_buf();+    let res = toplevel_format_program(&mut output, data, tree);+    raise_if_error(res);+    let output_data = output.into_inner().into_boxed_slice();+    let ptr = output_data.as_ptr();+    let len = output_data.len();+    let fb = FormatBuffer {+        bytes: ptr as *const libc::c_char,+        count: len as i64,+    };+    std::mem::forget(output_data);+    fb+}++fn load_rubyfmt() -> Result<VALUE, ()> {+    let rubyfmt_program = include_str!("../rubyfmt_lib.rb");+    eval_str(rubyfmt_program)+}++fn load_ripper() -> Result<(), ()> {+    // trick ruby in to thinking ripper is already loaded+    eval_str(+        r#"+    $LOADED_FEATURES << "ripper.bundle"+    $LOADED_FEATURES << "ripper.so"+    $LOADED_FEATURES << "ripper.rb"+    $LOADED_FEATURES << "ripper/core.rb"+    $LOADED_FEATURES << "ripper/sexp.rb"+    $LOADED_FEATURES << "ripper/filter.rb"+    $LOADED_FEATURES << "ripper/lexer.rb"+    "#,+    )?;++    // init the ripper C module+    unsafe { Init_ripper() };++    //load each ripper program+    eval_str(include_str!(+        "../ruby_checkout/ruby-2.6.6/ext/ripper/lib/ripper.rb"+    ))?;+    eval_str(include_str!(+        
"../ruby_checkout/ruby-2.6.6/ext/ripper/lib/ripper/core.rb"+    ))?;+    eval_str(include_str!(+        "../ruby_checkout/ruby-2.6.6/ext/ripper/lib/ripper/lexer.rb"+    ))?;+    eval_str(include_str!(+        "../ruby_checkout/ruby-2.6.6/ext/ripper/lib/ripper/filter.rb"+    ))?;+    eval_str(include_str!(+        "../ruby_checkout/ruby-2.6.6/ext/ripper/lib/ripper/sexp.rb"+    ))?;++    Ok(())+}++fn eval_str(s: &str) -> Result<VALUE, ()> {

One thing I noticed is that you never call this with anything other than &'static str. If the input string were always nul terminated you could skip some copies (but that would require either taking CStr, an annoying type to construct, or assuming &str is always nul terminated, which would make it unsafe). However, since you are only ever taking literals, you could make this a macro and safely do it.

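macro_rules! eval_str {
    ($s:expr) => {
        let mut state = 0;
        // safety: `concat!` appends the nul terminator at compile time, so the
        // pointer always refers to a valid C string
        unsafe {
            ruby::rb_eval_string_protect(
                concat!($s, "\0").as_ptr() as _,
                &mut state,
            );
        }
        if state != 0 {
            return Err(())
        }
    }
}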

(note: I omitted the return value since it's never used)

penelopezone

comment created time in 3 months

Pull request review commentpenelopezone/rubyfmt

Embed ruby

+extern crate glob;+extern crate libc;+extern crate rubyfmt;++use std::fs::{metadata, read_to_string, OpenOptions};+use std::io::{self, Read, Write};+use std::path::PathBuf;++use glob::glob;++fn rubyfmt_file(file_path: PathBuf) -> io::Result<()> {+    let buffer = read_to_string(file_path.clone())?;+    let res = rubyfmt::format_buffer(buffer);+    let mut file = OpenOptions::new()+        .write(true)+        .open(file_path)+        .expect("file");+    write!(file, "{}", res)?;

Same thing here, I'd consider having format_buffer take &mut dyn Write and pass the file directly (probably with a BufWriter).
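A rough sketch of the writer-based shape suggested here. `format_into` is a hypothetical stand-in for the real formatter (it only trims trailing whitespace), and `format_buffered` is likewise an invented name. The point is the signature: output goes straight to any `Write` sink through a `BufWriter` instead of being collected into a `String` first.

```rust
use std::io::{self, BufWriter, Write};

// HYPOTHETICAL stand-in for the formatter: strips trailing whitespace
// from each line and writes the result directly to `out`.
fn format_into<W: Write>(source: &str, out: &mut W) -> io::Result<()> {
    for line in source.lines() {
        writeln!(out, "{}", line.trim_end())?;
    }
    Ok(())
}

// Wraps the sink in a BufWriter so many small writes become few large
// ones, then flushes and hands the sink back to the caller.
fn format_buffered<W: Write>(source: &str, sink: W) -> io::Result<W> {
    let mut out = BufWriter::new(sink);
    format_into(source, &mut out)?;
    // `into_inner` flushes; its error type converts into `io::Error`.
    let sink = out.into_inner()?;
    Ok(sink)
}
```

The same function works with a `File` opened for writing or with `io::stdout().lock()` as the sink, since both implement `Write`.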

penelopezone

comment created time in 3 months

Pull request review commentpenelopezone/rubyfmt

Embed ruby

+extern crate glob;+extern crate libc;+extern crate rubyfmt;++use std::fs::{metadata, read_to_string, OpenOptions};+use std::io::{self, Read, Write};+use std::path::PathBuf;++use glob::glob;++fn rubyfmt_file(file_path: PathBuf) -> io::Result<()> {+    let buffer = read_to_string(file_path.clone())?;+    let res = rubyfmt::format_buffer(buffer);+    let mut file = OpenOptions::new()+        .write(true)+        .open(file_path)+        .expect("file");+    write!(file, "{}", res)?;+    Ok(())+}++fn rubyfmt_dir(path: &String) -> io::Result<()> {+    for entry in glob(&format!("{}/**/*.rb", path)).expect("it exists") {+        let p = entry.expect("should not be null");+        rubyfmt_file(p)?;+    }+    Ok(())+}++fn format_parts(parts: &[String]) {+    for part in parts {+        if let Ok(md) = metadata(part) {+            if md.is_dir() {+                rubyfmt_dir(part).expect("failed to format dir");+            } else if md.is_file() {+                rubyfmt_file(part.into()).expect("failed to format file");+            }+        }+    }+}++fn main() {+    let res = rubyfmt::rubyfmt_init();+    if res != rubyfmt::InitStatus::OK as libc::c_int {+        panic!("bad init status");+    }+    let args: Vec<String> = std::env::args().collect();+    if args.len() == 1 {+        let mut buffer = String::new();+        io::stdin()+            .read_to_string(&mut buffer)+            .expect("reading frmo stdin to not fail");+        let res = rubyfmt::format_buffer(buffer);+        write!(io::stdout(), "{}", res).expect("write works");+        io::stdout().flush().expect("flush works");+    } else if args.len() == 2 {+        eprintln!("1");+        let buffer = read_to_string(args[1].clone()).expect("file exists");+        eprintln!("2");+        let res = rubyfmt::format_buffer(buffer);+        eprintln!("3");+        write!(io::stdout(), "{}", res).expect("write works");+        eprintln!("4");+        io::stdout().flush().expect("flush works");+    } else if 
args[1] == "-i" {+        let parts = &args[2..args.len()];+        format_parts(parts);+    } else {+        let parts = &args[1..args.len()];+        format_parts(parts);+    }+}

It sounds like the point of this is to test that the API C would call actually works, but it's worth noting that you can skip a lot of allocation and buffering by calling into toplevel_format_program, passing stdout or the file that you're writing to directly

penelopezone

comment created time in 3 months

Pull request review commentpenelopezone/rubyfmt

Embed ruby

+#![deny(warnings, missing_copy_implementations)]+use std::ffi::CString;+#[macro_use]+extern crate lazy_static;++use std::io::{BufReader, Cursor, Write};+use std::slice;++#[global_allocator]+static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;++pub type RawStatus = i64;++mod breakable_entry;+mod comment_block;+mod de;+mod delimiters;+mod file_comments;+mod format;+mod intermediary;+mod line_metadata;+mod line_tokens;+mod parser_state;+mod render_queue_writer;+mod ripper_tree_types;+mod ruby;+mod types;++use file_comments::FileComments;+use parser_state::ParserState;+use ruby::VALUE;++#[cfg(debug_assertions)]+use log::debug;+#[cfg(debug_assertions)]+use simplelog::{Config, LevelFilter, TermLogger, TerminalMode};++type RubyfmtResult<T = ()> = std::result::Result<T, Box<dyn std::error::Error>>;++extern "C" {+    pub fn Init_ripper();+}++#[repr(C)]+#[derive(Debug, Copy, Clone)]+pub struct FormatBuffer {+    pub bytes: *const libc::c_char,+    pub count: i64,+}++impl FormatBuffer {+    pub fn into_buf(self) -> &'static [u8] {+        unsafe { slice::from_raw_parts(self.bytes as *const u8, self.count as usize) }+    }++    pub fn into_string(self) -> String {+        unsafe {+            let vec = Vec::from_raw_parts(+                self.bytes as *mut u8,+                self.count as usize,+                self.count as usize,+            );+            String::from_utf8_unchecked(vec)+        }+    }+}++#[derive(Debug, Copy, Clone)]+pub enum InitStatus {+    OK = 0,+    ERROR = 1,+}++#[no_mangle]+pub extern "C" fn rubyfmt_init() -> libc::c_int {+    init_logger();+    unsafe {+        ruby::ruby_init();+    }+    let res = load_ripper();+    if res.is_err() {+        return InitStatus::ERROR as libc::c_int;+    }++    let res = load_rubyfmt();+    if res.is_err() {+        return InitStatus::ERROR as libc::c_int;+    }++    InitStatus::OK as libc::c_int+}++pub fn format_buffer(buf: String) -> String {+    eprintln!("format 1");+    let bytes: 
Vec<libc::c_char> = buf+        .into_bytes()+        .into_iter()+        .map(|v| v as libc::c_char)+        .collect();+    let len = bytes.len();+    eprintln!("format 2");+    let fb = rubyfmt_format_buffer(FormatBuffer {+        bytes: bytes.as_ptr(),+        count: len as i64,+    });+    eprintln!("format 3");+    fb.into_string()+}++#[no_mangle]+pub extern "C" fn rubyfmt_format_buffer(buf: FormatBuffer) -> FormatBuffer {+    let output_data = Vec::with_capacity(buf.count as usize);+    let mut output = Cursor::new(output_data);+    let tree = run_parser_on(buf);+    if tree.is_err() {+        unsafe {+            let cstr = CString::new("oh no").expect("we just made it");+            ruby::rb_raise(ruby::rb_eRuntimeError, cstr.as_ptr());+        }+    }+    let tree = tree.expect("we raised");+    let data = buf.into_buf();+    let res = toplevel_format_program(&mut output, data, tree);+    raise_if_error(res);+    let output_data = output.into_inner().into_boxed_slice();+    let ptr = output_data.as_ptr();+    let len = output_data.len();+    let fb = FormatBuffer {+        bytes: ptr as *const libc::c_char,+        count: len as i64,+    };+    std::mem::forget(output_data);+    fb

I know you're getting rid of FormatBuffer, but this is how I'd write this if you were keeping it. Also worth noting that this leaks, and I'd consider including the capacity on the struct to avoid the realloc here.

    output.get_mut().shrink_to_fit();
    let (ptr, len, _) = output.into_inner().into_raw_parts();
    FormatBuffer {
        bytes: ptr as *const libc::c_char,
        count: len as i64,
    }
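For the "include the capacity on the struct" idea, here is a minimal sketch using only stable APIs. `OwnedBuffer` is a hypothetical name, not the type from the PR. Keeping `cap` alongside `ptr` and `len` means the `Vec` can be taken apart and rebuilt exactly, with no shrink or realloc and no leak.

```rust
use std::mem::ManuallyDrop;
use std::ptr::NonNull;
use std::slice;

#[repr(C)]
pub struct OwnedBuffer {
    ptr: NonNull<u8>,
    len: usize,
    cap: usize, // kept so Drop can rebuild the Vec without reallocating
}

impl OwnedBuffer {
    pub fn from_vec(bytes: Vec<u8>) -> Self {
        // ManuallyDrop stops the Vec from freeing memory we now own.
        let mut bytes = ManuallyDrop::new(bytes);
        Self {
            ptr: NonNull::new(bytes.as_mut_ptr()).expect("Vec pointers are never null"),
            len: bytes.len(),
            cap: bytes.capacity(),
        }
    }

    pub fn as_bytes(&self) -> &[u8] {
        // safety: ptr/len came from a live Vec that we still own.
        unsafe { slice::from_raw_parts(self.ptr.as_ptr(), self.len) }
    }
}

impl Drop for OwnedBuffer {
    fn drop(&mut self) {
        // safety: the three parts are exactly those of the original Vec,
        // and this is the only place that reclaims them.
        unsafe { drop(Vec::from_raw_parts(self.ptr.as_ptr(), self.len, self.cap)) }
    }
}
```

If the buffer is handed to C by value, the C side must pass it back to a Rust function to be freed, since only Rust knows which allocator and capacity were used.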
penelopezone

comment created time in 3 months

Pull request review commentpenelopezone/rubyfmt

Embed ruby

+#![deny(warnings, missing_copy_implementations)]+use std::ffi::CString;+#[macro_use]+extern crate lazy_static;++use std::io::{BufReader, Cursor, Write};+use std::slice;++#[global_allocator]+static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;++pub type RawStatus = i64;++mod breakable_entry;+mod comment_block;+mod de;+mod delimiters;+mod file_comments;+mod format;+mod intermediary;+mod line_metadata;+mod line_tokens;+mod parser_state;+mod render_queue_writer;+mod ripper_tree_types;+mod ruby;+mod types;++use file_comments::FileComments;+use parser_state::ParserState;+use ruby::VALUE;++#[cfg(debug_assertions)]+use log::debug;+#[cfg(debug_assertions)]+use simplelog::{Config, LevelFilter, TermLogger, TerminalMode};++type RubyfmtResult<T = ()> = std::result::Result<T, Box<dyn std::error::Error>>;++extern "C" {+    pub fn Init_ripper();+}++#[repr(C)]+#[derive(Debug, Copy, Clone)]+pub struct FormatBuffer {+    pub bytes: *const libc::c_char,+    pub count: i64,+}++impl FormatBuffer {+    pub fn into_buf(self) -> &'static [u8] {+        unsafe { slice::from_raw_parts(self.bytes as *const u8, self.count as usize) }+    }++    pub fn into_string(self) -> String {+        unsafe {+            let vec = Vec::from_raw_parts(+                self.bytes as *mut u8,+                self.count as usize,+                self.count as usize,+            );+            String::from_utf8_unchecked(vec)+        }+    }+}++#[derive(Debug, Copy, Clone)]+pub enum InitStatus {+    OK = 0,+    ERROR = 1,+}++#[no_mangle]+pub extern "C" fn rubyfmt_init() -> libc::c_int {+    init_logger();+    unsafe {+        ruby::ruby_init();+    }+    let res = load_ripper();+    if res.is_err() {+        return InitStatus::ERROR as libc::c_int;+    }++    let res = load_rubyfmt();+    if res.is_err() {+        return InitStatus::ERROR as libc::c_int;+    }++    InitStatus::OK as libc::c_int+}++pub fn format_buffer(buf: String) -> String {+    eprintln!("format 1");+    let bytes: 
Vec<libc::c_char> = buf+        .into_bytes()+        .into_iter()+        .map(|v| v as libc::c_char)+        .collect();+    let len = bytes.len();+    eprintln!("format 2");+    let fb = rubyfmt_format_buffer(FormatBuffer {+        bytes: bytes.as_ptr(),+        count: len as i64,+    });+    eprintln!("format 3");+    fb.into_string()+}++#[no_mangle]+pub extern "C" fn rubyfmt_format_buffer(buf: FormatBuffer) -> FormatBuffer {+    let output_data = Vec::with_capacity(buf.count as usize);+    let mut output = Cursor::new(output_data);+    let tree = run_parser_on(buf);+    if tree.is_err() {+        unsafe {+            let cstr = CString::new("oh no").expect("we just made it");+            ruby::rb_raise(ruby::rb_eRuntimeError, cstr.as_ptr());+        }+    }+    let tree = tree.expect("we raised");
    let tree = match run_parser_on(buf) {
        // `rb_raise` is an FFI call, so it needs `unsafe`
        Err(_) => return unsafe { ruby::rb_raise(ruby::rb_eRuntimeError, "oh no\0".as_ptr() as _) },
        Ok(tree) => tree,
    };
penelopezone

comment created time in 3 months

Pull request review comment penelopezone/rubyfmt

Embed ruby

+#![deny(warnings, missing_copy_implementations)]
+use std::ffi::CString;
+#[macro_use]
+extern crate lazy_static;
+
+use std::io::{BufReader, Cursor, Write};
+use std::slice;
+
+#[global_allocator]
+static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;
+
+pub type RawStatus = i64;
+
+mod breakable_entry;
+mod comment_block;
+mod de;
+mod delimiters;
+mod file_comments;
+mod format;
+mod intermediary;
+mod line_metadata;
+mod line_tokens;
+mod parser_state;
+mod render_queue_writer;
+mod ripper_tree_types;
+mod ruby;
+mod types;
+
+use file_comments::FileComments;
+use parser_state::ParserState;
+use ruby::VALUE;
+
+#[cfg(debug_assertions)]
+use log::debug;
+#[cfg(debug_assertions)]
+use simplelog::{Config, LevelFilter, TermLogger, TerminalMode};
+
+type RubyfmtResult<T = ()> = std::result::Result<T, Box<dyn std::error::Error>>;
+
+extern "C" {
+    pub fn Init_ripper();
+}
+
+#[repr(C)]
+#[derive(Debug, Copy, Clone)]
+pub struct FormatBuffer {
+    pub bytes: *const libc::c_char,
+    pub count: i64,
+}
+
+impl FormatBuffer {
+    pub fn into_buf(self) -> &'static [u8] {
+        unsafe { slice::from_raw_parts(self.bytes as *const u8, self.count as usize) }
+    }
+
+    pub fn into_string(self) -> String {
+        unsafe {
+            let vec = Vec::from_raw_parts(
+                self.bytes as *mut u8,
+                self.count as usize,
+                self.count as usize,
+            );
+            String::from_utf8_unchecked(vec)
+        }
+    }
+}
+
+#[derive(Debug, Copy, Clone)]
+pub enum InitStatus {
+    OK = 0,
+    ERROR = 1,
+}
+
+#[no_mangle]
+pub extern "C" fn rubyfmt_init() -> libc::c_int {
+    init_logger();
+    unsafe {
+        ruby::ruby_init();
+    }
+    let res = load_ripper();
+    if res.is_err() {
+        return InitStatus::ERROR as libc::c_int;
+    }
+
+    let res = load_rubyfmt();
+    if res.is_err() {
+        return InitStatus::ERROR as libc::c_int;
+    }
+
+    InitStatus::OK as libc::c_int
+}
+
+pub fn format_buffer(buf: String) -> String {
+    eprintln!("format 1");
+    let bytes: Vec<libc::c_char> = buf
+        .into_bytes()
+        .into_iter()
+        .map(|v| v as libc::c_char)
+        .collect();
+    let len = bytes.len();
+    eprintln!("format 2");
+    let fb = rubyfmt_format_buffer(FormatBuffer {
+        bytes: bytes.as_ptr(),
+        count: len as i64,
+    });
+    eprintln!("format 3");
+    fb.into_string()
+}
+
+#[no_mangle]
+pub extern "C" fn rubyfmt_format_buffer(buf: FormatBuffer) -> FormatBuffer {
+    let output_data = Vec::with_capacity(buf.count as usize);
+    let mut output = Cursor::new(output_data);
+    let tree = run_parser_on(buf);
+    if tree.is_err() {
+        unsafe {
+            let cstr = CString::new("oh no").expect("we just made it");
+            ruby::rb_raise(ruby::rb_eRuntimeError, cstr.as_ptr());
            ruby::rb_raise(ruby::rb_eRuntimeError, "oh no\0".as_ptr() as _);
penelopezone

comment created time in 3 months
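
To illustrate the suggestion above: a string literal with an explicit trailing NUL already has static storage, so it can be handed to a C function without building a `CString` first. The following is only a minimal sketch. `fake_raise`, `message_ptr`, and `OH_NO` are hypothetical names, and `fake_raise` is a stand-in for any C function taking `const char *`, not the real `rb_raise`.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Static message with an explicit trailing NUL, so no allocation is needed.
const OH_NO: &str = "oh no\0";

/// Hypothetical stand-in for a C function taking `const char *`
/// (NOT the real `rb_raise`); it just copies the message back out.
pub fn fake_raise(msg: *const c_char) -> String {
    // Safety: callers must pass a pointer to a NUL-terminated buffer
    // that stays alive for the duration of the call.
    unsafe { CStr::from_ptr(msg) }.to_string_lossy().into_owned()
}

/// Returns a pointer to the static message, checking the terminator first.
pub fn message_ptr() -> *const c_char {
    // Checked conversion: fails if the NUL is missing or not the last byte.
    let cstr = CStr::from_bytes_with_nul(OH_NO.as_bytes())
        .expect("literal ends in exactly one NUL");
    cstr.as_ptr()
}
```

The checked `CStr::from_bytes_with_nul` step is optional. The suggested `"oh no\0".as_ptr() as _` skips it and relies on the literal being written correctly.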

Pull request review comment penelopezone/rubyfmt

Embed ruby

+//#![deny(warnings, missing_copy_implementations)]
+use std::ffi::CString;
+#[macro_use]
+extern crate lazy_static;
+
+use std::io::{BufReader, Write, Cursor};
+use std::slice;
+
+#[global_allocator]
+static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;
+
+pub type RawStatus = i64;
+
+mod breakable_entry;
+mod comment_block;
+mod de;
+mod delimiters;
+mod file_comments;
+mod format;
+mod intermediary;
+mod line_metadata;
+mod line_tokens;
+mod parser_state;
+mod render_queue_writer;
+mod ripper_tree_types;
+mod ruby;
+mod types;
+
+use file_comments::FileComments;
+use parser_state::ParserState;
+use ruby::VALUE;
+
+#[cfg(debug_assertions)]
+use log::debug;
+#[cfg(debug_assertions)]
+use simplelog::{CombinedLogger, Config, LevelFilter, TermLogger, TerminalMode};
+
+type RubyfmtResult<T = ()> = std::result::Result<T, Box<dyn std::error::Error>>;
+
+extern "C" {
+    pub fn Init_ripper();
+}
+
+#[repr(C)]
+#[derive(Debug, Copy, Clone)]
+pub struct FormatBuffer {
+    pub bytes: *const libc::c_char,
+    pub count: i64,
+}

Because you're reimplementing it anyway, why not do it without unsafe code?

penelopezone

comment created time in 3 months

Pull request review comment penelopezone/rubyfmt

Embed ruby

+//#![deny(warnings, missing_copy_implementations)]
+use std::ffi::CString;
+#[macro_use]
+extern crate lazy_static;
+
+use std::io::{BufReader, Write, Cursor};
+use std::slice;
+
+#[global_allocator]
+static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;
+
+pub type RawStatus = i64;
+
+mod breakable_entry;
+mod comment_block;
+mod de;
+mod delimiters;
+mod file_comments;
+mod format;
+mod intermediary;
+mod line_metadata;
+mod line_tokens;
+mod parser_state;
+mod render_queue_writer;
+mod ripper_tree_types;
+mod ruby;
+mod types;
+
+use file_comments::FileComments;
+use parser_state::ParserState;
+use ruby::VALUE;
+
+#[cfg(debug_assertions)]
+use log::debug;
+#[cfg(debug_assertions)]
+use simplelog::{CombinedLogger, Config, LevelFilter, TermLogger, TerminalMode};
+
+type RubyfmtResult<T = ()> = std::result::Result<T, Box<dyn std::error::Error>>;
+
+extern "C" {
+    pub fn Init_ripper();
+}
+
+#[repr(C)]
+#[derive(Debug, Copy, Clone)]
+pub struct FormatBuffer {
+    pub bytes: *const libc::c_char,
+    pub count: i64,
+}

What do you think about having this wrap Box<str> or Box<CStr>?

penelopezone

comment created time in 3 months
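
One possible reading of the `Box<str>` idea, sketched under assumptions: the struct keeps a C-compatible pointer-plus-length layout, but is only ever built from, and turned back into, a `Box<str>`, so the allocation always has exactly one owner. `OwnedBuffer` and its methods are hypothetical names for illustration and are not part of rubyfmt.

```rust
use std::slice;
use std::str;

/// Hypothetical FFI-friendly handle whose memory always comes from a `Box<str>`.
/// Deliberately not `Copy`, so the allocation cannot be freed twice by accident.
#[repr(C)]
pub struct OwnedBuffer {
    bytes: *mut u8,
    count: usize,
}

impl OwnedBuffer {
    /// Takes ownership of the boxed string and exposes it as raw parts.
    pub fn from_boxed(s: Box<str>) -> Self {
        let count = s.len();
        // `Box::into_raw` yields `*mut str`; casting to `*mut u8` drops the length,
        // which is stored separately in `count`.
        let bytes = Box::into_raw(s) as *mut u8;
        OwnedBuffer { bytes, count }
    }

    /// Borrows the contents without giving up ownership.
    pub fn as_str(&self) -> &str {
        // Safety: `bytes`/`count` came from a valid `Box<str>`, so they are valid UTF-8.
        unsafe { str::from_utf8_unchecked(slice::from_raw_parts(self.bytes, self.count)) }
    }

    /// Rebuilds the original `Box<str>`, which then frees the memory on drop.
    /// Without a call to this method the allocation is leaked (no `Drop` impl here).
    pub fn into_boxed(self) -> Box<str> {
        // Safety: the pointer and length are exactly those produced by `from_boxed`.
        unsafe {
            let raw = slice::from_raw_parts_mut(self.bytes, self.count);
            Box::from_raw(raw as *mut [u8] as *mut str)
        }
    }
}
```

With this shape the unsafe blocks are confined to the two conversion points, and callers holding an `OwnedBuffer` cannot construct one from an arbitrary pointer.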

Pull request review comment penelopezone/rubyfmt

Embed ruby

+//#![deny(warnings, missing_copy_implementations)]
+use std::ffi::CString;
+#[macro_use]
+extern crate lazy_static;
+
+use std::io::{BufReader, Write, Cursor};
+use std::slice;
+
+#[global_allocator]
+static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;
+
+pub type RawStatus = i64;
+
+mod breakable_entry;
+mod comment_block;
+mod de;
+mod delimiters;
+mod file_comments;
+mod format;
+mod intermediary;
+mod line_metadata;
+mod line_tokens;
+mod parser_state;
+mod render_queue_writer;
+mod ripper_tree_types;
+mod ruby;
+mod types;
+
+use file_comments::FileComments;
+use parser_state::ParserState;
+use ruby::VALUE;
+
+#[cfg(debug_assertions)]
+use log::debug;
+#[cfg(debug_assertions)]
+use simplelog::{CombinedLogger, Config, LevelFilter, TermLogger, TerminalMode};
+
+type RubyfmtResult<T = ()> = std::result::Result<T, Box<dyn std::error::Error>>;
+
+extern "C" {
+    pub fn Init_ripper();
+}
+
+#[repr(C)]
+#[derive(Debug, Copy, Clone)]
+pub struct FormatBuffer {
+    pub bytes: *const libc::c_char,
+    pub count: i64,
+}
+
+impl FormatBuffer {
+    pub fn into_buf(self) -> &'static [u8] {
+        unsafe { slice::from_raw_parts(self.bytes as *const u8, self.count as usize) }
+    }
+
+    pub fn into_string(self) -> String {
+        unsafe {
+            let vec = Vec::from_raw_parts(self.bytes as *mut u8, self.count as usize, self.count as usize);
+            let res = String::from_utf8_unchecked(vec);
+            return res
            String::from_utf8_unchecked(vec)
penelopezone

comment created time in 3 months

Pull request review comment penelopezone/rubyfmt

Embed ruby

+use std::process::Command;
+use std::io::{self, Write};
+use std::path::Path;
+
+fn main() {
+    let path = std::env::current_dir().expect("is current");
+    let ruby_checkout_path = format!("{}/ruby_checkout/ruby-2.6.6", path.display());
+    if !Path::new(&format!("{}/libruby.2.6-static.a", ruby_checkout_path)).exists() {
    if !path.join("ruby_checkout/ruby-2.6.6/libruby.2.6-static.a").exists() {
penelopezone

comment created time in 3 months

Pull request review comment penelopezone/rubyfmt

Embed ruby

+use std::process::Command;
+use std::io::{self, Write};
+use std::path::Path;
+
+fn main() {
+    let path = std::env::current_dir().expect("is current");
+    let ruby_checkout_path = format!("{}/ruby_checkout/ruby-2.6.6", path.display());
+    if !Path::new(&format!("{}/libruby.2.6-static.a", ruby_checkout_path)).exists() {
+        let o = Command::new("bash")
+            .arg("-c")
+            .arg(format!("{}/configure && make -j", ruby_checkout_path))
+            .current_dir(&ruby_checkout_path)
+            .output().expect("works1 ");
+        if !o.status.success() {
+            io::stdout().write_all(&o.stdout).unwrap();
+            io::stderr().write_all(&o.stderr).unwrap();
+            panic!("failed subcommand");
+        }
+    }
+    if !Path::new(&format!("{}/ruby_checkout/ruby-2.6.6/libripper.2.6-static.a", path.display())).exists() {
+        let o = Command::new("bash")
+            .arg("-c")
+            .arg("ar crus libripper.2.6-static.a ext/ripper/ripper.o")
+            .current_dir(&ruby_checkout_path)
+            .output().expect("works");
+        if !o.status.success() {
+            panic!("failed subcommand");
+        }
+    }
+    cc::Build::new()
+        .file("src/rubyfmt.c")
+        .include(format!("{}/include", ruby_checkout_path))
+        .include(format!("{}/.ext/include/x86_64-darwin19", ruby_checkout_path))

"Works on my machine"

penelopezone

comment created time in 3 months
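
The hard-coded `x86_64-darwin19` directory only exists on one platform. One way around that, sketched here under the assumption that the Ruby build writes its platform-specific headers into subdirectories of `.ext/include`, is to discover those subdirectories at build time instead of naming one. `arch_include_dirs` is a hypothetical helper, not something from the PR.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns every subdirectory of `<ruby_checkout>/.ext/include`, sorted.
/// Assumption: each subdirectory is a platform-specific header directory
/// (for example `x86_64-darwin19` on the machine used in the diff above).
pub fn arch_include_dirs(ruby_checkout: &Path) -> io::Result<Vec<PathBuf>> {
    let mut dirs = Vec::new();
    for entry in fs::read_dir(ruby_checkout.join(".ext/include"))? {
        let path = entry?.path();
        // Skip plain files; only directories can serve as include roots.
        if path.is_dir() {
            dirs.push(path);
        }
    }
    dirs.sort();
    Ok(dirs)
}
```

In a build script, each returned path could then be passed to `cc::Build::include` in place of the fixed `format!` call.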

Pull request review comment penelopezone/rubyfmt

Embed ruby

+use std::process::Command;
+use std::io::{self, Write};
+use std::path::Path;
+
+fn main() {
+    let path = std::env::current_dir().expect("is current");
+    let ruby_checkout_path = format!("{}/ruby_checkout/ruby-2.6.6", path.display());
+    if !Path::new(&format!("{}/libruby.2.6-static.a", ruby_checkout_path)).exists() {
+        let o = Command::new("bash")
+            .arg("-c")
+            .arg(format!("{}/configure && make -j", ruby_checkout_path))
+            .current_dir(&ruby_checkout_path)
+            .output().expect("works1 ");
+        if !o.status.success() {
+            io::stdout().write_all(&o.stdout).unwrap();
+            io::stderr().write_all(&o.stderr).unwrap();
+            panic!("failed subcommand");
+        }
+    }
+    if !Path::new(&format!("{}/ruby_checkout/ruby-2.6.6/libripper.2.6-static.a", path.display())).exists() {
    if !path.join("ruby_checkout/ruby-2.6.6/libripper.2.6-static.a").exists() {
penelopezone

comment created time in 3 months
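
Both `path.join(...)` suggestions above follow the same pattern: keep the value as a `Path`/`PathBuf` and extend it with `join`, rather than formatting it into a `String` and parsing it back with `Path::new`. A small sketch of that pattern, where `static_lib_path` is a hypothetical helper name and the directory layout is copied from the diff:

```rust
use std::path::{Path, PathBuf};

/// Builds `<base>/ruby_checkout/ruby-2.6.6/<lib_name>` without going through `format!`.
pub fn static_lib_path(base: &Path, lib_name: &str) -> PathBuf {
    // `join` inserts the platform's separator and never round-trips through
    // `Display`, so non-UTF-8 path components are preserved.
    base.join("ruby_checkout/ruby-2.6.6").join(lib_name)
}
```

A check such as `static_lib_path(&path, "libripper.2.6-static.a").exists()` would then cover both archive files with one helper.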
