Luiz Irber (luizirber)
@dib-lab - @ucdavis · Davis, CA · https://luizirber.org
Computer Science PhD student @ucdavis, @dib-lab minion, @carpentries enthusiast

dib-lab/sourmash 210

Compute and compare MinHash signatures for DNA data sets.

kipoi/kipoi 168

Kipoi's model zoo API

castelao/seabird 32

Python parser for Sea-Bird CTD outputs, usually .cnv files.

castelao/CoTeDe 20

Quality Control of Oceanographic Data

dkoslicki/MinHashMetagenomics 17

Fast approximation of similarity for sets of very different sizes

dib-lab/sourmash_databases 11

Build sourmash databases for genbank.

dsidavis/dsiaffiliates 8

Wiki-like repos for affiliates to share questions, information, projects, etc.

castelao/pyrings 6

Python package to handle rings/eddies in the ocean

castelao/maud 4

Moving Average of Uneven Data

push event luizirber/phd

Luiz Irber

commit sha fb90efa892ed538725813b1e9c4a3a982fbdb837

fixes #9

push time in 18 hours

PR closed luizirber/phd

minor fixes

🤛

+4 -4

0 comments

3 changed files

ctb

pr closed time in 18 hours

issue comment zenodo/zenodo

github: inclusion of release assets in published record

Pinging this issue with a use case: I have a repo for my thesis, and on each new release a PDF with the rendered content is uploaded to the release using GitHub Actions.

I would like to see it show up in the Zenodo record, in the "Files" section.

slint

comment created time in a day

created tag luizirber/phd

tag 2020.09.20

created time in a day

push event luizirber/phd

Luiz Irber

commit sha 7588668d031b5db50c3548d90a7ec9020e3f201a

abstract

push time in a day

issue comment dib-lab/sourmash

Within signature distance?

Just dropping a quick note that we actually have hash counts too, and that might be useful information to use...

(Sorry for the short message, will comment more next week)

anmwinter

comment created time in 2 days
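To make the point about hash counts concrete: two signatures can contain exactly the same hashes (Jaccard 1.0) and still look quite different once abundances are taken into account. A minimal sketch, assuming each signature is a plain dict mapping hash to count (a hypothetical representation, not sourmash's signature objects). Using angular similarity as the abundance-weighted measure is an assumption here; check sourmash's own comparison code before relying on specific numbers.

```python
import math
from typing import Dict

def jaccard(a: Dict[int, int], b: Dict[int, int]) -> float:
    """Presence/absence Jaccard similarity, ignoring counts."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def cosine_similarity(a: Dict[int, int], b: Dict[int, int]) -> float:
    """Cosine similarity between two hash -> count mappings."""
    # Only hashes present in both signatures contribute to the dot product.
    dot = sum(count * b[h] for h, count in a.items() if h in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def angular_similarity(a: Dict[int, int], b: Dict[int, int]) -> float:
    """Map cosine similarity of non-negative counts to 1 - 2 * angle / pi."""
    cos = min(1.0, cosine_similarity(a, b))  # guard against float drift above 1
    return 1.0 - 2.0 * math.acos(cos) / math.pi

# Same three hashes, very different abundance profiles (hypothetical values).
sig1 = {1: 10, 2: 1, 3: 1}
sig2 = {1: 1, 2: 1, 3: 10}
```

Here `jaccard(sig1, sig2)` is 1.0, while the count-aware measures drop well below that.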

push event luizirber/phd

Luiz Irber

commit sha 62d1c16a47c627571f0303762f038cd5d7fba449

conclusion

push time in 2 days

created tag luizirber/phd

tag 2020.09.18

created time in 3 days

push event luizirber/phd

Luiz Irber

commit sha fcc2a78ad9c13b99e31a6e0e13b895730cded457

chp2 methods

push time in 4 days

push event luizirber/phd

Luiz Irber

commit sha 7e4436767df7a92e9cd3a7c8740455714ed87686

chp3

push time in 4 days

started rune-rs/rune

started time in 4 days

started ClarissaE/2020-NSURP-Project-IBD

started time in 4 days

push event luizirber/phd

Luiz Irber

commit sha 27fdfc963169a64f6ae511f7455bb77f864a11bf

chp2

push time in 4 days

issue opened dib-lab/sourmash

Large memory peak when loading large SBTs

When loading the 2020-07-18 GenBank bacteria SBT for search, I noticed a large memory peak when opening the SBT (memory plot not preserved; Y-axis in GB). Later it stabilizes around 2.2 GB.

I'm guessing it is related to how Python's zipfile module loads the data, but further investigation is needed.

created time in 5 days
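One way to test the zipfile guess is to compare the Python-level memory peak while decompressing every member of an archive against the peak from only reading the archive index. A minimal sketch using the standard library's `tracemalloc` and `zipfile`; the 5 MB member below is synthetic, not the GenBank SBT, and the function names are hypothetical. `tracemalloc` only sees allocations made through Python's allocators, so its numbers will not match RSS measurements exactly.

```python
import io
import tracemalloc
import zipfile

def peak_memory_reading_zip(source) -> int:
    """Peak traced Python allocation (bytes) while reading every member of a zip."""
    tracemalloc.start()
    try:
        with zipfile.ZipFile(source) as zf:
            for name in zf.namelist():
                data = zf.read(name)  # decompresses the whole member into memory
                del data
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak

def peak_memory_listing_zip(source) -> int:
    """Peak traced Python allocation when only opening the zip and reading its index."""
    tracemalloc.start()
    try:
        with zipfile.ZipFile(source) as zf:
            zf.infolist()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak

# Synthetic archive: one highly compressible 5 MB member.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("node.sbt", b"A" * 5_000_000)

buf.seek(0)
full = peak_memory_reading_zip(buf)
buf.seek(0)
index_only = peak_memory_listing_zip(buf)
```

Pointing the same two functions at the real SBT file should show whether the peak comes from decompressing members or from something else during loading.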

push event luizirber/phd

Luiz Irber

commit sha 95425e49f2b3d3e458157e94429ac4750e02c280

chp1

push time in 6 days

created tag luizirber/phd

tag 2020.09.15

created time in 7 days

push event luizirber/phd

Luiz Irber

commit sha 92c2d5ce5f837e2c181f1ef63ce684acda0b6172

update minhash fig

Luiz Irber

commit sha a7d7bdead1c1f1fbc4c833d001cd900869e12406

sizes plots

push time in 7 days

delete branch luizirber/phd

delete branch : ctb_moar

delete time in 7 days

PR closed luizirber/phd

minor fixes

also, get rid of the Conclusions sections for 04 and 05, I think!

For 06 / final conclusions, you can probably finish with a ~single page summarizing your four chapters --

"We did [ ... this ... ]. We showed [ ... this ... ]. In future work we would like to do [ ... this ...]."

+10 -10

1 comment

3 changed files

ctb

pr closed time in 7 days

pull request comment luizirber/phd

minor fixes

Fixed in 9a6dc46

ctb

comment created time in 7 days

push event luizirber/phd

Luiz Irber

commit sha b2111271a0e2f9fde385e0688e41dd02d1cf02f3

chp1 fig

Luiz Irber

commit sha 9a6dc46e02b2c1d4fb16266bd7075c1e2ce8afe6

titus minor fixes

push time in 7 days

push event luizirber/phd

Luiz Irber

commit sha 49073f1b4879cb422f5073714eea58a09c49a803

chp1 plots

push time in 8 days

push event luizirber/phd

Luiz Irber

commit sha ed161fca3431d246c2f2f51b00ee3c4785590c36

upd

push time in 9 days

issue comment luizirber/phd

devtools install magic to install aggiedown

someone else's code, I don't R =]

ctb

comment created time in 9 days

push event luizirber/phd

Luiz Irber

commit sha 0a05b42d0381c627998a980a927cc09ad6e2ed1f

upd

push time in 9 days

started smarco/WFA

started time in 10 days

started pdbpp/pdbpp

started time in 10 days

push event luizirber/phd

Luiz Irber

commit sha b973fed0530e5880fdcc0111a8872b926d1c9d2a

use greedy scheduler in CI

push time in 10 days

created tag luizirber/phd

tag 2020.09.12

created time in 10 days

push event luizirber/phd

Luiz Irber

commit sha 4edb0731776bd0fb8adabc5056f05bef067e7cd7

upd

push time in 10 days

push event luizirber/phd

Luiz Irber

commit sha a860cd23a15dd2d7bcaf8734c98c4bc9a958714d

upd

Luiz Irber

commit sha 88f1c52ebbc803808d14619cb77169bf08effe76

upd

push time in 11 days

issue opened CAMI-challenge/data

Number of reference genomes used in profiling

Hi,

I was trying to figure out how many reference genomes/datasets each tool used for the mouse gut metagenome toy challenge, but it seems the results were mostly generated from the default databases provided with the tools, and not built from the RefSeq snapshot provided on the CAMI 2 databases page. Is information about the tools' databases available?

Thanks!

created time in 11 days

delete branch dib-lab/sourmash

delete branch : autodoc_myst

delete time in 11 days

push event dib-lab/sourmash

Luiz Irber

commit sha 7a8e5ac75acf4c23ae51f6f473a139420e91a84e

remove last .rst file from docs (#1185)

push time in 11 days

PR merged dib-lab/sourmash

[MRG] Replace last .rst file from docs

With #163 being fixed and myst-parser v0.12.2 released, we can replace the last .rst file from docs.

Checklist

  • [x] Is it mergeable?
  • [x] make test Did it pass the tests?
  • [x] make coverage Is the new code covered?
  • [x] Did it change the command-line interface? Only additions are allowed without a major version increment. Changing file formats also requires a major version number increment.
  • [x] Was a spellchecker run on the source code and documentation after changes were made?
+42 -40

1 comment

4 changed files

luizirber

pr closed time in 11 days

push event luizirber/bioconda-recipes

Luiz Irber

commit sha 6db5cc27d0064c06b50a85309feca8e7d171d850

boost

push time in 12 days

started salzberg-lab/Balrog

started time in 12 days

push event luizirber/bioconda-recipes

Luiz Irber

commit sha 7a1757536d03c84197d9503acc35bfbecd055aea

copy only binary, run build tests

push time in 12 days

issue comment bingmann/cobs

conda package

https://github.com/bioconda/bioconda-recipes/pull/24326

issues to solve:

  • versioning: bioconda will accept a git SHA, but a tag is preferred.
  • run tests on the built package

graceblackwell

comment created time in 12 days

PR opened bioconda/bioconda-recipes

Add recipe: COBS

Add new recipe for COBS, based on the Salmon and dashing recipes.

TODO:

  • [ ] ask for a version and tag in original repo
  • [ ] run tests

Please read the guidelines for Bioconda recipes before opening a pull request (PR).

  • If this PR adds or updates a recipe, use "Add" or "Update" appropriately as the first word in its title.
  • New recipes not directly relevant to the biological sciences need to be submitted to the conda-forge channel instead of Bioconda.
  • PRs require reviews prior to being merged. Once your PR is passing tests and ready to be merged, please issue the @BiocondaBot please add label command.
  • Please post questions on Gitter or ping @bioconda/core in a comment.

<details> <summary>Please use the following BiocondaBot commands:</summary>

Everyone has access to the following BiocondaBot commands, which can be given in a comment:

<table> <tr> <td><code>@BiocondaBot please update</code></td> <td>Merge the master branch into a PR.</td> </tr> <tr> <td><code>@BiocondaBot please add label</code></td> <td>Add the <code>please review & merge</code> label.</td> </tr> <tr> <td><code>@BiocondaBot please fetch artifacts</code></td> <td>Post links to CI-built packages/containers. <br />You can use this to test packages locally.</td> </tr> </table>

For members of the Bioconda project, the following command is also available:

<table> <tr> <td><code>@BiocondaBot please merge</code></td> <td>Upload built packages/containers and merge a PR. <br />Someone must approve a PR first! <br />This reduces CI build time by reusing built artifacts.</td> </tr> </table>

Also, the bot watches for comments from non-members that include @bioconda/<team> and will automatically re-post them to notify the addressed <team>.

</details>

+56 -0

0 comments

2 changed files

pr created time in 12 days

created branch luizirber/bioconda-recipes

branch : cobs

created branch time in 12 days

issue comment bingmann/cobs

conda package

I'll try it; I think all deps are available in conda-forge.

graceblackwell

comment created time in 12 days

push event luizirber/phd

Luiz Irber

commit sha 0785dad2e5f1367332df77d7ab2ae91549c55f7f

fig update

push time in 12 days

push event dib-lab/sourmash

Luiz Irber

commit sha 86f85417cce416f64b43d03fa80ef40525949aca

remove last .rst file from docs

push time in 13 days

push event dib-lab/sourmash

Luiz Irber

commit sha bbca595ed07c9fb608941940045c55b0654a5188

From enum to trait object

push time in 13 days

push event luizirber/phd

Luiz Irber

commit sha 619c98140cb0ce3866e23982443cbc32230d81ab

update snakemake

push time in 13 days

push event luizirber/phd

Luiz Irber

commit sha cad70e2585bb6eba25b8bef67016b33618a22f40

use mamba for CI

push time in 13 days

push event luizirber/phd

Luiz Irber

commit sha 5db98b8eeef9fa9ff5ac6b4c6ae12b33ae6fc688

use mamba for CI

push time in 13 days

created tag luizirber/phd

tag 2020.09.09

created time in 13 days

push event luizirber/phd

Luiz Irber

commit sha 362bb2693142f8e46ee2493c75e7656a020e7e6f

fix citation

push time in 13 days

push event luizirber/phd

Luiz Irber

commit sha f0abd07e4626995a7424995f2d10610cda911634

zenodo upd

push time in 13 days

delete branch luizirber/phd

delete branch : ctb_updates

delete time in 13 days

PR closed luizirber/phd

minor typo foo

many minor fixes.

+21 -20

1 comment

6 changed files

ctb

pr closed time in 13 days

pull request comment luizirber/phd

minor typo foo

Thanks, fixed in 0ca98d0

ctb

comment created time in 13 days

push event luizirber/phd

Luiz Irber

commit sha 0ca98d0ae7aafe402e8bb82190ffa3bbce4bff71

fixes from PR #7

push time in 13 days

issue comment luizirber/phd

devtools install magic to install aggiedown

Interesting... I added tar as a dependency in the R env, let's see if it works.

ctb

comment created time in 13 days

push event luizirber/phd

Luiz Irber

commit sha dd4c411edbe12618bedf1a9745a24b69e5e45250

upd

push time in 13 days

push event luizirber/phd

Luiz Irber

commit sha efe1fa9a3c32ea200ffea1cd37ff909be77dbfb4

cami plots

push time in 13 days

delete tag luizirber/phd

delete tag : 2020-09-08

delete time in 13 days

push event luizirber/phd

Luiz Irber

commit sha 8cd61b67cc19082cd91a611a8b440b45035d4ddd

fix spacing

push time in 14 days

created tag luizirber/phd

tag 2020-09-08

created time in 14 days

push event luizirber/phd

Luiz Irber

commit sha 56a399c35944ff2505d2091adae79481e5a5f121

opal fig with sourmash

push time in 14 days

created tag luizirber/phd

tag 2020.09.08

created time in 14 days

push event luizirber/phd

Luiz Irber

commit sha 30b653dbb6e5ca5c2fe679a9b65983e113d896f5

upd

push time in 14 days

push event luizirber/phd

Luiz Irber

commit sha 4c2fd27f494f517cf3f34b39515e279a531ef713

upd

push time in 15 days

push event luizirber/phd

Luiz Irber

commit sha e6c25d8d86182e11f3c538674d982941dfe12cf9

binder

push time in 16 days

push event luizirber/2020-09-06-jct

Luiz Irber

commit sha eb42c3a1b8abda221aef1356e7193485d22e9b09

prebuild repo2docker for binder

push time in 16 days

push event luizirber/2020-09-06-jct

Luiz Irber

commit sha aa42d6821fd1c313430cdd35cbb70091972c852e

add repo2docker instructions

push time in 16 days

push event luizirber/2020-09-06-jct

Luiz Irber

commit sha d0da2b0e0961e00bc6b6cb8e798c2744085bb20b

disable

push time in 16 days

push event luizirber/2020-09-06-jct

Luiz Irber

commit sha 235160a539120c556d9eea48d4410d9886a7fdec

disable

push time in 16 days

push event luizirber/2020-09-06-jct

Luiz Irber

commit sha 61e3ce166fd698c2b18c50e44a3e356fdf2cfde2

disable

push time in 16 days

push event luizirber/2020-09-06-jct

Luiz Irber

commit sha 66d921f167a2ffa8e15de60863af65c9aace3de6

disable

push time in 16 days

push event luizirber/2020-09-06-jct

Luiz Irber

commit sha 6e339099fe319ef3a5b66edde0132a0fbb7944f3

update binder badge

push time in 16 days

push event luizirber/2020-09-06-jct

Luiz Irber

commit sha cffa2b66a58717e4e6d5e532b6469f5312787165

update binder badge

push time in 16 days

created branch luizirber/2020-09-06-jct

branch : latest

created branch time in 16 days

created repository luizirber/2020-09-06-jct

created time in 16 days

issue comment dib-lab/sourmash

How should users prepare signatures for large scale metagenome search?

These are the parameters used for the metagenomes, so this command would generate sigs compatible with any of them:

sourmash compute -k 21,31,51 --scaled 1000 --track-abundance --name IDENTIFIER genome.fa

(--track-abundance and multiple -k values will make larger sigs, but might be useful for downstream analysis)

ctb

comment created time in 17 days
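For reference, a rough sketch of what `--scaled 1000` and `--track-abundance` mean: for each k-mer size, keep only the k-mer hashes below roughly 2^64 / scaled, and optionally count how often each kept hash appears. Assumptions, loudly: BLAKE2b with an 8-byte digest stands in for the MurmurHash3 that sourmash actually uses, so the hash values will not match real signatures; k-mers are canonicalized as the lexicographically smaller of the k-mer and its reverse complement; k-mers containing non-ACGT characters are skipped. All function names are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict, Iterable

MAX_HASH = 2**64 - 1  # assumption: unsigned 64-bit hash space
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = set("ACGT")

def hash_kmer(kmer: str) -> int:
    """64-bit hash of a k-mer. Stand-in only: sourmash uses MurmurHash3, not BLAKE2."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "little")

def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)

def scaled_sketch(seq: str, k: int, scaled: int,
                  track_abundance: bool = True) -> Dict[int, int]:
    """Keep hashes <= MAX_HASH // scaled, mapping each kept hash to its count."""
    max_hash = MAX_HASH // scaled
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if not _VALID.issuperset(kmer):
            continue  # skip k-mers containing N or other ambiguity codes
        h = hash_kmer(canonical(kmer))
        if h <= max_hash:
            counts[h] += 1
    if not track_abundance:
        return {h: 1 for h in counts}  # presence only
    return dict(counts)

def sketch_all(seq: str, ksizes: Iterable[int], scaled: int) -> Dict[int, Dict[int, int]]:
    """One sketch per k-mer size, mirroring '-k 21,31,51' on the command line."""
    return {k: scaled_sketch(seq, k, scaled) for k in ksizes}
```

Storing a count next to every kept hash is why --track-abundance makes the signatures larger, and each extra -k value adds a whole separate sketch.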

issue comment dib-lab/wort

Calculating SAC on metagenome clusters

That is a really good idea... and a monstrous matrix :rofl:

I'll work on sharing all the sigs in a couple of weeks, but it is not something I can tackle at the moment :cry:

nmb85

comment created time in 17 days

issue comment dib-lab/sourmash

Higher-order "compare" method with zeta diversity metric

Then I can calculate all the statistics I want from the table/matrix really quickly. Using itertools to create all combinations of signatures with shared hashes is a waste of time. Am I reinventing the wheel; does sourmash already have a method to save the signature names, hashes, and hash counts for a set of signatures to a tabular file?

We didn't have a method before, but now we do =] This is beautiful!

nmb85

comment created time in 18 days
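For anyone wanting the same kind of table, a minimal sketch of a long-format CSV with one row per (signature, hash) pair plus its count, assuming the signatures are already loaded into a hypothetical `{name: {hash: count}}` dict. This is not nmb85's implementation, just an illustration of a layout that is easy to load into any dataframe or matrix tool afterwards.

```python
import csv
import io
from typing import Dict, TextIO

def write_hash_table(sigs: Dict[str, Dict[int, int]], out: TextIO) -> int:
    """Write one row per (signature, hash) pair; return the number of data rows."""
    writer = csv.writer(out)
    writer.writerow(["signature", "hash", "count"])
    rows = 0
    for name in sorted(sigs):
        for h, count in sorted(sigs[name].items()):
            writer.writerow([name, h, count])
            rows += 1
    return rows

def read_hash_table(handle: TextIO) -> Dict[str, Dict[int, int]]:
    """Inverse of write_hash_table: rebuild {name: {hash: count}}."""
    sigs: Dict[str, Dict[int, int]] = {}
    for row in csv.DictReader(handle):
        sigs.setdefault(row["signature"], {})[int(row["hash"])] = int(row["count"])
    return sigs

# Round trip through an in-memory buffer (hypothetical names and hashes).
example = {"sigA": {11: 3, 42: 1}, "sigB": {42: 2, 99: 5}}
buffer = io.StringIO()
n_rows = write_hash_table(example, buffer)
buffer.seek(0)
restored = read_hash_table(buffer)
```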

push event luizirber/phd

Luiz Irber

commit sha 600dee0d812189abb6521b1c7f4f7c0a29b8fdf6

experimental rules

push time in 19 days

issue comment dib-lab/sourmash

Higher-order "compare" method with zeta diversity metric

Did a quick skim of the paper, and I think this is possible with the info we already output from sourmash gather (if you do the classification first and use the matches to calculate diversity).

A possible extension would be to calculate zeta diversity on the hashes, which would require a bit more code (but is doable with the LCA index, which already has the mapping from each hash to the signatures that contain it). I'm not sure how robust the results would be... but worth trying it out =]

nmb85

comment created time in 19 days
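The hash-level extension can skip enumerating signature combinations entirely if you start from a hash-to-signatures mapping like the one the LCA index keeps. Zeta diversity of order i is the mean number of items shared by i sites, averaged over all combinations of i sites. A hash present in k of the n signatures is shared by C(k, i) of those i-subsets, so zeta_i = (sum over hashes of C(k, i)) / C(n, i). A minimal sketch that treats hashes as the "species"; signatures are hypothetical `{name: set_of_hashes}` dicts, not sourmash objects.

```python
from collections import defaultdict
from math import comb
from typing import Dict, Iterable, List, Optional, Set

def invert(sigs: Dict[str, Iterable[int]]) -> Dict[int, Set[str]]:
    """hash -> names of the signatures containing it (like an LCA index mapping)."""
    index: Dict[int, Set[str]] = defaultdict(set)
    for name, hashes in sigs.items():
        for h in hashes:
            index[h].add(name)
    return index

def zeta_diversity(sigs: Dict[str, Iterable[int]],
                   max_order: Optional[int] = None) -> List[float]:
    """zeta_1 .. zeta_max_order: mean number of hashes shared by i signatures.

    Summing C(k_h, i) over hashes counts every (i-subset, shared hash) pair
    exactly once, so dividing by C(n, i) averages over all i-subsets without
    enumerating them. comb(k, i) is 0 when i > k.
    """
    n = len(sigs)
    if max_order is None:
        max_order = n
    occupancy = [len(names) for names in invert(sigs).values()]
    return [sum(comb(k, i) for k in occupancy) / comb(n, i)
            for i in range(1, max_order + 1)]

# Three small hypothetical signatures.
example = {"a": {1, 2, 3}, "b": {2, 3, 4}, "c": {3, 4, 5}}
```

For `example`, the pairs share 2, 1 and 2 hashes, so zeta_2 is 5/3; only hash 3 is in all three, so zeta_3 is 1.0.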

issue comment dib-lab/sourmash

duplication of signatures seen in large SBT databases

When I count the number of signatures in the "tree" object before saving the sbt, there aren't any duplicated signatures, but when I use the "tree.save" function to write the sbt to a file, then use sourmash sig describe on that file, I can see the exact same duplication as I did in the original issue. Is the "save" function that is writing the signatures to disk duplicating some of the signatures?

Oh, great finding! I'm scratching my head trying to figure out what is happening in the .save() code, but I will take a closer look (next week). Finding a reduced subset that triggers the problem would make a great unit test, but I'm starting to think it only triggers with a very large number of sigs...

P.S. - Are you a fan of maté :mate:?

Yup! It's just hard to find good Erva Mate around here, but I still have some that I brought from Brazil =]

ctb

comment created time in 19 days
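While looking for a reduced case, a quick check is to compare how many copies of each signature exist before saving and after reloading. A minimal sketch, assuming each signature is reduced to a `(name, md5)` pair (for example the name and MD5 that `sourmash sig describe` prints); how those pairs are collected is left open, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str]  # (signature name, md5)

def find_duplicates(records: Iterable[Record]) -> List[Tuple[str, str, int]]:
    """Return (name, md5, copies) for every record that appears more than once."""
    counts = Counter(records)
    return sorted((name, md5, n) for (name, md5), n in counts.items() if n > 1)

def compare_before_after(before: Iterable[Record],
                         after: Iterable[Record]) -> Dict[Record, Tuple[int, int]]:
    """Records whose copy count differs between the in-memory tree and the saved file."""
    b, a = Counter(before), Counter(after)
    return {rec: (b[rec], a[rec]) for rec in set(b) | set(a) if b[rec] != a[rec]}

# Hypothetical example: one signature got written twice.
before = [("s1", "aa"), ("s2", "bb")]
after = [("s1", "aa"), ("s2", "bb"), ("s2", "bb")]
```

Running this on progressively smaller inputs would show where the duplication starts, which is the kind of reduced subset a unit test needs.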

PR closed luizirber/phd

May25 notes.

see e-mail.

+147 -28

0 comments

7 changed files

ctb

pr closed time in 19 days

created tag luizirber/phd

tag 2020.09.02

created time in 20 days

push event dib-lab/sourmash

C. Titus Brown

commit sha c89ac9550c009a5a27010cd8874a180151fb8ad6

Merge branch 'latest' into autodoc_myst

push time in 20 days

issue opened dib-lab/sourmash_databases

RefSeq representative genomes

NCBI is releasing the RefSeq representative genomes, similar to how GTDB has their dereplicated genomes. https://ncbiinsights.ncbi.nlm.nih.gov/2020/08/21/updated-representative-genomes/

Might be worth building an equivalent sourmash database, and versioning it as NCBI releases new versions?

created time in 20 days

push event luizirber/phd

Luiz Irber

commit sha 47190f73813c08519eca0e3ad1ac32f79f225814

upd

push time in 20 days

push event luizirber/phd

Luiz Irber

commit sha 094692b3755311331b372057e0b15e619174a91a

use csv instead of txt

push time in 21 days

push event luizirber/phd

Luiz Irber

commit sha 89e972aec7ceb4f08072d681c5549a9e00649c53

fix table

push time in 21 days

push event luizirber/phd

Luiz Irber

commit sha 6f11e468186aa9c312766b0564b22488a2804937

fix percentage

push time in 21 days

push event luizirber/phd

Luiz Irber

commit sha b5da4e98e175fc7c2f7d779cfdfa1b116a8d4d82

fix sig page

push time in 21 days

push event luizirber/phd

Luiz Irber

commit sha 00fd46b7cf472d34e374dca08dd65951ecb08489

upd table

push time in 22 days

push event luizirber/phd

Luiz Irber

commit sha 14b3681338c64ff3ab7a7b1a36c491d57b7805bf

add wort processed datasets analysis

Luiz Irber

commit sha a3d01181dc7bb318d42bc6e116a29a2405f9905b

upd

push time in 22 days

push event luizirber/phd

Luiz Irber

commit sha fcb2095209e91f7892f298de96206ac20a0b6634

mag search validation with mapping

push time in 22 days

push event luizirber/phd

Luiz Irber

commit sha d7c5a724c7e86057764ef956e772835c7dcebf86

new table

push time in 23 days

started dib-lab/charcoal

started time in 25 days

issue comment dib-lab/sourmash

Document everything in the Rust side

(this triggers 399 errors as of 6e639bd2292be8686f97a4a071d90c7abe41ac8a =P)

luizirber

comment created time in 25 days

issue opened dib-lab/sourmash

Document everything in the Rust side

We can selectively add #![deny(missing_docs)] to each module, or go all the way and add it to src/core/lib.rs and see everything break until all public items are documented.

Once the current items are documented, leave #![deny(missing_docs)] active and require docs for future changes.

(I'm assigning myself because the lack of docs is my fault)

created time in 25 days

Pull request review comment dib-lab/sourmash

[WIP] Replace mx by scaled

 impl KmerMinHash {
         self.check_compatible(other)?;
 
         let mut combined_mh = KmerMinHash::new(
-            self.num,
+            scaled_for_max_hash(self.max_hash),

Currently we have thousands of warnings in CI, triggered because we are still using .max_hash internally. Eventually (for 4.0) we want to avoid using/documenting/suggesting .max_hash and focus on .scaled instead, including in constructors (which is why we deprecated .max_hash in 3.5 and generate all these warnings).

But that's all in the Python side, and this PR is for changing the Rust side. So, you don't need to worry about it if you don't want to, but I thought it was relevant to keep in mind =]

xmnlab

comment created time in 25 days
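On the Python side, the conversion between the two parameters is roughly max_hash ≈ (2^64 - 1) / scaled. A minimal sketch, assuming unsigned 64-bit hashes and that scaled = 0 (or max_hash = 0) means "not a scaled sketch"; the helpers are hypothetical re-implementations named after the Rust call in the diff, not sourmash's code. The hypothetical `ScaledSketch` class shows one way to store only `scaled` while still exposing `max_hash` behind a `DeprecationWarning`.

```python
import warnings

MAX_HASH = 2**64 - 1  # assumption: hashes are unsigned 64-bit integers

def max_hash_for_scaled(scaled: int) -> int:
    """Largest hash kept for a given scaled value (0 means 'not scaled')."""
    if scaled == 0:
        return 0
    # Rounded integer division avoids float precision loss near 2**64.
    return (MAX_HASH + scaled // 2) // scaled

def scaled_for_max_hash(max_hash: int) -> int:
    """Inverse of max_hash_for_scaled (0 means 'not scaled')."""
    if max_hash == 0:
        return 0
    return (MAX_HASH + max_hash // 2) // max_hash

class ScaledSketch:
    """Hypothetical class: stores only `scaled`; `max_hash` is a deprecated derived view."""

    def __init__(self, scaled: int):
        self.scaled = scaled

    @property
    def max_hash(self) -> int:
        warnings.warn(
            "max_hash is deprecated; use scaled instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return max_hash_for_scaled(self.scaled)
```

Keeping `scaled` as the single source of truth means constructors only need to accept one parameter, and every remaining `.max_hash` access shows up in the warnings until it is migrated.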
