If you are wondering where this site's data comes from, please visit https://api.github.com/users/mmokrejs/events. GitMemory does not store any data; it only uses NGINX to cache data for a period of time. The idea behind GitMemory is simply to give users a better reading experience.

mmokrejs/abyss 0

:microscope: Assemble large genomes using short reads

mmokrejs/adapterremoval 0

AdapterRemoval v2 - rapid adapter trimming, identification, and read merging

mmokrejs/biopython 0

Official git repository for Biopython (converted from CVS)

mmokrejs/bwa 0

Burrows-Wheeler Aligner for pairwise alignment between DNA sequences

mmokrejs/canu 0

A single molecule sequence assembler for genomes large and small.

mmokrejs/cdhit 0

Automatically exported from code.google.com/p/cdhit

mmokrejs/Fastaq 0

Python3 scripts to manipulate FASTA and FASTQ files

mmokrejs/gentoo 0

The Gentoo ebuild repository mirror

mmokrejs/keepassxc 0

KeePassXC is a cross-platform community-driven port of the Windows application “Keepass Password Safe”.

mmokrejs/lib2bit 0

A C library for accessing 2bit files

issue opened gentoo/sci

sci-biology/tigmint: setup.py needs fix

Hi, would somebody have a look at why the package places the read_fasta.py module file into /usr/bin/ instead of the Python site-packages directory? Thank you

https://github.com/gentoo/sci/commit/6c45174a480dae72e147dbf764bda33b72404a56

created time in 5 hours

push event mmokrejs/tigmint

Martin Mokrejs

commit sha b6e8ca24d45938395355db4366105a07b7a94b4c

Add missing dependencies and clarify ONT/PB handling

view details

push time in 8 hours

issue opened bcgsc/tigmint

pigz may be better replaced by bgzip

pigz may be replaced by bgzip from the htslib package (http://www.htslib.org), which scales better:

https://github.com/samtools/samtools/issues/1318#issuecomment-703483014
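For illustration, a minimal round-trip showing the proposed swap. This assumes bgzip from htslib is on PATH; the file name is hypothetical, not from tigmint's Makefile:

```shell
# bgzip writes BGZF, a gzip-compatible format, so it can generally drop in
# where pigz/gzip are used today (assumption: bgzip is installed).
printf '>seq1\nACGT\n' > /tmp/demo.fa
bgzip -f /tmp/demo.fa          # compresses to /tmp/demo.fa.gz
bgzip -dc /tmp/demo.fa.gz      # decompresses to stdout
```

bgzip also takes a thread count, which is where the scaling claim comes in.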

created time in 8 hours

issue opened bcgsc/tigmint

samtools sort may be replaced by bamsort which scales better

The samtools sort may be replaced by bamsort from the biobambam2 package, which scales much better. See https://gitlab.com/german.tischler/biobambam2

created time in 8 hours

issue opened bcgsc/tigmint

Respect $TMPDIR as anticipated by the sort tool

The sort tool by default uses /tmp, which is typically very small and may not even be writable by users in cluster environments. If $TMPDIR is non-empty, the Makefile should pass its contents down to the sort command line.
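As a sketch, the requested change could look like the following (GNU sort's -T option; the variable name is hypothetical, not from tigmint's Makefile):

```shell
# Use $TMPDIR for sort's temporary files when set, falling back to /tmp.
SORT_TMP="${TMPDIR:-/tmp}"
printf 'gamma\nalpha\nbeta\n' | sort -T "$SORT_TMP"
```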

created time in 8 hours

push event mmokrejs/tigmint

Martin Mokrejs

commit sha df64c48e2850eb210c5dc09840f41a9b6fe25d1f

Add missing dependencies and clarify ONT/PB handling

Notes for future changes:
1. The Makefile should append '-T $TMPDIR' to all sort calls.
2. The 'samtools sort' may be replaced by bamsort from the biobambam2 package, which scales much better. See https://gitlab.com/german.tischler/biobambam2
3. pigz may be replaced by bgzip from the htslib package (http://www.htslib.org), which scales better: https://github.com/samtools/samtools/issues/1318#issuecomment-703483014

view details

push time in 8 hours

PR opened bcgsc/tigmint

Add missing dependencies and clarify ONT/PB handling

The Makefile should append '-T $TMPDIR' to all sort calls. (GNU sort's option is -T/--temporary-directory; there is no --tmpdir.)

The 'samtools sort' may be replaced by bamsort from biobambam2 package which scales much better. See https://gitlab.com/german.tischler/biobambam2

pigz may be replaced by bgzip from htslib package from http://www.htslib.org which scales better: https://github.com/samtools/samtools/issues/1318#issuecomment-703483014

+3 -3

0 comment

1 changed file

pr created time in 8 hours

create branch mmokrejs/tigmint

branch : improve_README

created branch time in 8 hours

fork mmokrejs/tigmint

⛓ Correct misassemblies using linked AND long reads

https://bcgsc.github.io/tigmint/

fork in 9 hours

issue comment bcgsc/tigmint

Tigmint only works when files are in working directory

Seems lots of users hit this problem (https://github.com/bcgsc/tigmint/issues/35 https://github.com/bcgsc/tigmint/issues/44 https://github.com/bcgsc/tigmint/issues/52). The make somehow constructs a Makefile target based on the input filename, and if there are directory separators in it, it chokes. Likewise, it chokes when I use a shell pipe to feed in a FASTA stream from multiple files:

tigmint-make ... reads=<(cat *.fmlrc2.fa)
make: *** No rule to make target `mygenome-long-scaffs.fa./dev/fd/63.cut500.as0.65.nm500.molecule.size2000.trim0.window1000.spanauto.breaktigs.fa', needed by `tigmint-long'.  Stop.

Maybe a sed call in a proper place could fix that.
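For illustration only (not tigmint's actual code), stripping the directory part before the name is interpolated into a make target could look like this; the path is hypothetical:

```shell
# Hypothetical fix sketch: reduce a reads path to its basename so the
# generated make target contains no directory separators.
reads=/data/run1/mygenome-long-reads.fa
reads_base=$(printf '%s\n' "$reads" | sed 's#.*/##')
echo "$reads_base"
```

This would not help the process-substitution case (/dev/fd/63), which would still need its own handling.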

lculibrk

comment created time in 2 days

issue opened bcgsc/btl_bloomfilter

btl_bloomfilter-1.2.1/work/btl_bloomfilter-1.2.1/test-driver: line 112: 26977 Aborted (core dumped)

>>> Test phase: sci-biology/btl_bloomfilter-1.2.1
Making check in Tests/AdHoc
make[1]: Entering directory '/var/tmp/portage/portage/sci-biology/btl_bloomfilter-1.2.1/work/btl_bloomfilter-1.2.1/Tests/AdHoc'
make  BloomFilterTests ParallelFilter
make[2]: Entering directory '/var/tmp/portage/portage/sci-biology/btl_bloomfilter-1.2.1/work/btl_bloomfilter-1.2.1/Tests/AdHoc'
x86_64-pc-linux-gnu-g++ -DHAVE_CONFIG_H -I. -I../..  -I../..  -Wall -Wextra -Werror -fopenmp -std=c++11 -O2 -pipe -march=native -ftree-vectorize -c -o BloomFilterTests-BloomFilterTests.o `test -f 'BloomFilterTests.cpp' || echo './'`BloomFilterTests.cpp
x86_64-pc-linux-gnu-g++ -Wall -Wextra -Werror -fopenmp -std=c++11 -O2 -pipe -march=native -ftree-vectorize  -Wl,-O1 -Wl,--as-needed -o BloomFilterTests BloomFilterTests-BloomFilterTests.o  
x86_64-pc-linux-gnu-g++ -DHAVE_CONFIG_H -I. -I../..  -I../..  -Wall -Wextra -Werror -fopenmp -std=c++11 -O2 -pipe -march=native -ftree-vectorize -c -o ParallelFilter-ParallelFilter.o `test -f 'ParallelFilter.cpp' || echo './'`ParallelFilter.cpp
x86_64-pc-linux-gnu-g++ -Wall -Wextra -Werror -fopenmp -std=c++11 -O2 -pipe -march=native -ftree-vectorize  -Wl,-O1 -Wl,--as-needed -o ParallelFilter ParallelFilter-ParallelFilter.o  
make[2]: Leaving directory '/var/tmp/portage/portage/sci-biology/btl_bloomfilter-1.2.1/work/btl_bloomfilter-1.2.1/Tests/AdHoc'
make  check-TESTS
make[2]: Entering directory '/var/tmp/portage/portage/sci-biology/btl_bloomfilter-1.2.1/work/btl_bloomfilter-1.2.1/Tests/AdHoc'
make[3]: Entering directory '/var/tmp/portage/portage/sci-biology/btl_bloomfilter-1.2.1/work/btl_bloomfilter-1.2.1/Tests/AdHoc'
../../test-driver: line 112: 26977 Aborted                 (core dumped) "$@" >> "$log_file" 2>&1
FAIL: BloomFilterTests
PASS: ParallelFilter
============================================================================
Testsuite summary for BLOOMFILTER 1.2.1
============================================================================
# TOTAL: 2
# PASS:  1
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0
============================================================================
See Tests/AdHoc/test-suite.log
Please report to cjustin@bcgsc.ca
============================================================================
make[3]: *** [Makefile:667: test-suite.log] Error 1
make[3]: Leaving directory '/var/tmp/portage/portage/sci-biology/btl_bloomfilter-1.2.1/work/btl_bloomfilter-1.2.1/Tests/AdHoc'
make[2]: *** [Makefile:775: check-TESTS] Error 2
make[2]: Leaving directory '/var/tmp/portage/portage/sci-biology/btl_bloomfilter-1.2.1/work/btl_bloomfilter-1.2.1/Tests/AdHoc'
make[1]: *** [Makefile:855: check-am] Error 2
make[1]: Leaving directory '/var/tmp/portage/portage/sci-biology/btl_bloomfilter-1.2.1/work/btl_bloomfilter-1.2.1/Tests/AdHoc'
make: *** [Makefile:469: check-recursive] Error 1
(gdb) where
#0  0x00007fae17fd4291 in raise () from /lib64/libc.so.6
#1  0x00007fae17fbe536 in abort () from /lib64/libc.so.6
#2  0x00007fae17fbe41f in __assert_fail_base.cold () from /lib64/libc.so.6
#3  0x00007fae17fcccf2 in __assert_fail () from /lib64/libc.so.6
#4  0x000055aaa4cbe543 in main () at BloomFilterTests.cpp:133
(gdb) bt full
#0  0x00007fae17fd4291 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007fae17fbe536 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x00007fae17fbe41f in __assert_fail_base.cold () from /lib64/libc.so.6
No symbol table info available.
#3  0x00007fae17fcccf2 in __assert_fail () from /lib64/libc.so.6
No symbol table info available.
#4  0x000055aaa4cbe543 in main () at BloomFilterTests.cpp:133
        memUsage = 6236
        filterSize = 1000000
        numHashes = 3
        k = 4
        seq = 0x55aaa4ce7b71 "ACGTAC"
        filter = {m_filter = 0x55aaa689f120 "", m_size = 1000000, m_sizeInBytes = 125000, m_hashNum = 3, m_kmerSize = 4, m_dFPR = 0, m_nEntry = 0, m_tEntry = 0}
        insertIt = {m_seq = {static npos = 18446744073709551615, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, 
              _M_p = 0x7ffd999efc50 "ACGTAC"}, _M_string_length = 6, {_M_local_buf = "ACGTAC\000\000\000\000\000\000\000\000\000", _M_allocated_capacity = 73947865891649}}, m_h = 3, 
          m_k = 4, m_hVec = 0x55aaa689d090, m_pos = 18446744073709551615, m_fhVal = 17023861349393640413, m_rhVal = 17023861349393640413}
        queryIt = {m_seq = {static npos = 18446744073709551615, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, 
              _M_p = 0x7ffd999efca0 "ACGTAC"}, _M_string_length = 6, {_M_local_buf = "ACGTAC\000\000\000\000\000\000\000\000\000", _M_allocated_capacity = 73947865891649}}, m_h = 3, 
          m_k = 4, m_hVec = 0x55aaa689d0b0, m_pos = 18446744073709551615, m_fhVal = 17023861349393640413, m_rhVal = 17023861349393640413}
        __PRETTY_FUNCTION__ = "int main()"
        filename = {static npos = 18446744073709551615, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, 
            _M_p = 0x55aaa689e0e0 "/tmp/bloomFilter.bf"}, _M_string_length = 19, {_M_local_buf = "\023", '\000' <repeats 14 times>, _M_allocated_capacity = 19}}
        ifile = {<std::basic_istream<char, std::char_traits<char> >> = {<std::basic_ios<char, std::char_traits<char> >> = {<std::ios_base> = {
                _vptr.ios_base = 0x7fae183a6e88 <vtable for std::basic_ifstream<char, std::char_traits<char> >+64>, static boolalpha = std::_S_boolalpha, static dec = std::_S_dec, 
                static fixed = std::_S_fixed, static hex = std::_S_hex, static internal = std::_S_internal, static left = std::_S_left, static oct = std::_S_oct, 
                static right = std::_S_right, static scientific = std::_S_scientific, static showbase = std::_S_showbase, static showpoint = std::_S_showpoint, 
                static showpos = std::_S_showpos, static skipws = std::_S_skipws, static unitbuf = std::_S_unitbuf, static uppercase = std::_S_uppercase, 
                static adjustfield = std::_S_adjustfield, static basefield = std::_S_basefield, static floatfield = std::_S_floatfield, static badbit = std::_S_badbit, 
                static eofbit = std::_S_eofbit, static failbit = std::_S_failbit, static goodbit = std::_S_goodbit, static app = std::_S_app, static ate = std::_S_ate, 
                static binary = std::_S_bin, static in = std::_S_in, static out = std::_S_out, static trunc = std::_S_trunc, static beg = std::_S_beg, static cur = std::_S_cur, 
                static end = std::_S_end, _M_precision = 6, _M_width = 0, _M_flags = 4098, _M_exception = std::_S_goodbit, _M_streambuf_state = std::_S_goodbit, _M_callbacks = 0x0, 
                _M_word_zero = {_M_pword = 0x0, _M_iword = 0}, _M_local_word = {{_M_pword = 0x0, _M_iword = 0}, {_M_pword = 0x0, _M_iword = 0}, {_M_pword = 0x0, _M_iword = 0}, {
                    _M_pword = 0x0, _M_iword = 0}, {_M_pword = 0x0, _M_iword = 0}, {_M_pword = 0x0, _M_iword = 0}, {_M_pword = 0x0, _M_iword = 0}, {_M_pword = 0x0, _M_iword = 0}}, 
                _M_word_size = 8, _M_word = 0x7ffd999efec0, _M_ios_locale = {static none = 0, static ctype = 1, static numeric = 2, static collate = 4, static time = 8, 
                  static monetary = 16, static messages = 32, static all = 63, _M_impl = 0x7fae183afd20 <(anonymous namespace)::c_locale_impl>}}, _M_tie = 0x0, _M_fill = 0 '\000', 
              _M_fill_init = false, _M_streambuf = 0x7ffd999efd90, _M_ctype = 0x7fae183af740 <(anonymous namespace)::ctype_c>, _M_num_put = 
    0x7fae183af6d0 <(anonymous namespace)::num_put_c>, _M_num_get = 0x7fae183af6e0 <(anonymous namespace)::num_get_c>}, 
            _vptr.basic_istream = 0x7fae183a6e60 <vtable for std::basic_ifstream<char, std::char_traits<char> >+24>, _M_gcount = 0}, 
          _M_filebuf = {<std::basic_streambuf<char, std::char_traits<char> >> = {_vptr.basic_streambuf = 0x7fae183a6d68 <vtable for std::basic_filebuf<char, std::char_traits<char> >+16>, 
              _M_in_beg = 0x0, _M_in_cur = 0x0, _M_in_end = 0x0, _M_out_beg = 0x0, _M_out_cur = 0x0, _M_out_end = 0x0, _M_buf_locale = {static none = 0, static ctype = 1, 
                static numeric = 2, static collate = 4, static time = 8, static monetary = 16, static messages = 32, static all = 63, 
                _M_impl = 0x7fae183afd20 <(anonymous namespace)::c_locale_impl>}}, _M_lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, 
                __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, _M_file = {_M_cfile = 0x0, _M_cfile_created = true}, _M_mode = 0, 
            _M_state_beg = {__count = 0, __value = {__wch = 0, __wchb = "\000\000\000"}}, _M_state_cur = {__count = 0, __value = {__wch = 0, __wchb = "\000\000\000"}}, _M_state_last = {
              __count = 0, __value = {__wch = 0, __wchb = "\000\000\000"}}, _M_buf = 0x0, _M_buf_size = 8192, _M_buf_allocated = false, _M_reading = false, _M_writing = false, 
            _M_pback = 0 '\000', _M_pback_cur_save = 0x0, _M_pback_end_save = 0x0, _M_pback_init = false, _M_codecvt = 0x7fae183af6b0 <(anonymous namespace)::codecvt_c>, 
            _M_ext_buf = 0x0, _M_ext_buf_size = 0, _M_ext_next = 0x0, _M_ext_end = 0x0}}
        headerEnd = {static npos = 18446744073709551615, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, 
            _M_p = 0x7ffd999efc10 "[HeaderEnd]"}, _M_string_length = 11, {_M_local_buf = "[HeaderEnd]\000\000\000\000", _M_allocated_capacity = 5004173617767204955}}
        line = {static npos = 18446744073709551615, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, 
            _M_p = 0x55aaa689e2d0 "[HeaderEnd]"}, _M_string_length = 11, {_M_local_buf = "<\000\000\000\000\000\000\000IcE\000\000\000\000", _M_allocated_capacity = 60}}
        headerEndCheck = true
        currPos = 169
        fileSize = 125169
        filter2 = {m_filter = 0x55aaa68bf9d0 "", m_size = 1000000, m_sizeInBytes = 125000, m_hashNum = 3, m_kmerSize = 4, m_dFPR = 0, m_nEntry = 0, m_tEntry = 0}
        queryIt2 = {m_seq = {static npos = 18446744073709551615, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, 
              _M_p = 0x7ffd999efcf0 "ACGTAC"}, _M_string_length = 6, {_M_local_buf = "ACGTAC\000\000\000\000\000\000\000\000\000", _M_allocated_capacity = 73947865891649}}, m_h = 3, 
          m_k = 4, m_hVec = 0x55aaa689e460, m_pos = 18446744073709551615, m_fhVal = 17023861349393640413, m_rhVal = 17023861349393640413}
        filter3 = 0x55aaa689e1e0
        tempMem = 332
(gdb) 

created time in 2 days

issue comment bcgsc/abyss

PathConsensus error and abyss crash

Hi @vlad0x00, the two patches still have NOT made it into abyss-2.3.2, a long time since 2.2.5 already.

https://github.com/gentoo/sci/commit/f22bb3be4e8f283831edc5f4c38a839442355644#diff-94cee5830c9137e6cd9652fdc7d6ae62f5df0296cde4e05511df65febe550680

https://github.com/gentoo/sci/commit/f22bb3be4e8f283831edc5f4c38a839442355644#diff-0331115e154d3d976b9f6a32f2c2abdf4fdb8d003d44e0e794a93620d0723727

lsterck

comment created time in 2 days

issue comment bcgsc/LINKS

Unbundling bloomfilter out of the LINKS distribution tarball and more cleanup

Hi René, I got back to this issue with the 1.8.7 release ... also the Perl tools scripts are missing from the current release:

$ find LINKS-1.8.7 -name \*.pl
LINKS-1.8.7/releases/links_v1.8.5/LINKS.pl
LINKS-1.8.7/releases/links_v1.8.5/tools/makeMPETOutput2EQUALfiles.pl
LINKS-1.8.7/releases/links_v1.8.5/tools/testBloom.pl
LINKS-1.8.7/releases/links_v1.8.5/tools/writeBloom.pl
LINKS-1.8.7/releases/links_v1.8.6/LINKS.pl
LINKS-1.8.7/releases/links_v1.8.6/tools/makeMPETOutput2EQUALfiles.pl
LINKS-1.8.7/releases/links_v1.8.6/tools/testBloom.pl
LINKS-1.8.7/releases/links_v1.8.6/tools/consolidateGraphs.pl
LINKS-1.8.7/releases/links_v1.8.6/tools/writeBloom.pl
LINKS-1.8.7/releases/links_v1.8.4/LINKS.pl
LINKS-1.8.7/releases/links_v1.8.4/tools/makeMPETOutput2EQUALfiles.pl
LINKS-1.8.7/releases/links_v1.8.4/tools/testBloom.pl
LINKS-1.8.7/releases/links_v1.8.4/tools/writeBloom.pl
LINKS-1.8.7/scaffoldsToAGP2.pl
$

It seems only LINKS-1.8.7/scaffoldsToAGP2.pl and LINKS-1.8.7/bin/LINKS are available for 1.8.7. I used https://github.com/bcgsc/LINKS/archive/refs/tags/v1.8.7.tar.gz as the source.

mmokrejs

comment created time in 2 days

issue comment systemd-cron/systemd-cron

postdrop: warning: mail_queue_enter: create file maildrop/887233.343246: Permission denied

I think the best solution might be to stop using DynamicUser, and instead use something like

[Service]
DynamicUser=no
User=_cron-failure
Group=systemd-journal

I ran into this issue too when migrating from OpenRC to systemd on Gentoo Linux. Which file are you speaking about here?

internethering

comment created time in 2 days

issue comment HudsonAlpha/rust-fmlrc

No error if '-k=21,33,41,59,79' cannot get parsed

fmlrc2 -t 16 -C 10 -k 21 33 41 59 79 comp_msbwt.npy input.fastq.gz output.fasta
...
error: The following required arguments were not provided:
    <COMP_MSBWT.NPY>
    <LONG_READS.FA>
    <CORRECTED_READS.FA>
mmokrejs

comment created time in 5 days

issue comment HudsonAlpha/rust-fmlrc

No error if '-k=21,33,41,59,79' cannot get parsed

Well, the primary error may be that the index creation failed:

+ grep -v '^>' L319_301_S9_L003.trimmomatic.tadpole.k62.shave.rinse.pairs.fasta
+ sort --parallel=16
+ ropebwt2 -LR
+ tr NT TN
+ tr NT TN
+ fmlrc2-convert comp_msbwt.npy
[2021-10-13T16:24:00Z INFO  fmlrc2_convert] Input parameters (required):
[2021-10-13T16:24:00Z INFO  fmlrc2_convert] 	Input BWT: "stdin"
[2021-10-13T16:24:00Z INFO  fmlrc2_convert] 	Output BWT: "comp_msbwt.npy"
sort: write failed: /tmp/sortJzvz0h: No space left on device
[M::main_ropebwt2] inserted 1 symbols in 0.002 sec, 0.001 CPU sec
[M::main_ropebwt2] constructed FM-index in 1940.004 sec, 0.001 CPU sec
[M::main_ropebwt2] symbol counts: ($, A, C, G, T, N) = (1, 0, 0, 0, 0, 0)
[M::main] Version: r187
[M::main] CMD: ropebwt2 -LR
[M::main] Real time: 1940.007 sec; CPU: 0.006 sec
[2021-10-13T16:56:20Z INFO  fmlrc::bwt_converter] Converted BWT with symbol counts: [1, 0, 0, 0, 0, 0]
[2021-10-13T16:56:20Z INFO  fmlrc::bwt_converter] RLE-BWT byte length: 1
[2021-10-13T16:56:20Z INFO  fmlrc2_convert] RLE-BWT conversion complete.
grep: write error: Broken pipe

giving me a comp_msbwt.npy that is only 97 bytes long.

Still, I think parsing the command line should raise an error.

mmokrejs

comment created time in 5 days

issue opened HudsonAlpha/rust-fmlrc

No error if '-k=21,33,41,59,79' cannot get parsed

Hi, luckily I realized that the output file is exactly the same as my input by comparing checksums. It appears to me the -k values were ignored, with (I assume) a spurious message output speaking about k-mer sizes 21 and 59 only:

fmlrc2 -t 16 -C 10 -k=21,33,41,59,79 comp_msbwt.npy input.fastq.gz output.fasta
[2021-10-13T20:22:51Z INFO  fmlrc2] Input parameters (required):
[2021-10-13T20:22:51Z INFO  fmlrc2] 	BWT: "comp_msbwt.npy"
[2021-10-13T20:22:51Z INFO  fmlrc2] 	Input reads: "input.fastq.gz"
[2021-10-13T20:22:51Z INFO  fmlrc2] 	Output corrected reads: "output.fasta"
[2021-10-13T20:22:51Z INFO  fmlrc2] Execution Parameters:
[2021-10-13T20:22:51Z INFO  fmlrc2] 	verbose: false
[2021-10-13T20:22:51Z INFO  fmlrc2] 	threads: 16
[2021-10-13T20:22:51Z INFO  fmlrc2] 	cache size: 10
[2021-10-13T20:22:51Z INFO  fmlrc2] Correction Parameters:
[2021-10-13T20:22:51Z INFO  fmlrc2] 	reads to correct: [0, 18446744073709551615)
[2021-10-13T20:22:51Z INFO  fmlrc2] 	k-mer sizes: [21, 59]
[2021-10-13T20:22:51Z INFO  fmlrc2] 	abs. mininimum count: 5
[2021-10-13T20:22:51Z INFO  fmlrc2] 	dyn. minimimum fraction: 0.1
[2021-10-13T20:22:51Z INFO  fmlrc2] 	branching factor: 4
[2021-10-13T20:22:51Z INFO  fmlrc::bv_bwt] Loading BWT with 1 compressed values
[2021-10-13T20:22:51Z INFO  fmlrc::bv_bwt] Loaded BWT with symbol counts: [1, 0, 0, 0, 0, 0]
[2021-10-13T20:22:51Z INFO  fmlrc::bv_bwt] Allocating binary vectors...
[2021-10-13T20:22:51Z INFO  fmlrc::bv_bwt] Calculating binary vectors...
[2021-10-13T20:22:51Z INFO  fmlrc::bv_bwt] Constructing FM-indices...
[2021-10-13T20:22:51Z INFO  fmlrc::bv_bwt] Building 10-mer cache...
[2021-10-13T20:22:52Z INFO  fmlrc::bv_bwt] Finished BWT initialization.
[2021-10-13T20:22:52Z INFO  fmlrc2] Starting read correction processes...
[2021-10-13T20:23:04Z INFO  fmlrc2] Processed 10000 reads...

By reading https://github.com/HudsonAlpha/rust-fmlrc/issues/7#issuecomment-761107717 I see the syntax should be different. Please improve the parsing or, at the very least, improve the README.md and give an example of how to use multiple k-mer sizes.
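A generic illustration of why the space-separated list misparses (plain shell getopts, not fmlrc2's actual Rust argument parser): a single-value option consumes only its first argument, and the remaining numbers are treated as positionals:

```shell
# Demo: after '-k 21', the values 33 41 59 79 become positional arguments.
parse() {
  OPTIND=1
  while getopts "k:" opt; do
    echo "k=$OPTARG"
  done
  shift $((OPTIND - 1))
  echo "positionals: $*"
}
parse -k 21 33 41 59 79
```

With fixed positional slots for the BWT and read files, the surplus numbers then shift everything over, which matches the "required arguments were not provided" error above.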

Could fmlrc2 output a summary of how many corrections it made per dataset?

created time in 5 days

issue comment HudsonAlpha/rust-fmlrc

fmlrc2: thread 'main' panicked at 'index out of bounds: the len is 0 but the index is ...

There is actually a more pressing issue: fmlrc2 overwrote the existing output file. Make the tool exit if the output file already exists.
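A minimal sketch of the requested guard, in shell (the real fix would live in fmlrc2's Rust code; the path here is hypothetical):

```shell
# Bail out instead of truncating an existing output file.
out=/tmp/corrected_reads.fa   # hypothetical output path
if [ -e "$out" ]; then
  echo "error: $out already exists; refusing to overwrite" >&2
  exit 1
fi
: > "$out"                    # only now create/open the output
```

Shell's noclobber mode (set -C) provides the same semantics for redirections.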

mmokrejs

comment created time in 6 days

issue comment HudsonAlpha/rust-fmlrc

fmlrc2: thread 'main' panicked at 'index out of bounds: the len is 0 but the index is ...

I thought I understood from the README.md that larger is better, so I asked for 400GB of RAM. But it seems there is no advantage in deriving the larger k-mers from the index? Would you mind improving the README.md to make it clearer why I should not even go for -C 100? Why not? I don't mind a longer build time if the results are better. If they won't be any better, then I am missing something. OK, this is a cache, but still, ...

Anyway, I think the app should not crash.

mmokrejs

comment created time in 6 days

issue opened HudsonAlpha/rust-fmlrc

fmlrc2: thread 'main' panicked at 'index out of bounds: the len is 0 but the index is ...

Hi, it seems the -C parameter is related to read length. But the reads were 125 nt in length, and even with -C 100 it crashes. -C 10 is fine. But why 10-mers? Is there any use for them? I use the current git master checkout.

fmlrc2 -t 16 -C 100 -k=21,59,79 comp_msbwt.npy CCS2.0000006678.fastq.gz L319_301_S9_L003.trimmomatic.tadpole.k62.shave.rinse.pairs.fasta
[2021-10-13T18:59:16Z INFO  fmlrc2] Input parameters (required):
[2021-10-13T18:59:16Z INFO  fmlrc2] 	BWT: "comp_msbwt.npy"
[2021-10-13T18:59:16Z INFO  fmlrc2] 	Input reads: "CCS2.0000006678.fastq.gz"
[2021-10-13T18:59:16Z INFO  fmlrc2] 	Output corrected reads: "L319_301_S9_L003.trimmomatic.tadpole.k62.shave.rinse.pairs.fasta"
[2021-10-13T18:59:16Z INFO  fmlrc2] Execution Parameters:
[2021-10-13T18:59:16Z INFO  fmlrc2] 	verbose: false
[2021-10-13T18:59:16Z INFO  fmlrc2] 	threads: 16
[2021-10-13T18:59:16Z INFO  fmlrc2] 	cache size: 100
[2021-10-13T18:59:16Z INFO  fmlrc2] Correction Parameters:
[2021-10-13T18:59:16Z INFO  fmlrc2] 	reads to correct: [0, 18446744073709551615)
[2021-10-13T18:59:16Z INFO  fmlrc2] 	k-mer sizes: [21, 59]
[2021-10-13T18:59:16Z INFO  fmlrc2] 	abs. mininimum count: 5
[2021-10-13T18:59:16Z INFO  fmlrc2] 	dyn. minimimum fraction: 0.1
[2021-10-13T18:59:16Z INFO  fmlrc2] 	branching factor: 4
[2021-10-13T18:59:16Z INFO  fmlrc::bv_bwt] Loading BWT with 1 compressed values
[2021-10-13T18:59:16Z INFO  fmlrc::bv_bwt] Loaded BWT with symbol counts: [1, 0, 0, 0, 0, 0]
[2021-10-13T18:59:16Z INFO  fmlrc::bv_bwt] Allocating binary vectors...
[2021-10-13T18:59:16Z INFO  fmlrc::bv_bwt] Calculating binary vectors...
[2021-10-13T18:59:16Z INFO  fmlrc::bv_bwt] Constructing FM-indices...
[2021-10-13T18:59:16Z INFO  fmlrc::bv_bwt] Building 100-mer cache...
thread 'main' panicked at 'index out of bounds: the len is 0 but the index is 3689348814741910323', src/bv_bwt.rs:400:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
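A plausible explanation, offered as an assumption rather than a confirmed fact about fmlrc2: if -C selects the length of the cached k-mers, the cache needs on the order of 4^C entries. That is about a million entries for C=10 but astronomically large for C=100, and computing such an index can overflow 64-bit arithmetic, which would be consistent with the huge out-of-bounds index in the panic:

```shell
# 4^10 is ~1M cache entries; 4^100 is ~1.6e60, far beyond any memory and
# beyond what a 64-bit index can even represent.
awk 'BEGIN {
  printf "4^10  = %d entries\n", 4^10
  printf "4^100 = %e entries\n", 4^100
}'
```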

created time in 6 days

pull request comment gentoo/sci

sci-biology/SEECER: EAPI bump, patch, remove files, fix build system

Thanks for your efforts, Lucas!

lucasmitrak

comment created time in 13 days

push event mmokrejs/transXpress-snakemake

Martin Mokrejs

commit sha 00142aeda9166b82dc94725c8ac3e53e682065fb

Instruct users how to verify their conda installation

Until one can run the whole pipeline, it is unclear whether all dependencies were properly installed. One can run a simple shell one-liner to check what is installed without needing to execute the pipeline.

view details

Martin Mokrejs

commit sha 3f7c5b7ee14c574aa4f33f24e25b5d0049845c72

Check that binaries needed for execution are in PATH

Since one may not need to run the whole pipeline, or may start from its later steps with pre-processed data, do not raise a hard error but issue a warning instead.

view details

Tomáš Pluskal

commit sha 2ce57d7eebd2fed33092ce2503120221f0090f9f

Update README.md

view details

Tomáš Pluskal

commit sha 00dcff2b3ff1c275754ec9006b2e574bc74ddae5

Update README.md

view details

Tomáš Pluskal

commit sha 49e2b1f3a34bab26e00f08688d763098f88468a1

Update README.md

view details

Martin Mokrejs

commit sha 1db3e5f81759fe617e6903d68c09522befc21b2d

Make sure --samples_file is provided to run_DE_analysis.pl and add tests

Keep edgeR being used in both cases for the moment. Add a few datasets to mimic biological replicates in our tests. The FASTQ files were derived from SRR2103848, which is another RNA-seq dataset from the firefly like in the existing testcase. Three biological replicates were mimicked by picking the first 5k reads, the second 5k of reads and the third 5k of reads.

The $num calculation fixed in Snakefile works now and can be printed to snakemake output using a shell echo command. STDERR output from run_DE_analysis.pl is stored in logs/trinity_DE.log, where eventually a message was reported when no --samples_file was provided:

note, no biological replicates identified, so setting min reps = $MIN_REPS

$ cat logs/trinity_DE.log
Got 5 samples, and got: 6 data fields.
Header: SRR2103848.repl1 SRR2103848.repl2 SRR2103848.repl3 SRR6345446.repl1 SRR6345446.repl2
Next: TRINITY_DN0_c0_g1 44.10 30.05 37.44 0.00 0.00
$VAR1 = { 'SRR6345446.repl2' => 5, 'SRR2103848.repl1' => 1, 'SRR6345446.repl1' => 4, 'SRR2103848.repl3' => 3, 'SRR2103848.repl2' => 2 };
$VAR1 = { 'SRR6345446' => [ 'SRR6345446.repl1', 'SRR6345446.repl2' ], 'SRR2103848' => [ 'SRR2103848.repl1', 'SRR2103848.repl2', 'SRR2103848.repl3' ] };
Contrasts to perform are: $VAR1 = [ [ 'SRR2103848', 'SRR6345446' ] ];
CMD: Rscript kallisto.gene.counts.matrix.SRR2103848_vs_SRR6345446.SRR2103848.vs.SRR6345446.EdgeR.Rscript
Loading required package: edgeR
Loading required package: limma
Using classic mode.
null device 1
$

view details

Martin Mokrejs

commit sha cb76d38ebd4347577135420ba6ed8756afd254bb

Rename num variable to num_replicates_minus_samples

Make it clearer what the variable contains. The idea is to detect whether any biological replicates were used at all. If there are 2 repl. + 3 repl., then the calculation is 2 + 3 - 2 = 3, and because 3 > 1 we let edgeR use the replicates and do not force the --dispersion argument. Likewise, 4 replicates + 5 replicates - 2 samples = 7. It might be preferable to require the number of replicates per each sample to be > 1, but we assume the downstream code in edgeR is smarter than us.

view details

Tereza Čalounová

commit sha d47d985ef81c82e87b78e240d6cd9f9476cdef1e

Fix comments

view details

Tomáš Pluskal

commit sha 264d732ee6655dbddd94d1ac94d00ed8c35bd74f

Merge pull request #47 from CalounovaT/commented

Fix comment style

view details

Martin Mokrejs

commit sha 3ed02635b9c85ec4b6f63bdea386f3b928d9f02e

Merge branch 'master' of github.com:mmokrejs/transXpress-snakemake into master

view details

Martin Mokrejs

commit sha aaa362259a5021e805e6458f1c0caf941427a112

Merge branch 'master' of https://github.com/transXpress/transXpress-snakemake into master

view details

Martin Mokrejs

commit sha d253c0e5c4975960bb83d65c17a98ac4e9d70bf2

Enable multiple k-mer assembly steps using rnaSPAdes

Use odd numbers for the k-mer steps of the assembly, up to about 2/3 of the input read length. The k-mer size is determined by getting average read lengths from the input FASTQ files. Keep trinity as the default, although spades should do a better job due to the multiple and longer k-mer sizes.

Add a comment explaining what is passed down to 'seqkit stats'. Make sure files describing the assembly document the assembler actually used. Refer to the SPAdes docs on how k-mer sizes are selected in steps of 22. Get the assembler variable parsed out by snakemake directly; note the missing single quotes around the variable wrapped by square brackets {config[foo]}. Thanks to @tomas-pluskal for the link to https://stackoverflow.com/questions/49140578/how-can-one-access-snakemake-config-variables-inside-shell-section

view details

push time in 15 days

push event mmokrejs/transXpress-snakemake

Martin Mokrejs

commit sha 00142aeda9166b82dc94725c8ac3e53e682065fb

Instruct users how to verify their conda installation

Until one can run the whole pipeline, it is unclear whether all dependencies were properly installed. One can run a simple shell one-liner to check what is installed without needing to execute the pipeline.

view details

Martin Mokrejs

commit sha 3f7c5b7ee14c574aa4f33f24e25b5d0049845c72

Check that binaries needed for execution are in PATH

Since one may not need to run the whole pipeline, or may start from its later steps with pre-processed data, do not raise a hard error but issue a warning instead.

view details

Tomáš Pluskal

commit sha 2ce57d7eebd2fed33092ce2503120221f0090f9f

Update README.md

view details

Tomáš Pluskal

commit sha 00dcff2b3ff1c275754ec9006b2e574bc74ddae5

Update README.md

view details

Tomáš Pluskal

commit sha 49e2b1f3a34bab26e00f08688d763098f88468a1

Update README.md

view details

Martin Mokrejs

commit sha 1db3e5f81759fe617e6903d68c09522befc21b2d

Make sure --samples_file is provided to run_DE_analysis.pl and add tests

Keep edgeR being used in both cases for the moment. Add a few datasets to mimic biological replicates in our tests. The FASTQ files were derived from SRR2103848, which is another RNA-seq dataset from the firefly like in the existing testcase. Three biological replicates were mimicked by picking the first 5k reads, the second 5k of reads and the third 5k of reads.

The $num calculation fixed in Snakefile works now and can be printed to snakemake output using a shell echo command. STDERR output from run_DE_analysis.pl is stored in logs/trinity_DE.log, where eventually a message was reported when no --samples_file was provided:

note, no biological replicates identified, so setting min reps = $MIN_REPS

$ cat logs/trinity_DE.log
Got 5 samples, and got: 6 data fields.
Header: SRR2103848.repl1 SRR2103848.repl2 SRR2103848.repl3 SRR6345446.repl1 SRR6345446.repl2
Next: TRINITY_DN0_c0_g1 44.10 30.05 37.44 0.00 0.00
$VAR1 = { 'SRR6345446.repl2' => 5, 'SRR2103848.repl1' => 1, 'SRR6345446.repl1' => 4, 'SRR2103848.repl3' => 3, 'SRR2103848.repl2' => 2 };
$VAR1 = { 'SRR6345446' => [ 'SRR6345446.repl1', 'SRR6345446.repl2' ], 'SRR2103848' => [ 'SRR2103848.repl1', 'SRR2103848.repl2', 'SRR2103848.repl3' ] };
Contrasts to perform are: $VAR1 = [ [ 'SRR2103848', 'SRR6345446' ] ];
CMD: Rscript kallisto.gene.counts.matrix.SRR2103848_vs_SRR6345446.SRR2103848.vs.SRR6345446.EdgeR.Rscript
Loading required package: edgeR
Loading required package: limma
Using classic mode.
null device 1
$

view details

Martin Mokrejs

commit sha cb76d38ebd4347577135420ba6ed8756afd254bb

Rename num variable to num_replicates_minus_samples

Make it clearer what the variable contains. The idea is to detect whether any biological replicates were used at all. If there are 2 + 3 replicates, the calculation is 2 + 3 - 2 = 3, and because 3 > 1 we let edgeR use the replicates and do not force the --dispersion argument. Likewise, 4 replicates + 5 replicates - 2 samples = 7. It might be preferable to require the number of replicates per sample to be > 1, but we assume the downstream code in edgeR is smarter than us.

view details
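The heuristic in the commit above can be sketched as follows; the function and parameter names are assumptions for illustration, not the pipeline's actual identifiers.

```python
def num_replicates_minus_samples(samples):
    """Sum replicate counts across samples and subtract the sample count.

    `samples` maps a sample name to its list of replicate names.
    A result > 1 means at least one sample carries biological
    replicates, so edgeR can estimate dispersion itself.
    """
    total_replicates = sum(len(reps) for reps in samples.values())
    return total_replicates - len(samples)

def has_replicates(samples):
    """True when --dispersion need not be forced on run_DE_analysis.pl."""
    return num_replicates_minus_samples(samples) > 1
```

With 2 + 3 replicates this yields 3, and with 4 + 5 replicates it yields 7, matching the worked examples in the commit message.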

Tereza Čalounová

commit sha d2e96a59bc42a590678bd25fa96b89a610bb798d

Add rules comments

view details

Tomáš Pluskal

commit sha 6d5753cd97671438237c6c429b02c2c7fc482e8d

Merge pull request #46 from CalounovaT/commented Add rules comments

view details

Tomáš Pluskal

commit sha 688e9ae2e8a259c1fde148b87504add1648685e8

Update Snakefile Avoid deleting the logo folder

view details

Tereza Čalounová

commit sha d47d985ef81c82e87b78e240d6cd9f9476cdef1e

Fix comments

view details

Tomáš Pluskal

commit sha 264d732ee6655dbddd94d1ac94d00ed8c35bd74f

Merge pull request #47 from CalounovaT/commented Fix comment style

view details

Martin Mokrejs

commit sha 3ed02635b9c85ec4b6f63bdea386f3b928d9f02e

Merge branch 'master' of github.com:mmokrejs/transXpress-snakemake into master

view details

Martin Mokrejs

commit sha aaa362259a5021e805e6458f1c0caf941427a112

Merge branch 'master' of https://github.com/transXpress/transXpress-snakemake into master

view details

push time in 15 days

pull request commenttransXpress/transXpress

Add rules comments

Did you test this @CalounovaT ?

Running the transXpress-trinity pipeline using snakemake
IndentationError in line 33 of /home/pluskal/proj/transXpress-snakemake/Snakefile:
unexpected indent

Are the empty lines surrounding the docstring allowed in the snakefile?

CalounovaT

comment created time in 25 days

pull request commenttransXpress/transXpress

Enable multiple k-mer assembly steps using rnaSPAdes

Sorry for the many force-pushes; it is now hopefully a single chunk with the tweaks applied.

mmokrejs

comment created time in 25 days

push eventmmokrejs/transXpress-snakemake

Martin Mokrejs

commit sha 609ca8a2772ed17a2bc25c61cc861b3ddf04e1c8

Enable multiple k-mer assembly steps using rnaSPAdes

Use odd numbers for k-mer steps of the assembly, up to about 2/3 of the input read length. The k-mer size is determined by getting the average read length from the input FASTQ files. Keep trinity as the default, although spades should do a better job thanks to the multiple and longer k-mer sizes. Add a comment explaining what is passed down to 'seqkit stats'. Make sure files describing the assembly document the assembler actually used. Refer to the SPAdes docs on how k-mer sizes are selected in steps of 22. Get the assembler variable parsed out by snakemake directly. Note the missing single quotes around the variable wrapped by square brackets: {config[foo]}. Thanks to @tomas-pluskal for the link to https://stackoverflow.com/questions/49140578/how-can-one-access-snakemake-config-variables-inside-shell-section

view details
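The k-mer selection described in the commit above could look roughly like this. It is a sketch under stated assumptions: the starting k-mer of 21 and the step of 22 follow the commit's reference to the SPAdes docs, but the actual values used by the pipeline may differ.

```python
def kmer_sizes(avg_read_length, k_min=21, step=22):
    """Pick odd k-mer sizes up to ~2/3 of the average read length.

    Starting from an odd k_min and stepping by an even amount keeps
    every generated size odd, as assemblers require.
    """
    k_max = int(avg_read_length * 2 / 3)
    sizes = []
    k = k_min
    while k <= k_max:
        # Guard against an even k if a caller passes an even k_min/step.
        sizes.append(k if k % 2 == 1 else k - 1)
        k += step
    return sizes
```

For typical 100 bp reads this yields 21, 43, 65; for 150 bp reads it extends to 87.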

push time in 25 days

push eventmmokrejs/transXpress-snakemake

Martin Mokrejs

commit sha a4d2ee2b7cfb2a4ac84d63cc0a295c5f9467bf8b

Add a comment explaining what is passed down to 'seqkit stats'

Make sure files describing the assembly document the assembler actually used. Refer to the SPAdes docs on how k-mer sizes are selected in steps of 22. Get the assembler variable parsed out by snakemake directly. Note the missing single quotes around the variable wrapped by square brackets: {config[foo]}. Thanks to @tomas-pluskal for the link to https://stackoverflow.com/questions/49140578/how-can-one-access-snakemake-config-variables-inside-shell-section

view details

push time in 25 days

push eventmmokrejs/transXpress-snakemake

Martin Mokrejs

commit sha 74fce8da7f8cc43a0e7786f1fe682eb46d9028b2

Get the assembler variable parsed out by snakemake directly

Note the missing single quotes around the variable wrapped by square brackets: {config[foo]}. Thanks to @tomas-pluskal for the link to https://stackoverflow.com/questions/49140578/how-can-one-access-snakemake-config-variables-inside-shell-section

view details
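The quoting detail in the commit above is worth a small illustration: Snakemake shell strings are formatted like Python format strings, where a dict key inside `{config[foo]}` is written without quotes, unlike ordinary Python indexing. Plain `str.format` shows the same behaviour; the `assembler` key below is a stand-in, not necessarily the pipeline's real config key.

```python
# Stand-in for the `config` object snakemake exposes to shell sections.
config = {"assembler": "rnaspades"}

# Note: the key `assembler` is NOT quoted inside the braces -- writing
# {config['assembler']} would raise a KeyError, because format-string
# item access treats everything between the brackets as a literal key.
cmd = "echo assembling with {config[assembler]}".format(config=config)
```

Inside a Snakefile the same pattern works directly in a rule's `shell:` string, since Snakemake applies the same formatting rules.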

push time in 25 days

push eventmmokrejs/transXpress-snakemake

Martin Mokrejs

commit sha a2319173261ec65ace727dd787472162cee13e40

Get the assembler variable parsed out by snakemake directly

Note the missing single quotes around the variable wrapped by square brackets: {config[foo]}. Thanks to @plusik for the link to https://stackoverflow.com/questions/49140578/how-can-one-access-snakemake-config-variables-inside-shell-section

view details

push time in 25 days

push eventmmokrejs/transXpress-snakemake

Martin Mokrejs

commit sha a1480fb9e6444269de8eb44274187d1135ca0840

Refer to SPAdes docs on how k-mer sizes selected in steps of 22

view details

push time in 25 days