
automl/learna 44

End-to-end RNA Design using deep reinforcement learning

Rungetf/HiCExplorer 0

HiCExplorer is a powerful and easy to use set of tools to process, normalize and visualize Hi-C data.

Rungetf/learna 0

End-to-end RNA Design using deep reinforcement learning

issue comment uci-cbcl/UFold

Many Errors in Data and Test pipeline

Hi, thanks for the fast reply and the fixes! I appreciate you linking the data, and I just tested with the files you provided. Everything works fine now, thanks.

Rungetf

comment created time in 2 months

issue opened uci-cbcl/UFold

Many Errors in Data and Test pipeline

Hi,

I wanted to run UFold on my own test data (the original TS1 data accompanying the SPOT-RNA publication). After putting the respective files in .bpseq format into the data/ folder, I tried running the process_data_newdataset.py script to produce the required cPickle files. I used the following command

python process_data_newdataset.py data/<respective_folder_with_bpseq_files>

which produces a NameError because one_hot_matrix is not defined. This results from an error in the awk call in line 55, which cannot find the files because the / between the folder and the file name is missing. Changing the command to

python process_data_newdataset.py data/<respective_folder_with_bpseq_files>/

however, fixes the NameError but results in a ValueError in the list comprehension in line 69. I finally changed line 55 from

t0 = subprocess.getstatusoutput('awk \'{print $2}\' '+file_dir+item_file)

to

t0 = subprocess.getstatusoutput('awk \'{print $2}\' '+file_dir + '/' +item_file)

and ran the call without the trailing /, which fixes the errors. However, I then got stuck at the pdb.set_trace() call in line 127. Unfortunately, removing this call leads to a FileNotFoundError because of a hard-coded path in the final cPickle dump, which also needs to be fixed (setting the path to file_dir + '.cPickle' causes no further errors and produces the desired file).
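A minimal sketch of how the path handling could be made independent of the trailing /, assuming each .bpseq line holds the value of interest in its second whitespace-separated column (as the awk call implies). The helpers read_bpseq_column and pickle_path_for are hypothetical names for illustration and are not part of UFold:

```python
import os


def read_bpseq_column(file_dir, item_file, column=1):
    """Return one whitespace-separated column (0-based) of a .bpseq file."""
    # os.path.join adds the separator only when it is missing, so
    # 'data/TS1' and 'data/TS1/' resolve to the same file.
    path = os.path.join(file_dir, item_file)
    values = []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            # Skip blank, too-short, and comment lines.
            if len(fields) <= column or fields[0].startswith('#'):
                continue
            values.append(fields[column])
    return values


def pickle_path_for(file_dir):
    """Derive the output path from the input folder instead of hard-coding it."""
    # normpath drops a trailing separator before the suffix is appended.
    return os.path.normpath(file_dir) + '.cPickle'
```

With something like this, calling the script with or without the trailing / would read the same files and write to the same output path.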

After that I tried running the ufold_test.py script to evaluate the performance of UFold on the produced data but ran into similar issues:

  1. Call stops at pdb.set_trace()
  2. Hard-coded model paths don't fit the provided models in the drive
    • The provided models are ufold_train_alldata.pt, ufold_train_pdbfinetune.pt, and ufold_train.pt
    • The code references unet_train_on_merge_alldata_98.pt and ufold_train_on_pdb_contrafold_pdbfinetune_99.pt in lines 229 and 231, respectively
  3. A ModuleNotFoundError when setting --nc True because e2efold cannot be found (import in line 25)

And maybe some more that I currently don't remember.
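For the model-path mismatch in particular, a fail-fast check would have made the problem obvious right away. The following is only a sketch: resolve_checkpoint is a hypothetical helper, and the folder layout (a single directory holding the downloaded .pt files) is an assumption on my side, not something taken from the UFold code:

```python
import os


def resolve_checkpoint(model_dir, filename):
    """Return the full checkpoint path, or raise with the available files listed."""
    path = os.path.join(model_dir, filename)
    if os.path.isfile(path):
        return path
    # List the .pt files that are actually present to make the mismatch visible.
    available = []
    if os.path.isdir(model_dir):
        available = sorted(f for f in os.listdir(model_dir) if f.endswith('.pt'))
    raise FileNotFoundError(
        "Checkpoint '%s' not found in '%s'. Available: %s"
        % (filename, model_dir, ', '.join(available) or 'none')
    )
```

Calling this before loading the weights would report the expected name next to the files that were actually downloaded.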

However, I finally managed to run the script on TS1, but the results with the provided models were very poor (F1 scores in the range of 3e-13). There is probably more that needs to be fixed that I'm not aware of yet.

After that, I switched to the Webserver but got empty files for download with the first two sample sequences I tested (both .ct and dot-bracket files; with and without non-canonical pairs).

From a user perspective, this was a very bad experience. I think code accompanying a NAR publication from 2021 should at least include example scripts that run out of the box.

Having said that, I'm looking forward to running your code once the issues have been fixed.

Best regards

created time in 2 months
