5 Comments
Thasso Griebel
Hi Andrew,
Unfortunately, this is currently not possible. I created a ticket (BARNA-306) and will try to implement it for the next release.
-Thasso
Thasso Griebel
Hey Andrew,
I implemented the SEED feature. It's not released yet, but it is already in a nightly build. If you want, download the latest nightly from here:
http://sammeth.net/artifactory/barna-nightly/barna/barna.simulator/1.2.1-20130503160104/flux-simulator-1.2.1-20130503160104.tgz
and set the SEED parameter in your parameter file to a number > 0.
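For anyone scripting this, here is a minimal sketch of setting SEED in a parameter file. It assumes the file holds one `KEY<whitespace>VALUE` pair per line, which is an assumption about the format and not something confirmed in this thread. The helper name `set_seed` is hypothetical and is not part of Flux Simulator.

```python
import re


def set_seed(par_text: str, seed: int) -> str:
    """Return parameter-file text with the SEED line set to `seed`.

    Assumes one KEY<whitespace>VALUE pair per line (an assumption about
    the .PAR format). An existing SEED line is replaced; otherwise a new
    SEED line is appended at the end.
    """
    if seed <= 0:
        # The thread asks for "a number > 0".
        raise ValueError("SEED must be a number > 0")
    line = f"SEED\t{seed}"
    # Match a whole line starting with SEED followed by spaces or tabs.
    pattern = re.compile(r"^SEED[ \t]+.*$", flags=re.MULTILINE)
    if pattern.search(par_text):
        return pattern.sub(lambda _match: line, par_text, count=1)
    # No SEED line yet: append one, adding a newline first if needed.
    separator = "" if par_text == "" or par_text.endswith("\n") else "\n"
    return par_text + separator + line + "\n"
```

Read the .PAR file as text, pass it through `set_seed`, and write the result back before starting both runs, so they share the same seed.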
Andrew Tritt
Hey Thasso,
I downloaded the new build and ran some tests, but I am not getting reproducible results. I think some downstream steps are still not using the global seed.
I can reproduce the .PRO file when I run with --express only, but when I run the rest of the pipeline, the .PRO file gets modified.
Andrew
Thasso Griebel
Hey,
Yes, that's one issue at the moment: the .pro file gets updated. But I am not sure where the problem is. I ran some tests here and I am able to reproduce the same results, so maybe I am missing something. Could you lay out how you call the Simulator and when you would expect to get the same results? For my local tests, I simply rerun the full pipeline with the same parameters and the same SEED. Are you trying to reuse some of the files, like the .pro?
-Thasso
Andrew Tritt
I expect all the output files to be the same. If that's not possible, I expect at least the final FASTQ file to be the same.
To test, I ran Flux-Simulator twice with the same .PAR, .GTF, and FASTA sequence files, side by side on the same machine. I monitored the two runs and checked the md5sum of the .PRO file shortly after they started; both had the same hash. Once Flux-Simulator completed, I checked the md5sum of all the output files (.PRO, .LIB, .BED, and .FASTQ), and they all had different hashes.
I am using Flux-Simulator v1.2.1-20130503160104 (Flux Library: 1.22-20130503160104)
Is that the same version you're testing?
Andrew
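The md5sum check described in the last comment can be scripted. This is a minimal sketch, assuming each run wrote its outputs to its own directory under matching file names. The function names and the directory layout are hypothetical; only the file extensions come from the thread.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_runs(dir_a, dir_b, suffixes=(".pro", ".lib", ".bed", ".fastq")):
    """Compare output files of two runs by MD5.

    Returns a dict mapping each matching file name found in dir_a to
    True (identical), False (different), or None (missing in dir_b).
    The two-directory layout is an assumption for illustration.
    """
    results = {}
    for file_a in sorted(Path(dir_a).iterdir()):
        if not file_a.is_file() or file_a.suffix.lower() not in suffixes:
            continue
        file_b = Path(dir_b) / file_a.name
        if not file_b.is_file():
            results[file_a.name] = None
        else:
            results[file_a.name] = md5_of(file_a) == md5_of(file_b)
    return results
```

Calling `compare_runs("run1", "run2")` after both runs finish would show which output files diverge, and so which pipeline stage first stops honoring the seed.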