Add Skip Existing Option For run_framework.py #45
base: main
Conversation
alexnick83 left a comment
LGTM. Just a couple of questions below:
    return all_benchmarks
    ...
    # Query measured benchmarks
    cur.execute("""
Would it make sense to also check whether sufficient executions have been recorded? To be clear, all executions for a specific benchmark (in a specific run) are recorded together. Therefore, a partial benchmark, e.g., with 5 out of 10 desired repetitions, is not possible. So, such a feature would only make sense if, in subsequent jobs, the number of repetitions was increased.
This makes sense. I was just not sure how to detect whether all of the desired repetitions had been completed. If the desired repetition count is R, then at any given time a benchmark should have at most two distinct recorded counts in the database: R, or a single integer smaller than R when a timeout occurred, but never three unique values.
I can check for this. What do you think?
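Something along these lines is what I have in mind (a rough sketch; the results table and the benchmark/preset/framework column names are assumed here for illustration, not taken from the actual npbench.db schema):

```python
import sqlite3

def has_enough_repetitions(db_path, benchmark, preset, framework, desired_reps):
    """Return True if the results table already holds at least `desired_reps`
    recorded executions for this (benchmark, preset, framework) combination.
    Table and column names are assumed for illustration."""
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.cursor()
        cur.execute(
            "SELECT COUNT(*) FROM results "
            "WHERE benchmark = ? AND preset = ? AND framework = ?",
            (benchmark, preset, framework),
        )
        (count,) = cur.fetchone()
        return count >= desired_reps
    finally:
        conn.close()
```

A benchmark would then only be skipped when this returns True, so increasing the repetition count in a later job would re-run previously measured benchmarks.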
    if args["skip_existing_benchmarks"]:
        benchname_to_shortname_mapping = dict()
        json_dir = bench_dir
        for json_file in json_dir.glob("*.json"):
Why don't you just get the benchmark names from line 132 here?
I need to load the JSONs to get the short benchmark names, since that is what the database stores. The JSON file names do not necessarily match the long benchmark names, so I have to build the mapping by loading every JSON and reading its "short_name" field anyway.
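Roughly, the mapping is built like this (a minimal sketch; the exact layout of the bench_info JSONs, i.e. a top-level "benchmark" object with "name" and "short_name" keys, is assumed here):

```python
import json

# Map each benchmark's long name to the short name stored in the database.
# The "benchmark", "name", and "short_name" keys are assumed for illustration.
benchname_to_shortname_mapping = dict()
for json_file in bench_dir.glob("*.json"):
    with open(json_file) as f:
        info = json.load(f)["benchmark"]
    benchname_to_shortname_mapping[info["name"]] = info["short_name"]
```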
I added an option to skip benchmarks that already have results in npbench.db. This is, IMHO, extremely useful for checkpointing and for repeatedly submitting the job. If the option is set to True, the script reads the results table of the npbench.db database and, if it finds an entry for the (benchname, preset, framework) tuple, removes that benchmark from the list to run. If it is set to False, the behavior is the same as before.
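In essence, the check looks something like this (a simplified sketch; the actual column names of the results table may differ):

```python
import sqlite3

def filter_existing(benchmarks, db_path, preset, framework):
    """Drop every benchmark that already has an entry in the results table
    for this (benchmark, preset, framework) combination.
    Table and column names are assumed for illustration."""
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()
    remaining = []
    for bench in benchmarks:
        cur.execute(
            "SELECT 1 FROM results "
            "WHERE benchmark = ? AND preset = ? AND framework = ? LIMIT 1",
            (bench, preset, framework),
        )
        if cur.fetchone() is None:
            remaining.append(bench)
    conn.close()
    return remaining
```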