Conversation

@ThrudPrimrose (Collaborator)

I added an option to skip benchmarks if they already exist in npbench.db. This is, imho, extremely useful for checkpointing and repeatedly resubmitting the job.

If the option is set to True, the script reads the results table of the npbench.db database, and if it finds an entry for the (benchname, preset, framework) tuple, it removes that benchmark from the list to run. If the option is set to False, the behavior is the same as before.
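In concrete terms, the lookup could look like the following minimal sketch. The results table comes from the description above; the column names benchmark, preset, and framework are assumptions for illustration, not npbench's actual schema:

```python
import sqlite3

def filter_existing_benchmarks(benchmarks, preset, framework,
                               db_path="npbench.db"):
    """Return only benchmarks with no recorded result for
    (benchname, preset, framework). A sketch with assumed column names."""
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()
    cur.execute(
        "SELECT DISTINCT benchmark FROM results "
        "WHERE preset = ? AND framework = ?",
        (preset, framework),
    )
    measured = {row[0] for row in cur.fetchall()}
    conn.close()
    return [b for b in benchmarks if b not in measured]
```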

@ThrudPrimrose changed the title from "Skip Existing Benchmarks For Run Frameworks" to "Add Skip Existing Option For run_framework.py" on Jan 25, 2026.
@alexnick83 (Contributor) left a comment:

LGTM. Just a couple of questions below:

```python
    return all_benchmarks

# Query measured benchmarks
cur.execute("""
```
@alexnick83 (Contributor):

Would it make sense to also check whether sufficient executions have been recorded? To be clear, all executions for a specific benchmark (in a specific run) are recorded together. Therefore, a partial benchmark, e.g., with 5 out of 10 desired repetitions, is not possible. So, such a feature would only make sense if, in subsequent jobs, the number of repetitions was increased.

@ThrudPrimrose (Collaborator, Author):

This makes sense. I was just not sure how to detect whether all desired repetitions had been completed. We can assume that if a timeout occurs and the desired number of repetitions is R, then at any given time there should be at most two distinct repetition counts in the database: R itself and one integer smaller than R, but never three unique values.

I can check for this. What do you think?
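One possible shape for such a completeness check, assuming each execution is stored as one row in the results table (the column names are again hypothetical):

```python
# Hypothetical check: only skip a benchmark when the number of recorded
# executions for (benchname, preset, framework) has reached the desired
# repetition count R. Column names are illustrative, not npbench's schema.
cur.execute(
    "SELECT COUNT(*) FROM results "
    "WHERE benchmark = ? AND preset = ? AND framework = ?",
    (benchname, preset, framework),
)
(recorded,) = cur.fetchone()
skip = recorded >= desired_repetitions  # desired_repetitions == R
```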

```python
if args["skip_existing_benchmarks"]:
    benchname_to_shortname_mapping = dict()
    json_dir = bench_dir
    for json_file in json_dir.glob("*.json"):
```
@alexnick83 (Contributor):

Why don't you just get the benchmark names from line 132 here?

@ThrudPrimrose (Collaborator, Author):

I need to load the JSON files to get the short benchmark names, because the short name is what the database stores.

The JSON file names do not necessarily match the long benchmark names, so I have to build the mapping; building this dictionary requires loading all the JSONs and reading the "short_name" field anyway.
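A sketch of that mapping, assuming each benchmark JSON nests its metadata under a top-level "benchmark" object (the "short_name" field is the one named above; the nesting and the "name" field for the long name are assumptions):

```python
import json

benchname_to_shortname_mapping = dict()
for json_file in bench_dir.glob("*.json"):
    with open(json_file) as f:
        data = json.load(f)
    # Assumed layout: metadata under a "benchmark" key; fall back to the
    # top level if the file is flat.
    info = data.get("benchmark", data)
    # "name" (long name) -> "short_name" (what the database records).
    benchname_to_shortname_mapping[info["name"]] = info["short_name"]
```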
