Number of workers not matching number of nodes #26

@tornadoslims

+1 on building such an awesome product, guys. Here's an issue I've run into a couple of times:

If you hit an OOM or do something else that corrupts state, you can lose workers that don't come back after either of these calls:

bc.dask_client.restart() or client.restart()

This isn't a huge issue because it can be quickly fixed by stopping and starting the cluster, and if a 32-node cluster drops to 25 workers, everything still works.

The bigger issue: I just stopped and started a 128-node cluster and it came up with only 1 worker. Restarting the Dask client from within Python didn't help. I'm trying to reproduce it. I took some screenshots and kept the logs, and will send them over.
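
For reference, a minimal sketch of how the mismatch could be detected after a restart. It assumes the client object behaves like a dask.distributed Client, i.e. it has restart() and nthreads() (which returns one entry per connected worker). The helper names count_workers and restart_and_check are hypothetical, not part of Dask or this product.

```python
import time


def count_workers(client):
    """Return how many workers the scheduler currently reports.

    Assumes client.nthreads() returns a mapping with one entry per
    connected worker, as dask.distributed.Client does.
    """
    return len(client.nthreads())


def restart_and_check(client, expected_workers, timeout=60.0, poll_interval=1.0):
    """Restart the client, then poll until the expected worker count is
    reached or the timeout expires. Returns the last observed count.

    Hypothetical helper for illustration only.
    """
    client.restart()
    deadline = time.monotonic() + timeout
    while True:
        n = count_workers(client)
        if n >= expected_workers or time.monotonic() >= deadline:
            return n
        time.sleep(poll_interval)
```

Something like restart_and_check(bc.dask_client, 128) would return the number of workers that actually came back, so a result of 1 on a 128-node cluster could be caught and logged right away instead of being noticed later.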

JB
