
Intuition for self.iter_size (or accumulate gradients) #4

@ghazni123

Description


I have skimmed through the papers but didn't find a detailed explanation of gradient accumulation. Please help me understand. The generally simplified training flow is:

predicted_output = model(input)
loss = loss_function(predicted_output, ground_truth)
optimizer.zero_grad()
loss.backward()
optimizer.step()

However, in the code, gradients are accumulated for 10 iterations and then reset. I am wondering what positive or negative impacts it would have if I:

1: reset the gradients on each iteration, along the lines of the general flow above
2: increased or decreased self.iter_size
3: added support for multi-batching and multi-GPU training

Many thanks.
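For reference, here is a minimal PyTorch sketch of the accumulation pattern being asked about. The model, data, and the iter_size value of 10 are hypothetical stand-ins, not taken from the repository's code; the point is only that loss.backward() adds into .grad, so optimizer.step() and zero_grad() run once per iter_size iterations, which approximates training with a batch iter_size times larger:

```python
import torch
from torch import nn

# Hypothetical tiny model and optimizer, just to make the sketch runnable.
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_function = nn.MSELoss()

iter_size = 10  # number of iterations over which gradients accumulate

optimizer.zero_grad()
for i in range(100):
    input = torch.randn(8, 4)          # dummy mini-batch
    ground_truth = torch.randn(8, 1)   # dummy targets
    predicted_output = model(input)
    # Scale the loss so the accumulated gradient is the average over
    # iter_size mini-batches rather than the sum.
    loss = loss_function(predicted_output, ground_truth) / iter_size
    loss.backward()  # gradients accumulate in each parameter's .grad
    if (i + 1) % iter_size == 0:
        optimizer.step()       # update using the accumulated gradient
        optimizer.zero_grad()  # reset only every iter_size iterations
```

Setting iter_size = 1 recovers the general flow above (reset every iteration); larger values trade update frequency for a smoother, larger effective batch, which is the usual motivation when GPU memory limits the per-iteration batch size.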
