About the issue of training time #13

@Toughq

Hello author, thank you for your contribution. While reproducing your paper, I ran into an issue with excessively long training time. The paper states that BitDistiller completes the process in approximately 3 hours on a single A100-80G GPU, but in my experiments on Llama3 the results showed that it took thousands of hours. I used two A100-80G GPUs with a batch size of 8. Could you help me figure out where the problem lies? I look forward to your reply.
