Reproduce fine tuning but score poorly on the evaluation dataset #20

Hi,
Thanks to the authors for this contribution. I ran into a problem while reproducing it.
After reproducing the fine-tuning on my RTX 3090, I get poor scores on the evaluation datasets. What could be the cause?
My scores are as follows:
| Dataset   | AbsRel ↓ | Delta_1 ↑ |
|-----------|----------|-----------|
| NYUv2     | 0.056    | 0.963     |
| KITTI     | 0.092    | 0.928     |
| ETH3D     | 0.064    | 0.961     |
| ScanNet   | 0.062    | 0.956     |
| DIODE     | 0.299    | 0.780     |
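
For reference, the two columns follow the usual depth-estimation definitions: AbsRel is the mean of |pred - gt| / gt over valid pixels, and Delta_1 is the fraction of valid pixels where max(pred / gt, gt / pred) < 1.25. The snippet below is only a minimal NumPy sketch of those definitions, not the repository's evaluation code. The function names and the `gt > 0` validity mask are assumptions, and any alignment of predictions to ground truth that an evaluation pipeline may apply beforehand is left out.

```python
import numpy as np

def abs_rel(pred, gt, mask=None):
    """Mean absolute relative error over valid pixels (lower is better)."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if mask is None:
        mask = gt > 0  # assumption: non-positive ground truth marks invalid pixels
    return float(np.mean(np.abs(pred[mask] - gt[mask]) / gt[mask]))

def delta1(pred, gt, mask=None, threshold=1.25):
    """Fraction of valid pixels with max(pred/gt, gt/pred) < threshold (higher is better)."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if mask is None:
        mask = gt > 0  # assumption: same validity rule as above
    # Assumes predictions are strictly positive on the valid pixels.
    ratio = np.maximum(pred[mask] / gt[mask], gt[mask] / pred[mask])
    return float(np.mean(ratio < threshold))

# Toy values for illustration only, not real evaluation data.
gt = np.array([1.0, 2.0, 4.0])
pred = np.array([1.1, 2.0, 6.0])
print(abs_rel(pred, gt), delta1(pred, gt))
```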

Below is the training script configuration I used, based on `train_marigold_e2e_ft_depth.sh`:
Note: I modified the following parts:
- `--checkpointing_steps 500` => to store the best checkpoint
- `--dataloader_num_workers 4` => to speed up training
- `--mixed_precision "bf16"` => to reduce memory usage
- `--seed 1234` => to fix the seed


The complete script is as follows:
```
#!/bin/bash

accelerate launch training/train.py \
--pretrained_model_name_or_path "prs-eth/marigold-v1-0" \
--modality "depth" \
--noise_type "zeros" \
--max_train_steps 20000 \
--checkpointing_steps 500 \
--train_batch_size 2 \
--gradient_accumulation_steps 16 \
--gradient_checkpointing \
--learning_rate 3e-05 \
--lr_total_iter_length 20000 \
--lr_exp_warmup_steps 100 \
--dataloader_num_workers 4 \
--mixed_precision "bf16" \
--output_dir "model-finetuned/marigold_e2e_ft_depth_bf16" \
--enable_xformers_memory_efficient_attention \
--seed 1234 \
"$@"
```
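
As a quick sanity check on the configuration above, the effective batch size per optimizer step is the per-device batch size times the gradient accumulation steps times the number of processes. The sketch below is just that arithmetic. The helper name is hypothetical, and `num_processes=1` assumes a single-GPU run, which the script itself does not state.

```python
def effective_batch_size(per_device_batch, grad_accum_steps, num_processes=1):
    """Samples contributing to one optimizer step."""
    return per_device_batch * grad_accum_steps * num_processes

# Values taken from the script: --train_batch_size 2, --gradient_accumulation_steps 16.
# num_processes=1 is an assumption (one RTX 3090).
print(effective_batch_size(2, 16))  # 2 * 16 * 1 = 32
```

If the run used more than one process under `accelerate launch`, the result scales accordingly.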
