neulab / code-bert-score

CodeBERTScore: an automatic metric for code generation, based on BERTScore

code score bert codebert bertscore code-bert-score code-bertscore codebertscore

Updated Mar 1, 2024
Jupyter Notebook

txsun1997 / Metric-Fairness

EMNLP'2022: BERTScore is Unfair: On Social Bias in Language Model-Based Metrics for Text Generation

natural-language-processing deep-learning text-generation fairness-ai fairness-ml pretrained-language-model bertscore metric-bias

Updated Oct 19, 2022
Jupyter Notebook

benitomartin / llm-observability-opik

LLM Evaluation and Observability System for Football Content

python mongodb pre-commit openai cosine-similarity evaluation-metrics comet-ml hallucination huggingface-transformers zenml bertscore

Updated Jun 19, 2025
Python

BramVanroy / mateo-demo

MAchine Translation Evaluation Online (MATEO)

machine-translation comet ter bleu clarin machine-translation-metrics streamlit bertscore bleurt machine-translation-evaluation chrf

Updated May 8, 2026
Python

srx7703 / multi-horizon-financial-llm

Domain-specialized Gemma 2 27B + Gemma 4 31B for SEC filings — fine-tuned on TPU v6e-8 with PyTorch/XLA FSDPv2, plus a Vertex AI Vector Search RAG demo (69 tickers × 381 filings). Same LoRA recipe, +3.5% / +5.8% BERTScore F1.

finance lora knowledge-distillation gemma tpu peft sec-filings rag streamlit pytorch-xla vertex-ai bertscore gemma-2 gemma-4

Updated Apr 26, 2026
Python

ntphuc149 / ViAG

ViAG: A Novel Framework for Fine-tuning Answer Generation models utilizing Encoder-Decoder and Decoder-only Transformer architectures

meteor question-answering bart llama rouge bleu-score encoder-decoder fine-tuning answer-generation t5 plms bartpho llm bertscore instruction-tuning qlora qwen decoder-only vit5

Updated May 26, 2025
Python

krish1925 / Persona-Chatbot-G28

Fine-tuning GPT-3.5 and Llama3 LLMs for enhanced persona consistency in chatbots using Google's Synthetic Persona Chat dataset

rouge-metric finetuning perplexity persona-chatbot bertscore gpt-3-5-turbo unsloth llama3

Updated May 3, 2025
Jupyter Notebook

ShayanSalehi81 / MedicalQuestionAnsweringSystem

Medical Question Answering System using the PubMed dataset.

natural-language-processing pubmed bert t5-model bertscore

Updated May 29, 2025
Jupyter Notebook

luizanisio / agent-orchestration-2026

Agent Orchestration - LLM for Legal Metadata Extraction: A Comparative Analysis of Efficiency and Precision (paper 161 PROPOR)

python data-science data-extraction slm rouge-metric llm bertscore agent-orchestration

Updated Apr 20, 2026
Python

haticeozbolat01 / Text-Summarization-How-to-Calculate-BertScore

About BertScore

python transformer bertscore

Updated Sep 26, 2023
Jupyter Notebook

soniatyburczy / llama2-qlora-sft-coverletter-project

Implementation of a task-specific QLoRA supervised fine-tuning pipeline for LLaMA-2-7B-Chat, developed for an independent study on structured cover letter generation.

nlp natural-language-processing transformers text-generation pytorch rouge lora model-evaluation fine-tuning peft huggingface llm bertscore supervised-finetuning qlora llama2 parameter-efficient-fine-tuning structured-generation

Updated Dec 13, 2025
Python

liux2 / BERT_score_T5

Experimenting with changing the loss function in T5 to BERTScore

bert webnlg t5 bertscore

Updated May 26, 2022
Jupyter Notebook

ddm06 / NLG-The-impact-of-data-quality-on-automatic-text-generation-from-RDF-data

The work presented was developed during the internship, as researchers in the field of Natural Language Generation, at the Insid&s Lab laboratory in Milan-Bicocca. The work carried out deals with the creation of a framework for the correct assessment of the impact of the quality of the input datasets on the quality of the text generated by the N…

python natural-language-processing deep-learning torch artificial-intelligence lstm rdf-triples natural-language-generation nlg rouge-metric bleu-score transformer-architecture bertscore

Updated Mar 30, 2026
Jupyter Notebook

TSS-sniper / Research-paper-Summarizer-with-Realtime-Eval

An LLM-powered application that summarizes scientific research papers, extracts tables, and provides real-time evaluation using BERTScore (F1) and ROUGE. Built using Meta’s LLaMA 3–8B via Groq, with table extraction powered by pdfplumber and pandas.

text-summarization rouge-metric research-paper llm bertscore meta-llama3

Updated Jul 23, 2025
Python

Anushkaghei / Hallucination-Detection-In-LLMs

Detecting and Mitigating Self-Contradictory Hallucinations in LLMs using a Multi-Agent System and Stepback Prompting

detection rouge multi-agent-systems mitigation bleu-score bertscore llms stepback-prompting

Updated May 30, 2024
Jupyter Notebook

gamzeakkurt / BART-Insights

A project for student essay analysis using NLP, ML, and generative AI. Essays are classified with models like Logistic Regression, KNN, SVM, XGBoost, and BERT, and conclusions are generated using BART and evaluated with ROUGE & BERTScore.

nlp machine-learning ai datascience bart rouge bert textanalysis textclassification bertscore generativeai

Updated Apr 3, 2026
Jupyter Notebook

LazerLambda / modern-bert-score

Re-implementation of BERTScore for evaluation of generated text, leveraging vLLM and SentenceTransformers.

nlp machine-learning ai metrics evaluation ml transformers sentence-transformers llm bertscore vllm

Updated Mar 23, 2026
Python

rmaacario / LLMs-vs.NMT-spatial-semantics-translation

Code and data from the master’s thesis “Decoding Spatial Semantics”. Analyzes and compares open-source LLMs and NMT systems in translating spatial prepositions from English to Brazilian Portuguese. Includes preprocessing scripts, datasets, and evaluation metrics.

python nlp deep-learning machine-translation comet transformers bleu-score bertscore

Updated Sep 15, 2024
Jupyter Notebook

a-iceberg / clustering_and_naming_categories

Summarization, clustering, and characterization of text categories using an LLM

python nlp data-science deep-learning clustering transformers openai data-analysis summarization gpt mssqlserver llm bertscore prompt-engineering

Updated Feb 26, 2025
Jupyter Notebook

SamiINReciept / Vector-Graph-RAG-Evaluation-Cybersecurity-Australian-SMEs

Benchmarking Vector RAG vs Graph RAG for cybersecurity question answering on Australian SMBs using LLaMA 3.1, Mistral 7B, and Qwen3 with ChromaDB and Neo4j

nlp neo4j evaluation transformers knowledge-graph cybersecurity rag vector-search huggingface bertscore chromadb mistral-7b graph-rag llama3 qwen3 australian-smes

Updated Apr 9, 2026
Python

bertscore

Here are 32 public repositories matching this topic...
