CodeBERTScore: an automatic metric for code generation, based on BERTScore
EMNLP'2022: BERTScore is Unfair: On Social Bias in Language Model-Based Metrics for Text Generation
LLM Evaluation and Observability System for Football Content
MAchine Translation Evaluation Online (MATEO)
Domain-specialized Gemma 2 27B + Gemma 4 31B for SEC filings — fine-tuned on TPU v6e-8 with PyTorch/XLA FSDPv2, plus a Vertex AI Vector Search RAG demo (69 tickers × 381 filings). Same LoRA recipe, +3.5% / +5.8% BERTScore F1.
ViAG: A Novel Framework for Fine-tuning Answer Generation Models Utilizing Encoder-Decoder and Decoder-only Transformer Architectures
Fine-tuning GPT-3.5 and Llama3 LLMs for enhanced persona consistency in chatbots using Google's Synthetic Persona Chat dataset
Medical Question Answering System using the PubMed dataset.
Agent Orchestration - LLM for Legal Metadata Extraction: A Comparative Analysis of Efficiency and Precision (paper 161 PROPOR)
Implementation of a task-specific QLoRA supervised fine-tuning pipeline for LLaMA-2-7B-Chat, developed for an independent study on structured cover letter generation.
This work was developed during an internship as researchers in Natural Language Generation at the Insid&s Lab in Milan-Bicocca. It builds a framework for assessing how the quality of the input datasets affects the quality of the text generated by the N…
An LLM-powered application that summarizes scientific research papers, extracts tables, and provides real-time evaluation using BERTScore (F1) and ROUGE. Built using Meta’s LLaMA 3–8B via Groq, with table extraction powered by pdfplumber and pandas.
Detecting and Mitigating Self-Contradictory Hallucinations in LLMs using a Multi-Agent System and Stepback Prompting
A project for student essay analysis using NLP, ML, and generative AI. Essays are classified with models like Logistic Regression, KNN, SVM, XGBoost, and BERT, and conclusions are generated using BART and evaluated with ROUGE & BERTScore.
Re-implementation of BERTScore for evaluation of generated text, leveraging vLLM and SentenceTransformers.
Code and data from the master’s thesis “Decoding Spatial Semantics”. Analyzes and compares open-source LLMs and NMT systems in translating spatial prepositions from English to Brazilian Portuguese. Includes preprocessing scripts, datasets, and evaluation metrics.
Summarization, clustering, and characterization of text categories using an LLM
Benchmarking Vector RAG vs Graph RAG for cybersecurity question answering on Australian SMBs using LLaMA 3.1, Mistral 7B, and Qwen3 with ChromaDB and Neo4j