A diacritization model for the Arabic language, trained on Tashkeela, the Arabic diacritization corpus available on Kaggle.
Updated Sep 10, 2023 · Python
Official code for Group-Transformer (Scale down Transformer by Grouping Features for a Lightweight Character-level Language Model, COLING-2020).
A structured documentation hub for AI and ML concepts, based on Andrej Karpathy's 'Zero to Hero' series, featuring practical implementations and learning resources for language models and transformers.
Lyrics generation using an LSTM, word2vec analysis, and more.
Text article generator using a character-level LSTM network.
Build a character-level language model to generate new dinosaur names.
Sequence Models coding assignments
A causal intervention framework to learn robust and interpretable character representations inside subword-based language models
In this project, I worked with a small corpus consisting of simple sentences. I tokenized the words using n-grams from the NLTK library and performed word-level and character-level one-hot encoding. Additionally, I utilized the Keras Tokenizer to tokenize the sentences and implemented word embedding using the Embedding layer. For sentiment analysis
Notebooks for the programming assignments of the deeplearning.ai Sequence Models course on Coursera (May 2020).
An implementation of "Character-level Convolutional Networks for Text Classification" in TensorFlow. See https://arxiv.org/pdf/1509.01626.pdf.
Generates new sentences with an RNN that learns text at the character level. A collection of Shakespeare's works was used as training data.
Character-level GPT trained on Truyện Kiều (Nguyễn Du) — based on Karpathy's nanoGPT
Annotated study fork of Karpathy's nanoGPT: GPT-2 training from scratch, with extended notes on causal self-attention, positional encoding, layer norm placement, and efficient fine-tuning on custom datasets.
This repository contains the code and PLODv2 dataset to train character-level language models (CLM) for abbreviation and long-form detection released with our LREC-COLING 2024 publication
A character-level Hindi language model built from scratch in PyTorch using manual backpropagation and custom character embeddings.
Character-level and token-based language models implemented in pure PyTorch.
This repository contains the source code for our research on character-level models in Arabic Natural Language Processing (NLP).
This project is a minimal implementation of a character-level language model inspired by the makemore series by Andrej Karpathy. The goal is to build an intuitive understanding of how neural networks can learn patterns in text and generate new sequences—in this case, human-like names.
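As a rough illustration of how a character-level model learns patterns in text and generates name-like sequences, here is a minimal count-based bigram sketch. It is not taken from any repository listed here; the function names, the `.` boundary marker, and the tiny name list are assumptions made for the example.

```python
import random
from collections import defaultdict

BOUNDARY = "."  # assumed start/end marker; names must not contain it


def train_bigram(names):
    """Count how often each character follows another across all names."""
    counts = defaultdict(lambda: defaultdict(int))
    for name in names:
        chars = [BOUNDARY] + list(name) + [BOUNDARY]
        for prev, nxt in zip(chars, chars[1:]):
            counts[prev][nxt] += 1
    return counts


def sample_name(counts, rng, max_len=20):
    """Sample one character at a time, weighted by the bigram counts."""
    out = []
    ch = BOUNDARY
    while len(out) < max_len:
        followers = counts[ch]
        ch = rng.choices(list(followers.keys()),
                         weights=list(followers.values()), k=1)[0]
        if ch == BOUNDARY:  # the model chose to end the name
            break
        out.append(ch)
    return "".join(out)


# Hypothetical toy dataset, for illustration only.
names = ["emma", "olivia", "ava", "mia", "amelia"]
counts = train_bigram(names)
rng = random.Random(0)  # seeded so runs are repeatable
samples = [sample_name(counts, rng) for _ in range(5)]
print(samples)
```

A neural version replaces the count table with learned parameters, but the sampling loop keeps the same shape: predict a distribution over the next character, draw from it, and stop at the boundary marker.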