LLM Generated Text Detection

Was the essay written by an LLM or a human?

Natural Language

Is the text generated by an LLM?

Paste your content below, and we’ll tell you how likely it is that it was generated by an LLM.

Word Count

0/1000 words

How The Model Works

Training Data

The model was trained on approximately 1.2 million essays, split equally between two categories: AI-written and student-written. The data was tokenised with a character-level tokeniser with a vocabulary size of 30,000. You can visit the Kaggle notebook or the GitHub repository to see the exact datasets used for training.
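
As a rough illustration of the tokenisation step, here is a minimal sketch in plain Python. It is not the code used for this project: the helper names (build_vocab, encode), the reserved padding and unknown ids, and the choice to pad on the left are all assumptions made for the example. The vocabulary cap of 30,000 and the sequence length of 1000 come from the description on this page.

```python
from collections import Counter

PAD_ID = 0  # assumed id reserved for padding
UNK_ID = 1  # assumed id reserved for characters outside the vocabulary


def build_vocab(texts, vocab_size=30000):
    """Map the most frequent characters to integer ids, capped at vocab_size."""
    counts = Counter(ch for text in texts for ch in text)
    most_common = counts.most_common(vocab_size - 2)  # two ids are reserved
    return {ch: i + 2 for i, (ch, _) in enumerate(most_common)}


def encode(text, vocab, max_len=1000):
    """Turn text into exactly max_len ids, truncating or left-padding."""
    ids = [vocab.get(ch, UNK_ID) for ch in text[:max_len]]
    # Padding goes on the left so the final position always holds real text.
    return [PAD_ID] * (max_len - len(ids)) + ids


essays = ["The cat sat.", "A dog ran!"]  # toy stand-in for the training essays
vocab = build_vocab(essays)
ids = encode("The cat", vocab, max_len=10)
```

Every essay ends up as a list of the same length, which is what lets essays be batched together for a recurrent model.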

Model architecture

The architecture uses an LSTM, a type of recurrent neural network, for classification. The LSTM takes a sequence of 1000 tokens and produces 1000 outputs, one per token. The last of these outputs is passed through a linear layer and a sigmoid activation to produce the classification. To reduce overfitting, dropout of 0.3 is applied both inside the LSTM and after it.
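
The architecture described above could look roughly like the following sketch. It assumes PyTorch, which this page does not confirm, and the class name EssayClassifier, the embedding layer, the embedding size, the hidden size and the two stacked LSTM layers are all illustrative guesses. Two layers are assumed only because the dropout argument of PyTorch's nn.LSTM acts between stacked layers. The vocabulary size, sequence length and dropout rate of 0.3 are taken from the text.

```python
import torch
import torch.nn as nn


class EssayClassifier(nn.Module):
    """LSTM over token ids; the last time step feeds a linear + sigmoid head."""

    def __init__(self, vocab_size=30000, embed_dim=64, hidden_dim=128,
                 num_layers=2, dropout=0.3):
        super().__init__()
        # Embedding, embed_dim, hidden_dim and num_layers are assumptions.
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers,
                            dropout=dropout, batch_first=True)
        self.dropout = nn.Dropout(dropout)  # dropout after the LSTM
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)      # (batch, seq_len, embed_dim)
        outputs, _ = self.lstm(embedded)          # (batch, seq_len, hidden_dim)
        last = outputs[:, -1, :]                  # output at the final token
        logits = self.head(self.dropout(last))    # (batch, 1)
        return torch.sigmoid(logits).squeeze(-1)  # one probability per essay


model = EssayClassifier()
model.eval()  # disables dropout for inference
batch = torch.randint(1, 30000, (4, 1000))  # 4 random "essays" of 1000 tokens
with torch.no_grad():
    probs = model(batch)
```

Each value in probs is a number between 0 and 1. Which class the value 1 stands for (LLM or student) depends on how the labels were encoded, which is not stated here.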

Metrics

To evaluate the model, I used a test set comprising 10% of the entire dataset. The evaluation metrics were accuracy and F1-score. Accuracy gives an overview of the model's performance, while the F1-score takes into account the model's performance on both classes.
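
To make the two metrics concrete, here is a small plain-Python sketch that computes them on toy labels. The function names and the toy data are made up for illustration. Averaging the per-class F1 scores (macro averaging) is one assumption about what "both classes" means; the exact averaging used for the figures on this page is not stated.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_for_class(y_true, y_pred, label):
    """F1 score treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of the per-class F1 scores (assumed averaging)."""
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


# Toy labels: 1 = LLM-written, 0 = student-written (encoding is an assumption).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

acc = accuracy(y_true, y_pred)
f1_llm = f1_for_class(y_true, y_pred, 1)
f1_student = f1_for_class(y_true, y_pred, 0)
f1_macro = macro_f1(y_true, y_pred)
```

On this toy data, accuracy is 0.75, while the per-class F1 scores differ (about 0.67 for class 1 and 0.80 for class 0), which shows how F1 can reveal uneven performance that a single accuracy figure hides.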

94.5%

Test Accuracy

94.5%

Test F1-Score

Neural Nexus
