---
license: apache-2.0
pipeline_tag: image-classification
library_name: transformers
tags:
- image-detection
- ai-image-generation
- anime
- ai-anime
- human-detection
- art
---

# AI Anime Image Detector ViT

This model is a proof-of-concept detector for anime-style AI-generated images. It is based on the Vision Transformer (ViT) architecture and was trained on 1M human-made (real) and 217K AI-generated anime images. During training, both classes were sampled in equal proportion to avoid class bias. Training took about 40 hours (~35 epochs) on a single RTX 3090 GPU.
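The card does not say how the two classes were balanced despite the roughly 1M : 217K imbalance. One common approach, shown here as a minimal sketch, is to fill every batch half from the real pool and half from the AI pool, so the smaller AI pool is simply revisited more often. The helper `balanced_batches`, its parameters, and the label convention (0 = real, 1 = AI) are hypothetical and not taken from the original training code.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def balanced_batches(
    real_ids: Sequence[int],
    ai_ids: Sequence[int],
    batch_size: int,
    num_batches: int,
    seed: int = 0,
) -> Iterator[List[Tuple[int, int]]]:
    """Yield batches of (sample_id, label) pairs with equal real/AI counts.

    Hypothetical helper for illustration only. Labels: 0 = real, 1 = AI.
    """
    if batch_size % 2:
        raise ValueError("batch_size must be even to split the classes equally")
    if not real_ids or not ai_ids:
        raise ValueError("both pools must be non-empty")
    rng = random.Random(seed)
    half = batch_size // 2

    def endless(pool: Sequence[int]) -> Iterator[int]:
        # Reshuffle the pool after each full pass, so the smaller AI pool
        # is revisited more often than the larger real pool.
        items = list(pool)
        while True:
            rng.shuffle(items)
            yield from items

    real_stream, ai_stream = endless(real_ids), endless(ai_ids)
    for _ in range(num_batches):
        batch = [(next(real_stream), 0) for _ in range(half)]
        batch += [(next(ai_stream), 1) for _ in range(half)]
        rng.shuffle(batch)  # interleave the two classes within the batch
        yield batch


if __name__ == "__main__":
    # Toy pools with the card's 1M : 217K ratio, scaled down by 1000x
    real_pool = list(range(1000))
    ai_pool = list(range(1000, 1217))
    for batch in balanced_batches(real_pool, ai_pool, batch_size=8, num_batches=3):
        print(sum(label for _, label in batch))  # always 4 of the 8 samples are AI
```

An equivalent effect can be achieved with per-sample weights (e.g. a weighted random sampler); the fixed half-and-half split just makes the balance exact in every batch.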

## Evaluation

Each checkpoint was evaluated on 500 real and 500 AI-generated images.

- Training loss: 0.1009
- Eval loss: 0.1386
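Assuming the reported values are mean cross-entropy losses (the standard loss of `transformers` image-classification heads for single-label tasks; the card does not state the loss function), each one converts into the geometric-mean probability the model assigns to the correct class:

$$
\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\log p_i(y_i)
\quad\Longrightarrow\quad
\left(\prod_{i=1}^{N} p_i(y_i)\right)^{1/N} = e^{-\mathcal{L}}
$$

Under that assumption, $e^{-0.1009} \approx 0.904$ for training and $e^{-0.1386} \approx 0.871$ for evaluation, i.e. the model gives the true label roughly 87% probability on the evaluation images in the geometric-mean sense.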

Random crops seem to have helped the model generalize better. However, the training dataset contained only 512x512 images, so every cropped image had to be resized with bilinear interpolation. Training on 1024x1024 images could probably improve performance further.
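The crop-and-resize step described above can be sketched with Pillow. The card does not specify the crop range or the model's input size, so the hypothetical helper `random_crop_resize`, the assumed 224x224 target, and the crop sides between half and all of the shorter edge are illustrative choices only. The final bilinear resize is where the interpolation comes from.

```python
import random
from typing import Optional

from PIL import Image


def random_crop_resize(
    image: Image.Image,
    out_size: int = 224,
    min_frac: float = 0.5,
    rng: Optional[random.Random] = None,
) -> Image.Image:
    """Crop a random square region and resize it to out_size x out_size.

    Hypothetical parameters: min_frac is the smallest crop side as a fraction
    of the shorter image side; out_size stands in for the model input size.
    """
    rng = rng or random.Random()
    width, height = image.size
    short = min(width, height)
    side = rng.randint(int(short * min_frac), short)  # crop side length in pixels
    left = rng.randint(0, width - side)
    top = rng.randint(0, height - side)
    crop = image.crop((left, top, left + side, top + side))
    # Any side != out_size forces resampling. For a 512x512 source with these
    # defaults, side lies in [256, 512], so every crop is interpolated.
    return crop.resize((out_size, out_size), Image.BILINEAR)


if __name__ == "__main__":
    source = Image.new("RGB", (512, 512), (200, 120, 40))  # stand-in for a training image
    sample = random_crop_resize(source, rng=random.Random(0))
    print(sample.size)  # (224, 224)
```

With larger source images, crops could be taken closer to the target resolution, so less of each training sample would be synthesized by interpolation.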

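## Performance comparison

We ran a small comparison against currently available AI image detectors on 12 test images (6 AI-generated, 6 human-made); the file name prefix (`ai_` or `real_`) gives each image's ground-truth label. Note that these models were not specifically trained on anime images.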

| Image | Nahrawy/AIorNot | umm-maybe/AI-image-detector | Organika/sdxl-detector | Ours |
|---|---|---|---|---|
| `D:\test\ai_1.jpg` | ai (100%) | human (86%) | artificial (100%) | ai (100%) |
| `D:\test\ai_2.jpg` | ai (99%) | human (96%) | artificial (100%) | ai (100%) |
| `D:\test\ai_3.jpg` | ai (77%) | human (98%) | artificial (100%) | ai (100%) |
| `D:\test\ai_4.jpg` | real (66%) | human (100%) | human (100%) | real (100%) |
| `D:\test\ai_5.jpg` | ai (51%) | human (99%) | artificial (55%) | real (65%) |
| `D:\test\ai_6.jpg` | ai (100%) | human (98%) | artificial (100%) | ai (84%) |
| `D:\test\real_1.jpg` | ai (99%) | human (99%) | artificial (100%) | ai (55%) |
| `D:\test\real_2.jpg` | ai (88%) | human (100%) | artificial (100%) | real (85%) |
| `D:\test\real_3.jpg` | ai (95%) | human (96%) | artificial (100%) | real (97%) |
| `D:\test\real_4.jpg` | real (90%) | human (100%) | artificial (97%) | real (94%) |
| `D:\test\real_5.jpg` | ai (75%) | human (100%) | human (57%) | real (100%) |
| `D:\test\real_6.jpg` | ai (89%) | human (98%) | human (100%) | real (99%) |
| **Accuracy** | 50% | 50% | 58% | 75% |
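The accuracy row above can be reproduced from the table: the file name prefix gives the ground truth, and the detectors' differing label names (`ai`/`artificial` versus `real`/`human`) are first mapped onto two classes. The mapping `TO_CLASS` and the helper `accuracy` are illustrative and not part of any detector's API; the prediction lists are copied from the table rows in order.

```python
from typing import Dict, List

# Map each detector's label vocabulary onto two classes
TO_CLASS = {"ai": "ai", "artificial": "ai", "real": "human", "human": "human"}

IMAGES = ["ai_1", "ai_2", "ai_3", "ai_4", "ai_5", "ai_6",
          "real_1", "real_2", "real_3", "real_4", "real_5", "real_6"]

# Predicted labels per detector, in the same row order as the table
PREDICTIONS: Dict[str, List[str]] = {
    "Nahrawy/AIorNot": ["ai", "ai", "ai", "real", "ai", "ai",
                        "ai", "ai", "ai", "real", "ai", "ai"],
    "umm-maybe/AI-image-detector": ["human"] * 12,
    "Organika/sdxl-detector": ["artificial"] * 3 + ["human"] + ["artificial"] * 2
                              + ["artificial"] * 4 + ["human"] * 2,
    "Ours": ["ai", "ai", "ai", "real", "real", "ai",
             "ai", "real", "real", "real", "real", "real"],
}


def accuracy(images: List[str], predicted: List[str]) -> float:
    # Ground truth comes from the file name prefix: "ai_*" vs "real_*"
    truth = ["ai" if name.startswith("ai_") else "human" for name in images]
    correct = sum(TO_CLASS[p] == t for p, t in zip(predicted, truth))
    return correct / len(images)


for name, preds in PREDICTIONS.items():
    print(f"{name}: {accuracy(IMAGES, preds):.0%}")
# Nahrawy/AIorNot: 50%
# umm-maybe/AI-image-detector: 50%
# Organika/sdxl-detector: 58%
# Ours: 75%
```

With only 12 images, one extra correct prediction moves accuracy by about 8 percentage points, so these figures are indicative rather than conclusive.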


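## Usage

Example inference code:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

model_id = "legekka/AI-Anime-Image-Detector-ViT"

# Load the classifier and its matching preprocessing config
# (AutoImageProcessor is the current name for what AutoFeatureExtractor loaded for images)
model = AutoModelForImageClassification.from_pretrained(model_id)
processor = AutoImageProcessor.from_pretrained(model_id)
model.eval()  # inference mode: disables dropout

# Convert to RGB so grayscale or RGBA inputs (e.g. PNGs) have the expected 3 channels
image = Image.open("example.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# No gradients are needed for inference
with torch.no_grad():
    logits = model(**inputs).logits

# Class probabilities for the single image in the batch
probs = torch.softmax(logits, dim=-1)[0]
pred_idx = int(probs.argmax())
label = model.config.id2label[pred_idx]
confidence = probs[pred_idx].item()

print(f"Prediction: {label} ({round(confidence * 100)}%)")
```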