Tokenizer is different from cohere - and chat template is ChatML - fully fine-tuned at 128K+ ~ 30M entries long, web crawl input, GPT-4-32k/3.5-16k output, synthetic dataset - 1 epoch

For another candidate version of 1 epoch - https://huggingface.co/CausalLM/35b-beta - somehow less overfitting?

No loras, no quants, no tricks.

This one is not "very 128k", use https://huggingface.co/CausalLM/35b-beta-long for long context. But better in general tasks, knowledge, coding and so on.

And, merge them if you want!

Downloads last month: 31

Safetensors

Model size

35B params

Tensor type

BF16

Inference API

Text Generation

This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

CausalLM
/

35b-beta2ep

Datasets used to train CausalLM/35b-beta2ep

Collection including CausalLM/35b-beta2ep

34B & 35B