djuna/Qwen2-4x1.5B

Qwen2-4x1.5B / mergekit_moe_config.yml

base_model: cognitivecomputations/dolphin-2.9.3-qwen2-1.5b
gate_mode: hidden  # initialize router gates from hidden-state representations of the prompts below
architecture: qwen  # emit a Qwen MoE-style model (routed experts plus a shared expert)
dtype: bfloat16
experts_per_token: 2  # number of routed experts active for each token
experts:
  - source_model: cognitivecomputations/dolphin-2.9.3-qwen2-1.5b
    positive_prompts:
      - "chat"
      - "explain"
      - "describe"
      - "define"
      - "help"
  - source_model: Replete-AI/Replete-Coder-Qwen2-1.5b
    positive_prompts:
      - "code"
      - "algorithm"
      - "programming"
      - "development"
      - "software"
      - "framework"
  - source_model: Qwen/Qwen2-1.5B-Instruct
    positive_prompts:
      - "assistant"
      - "translate"
      - "summarize"
      - "rewrite"
      - "multilingual"
  - source_model: macadeliccc/Samantha-Qwen2-1.5B
    positive_prompts:
      - "characters"
      - "scene"
      - "roleplay"
      - "writing"
      - "creative"
      - "acting"
shared_experts:
  - source_model: cognitivecomputations/dolphin-2.9.3-qwen2-1.5b
    positive_prompts: # required by Qwen MoE for "hidden" gate mode, otherwise not allowed
      - "chat"
      - "assistant"
    # (optional, but recommended:)
    residual_scale: 0.1