arxiv:2407.10424

CodeV: Empowering LLMs for Verilog Generation through Multi-Level Summarization

Published on Jul 15 · Submitted by yang-z on Jul 19

Abstract

The increasing complexity and high cost of modern processor design have led to a surge in demand for processor design automation. Instruction-tuned large language models (LLMs) have demonstrated remarkable performance in automatically generating code for general-purpose programming languages like Python. However, these methods fail on hardware description languages (HDLs) like Verilog due to the scarcity of high-quality instruction-tuning data; even advanced LLMs like GPT-3.5 exhibit limited performance on Verilog generation. Regarding this issue, we observe that (1) Verilog code collected from the real world is of higher quality than code generated by LLMs, and (2) LLMs like GPT-3.5 excel at summarizing Verilog code rather than generating it. Based on these observations, this paper introduces CodeV, a series of open-source instruction-tuned Verilog generation LLMs. Instead of generating descriptions first and then obtaining the corresponding code from advanced LLMs, we prompt the LLM with Verilog code and let it generate the corresponding natural-language description through multi-level summarization. Experimental results show that CodeV surpasses the previous open-source SOTA by a relative 14.4% (over BetterV on VerilogEval) and 11.3% (over RTLCoder on RTLLM), and also outperforms the previous commercial SOTA, GPT-4, by a relative 22.1% on VerilogEval.

Community

Paper author and submitter:

CodeV is a series of instruction-tuned Verilog generation LLMs that employ reverse instruction generation for dataset construction.
Our contributions:

  • Novel Dataset Construction: We propose an effective description-code dataset construction approach: we provide GPT-3.5 with Verilog code and have it summarize the code into corresponding descriptions in a multi-level manner (a minimal illustrative sketch follows this list). This method outperforms prior data-construction work on Verilog generation tasks.
  • SOTA Verilog Generation Models: Based on this method, we present a series of SOTA Verilog generation LLMs, namely CodeV. Among them, CodeV-CodeQwen achieves 77.6% pass@1 on the VerilogEval-machine benchmark and 53.2% on the VerilogEval-human benchmark, outperforming GPT-4 and the previous SOTA model BetterV. It also achieves a 93.1% syntax pass rate and a 55.2% function pass rate on the RTLLM benchmark, outperforming the previous SOTA model RTLCoder.
  • Open Source Contribution: We plan to open-source the series of CodeV models to support advancement and collaboration within the LLM, electronic design automation (EDA), and programming language communities.
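
The sketch below illustrates the reverse-instruction idea from the first bullet: real-world Verilog goes in, and an LLM produces a description via two summarization passes, yielding an instruction-tuning pair. It assumes the OpenAI Chat Completions API with "gpt-3.5-turbo" and uses illustrative prompts; the paper's exact prompts, summarization levels, and filtering steps are not reproduced here.

```python
# Hedged sketch of multi-level summarization for building
# (description, code) pairs from real-world Verilog.
# Assumptions: OpenAI Python SDK >= 1.0, model "gpt-3.5-turbo",
# and illustrative prompts -- not the paper's exact pipeline.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


def summarize_multilevel(verilog_code: str) -> str:
    # Level 1: detailed, port/logic-oriented explanation of the module.
    detailed = ask(
        "Explain what the following Verilog module does, "
        "covering its ports and internal logic:\n\n" + verilog_code
    )
    # Level 2: condense that explanation into a short, high-level
    # specification of the kind a user might actually write.
    high_level = ask(
        "Based on this explanation, write a concise natural-language "
        "specification requesting such a module:\n\n" + detailed
    )
    return high_level


def build_pair(verilog_code: str) -> dict:
    # The summary becomes the instruction; the real code is the response,
    # so the tuning data keeps the quality of human-written Verilog.
    return {
        "instruction": summarize_multilevel(verilog_code),
        "response": verilog_code,
    }
```

Running `build_pair` over a corpus of collected Verilog modules would yield description-code pairs suitable for instruction tuning, which is the role this construction plays in the paper.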

Hi,

Congrats on your work! I see you're planning to release the models.

Refer to this guide: https://huggingface.co/docs/hub/models-uploading. By including appropriate tags in the model card, people can find your work more easily!

Also, you can link your models to this paper; see here: https://huggingface.co/docs/hub/en/model-cards#linking-a-paper.

Let us also know if you plan to release datasets!

Let us know if you need any help :)

Niels
Open-source @ HF



Models citing this paper: 3
Datasets citing this paper: 0
Spaces citing this paper: 0
Collections including this paper: 6