codelion (Asankhaya Sharma)

posted an update 15 days ago

A new paper titled "STALL+: Boosting LLM-based Repository-level Code Completion with Static Analysis" shows the benefits of integrating static analysis with LLMs. (https://arxiv.org/abs/2406.10018)

Authors evaluate 4 key questions:

- How does each static analysis integration strategy perform in LLM-based repository-level code completion?
> They found that integrating static analysis in the prompting phase (especially with file-level dependencies) achieves substantially larger improvements than integrating it in the other phases.

- How do different combinations of integration strategies affect LLM-based repository-level code completion?
> Languages that are easier to analyze statically, like Java, show larger improvements than dynamic languages like Python.

- How do static analysis integration strategies perform when compared or combined with RAG in LLM-based repository-level code completion?
> Static analysis and RAG are complementary and boost the overall accuracy.

- What are the online costs of different integration strategies in LLM-based repository-level code completion?
> Combining prompting-phase static analysis and RAG is the best option for cost-effectiveness.
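For a concrete picture of "prompting-phase static analysis with file-level dependencies", here is a minimal Python sketch under our own assumptions; it is not the STALL+ implementation. It collects the modules a file imports using the standard `ast` module and prepends one-line summaries of those dependencies to the completion prompt. The `dep_summaries` dictionary is a hypothetical stand-in for whatever signatures a real tool would extract from the repository.

```python
import ast

def file_level_dependencies(source: str) -> list[str]:
    """Return module names imported by a Python file, in first-seen order."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # A file under completion is usually unfinished, so fall back to
        # parsing only its single-line import statements.
        header = "\n".join(
            line for line in source.splitlines()
            if line.startswith(("import ", "from "))
        )
        tree = ast.parse(header)
    deps: list[str] = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            deps.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            deps.append(node.module)
    return list(dict.fromkeys(deps))  # drop duplicates, keep order

def build_prompt(source: str, dep_summaries: dict[str, str]) -> str:
    """Prepend summaries of known dependencies to the code being completed."""
    context = [
        f"# {dep}: {dep_summaries[dep]}"
        for dep in file_level_dependencies(source)
        if dep in dep_summaries
    ]
    return "\n".join(context + [source])

if __name__ == "__main__":
    src = "import os\nfrom utils.db import connect\n\ndef load():\n    conn = connect("
    summaries = {"utils.db": "connect(url: str) -> Connection"}  # hypothetical summary
    print(build_prompt(src, summaries))
```

Each summary is kept to a single comment line so the extra context stays small, which matters when the per-request cost of the prompt is a concern.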

In my @owasp AppSec keynote last year, I described how one can use static analysis augmented generation (SaAG) to boost the accuracy of LLM-based patches for vulnerability remediation (you can see the talk here: https://www.youtube.com/watch?v=Cw4-ZnUNVLs).

posted an update 22 days ago

LLM-Assisted Patching of Polyfill Supply Chain Attack

A recent supply chain attack on polyfill.io affected over 100,000 websites (see https://www.patched.codes/blog/patching-the-polyfill-supply-chain-attack). To address this issue, we show how developers can leverage Large Language Models (LLMs) for efficient vulnerability patching:

1. Automated Detection: Using Semgrep rules (see https://semgrep.dev/playground/r/KxUvD7w/asankhaya_personal_org.polyfill-compromise-copy) to identify vulnerable code.

2. LLM-Powered Patching: Utilizing Patchwork (https://github.com/patched-codes/patchwork), an open-source solution that employs LLMs to automatically fix vulnerabilities.

3. Custom Workflows: The "Fixpolyfill" patchflow (https://github.com/patched-codes/patchwork-configs/tree/main/patchflows/Fixpolyfill), tailored for this specific attack, can easily be run across multiple repositories.

4. Scalable Solutions: Options to scan and patch entire GitHub/GitLab organizations, with automated pull request generation.

5. Rapid Response: LLM-assisted patching enables swift action to minimize damage from supply chain attacks.

This approach demonstrates how LLMs can be effectively used to quickly respond to and remediate widespread security vulnerabilities in code.
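Step 1 relies on a Semgrep rule. As a rough illustration of what such a check looks for, here is a minimal Python sketch (not the linked rule, and not part of Patchwork) that flags references to the polyfill.io CDN in HTML and JavaScript files. The regex and the list of file suffixes are our own assumptions; the real rule may match more patterns.

```python
import re
from pathlib import Path

# Matches references to the polyfill.io CDN, e.g. https://cdn.polyfill.io/v3/polyfill.min.js
POLYFILL_RE = re.compile(r"(?:https?:)?//(?:cdn\.)?polyfill\.io/[^\s\"'<>]*", re.IGNORECASE)

# File types to scan; an assumption, extend as needed.
SUFFIXES = (".html", ".htm", ".js", ".jsx", ".ts", ".tsx", ".vue")

def find_polyfill_refs(text: str) -> list[tuple[int, str]]:
    """Return (1-based line number, matched URL) for each polyfill.io reference."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in POLYFILL_RE.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits

def scan_repo(root: str) -> dict[str, list[tuple[int, str]]]:
    """Walk a checked-out repository and report files that load polyfill.io."""
    results = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in SUFFIXES:
            hits = find_polyfill_refs(path.read_text(errors="ignore"))
            if hits:
                results[str(path)] = hits
    return results
```

Calling `scan_repo("path/to/repo")` returns a mapping from file path to matches; in a workflow like the one above, each flagged file would then be handed to the LLM-powered patching step.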

posted an update about 1 month ago

The new Claude 3.5 Sonnet model from Anthropic has been getting good reviews since last night. It is quite good at coding-related tasks. We tried it on the Static Analysis Eval benchmark (patched-codes/static-analysis-eval), which measures the ability of an LLM to fix vulnerabilities. The model scores 59.21%, which is good but not better than other frontier models (like GPT-4, Gemini-1.5 and Llama-3).

posted an update about 1 month ago

Automatically generate docstrings for your code using LLMs. We just released a new patchflow that can generate docstrings - https://github.com/patched-codes/patchwork/tree/main/patchwork/patchflows/GenerateDocstring

Here is an example PR that does it - https://github.com/codelion/example-java-maven/pull/4

You can check out other patchflows for automating developer chores with patchwork: https://github.com/patched-codes/patchwork
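The first step of a docstring patchflow is deciding what needs a docstring. A minimal sketch, assuming Python sources and using only the standard `ast` module (this is not how the GenerateDocstring patchflow is implemented), finds functions and classes that lack one; each hit could then be sent to an LLM together with its source.

```python
import ast

DEF_NODES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)

def missing_docstrings(source: str) -> list[tuple[str, int]]:
    """Return (qualified name, line number) for each function or class without a docstring."""
    missing: list[tuple[str, int]] = []

    def visit(node: ast.AST, prefix: str) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, DEF_NODES):
                name = prefix + child.name
                if ast.get_docstring(child) is None:
                    missing.append((name, child.lineno))
                visit(child, name + ".")  # nested definitions get a dotted name
            else:
                visit(child, prefix)  # also look inside if/try/with blocks

    visit(ast.parse(source), "")
    return missing

if __name__ == "__main__":
    code = 'class Repo:\n    """Holds rows."""\n    def load(self):\n        return []\n'
    print(missing_docstrings(code))  # [('Repo.load', 3)]
```

Recursing into every child node, rather than only module and class bodies, also catches definitions nested inside conditional or `try` blocks, at the cost of a slightly longer walk on large files.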

posted an update about 2 months ago

WorkerSafetyQAEval: A new benchmark to evaluate question answering in the worker safety domain

Happy to share a new benchmark on question answering for the worker safety domain. The benchmark and leaderboard are available at codelion/worker-safety-qa-eval

We evaluate popular general-purpose chatbots like ChatGPT and HuggingChat on WorkerSafetyQAEval and compare them with a domain-specific RAG bot, Securade.ai Safety Copilot (codelion/safety-copilot). The results highlight the importance of domain-specific knowledge in critical domains like worker safety that require high accuracy. Securade.ai Safety Copilot achieves ~97% on the benchmark, setting a new SOTA.

You can read more about the Safety Copilot on https://securade.ai/blog/how-securade-ai-safety-copilot-transforms-worker-safety.html

replied to their post 2 months ago

You can already use it in patchwork by passing model=gemini-1.5-flash-latest on the CLI.

We have benchmarked the AutoFix patchflow in patchwork on a number of projects. For example, here are a few PRs, generated with different models, that fix vulnerabilities:

https://github.com/patched-codes/dvpwa/pulls

https://github.com/patched-codes/tarpit/pulls

https://github.com/patched-codes/shiftleft-java-demo/pulls

https://github.com/patched-codes/AltoroJ/pulls

https://github.com/patched-codes/pygoat/pulls

replied to their post 2 months ago

Here are the updated numbers with the new gemini-1.5-flash-latest model that was released at @goog1e IO - https://huggingface.co/posts/codelion/955796074731531

posted an update 2 months ago

After the announcements yesterday, I got a chance to try the new gemini-1.5-flash model from @goog1e. It is almost as good as gpt-4o on StaticAnalysisEval (patched-codes/static-analysis-eval). It is also a bit faster than gpt-4o and much cheaper.

I did run into a recitation flag on one example in the dataset, where the API refused to fix the vulnerability and flagged the input as copyrighted content. This flag cannot be unset even with the safety filters and seems to be an existing bug: https://issuetracker.google.com/issues/331677495

But overall you get gpt-4o-level performance for 7% of the price, so we are thinking of making it the default in patchwork (https://github.com/patched-codes/patchwork). You can use the google_api_key and model options to choose gemini-1.5-flash-latest when running patchwork.

replied to their post 2 months ago

At the moment we do not have any multimodal examples in the benchmark. The focus has been on vulnerability remediation, and I cannot think of a way to use multimodality in coding-related tasks. Do you have any ideas on how multimodality could be exploited for something like coding?

posted an update 2 months ago

The new gpt-4o model seems to be a very good coder. OpenAI reported a 90+ score on HumanEval (https://huggingface.co/datasets/openai_humaneval).

We tried the new model on our patched-codes/static-analysis-eval which evaluates the model on vulnerability remediation. gpt-4o has reclaimed the top spot on our leaderboard (from meta-llama/Meta-Llama-3-70B-Instruct).

You can now use the new model with our open-source framework PatchWork - https://github.com/patched-codes/patchwork by passing model=gpt-4o on the CLI.

replied to their post 3 months ago

Thank you!

replied to Jaward's post 3 months ago

Great, thanks! I would love to see the kind of output it produces directly. We have been trying to automate agentic workflows using an open-source framework called patchwork - https://github.com/patched-codes/patchwork

It is more deterministic, and we are focusing only on specific workflows, so we would love to compare it with something like Devin.

posted an update 3 months ago

Happy to announce patchwork, an open-source framework to turbocharge DevOps - https://github.com/patched-codes/patchwork

You can use it to build patchflows - workflows that use LLMs for software development tasks like bug fixing, pull request review, library migration and documentation.

It supports any LLM of your choice, including our own MoE model - patched-codes/patched-mix-4x7B

Give it a try!

replied to Jaward's post 3 months ago

Can you share the apps that it created?

posted an update 3 months ago

Meta's new Llama-3 (meta-llama/Meta-Llama-3-8B-Instruct) is an extremely capable model out of the box for coding-related tasks. It is the first model we have seen that beats GPT-4 on Static-Analysis-Eval (patched-codes/static-analysis-eval).

replied to WizardLM's post 3 months ago

The weights seem to have been taken down?

posted an update 3 months ago

We just released a new MoE model (meraGPT/mera-mix-4x7B) that is half as large as Mixtral-8x7B while still being competitive with it across different benchmarks. mera-mix-4x7B achieves 76.37 on the open LLM eval.

You can check mera-mix-4x7B out on HF here - meraGPT/mera-mix-4x7B
