Jaward Sesay

AI & ML interests

I like to train large deep neural nets too 🧠🤖💥 | First Paper (AutoAgents: A Framework for Automatic Agent Generation) Accepted @ IJCAI 2024 | Role Model Karpathy

Posts

Time and again, deeply nature-inspired AI has proven to be THE effective solution to complex AI problems. Liquid Time-Constant Networks (or LNNs) are a testament to this. Inspired by the dynamic nature of the human brain, neurons in LNNs can adapt their time constants, offering remarkable flexibility and context-aware information processing capabilities that surpass those of traditional Recurrent Neural Networks.

The paper proposed a new neural network architecture that uses dynamic time constants to improve how artificial neural networks process temporal data. The authors argued that traditional neural networks with fixed time constants are limited in their ability to process time-varying data and to perform tasks such as sequence learning and temporal credit assignment.

LTC networks are designed to address these limitations by allowing each neuron to adapt its time constant based on the input and context. This adaptability enables the network to process temporal information more effectively and learn long-term dependencies in data.
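
A minimal NumPy sketch of that idea, following the fused (semi-implicit) Euler update described in the paper. The shapes, the sigmoid gate, and all weight names here are illustrative simplifications, not the authors' reference implementation (linked below):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ltc_step(x, I, W_in, W_rec, b, A, tau, dt=0.1):
    """One fused Euler step of a Liquid Time-Constant cell.

    x   : (hidden,)  current hidden state
    I   : (inputs,)  current input
    A   : (hidden,)  learned bias vector the state is pulled toward
    tau : (hidden,)  base time constants (positive)

    The gate f depends on both the input and the state, so the effective
    time constant (1/tau + f) changes with context instead of being fixed.
    """
    f = sigmoid(W_in @ I + W_rec @ x + b)   # input- and state-dependent gate
    # Fused solver: x_next = (x + dt * f * A) / (1 + dt * (1/tau + f))
    return (x + dt * f * A) / (1.0 + dt * (1.0 / tau + f))

# Toy usage with random weights (illustrative only)
rng = np.random.default_rng(0)
hidden, inputs = 8, 3
x = np.zeros(hidden)
W_in  = rng.normal(size=(hidden, inputs))
W_rec = rng.normal(size=(hidden, hidden))
b, A  = rng.normal(size=hidden), rng.normal(size=hidden)
tau   = np.ones(hidden)
for t in range(5):
    x = ltc_step(x, rng.normal(size=inputs), W_in, W_rec, b, A, tau)
```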

Paper: https://arxiv.org/pdf/2006.04439
Code: https://github.com/raminmh/liquid_time_constant_networks

All You Need To Know About Apple Intelligence Architecture And Models!!

One key challenge with running LLMs on device is balancing compute, performance, and model size. Apple Intelligence addresses this with small, specialized modules (adapters) that plug into the on-device foundation model only when needed.
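
A rough sketch of that adapter idea in PyTorch terms. This is purely illustrative (the class, names, and sizes are made up, not Apple's implementation): a frozen base layer is shared across tasks, and a tiny rank-16 low-rank adapter is attached or swapped per task.

```python
import torch
import torch.nn as nn

class SwappableLoRALinear(nn.Module):
    """Frozen base linear layer plus a small low-rank adapter that can be
    swapped per task (writing, summarization, ...) without touching the
    shared base weights. Illustrative only."""

    def __init__(self, in_features, out_features, rank=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)   # shared, frozen base model
        self.rank = rank
        self.lora_A = None                       # (rank, in)  per-task weights
        self.lora_B = None                       # (out, rank) per-task weights

    def set_adapter(self, lora_A: torch.Tensor, lora_B: torch.Tensor):
        self.lora_A, self.lora_B = lora_A, lora_B   # load a tiny task adapter

    def forward(self, x):
        y = self.base(x)
        if self.lora_A is not None:
            # Low-rank delta: y += x @ (B A)^T
            y = y + x @ self.lora_A.t() @ self.lora_B.t()
        return y

# Swapping in a "writing" adapter at runtime (toy sizes)
layer = SwappableLoRALinear(64, 64, rank=16)
writing_A, writing_B = torch.randn(16, 64), torch.zeros(64, 16)
layer.set_adapter(writing_A, writing_B)
out = layer(torch.randn(1, 64))
```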

For compute and memory efficiency, they engineered a new framework that uses rank-16 LoRA adapters together with a mixed 2-bit and 4-bit quantization configuration averaging 3.5 bits per weight, while achieving the same accuracy as the uncompressed models.
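
As a back-of-the-envelope illustration of how a ~3.5 bit-per-weight average can fall out of a 2-bit/4-bit split (the exact per-layer mix is not public, so the 75/25 split below is just an assumption):

```python
# Average bits per weight for a mixed 2-bit / 4-bit quantization scheme.
def avg_bits(frac_4bit: float) -> float:
    return 4.0 * frac_4bit + 2.0 * (1.0 - frac_4bit)

print(avg_bits(0.75))   # 3.5 bits per weight when 75% of weights are 4-bit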

With the help of Talaria, an interactive model latency and power analysis tool, they were able to optimize the bit-rate selection for each operation. This, combined with activation and embedding quantization plus efficient key-value caching, achieves about 30 tokens/sec on iPhone 15 Pro.
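
To show what key-value caching buys during generation, here is a minimal greedy decode loop that reuses cached attention keys/values so each step only processes the newest token. It assumes a Hugging Face-style causal LM interface (`past_key_values`, `use_cache`), not Apple's on-device stack:

```python
import torch

def decode_with_kv_cache(model, prompt_ids, max_new_tokens=32):
    """Greedy decoding with a KV cache: keys/values for past tokens are
    computed once and reused, so each step feeds only the latest token."""
    past = None
    ids = prompt_ids
    next_input = prompt_ids
    for _ in range(max_new_tokens):
        out = model(input_ids=next_input, past_key_values=past, use_cache=True)
        past = out.past_key_values                    # reuse cached K/V next step
        next_token = out.logits[:, -1:].argmax(dim=-1)
        ids = torch.cat([ids, next_token], dim=-1)
        next_input = next_token                       # only the new token
    return ids
```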

When the model is prompted (e.g., to rewrite an email in the Mail app), the app draws on the App Intents toolbox, which routes the prompt to the adapter specialized for writing; the model then responds through the same pipeline, updating the text to be rewritten in real time.

The coolest feature of these models is their ability to adapt and dynamically specialize to users' everyday activities. To do this, they adapt the attention matrices, the attention projection matrix, and the fully connected layers in the point-wise feed-forward networks for a suitable subset of the decoding layers of the transformer architecture.
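
An open-source analogue of targeting those modules with rank-16 LoRA, using the PEFT library on a Llama-style checkpoint. The model, module names, and layer subset below are assumptions for illustration only, not Apple's model or training setup:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj",   # attention matrices
        "o_proj",                       # attention projection matrix
        "up_proj", "down_proj",         # point-wise feed-forward layers
    ],
    layers_to_transform=list(range(16, 22)),  # an arbitrary subset of decoder layers
)
model = get_peft_model(base, config)
model.print_trainable_parameters()
```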

For tasks that require more capable models, the architecture falls back to larger server models running on a Private Cloud Compute infrastructure designed to deliver a secure and verifiably private experience.

More on Private Cloud Compute: https://developer.apple.com/videos/play/wwdc2024/102/
