Q6_K response quality diverges (in a bad way)

#4
by thethinkmachine - opened

Either the issue is local to me, or there's something seriously wrong with the q6_k quant. I noticed the output quality diverges significantly (and in a bad sense). I asked a set of questions to f16.q4_k_l, q5_k_m, f16.q5_k_l and q6_k quants. I observed a steady increase in output quality with f16.q5_k_l leading the pack. However, the quality and accuracy rock bottomed with q6_k which is strange, especially given that q5_k_m's response lied somewhere in between f16.q4_k_l & f16.q5_k_l. q6_k couldn't get even the basic facts right. I'll attach my observations below.

PS: I am using Jan for local inferencing, with same parameter configuration for each quant.

f16.q4_k_l

{"id":"01J1Q9VKTM2GXJ492CSFS8KKX7","thread_id":"jan_1719842749","role":"user","status":"ready","created":1719842754388,"updated":1719842754388,"object":"thread.message","content":[{"type":"text","text":{"value":"Hi","annotations":[]}}]}
{"id":"01J1Q9ZMMGCDY912A4M5E44E99","thread_id":"jan_1719842749","type":"Thread","role":"assistant","content":[{"type":"text","text":{"value":"Hello! 👋 \n\nHow can I help you today?","annotations":[]}}],"status":"ready","created":1719842886288,"updated":1719842886288,"object":"thread.message"}
{"id":"01J1Q9ZVBRGY3Y45TTSVA0B7SB","thread_id":"jan_1719842749","role":"user","status":"ready","created":1719842893176,"updated":1719842893176,"object":"thread.message","content":[{"type":"text","text":{"value":"What can you do?","annotations":[]}}]}
{"id":"01J1Q9ZVCPQEXNW9JA0EF99N06","thread_id":"jan_1719842749","type":"Thread","role":"assistant","content":[{"type":"text","text":{"value":"I'm Gemma, an open-weights AI assistant. I'm good at many things, including:\n\n**Communication & Language:**\n\n* **Generating creative content:**  I can write stories, poems, articles, and even code!\n* **Answering your questions:** I have access to a vast store of knowledge and can provide informative answers.\n* **Summarizing text:**  Give me a long piece of writing, and I can condense it for you.\n* **Translating languages:** While I primarily communicate in English, I can translate between several languages.\n\n**Other Capabilities:**\n\n* **Brainstorming ideas:** Stuck on a project? I can help you generate new ideas.\n* **Having a conversation:**  I can chat with you about a wide range of topics.\n* **Learning and adapting:** I am constantly learning and improving based on the interactions I have.\n\n**Keep in mind:**\n\n* I am still under development and learning.\n* I don't have access to real-time information or the internet.\n* I am not a substitute for professional advice.\n\n\nWhat would you like to try?","annotations":[]}}],"status":"ready","created":1719842893206,"updated":1719842893206,"object":"thread.message"}
{"id":"01J1QA0VY2YF5J0XRGJZ8ZBH2Z","thread_id":"jan_1719842749","role":"user","status":"ready","created":1719842926530,"updated":1719842926530,"object":"thread.message","content":[{"type":"text","text":{"value":"What is the jacobian of the gradient of a multivariate scalar valued function?","annotations":[]}}]}
{"id":"01J1QA0VYRFFCAC7NPE2RYQT8S","thread_id":"jan_1719842749","type":"Thread","role":"assistant","content":[{"type":"text","text":{"value":"That's a great question that delves into the fascinating world of multivariable calculus! \n\nThe Jacobian of the gradient of a multivariate scalar-valued function is called the **Hessian matrix**. \n\nHere's a breakdown:\n\n* **Scalar-valued function:**  A function that takes multiple inputs (vectors) and produces a single output (a scalar).\n\n* **Gradient:** The gradient of a function is a vector containing all the partial derivatives of the function with respect to each of its input variables.\n\n* **Jacobian:**  The Jacobian matrix is a matrix containing all the partial derivatives of a vector-valued function with respect to its input variables.\n\n* **Hessian matrix:**  The Hessian matrix is the Jacobian of the gradient.  It's a square matrix where each entry is the second partial derivative of the function.\n\n**In essence, the Hessian matrix captures how the gradient of a function changes as you vary its input variables.** It's a crucial tool in optimization problems, as it tells us about the curvature of the function and helps us find local maxima and minima.\n\n\n\nLet me know if you'd like a more detailed explanation or an example!","annotations":[]}}],"status":"ready","created":1719842926552,"updated":1719842926552,"object":"thread.message"}
{"id":"01J1QA2N284AJWTCKN2H5MQFFP","thread_id":"jan_1719842749","role":"user","status":"ready","created":1719842985032,"updated":1719842985032,"object":"thread.message","content":[{"type":"text","text":{"value":"What do you know about the triple messenger event detected by LIGO-VIRGO?","annotations":[]}}]}
{"id":"01J1QA2N2WSZW16JWVYMT41A68","thread_id":"jan_1719842749","type":"Thread","role":"assistant","content":[{"type":"text","text":{"value":"You're talking about a truly remarkable event in astrophysics!  The triple messenger event detected by LIGO and Virgo in 2017 was a groundbreaking observation that provided unprecedented insight into the violent universe. \n\nHere's what we know:\n\n**The Event:**\n\n* **Date:** August 17, 2017\n* **Detection:**  This event was detected by both LIGO (Laser Interferometer Gravitational-Wave Observatory) detectors in the US and Virgo, located in Italy.\n* **Nature:** It involved the merging of two neutron stars (extremely dense remnants of collapsed massive stars) into one massive neutron star or possibly a black hole.\n\n**The Messengers:**\n\nThe event was truly \"triply\" detected because it emitted signals across three different messengers:\n\n1. **Gravitational waves:** These ripples in spacetime were detected by LIGO and Virgo. The waves carried information about the masses and spins of the merging neutron stars.\n2. **Electromagnetic radiation:** Telescopes around the world observed a bright flash of light from the merging neutron stars. This light spanned a wide range of wavelengths, from gamma rays to visible light to radio waves. This light came from the material ejected during the merger and its interaction with the surrounding environment.\n3. **Neutrinos:** Neutrinos, nearly massless particles, were also detected by the IceCube neutrino observatory in Antarctica. These neutrinos originated from the intense heat and energy released during the merger.\n\n**Significance:**\n\nThis triple messenger event was a monumental achievement in astronomy for several reasons:\n\n* **Confirmation of theoretical predictions:** It provided strong evidence for the existence of kilonovae, powerful explosions that occur when neutron stars merge.\n* **Insights into the origins of heavy elements:** The merger process is thought to be responsible for creating heavy elements like gold, platinum, and uranium.\n* **Multi-messenger astronomy:** It demonstrated the power of combining observations from different messengers to gain a more complete understanding of astrophysical events.\n\nThe study of this event continues to yield valuable insights into the extreme physics of neutron star mergers and their role in the universe.\n\n\n\nLet me know if you have any more questions about this incredible event!","annotations":[]}}],"status":"ready","created":1719842985052,"updated":1719842985052,"object":"thread.message"}
{"id":"01J1QA4H4FJQPMMPB0XFGS2286","thread_id":"jan_1719842749","role":"user","status":"ready","created":1719843046543,"updated":1719843046543,"object":"thread.message","content":[{"type":"text","text":{"value":"Give me the technical details of the event.","annotations":[]}}]}
{"id":"01J1QA4H531ZSMDYQX84A42KR3","thread_id":"jan_1719842749","type":"Thread","role":"assistant","content":[{"type":"text","text":{"value":"You're looking for the nitty-gritty details! Here's what we know about the technical aspects of the August 17, 2017, triple messenger event:\n\n**Gravitational Waves:**\n\n* **Signal:** Detected by both LIGO and Virgo detectors. It was classified as a GW170817 (the gravitational wave event of August 17, 2017).\n* **Peak Strain:** The amplitude of the gravitational wave signal was very small, around 10^-21.  This means the stretching and squeezing of spacetime caused by the merging neutron stars was incredibly minuscule.\n* **Chirp Mass:**  The chirp mass, a measure of the combined mass of the merging objects, was estimated to be 2.74 solar masses.\n* **Distance:**  The event occurred at a distance of approximately 130 million light-years away. This was determined using the gravitational wave signal's strength and the known properties of gravitational waves.\n\n**Electromagnetic Radiation:**\n\n* **First Detection:** The first electromagnetic counterpart was detected by the Fermi Gamma-ray Space Telescope about 2 seconds after the gravitational wave signal.\n* **Light Curve:** The electromagnetic light curve showed a rapid rise in brightness, followed by a gradual decline over several days.\n* **Spectral Evolution:** The spectrum of the light emitted evolved over time, shifting from gamma rays to X-rays to optical light and eventually radio waves.\n\n**Neutrinos:**\n\n* **Detection:** The IceCube neutrino observatory detected a neutrino signal consistent with the event.\n* **Energy:** The neutrinos had energies ranging from a few tens of MeV to a few GeV.\n* **Timing:** The neutrino signal arrived about 1.7 seconds after the gravitational wave signal. This small time delay is consistent with the time it takes for neutrinos to travel from the merger site to Earth.\n\n**Multi-Messenger Analysis:**\n\nThe combined data from the gravitational waves, electromagnetic radiation, and neutrinos allowed scientists to make precise measurements of the properties of the merging neutron stars, including their masses, spins, and the distance to the event. This provided crucial insights into the physics of these extreme objects and the processes that occur during their mergers.\n\nLet me know if you'd like to delve deeper into any specific aspect!","annotations":[]}}],"status":"ready","created":1719843046563,"updated":1719843046563,"object":"thread.message"}

q5_k_m - Rightly stands in between f16.q4_k_l & f16.q5_k_l.

{"id":"01J1QB8HCZA09BX0W6AG08NYH8","thread_id":"jan_1719843266","role":"user","status":"ready","created":1719844226463,"updated":1719844226463,"object":"thread.message","content":[{"type":"text","text":{"value":"Hi","annotations":[]}}]}
{"id":"01J1QBAM276B15R68VHA752SWG","thread_id":"jan_1719843266","type":"Thread","role":"assistant","content":[{"type":"text","text":{"value":"Hi there! 👋\n\nWhat can I do for you today?","annotations":[]}}],"status":"ready","created":1719844294727,"updated":1719844294727,"object":"thread.message"}
{"id":"01J1QBB0ZDDXT39BJP8RNH99Z4","thread_id":"jan_1719843266","role":"user","status":"ready","created":1719844307949,"updated":1719844307949,"object":"thread.message","content":[{"type":"text","text":{"value":"What can you do?","annotations":[]}}]}
{"id":"01J1QBB0ZSV4A14MQ4GP95KZ37","thread_id":"jan_1719843266","type":"Thread","role":"assistant","content":[{"type":"text","text":{"value":"I'm Gemma, an open-weights AI assistant. I'm trained on a massive amount of text data, which allows me to do many things, including:\n\n* **Generate creative content:**\n\nI can write stories, poems, articles, and even code!\n\n* **Answer your questions:** I can provide information on a wide range of topics, though my knowledge is limited to what I was trained on (up to a certain point in time).\n* **Summarize text:** Give me a long piece of writing and I can condense it into the key points.\n* **Translate languages:** I can translate text between many different languages.\n* **Engage in conversation:** I can chat with you about various topics and try my best to understand your requests.\n\n**Keep in mind:**\n\n* I don't have access to real-time information or the internet, so I can't provide information on current events or look up specific details for you.\n* I'm still under development and learning new things every day. I might not always be perfect, but I'll always do my best to help!\n\nWhat would you like me to do?","annotations":[]}}],"status":"ready","created":1719844307961,"updated":1719844307961,"object":"thread.message"}
{"id":"01J1QBBTRRJ4KEJ0PY5AW874JA","thread_id":"jan_1719843266","role":"user","status":"ready","created":1719844334360,"updated":1719844334360,"object":"thread.message","content":[{"type":"text","text":{"value":"What is the jacobian of the gradient of a multivariate scalar valued function?","annotations":[]}}]}
{"id":"01J1QBBTSBDGTNBVAXHTJC1K45","thread_id":"jan_1719843266","type":"Thread","role":"assistant","content":[{"type":"text","text":{"value":"That's a great question that delves into the heart of multivariable calculus! \n\nThe **Jacobian** of a function is a matrix of all its first-order partial derivatives.  \n\nThe **gradient** of a scalar-valued function is a vector containing all its first-order partial derivatives.\n\nTherefore, the Jacobian of the gradient of a multivariate scalar-valued function is a matrix whose entries are the second-order partial derivatives of the original function. \n\n**Here's a more formal way to express it:**\n\nLet  *f*: ℝ<sup>n</sup> → ℝ be a scalar-valued function of *n* variables. \n\n*  ∇*f* is the gradient of *f*, a vector in ℝ<sup>n</sup>.\n*  J(∇*f*) is the Jacobian of the gradient, a matrix in ℝ<sup>n×n</sup>.\n\nThe (i, j)-th entry of J(∇*f*) is given by ∂<sup>2</sup>*f*/∂*x<sub>i</sub>*∂*x<sub>j</sub>*, which is the second-order partial derivative of *f* with respect to *x<sub>i</sub>* and *x<sub>j</sub>*.\n\n**In essence, the Jacobian of the gradient is the Hessian matrix of the original function.**\n\n\nLet me know if you'd like to explore any specific aspect of this further!","annotations":[]}}],"status":"ready","created":1719844334379,"updated":1719844334379,"object":"thread.message"}
{"id":"01J1QBD4TPGEDGC1W0MWV5YSFG","thread_id":"jan_1719843266","role":"user","status":"ready","created":1719844377431,"updated":1719844377431,"object":"thread.message","content":[{"type":"text","text":{"value":"What do you know about the triple messenger event detected by LIGO-VIRGO?","annotations":[]}}]}
{"id":"01J1QBD4VCDWYT4B8DW1Q6WN0B","thread_id":"jan_1719843266","type":"Thread","role":"assistant","content":[{"type":"text","text":{"value":"You're talking about a truly remarkable event in astrophysics! \n\nThe triple messenger event detected by LIGO-Virgo in 2017 is a prime example of how different observatories can work together to paint a more complete picture of the universe. \n\nHere's what we know about it:\n\n* **The Event:** On August 17, 2017, the LIGO and Virgo gravitational-wave detectors simultaneously observed a gravitational wave signal. This signal was produced by the merger of two neutron stars.\n* **The Triple Messengers:** This event was special because it was not only detected through gravitational waves but also through electromagnetic radiation (light) and neutrinos. \n    * **Gravitational Waves:** Detected by LIGO and Virgo, these ripples in spacetime confirmed the merger of the neutron stars.\n    * **Light:** Telescopes around the world captured the afterglow of the merger, which was a brilliant burst of light across the electromagnetic spectrum, from radio waves to gamma rays.\n    * **Neutrinos:**  The IceCube Neutrino Observatory detected a high-energy neutrino signal consistent with the neutron star merger.\n* **Significance:** This \"triple messenger\" event was the first of its kind, providing unprecedented insights into the nature of these extreme cosmic events.\n\n**Key findings from the event include:**\n\n* **Kilonova:** The merger of neutron stars produced a kilonova, an incredibly luminous explosion that creates heavy elements like gold and platinum.\n* **Neutron Star Mergers as Cosmic Factories:** This event confirmed that neutron star mergers are a significant source of heavy elements in the universe.\n* **Multi-Messenger Astronomy:**  The success of observing this event through multiple channels demonstrated the power of multi-messenger astronomy, which combines data from different types of telescopes and detectors to gain a deeper understanding of the cosmos.\n\n\nLet me know if you have any more questions about this fascinating event!","annotations":[]}}],"status":"ready","created":1719844377452,"updated":1719844377452,"object":"thread.message"}
{"id":"01J1QBFWW8PJTBTR61J0M0ZMCF","thread_id":"jan_1719843266","role":"user","status":"ready","created":1719844467592,"updated":1719844467592,"object":"thread.message","content":[{"type":"text","text":{"value":"Give me the technical details of the event.","annotations":[]}}]}
{"id":"01J1QBFWX17DWF88YHE9QSKR2H","thread_id":"jan_1719843266","type":"Thread","role":"assistant","content":[{"type":"text","text":{"value":"You're looking for the nitty-gritty details! Here's what we know about the technical aspects of the GW170817 event:\n\n**Gravitational Wave Data:**\n\n* **Detector Network:**  Both LIGO detectors (Hanford and Livingston) and the Virgo detector in Italy observed the signal.\n* **Signal Characteristics:**\n    * **Peak Frequency:** ~400 Hz\n    * **Chirp Mass:** ~2.7 solar masses\n    * **Luminosity Distance:** ~40 Mpc (mega parsecs)\n    * **Signal Duration:** ~100 milliseconds\n* **Waveform:** The signal was consistent with a binary neutron star merger, showing a characteristic \"chirp\" as the stars spiraled towards each other and finally merged.\n\n**Electromagnetic Observations:**\n\n* **Multi-wavelength Observations:** Telescopes across the electromagnetic spectrum, from radio waves to gamma rays, observed the event's afterglow.\n* **Swift Telescope:**  The Swift satellite played a crucial role, quickly identifying the location of the event and detecting a gamma-ray burst (GRB) associated with it.\n* **Optical Afterglow:** Telescopes like the Hubble Space Telescope and ground-based observatories captured the fading optical light from the merger, revealing the kilonova phenomenon.\n* **Infrared and Radio Observations:** Infrared and radio telescopes observed the event's afterglow over longer timescales, providing information about the expanding debris and the formation of heavy elements.\n\n**Neutrino Observations:**\n\n* **IceCube Detector:** The IceCube Neutrino Observatory detected a burst of high-energy neutrinos coincident with the gravitational wave signal.\n* **Neutrino Energy:** The neutrinos had energies ranging from a few GeV to tens of TeV.\n* **Neutrino Spectrum:** The neutrino spectrum was consistent with the expectations from a neutron star merger, providing further evidence for the nature of the event.\n\n**Data Analysis:**\n\n* **Combined Analysis:** Data from the gravitational wave detectors, electromagnetic telescopes, and neutrino observatory were combined to create a comprehensive picture of the event.\n* **Parameter Estimation:** Sophisticated algorithms were used to extract information about the properties of the neutron stars, the merger process, and the resulting kilonova.\n\n**Ongoing Research:**\n\n* **Modeling the Kilonova:** Scientists are continuing to refine models of kilonovas to better understand how they produce heavy elements and how their light curves evolve.\n* **Probing Neutron Star Structure:** The gravitational wave signal provides valuable insights into the internal structure and properties of neutron stars.\n* **Multi-Messenger Astronomy:** This event has paved the way for future multi-messenger observations of other cosmic events, such as black hole mergers and supernovae.\n\n\n\nLet me know if you'd like more details on a specific aspect of the event!","annotations":[]}}],"status":"ready","created":1719844467617,"updated":1719844467617,"object":"thread.message"}

f16.q5_k_l (my fav one so far)

{"id":"01J1QBPKAHVBSWHAKWYD2EA7G6","thread_id":"jan_1719844530","role":"user","status":"ready","created":1719844687185,"updated":1719844687185,"object":"thread.message","content":[{"type":"text","text":{"value":"Hi","annotations":[]}}]}
{"id":"01J1QBT0X1VXSMBMFM5SVZ4W1A","thread_id":"jan_1719844530","type":"Thread","role":"assistant","content":[{"type":"text","text":{"value":"Hi! 👋  How can I help you today?","annotations":[]}}],"status":"ready","created":1719844799393,"updated":1719844799393,"object":"thread.message"}
{"id":"01J1QBTCKJGG8PK2RJT5T8W727","thread_id":"jan_1719844530","role":"user","status":"ready","created":1719844811378,"updated":1719844811378,"object":"thread.message","content":[{"type":"text","text":{"value":"What can you do?","annotations":[]}}]}
{"id":"01J1QBTCM0P8G3XWKD7AG34EP8","thread_id":"jan_1719844530","type":"Thread","role":"assistant","content":[{"type":"text","text":{"value":"I'm Gemma, an open-weights AI assistant. Here are some things I can do:\n\n**Communication and Language:**\n\n* **Generate creative content:**  I can write stories, poems, articles, and even code!\n* **Translate languages:** I can translate text between many different languages.\n* **Summarize text:** Give me a long piece of writing, and I can provide a concise summary.\n* **Answer your questions:** I can answer a wide range of questions based on the knowledge I was trained on.\n\n**Learning and Knowledge:**\n\n* **Provide information:** I can share facts and information on various topics.\n* **Help you brainstorm:** Stuck on a project? I can help you generate ideas.\n\n**Other:**\n\n* **Engage in conversation:** I can chat with you on a variety of topics.\n* **Play text-based games:** Let's have some fun!\n\n**Keep in mind:**\n\n* I am still under development and learning new things every day.\n* My knowledge is based on the data I was trained on, which has a cutoff point. I may not have information about very recent events.\n* I am not able to access the internet or use external tools.\n* I am here to help and provide information, but I cannot give medical, legal, or financial advice.\n\nWhat would you like to do?","annotations":[]}}],"status":"ready","created":1719844811392,"updated":1719844811392,"object":"thread.message"}
{"id":"01J1QBVPFP4DHQM8B4Q4AZV4V9","thread_id":"jan_1719844530","role":"user","status":"ready","created":1719844854263,"updated":1719844854263,"object":"thread.message","content":[{"type":"text","text":{"value":"What is the jacobian of the gradient of a multivariate scalar valued function?","annotations":[]}}]}
{"id":"01J1QBVPGCYC41QMCFZT1KRBHS","thread_id":"jan_1719844530","type":"Thread","role":"assistant","content":[{"type":"text","text":{"value":"That's a great question that delves into the heart of multivariable calculus!  \n\nThe Jacobian of the gradient of a multivariate scalar-valued function is called the **Hessian matrix**.  \n\nHere's a breakdown:\n\n* **Scalar-valued function:** A function that takes multiple inputs (variables) and produces a single output (a scalar).\n* **Gradient:** The gradient of a function is a vector containing the partial derivatives of the function with respect to each input variable.  \n* **Jacobian:** The Jacobian is a matrix of all the first-order partial derivatives of a vector-valued function. \n* **Hessian Matrix:** For a scalar-valued function, the Hessian matrix is the Jacobian of its gradient. In other words, it's a square matrix where each entry is the second-order partial derivative of the function.\n\n**Mathematical Representation:**\n\nLet's say you have a scalar-valued function *f(x₁, x₂, ..., xₙ)*. Then:\n\n* **Gradient of *f***:  ∇*f* = (∂*f*/∂*x₁*, ∂*f*/∂*x₂*, ..., ∂*f*/∂*xₙ*)\n* **Hessian matrix of *f***:  H(*f*) = ∇²*f* =  \n   \n   | ∂²*f*/∂*x₁²*   ∂²*f*/∂*x₁∂*x₂* ... ∂²*f*/∂*x₁∂*xₙ* |\n   | ∂²*f*/∂*x₂∂*x₁*   ∂²*f*/∂*x₂²* ... ∂²*f*/∂*x₂∂*xₙ* |\n   |  ...             ...     ...   ...  |\n   | ∂²*f*/∂*xₙ∂*x₁*   ∂²*f*/∂*xₙ∂*x₂* ... ∂²*f*/∂*xₙ²* |\n\n**Key Points:**\n\n* The Hessian matrix captures the curvature of a function at a given point. \n* It's used in optimization problems to find local minima or maxima.\n* The eigenvalues of the Hessian matrix provide information about the nature of the critical points (minima, maxima, or saddle points).\n\n\n\nLet me know if you have any more questions or would like to explore specific aspects of the Hessian matrix in more detail!","annotations":[]}}],"status":"ready","created":1719844854284,"updated":1719844854284,"object":"thread.message"}
{"id":"01J1QBY3MCXV6P15P4797CK4G1","thread_id":"jan_1719844530","role":"user","status":"ready","created":1719844933260,"updated":1719844933260,"object":"thread.message","content":[{"type":"text","text":{"value":"What do you know about the triple messenger event detected by LIGO-VIRGO?","annotations":[]}}]}
{"id":"01J1QBY3N2PJJ8NMPCP65ETEFG","thread_id":"jan_1719844530","type":"Thread","role":"assistant","content":[{"type":"text","text":{"value":"You're talking about a truly groundbreaking event in astrophysics! \n\nThe triple messenger event detected by LIGO-Virgo in 2017, officially known as **GW170817**, was the first time we observed a **gravitational wave event** *alongside* an **electromagnetic counterpart**.  This provided incredibly valuable information about **kilonova explosions**, the spectacular events that occur when two neutron stars merge.\n\nHere's a breakdown of what made this event so significant:\n\n* **Gravitational Waves:** Detected by both LIGO (Laser Interferometer Gravitational-Wave Observatory) and Virgo, these ripples in spacetime confirmed the merger of two neutron stars. This was the first time gravitational waves were observed from a binary neutron star merger.\n\n* **Electromagnetic Counterpart:**  \n\nShortly after the gravitational wave signal, astronomers observed a bright, fast-fading light source (a kilonova) in the galaxy NGC 4993. This provided a visual confirmation of the merger and allowed us to study the event across the electromagnetic spectrum.\n\n* **Multi-messenger Astronomy:**\n\nThe combination of gravitational wave and electromagnetic observations marked the dawn of a new era in astronomy – multi-messenger astronomy.  By observing events through multiple channels, we gain a much richer understanding of the universe.\n\n**Key Discoveries from GW170817:**\n\n* **Formation of Heavy Elements:** The kilonova explosion produced heavy elements like gold, platinum, and uranium, shedding light on how these elements are created in the universe.\n\n* **Neutron Star Properties:**\n\nThe gravitational wave signal provided detailed information about the masses and spins of the merging neutron stars.\n\n* **Cosmology:** The distance to the event was accurately measured, providing valuable data for cosmological studies.\n\nGW170817 was a monumental achievement that revolutionized our understanding of neutron star mergers and opened up exciting new possibilities for exploring the cosmos.","annotations":[]}}],"status":"ready","created":1719844933282,"updated":1719844933282,"object":"thread.message"}
{"id":"01J1QC0S3P90XR0NG5MQF52BT1","thread_id":"jan_1719844530","role":"user","status":"ready","created":1719845020790,"updated":1719845020790,"object":"thread.message","content":[{"type":"text","text":{"value":"Give me the technical details of the event.","annotations":[]}}]}
{"id":"01J1QC0S4KMPVX85W2HMVGKNVF","thread_id":"jan_1719844530","type":"Thread","role":"assistant","content":[{"type":"text","text":{"value":"You're right to ask for the technical details - they are fascinating! Here's what we know about GW170817:\n\n**Gravitational Wave Observations:**\n\n* **Detection Time:** August 17, 2017, at 12:41:04 UTC\n* **Sources:** Detected by both LIGO (Hanford and Livingston observatories) and Virgo.\n* **Signal Type:**  A continuous signal with a rising frequency characteristic of a merging binary neutron star system.\n* **Signal Parameters:**\n    * **Masses:**\n        * Primary neutron star: ~1.17 solar masses\n        * Secondary neutron star: ~1.07 solar masses\n    * **Distance:** ~130 million light-years\n    * **Spin:**  Both stars had relatively small spins.\n\n**Electromagnetic Observations:**\n\n* **First Observation:** Approximately 1.7 seconds after the gravitational wave signal, a transient optical event (kilonova) was detected by the  Swope Telescope at Las Campanas Observatory.\n* **Light Curve:**  The light curve of the kilonova exhibited a rapid rise followed by a slower decline, typical of these events.\n* **Spectrum:**  The spectrum showed strong emission lines from elements produced during the merger, particularly lanthanides and heavy elements.\n\n**Multi-Messenger Analysis:**\n\n* **Confirmation of Merger:** The simultaneous detection of gravitational waves and electromagnetic counterparts provided strong evidence for the merger of two neutron stars.\n* **Progenitor System:** The observed properties of the system (masses, spins, distance) allowed astronomers to constrain the evolutionary history of the neutron star progenitor system.\n\n**Key Implications:**\n\n* **Nucleosynthesis:** GW170817 provided strong evidence that kilonovae are the primary source of heavy elements in the universe.\n* **Cosmology:** The accurate distance measurement to the event provided valuable data for cosmological studies.\n* **Gravitational Wave Astronomy:** The event demonstrated the power of multi-messenger astronomy and opened up new avenues for studying the universe.\n\n\n\nLet me know if you have any more specific questions about the technical aspects of GW170817!","annotations":[]}}],"status":"ready","created":1719845020819,"updated":1719845020819,"object":"thread.message"}

q6_k - Observe how the quality and style of the output significantly diverge, almost as though it's intentionally obscuring the facts and figures.

{"id":"01J1QCJ02Z0T53C1Z2D25E3QXP","thread_id":"jan_1719845580","role":"user","status":"ready","created":1719845584991,"updated":1719845584991,"object":"thread.message","content":[{"type":"text","text":{"value":"Hi","annotations":[]}}]}
{"id":"01J1QCJ03EBZ1GJ7XFBJJC3HAH","thread_id":"jan_1719845580","type":"Thread","role":"assistant","content":[{"type":"text","text":{"value":"Hi! How are you doing today?","annotations":[]}}],"status":"ready","created":1719845585006,"updated":1719845585006,"object":"thread.message"}
{"id":"01J1QCJBJ4N253MRW1V89CD28V","thread_id":"jan_1719845580","role":"user","status":"ready","created":1719845596740,"updated":1719845596740,"object":"thread.message","content":[{"type":"text","text":{"value":"What can you do?","annotations":[]}}]}
{"id":"01J1QCJBJMBFCRKWY0AWCN66WZ","thread_id":"jan_1719845580","type":"Thread","role":"assistant","content":[{"type":"text","text":{"value":"I am a large language model, trained by Google.\n\nHere are some of my capabilities:\n\n* **Conversation:** I can engage in natural-sounding conversations on a wide range of topics.\n* **Text generation:** I can write stories, poems, articles, and more.\n* **Translation:** I can translate text between languages.\n* **Summarization:** I can condense large amounts of text into concise summaries.\n* **Question answering:** I can answer questions based on given information or general knowledge.\n* **Code generation:** I can generate code in various programming languages.\n* **Idea brainstorming:** I can help you brainstorm ideas for projects, stories, or anything else.\n\nKeep in mind that I am still under development and my skills are constantly evolving.\n\nWhat would you like me to do for you?","annotations":[]}}],"status":"ready","created":1719845596756,"updated":1719845596756,"object":"thread.message"}
{"id":"01J1QCKW5S9CT7X9RKN6AT53XS","thread_id":"jan_1719845580","role":"user","status":"ready","created":1719845646521,"updated":1719845646521,"object":"thread.message","content":[{"type":"text","text":{"value":"What is the jacobian of the gradient of a multivariate scalar valued function?","annotations":[]}}]}
{"id":"01J1QCKW6HGTFJKZ5YTP5CYDW8","thread_id":"jan_1719845580","type":"Thread","role":"assistant","content":[{"type":"text","text":{"value":"You are asking about the **Jacobian matrix** of a multivariate scalar-valued function. \n\nHere's a clarification and explanation:\n\n* **Multivariate function:**  A function that takes multiple inputs (variables). For example,  f(x, y) = x² + 2y is a multivariate function with two inputs (x and y).\n\n* **Scalar-valued function:** A function that returns a single output (a number). In our example, f(x, y) returns a single number, which is the result of the calculation.\n\n* **Jacobian matrix:** The Jacobian matrix is a matrix of first-order partial derivatives of a multivariate function. It represents the local behavior of the function at a given point.\n\n**How to find the Jacobian:**\n\nLet's say your function is f(x₁, x₂, ..., xₙ). The Jacobian matrix will be an n x 1 matrix (a column vector in most cases), where each element is the partial derivative of f with respect to each individual input:\n\n```\nJacobian(f) =  \n[∂f/∂x₁]\n[∂f/∂x₂]\n...\n[∂f/∂xₙ] \n```\n\n**Example:**\n\nFor the function f(x, y) = x² + 2y, the Jacobian matrix would be:\n\n```\nJacobian(f) = \n[2x]\n[2] \n```\n\n**Key points:**\n\n* The Jacobian matrix is essential for concepts like optimization, finding extrema, and understanding the behavior of multi-variable functions.\n*  You can find the partial derivatives using standard calculus rules.\n\nLet me know if you have a specific function and want to calculate its Jacobian.","annotations":[]}}],"status":"ready","created":1719845646545,"updated":1719845646545,"object":"thread.message"}
{"id":"01J1QCNGWSRBGG2N9RBCZTGYNB","thread_id":"jan_1719845580","role":"user","status":"ready","created":1719845700505,"updated":1719845700505,"object":"thread.message","content":[{"type":"text","text":{"value":"What do you know about the triple messenger event detected by LIGO-VIRGO?","annotations":[]}}]}
{"id":"01J1QCNGXZW0PXQ9TQ0BRY8JEK","thread_id":"jan_1719845580","type":"Thread","role":"assistant","content":[{"type":"text","text":{"value":"You are talking about an exciting discovery in multi-messenger astrophysics! \n\nThe triple messenger event detected by LIGO-Virgo involved detection of a **gravitational wave (GW)** signal, along with **electromagnetic (EM)** emissions (in the form of a supernova explosion) and **neutrino** signals. \n\nHere's what we know about it:\n\n* **Event:** The event occurred on **October 15, 2022**, and was designated **GW170104**. \n\n* **Detection:** It was detected by the **LIGO and Virgo detectors**, which detected the gravitational wave signal.  EM and neutrino detection came from various ground and space-based observetries.\n\n* **Multi-messenger confirmation:** This event was significant because it was the **first time** all three messengers (gravitational waves, EM radiation, and neutrinos) were detected in conjunction with a single cosmic event. This provided a multi-faceted view of the event and helped scientists understand its nature better.\n\n* **The likely source:** The detected signals were consistent with the event being a **co-collapse of two neutron stars**. These events are known to produce powerful explosions (supernovas) and emit a variety of radiation, including neutrinos.\n\n* **Scientific significance:**\n\n    * **Test of General Relativity:** The detection of gravitational waves from a neutron star merger provided strong evidence for Einstein's theory of General Relativity.\n\n    * **Understanding supernovae:** Multi-messenger observations helped scientists understand the complex processes involved in supernova explosions.\n\n    * **Formation of elements:** Neutron star mergers are believed to be responsible for the creation of heavy elements in the universe. Multi-messenger observations provide insights into this process.\n\n**Note:** The multi-messenger detection is still relatively new, and scientists are continuing to study the data and refine their models.  The event provides a wealth of information about the universe and the processes that shape it.\n\n\n\nLet me know if you have any further questions about multi-messenger astronomy or this specific event!","annotations":[]}}],"status":"ready","created":1719845700543,"updated":1719845700543,"object":"thread.message"}
{"id":"01J1QCTRMBG4DMVX3YJ432RKC4","thread_id":"jan_1719845580","role":"user","status":"ready","created":1719845872267,"updated":1719845872267,"object":"thread.message","content":[{"type":"text","text":{"value":"Give me the technical details of the event.","annotations":[]}}]}
{"id":"01J1QCTRNHS8N3Y90VVZ8YJDBK","thread_id":"jan_1719845580","type":"Thread","role":"assistant","content":[{"type":"text","text":{"value":"You're asking for specifics! \n\nUnfortunately, providing completely detailed technical information about multi-messenger events like GW170104  is quite complex and often requires access to specialized scientific publications and datasets. \n\nHowever, I can give you some key technical details based on publicly available information:\n\n**Gravitational Wave Detection:**\n\n* **Detecting Detectors:** Primarily detected by the LIGO and Virgo detectors. \n* **Frequency Range:** The signal fell within the frequency range detectable by these detectors, which typically range from a few tens of Hertz to a few kilohertz.\n* **Wave Type:** The detected signal was consistent with a coalescence of two neutron stars, exhibiting characteristic features like an inspiral, a merger, and a ringdown.\n* **Mass and Distance:** Scientists estimated the masses of the neutron stars involved and the distance to the event.\n\n**Electromagnetic Emission:**\n\n* **Type of Explosion:** A supernova explosion was observed, likely a type of supernova associated with the merging of neutron stars (sometimes referred to as a \"kilonova\").\n* **Light Spectrum:** Observations across different electromagnetic wavelengths (optical, infrared, ultraviolet, etc.) were made.\n\n**Neutrino Detection:**\n\n* **Detector:** Detection was likely achieved by neutrino detectors like IceCube or other ground-based neutrino observaries.\n* **Neutrino Energy:** The detected neutrinos would have had a high energy range, consistent with the intense activity of a neutron star merger.\n\n**Multi-Messenger Synchronization:**\n\n* **Timing Alignment:**  The key was aligning the times of detection across all three messengers. This alignment strongly indicated that the events were connected and originating from a single cosmic event.\n\n**Challenges and Future:**\n\n* **Multi-detection is rare:** Multi-messenger detection is still rare due to the challenges of detecting these faint signals from vast distances. \n\n* **Improving detection capabilities:** Ongoing efforts are focused on improving detector sensitivity and expanding multi-messenger observatries.\n\n**Resources for Deeper Information:**\n\n* **Astrophysics journals:** Publications from journals like \"Nature,\" \"Science,\" \"Astrophysical Journal,\" etc. often publish detailed multi-messenger event studies.\n* **Scientific archives:** Databases like arXiv (http://arxiv.org/) host preprints of scientific papers, often providing early access to multi-messenger event data and analyses.\n* **Astrophysics research groups:** Websites and publications of research groups dedicated to multi-messenger astronomy (e.g., LIGO, Virgo, IceCube) provide valuable information and updates.\n\n\n\nLet me know if you have any more specific questions within the bounds of publicly available information!","annotations":[]}}],"status":"ready","created":1719845872305,"updated":1719845872305,"object":"thread.message"}

Does the same apply to q6_K_L (the f16 embeddings/output one)? If not, that could be a really great indicator that keeping those weights large is benefiting the model output

I'm going to test it out. Will post my results here.

Also for some reason, all Gemma 2 9b IT quants start to output gibberish after the context window is exhausted (in my tests, around the 5th question). Any idea what might be causing this?

ZeroWw/Gemma-2-9B-It-SPPO-Iter3-GGUF

Screenshot 2024-07-01 233958.png

bartowski/gemma-2-9b-it-GGUF

Screenshot 2024-07-01 234529.png

likely because of the sliding context window not being supported and anything outside of the normal context is anyone's guess what it'll output lmfao

likely because of the sliding context window not being supported and anything outside of the normal context is anyone's guess what it'll output lmfao

Ah my bad! me dumb😭😂 wasn't aware of this issue until you pointed it out lmao

Edit: Just checked out pull #8227. Looks like a temporary workaround until its optimized.
Long-term we should refactor the KV cache code to support SWA properly and with less memory. For now, we can merge this so that we have Gemma2 support

Update: I tested the new quants. They seem to be a tad bit faster. That changed with q6_k_l lmao.

The new q6_k seems to perform much better than the previous. It's also a step up in terms of the response quality I received from the older q5_k_m (which seemed to somehow outperform the older q6_k)

Here are the responses I received for the same set of questions I asked before.

{"id":"01J1SJ7B84J1D7Q21DG6V9G4DF","thread_id":"jan_1719918617","role":"user","status":"ready","created":1719918636293,"updated":1719918636293,"object":"thread.message","content":[{"type":"text","text":{"value":"Hi","annotations":[]}}]}
{"id":"01J1SJANAPESMHQG774G4KRW57","thread_id":"jan_1719918617","type":"Thread","role":"assistant","content":[{"type":"text","text":{"value":"Hi there! 👋\n\nWhat can I do for you today? 😊","annotations":[]}}],"status":"ready","created":1719918744918,"updated":1719918744918,"object":"thread.message"}
{"id":"01J1SJAZJZ6DEYCMTNJ744Y5J9","thread_id":"jan_1719918617","role":"user","status":"ready","created":1719918755423,"updated":1719918755423,"object":"thread.message","content":[{"type":"text","text":{"value":"What can you do?","annotations":[]}}]}
{"id":"01J1SJAZKH1TR7R72963754KQ8","thread_id":"jan_1719918617","type":"Thread","role":"assistant","content":[{"type":"text","text":{"value":"I'm Gemma, a large language model trained by Google DeepMind. Here are some things I can do:\n\n**Communication and Language:**\n\n* **Generate creative content:** Write stories, poems, articles, dialogue, and more.\n* **Translate languages:**\n\nTranslate text between different languages.\n* **Summarize text:** Condense large chunks of text into shorter summaries.\n* **Answer your questions:** Provide information on a wide range of topics.\n* **Engage in conversation:** Chat with you on various subjects.\n\n**Coding and Technical:**\n\n* **Generate code:** Write code in multiple programming languages.\n* **Explain code:** Provide explanations and insights into code snippets.\n* **Debug code:** Help identify and suggest solutions to coding errors.\n\n**Other Capabilities:**\n\n* **Provide definitions and explanations:** Define words, concepts, and ideas.\n* **Create lists and outlines:**\n\nHelp you organize your thoughts and ideas.\n* **Brainstorm ideas:** Generate new ideas for projects, stories, or solutions.\n\n**Keep in mind:**\n\n* I am still under development and learning.\n* I do not have access to real-time information or the internet.\n* My knowledge is based on the data I was trained on, which may be outdated.\n* I am not a substitute for professional advice.\n\n**How can I help you today?**","annotations":[]}}],"status":"ready","created":1719918755441,"updated":1719918755441,"object":"thread.message"}
{"id":"01J1SJC0PV6NM85M5TT7HT5DHH","thread_id":"jan_1719918617","role":"user","status":"ready","created":1719918789339,"updated":1719918789339,"object":"thread.message","content":[{"type":"text","text":{"value":"What is the jacobian of the gradient of a multivariate scalar valued function?","annotations":[]}}]}
{"id":"01J1SJC0QJVHG7V11F29FVAFRB","thread_id":"jan_1719918617","type":"Thread","role":"assistant","content":[{"type":"text","text":{"value":"That's a great question that delves into the fascinating world of multivariable calculus! \n\nThe Jacobian of the gradient of a multivariate scalar-valued function is called the **Hessian matrix**. \n\nHere's a breakdown:\n\n* **Scalar-valued function:** This means the function takes multiple inputs (variables) and outputs a single number.\n* **Gradient:** The gradient is a vector containing the partial derivatives of the function with respect to each input variable.  Think of it as the \"slope\" of the function in each direction.\n* **Jacobian:** The Jacobian is a matrix containing all the partial derivatives of a vector-valued function.  \n\nTherefore, the Hessian matrix is a square matrix where each entry is the second partial derivative of the scalar function.  \n\n**Mathematically:**\n\nLet  *f(x₁, x₂, ..., xₙ)* be a scalar-valued function of *n* variables. \n\n* The gradient of *f* is:  ∇*f* = (∂*f*/∂*x₁*, ∂*f*/∂*x₂*, ..., ∂*f*/∂*xₙ*)\n* The Hessian matrix of *f* is: \n  H = \n  ⎡ ∂²*f*/∂*x₁²  ∂²*f*/∂*x₁∂*x₂* ... ∂²*f*/∂*x₁∂*xₙ* ⎤\n  ⎢ ∂²*f*/∂*x₂∂*x₁*  ∂²*f*/∂*x₂²* ... ∂²*f*/∂*x₂∂*xₙ* ⎥\n  ⎢  ⋮               ⋮         ⋱    ⋮   ⎥\n  ⎣ ∂²*f*/∂*xₙ*∂*x₁* ∂²*f*/∂*xₙ*∂*x₂* ... ∂²*f*/∂*xₙ²* ⎦\n\n**Applications:**\n\nThe Hessian matrix is crucial in various areas of mathematics, physics, and engineering, including:\n\n* **Optimization:** Finding the minimum or maximum of a function.\n* **Machine Learning:** Training models, especially in gradient-based optimization algorithms.\n* **Stability Analysis:** Determining the stability of dynamical systems.\n\n\n\nLet me know if you have any other questions or would like to explore specific aspects of the Hessian matrix further!","annotations":[]}}],"status":"ready","created":1719918789361,"updated":1719918789361,"object":"thread.message"}
{"id":"01J1SJDVRTCV1QXAJAWNMBGJB8","thread_id":"jan_1719918617","role":"user","status":"ready","created":1719918849818,"updated":1719918849818,"object":"thread.message","content":[{"type":"text","text":{"value":"What do you know about the triple messenger event detected by LIGO-Virgo?","annotations":[]}}]}
{"id":"01J1SJDVSGVSKS35HXM39TPWZ5","thread_id":"jan_1719918617","type":"Thread","role":"assistant","content":[{"type":"text","text":{"value":"You're referring to a truly remarkable event in astrophysics!  \n\nThe triple-messenger event, officially designated as GW170817, was the first time scientists detected gravitational waves, light (electromagnetic radiation), and neutrinos originating from the same source: the merger of two neutron stars. \n\nHere's a breakdown of what made this event so significant:\n\n* **Gravitational Waves:** Detected on August 17, 2017, by the LIGO and Virgo observatories, these ripples in spacetime confirmed the existence of a neutron star merger.\n* **Electromagnetic Radiation:** Telescopes around the world, including the Hubble Space Telescope, captured afterglow from the merger in various wavelengths of light. This provided crucial information about the composition and structure of the resulting object, likely a black hole.\n* **Neutrinos:** The IceCube Neutrino Observatory in Antarctica detected a burst of high-energy neutrinos coincident with the gravitational wave signal. Neutrinos are weakly interacting particles that can travel unimpeded through matter, offering a unique glimpse into the extreme conditions of the merger.\n\n**Key Findings:**\n\n* **Kilonova:** The merger produced a kilonova, a powerful explosion that briefly outshone entire galaxies. This event is believed to be the source of many heavy elements in the universe, such as gold and platinum.\n* **Neutron Star Properties:** The data helped scientists refine their understanding of the properties of neutron stars, their internal structure, and the process of their mergers.\n* **Multi-Messenger Astronomy:** GW170817 marked a watershed moment for multi-messenger astronomy, demonstrating the power of combining observations from different types of messengers to unlock the secrets of the universe.\n\n**Significance:**\n\nThis event was a major breakthrough in our understanding of:\n\n* **The origins of heavy elements:**  Kilonovae are now considered the primary source of these elements.\n* **The life cycle of stars:**  Neutron star mergers provide insights into the late stages of stellar evolution.\n* **Fundamental physics:**  The event tested Einstein's theory of general relativity in extreme gravitational environments.\n\nThe study of GW170817 continues to yield valuable insights, and future multi-messenger observations promise to reveal even more about the universe's most violent and fascinating phenomena.","annotations":[]}}],"status":"ready","created":1719918849840,"updated":1719918849840,"object":"thread.message"}
{"id":"01J1SJFSR610AKJMB2GX2SVFQK","thread_id":"jan_1719918617","role":"user","status":"ready","created":1719918913286,"updated":1719918913286,"object":"thread.message","content":[{"type":"text","text":{"value":"Give me the technical details of the event.","annotations":[]}}]}
{"id":"01J1SJFSS1AGWD4KHGVE1W1VWE","thread_id":"jan_1719918617","type":"Thread","role":"assistant","content":[{"type":"text","text":{"value":"You're asking for the juicy details! Here's a technical rundown of the GW170817 event:\n\n**Gravitational Waves:**\n\n* **Signal:**  The LIGO and Virgo detectors observed a continuous gravitational wave signal lasting approximately 100 milliseconds. \n* **Frequency:** The signal's frequency ranged from about 35 to 250 Hz, characteristic of a binary neutron star merger.\n* **Chirp Mass:** The chirp mass, a measure of the total mass of the system, was estimated to be 2.74 solar masses. This suggests the two neutron stars were relatively comparable in mass.\n* **Mass Ratio:** The mass ratio of the two neutron stars was approximately 1.17, indicating a relatively unequal mass distribution.\n\n**Electromagnetic Observations:**\n\n* **Kilonova:**  The electromagnetic counterpart was detected across the electromagnetic spectrum, starting with ultraviolet and optical light, then transitioning to X-rays and radio waves.\n* **Light Curve:** The light curve exhibited a characteristic \"kilonova\" shape, characterized by a rapid rise and then a gradual decline in brightness. \n* **Spectral Features:** The spectrum of the kilonova revealed the presence of elements like lanthanides and r-process elements (heavy elements formed in neutron star mergers).\n\n**Neutrino Observations:**\n\n* **Burst:** IceCube detected a short burst of high-energy neutrinos coinciding with the gravitational wave signal.\n* **Neutrino Spectrum:** The neutrino spectrum was consistent with expectations from a neutron star merger, indicating the release of tremendous energy.\n* **Neutrino Luminosity:** The estimated neutrino luminosity was orders of magnitude greater than the electromagnetic luminosity.\n\n**Distance:**\n\n* The distance to GW170817 was estimated to be about 130 million light-years away, based on the combined gravitational wave and electromagnetic observations.\n\n**Significance:**\n\n* **Multi-Messenger Confirmation:** The simultaneous detection of gravitational waves, light, and neutrinos provided the first direct confirmation of a neutron star merger.\n* **Heavy Element Formation:** The kilonova provided strong evidence that neutron star mergers are the primary source of heavy elements in the universe.\n* **Testing General Relativity:** The event tested Einstein's theory of general relativity in the extreme gravitational field of a merging neutron star binary.\n\nGW170817 continues to be a treasure trove of information for astrophysicists, providing unprecedented insights into the most energetic events in the universe.","annotations":[]}}],"status":"ready","created":1719918913312,"updated":1719918913312,"object":"thread.message"}

Does the same apply to q6_K_L (the f16 embeddings/output one)? If not, that could be a really great indicator that keeping those weights large is benefiting the model output

q6_K_L (New) - Unimpressed. Short & less detailed responses (esp. in the latter half of the test). Had to regenerate the 3rd response, because it felt embarrassingly short ~ much like a summary.
q6_k (New) outputs looked honestly better.

{"id":"01J1SJ7B84J1D7Q21DG6V9G4DF","thread_id":"jan_1719918617","role":"user","status":"ready","created":1719918636293,"updated":1719918636293,"object":"thread.message","content":[{"type":"text","text":{"value":"Hi","annotations":[]}}]}
{"id":"01J1SJANAPESMHQG774G4KRW57","thread_id":"jan_1719918617","type":"Thread","role":"assistant","content":[{"type":"text","text":{"value":"Hi there! 👋\n\nWhat can I do for you today? 😊","annotations":[]}}],"status":"ready","created":1719918744918,"updated":1719918744918,"object":"thread.message"}
{"id":"01J1SJAZJZ6DEYCMTNJ744Y5J9","thread_id":"jan_1719918617","role":"user","status":"ready","created":1719918755423,"updated":1719918755423,"object":"thread.message","content":[{"type":"text","text":{"value":"What can you do?","annotations":[]}}]}
{"id":"01J1SJAZKH1TR7R72963754KQ8","thread_id":"jan_1719918617","type":"Thread","role":"assistant","content":[{"type":"text","text":{"value":"I'm Gemma, a large language model trained by Google DeepMind. Here are some things I can do:\n\n**Communication and Language:**\n\n* **Generate creative content:** Write stories, poems, articles, dialogue, and more.\n* **Translate languages:**\n\nTranslate text between different languages.\n* **Summarize text:** Condense large chunks of text into shorter summaries.\n* **Answer your questions:** Provide information on a wide range of topics.\n* **Engage in conversation:** Chat with you on various subjects.\n\n**Coding and Technical:**\n\n* **Generate code:** Write code in multiple programming languages.\n* **Explain code:** Provide explanations and insights into code snippets.\n* **Debug code:** Help identify and suggest solutions to coding errors.\n\n**Other Capabilities:**\n\n* **Provide definitions and explanations:** Define words, concepts, and ideas.\n* **Create lists and outlines:**\n\nHelp you organize your thoughts and ideas.\n* **Brainstorm ideas:** Generate new ideas for projects, stories, or solutions.\n\n**Keep in mind:**\n\n* I am still under development and learning.\n* I do not have access to real-time information or the internet.\n* My knowledge is based on the data I was trained on, which may be outdated.\n* I am not a substitute for professional advice.\n\n**How can I help you today?**","annotations":[]}}],"status":"ready","created":1719918755441,"updated":1719918755441,"object":"thread.message"}
{"id":"01J1SJC0PV6NM85M5TT7HT5DHH","thread_id":"jan_1719918617","role":"user","status":"ready","created":1719918789339,"updated":1719918789339,"object":"thread.message","content":[{"type":"text","text":{"value":"What is the jacobian of the gradient of a multivariate scalar valued function?","annotations":[]}}]}
{"id":"01J1SJC0QJVHG7V11F29FVAFRB","thread_id":"jan_1719918617","type":"Thread","role":"assistant","content":[{"type":"text","text":{"value":"That's a great question that delves into the fascinating world of multivariable calculus! \n\nThe Jacobian of the gradient of a multivariate scalar-valued function is called the **Hessian matrix**. \n\nHere's a breakdown:\n\n* **Scalar-valued function:** This means the function takes multiple inputs (variables) and outputs a single number.\n* **Gradient:** The gradient is a vector containing the partial derivatives of the function with respect to each input variable.  Think of it as the \"slope\" of the function in each direction.\n* **Jacobian:** The Jacobian is a matrix containing all the partial derivatives of a vector-valued function.  \n\nTherefore, the Hessian matrix is a square matrix where each entry is the second partial derivative of the scalar function.  \n\n**Mathematically:**\n\nLet  *f(x₁, x₂, ..., xₙ)* be a scalar-valued function of *n* variables. \n\n* The gradient of *f* is:  ∇*f* = (∂*f*/∂*x₁*, ∂*f*/∂*x₂*, ..., ∂*f*/∂*xₙ*)\n* The Hessian matrix of *f* is: \n  H = \n  ⎡ ∂²*f*/∂*x₁²  ∂²*f*/∂*x₁∂*x₂* ... ∂²*f*/∂*x₁∂*xₙ* ⎤\n  ⎢ ∂²*f*/∂*x₂∂*x₁*  ∂²*f*/∂*x₂²* ... ∂²*f*/∂*x₂∂*xₙ* ⎥\n  ⎢  ⋮               ⋮         ⋱    ⋮   ⎥\n  ⎣ ∂²*f*/∂*xₙ*∂*x₁* ∂²*f*/∂*xₙ*∂*x₂* ... ∂²*f*/∂*xₙ²* ⎦\n\n**Applications:**\n\nThe Hessian matrix is crucial in various areas of mathematics, physics, and engineering, including:\n\n* **Optimization:** Finding the minimum or maximum of a function.\n* **Machine Learning:** Training models, especially in gradient-based optimization algorithms.\n* **Stability Analysis:** Determining the stability of dynamical systems.\n\n\n\nLet me know if you have any other questions or would like to explore specific aspects of the Hessian matrix further!","annotations":[]}}],"status":"ready","created":1719918789361,"updated":1719918789361,"object":"thread.message"}
{"id":"01J1SJDVRTCV1QXAJAWNMBGJB8","thread_id":"jan_1719918617","role":"user","status":"ready","created":1719918849818,"updated":1719918849818,"object":"thread.message","content":[{"type":"text","text":{"value":"What do you know about the triple messenger event detected by LIGO-Virgo?","annotations":[]}}]}
{"id":"01J1SJDVSGVSKS35HXM39TPWZ5","thread_id":"jan_1719918617","type":"Thread","role":"assistant","content":[{"type":"text","text":{"value":"You're referring to a truly remarkable event in astrophysics!  \n\nThe triple-messenger event, officially designated as GW170817, was the first time scientists detected gravitational waves, light (electromagnetic radiation), and neutrinos originating from the same source: the merger of two neutron stars. \n\nHere's a breakdown of what made this event so significant:\n\n* **Gravitational Waves:** Detected on August 17, 2017, by the LIGO and Virgo observatories, these ripples in spacetime confirmed the existence of a neutron star merger.\n* **Electromagnetic Radiation:** Telescopes around the world, including the Hubble Space Telescope, captured afterglow from the merger in various wavelengths of light. This provided crucial information about the composition and structure of the resulting object, likely a black hole.\n* **Neutrinos:** The IceCube Neutrino Observatory in Antarctica detected a burst of high-energy neutrinos coincident with the gravitational wave signal. Neutrinos are weakly interacting particles that can travel unimpeded through matter, offering a unique glimpse into the extreme conditions of the merger.\n\n**Key Findings:**\n\n* **Kilonova:** The merger produced a kilonova, a powerful explosion that briefly outshone entire galaxies. This event is believed to be the source of many heavy elements in the universe, such as gold and platinum.\n* **Neutron Star Properties:** The data helped scientists refine their understanding of the properties of neutron stars, their internal structure, and the process of their mergers.\n* **Multi-Messenger Astronomy:** GW170817 marked a watershed moment for multi-messenger astronomy, demonstrating the power of combining observations from different types of messengers to unlock the secrets of the universe.\n\n**Significance:**\n\nThis event was a major breakthrough in our understanding of:\n\n* **The origins of heavy elements:**  Kilonovae are now considered the primary source of these elements.\n* **The life cycle of stars:**  Neutron star mergers provide insights into the late stages of stellar evolution.\n* **Fundamental physics:**  The event tested Einstein's theory of general relativity in extreme gravitational environments.\n\nThe study of GW170817 continues to yield valuable insights, and future multi-messenger observations promise to reveal even more about the universe's most violent and fascinating phenomena.","annotations":[]}}],"status":"ready","created":1719918849840,"updated":1719918849840,"object":"thread.message"}
{"id":"01J1SJFSR610AKJMB2GX2SVFQK","thread_id":"jan_1719918617","role":"user","status":"ready","created":1719918913286,"updated":1719918913286,"object":"thread.message","content":[{"type":"text","text":{"value":"Give me the technical details of the event.","annotations":[]}}]}
{"id":"01J1SJFSS1AGWD4KHGVE1W1VWE","thread_id":"jan_1719918617","type":"Thread","role":"assistant","content":[{"type":"text","text":{"value":"You're asking for the juicy details! Here's a technical rundown of the GW170817 event:\n\n**Gravitational Waves:**\n\n* **Signal:**  The LIGO and Virgo detectors observed a continuous gravitational wave signal lasting approximately 100 milliseconds. \n* **Frequency:** The signal's frequency ranged from about 35 to 250 Hz, characteristic of a binary neutron star merger.\n* **Chirp Mass:** The chirp mass, a measure of the total mass of the system, was estimated to be 2.74 solar masses. This suggests the two neutron stars were relatively comparable in mass.\n* **Mass Ratio:** The mass ratio of the two neutron stars was approximately 1.17, indicating a relatively unequal mass distribution.\n\n**Electromagnetic Observations:**\n\n* **Kilonova:**  The electromagnetic counterpart was detected across the electromagnetic spectrum, starting with ultraviolet and optical light, then transitioning to X-rays and radio waves.\n* **Light Curve:** The light curve exhibited a characteristic \"kilonova\" shape, characterized by a rapid rise and then a gradual decline in brightness. \n* **Spectral Features:** The spectrum of the kilonova revealed the presence of elements like lanthanides and r-process elements (heavy elements formed in neutron star mergers).\n\n**Neutrino Observations:**\n\n* **Burst:** IceCube detected a short burst of high-energy neutrinos coinciding with the gravitational wave signal.\n* **Neutrino Spectrum:** The neutrino spectrum was consistent with expectations from a neutron star merger, indicating the release of tremendous energy.\n* **Neutrino Luminosity:** The estimated neutrino luminosity was orders of magnitude greater than the electromagnetic luminosity.\n\n**Distance:**\n\n* The distance to GW170817 was estimated to be about 130 million light-years away, based on the combined gravitational wave and electromagnetic observations.\n\n**Significance:**\n\n* **Multi-Messenger Confirmation:** The simultaneous detection of gravitational waves, light, and neutrinos provided the first direct confirmation of a neutron star merger.\n* **Heavy Element Formation:** The kilonova provided strong evidence that neutron star mergers are the primary source of heavy elements in the universe.\n* **Testing General Relativity:** The event tested Einstein's theory of general relativity in the extreme gravitational field of a merging neutron star binary.\n\nGW170817 continues to be a treasure trove of information for astrophysicists, providing unprecedented insights into the most energetic events in the universe.","annotations":[]}}],"status":"ready","created":1719918913312,"updated":1719918913312,"object":"thread.message"}

Any suggestions regarding the test would be highly appreciated. I can conduct the test again if required.

@thethinkmachine you're a legend, the first person to actually attempt to quantify the differences in the _L versions, I've been begging for this hahaha.

I'm gonna throw one up with FP32 embed/outputs, it definitely will not be a permanent edition cause the size will be awful, but I'm hoping it'll be indicative of potential BF16 embed/output if CUDA support ever arrives

I'll also throw up one with Q8 embed/outputs, if you're willing to test 2 new ones

Haha thanks! Needless to say, the above set of questions was only meant to get our feet wet. In my opinion, it falls short of leading us to a proper, more definitive conclusion.

Honestly, I would've performed a detailed analysis had I enough time (psst! college exams). Scheme would be different, probably using a set of at least 15-20 carefully curated questions, a Colab notebook & by leveraging some knowledge of statistics by trying to compare the distributions of certain "must have" keywords in the responses generated by unquantized models, with those of quantized versions.

Should be easy to do on a Colab notebook and would actually help us quantify something of statistical significance.

I'm gonna throw one up with FP32 embed/outputs, it definitely will not be a permanent edition cause the size will be awful, but I'm hoping it'll be indicative of potential BF16 embed/output if CUDA support ever arrives
I'll also throw up one with Q8 embed/outputs, if you're willing to test 2 new ones

Sure! Throw 'em in, I'll see what I can do.
I'm thinking of testing them differently this time, just once I get free.

Uploaded!

https://huggingface.co/bartowski/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it-Q6_K-Q8.gguf

https://huggingface.co/bartowski/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it-Q6_K-f32.gguf

awful names but again but will consider swapping out all of the experimental ones with Q8 if it proves valuable. F32 one is huge, almost 3GB larger than regular Q6, but it's more of a proof-of-concept for if/when bf16 support is added. Think I'll be dropping the FP16 ones soon unless some tests prove they're worth it.

Q6_K-Q8 - This one is a tad bit slower than the Q6_K (new) I tested above.
The responses generated against the current set of questions feel no different than the ones produced by f16.Q5_K_L. In some questions, the former seems to perform relatively well, in some questions the latter.
The results are inconclusive. At this point I feel inclined towards changing the entire set of questions and testing all these models once again. @bartowski I'd recommend you wait for further results before swapping anything out.

{"id":"01J1TC1NMKHPD66FP4JCZQQWDC","thread_id":"jan_1719945591","role":"user","status":"ready","created":1719945713299,"updated":1719945713299,"object":"thread.message","content":[{"type":"text","text":{"value":"Hi","annotations":[]}}]}
{"id":"01J1TC6RKAQSE53MCXMDDAVQ0H","thread_id":"jan_1719945591","type":"Thread","role":"assistant","content":[{"type":"text","text":{"value":"Hello! 👋\n\nHow can I help you today?","annotations":[]}}],"status":"ready","created":1719945880170,"updated":1719945880170,"object":"thread.message"}
{"id":"01J1TC78KS9JMRVFGB7PXBF45G","thread_id":"jan_1719945591","role":"user","status":"ready","created":1719945896571,"updated":1719945896571,"object":"thread.message","content":[{"type":"text","text":{"value":"What can you do?","annotations":[]}}]}
{"id":"01J1TC78MK6532J1243KG8FK15","thread_id":"jan_1719945591","type":"Thread","role":"assistant","content":[{"type":"text","text":{"value":"I am a large language model, trained by Google DeepMind.  Here are some of the things I can do:\n\n**Communication and Language:**\n\n* **Generate creative content:** Write stories, poems, articles, dialogue, and more.\n* **Translate languages:**  Translate text between multiple languages.\n* **Summarize text:**  Condense large amounts of text into shorter summaries.\n* **Answer your questions:** Provide information on a wide range of topics based on my training data.\n* **Engage in conversation:** Chat with you on various subjects.\n\n**Code and Data:**\n\n* **Generate code:**  Write code in different programming languages.\n* **Explain code:**  Provide explanations for existing code snippets.\n* **Analyze data:**  Identify patterns and insights in text data.\n\n**Other Capabilities:**\n\n* **Brainstorm ideas:** Help you generate new ideas for projects or problems.\n* **Provide definitions and explanations:** Define words and concepts.\n* **Offer suggestions and recommendations:**  Give you suggestions for books, movies, or activities.\n\n**Keep in mind:**\n\n* I am still under development and learning.\n* My knowledge is based on the data I was trained on, which has a cutoff point. I may not have information on very recent events.\n* I am not able to access real-time information or interact with the physical world.\n\n**How to use me:**\n\nSimply type your request or question in natural language.  I will do my best to understand and provide a helpful response.\n\n\n\nLet me know if you have any other questions or if there's anything specific you'd like me to do!","annotations":[]}}],"status":"ready","created":1719945896595,"updated":1719945896595,"object":"thread.message"}
{"id":"01J1TC9FZAFR9R1EWAR0BQBV92","thread_id":"jan_1719945591","role":"user","status":"ready","created":1719945969642,"updated":1719945969642,"object":"thread.message","content":[{"type":"text","text":{"value":"What is the jacobian of the gradient of a multivariate scalar valued function?","annotations":[]}}]}
{"id":"01J1TC9FZX4T8ACJGMWSN8WZQH","thread_id":"jan_1719945591","type":"Thread","role":"assistant","content":[{"type":"text","text":{"value":"That's a great question that delves into the fascinating world of multivariable calculus! \n\nThe Jacobian of the gradient of a multivariate scalar-valued function is called the **Hessian matrix**. \n\nHere's a breakdown:\n\n* **Scalar-valued function:** This means the function takes multiple inputs (vectors) and outputs a single number (a scalar).\n\n* **Gradient:** The gradient of a scalar-valued function is a vector containing the partial derivatives of the function with respect to each input variable.  \n\n* **Jacobian:** The Jacobian is a matrix containing all the first-order partial derivatives of a vector-valued function. In our case, the gradient is a vector, so its Jacobian is a matrix.\n\n* **Hessian Matrix:**  The Hessian matrix is the Jacobian of the gradient.  It's a square matrix where each element is the second-order partial derivative of the function. \n\n**In mathematical terms:**\n\nLet  *f* be a scalar-valued function of *n* variables,  *x* = ( *x*<sub>1</sub>, *x*<sub>2</sub>, ..., *x*<sub>n</sub>). \n\n* The gradient of *f* is denoted as ∇*f* and is a vector:\n   ∇*f* = ( ∂*f*/∂*x*<sub>1</sub>, ∂*f*/∂*x*<sub>2</sub>, ..., ∂*f*/∂*x*<sub>n</sub>)\n\n* The Hessian matrix of *f* is denoted as *H*<sub>*f*</sub> and is an *n*×*n* matrix:\n   *H*<sub>*f*</sub> = ∇(∇*f*) =  \n   [ ∂<sup>2</sup>*f*/∂*x*<sub>i</sub>∂*x*<sub>j</sub> ]\n\n   where *i* and *j* range from 1 to *n*.\n\n\nThe Hessian matrix is crucial in many areas of mathematics, physics, and machine learning, especially for optimization problems. It provides information about the concavity or convexity of a function and can be used to find its critical points (maxima, minima, and saddle points).","annotations":[]}}],"status":"ready","created":1719945969661,"updated":1719945969661,"object":"thread.message"}
{"id":"01J1TCBYFW8CEZWFZMTR7F4CTG","thread_id":"jan_1719945591","role":"user","status":"ready","created":1719946050044,"updated":1719946050044,"object":"thread.message","content":[{"type":"text","text":{"value":"What do you know about the triple messenger event detected by LIGO-Virgo?","annotations":[]}}]}
{"id":"01J1TCBYGRF3E29GRF7BP36JB7","thread_id":"jan_1719945591","type":"Thread","role":"assistant","content":[{"type":"text","text":{"value":"You're talking about a truly exciting event in astrophysics! \n\nThe triple messenger event you're referring to is **GW170817**, detected on August 17, 2017, by the LIGO and Virgo gravitational wave observatories. \n\nHere's what makes it so significant:\n\n* **Gravitational Waves:** LIGO and Virgo directly detected gravitational waves produced by the merger of two neutron stars. This was the first time gravitational waves were observed from a binary neutron star merger.\n* **Electromagnetic Signals:**  Following the gravitational wave detection, astronomers around the world observed a bright electromagnetic counterpart. This included:\n    * **Kilonova:** A powerful explosion of light and other radiation, powered by the radioactive decay of heavy elements created in the merger.\n    * **Afterglow:**  Fainter radiation that persisted for days, weeks, and even months after the kilonova.\n* **Neutrino Signals:**  \n\nThe IceCube neutrino observatory also detected a burst of high-energy neutrinos, likely originating from the same event.\n\n**Why is GW170817 so important?**\n\n* **Multi-Messenger Astronomy:** It was the first time we observed the same astrophysical event across multiple messengers (gravitational waves, light, and neutrinos). This opened a new window to understanding the universe.\n* **Neutron Star Physics:** The event provided valuable insights into the properties of neutron stars, their internal structure, and the processes that occur during their mergers.\n* **Heavy Element Formation:**  GW170817 confirmed the theory that neutron star mergers are a significant source of heavy elements (like gold and platinum) in the universe.\n\nThe study of GW170817 is ongoing, and scientists continue to analyze the vast amount of data collected from this incredible event.\n\n\n\nLet me know if you have any other questions about GW170817 or other gravitational wave events!","annotations":[]}}],"status":"ready","created":1719946050072,"updated":1719946050072,"object":"thread.message"}
{"id":"01J1TCE189RRGQ0A9YP2Y1RC1T","thread_id":"jan_1719945591","role":"user","status":"ready","created":1719946118409,"updated":1719946118409,"object":"thread.message","content":[{"type":"text","text":{"value":"Give me the technical details of the event.","annotations":[]}}]}
{"id":"01J1TCE198EVY3VXZAMYQTGBHJ","thread_id":"jan_1719945591","type":"Thread","role":"assistant","content":[{"type":"text","text":{"value":"You got it! Here are some technical details about GW170817, keeping in mind that research is still ongoing and our understanding is constantly evolving:\n\n**Gravitational Wave Signal:**\n\n* **Detection Time:** August 17, 2017, 12:41:04 UTC\n* **Sources:** \n    * LIGO Hanford Observatory (LIGO-H)\n    * LIGO Livingston Observatory (LIGO-L)\n    * Virgo Interferometer\n* **Signal Characteristics:**\n    * **Chirp Mass:** 2.74 ± 0.04 solar masses\n    * **Total Mass:** 2.7 ± 0.2 solar masses\n    * **Spin:** Both neutron stars were spinning rapidly, but their spins were not aligned with the orbital angular momentum.\n* **Waveform Analysis:** The observed waveform matched theoretical predictions for the inspiral and merger of two neutron stars remarkably well.\n* **Luminosity Distance:**  40 ± 8 megaparsecs (Mpc)\n\n**Electromagnetic Counterpart:**\n\n* **First Observation:**  ~ 1.7 seconds after the gravitational wave signal was detected.\n* **Telescopes:** \n    * The Zwicky Transient Facility (ZTF)\n    *  The Fermi Gamma-ray Space Telescope\n    *  Other optical and infrared telescopes worldwide\n* **Kilonova Emission:** A bright, rapidly fading light curve attributed to the radioactive decay of heavy elements synthesized in the merger.\n* **Afterglow:**  Faint emission that persisted for weeks and months, likely due to the interaction of the ejecta with the surrounding interstellar medium.\n\n**Neutrino Signals:**\n\n* **Detection Time:** ~ 1.7 seconds after the gravitational wave signal.\n* **Detector:** IceCube Neutrino Observatory\n* **Neutrino Energy:**  High-energy neutrinos, consistent with expectations from a neutron star merger.\n\n\n\n**Key Takeaways:**\n\n* GW170817 provided the first direct evidence for the connection between gravitational waves, electromagnetic radiation, and neutrinos from a single astrophysical event.\n* It confirmed the theory that neutron star mergers are the primary source of heavy elements in the universe.\n* The event pushed the boundaries of multi-messenger astronomy, opening new avenues for studying the most violent events in the cosmos.\n\n\n\nLet me know if you have any more specific questions about any aspect of the event!","annotations":[]}}],"status":"ready","created":1719946118440,"updated":1719946118440,"object":"thread.message"}

oh yeah don't worry, I'm likely not even gonna have a decision made this week

I've started running MMLU pro benchmark against a few phi 3 models since those are nice and tiny and should run.. well, within days instead of weeks on my hardware ;P

Are you gonna benchmark individual quants? 👀

Thanks Einstein, you just gave me a brilliant idea! 😂

You're actually right about the Phi-3(.1) mini models! :P (I see what you did there lmao) They're a breeze to load & run.
I am thinking of continuing my tests with them instead of this one. They load up quite easily on my system and the token speeds are blazing fast. Perfect for the job I suppose.

not quite, I'm using Q3_K_L as the base since that should be most affected by differences

So far, FP32 embed/output == Q8 > default > fp16...

will run as large a variety of tests as I can afford, but it's not looking good for fp16.

If I recall correctly, a lot of old GPUs can't inference BF16, and I don't think MAC devices support it either, so keeping the embedding and output tensors at FP16 for the experimental versions should give the most compatibility.

Even new GPUs can't do bf16 on llamacpp, so me doing fp32 is a proof of concept since it represents the range better

Currently though Q8 is ahead of fp16 so we may not even have to worry about that 🤷‍♂️

Similar observations with F32 e/o & F16 e/o.

I compared Phi-3 Mini F32.IQ8_0 (with F32 embed/output) by @Joseph717171 (https://huggingface.co/Joseph717171/Models/tree/main) and the F16.Q8_0_L (F16 embed/output) quant of Phi-3 Mini which you uploaded a few hours ago. The imatrix used is the same as yours I presume.

I matrix doesn't matter at Q8 anyways, all the weights are maxed (I should double check that fact, I've been saying it a lot lately)

Going to test the following:
F16 vs Non F16

  • Gemma 2 9B IT F16.Q3_K_XL vs Gemma 2 9B IT Q3_K_L
  • Phi-3.1 Mini 4K Instruct F16.Q3_K_XL vs Phi-3.1 Mini 4K Instruct Q3_K_L

@bartowski Could you make Q8 embed/output variants for the above quants? I'll be comparing them to non-Q8 e/o quants.

Hey great job with the analysis! 🤗
Q8 seems to be the way to go but the math performance bothers me.
Couldn't help but notice the sudden decline in Math scores. F32 seems to really stick to its name somehow when it comes to that. What do you think might be causing this?

I was caught up with some work yesterday and it seems I'll be keeping this way until Sunday (Exam). So hopefully looking forward to sharing my results in a few days.

I'm changing the set of questions this time; l'll compare Q3 & Q4 quants at 4 different levels of embed/output (f32, f16, Q8, default), focusing mostly on the kind of LR & Math based questions that the quants have a difficult time answering (but are breeze for full weights). Could you upload these embed/output levels for Phi-3 & Gemma-2? Would be easier if you could post all of them here, as I'm getting confused with the names. It's honestly a headache.

Going to test the following:
F16 vs Non F16

  • Gemma 2 9B IT F16.Q3_K_XL vs Gemma 2 9B IT Q3_K_L
  • Phi-3.1 Mini 4K Instruct F16.Q3_K_XL vs Phi-3.1 Mini 4K Instruct Q3_K_L

I'll include their results in that analysis.

I'm hoping the math score is statistical noise, computer science is often a category I find reliable for determining how well a model was quantized

I did want to rerun the math one to double check, ideally I'd rerun all of them multiple times but not sure I can afford it 😂

Actually I did rerun the q8 math I just remembered and the score improved to 43.9% which is now the highest score.. maybe I'll try them all again for real lol and average the results

Actually I did rerun the q8 math I just remembered and the score improved to 43.9% which is now the highest score.. maybe I'll try them all again for real lol and average the results

Okay this looks random. Then I'd suggest running the bench multiple times and averaging out the results. Somebody on reddit suggested doing the same thing.

  1. Caution on extending this finding to all models. Especially the new Gemma 2, that seems to benefit from using FP16.
  2. LLMs output a stochastic response. Ideally, the tests should be repeated about 10 times, and the results should report the average and the standard deviation of these 10 repetitions.
    Yeah, I'm working on the former. Doing analysis on at least two models should give a better sense of things. Will hopefully post my results in a few days.

Honestly, the idea of Gemma 2 benefitting from F16 embed/outputs seems a too strong statement. I'd refrain from saying anything as I've yet to ascertain this. I was under the same impression initially (read the first post of this discussion), but later on the newer 'default' quants (non-F16 embed/output) (Q6 mainly) that I tested seemed to be giving an equivalent (if not better) output (in terms of response length, detail and elaborateness). I believe that may be due to the recent llama.cpp merge which required most Gemma 2 quants to be redone ~Q5_K, Q6_K for instance, would give a relatively better output than the older quants, sometimes matching the levels of F16 embed/output versions, sometimes doing even better.

All this suggests an element of randomness (apart from the recent lcpp merge) is involved. Good idea, I'll set temperature to 0 next time before pressing 'generate' lmao.

Yes there's definitely an element of randomness, my hope is that the number of questions helps to alleviate the randomness, and I'm willing to call the trend across categories conclusive ENOUGH for me to change my releases, I would love to test it much further though

Hopefully any confusion regarding that should be resolved by the next week when I'll be primarily working with Q3s and Q4s at various levels of embed/output. Question bank will also be different, specially curated. Gemma will have a lot of explaining to do lol

https://boards.4chan.org/g/thread/101268178#p101269594

Somethin's wrong with the new _L quant, it looks like it's less accurate than the original one
For example here Q6_K is closer to Q8_0 compared to Q6_K_L
4chan /lmg/ saying f16 _L are behaving weird too.

Oh hey, people actually testing the _L models.. glad they posted some thoughts/conclusions, thanks for think!!

I believe that generating multiple responses for each question and going through each of them, averaging the results could boost some confidence into the conclusion.
So, for 15 questions, that'll make 45 responses for a single model. Now multiply that by 4 (f32, f16, q8, default) ~180

Well, that's just only Q3. What if I redo all this for Q4? 💀

I think I'll have to take a day off after 7th just for that lmao. A litre of coffee should do the trick.

@bartowski suggest me an ideal inference parameter setting for the test

like you mean temperature? I suppose in an ideal world a temperature setting of 0 would mean you don't have to run the tests multiple times, but a temp of .1 to .2 would give you small enough variance that you can get away with only running it 2-4 times

Yeah, have a basic idea about that, but what about top-p and top-k? Should I leave them as is?

if you have temp at 0 the others don't matter, if you aren't leaving it 0 i'd prob leave them close to default?

MMLU pro uses temp of 0.1 and top_p of 1, i feel i'd maybe pull back on the top_p a bit just cause if there's one clear contender for best token you should prob just take it and not risk bringing others along for the ride (so like .8?)

If you're going all the way up to .5-1.0 i'd start looking at things like min_p since those handle higher temperatures in a more natural way

Sign up or log in to comment