Usecase

#8
by takeraparterer - opened

Hi, what exactly does this model do? Can you give an example use case?

If I get this right, it's a generically applicable model for predicting time series.

That's correct, and there are a few models, both AI and algorithmic, that do the same thing.

What's innovative is its zero-shot performance, meaning you don't have to fine-tune it. In fact, as near as I can tell, there is no way to fine-tune it. The class doesn't have a "fit" method.

Stuff in 512 data points and get up to 512 more back.
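
For anyone landing here, usage looks roughly like this. This is a sketch based on the project README at the time of writing; the constructor arguments and checkpoint id are assumptions that may have changed since:

```python
import numpy as np
import timesfm

# Hyperparameters follow the published 200M checkpoint's README; they are
# assumptions here, not something confirmed in this thread.
tfm = timesfm.TimesFm(
    context_len=512,    # up to 512 points of history in
    horizon_len=128,    # points of forecast out per call
    input_patch_len=32,
    output_patch_len=128,
    num_layers=20,
    model_dims=1280,
    backend="cpu",
)
tfm.load_from_checkpoint(repo_id="google/timesfm-1.0-200m")

history = np.sin(np.linspace(0, 20, 512))   # toy daily series
point_forecast, quantile_forecast = tfm.forecast(
    [history],  # a list of 1-D arrays, one per series
    freq=[0],   # 0 = high frequency (daily or finer)
)
```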

I've been playing with it and looking at the paper, but to be perfectly honest I can't replicate their results, so YMMV. Feeding it scraps of 512 daily datapoints just gives noise in return; I'm seeing HUGE MSE.

SARIMA is the baseline for time series data with a seasonal element, and I noticed that they only tested against ARIMA even though nearly all the data they tested with has strong seasonality. That makes me question their assertions.

Google org
edited May 15

This is not the exact checkpoint we used for the paper, so there might be minor differences in the numbers. Before releasing, I tested this checkpoint on the Monash benchmark used in the paper and got GM = 0.6841, which should be close enough. Would you mind sharing which metrics you could not approximately reproduce with this checkpoint?

Is there no way to fine-tune the model on specific data?

@siriuz42

After a more careful examination of the paper I retract my earlier statement about SARIMA. I somehow skipped over the word “seasonal” next to ARIMA in most cases. SARIMA is the acronym I was expecting to see. In any case, my apologies.

As for reproducing the results, my comment applies pretty globally to unseen data. I've got a whole bunch of data very similar to the benchmarks, but it's my own and mostly unseen. Nevertheless, it is normalized and structured in similar ways to how it appears you prepared the training data, and yet I'm not seeing numbers anything like your paper's; more importantly, try as I might, I can't get this thing to beat SARIMA on the same data.

If you want a quick test case, one of your data points compares this model with another model on Bitcoin data. 10 years' worth of BTCUSD daily candle history is available for free from finance.yahoo.com.
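
A quick way to pull that history is the third-party yfinance package (a sketch; the ticker and period just mirror what's described above):

```python
import yfinance as yf

# ~10 years of daily BTCUSD candles from Yahoo Finance
btc = yf.download("BTC-USD", period="10y", interval="1d")
closes = btc["Close"].to_numpy().ravel()   # daily closing prices
```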

BTCUSD is a good candidate for testing against since it looks random but is highly seasonal and cyclical.

The data has a major 4-year season where the mean closing price doubles at least every 4 years.
It's a consistent 0.0279% daily increase to the mean when measured between any two points that are 365*4 days apart.

It also shows a hidden feature when you take close/volume: starting at 0.82 correlation at 7-day intervals over 10-week windows, decaying to 0.65 in fairly even steps. This means that big moves in any direction have an echo or aftershock at 7, 14, 21 days, and so on up to 70 days.
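
For concreteness, here's roughly how one could check both claims. This is a sketch: treating the close/volume feature as a lagged autocorrelation of the close/volume ratio is one interpretation of the above, and the printed figures will depend on the exact window:

```python
import numpy as np
import yfinance as yf

btc = yf.download("BTC-USD", period="10y", interval="1d")
closes = btc["Close"].to_numpy().ravel()
volumes = btc["Volume"].to_numpy().ravel()

# Implied daily growth between closes 365*4 days apart
lag = 365 * 4
growth = (closes[lag:] / closes[:-lag]) ** (1.0 / lag) - 1.0
print(f"median implied daily growth: {np.median(growth):.4%}")

# Lagged correlation of the close/volume ratio at weekly steps up to 70 days
cv = closes / volumes
for k in range(7, 71, 7):
    r = np.corrcoef(cv[:-k], cv[k:])[0, 1]
    print(f"lag {k:2d} days: corr = {r:+.3f}")
```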

In theory at least, it should get a 90-day forecast trend right, at least with weekly alignment, given a 512-day lead-in, but instead it straight up decays in the direction of the most recent trend line.

Unfortunately, that direction is mostly down, since BTC spikes high once every 4 years with a double top and then spends the rest of the time in an overall downtrend until it reaches double the mean of the previous season.

That's why I wanted to fine-tune it on data I've prepared such that any closing change outside 1 standard deviation is featured in the center.

In theory this would compensate for the overwhelming number of lower closes as compared to upper closes.

Try as I might, I do not see any way to fine-tune this puppy, and at the moment SARIMA is getting at least the direction right while timesfm is hit or miss on the same data.

Is there a data prep and/or fine-tuning procedure documented anywhere?

We have a lot of monitoring data from Prometheus, and we are trying to set alert thresholds (i.e., flag abnormal metrics) with ML. Normally we need to train many models, one for each service or even each dimension.

I don't know whether one model can cover all services, but it's worth a try (it could save a lot of GPU).

@koven2049 I can't speak for the authors, but I don't think this is going to help with that unless you need to forecast future errors.

In either event, you need to load your data into a pandas DataFrame, calculate the mean and the standard deviation, then look for any events that occur outside of 1 std dev. You can feed that into an autocorrelation tool to see if there is any time-based correlation in your data. If there is, then time series forecasting can help you predict errors. If, however, there's no correlation along the time axis, then your data is random, or at least stochastic, and can't be predicted; you're going to be a lot better off just monitoring for abnormalities by watching for events more than 1 or 2 standard deviations outside the mean as they occur.
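
Something like this is all it takes (a sketch; the file and column names are placeholders for whatever your Prometheus export looks like):

```python
import pandas as pd

# Placeholder file/column names; substitute your Prometheus export.
df = pd.read_csv("metrics.csv", parse_dates=["timestamp"], index_col="timestamp")
series = df["value"]

mean, std = series.mean(), series.std()
outliers = series[(series - mean).abs() > std]   # events beyond 1 std dev
print(f"{len(outliers)} events outside 1 std dev")

# Quick time-based correlation check at a few candidate lags
# (assumes roughly hourly samples; adjust to your scrape interval).
for lag in (1, 24, 24 * 7):
    print(f"lag {lag}: autocorr = {series.autocorr(lag=lag):.3f}")
```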

@koven2049

I was taking a closer look at the model outputs and realized this may in fact help you.

In addition to a point prediction, this model outputs a series of quantiles. These quantiles give you bands representing how frequently values should fall within a given range; you can then just watch for numbers outside those bands.
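
For example (a sketch; the layout of the quantile array, shape [num_series, horizon, 10] with the mean at index 0 and the 0.1-0.9 deciles after it, is my reading of the README, so verify it against your version):

```python
import numpy as np

# Toy stand-in for the model's quantile output; in practice this would be
# the second return value of tfm.forecast(...) from the snippet further up.
rng = np.random.default_rng(0)
horizon = 90
quantile_forecast = np.sort(rng.normal(size=(1, horizon, 10)), axis=-1)

lo_band = quantile_forecast[0, :, 1]   # ~10th percentile per forecast step
hi_band = quantile_forecast[0, :, 9]   # ~90th percentile per forecast step

observed = rng.normal(size=horizon)    # what actually happened
alerts = (observed < lo_band) | (observed > hi_band)
print(f"{alerts.sum()} of {horizon} steps outside the 10-90% band")
```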

Let's say you have a cooling system that uses a water pump and a float to turn a fan on and off, a so-called "swamp cooler". You're attached to a municipal water source, and the water is hard, so over time the float degrades to the point that it's stuck in position. You don't want to replace the float when you don't have to, and you've noticed that water flow to the water pump is a good proxy for the condition of the float.

But your flowmeter is outside in direct sunlight, and you live in a place with all four seasons, so the data looks erratic; when you graph it, though, you can see that it isn't random.

You could mark the dates where you've had to replace the float and use those end-point readings as a "worst possible case"; most likely it will be either all the way on or all the way off, but a constant flow rate is possible as well. You can then feed your time series data into the model and project out 90 days. You'll get a prediction, but more importantly you'll get a range of values ordered by their statistical likelihood in each forecast column. You can use this range to tell whether the float is wearing out, or whether the float is fine but the flowmeter is giving erratic numbers because of temperature, season, etc. It can serve as an early warning system that gives you time to save up for a new float.

You could apply this to any monitoring where you need a range of values, ordered by their statistical likelihood, that could translate into a decision or an alert. Just be careful to know exactly what you're really measuring.

@sdalemorrey I think so.
I don't know the generalization ability of this model and am running tests on our system now, but I need some time to fix an engineering issue first.
Also, if they can tell us what the pretraining data is (or whether the dataset contains monitoring data), it would help a lot. Maybe we need some fine-tuning or feature extraction.

Google org

@koven2049

re pretraining: please check out Table 1 in the paper: https://arxiv.org/pdf/2310.10688

Hey, I've experimented with this model myself on high-frequency data (10-minute intervals of meteorology data), and one tip I can give is: don't use a massive context window. Predict values one at a time in a sliding-window manner, where your context is the window size and your horizon is the step size.
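
Roughly like this (a sketch; model_forecast is a placeholder for whatever wraps your model call and returns at least step future values for a given context):

```python
import numpy as np

def rolling_forecast(series, model_forecast, window=256, step=1):
    """Walk the series, forecasting `step` values at a time from a short context.

    `model_forecast` is a hypothetical callable: context array in,
    forecast array (length >= step) out.
    """
    preds = []
    for start in range(0, len(series) - window - step + 1, step):
        context = series[start:start + window]
        preds.append(np.asarray(model_forecast(context))[:step])
    return np.concatenate(preds)

# Toy usage with a naive persistence forecaster standing in for the model:
series = np.sin(np.linspace(0, 50, 2000))
preds = rolling_forecast(series, lambda ctx: np.repeat(ctx[-1], 1))
```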
