Quick Start Guide

This quick start guide introduces the main concepts of using TimeSeriesClustering. The examples are taken from problems in the domain of scenario reduction for energy systems optimization. For more detail on the different functionalities that TimeSeriesClustering provides, please refer to the subsequent chapters of the documentation or the examples in the examples folder, specifically workflow_introduction.jl.

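Generally, the workflow consists of three steps: loading the data, clustering it into representative periods, and using the clustered data as input to an optimization problem.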

Example Workflow

After TimeSeriesClustering is installed, load it with:

julia> using TimeSeriesClustering

The first step is to load the data. The following example loads hourly wind, solar, and demand data for Germany (1 region) for one year.

julia> ts_input_data = load_timeseries_data(:CEP_GER1)
┌ Warning: `getindex(df::DataFrame, col_ind::ColumnIndex)` is deprecated, use `df[!, col_ind]` instead.
│   caller = #add_timeseries_data!#12(::Int64, ::Int64, ::Array{Int64,1}, ::Function, ::Dict{String,Array}, ::SubString{String}, ::DataFrames.DataFrame) at load_data.jl:132
└ @ TimeSeriesClustering ~/build/holgerteichgraeber/TimeSeriesClustering.jl/src/utils/load_data.jl:132
ClustData("none", [2016], 366, 24, Dict{String,Array}("solar-germany"=>[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0],"wind-germany"=>[0.1429 0.1453 … 0.1329 0.1832; 0.1368 0.1758 … 0.1312 0.1802; … ; 0.1098 0.4955 … 0.1904 0.3122; 0.1254 0.4875 … 0.1865 0.3187],"el_demand-germany"=>[41913.0 39121.0 … 45343.0 45600.0; 40331.0 38271.0 … 44402.0 44332.0; … ; 44439.0 48859.0 … 50278.0 48988.0; 41257.0 45600.0 … 47534.0 47641.0]), [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0  …  1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], Dict{String,Array}("solar-germany"=>[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],"wind-germany"=>[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],"el_demand-germany"=>[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), Dict{String,Array}("solar-germany"=>[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0  …  1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],"wind-germany"=>[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0  …  1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],"el_demand-germany"=>[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0  …  1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]), [1.0 1.0 … 1.0 1.0; 1.0 1.0 … 1.0 1.0; … ; 1.0 1.0 … 1.0 1.0; 1.0 1.0 … 1.0 1.0], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10  …  357, 358, 359, 360, 361, 362, 363, 364, 365, 366])

The output ts_input_data is a ClustData struct that contains the data together with additional information about it, such as the number of periods K.

julia> ts_input_data.data # a dictionary with the data.
Dict{String,Array} with 3 entries:
  "solar-germany"     => [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0…
  "wind-germany"      => [0.1429 0.1453 … 0.1329 0.1832; 0.1368 0.1758 … 0.1312…
  "el_demand-germany" => [41913.0 39121.0 … 45343.0 45600.0; 40331.0 38271.0 … …

julia> ts_input_data.data["wind-germany"] # the wind data; "solar-germany" and "el_demand-germany" are the other keys in this example
24×366 Array{Float64,2}:
 0.1429  0.1453  0.4843  0.4279  0.248   …  0.3359  0.0793  0.1329  0.1832
 0.1368  0.1758  0.4819  0.4186  0.2574     0.318   0.0803  0.1312  0.1802
 0.1232  0.2135  0.4792  0.407   0.2682     0.2949  0.0791  0.1337  0.1779
 0.1096  0.2466  0.4838  0.3976  0.2764     0.2739  0.0775  0.1363  0.1796
 0.0964  0.2818  0.4917  0.3873  0.2784     0.2688  0.0774  0.1382  0.1872
 0.082   0.3209  0.4862  0.3776  0.2797  …  0.2638  0.0781  0.1387  0.1971
 0.0706  0.3548  0.4784  0.3655  0.2796     0.2419  0.08    0.1401  0.2072
 0.0593  0.3921  0.471   0.3528  0.2828     0.2151  0.0818  0.1416  0.2151
 0.0438  0.422   0.4786  0.3432  0.2878     0.1813  0.0777  0.1347  0.2067
 0.0317  0.4536  0.475   0.3259  0.2823     0.1494  0.0653  0.1177  0.1981
 ⋮                                       ⋱                          ⋮
 0.0116  0.5288  0.4637  0.248   0.2738  …  0.1112  0.1266  0.2099  0.2792
 0.0222  0.542   0.4832  0.262   0.2733     0.1005  0.1458  0.233   0.2855
 0.0346  0.5416  0.4879  0.2673  0.2672     0.0811  0.1578  0.2339  0.2864
 0.0497  0.5316  0.4804  0.2629  0.2585     0.0636  0.1626  0.2217  0.2926
 0.0669  0.521   0.4651  0.2549  0.2511     0.0521  0.1627  0.2086  0.2974
 0.0817  0.5103  0.4526  0.2448  0.2453  …  0.0472  0.1576  0.2014  0.2987
 0.0948  0.5017  0.446   0.2379  0.2403     0.0517  0.1493  0.1961  0.3033
 0.1098  0.4955  0.4387  0.2395  0.2356     0.0617  0.1431  0.1904  0.3122
 0.1254  0.4875  0.4349  0.2433  0.2316     0.0725  0.1376  0.1865  0.3187

julia> ts_input_data.K # number of periods
366
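
The 24×366 shape shown above suggests that each column holds one day and each row one hour of that day. As a minimal sketch of that layout (using a synthetic series, not the package's loader, and assuming the columns are filled with consecutive hours), a flat hourly vector can be arranged into periods with reshape:

```julia
# Hypothetical hourly series for a leap year: 366 days × 24 hours = 8784 values.
n_hours_per_period = 24
n_periods = 366
hourly = collect(1.0:(n_hours_per_period * n_periods))

# Julia arrays are column-major, so reshape fills each column with
# 24 consecutive hours. Column k then holds day k.
periods = reshape(hourly, n_hours_per_period, n_periods)

size(periods)    # (24, 366)
periods[:, 1]    # the first day: hours 1 through 24
periods[1, 2]    # the first hour of the second day, i.e. 25.0
```

The variable names hourly and periods are illustrative only and are not part of TimeSeriesClustering.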

The second step is to cluster the data into representative periods. Here, we use k-means clustering to obtain five representative periods.

julia> clust_res = run_clust(ts_input_data;method="kmeans",n_clust=5)
ClustResult(ClustData("none", [2016], 5, 24, Dict{String,Array}("solar-germany"=>[-0.0 0.0 … -0.0 -0.0; -0.0 0.0 … -0.0 -0.0; … ; -0.0 0.0 … -0.0 -0.0; -0.0 0.0 … -0.0 -0.0],"wind-germany"=>[0.279123 0.111879 … 0.144016 0.120139; 0.275462 0.10753 … 0.140524 0.115391; … ; 0.278736 0.117154 … 0.158663 0.130994; 0.270642 0.11598 … 0.154858 0.128324],"el_demand-germany"=>[47728.1 44242.9 … 42119.1 50251.2; 47369.8 44332.4 … 41490.6 49689.1; … ; 52237.4 47708.6 … 44497.6 55596.0; 49488.9 45671.1 … 42358.4 52602.3]), [64.0, 113.0, 28.0, 95.0, 66.0], Dict{String,Array}("solar-germany"=>[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],"wind-germany"=>[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],"el_demand-germany"=>[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), Dict{String,Array}("solar-germany"=>[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0  …  1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],"wind-germany"=>[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0  …  1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],"el_demand-germany"=>[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0  …  1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]), [1.0 1.0 … 1.0 1.0; 1.0 1.0 … 1.0 1.0; … ; 1.0 1.0 … 1.0 1.0; 1.0 1.0 … 1.0 1.0], [4, 3, 3, 1, 1, 5, 1, 3, 4, 1  …  5, 1, 3, 3, 3, 3, 4, 4, 4, 4]), 5665.825531962604, Dict{String,Any}("attribute_weights"=>Dict{String,Float64}(),"n_clust"=>5,"method"=>"kmeans","iterations"=>300,"norm_op"=>"zscore","n_init"=>1000,"norm_scope"=>"full","representation"=>"centroid","n_seg"=>24))

julia> ts_clust_data = clust_res.clust_data
ClustData("none", [2016], 5, 24, Dict{String,Array}("solar-germany"=>[-0.0 0.0 … -0.0 -0.0; -0.0 0.0 … -0.0 -0.0; … ; -0.0 0.0 … -0.0 -0.0; -0.0 0.0 … -0.0 -0.0],"wind-germany"=>[0.279123 0.111879 … 0.144016 0.120139; 0.275462 0.10753 … 0.140524 0.115391; … ; 0.278736 0.117154 … 0.158663 0.130994; 0.270642 0.11598 … 0.154858 0.128324],"el_demand-germany"=>[47728.1 44242.9 … 42119.1 50251.2; 47369.8 44332.4 … 41490.6 49689.1; … ; 52237.4 47708.6 … 44497.6 55596.0; 49488.9 45671.1 … 42358.4 52602.3]), [64.0, 113.0, 28.0, 95.0, 66.0], Dict{String,Array}("solar-germany"=>[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],"wind-germany"=>[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],"el_demand-germany"=>[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), Dict{String,Array}("solar-germany"=>[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0  …  1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],"wind-germany"=>[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0  …  1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],"el_demand-germany"=>[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0  …  1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]), [1.0 1.0 … 1.0 1.0; 1.0 1.0 … 1.0 1.0; … ; 1.0 1.0 … 1.0 1.0; 1.0 1.0 … 1.0 1.0], [4, 3, 3, 1, 1, 5, 1, 3, 4, 1  …  5, 1, 3, 3, 3, 3, 4, 4, 4, 4])

ts_clust_data is again a ClustData struct, this time holding the clustered data (i.e., fewer periods, each of which represents a group of original periods).

julia> ts_clust_data.data # the clustered data
Dict{String,Array} with 3 entries:
  "solar-germany"     => [-0.0 0.0 … -0.0 -0.0; -0.0 0.0 … -0.0 -0.0; … ; -0.0 …
  "wind-germany"      => [0.279123 0.111879 … 0.144016 0.120139; 0.275462 0.107…
  "el_demand-germany" => [47728.1 44242.9 … 42119.1 50251.2; 47369.8 44332.4 … …

julia> ts_clust_data.data["wind-germany"] # the wind data. Note the dimensions compared to ts_input_data
24×5 Array{Float64,2}:
 0.279123  0.111879   0.548621  0.144016  0.120139
 0.275462  0.10753    0.551439  0.140524  0.115391
 0.274094  0.103913   0.555379  0.137479  0.111139
 0.271725  0.0979832  0.555575  0.132972  0.106932
 0.266947  0.0846575  0.557968  0.123634  0.103505
 0.261647  0.0703797  0.562114  0.113371  0.101791
 0.267825  0.0655434  0.562361  0.11      0.0997636
 0.284169  0.0728327  0.561371  0.115243  0.0961758
 0.305178  0.0824319  0.565257  0.124365  0.0925379
 0.321837  0.0881735  0.578132  0.131591  0.0918818
 ⋮
 0.326442  0.0904354  0.592429  0.136011  0.107488
 0.314616  0.0869142  0.590043  0.136342  0.122298
 0.308023  0.0888266  0.584318  0.140142  0.134076
 0.304941  0.099415   0.577296  0.149868  0.139071
 0.300441  0.110627   0.570618  0.158697  0.139342
 0.292245  0.115831   0.563343  0.1617    0.136606
 0.285137  0.117286   0.560425  0.161424  0.13353
 0.278736  0.117154   0.557464  0.158663  0.130994
 0.270642  0.11598    0.550454  0.154858  0.128324

julia> ts_clust_data.K # number of periods
5
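
For intuition about what the k-means step does, the following is a minimal sketch of Lloyd's algorithm written with the Julia standard library only. It is not the package's implementation: the function name kmeans_sketch, its arguments, and the synthetic data are assumptions made for illustration. The data is z-score normalized first because the configuration printed by run_clust lists "norm_op"=>"zscore"; how the package scopes that normalization is not reproduced here.

```julia
using Random, Statistics

# Lloyd's algorithm on the columns of X (one column per period).
# Returns the centroids (one column per cluster) and the cluster index
# assigned to each column. Hypothetical helper, not a package function.
function kmeans_sketch(X::AbstractMatrix{<:Real}, n_clust::Integer;
                       iterations::Integer=100,
                       rng::AbstractRNG=MersenneTwister(1))
    n_periods = size(X, 2)
    # Initialize the centroids with randomly chosen distinct periods.
    centroids = float.(X[:, randperm(rng, n_periods)[1:n_clust]])
    assignments = zeros(Int, n_periods)
    for _ in 1:iterations
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_assignments = [argmin([sum(abs2, X[:, k] .- centroids[:, c])
                                   for c in 1:n_clust])
                           for k in 1:n_periods]
        new_assignments == assignments && break   # converged
        assignments = new_assignments
        # Update step: each centroid becomes the mean of its members.
        for c in 1:n_clust
            members = findall(a -> a == c, assignments)
            isempty(members) && continue   # keep the old centroid if empty
            centroids[:, c] = vec(mean(X[:, members]; dims=2))
        end
    end
    return centroids, assignments
end

# Synthetic stand-in for one attribute: 24 hours × 30 periods,
# made of two clearly separated groups of 15 periods each.
rng = MersenneTwister(42)
X = hcat(0.1 .+ 0.01 .* randn(rng, 24, 15),
         0.6 .+ 0.01 .* randn(rng, 24, 15))

# z-score normalization with a single mean and standard deviation.
Z = (X .- mean(X)) ./ std(X)

centroids, assignments = kmeans_sketch(Z, 2; rng=rng)
size(centroids)   # (24, 2): one 24-hour profile per cluster
```

A single random start is used here for brevity. The configuration printed by run_clust shows "n_init"=>1000, which suggests the package repeats the clustering from many starting points; that repetition is omitted from this sketch.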

The clustered data can then serve as input to an optimization problem, the third step of the workflow. For instance, the optimization problem formulated in the CapacityExpansion package can be used with the data clustered in this example.
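
When representative periods feed an optimization problem, each period usually has to count as many times as the number of original periods it stands for. The vector [64.0, 113.0, 28.0, 95.0, 66.0] in the ClustData output above sums to 366, so it reads as exactly such a set of weights; that interpretation, and the per-period values below, are assumptions made for this sketch rather than a description of how CapacityExpansion consumes the data.

```julia
# Cluster weights as printed in the clustered ClustData output.
weights = [64.0, 113.0, 28.0, 95.0, 66.0]

# The weights add up to the number of original periods (366 days in 2016).
total_periods = sum(weights)

# Hypothetical quantity evaluated once per representative period,
# e.g. a daily cost or energy total. Values are for illustration only.
per_period_value = [10.0, 20.0, 30.0, 40.0, 50.0]

# Weighted sum: scale each representative period by the number of
# original periods assigned to it to approximate the full-year total.
annual_estimate = sum(weights .* per_period_value)
```

In an optimization model the same weighting would typically appear in the objective, multiplying each representative period's operating cost.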