We will create a histogram for each time window that is WIN_SIZE values wide.
Each histogram holds HIST_BINS value buckets. The histograms, each an array of doubles, are stored in an array list:
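import java.util.ArrayList;
import java.util.List;

int WIN_SIZE = 500;
int HIST_BINS = 20;

List<double[]> dataHist = new ArrayList<double[]>();
for (List<Double> sample : rawData) {
    // Number of values counted in the current window; restarts with each sample
    int current = 0;
    double[] histogram = new double[HIST_BINS];
    for (double value : sample) {
        // Normalize the value and map it to a bucket index
        int bin = toBin(normalize(value, min, max), HIST_BINS);
        histogram[bin]++;
        current++;
        // The window is full: store its histogram and start a new one
        if (current == WIN_SIZE) {
            current = 0;
            dataHist.add(histogram);
            histogram = new double[HIST_BINS];
        }
    }
    // Store the trailing partial window only if it holds any values
    if (current > 0) {
        dataHist.add(histogram);
    }
}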
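The loop above calls two helpers, normalize and toBin, that this section does not define. The following is a minimal sketch of what they might look like, assuming that normalize applies min-max scaling to the range [0, 1] and that toBin maps such a value to a bucket index between 0 and bins - 1. The class name HistogramHelpers and both method bodies are assumptions made for illustration, not the original implementation:

```java
import java.util.Arrays;

// Hypothetical helpers: the names normalize and toBin come from the text,
// but their bodies are assumptions made for this sketch.
class HistogramHelpers {

    // Min-max scaling to [0, 1]; values outside [min, max] are clamped.
    static double normalize(double value, double min, double max) {
        if (max <= min) {
            return 0.0; // degenerate range: everything lands in the first bin
        }
        double scaled = (value - min) / (max - min);
        return Math.max(0.0, Math.min(1.0, scaled));
    }

    // Maps a value in [0, 1] to a bin index in [0, bins - 1].
    static int toBin(double normalized, int bins) {
        int bin = (int) (normalized * bins);
        return Math.min(bin, bins - 1); // 1.0 falls into the last bin
    }

    public static void main(String[] args) {
        int bins = 20;
        double min = 0.0;
        double max = 100.0;
        double[] histogram = new double[bins];
        // Count a handful of example values into their buckets
        for (double value : new double[] {0.0, 12.5, 50.0, 99.9, 100.0}) {
            histogram[toBin(normalize(value, min, max), bins)]++;
        }
        System.out.println(Arrays.toString(histogram));
    }
}
```

Clamping in normalize and capping in toBin keep the index inside the histogram array even when a value equals max or falls outside the observed range, which would otherwise cause an ArrayIndexOutOfBoundsException.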
The histograms are now complete. The last step is to transform them into Weka's Instance objects. Each histogram bin will correspond to one Weka attribute, as follows:
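import java.util.ArrayList;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;

// One numeric attribute per histogram bin
ArrayList<Attribute> attributes = new ArrayList<Attribute>();
for (int i = 0; i < HIST_BINS; i++) {
    attributes.add(new Attribute("Hist-" + i));
}

// Empty dataset with initial capacity for all histograms
Instances dataset = new Instances("My dataset", attributes, dataHist.size());
for (double[] histogram : dataHist) {
    // Instance is an interface in Weka 3.7 and later, so use the concrete DenseInstance (weight 1.0)
    dataset.add(new DenseInstance(1.0, histogram));
}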
The dataset is now loaded and ready to be plugged into an anomaly detection algorithm.