# Let's start MANUFACIA

Creating a MANUFACIA project proceeds in the following eight steps:

1. Create a new project
2. Select the AI model type
3. Select the AI model training type
4. Select tags for the assets to use
5. Label assets
6. Split data
7. Define pre-processing
8. Train models

(create-new-project)=
## Create a new project

After registering all assets, click the [Project list] tab.

```{image} media/CreateProject_Tab.png
:width: 800px
:name: img_create_project_tab
```

Click [+ Create New] in the project list.

```{image} media/CreateProject_NewProject.png
:width: 600px
:name: img_create_new_project
```

Enter a project name (required) and an overview (optional), then click [Next].

```{image} media/CreateProject_ProjectName.png
:width: 500px
:name: img_create_new_project_project_name
```

## Select AI model type

Select [Anomaly detection] or [Classification] depending on which AI model to create (this example shows the case of anomaly detection), then click [Next]. See [AI model types](./index_1.html#ai-model-types) for details on each AI model type.

```{image} media/CreateProject_AnomalyDetection.png
:width: 500px
:name: img_create_project_anomaly_detection
```

## Select AI model training type

Select [Supervised learning] or [Unsupervised learning] (this example shows the case of supervised learning), then click [OK].

```{image} media/CreateProject_Supervised.png
:width: 500px
:name: img_create_project_supervised
```

Confirm that the project has been created.

```{image} media/CreateProject_View_ProjectList.png
:width: 600px
:name: img_create_view_project_list
```

## Select tags for assets to use

Click [Select assets] for the project to use.

```{image} media/CreateProject_AssetSelection.png
:width: 600px
:name: img_asset_selection
```

Select all tags for the assets to use. In addition to the tags for classification, also select "training" and "test" if datasets were prepared separately for training and validation.
See also [Example of tag settings](./index_2.html#examples-tag-settings).

Confirm that all necessary tags are selected, then click [OK].

```{image} media/CreateProject_AssetSelected.png
:width: 800px
:name: img_asset_selected
```

Click [Visualize] to show the percentage of assets linked to each tag relative to the whole.

```{image} media/CreateProject_AssetPercentage.png
:width: 800px
:name: img_asset_percentage
```

(label-assets)=
## Label assets

A label must be set for each tag of the assets used for training. On the right of the screen, set the tag; on the left, select the label from the list, or enter it manually for multi-class classification.

```{image} media/CreateProject_SelectTag.png
:width: 800px
:name: img_select_tag
```

To set tags, click the tag entry field (the space to the left of the trash can icon) and select from the list.

```{image} media/CreateProject_SelectTag_ok.png
:width: 600px
:name: img_select_tag_ok
```

Each tag to be selected corresponds to a class: OK/NG for anomaly detection, fruit names etc. for multi-class classification. Even within the same class, assets that were prepared separately for training and validation cannot be labeled together. If there are several classes of "normal" or "anomaly", label each class/folder as "normal" or "anomaly" separately. At first, an entry field for only one class is available; click [+ Create label] to add more entry fields. When all classes have been labeled, the labeling is complete.

```{image} media/CreateProject_NewLabel.png
:width: 800px
:name: img_create_project_new_label
```

```{tip}
At most 20 labels can be registered per model.
```

```{caution}
1. Do not use half-width Katakana in labels. It may cause a training error.
2. When more than one tag is selected, the assets must satisfy all tag conditions (AND condition).
```

```{important}
Be sure that the **asset percentage of every label is not 0%**.
If the percentage shows 0%, more than one tag is linked to one label and **there is no overlap among the assets linked to those tags**. First confirm that no error was made in tagging the assets during registration or in labeling them, then redo the labeling.
```

```{image} media/CreateProject_LabelingError.png
:width: 800px
:name: img_create_project_labeling_error
```

### Label examples

**Example 1. Anomaly detection, supervised learning**

```{image} media/CreateProject_Label_Supervised_AnomalyDetection.png
:width: 800px
:name: img_create_project_label_supervised_anomaly_detection
```

**Example 2. Anomaly detection, unsupervised learning**

```{image} media/CreateProject_Label_Unsupervised_AnomalyDetection.png
:width: 800px
:name: img_create_project_label_unsupervised_anomaly_detection
```

**Example 3. Multi-class classification**

```{image} media/CreateProject_Label_MultiClass_Classification.png
:width: 800px
:name: img_create_project_label_multiclass_classification
```

After finishing labeling and tagging for all classes, click [Complete labeling and continue].

```{caution}
1. The percentage displayed for labeling is based on all assets registered in the lab. It is no problem if the percentage is below 100% when not all the data in the lab is used.
2. For unsupervised learning, the "training" tag cannot be applied to anomaly data, or the training fails.
3. Incomplete tagging or labeling (missing labels or a percentage of zero) will cause a training error.
```

(split-data)=
## Split data

Split the data into training and validation data. If training and validation datasets were registered separately when registering assets, select [Split using asset tag]; if not, select [Split by percentage]. Then click [Partition].

```{important}
To split using asset tags, the "training" and "test" tags must be linked to the assets. Once the assets have been uploaded, tag information cannot be modified.
See [Examples of tag settings](./index_2.html#examples-tag-settings).
```

### Split data by asset tags

Select the "training" tag for training data and the "test" tag for validation data.

```{image} media/CreateProject_Label_Split_by_AssetTag.png
:width: 600px
:name: img_create_project_label_split_by_assettag
```

Confirm in the bar chart that the datasets are split correctly, then click [Next].

```{image} media/CreateProject_Label_Split_by_AssetTag_BarGraph.png
:width: 600px
:name: img_create_project_label_split_by_assetag_bargraph
```

### Split data by percentage

Move the slider to define the percentage of datasets for training and validation, then click [OK]. In this example, both the "normal" and "anomaly" labels have two tags (classes).

```{image} media/CreateProject_Label_Split_by_Percentage_BarGraph.png
:width: 600px
:name: img_create_project_label_split_by_percentage_bargraph
```

(preprocessing-time-vibration)=
## Define pre-processing (Time-series/vibration data)

Vibration data is used to address vibration or noise issues arising from facilities or equipment connected to motors or cylinders; its pattern repeats periodically, which distinguishes it from general time-series data. In addition to the options for time-series data, the following pre-processing options are available for vibration data.

- Sliding window
- Window function
- FFT

### Read CSV

Define which columns to read from the input file and how to modify the data for training.

- Select column:
>Select the columns of the input CSV file to use for training.
- Number of lines to output:
- Specify data alignment:
- Select interpolation method:
>With the combination of these three settings, the input data can be trimmed from the left/right, or a short input can be repeated, to obtain a specific length for training. How the interpolation method works depends on the data alignment selection.

```{image} media/Csv_stretch.png
:width: 600px
:name: img_csv_stretch
```

- Case 1.
Trim the first part of the CSV data

```{image} media/Csv_trim_left_focus.png
:width: 600px
:name: img_csv_trim_left_focus
```

>a) Enter the number of lines: the original number of lines minus the number of lines to trim. (409 -> 320)
>b) Select **End** for data alignment.
>c) Select **one of the three (2, 3, 4) options** for the interpolation method.

```{image} media/Csv_trim_left_Interpolate.png
:width: 600px
:name: img_csv_trim_left_interpolate
```

- Case 2. Trim the last part of the CSV data

```{image} media/Csv_trim_right_focus.png
:width: 600px
:name: img_csv_trim_right_focus
```

>a) Enter the number of lines: the original number of lines minus the number of lines to trim. (409 -> 320)
>b) Select **Beginning** for data alignment.
>c) Select **one of the three (2, 3, 4) options** for the interpolation method.

```{image} media/Csv_trim_right_Interpolate.png
:width: 600px
:name: img_csv_trim_right_interpolate
```

- Case 3. Extend short CSV data by repeating it

>a) Enter a number of lines larger than the original one. (409 -> 1000)
>b) Select **Beginning** for data alignment.
>c) Select **4. Cycle data** for the interpolation method.

```{image} media/Csv_repeat.png
:width: 600px
:name: img_csv_repeat
```

```{tip}
For more complex data file manipulation, the CSV format checker is available; it can extract lines between arbitrary start and end line numbers and thin out data files. See the [CSV format checker user manual](../csv_format_checker/index_1.html#what-is-csv-format-checker).
```

### Clamp (Optional)

Remove noise by defining an upper and a lower limit for importing data. Each limit must be between -256 and 256.

### Sliding window (Vibration data only)

Setting the sliding window properly requires deep knowledge of data analysis. Here, a trial-and-error approach of changing each value setting is explained.

**Number of data points per sample:**

>1. 512, the initial value.
>2. 256, half of the initial value.
>3. 1024, double the initial value.
>4. The total data length.

>Find which setting gives the best result, then use it.

**Interval:**

>A range of 1/5 to 1/10 of the "number of data points per sample" setting.

````{caution}
For data whose length (number of lines) is shorter than 512, the initial value of "number of data points per sample" will cause a training error. Set the value smaller than the number of lines, about 1/4 of it for the first attempt.
To define the number of data points per sample and the interval properly, confirm that the whole signal is covered by green or yellow bands with some overlap, and is not entirely grey.

```{image} media/SlidingWindow.png
:width: 600px
:name: img_sliding_window
```
````

### Window function (Vibration data only)

Window functions are used for spectral analysis, digital filtering, and audio compression. In MANUFACIA, the following window functions are available. Try both Hamming and Hann to find which option brings the better result.

**Gauss: (Default)**

>This window function will be removed from the list in versions after v2.2.

**Hamming:**

>One of the most commonly used window functions, like the Hann window below. It was developed as an improved version of Hann, with higher frequency resolution and a narrower dynamic range. However, unlike Hann, it does not reach zero at both ends of the range.

**Hann:**

>A specific feature of this function is that it decreases smoothly toward the ends of the range and is guaranteed to reach 0 at both ends. In the FFT spectrum of a sine wave, the peaks are clearer than with a rectangular window function.

(normalize-standardize)=
### Normalize/Standardize

**Normalize: (Default)**

>Normalization can minimize the variations in data from file to file.
>It converts the data to the value range -1.0 to 1.0 by dividing each value by the maximum absolute value of the input data.
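As an illustration of this conversion (a minimal sketch, not MANUFACIA's internal code; the array `x` is hypothetical input data):

```python
import numpy as np

# Hypothetical input signal; not actual MANUFACIA data.
x = np.array([2.0, -4.0, 1.0, 3.0])

# Normalize: divide by the maximum absolute value -> range [-1.0, 1.0].
x_norm = x / np.max(np.abs(x))  # values: 0.5, -1.0, 0.25, 0.75
```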
>If the datasets have outliers, normalization is strongly affected by them. Make sure the datasets are free of outliers before using this setting.

**Standardize:**

>Standardization takes the mean and variance of the input data into account. It may bring better results than normalization in some cases.
>For image data, it can make an image stand out if it is rather monotonous, with little variation across the image.
>In the automatic setting, it converts the input data to have a mean of 0.0 and a standard deviation of 1.0. If the mean μ and standard deviation σ are known, they can be given in the manual setting. The converted value n' is obtained from the original value n as below.

```
n' = (n - μ) / σ
```

### FFT (Vibration data only)

FFT is the abbreviation of **F**ast **F**ourier **T**ransform; in MANUFACIA it converts a signal from the time domain to a representation in the frequency domain. It is used in combination with a window function.

### Interpolate (Optional)

Define how to thin out or interpolate data. It is recommended to switch this off to get a better model to begin with. If data with a length of 1000 is set to 200, the data is thinned to 1/5 of its original size. With less data, training can be faster. (Initial value: 100)

- linear: Create new points between two available points by interpolating linearly, depending on the distance between the points. (Default)
>![](media/interpolate_linear.png)
- nearest: New elements take the value of the nearest neighboring point.
>![](media/interpolate_nearest.png)

(binarize)=
### Binarize (Optional)

Converts the input data to zero or one using a threshold value between 0.0 and 1.0. It can be useful for data whose feature values are displayed in an easy-to-understand manner, but for time-series/vibration data in general it may not be helpful.

(batch)=
### Batch

Batch size is the number of datasets used for one training step.
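As a rough illustration of batching (a sketch under simplified assumptions, not MANUFACIA's actual implementation; `data` is a hypothetical NumPy array of 10 samples):

```python
import numpy as np

# Hypothetical dataset of 10 samples with 3 features each.
data = np.arange(30).reshape(10, 3)

batch_size = 4  # the initial value mentioned below

# Split into consecutive batches; the last batch may be smaller.
batches = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]

print([len(b) for b in batches])  # [4, 4, 2]
```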
For deep learning, the gradient descent algorithm is used to minimize the loss function and to weight the parameters properly. Because resources do not allow passing all datasets at once for training, small subsets of the datasets are used. If the batch size is relatively small, the influence of a single data point is bigger, especially if there is distinctive or non-average data; if the batch size is relatively big, the training is affected less by such data. However, it cannot be said in general that a bigger batch size is always better. Values of 2^n are generally used. (Initial value: 4)
The upper limit of the batch size depends on the number of assets and on RAM.
See also [Training loss curve](./index_4.html#training-loss-curve).

## Define pre-processing (Image data)

### Load image

**Image size:**

>Resize the input image data to the given size. Height and width will be the same. The initial setting is 224 x 224. If a size smaller than the initial one is defined for training, the pre-trained networks, from which higher accuracy can be expected, will not be used for supervised learning. Using a bigger image size makes training take longer. **The upper limit of the image size is 512 x 512**.

**Display method:**

This display method is applied **if the width and height of the input image file are not the same**.

```{image} media/image_org.png
:width: 500px
```

- distort: Fit the output image to the image size without keeping the aspect ratio.

```{image} media/display_distort.png
:width: 500px
```

- cover: Fit the shorter of width and height to the output image size while keeping the aspect ratio, and crop to the center part.

```{image} media/display_cover.png
:width: 500px
```

- contain: Fit the longer of width and height to the output image size while keeping the aspect ratio, and fill the area outside the original image with black.

```{image} media/display_contain.png
:width: 500px
```

**Algorithm:**

- bilinear: Linear interpolation in both the X and Y axes.
- nearest-neighbor: The image will be jaggy, but the process is faster. It is not easy to compare the accuracy with bilinear.

### Gray scale/Channel selection

Select the color channel of the input image data.

- Gray scale: select this if the image is to be processed in black and white.

```{caution}
When this option is selected, only the color channel "R" will be marked.
```

- Channel selection: select an arbitrary color channel from RGB. (Initial setting)

### Crop (Optional)

Crop the input image data to the given size (square) from the center. It can be used to remove everything around the item of interest in the center.

### Normalize/Standardize

See [Time-series/vibration data -> Normalize/Standardize](#normalize-standardize) for details. It is recommended to adjust the value while watching the preview to get better feature values.

### Binarize (Optional)

See [Time-series/vibration data -> Binarize](#binarize) for details. There is no preview function for this option; turn it OFF.

### Batch

See [Time-series/vibration data -> Batch](#batch) for details.

(train-models)=
## Train models

Move the slider to set the number of AI models to create, up to 200. In supervised learning with image data, models can be created using the pre-trained networks ResNet50 and MobileNetV2. They are used in preference to ordinary neural networks (CNN); if the number of models is set to 2, CNN will not be used, and ResNet50 and MobileNetV2 will each be used for training.

```{tip}
Like CNN models, ResNet50 and MobileNetV2 models can also fail to train. These models are not always available.
```

Click [Start training] after setting the number of models to create.

```{image} media/CreateProject_Set_NumModels.png
:width: 800px
:name: img_set_num_models
```

```{tip}
If the [Start training] button is pressed twice because the UI does not seem to change after the first press, a training error may be displayed.
However, as the training progresses and the progress can be displayed properly, a circular graph showing the progress will appear.
```

Once training has started, the training progress is displayed. The training can be paused by clicking [Pause].

```{tip}
The "Pause" function does not interrupt batch jobs that have already started training, but it prevents waiting batch jobs from starting.
Depending on how many models are to be created, it can happen that the training cannot be paused with [Pause] when it is about to finish: at around 80% if the number of models is small, 90% or more if it is large.
```

By clicking [Check the trained model], the model list with the models available so far can be seen.

```{image} media/CreateProject_Training_Started.png
:width: 800px
:name: img_training_started
```

Click [Train/Predict] on the navigation bar to go back from the model list to the training progress.

```{image} media/CreateProject_Training_SeeProgress2.png
:width: 800px
:name: img_training_see_progress
```

```{tip}
The models are trained in parallel, so the sum of each model's training duration is not the total training duration.
```

When the training has finished, detailed results become viewable.

```{image} media/CreateProject_Training_Completed.png
:width: 800px
:name: img_training_completed
```

```{tip}
When showing the training results for the first time, it may take some time to display the model list because thresholds or distances are calculated, especially if there are many (over 1000) validation datasets.
```