2. Instructions

2.1. List of features

This tool converts input CSV files into a format that MANUFACIA can read. It can also process input files as described below.

Feature	Description
Show version number	Shows the version number of the tool.
Check format	Checks CSV file format and logs the result.
Correct format	Automatically corrects the format, based on the error information from the format check, so that MANUFACIA can read the file. It logs only the places where an error was found.
Process data: Divide file	Divides a file into several files, each with at most the specified number of lines.
Process data: Extract rows	Extracts the specified block of rows.
Process data: Extract columns	Extracts the specified block of columns.
Process data: Thin out data	Thins out the data area by removing every n-th line, where n is the specified interval.
Process data: Swap rows and columns	Swaps rows and columns.
Process data: Replace header line	Adds or overwrites the CSV file header.

The options below have no effect on their own; they are used together with the options above.

Additional feature	Description
Input data path	Specifies the path where the input data files are located.
Rebuild time data	Rebuilds time data while processing data.
Force data processing	Ignores errors and forces processing of the data files.

Important

Correct the format first and then process the data, because data processing is performed only on CSV files in the proper format.
Log files and processed CSV files are written to the same directory as the original CSV files.
Only one data processing feature can be run at a time. If multiple options are specified on the command line, only the first one is used for data processing.

2.2. Feature detail

2.2.1. Show version information of the tool

  csv_checker -i

Displays the following information about this tool.

Version number
Default setting values

2.2.2. Format check

  csv_checker

Verifies the CSV files in the current directory. To check CSV files in another location, use the -p option.

The result is logged in the following style.

Format check result
Result	Description
OK (correct)	・Header comments are only in the first line of the file. ・Values are separated by commas (,). ・File encoding is UTF-8.
Error1	There are comments in multiple lines.
Error2	Header comments are not the same in different files.
Error3	Delimiter is semicolon (;).
Error4	Delimiter is white space or tab.
Error5	Data are enclosed in double (“”) or single (‘’) quotation marks.
Error6	There is white space before or after the value.
Error7	Double-byte characters are used in the value.
Error8	Non-numeric characters are used.
Error9	Value is NULL or white space.
Error10	BOM (EF,BB,BF) is added to the file.
Error11	File is not encoded in UTF-8.
Error12	Slash (/) is used.
undefined	Other errors, such as no header line or no data lines.
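
The two encoding-related results (Error10 and Error11) can be illustrated with a minimal Python sketch. This is not the tool's implementation; the function name check_encoding and the order of the checks are assumptions made for illustration only.

```python
import codecs


def check_encoding(raw: bytes) -> list:
    """Return the encoding-related error names found in the raw file bytes.

    Hypothetical helper: mirrors Error10 and Error11 from the table above.
    """
    errors = []
    # Error10: the file starts with the UTF-8 BOM (EF, BB, BF).
    if raw.startswith(codecs.BOM_UTF8):
        errors.append("Error10")
        raw = raw[len(codecs.BOM_UTF8):]
    # Error11: the remaining bytes are not valid UTF-8.
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        errors.append("Error11")
    return errors


print(check_encoding(b"\xef\xbb\xbftime,a\n1,2\n"))
print(check_encoding("時間,a\n1,2\n".encode("shift_jis")))
print(check_encoding(b"time,a\n1,2\n"))
```

The first call reports the BOM, the second reports a file that is not UTF-8, and the third reports nothing.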

2.2.3. Correct format

  csv_checker -c

Runs the format check and corrects the errors it finds so that MANUFACIA can read the files. Errors are corrected as follows.

Correction method for each error
Error number	Correction method
Error1	Takes only the last comment line and removes the rest.
Error2	No correction.
Error3	Replaces the delimiter with comma (,). (Only for header line.)
Error4	Replaces the delimiter with comma (,). (Only for header line.)
Error5	Removes double/single quotation marks.
Error6	Removes white space.
Error7	Converts to single-byte characters.
Error8	No correction.
Error9	Inserts 0.
Error10	Removes BOM (three bytes).
Error11	Converts the encoding to UTF-8. (Only if the input file is encoded in Shift-JIS.)
Error12	Removes slash.
undefined	No correction.
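
As a rough illustration of the value-level corrections in the table (Error5, Error6, Error7, Error9, and Error12), the following Python sketch applies them to a single field. It is an assumption-laden sketch, not the tool's actual logic: the function name correct_value, the order of the steps, and the use of Unicode NFKC normalization for the double-byte conversion are all hypothetical, and only straight quotation marks are handled.

```python
import unicodedata


def correct_value(value: str) -> str:
    """Apply the value-level corrections from the table to one field (illustrative)."""
    # Error7: convert full-width (double-byte) characters to single-byte ones.
    value = unicodedata.normalize("NFKC", value)
    # Error6: remove white space before and after the value.
    value = value.strip()
    # Error5: remove enclosing double or single quotation marks (straight quotes only).
    value = value.strip("\"'").strip()
    # Error12: remove slashes.
    value = value.replace("/", "")
    # Error9: an empty (NULL or white space) value becomes 0.
    return value if value else "0"


for raw in [' "１２" ', "  ", "'3.5'", "1/2"]:
    print(repr(raw), "->", repr(correct_value(raw)))
```

Error8 (non-numeric characters) is deliberately left untouched here, matching the "No correction" entry in the table.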

2.2.4. Process data

The following features are run by adding the corresponding option when calling csv_checker. Multiple processing options are not accepted at the same time. Both row and column numbers start from 1.

2.2.4.1. Divide file

  csv_checker -d <num>

Divides the data area of the file into several files, each containing at most <num> data lines, and adds the header line as the first line of each file.
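
The dividing behavior described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name divide_rows is hypothetical, the file is represented as a list of lines whose first entry is the header, and no output file naming is modeled.

```python
def divide_rows(lines, num):
    """Split the data lines into chunks of at most `num`, each preceded by the header."""
    if num < 1:
        raise ValueError("num must be at least 1")
    header, data = lines[0], lines[1:]
    return [[header] + data[i:i + num] for i in range(0, len(data), num)]


parts = divide_rows(["time,a", "1,10", "2,20", "3,30", "4,40", "5,50"], 2)
for part in parts:
    print(part)
```

With five data lines and <num> set to 2, the sketch produces three parts; the last one holds the single remaining data line.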

2.2.4.2. Extract rows

  csv_checker -e <start> <end>

Extracts the block of the data area that starts at line <start> and ends at line <end> from the original file and creates a new file. <start> should not be 1, so that the header line is excluded. If <end> is not given, extraction continues to the end of the file. The header line is copied to the first line of all the files.
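
A minimal Python sketch of the row extraction described above, assuming that line 1 is the header and that <end> is inclusive (the manual does not state the latter explicitly). The function name extract_rows is hypothetical.

```python
def extract_rows(lines, start, end=None):
    """Return the header plus lines start..end (1-based, inclusive, header is line 1)."""
    if start < 2:
        raise ValueError("start must be 2 or greater; line 1 is the header")
    if end is None:
        end = len(lines)  # no end given: extract to the end of the file
    return [lines[0]] + lines[start - 1:end]


lines = ["time,a", "1,10", "2,20", "3,30", "4,40"]
print(extract_rows(lines, 3, 4))
print(extract_rows(lines, 4))
```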

2.2.4.3. Extract columns

  csv_checker -r <start> <end>

Extracts the block of columns that starts at column <start> and ends at column <end> from the original file and creates a new file. If the specified numbers exceed the number of columns in the CSV file, an error is logged.

2.2.4.4. Thin out data

  csv_checker -t <interval>

Thins out the data lines at the interval specified by <interval>, from the top of the data area (2nd line) to the end of the file.

Example of thinning out data (-t option)

If the interval is three, the lines in the parentheses () will be removed. {header line, 1, 2, (3), 4, 5, (6), 7, 8, (9), …}
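
The example above can be reproduced with a short Python sketch. It is an illustration only: the function name thin_out is hypothetical, and the file is modeled as a list of lines whose first entry is the header.

```python
def thin_out(lines, interval):
    """Drop every `interval`-th data line, counting from the first data line."""
    if interval < 1:
        raise ValueError("interval must be at least 1")
    header, data = lines[0], lines[1:]
    kept = [row for n, row in enumerate(data, start=1) if n % interval != 0]
    return [header] + kept


lines = ["header line"] + [str(n) for n in range(1, 10)]
print(thin_out(lines, 3))
```

For an interval of three, data lines 3, 6, and 9 are dropped and the header line is kept.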

2.2.4.5. Swap rows and columns

  csv_checker -s

Swaps the rows and columns of the data. The header line is included in the swap.
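
Swapping rows and columns is a matrix transpose. The following Python sketch shows the idea on CSV text, assuming every line has the same number of columns; the function name swap_rows_and_columns is hypothetical and the code is not taken from the tool.

```python
import csv
import io


def swap_rows_and_columns(text):
    """Transpose CSV text so that rows become columns, header line included."""
    rows = list(csv.reader(io.StringIO(text)))
    swapped = zip(*rows)  # assumes all rows have the same length
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(swapped)
    return out.getvalue()


result = swap_rows_and_columns("time,a,b\n1,10,20\n2,30,40\n")
print(result)
```

Each original column, including its header entry, becomes one line of the output.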

2.2.4.6. Replace header line

  csv_checker -h

If header.txt exists in the current directory, the header line of the existing CSV files is replaced with the string specified in header.txt. The number of columns in header.txt should equal the number of columns in the data area. If header.txt contains multiple lines, only the first line is used.

In the following cases, the header line is not changed.

There is no header.txt in the current directory.
There is no header line at the first line of the file.
There is no header line entry in header.txt.

2.2.5. Additional options

These additional options cannot be used alone; they must be combined with other processing options.

2.2.5.1. Specify input data path

  csv_checker ........ -p <path>

Specify a relative or absolute path in <path>. If the path name contains white space, enclose the whole path in double quotation marks ("").

2.2.5.2. Rebuild time data

  csv_checker ........ -o

This option is used together with the divide file option (-d), the extract rows option (-e), or the thin out data option (-t).

The time data of the output file is rebuilt incrementally from 1 instead of using the time information of the corresponding line.

This feature assumes that the time data is in the first column. If it is not, the conversion will not work properly.
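
The effect of rebuilding time data can be sketched in Python as follows. This is an illustration under assumptions: the function name rebuild_time is hypothetical, the time value is taken to be the first comma-separated field, and the step between consecutive values is assumed to be 1.

```python
def rebuild_time(lines):
    """Renumber the first column of each data line as 1, 2, 3, ... (header untouched)."""
    header, data = lines[0], lines[1:]
    rebuilt = []
    for n, row in enumerate(data, start=1):
        fields = row.split(",")
        fields[0] = str(n)  # assumes the time data is in the first column
        rebuilt.append(",".join(fields))
    return [header] + rebuilt


# Lines left after thinning or extraction keep their original time values.
print(rebuild_time(["time,a", "3,30", "6,60", "9,90"]))
```

If the time data were in another column, this sketch would overwrite the wrong field, which corresponds to the caveat above.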

2.2.5.3. Force data processing

  csv_checker ........ -f

If the format check detects errors such as Error2, Error8, or undefined, data processing is normally not possible without correcting the format first. This option forces data processing by ignoring the errors. In that case, the areas containing errors are not modified and the data is kept as it is.