ai-content-maker/.venv/Lib/site-packages/SudachiPy-0.6.8.dist-info/METADATA

Metadata-Version: 2.1
Name: SudachiPy
Version: 0.6.8
Summary: Python version of Sudachi, the Japanese Morphological Analyzer
Home-page: https://github.com/WorksApplications/sudachi.rs/tree/develop/python
Author: Works Applications
Author-email: sudachi@worksap.co.jp
License: Apache-2.0
Description-Content-Type: text/markdown
Provides-Extra: tests
Requires-Dist: tokenizers ; extra == 'tests'
Requires-Dist: sudachidict-core ; extra == 'tests'

# SudachiPy
[![PyPi version](https://img.shields.io/pypi/v/sudachipy.svg)](https://pypi.python.org/pypi/sudachipy/)
[![](https://img.shields.io/badge/python-3.6+-blue.svg)](https://www.python.org/downloads/release/python-360/)
[Documentation](https://worksapplications.github.io/sudachi.rs/python)

SudachiPy is a Python version of [Sudachi](https://github.com/WorksApplications/Sudachi), a Japanese morphological analyzer.

This is not a pure Python implementation, but bindings for the
[Sudachi.rs](https://github.com/WorksApplications/sudachi.rs).

## Binary wheels

We provide binary builds for macOS (10.14+), Windows, and Linux, for the x86_64 architecture only.
The 32-bit x86 architecture is not supported and is not tested.
macOS source builds seem to work on ARM-based (AArch64) Macs,
but this architecture is also untested and requires installing the Rust toolchain and Cargo.

More information [here](https://worksapplications.github.io/sudachi.rs/python/topics/wheels.html).

## TL;DR

```bash
$ pip install sudachipy sudachidict_core

$ echo "高輪ゲートウェイ駅" | sudachipy
高輪ゲートウェイ駅	名詞,固有名詞,一般,*,*,*	高輪ゲートウェイ駅
EOS

$ echo "高輪ゲートウェイ駅" | sudachipy -m A
高輪	名詞,固有名詞,地名,一般,*,*	高輪
ゲートウェイ	名詞,普通名詞,一般,*,*,*	ゲートウェー
駅	名詞,普通名詞,一般,*,*,*	駅
EOS

$ echo "空缶空罐空きカン" | sudachipy -a
空缶	名詞,普通名詞,一般,*,*,*	空き缶	空缶	アキカン	0
空罐	名詞,普通名詞,一般,*,*,*	空き缶	空罐	アキカン	0
空きカン	名詞,普通名詞,一般,*,*,*	空き缶	空きカン	アキカン	0
EOS
```

```python
from sudachipy import Dictionary, SplitMode

tokenizer = Dictionary().create()

morphemes = tokenizer.tokenize("国会議事堂前駅")
print(morphemes[0].surface())  # '国会議事堂前駅'
print(morphemes[0].reading_form())  # 'コッカイギジドウマエエキ'
print(morphemes[0].part_of_speech())  # ['名詞', '固有名詞', '一般', '*', '*', '*']

morphemes = tokenizer.tokenize("国会議事堂前駅", SplitMode.A)
print([m.surface() for m in morphemes])  # ['国会', '議事', '堂', '前', '駅']
```


## Setup

You need SudachiPy and a dictionary.

### Step 1. Install SudachiPy

```bash
$ pip install sudachipy
```

### Step 2. Get a Dictionary

You can get a dictionary as a Python package. It may take a while to download the dictionary file (around 70MB for the `core` edition).

```bash
$ pip install sudachidict_core
```

Alternatively, you can choose another dictionary edition. See [this section](#dictionary-edition) for details.


## Usage: As a command

There is a CLI command `sudachipy`.

```bash
$ echo "外国人参政権" | sudachipy
外国人参政権	名詞,普通名詞,一般,*,*,*	外国人参政権
EOS
$ echo "外国人参政権" | sudachipy -m A
外国	名詞,普通名詞,一般,*,*,*	外国
人	接尾辞,名詞的,一般,*,*,*	人
参政	名詞,普通名詞,一般,*,*,*	参政
権	接尾辞,名詞的,一般,*,*,*	権
EOS
```

```bash
$ sudachipy tokenize -h
usage: sudachipy tokenize [-h] [-r file] [-m {A,B,C}] [-o file] [-s string]
                          [-a] [-d] [-v]
                          [file [file ...]]

Tokenize Text

positional arguments:
  file           text written in utf-8

optional arguments:
  -h, --help     show this help message and exit
  -r file        the setting file in JSON format
  -m {A,B,C}     the mode of splitting
  -o file        the output file
  -s string      sudachidict type
  -a             print all of the fields
  -d             print the debug information
  -v, --version  print sudachipy version
```

__Note: The Debug option (`-d`) is disabled in version 0.6.0.__
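
When the CLI is driven from a script, it can help to assemble the argument list from the flags in the help text above. The following is a minimal sketch, not part of SudachiPy: `build_tokenize_argv` and its parameter names are hypothetical, and only the documented `-r`, `-m`, `-o`, `-s`, and `-a` flags are used.

```python
import shlex
from typing import List, Optional, Sequence


def build_tokenize_argv(
    files: Sequence[str] = (),
    mode: Optional[str] = None,
    dict_type: Optional[str] = None,
    config: Optional[str] = None,
    output: Optional[str] = None,
    all_fields: bool = False,
) -> List[str]:
    """Assemble an argument list for `sudachipy tokenize` (hypothetical helper)."""
    if mode is not None and mode not in ("A", "B", "C"):
        raise ValueError(f"mode must be one of A, B, C, got {mode!r}")
    argv = ["sudachipy", "tokenize"]
    if config is not None:
        argv += ["-r", config]      # the setting file in JSON format
    if mode is not None:
        argv += ["-m", mode]        # the mode of splitting
    if output is not None:
        argv += ["-o", output]      # the output file
    if dict_type is not None:
        argv += ["-s", dict_type]   # sudachidict type
    if all_fields:
        argv.append("-a")           # print all of the fields
    argv.extend(files)              # positional input files come last
    return argv


argv = build_tokenize_argv(["input.txt"], mode="A", all_fields=True)
print(shlex.join(argv))  # sudachipy tokenize -m A -a input.txt
```

The resulting list can be passed to `subprocess.run(argv, ...)`; `input.txt` is a placeholder file name.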


### Output

Columns are tab separated.

- Surface
- Part-of-Speech Tags (comma separated)
- Normalized Form

When you add the `-a` option, it additionally outputs

- Dictionary Form
- Reading Form
- Dictionary ID
  - `0` for the system dictionary
  - `1` and above for the [user dictionaries](#user-dictionary)
  - `-1` if a word is Out-of-Vocabulary (not in the dictionary)
- Synonym group IDs
- `(OOV)` if a word is Out-of-Vocabulary (not in the dictionary)

```bash
$ echo "外国人参政権" | sudachipy -a
外国人参政権	名詞,普通名詞,一般,*,*,*	外国人参政権	外国人参政権	ガイコクジンサンセイケン	0	[]
EOS
```

```bash
$ echo "阿quei" | sudachipy -a
阿	名詞,普通名詞,一般,*,*,*	阿	阿		-1	[]	(OOV)
quei	名詞,普通名詞,一般,*,*,*	quei	quei		-1	[]	(OOV)
EOS
```
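
The column layout above can be turned into structured records. The following is a minimal sketch, not part of SudachiPy: `parse_line`, `parse_output`, and the field names are hypothetical helpers chosen for illustration. The synonym group column is kept as a raw string, since its exact format is not described here.

```python
from typing import Any, Dict, List

# Column order as listed above: three basic columns, then the `-a` extras.
FIELDS = [
    "surface",
    "part_of_speech",
    "normalized_form",
    "dictionary_form",
    "reading_form",
    "dictionary_id",
    "synonym_group_ids",
]


def parse_line(line: str) -> Dict[str, Any]:
    """Parse one tab-separated output line into a dict."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) < 3:
        raise ValueError(f"expected at least 3 columns, got {len(columns)}")
    record: Dict[str, Any] = dict(zip(FIELDS, columns))
    record["part_of_speech"] = record["part_of_speech"].split(",")
    if "dictionary_id" in record:
        record["dictionary_id"] = int(record["dictionary_id"])
    # The `(OOV)` marker, when present, follows the named columns.
    record["oov"] = "(OOV)" in columns[len(FIELDS):]
    return record


def parse_output(text: str) -> List[List[Dict[str, Any]]]:
    """Split CLI output into sentences, using `EOS` lines as delimiters."""
    sentences: List[List[Dict[str, Any]]] = []
    current: List[Dict[str, Any]] = []
    for line in text.splitlines():
        if line == "EOS":
            sentences.append(current)
            current = []
        elif line:
            current.append(parse_line(line))
    if current:  # output that does not end with EOS
        sentences.append(current)
    return sentences


sample = (
    "阿\t名詞,普通名詞,一般,*,*,*\t阿\t阿\t\t-1\t[]\t(OOV)\n"
    "EOS\n"
)
for sentence in parse_output(sample):
    for record in sentence:
        print(record["surface"], record["dictionary_id"], record["oov"])  # 阿 -1 True
```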


## Usage: As a Python package

### API

See [API reference page](https://worksapplications.github.io/sudachi.rs/python/).


### Example

```python
from sudachipy import Dictionary, SplitMode

tokenizer_obj = Dictionary().create()
```

```python
# Multi-granular Tokenization

# SplitMode.C is the default mode
[m.surface() for m in tokenizer_obj.tokenize("国家公務員", SplitMode.C)]
# => ['国家公務員']

[m.surface() for m in tokenizer_obj.tokenize("国家公務員", SplitMode.B)]
# => ['国家', '公務員']

[m.surface() for m in tokenizer_obj.tokenize("国家公務員", SplitMode.A)]
# => ['国家', '公務', '員']
```

```python
# Morpheme information

m = tokenizer_obj.tokenize("食べ")[0]

m.surface() # => '食べ'
m.dictionary_form() # => '食べる'
m.reading_form() # => 'タベ'
m.part_of_speech() # => ['動詞', '一般', '*', '*', '下一段-バ行', '連用形-一般']
```
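
For downstream processing it is often convenient to gather these accessors into plain Python data. The following is a minimal sketch: `morpheme_to_dict` and `analyze` are hypothetical helpers, not SudachiPy APIs, and they rely only on the accessor methods shown above plus `normalized_form()`.

```python
from typing import Any, Dict, List


def morpheme_to_dict(m: Any) -> Dict[str, Any]:
    """Collect the accessor results of one morpheme into a plain dict."""
    return {
        "surface": m.surface(),
        "dictionary_form": m.dictionary_form(),
        "reading_form": m.reading_form(),
        "normalized_form": m.normalized_form(),
        # list() accepts any sequence type returned by part_of_speech()
        "part_of_speech": list(m.part_of_speech()),
    }


def analyze(tokenizer: Any, text: str) -> List[Dict[str, Any]]:
    """Tokenize `text` with the tokenizer's default mode and return one dict per morpheme."""
    return [morpheme_to_dict(m) for m in tokenizer.tokenize(text)]
```

With SudachiPy and a dictionary installed, `analyze(tokenizer_obj, "食べ")` would return a one-element list for the example above; the exact values depend on the dictionary version.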

```python
# Normalization

mode = SplitMode.C  # the default split mode

tokenizer_obj.tokenize("附属", mode)[0].normalized_form()
# => '付属'
tokenizer_obj.tokenize("SUMMER", mode)[0].normalized_form()
# => 'サマー'
tokenizer_obj.tokenize("シュミレーション", mode)[0].normalized_form()
# => 'シミュレーション'
```

(With `20210802` `core` dictionary. The results may change when you use other versions)
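
The per-token normalization above can be applied to a whole string by joining the normalized forms. This is a minimal sketch under the assumption that simple concatenation is acceptable for your use case; `normalize_text` is a hypothetical helper, not a SudachiPy function.

```python
from typing import Any, Optional


def normalize_text(tokenizer: Any, text: str, mode: Optional[Any] = None) -> str:
    """Join the normalized forms of all morphemes in `text`."""
    if mode is None:
        morphemes = tokenizer.tokenize(text)  # use the tokenizer's default mode
    else:
        morphemes = tokenizer.tokenize(text, mode)
    return "".join(m.normalized_form() for m in morphemes)
```

A call such as `normalize_text(tokenizer_obj, "附属", SplitMode.C)` uses the same `tokenize` and `normalized_form` calls as the examples above. Because conjugated words may be normalized to a different form, the joined string is not guaranteed to read as natural text.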


## Dictionary Edition

There are three editions of Sudachi Dictionary, namely, `small`, `core`, and `full`. See [WorksApplications/SudachiDict](https://github.com/WorksApplications/SudachiDict) for details.

SudachiPy uses `sudachidict_core` by default.

Dictionaries are installed as Python packages `sudachidict_small`, `sudachidict_core`, and `sudachidict_full`.

* [SudachiDict-small · PyPI](https://pypi.org/project/SudachiDict-small/)
* [SudachiDict-core · PyPI](https://pypi.org/project/SudachiDict-core/)
* [SudachiDict-full · PyPI](https://pypi.org/project/SudachiDict-full/)

The dictionary files are not in the package itself; they are downloaded upon installation.

### Dictionary option: command line

You can specify the dictionary with the tokenize option `-s`.

```bash
$ pip install sudachidict_small
$ echo "外国人参政権" | sudachipy -s small
```

```bash
$ pip install sudachidict_full
$ echo "外国人参政権" | sudachipy -s full
```


### Dictionary option: Python package

You can specify the dictionary with the `Dictionary()` arguments `config_path` or `dict_type`.

```python
class Dictionary(config_path=None, resource_dir=None, dict_type=None)
```

1. `config_path`
    * You can specify the file path to the setting file with `config_path` (see [Dictionary in The Setting File](#dictionary-in-the-setting-file) for details).
    * If the dictionary file is specified in the setting file as `systemDict`, SudachiPy will use the dictionary.
2. `dict_type`
    * You can also specify the dictionary type with `dict_type`.
    * The available arguments are `small`, `core`, or `full`.
    * If different dictionaries are specified with `config_path` and `dict_type`, **the dictionary defined by `dict_type` overrides** the one defined in the config file.

```python
from sudachipy import Dictionary

# default: sudachidict_core
tokenizer_obj = Dictionary().create()

# The dictionary given by the `systemDict` key in the config file (/path/to/sudachi.json) will be used
tokenizer_obj = Dictionary(config_path="/path/to/sudachi.json").create()

# The dictionary specified by `dict_type` will be set.
tokenizer_obj = Dictionary(dict_type="core").create()  # sudachidict_core (same as default)
tokenizer_obj = Dictionary(dict_type="small").create()  # sudachidict_small
tokenizer_obj = Dictionary(dict_type="full").create()  # sudachidict_full

# The dictionary specified by `dict_type` overrides those defined in the config path.
# In the following code, `sudachidict_full` will be used regardless of a dictionary defined in the config file.
tokenizer_obj = Dictionary(config_path="/path/to/sudachi.json", dict_type="full").create()
```
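
The precedence rule described above can be written out as a small decision function. This is only a model of the documented behavior, not SudachiPy's internal implementation; `effective_dictionary` and its parameters are hypothetical names, and `system_dict` stands for the `systemDict` value read from the config file.

```python
from typing import Optional

EDITIONS = ("small", "core", "full")


def effective_dictionary(
    system_dict: Optional[str] = None,
    dict_type: Optional[str] = None,
) -> str:
    """Return which dictionary would be used under the documented precedence."""
    if dict_type is not None:
        if dict_type not in EDITIONS:
            raise ValueError(f"dict_type must be one of {EDITIONS}, got {dict_type!r}")
        return f"sudachidict_{dict_type}"  # dict_type overrides the config file
    if system_dict is not None:
        return system_dict                  # `systemDict` from the config file
    return "sudachidict_core"               # documented default


print(effective_dictionary())                                          # sudachidict_core
print(effective_dictionary(system_dict="my/system.dic"))               # my/system.dic
print(effective_dictionary(system_dict="my/system.dic", dict_type="full"))  # sudachidict_full
```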


### Dictionary in The Setting File

Alternatively, if the dictionary file is specified in the setting file, `sudachi.json`, SudachiPy will use that file.

```js
{
    "systemDict" : "relative/path/from/resourceDir/to/system.dic",
    ...
}
```

The default setting file is [sudachi.json](https://github.com/WorksApplications/sudachi.rs/blob/develop/python/py_src/sudachipy/resources/sudachi.json). You can specify your `sudachi.json` with the `-r` option.

```bash
$ sudachipy -r path/to/sudachi.json
```


## User Dictionary

To use a user dictionary, `user.dic`, place a copy of [sudachi.json](https://github.com/WorksApplications/sudachi.rs/blob/develop/python/py_src/sudachipy/resources/sudachi.json) anywhere you like, and add a `userDict` value with the relative path from `sudachi.json` to your `user.dic`.

```js
{
    "userDict" : ["relative/path/to/user.dic"],
    ...
}
```

Then specify your `sudachi.json` with the `-r` option.

```bash
$ sudachipy -r path/to/sudachi.json
```
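
The config edit described above can also be scripted. The following is a minimal sketch, assuming the setting file is plain JSON: `write_user_config` is a hypothetical helper, and the base file written in the demo is only a stand-in for a copy of the default `sudachi.json`.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Sequence


def write_user_config(base_config: Path, target: Path, user_dicts: Sequence[str]) -> Dict[str, Any]:
    """Copy the settings in `base_config` to `target`, adding a `userDict` list."""
    settings = json.loads(base_config.read_text(encoding="utf-8"))
    # Paths are relative to the location of the written sudachi.json.
    settings["userDict"] = list(user_dicts)
    target.write_text(json.dumps(settings, ensure_ascii=False, indent=4), encoding="utf-8")
    return settings


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "default.json"
    # Stand-in content; in practice this would be a copy of the default sudachi.json.
    base.write_text('{"systemDict": "system.dic"}', encoding="utf-8")
    target = Path(tmp) / "sudachi.json"
    write_user_config(base, target, ["user.dic"])
    reloaded = json.loads(target.read_text(encoding="utf-8"))

print(reloaded["userDict"])  # ['user.dic']
```

The written file can then be passed to the CLI with `-r`, or to `Dictionary(config_path=...)`.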


You can build a user dictionary with the subcommand `ubuild`.


```bash
$ sudachipy ubuild -h
usage: sudachipy ubuild [-h] [-d string] [-o file] [-s file] file [file ...]

Build User Dictionary

positional arguments:
  file        source files with CSV format (one or more)

optional arguments:
  -h, --help  show this help message and exit
  -d string   description comment to be embedded on dictionary
  -o file     output file (default: user.dic)
  -s file     system dictionary path (default: system core dictionary path)
```

For the dictionary file format, please refer to [this document](https://github.com/WorksApplications/Sudachi/blob/develop/docs/user_dict.md) (written in Japanese; an English version is not available yet).


## Customized System Dictionary

```bash
$ sudachipy build -h
usage: sudachipy build [-h] [-o file] [-d string] -m file file [file ...]

Build Sudachi Dictionary

positional arguments:
  file        source files with CSV format (one of more)

optional arguments:
  -h, --help  show this help message and exit
  -o file     output file (default: system.dic)
  -d string   description comment to be embedded on dictionary

required named arguments:
  -m file     connection matrix file with MeCab's matrix.def format
```

To use your customized `system.dic`, place a copy of [sudachi.json](https://github.com/WorksApplications/sudachi.rs/blob/develop/python/py_src/sudachipy/resources/sudachi.json) anywhere you like, and overwrite the `systemDict` value with the relative path from `sudachi.json` to your `system.dic`.

```js
{
    "systemDict" : "relative/path/to/system.dic",
    ...
}
```

Then specify your `sudachi.json` with the `-r` option.

```bash
$ sudachipy -r path/to/sudachi.json
```


## For Developers

### Build from source

#### Install sdist via pip

1. Install the Python modules `setuptools` and `setuptools-rust`.
2. Run `./build-sdist.sh` in the `python` dir.
    - The source distribution will be generated under the `python/dist/` dir.
3. Install it via pip: `pip install ./python/dist/SudachiPy-[version].tar.gz`


#### Install develop build

1. Install the Python modules `setuptools` and `setuptools-rust`.
2. Run `python3 setup.py develop`.
    - `develop` will create a debug build, while `install` will create a release build.
3. Now you can import the module with `import sudachipy`.

ref: [setuptools-rust](https://github.com/PyO3/setuptools-rust)


### Test

Run `build_and_test.sh` to run the tests.


## Contact

Sudachi and SudachiPy are developed by [WAP Tokushima Laboratory of AI and NLP](http://nlp.worksap.co.jp/).

Open an issue, or come to our Slack workspace for questions and discussion.

https://sudachi-dev.slack.com/ (Get invitation [here](https://join.slack.com/t/sudachi-dev/shared_invite/enQtMzg2NTI2NjYxNTUyLTMyYmNkZWQ0Y2E5NmQxMTI3ZGM3NDU0NzU4NGE1Y2UwYTVmNTViYjJmNDI0MWZiYTg4ODNmMzgxYTQ3ZmI2OWU))

Enjoy tokenization!
first commit 2024-05-03 04:18:51 +03:00