NAME
subst - Greple module for text search and substitution
VERSION
Version 2.35
SYNOPSIS
greple -Msubst --dict dictionary [ options ]
Dictionary:
--dict dictionary file
--dictdata dictionary data
--dictpair dictionary entry pair
Check:
--check=[ng,ok,any,outstand,all,none]
--select=N
--linefold
--stat
--with-stat
--stat-style=[default,dict]
--stat-item={match,expect,number,ok,ng,dict}=[0,1]
--subst
--[no-]warn-overlap
--[no-]warn-include
File Update:
--diff
--diffcmd command
--create
--replace
--overwrite
DESCRIPTION
This greple module supports checking and substituting text in files based on dictionary data.
The dictionary file is given by the --dict option, and each line contains a pair of a matching pattern and an expected string.
greple -Msubst --dict DICT
If the dictionary file contains the following data:
colou?r color
cent(er|re) center
the above command finds strings that match the first pattern but differ from the second string, that is, "colour" and "centre" in this case.
More precisely, the last two space-separated elements of each line are treated as the pattern and the replacement string, respectively.
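The following Python sketch models this check under a few assumptions. Fields are split on whitespace. Patterns are treated as Python regular expressions, which behave the same as Perl's for these examples. The helper names parse_space_entry and find_unexpected are hypothetical. This is an illustration, not the module's Perl implementation.

```python
import re

def parse_space_entry(line):
    """Take the last two whitespace-separated fields as (pattern, expected).

    Hypothetical helper; lines with fewer than two fields are not handled here.
    """
    fields = line.split()
    if len(fields) < 2:
        return None
    return fields[-2], fields[-1]

def find_unexpected(entries, text):
    """Return strings matched by a pattern that differ from its expected string."""
    found = []
    for pattern, expected in entries:
        for m in re.finditer(pattern, text):
            if m.group(0) != expected:
                found.append(m.group(0))
    return found

dictionary = ["colou?r color", "cent(er|re) center"]
entries = [parse_space_entry(line) for line in dictionary]
text = "British colour and centre, American color and center."
print(find_unexpected(entries, text))  # ['colour', 'centre']
```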
Dictionary data can also be written with // as the separator, as follows:
colou?r // color
cent(er|re) // center
There must be spaces before and after the //. In this format, the strings before and after it are treated as the pattern and the replacement string, rather than the last two elements. Leading spaces and the spaces before and after // are ignored, but all other whitespace is significant.
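The // rules can be modeled in a few lines of Python. This is a hypothetical sketch: SLASH_ENTRY and parse_slash_entry are illustrative names. It also assumes that only literal space characters count as the separating spaces.

```python
import re

# Hypothetical model of the "pattern // replacement" format: spaces are
# required on both sides of "//"; leading spaces and the spaces around "//"
# are dropped, and any other whitespace (e.g. inside the pattern) is kept.
SLASH_ENTRY = re.compile(r" *(.*?) +// +(.*)")

def parse_slash_entry(line):
    m = SLASH_ENTRY.fullmatch(line.rstrip("\n"))
    if m is None:
        return None  # no " // " separator, so the line is not in this format
    return m.group(1), m.group(2)

print(parse_slash_entry("colou?r // color"))         # ('colou?r', 'color')
print(parse_slash_entry("  data base // database"))  # ('data base', 'database')
print(parse_slash_entry("a//b"))                     # None
```

Because the pattern part is matched lazily, a pattern may itself contain spaces, as in "data base".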
You can use the same file with greple's -f option; in that case, the string after // is ignored as a comment.
greple -f DICT ...
Option --dictdata can be used to provide dictionary data on the command line.
greple -Msubst \
--dictdata $'colou?r color\ncent(er|re) center\n'
A dictionary entry starting with a sharp sign (#) is a comment and is ignored.
Option --dictpair can be used to provide raw dictionary entries on the command line. In this case, no processing is done regarding whitespace or comments.
greple -Msubst \
--dictpair 'colou?r' color \
--dictpair 'cent(er|re)' center
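The difference between --dictdata and --dictpair can be sketched in Python. The helpers are hypothetical. The sketch also makes two assumptions the text above does not state: blank lines are skipped, and a # preceded by spaces still starts a comment.

```python
import re

def parse_entry(line):
    """Prefer 'pattern // replacement'; otherwise use the last two fields."""
    m = re.fullmatch(r" *(.*?) +// +(.*)", line)
    if m:
        return m.group(1), m.group(2)
    fields = line.split()
    return (fields[-2], fields[-1]) if len(fields) >= 2 else None

def load_dictdata(data):
    """Model of --dictdata: one entry per line, '#' lines are comments."""
    entries = []
    for line in data.splitlines():
        stripped = line.lstrip()
        # Assumption: blank lines are skipped, and '#' may follow leading spaces.
        if not stripped or stripped.startswith("#"):
            continue
        entry = parse_entry(line)
        if entry is not None:
            entries.append(entry)
    return entries

def load_dictpairs(*pairs):
    """Model of --dictpair: each (pattern, replacement) is taken verbatim."""
    return [tuple(pair) for pair in pairs]

print(load_dictdata("colou?r color\n# comment\ncent(er|re) // center\n"))
# [('colou?r', 'color'), ('cent(er|re)', 'center')]
print(load_dictpairs(("# not a comment", "kept as is")))
# [('# not a comment', 'kept as is')]
```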
Overlapped pattern
When a matched string is the same as or shorter than a string previously matched by another pattern (that is, it is included in it), it is simply ignored (--no-warn-include is the default). So, if you have to declare conflicting patterns, place the longer pattern first.
If a matched string overlaps a previously matched string, a warning is printed (--warn-overlap is the default) and the match is ignored.
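The overlap rules can be approximated as follows. This Python sketch is hypothetical (collect is an illustrative name). It tries patterns in dictionary order and classifies each new match against the matches already accepted. The module's actual matching order and warning format may differ.

```python
import re

def collect(patterns, text, warn_overlap=True, warn_include=False):
    """Model of the documented rules; patterns are tried in dictionary order.

    A match that is the same as or inside an earlier match is an "include";
    one that only partly overlaps an earlier match is an "overlap".
    Both are ignored; warnings are recorded according to the two flags.
    """
    accepted, warnings = [], []
    for index, pattern in enumerate(patterns):
        for m in re.finditer(pattern, text):
            start, end = m.span()
            kind = None
            for a_start, a_end, _ in accepted:
                if a_start <= start and end <= a_end:
                    kind = "include"
                    break
                if start < a_end and a_start < end:
                    kind = "overlap"
                    break
            if kind is None:
                accepted.append((start, end, index))
            elif (kind == "include" and warn_include) or (kind == "overlap" and warn_overlap):
                warnings.append((kind, m.group(0)))
    return sorted(accepted), warnings

patterns = ["New York City", "York", "City Hall"]
print(collect(patterns, "New York City Hall"))
# ([(0, 13, 0)], [('overlap', 'City Hall')])
```

If the first two patterns are reversed (["York", "New York City"]), only "York" is kept and the longer match is reported as an overlap. This is why the longer pattern should come first.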
Terminal color
This version uses the Getopt::EX::termcolor module. It sets the --light-screen or --dark-screen option depending on the terminal on which the command runs, or on the TERM_BGCOLOR environment variable.
Some terminals (e.g., "Apple_Terminal" or "iTerm") are detected automatically and no action is required. Otherwise, set the TERM_BGCOLOR environment variable to a value between #000000 (black) and #FFFFFF (white) that matches the terminal background color.
OPTIONS
- --dict=file
-
Specify dictionary file.
- --dictdata=data
-
Specify dictionary data directly as text.
- --dictpair pattern replacement
-
Specify dictionary entry pair. This option takes two parameters. The first is a pattern and the second is a substitution string.
- --check=outstand|ng|ok|any|all|none
-
Option --check takes one of the arguments ng, ok, any, outstand, all and none.
With the default value outstand, the command shows information about both expected and unexpected words, but only when an unexpected word is found in the same file. With the value ng, the command shows information about unexpected words. With the value ok, you get information about expected words. With the value any, you get both.
The values all and none make sense only when used with the --stat option, and display information about patterns that never matched.
- --select=N
-
Select the Nth entry from the dictionary. The argument is interpreted by the Getopt::EX::Numbers module, so ranges can be given like --select=1:3,7:9. You can get entry numbers with the --stat option.
- --linefold
-
If the target text is folded in the middle of a phrase, use the --linefold option. It creates regex patterns that match strings spread across lines. The substituted text does not include the newline, though. Because it complicates regex behavior somewhat, avoid using it if possible.
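To illustrate the idea, the following Python sketch allows an optional newline between the characters of a literal word. It is a hypothetical simplification (linefold_literal is an illustrative name). The module itself builds such patterns from arbitrary regular expressions, and its exact rules are not described here.

```python
import re

def linefold_literal(word):
    """Hypothetical: build a regex that lets a literal word break across lines.

    An optional newline is allowed between every pair of characters; the
    real module works on arbitrary regex patterns and may differ.
    """
    return "\n?".join(re.escape(ch) for ch in word)

pattern = re.compile(linefold_literal("database"))
m = pattern.search("Store it in the data\nbase.")
print(repr(m.group(0)))                    # 'data\nbase'
print(repr(m.group(0).replace("\n", "")))  # 'database'
```

The last line mirrors the note above that the substituted text does not contain the newline.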
- --stat
- --with-stat
-
Print statistical information. Works with the --check option.
Option --with-stat prints statistics after the normal output, while --stat prints only statistics.
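The following Python sketch shows one way to read the statistics and the --check values together. It is a model built on assumptions, not the module's logic:

- stat_rows and select_rows are hypothetical names.
- ok and ng count matches that are equal to, or different from, the expected string.
- The filters are an interpretation of the --check descriptions above. In particular, the sketch assumes that none selects only entries that never matched.

```python
import re

def stat_rows(entries, text):
    """Count, per entry, matches equal to (ok) or different from (ng) the expected string."""
    rows = []
    for number, (pattern, expected) in enumerate(entries, start=1):
        ok = ng = 0
        for m in re.finditer(pattern, text):
            if m.group(0) == expected:
                ok += 1
            else:
                ng += 1
        rows.append({"number": number, "match": pattern, "expect": expected, "ok": ok, "ng": ng})
    return rows

def select_rows(rows, check="outstand"):
    """Assumed reading of the --check values as filters on per-file stat rows."""
    hit = [r for r in rows if r["ok"] or r["ng"]]
    if check == "ng":
        return [r for r in rows if r["ng"]]
    if check == "ok":
        return [r for r in rows if r["ok"]]
    if check == "any":
        return hit
    if check == "outstand":
        # Show matched entries only if the file has at least one unexpected word.
        return hit if any(r["ng"] for r in rows) else []
    if check == "all":
        return rows
    if check == "none":
        return [r for r in rows if not (r["ok"] or r["ng"])]
    raise ValueError(f"unknown --check value: {check}")

entries = [("colou?r", "color"), ("cent(er|re)", "center"), ("gr[ae]y", "gray")]
rows = stat_rows(entries, "The colour and the color.")
print([r["number"] for r in select_rows(rows, "ng")])    # [1]
print([r["number"] for r in select_rows(rows, "none")])  # [2, 3]
```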
- --stat-style=default|dict
-
By using the --stat-style=dict option with --stat and --check=any, you can get dictionary-style output for your working document.
- --stat-item item=[0,1]
-
Specify which items are shown in the statistics. The default values are:
match=1 expect=1 number=1 ng=1 ok=1 dict=0
If you don't need to see the pattern field, use it like this:
--stat-item match=0
Multiple parameters can be set at once:
--stat-item match=number=0,ng=1,ok=1
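In the --stat-item syntax, several names can share one value. The following hypothetical Python sketch parses it. parse_stat_item and DEFAULTS are illustrative names, and DEFAULTS holds the default values listed above.

```python
DEFAULTS = {"match": 1, "expect": 1, "number": 1, "ng": 1, "ok": 1, "dict": 0}

def parse_stat_item(spec, items=None):
    """Hypothetical parser for the documented syntax, e.g. "match=number=0,ng=1".

    Each comma-separated part is name[=name...]=value; every named item gets the value.
    """
    items = dict(DEFAULTS if items is None else items)
    for part in spec.split(","):
        *names, value = part.split("=")
        if not names or value not in ("0", "1"):
            raise ValueError(f"bad --stat-item part: {part!r}")
        for name in names:
            if name not in items:
                raise ValueError(f"unknown stat item: {name!r}")
            items[name] = int(value)
    return items

print(parse_stat_item("match=number=0,ng=1,ok=1"))
# {'match': 0, 'expect': 1, 'number': 0, 'ng': 1, 'ok': 1, 'dict': 0}
```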
- --subst
-
Substitute unexpected matched strings with the expected string. Newline characters in the matched string are ignored. A pattern without a replacement string is not changed.
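A minimal Python sketch of the substitution rules follows. subst is a hypothetical name, and the sketch makes three assumptions:

- Entries are (pattern, replacement) pairs, with None for a missing replacement.
- Entries are applied one at a time, without the overlap handling.
- A match that equals the expected string once newlines are removed is treated as already correct.

```python
import re

def subst(entries, text):
    """Hypothetical model of --subst.

    entries are (pattern, replacement) pairs; replacement None means the
    pattern has no replacement string and its matches are left unchanged.
    Entries are applied one after another; the overlap rules are not modeled.
    """
    for pattern, replacement in entries:
        if replacement is None:
            continue
        def repl(m, replacement=replacement):
            # Newlines inside the match are ignored when comparing, and the
            # substituted text never contains them.
            if m.group(0).replace("\n", "") == replacement:
                return m.group(0)  # already the expected string: keep as is
            return replacement
        text = re.sub(pattern, repl, text)
    return text

entries = [("colou?r", "color"), ("cent(er|re)", "center"), ("TODO", None)]
print(subst(entries, "colour centre TODO color"))  # color center TODO color
```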
- --[no-]warn-overlap
-
Warn about overlapping patterns. On by default.
- --[no-]warn-include
-
Warn about included patterns. Off by default.
FILE UPDATE OPTIONS
- --diff
- --diffcmd=command
-
Option --diff produces diff output of the original and converted text.
Option --diffcmd specifies the diff command used by the --diff option. The default is "diff -u".
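Python's difflib can produce a unified diff similar to diff -u, which shows the general shape of the output. It only stands in for the external command that --diff runs. diff_u is a hypothetical helper.

```python
import difflib

def diff_u(original, converted, name="doc.txt"):
    """Unified diff of original vs. converted text, similar in shape to `diff -u`."""
    return "".join(difflib.unified_diff(
        original.splitlines(keepends=True),
        converted.splitlines(keepends=True),
        fromfile=name, tofile=name + " (converted)"))

print(diff_u("The colour\nis red.\n", "The color\nis red.\n"), end="")
```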
- --create
-
Create a new file and write the result to it. The suffix ".new" is appended to the original filename.
- --replace
-
Replace the target file with the converted result. The original file is renamed to a backup name with the ".bak" suffix.
- --overwrite
-
Overwrite the target file with the converted result, without making a backup.
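The three update modes can be summarized with this hypothetical Python sketch (update_file is an illustrative name). It only mirrors the file names described above. The module's handling of permissions, existing backup files, and errors is not modeled.

```python
import tempfile
from pathlib import Path

def update_file(path, converted, mode):
    """Hypothetical model of --create, --replace and --overwrite."""
    path = Path(path)
    if mode == "create":
        path.with_name(path.name + ".new").write_text(converted)  # original untouched
    elif mode == "replace":
        path.rename(path.with_name(path.name + ".bak"))  # keep a backup first
        path.write_text(converted)
    elif mode == "overwrite":
        path.write_text(converted)  # no backup
    else:
        raise ValueError(f"unknown mode: {mode}")

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "doc.txt"
    target.write_text("colour\n")
    update_file(target, "color\n", "replace")
    print(sorted(p.name for p in Path(tmp).iterdir()))  # ['doc.txt', 'doc.txt.bak']
```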
DICTIONARY
This module includes example dictionaries. They are installed in the share directory and can be accessed with the --exdict option.
greple -Msubst --exdict jtca-katakana-guide-3.dict
- --exdict dictionary
-
Use a dictionary file included in the distribution as the dictionary file.
- --exdictdir
-
Show dictionary directory.
- --exdict jtca-katakana-guide-3.dict
- --jtca-katakana-guide
-
Created from the following guideline document.
外来語(カタカナ)表記ガイドライン 第3版 (Guidelines for Writing Loanwords in Katakana, 3rd edition), established August 2015, published September 2015, 一般財団法人テクニカルコミュニケーター協会 Japan Technical Communicators Association https://www.jtca.org/standardization/katakana_guide_3_20171222.pdf
- --jtca
-
A customized version of --jtca-katakana-guide. The original dictionary is automatically generated from the published data; this one is customized for practical use.
- --exdict jtf-style-guide-3.dict
- --jtf-style-guide
-
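Created from the following guideline document.
JTF日本語標準スタイルガイド(翻訳用) (JTF Japanese Standard Style Guide for Translation), version 3.0, August 20, 2019, Translation Quality Committee, 一般社団法人 日本翻訳連盟 (Japan Translation Federation, JTF) https://www.jtf.jp/jp/style_guide/pdf/jtf_style_guide.pdf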
- --jtf
-
A customized version of --jtf-style-guide. The original dictionary is automatically generated from the published data; this one is customized for practical use.
- --exdict sccc2.dict
- --sccc2
-
Dictionary used for "C/C++ セキュアコーディング 第2版" (C/C++ Secure Coding, 2nd edition), published in 2014.
https://www.jpcert.or.jp/securecoding_book_2nd.html
- --exdict ms-style-guide.dict
- --ms-style-guide
-
Dictionary generated from the Microsoft localization style guide.
https://www.microsoft.com/ja-jp/language/styleguides
The data was generated from this article:
https://www.atmarkit.co.jp/news/200807/25/microsoft.html
- --microsoft
-
A customized version of --ms-style-guide. The original dictionary is automatically generated from the published data; this one is customized for practical use.
An amendment dictionary can be found here. Please raise an issue or send a pull request if you would like it updated.
JAPANESE
This module was originally made to support Japanese text editing.
KATAKANA
Japanese KATAKANA words have many variant spellings for the same word, so unifying them is important but quite tiresome work. In the next example,
イ[エー]ハトー?([ヴブボ]ォ?) // イーハトーヴォ
the left pattern matches all of the following words.
イエハトブ
イーハトヴ
イーハトーヴ
イーハトーヴォ
イーハトーボ
イーハトーブ
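You can quickly confirm that the pattern covers all six variants. This Python sketch uses the re module, whose syntax agrees with Perl's for this pattern. It reports every variant other than the expected string as ng.

```python
import re

pattern = re.compile(r"イ[エー]ハトー?([ヴブボ]ォ?)")
expected = "イーハトーヴォ"
variants = ["イエハトブ", "イーハトヴ", "イーハトーヴ", "イーハトーヴォ", "イーハトーボ", "イーハトーブ"]

for word in variants:
    matched = pattern.fullmatch(word) is not None
    status = "ok" if word == expected else "ng"
    print(word, matched, status)
```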
This module helps to detect and correct them.
INSTALL
CPANMINUS
$ cpanm App::Greple::subst
SEE ALSO
https://github.com/kaz-utashiro/greple
https://github.com/kaz-utashiro/greple-subst
https://github.com/kaz-utashiro/greple-update
https://www.jtca.org/standardization/katakana_guide_3_20171222.pdf
https://www.jtf.jp/jp/style_guide/styleguide_top.html, https://www.jtf.jp/jp/style_guide/pdf/jtf_style_guide.pdf
https://www.microsoft.com/ja-jp/language/styleguides, https://www.atmarkit.co.jp/news/200807/25/microsoft.html
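Agency for Cultural Affairs (文化庁), Japanese Language Policy and Japanese Language Education (国語施策・日本語教育), Language Policy Information (国語施策情報), Cabinet Notifications and Instructions (内閣告示・内閣訓令): The Notation of Loanwords (外来語の表記)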
https://qiita.com/kaz-utashiro/items/85add653a71a7e01c415
AUTHOR
Kazumasa Utashiro
LICENSE
Copyright 2017-2024 Kazumasa Utashiro.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.