西夏文在线输入 Tangut IME Online

A lightweight web-based input tool for Tangut text. Features include reverse lookup, handwriting recognition, code-based input (using RIME-based dictionary files), editing, copying, importing, and exporting.

网页端西夏文输入。提供反查、手写输入或编码输入(作为基于RIME的输入法之辅助)、编辑、复制导出、导入。

2025.9 增加手写输入 Handwriting mode was added in September 2025. It is also available at https://huggingface.co/spaces/raycosine/Detangutify

西夏文倉頡輸入法方案存在更新,如正在使用该方案,请至 Hulenkius/rime_tangutcjkk 下载新的字典文件使用 :)

The Tangut Cangjie input scheme has recently been updated by its authors. If you are using this scheme, please download the new dictionary file from Hulenkius/rime_tangutcjkk :)

如手写模式无法运作,请访问部署于Hugging Face的手写功能页面,等待其自动重启。重启大约需要五分钟。

If handwriting mode is not responding, please visit the Detangutify page on Hugging Face and wait for the space to restart automatically, which takes about five minutes.

Input Modes 输入模式

具有四种输入模式:

  1. 英文释义反查(初始默认)。键入英文释义,返回模糊匹配结果。 (模糊匹配由 Fuse.js实现,系字符串的近似而非语义上的近似)
  2. 手写输入。在手写模式中,使用画笔绘制,或黏贴、上传、拍摄图片,点击搜索按钮返回结果。但自动调整和匹配实际拍摄的、非正方形画幅、细笔触的手写照片功能并没有实装,需要提前自行处理图片。(识别功能部署于 Hugging Face,可能需翻墙)
  3. 编码输入。具体采用何种编码方案由所上载的输入法字典文件决定。
  4. 普通输入模式。不触发任何西夏文的匹配。

Four input modes are supported:

  1. Reverse lookup mode (default). Type an English definition to get fuzzy-matched Tangut characters. (Fuzzy search is powered by Fuse.js. Note that it is based on string similarity, not semantic similarity.)
  2. Handwriting mode. Draw with the brush, or paste, upload, or take a photo, then click the search button to get results. Automatic adjustment for real-world photos (non-square framing, thin strokes) is not implemented yet, so such images need to be preprocessed beforehand. (Recognition is deployed on Hugging Face, which may not be reachable from every network.)
  3. Code-based input mode. Which encoding scheme is used depends on the uploaded dictionary file.
  4. Regular input mode. In this mode, Tangut character matching is not triggered.

键入F2或单击屏幕底部图标以切换模式。

You can switch between input modes by pressing F2 or clicking the icon at the bottom of the screen.
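The mode switch described above can be sketched as a small pure function. This is only an illustration: the mode names, their cycling order, and the function names are assumptions, not the tool's actual identifiers.

```typescript
// Hypothetical mode names; the app's real identifiers are not documented here.
type InputMode = "reverse" | "handwriting" | "code" | "plain";

// Assumed cycling order, following the order of the list above.
const MODE_ORDER: readonly InputMode[] = ["reverse", "handwriting", "code", "plain"];

// Return the mode that follows `current`, wrapping back to the first one.
function nextMode(current: InputMode): InputMode {
  const i = MODE_ORDER.indexOf(current);
  return MODE_ORDER[(i + 1) % MODE_ORDER.length];
}

// Decide what a key press does to the mode; any key other than F2 leaves it unchanged.
function modeAfterKey(current: InputMode, key: string): InputMode {
  return key === "F2" ? nextMode(current) : current;
}
```

In a browser, this could be wired to a `keydown` listener that passes `event.key` to `modeAfterKey`, with the on-screen icon calling `nextMode` directly.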

Reverse lookup 英文释义反查

英文释义来自夏漢字典序號与英文释义对表https://www.babelstone.co.uk/Tangut/XHZD_2008_Definitions.txt

English definitions come from https://www.babelstone.co.uk/Tangut/XHZD_2008_Definitions.txt.

载有字典文件时,所匹配的编码亦将显示于匹配结果中。

If a dictionary is loaded, the codes of each matched character are also shown in the candidate area.
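To illustrate what "string similarity, not semantic similarity" means in practice, here is a minimal sketch. It does not reproduce Fuse.js or its scoring; it uses plain Levenshtein edit distance as a stand-in, and the `Entry` shape, the function names, and the placeholder entries are assumptions made for the example.

```typescript
// Classic dynamic-programming Levenshtein distance between two strings.
function editDistance(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical entry shape: a character label plus its English definition.
interface Entry { char: string; definition: string; }

// Rank entries by how closely any single word of the definition resembles the query.
function rankByStringSimilarity(query: string, entries: Entry[]): Entry[] {
  const q = query.toLowerCase();
  const score = (e: Entry): number =>
    Math.min(...e.definition.toLowerCase().split(/\s+/).map(w => editDistance(q, w)));
  return [...entries].sort((x, y) => score(x) - score(y));
}

// Placeholder labels stand in for Tangut characters; these are not real dictionary data.
const demoEntries: Entry[] = [
  { char: "#1", definition: "steed" },
  { char: "#2", definition: "house" },
  { char: "#3", definition: "horse" },
];
const ranked = rankByStringSimilarity("horse", demoEntries).map(e => e.definition);
```

With these placeholder entries, "house" ranks above "steed" for the query "horse": it is one letter away as a string, even though "steed" is closer in meaning.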

Handwriting 手写识别

使用 Noto Serif Tangut (Noto 西夏宋体) 作为基础字体,经过图像增强及线宽归一化,对每个字形生成多个样本,以缩小手写体与印刷体的差异。前端传入手写或上传的图像后,通过特征向量比对返回候选字形。

由于衬线印刷字体与手写体还是存在差异的,所以准确率一般。是否需要书写板正、避免连笔,不太好说;只能宽泛地说,字形复杂度与识别准确率之间没有关联,但对于某些特定偏旁部件的字形准确率会显著降低。

Using Noto Serif Tangut as the base font, multiple samples are generated for each glyph through image augmentation and stroke-width normalization, to reduce the gap between handwritten and printed forms. When a handwritten or uploaded image is submitted from the frontend, candidates are returned via feature-vector matching.

Since serif printed fonts still differ from handwriting, the overall accuracy is limited. Whether one needs to write neatly or avoid cursive strokes is hard to say. Roughly speaking, glyph complexity does not correlate with recognition accuracy, but accuracy drops noticeably for glyphs containing certain components.
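The matching step can be pictured roughly as follows. This is a sketch under stated assumptions: the page only says "feature-vector matching", so the use of cosine similarity, the `Sample` shape, and the function names are all hypothetical, and the feature extractor itself is not shown.

```typescript
// Hypothetical shape: one feature vector per augmented sample of a glyph.
interface Sample { char: string; features: number[]; }

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Several samples map to the same glyph, so keep the best score per character,
// then return the k characters with the highest scores.
function topCandidates(query: number[], samples: Sample[], k: number): string[] {
  const best = new Map<string, number>();
  for (const s of samples) {
    const score = cosine(query, s.features);
    if (score > (best.get(s.char) ?? -Infinity)) best.set(s.char, score);
  }
  return [...best.entries()]
    .sort((x, y) => y[1] - x[1])
    .slice(0, k)
    .map(([char]) => char);
}
```

Keeping only the best score per character means a glyph is not listed several times just because several of its augmented samples happen to match.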

Load the Dictionary 载入字典

可择一基于RIME的输入方案,将其中后缀为dict.yaml的文件载入到浏览器的本地储存之中。网页初始没有任何字典,需要手动加载。

理论上,任何基于RIME的输入方案都能载入使用。经过测试可以使用的输入方案有:

  1. 夏漢字典序號和四角號碼輸入法 (ccamc)
  2. 索號和四角號碼輸入法 (ccamc)
  3. 西夏文部件輸入法 (ccamc)
  4. 萱拼西夏文輸入法 (ccamc)
  5. 西夏文倉頡輸入法 (Hulenkius/rime_tangutcjkk)


In theory, any RIME-based input scheme should be compatible with this tool. So far, the dictionary files (the ones ending with dict.yaml) of the following input schemes have been tested and work well:

  1. 夏漢字典序號和四角號碼輸入法 (XHZD+4 corner, ccamc)
  2. 索號和四角號碼輸入法 (Sofronov+4 corner, ccamc)
  3. 西夏文部件輸入法 (Tangut Radicals, ccamc)
  4. 萱拼西夏文輸入法 (Xuanpin, ccamc)
  5. 西夏文倉頡輸入法 (Tangut Cangjie, Hulenkius/rime_tangutcjkk)

Load and re-load any of the "dict.yaml" files via the upload button. This tool initially doesn't have any built-in dictionaries, so you need to upload one to enable the code-based input mode. Once loaded, the dictionary is saved in your browser's local storage and will persist across sessions unless you delete it.
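As a rough sketch of what loading involves: a RIME dict.yaml file has a YAML header that ends with a `...` line, followed by tab-separated rows. The code below assumes the default column order (text, then code, then an optional weight) and ignores any custom `columns:` setting. The function names, the `KeyValueStore` interface, and the `tangut-dict` storage key are hypothetical, not taken from the tool.

```typescript
// Parse the body of a RIME *.dict.yaml file into a code -> characters map.
function parseRimeDict(source: string): Map<string, string[]> {
  const table = new Map<string, string[]>();
  let inBody = false;
  for (const raw of source.split(/\r?\n/)) {
    const line = raw.trimEnd();
    if (!inBody) {
      if (line === "...") inBody = true; // end of the YAML header
      continue;
    }
    if (line === "" || line.startsWith("#")) continue; // blank lines and comments
    const [text, code] = line.split("\t");
    if (!text || !code) continue; // skip rows without a code column
    const bucket = table.get(code);
    if (bucket) bucket.push(text);
    else table.set(code, [text]);
  }
  return table;
}

// Minimal storage interface so the sketch also runs outside a browser.
interface KeyValueStore {
  setItem(key: string, value: string): void;
  getItem(key: string): string | null;
}

// "tangut-dict" is a made-up key used only for this example.
function saveDict(store: KeyValueStore, table: Map<string, string[]>): void {
  store.setItem("tangut-dict", JSON.stringify([...table]));
}

function loadDict(store: KeyValueStore): Map<string, string[]> {
  const saved = store.getItem("tangut-dict");
  return saved ? new Map(JSON.parse(saved) as [string, string[]][]) : new Map();
}
```

In a browser, `localStorage` can be passed directly as the `store` argument, which matches the persistence across sessions described above.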


作者尚未对更多输入方案进行研究,如有其它方案可以转为在线使用,欢迎提出 :)

考虑到版权问题,保持现有的将字典文件载入到本地的做法。

PS: I don't know many Tangut IMEs or encoding schemes. Please let me know if there are other schemes you would like to use online :)

Due to copyright concerns, this tool is designed with dictionary files loaded and used locally in your browser.

Input & Edit 编辑和输入

界面分为四个区域:输入区、候选区、虚拟键盘、工具栏。

The interface is divided into four parts: the input area, the candidate area, the virtual numpad, and the toolbar.


输入区中,通过在输入框键入对应的码字以匹配候选字。候选字过多时,可通过点击Prev/Next按钮或键入PageUp/PageDown 翻页。单击已经择定的西夏文字的左侧或右侧,可以将输入框移动到该文字的左侧或右侧。文本亦存储在浏览器本地,下次打开仍可使用。

In the input area, type a code in the green input box to view candidates. Tangut characters (or components) whose codes begin with the code typed so far are listed in the candidate area. When there are many candidates, you can page through them by clicking the Prev/Next buttons or pressing PageUp/PageDown. The input box can be moved before (or after) a confirmed character by double-clicking its left (or right) side. The text is also kept in your browser's local storage, so it is still there the next time you open the tool.
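The candidate lookup and paging described above can be sketched like this. The function names and the idea of a fixed page size are assumptions for illustration, and the dictionary is taken to be a plain code-to-characters map.

```typescript
// Collect every character whose code starts with the code typed so far.
function candidatesFor(code: string, table: Map<string, string[]>): string[] {
  if (code === "") return []; // nothing typed, nothing to match
  const out: string[] = [];
  for (const [key, chars] of table) {
    if (key.startsWith(code)) out.push(...chars);
  }
  return out;
}

// Return one page of items, clamping the page index to the valid range
// so that pressing Prev on the first page or Next on the last page is harmless.
function page<T>(items: T[], pageIndex: number, pageSize: number): T[] {
  const last = Math.max(0, Math.ceil(items.length / pageSize) - 1);
  const p = Math.min(Math.max(pageIndex, 0), last);
  return items.slice(p * pageSize, (p + 1) * pageSize);
}
```

A linear scan over the whole map is the simplest approach. For large dictionaries a sorted key list or a trie would avoid rescanning on every keystroke, but that is a design choice this sketch does not make.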

使用退格 Backspace 键或虚拟的退格按钮以删除码字的最后一位。若码字为空且输入框前方有已经择定的文字,再次退格则会删除此文字。

Use the Backspace key or the virtual backspace button to delete the last digit/character of the code. If the input box is empty, an additional backspace will then delete the previous Tangut character if there is one.
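A minimal sketch of this two-stage Backspace behaviour, assuming the input box sits at the end of the text (cursor movement is ignored here). The `EditorState` shape and the function name are hypothetical.

```typescript
// Hypothetical editor state: confirmed characters plus the code being typed.
// Confirmed text is stored one character per array element, so a Tangut
// character (a surrogate pair in UTF-16) is always removed as a whole.
interface EditorState { committed: string[]; code: string; }

function backspace(state: EditorState): EditorState {
  if (state.code.length > 0) {
    // Stage 1: trim the last digit/letter of the pending code.
    return { ...state, code: state.code.slice(0, -1) };
  }
  // Stage 2: the code is empty, so remove the previous confirmed character (if any).
  return { ...state, committed: state.committed.slice(0, -1) };
}
```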

键入候选文字所对应的字母或数字,抑或点击该文字,皆能选定完成输入。

You may confirm a character by either pressing its associated key or simply clicking it.

键入回车 Enter 时,若有候选选项,则择定首个候选字符;若无候选选项,则将输入框的内容确认存入(无论其是否为西夏文字);若输入框为空,则进行换行(换行亦能被回车键删除)。

When you press Enter, one of the following will happen:

  1. If there are candidates, the first one is selected.
  2. If there are no candidates but the input box is not empty, its content is confirmed as-is, whether or not it is Tangut.
  3. If the input box is empty, a line break is inserted at the current position.
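The Enter behaviour described above amounts to a three-way decision, sketched below. The `EditorState` shape and the function name are assumptions for illustration, and the sketch appends at the end of the text rather than tracking a cursor position.

```typescript
// Hypothetical editor state: confirmed pieces of text plus the code being typed.
interface EditorState { committed: string[]; code: string; }

function pressEnter(state: EditorState, candidates: string[]): EditorState {
  if (candidates.length > 0) {
    // Candidates exist: take the first one.
    return { committed: [...state.committed, candidates[0]], code: "" };
  }
  if (state.code !== "") {
    // No candidates: confirm the raw input, Tangut or not.
    return { committed: [...state.committed, state.code], code: "" };
  }
  // Empty input box: insert a line break.
  return { committed: [...state.committed, "\n"], code: "" };
}
```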

Copy & Paste 复制和导入 

点击工具栏下方的复制按钮,输入框中所有择定的文字将会被复制到剪贴板。在手机端其它区域,这些文字可能无法渲染,但可以发送到装载有西夏文字体的设备。也可以通过提供导入功能的弹窗,将文字复制粘贴,载入到输入区。

Click the copy button below the toolbar to copy all confirmed characters in the input area to the clipboard. The text may not render correctly in other mobile apps, but the characters are still standard Unicode and will display properly once sent to any device with Tangut fonts installed.

A pop-up is provided for pasting and formatting the Tangut text.
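Because the copied text is ordinary Unicode, it can be inspected like any other string. One detail worth knowing when handling it in JavaScript or TypeScript: Tangut characters lie outside the Basic Multilingual Plane, so each one occupies two UTF-16 code units. The sketch below is illustrative only; the function names are hypothetical, and the ranges cover the Tangut, Tangut Components, and Tangut Supplement blocks as the author of this sketch understands them, so they should be checked against the current Unicode charts.

```typescript
// Split text into whole code points rather than UTF-16 code units.
function codePoints(text: string): string[] {
  return Array.from(text);
}

// True for code points in the Tangut (U+17000..U+187FF), Tangut Components
// (U+18800..U+18AFF), or Tangut Supplement (U+18D00..U+18D7F) blocks.
function isTangutCodePoint(cp: number): boolean {
  return (cp >= 0x17000 && cp <= 0x18aff) || (cp >= 0x18d00 && cp <= 0x18d7f);
}

// Count how many characters of a string fall in those blocks.
function countTangut(text: string): number {
  return codePoints(text).filter(ch => isTangutCodePoint(ch.codePointAt(0) ?? 0)).length;
}
```

Iterating by code point rather than by index avoids splitting a character in half when text is pasted, trimmed, or formatted.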

Additional Features 其余功能

本工具支持切换日间/夜间模式。

This tool supports both dark and light themes, and automatically adapts to your system preferences. Manual switching is also available.
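The combination of automatic and manual theme selection can be sketched as a single rule: a manual choice wins, otherwise the system preference applies. The type and function names here are hypothetical.

```typescript
type Theme = "light" | "dark";

// A manual choice (if any) overrides the system preference.
function resolveTheme(manual: Theme | null, systemPrefersDark: boolean): Theme {
  return manual ?? (systemPrefersDark ? "dark" : "light");
}
```

In a browser, the second argument could come from `window.matchMedia("(prefers-color-scheme: dark)").matches`.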


本工具附带Noto Serif Tangut;对于常见西夏文字体未包括的扩展区西夏文字,暂时使用Tangut Yinchuan Beta(唐兀银川 Beta)中的字形渲染,附有该字体的一个子集。两种字体风格不同,后续会提供用户自定义font family的功能。

另外使用Google Fonts的在线链接与(在本地已安装的)Tangut N4694作为备用字体,理论上不需要自行安装西夏文字体。

This tool includes the Noto Serif Tangut font, and a subset of the Tangut Yinchuan Beta font for extended Tangut character rendering. The styles of these two fonts are different. A user-defined font family option will hopefully be provided soon.

Additionally, it uses the online copy of Noto Serif Tangut served by Google Fonts and Tangut N4694 (if locally installed) as fallback options. In general, you do not need to install any Tangut fonts on your device.

Disclaimer 声明

作者在开发过程中使用了ChatGPT与Gemini辅助编程。

The (vibe) coding is assisted by ChatGPT and Gemini.

除手写模式需要与 Hugging Face 交互以外,一切上载、导出、复制粘贴行为发生在浏览器本地,除 itch.io 网站必需之措施,本工具不会索求、或上传任何数据。

Except for handwriting mode (which connects to Hugging Face), all data, including uploaded dictionaries, pasted text, and copied content, stays completely on your device. Nothing is sent to any server, except for essential itch.io site functions (like cookies).

License 许可证

CC0 1.0 Universal 知识共享-CC0 1.0 通用公共领域贡献

请注意此工具中装载的字体、释义文件等适用原作者所有许可。

Please note that the fonts and definition files included in this tool are subject to the original authors’ licenses.


Acknowledgement 致谢

作者感谢黄俊亮分享 Tangut Yinchuan Beta 字体,并提出与此工具相关的建议。

The author thanks Huáng Jùnliàng for sharing the Tangut Yinchuan Beta font and for his valuable suggestions.

Download

Tangut IME v2 (source code).zip 106 kB
