Japanese Studies
Resources by Topic
Text mining in Japanese - Introductions and Exercises
- Text-mining as a Research Tool in the Humanities and Social Sciences: This slide deck from Ryan Shaw is a good basic starting point. The accompanying blog post links to many other resources.
- Seven ways humanists are using computers to understand text: Overview in a blog post by Ted Underwood.
- Text Analytics 101: Some basic exercises with a single text, from Professor John Laudun.
- NLTK Japanese Corpora – NLTKで使える日本語コーパス: Examples of using the Natural Language Toolkit with Japanese texts, by Masato Hagiwara.
Tools
Japanese Text Mining workshop at Emory University (May 30-June 2, 2017) - Useful to learn the basic tools and how they work.
IIIF (International Image Interoperability Framework) in Japanese language materials - "digitalnagasaki" Blog by Dr. Kiyonori Nagasaki.
Awesome International Image Interoperability Framework (IIIF) - English
East Asia Digital Humanities Portal
東アジアDHポータル (East Asia DH Portal), Kansai University Asia Open Research Center
- Online OCR: One example of an online service for carrying out OCR on image or PDF files. (Guest mode can convert only one file at a time; a free account is limited to 25 pages.)
- Free OCR software: There are various desktop programs for OCRing large quantities of text. These are the programs available for free on CNet Downloads.
- KH Coder: A free front end combining other text mining tools such as R, ChaSen and MeCab. Developed for Japanese; it can handle several other languages as well.
- Kuromoji: One of several tools that perform basic morphological analysis, i.e., splitting text into words so that software can process it (Japanese is written without spaces between words).
- 茶まめ形態素解析 (ChaMame morphological analysis)
- Natural Language Toolkit (NLTK): Programming platform for analyzing text with Python.
- TinySegmenter: Python code for tokenizing Japanese text, by Masato Hagiwara. Can be plugged into NLTK.
- Himawari: Tool for searching and analyzing XML-encoded Japanese texts.
- Voyant Tools 2.0: Works with Japanese-language texts. Version 2.0 supports creating, modifying and accessing corpora, and adds more powerful search capabilities, new tools (such as Phrases), tools rewritten in HTML, and significant speed and scale improvements.
- How to use Voyant: Dr. Stéfan Sinclair, the creator, explains the Voyant tool at the University of Tokyo.
- Comainu: Tool that automatically builds middle-unit words (中単位), suited to speech research, and long-unit words (長単位), suited to syntactic and semantic research.
- SMART-GS: A software system for transcribing and studying handwritten documents. It can be used to read Kuzushiji texts individually or as a group.
- AutoMemo (オートメモ): Transcribes Japanese speech to text. Commercial product, not open access; listed for informational purposes only.
- Communication and Empire (Chinese DH): Provides tools and a visualization platform for Chinese-character texts, such as Kanbun 漢文.
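To give a feel for what tokenizers such as Kuromoji, MeCab or TinySegmenter do, here is a deliberately naive Python sketch that splits Japanese text wherever the script changes (kanji, hiragana, katakana, Latin letters, digits). It is not how those tools work: real morphological analyzers rely on dictionaries or statistical models. The function name `split_by_script` and the character ranges are assumptions made for illustration only.

```python
import re

# Hypothetical helper for illustration; not part of Kuromoji, MeCab or TinySegmenter.
# Each alternative matches a run of characters from one script.
SCRIPT_RUNS = re.compile(
    r"[\u4e00-\u9fff\u3005]+"   # kanji (CJK Unified Ideographs) plus the iteration mark 々
    r"|[\u3041-\u3096]+"        # hiragana
    r"|[\u30a1-\u30fa\u30fc]+"  # katakana plus the long-vowel mark ー
    r"|[A-Za-z]+"               # Latin letters
    r"|[0-9]+"                  # ASCII digits
)


def split_by_script(text):
    """Return runs of same-script characters; punctuation and spaces are dropped."""
    return SCRIPT_RUNS.findall(text)


print(split_by_script("私はNLTKで日本語のテキストを分析する。"))
print(split_by_script("食べました"))
```

Because it looks only at script changes, this approach cuts inflected words in the wrong place: 食べました comes out as 食 and べました. That failure is a good reason to use a real morphological analyzer for research.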
This is only a brief sampling of sources for data sets and texts offered in more or less "ready to use" form. For links to many more, try:
- Center for Open Data in the Humanities (CODH)
- NINJAL Databases page
- NIJL 電子資料館
- Next Digital Library at the National Diet Library -- search full texts OCR-generated from books at the National Diet Library.
- N-Gram viewer of the texts held at the National Diet Library.
- じんもんそんで使えそうなデータやAPI等のリスト -- post from the "Digital Humanities Notes in Japan" blog
- 書籍デジタル化委員会電子図書館
- Aozora bunko Corpus: Developed by Dr. Hoyt Long of the University of Chicago and the University of Chicago's ARTFL project. Text only. Search Aozora bunko (https://www.aozora.gr.jp/).
- Corpus list: Center for Corpus Development at the National Institute for Japanese Language and Linguistics (NINJAL) 国立国語研究所.
- Corpus of Spontaneous Japanese: Spoken-Japanese corpus consisting of sound files with transcriptions and a great variety of metadata derived from them.
- Data sets Repository at NII, Japan: List of the data sets offered by the National Institute of Informatics, Japan.
- Japanese Wordnet: A semantic dictionary of Japanese used in many apps and online dictionaries. You can download the entire dataset as an SQLite3 database, or subsets of it.
- Next Generation Library at NDL: Digital humanities lab using materials digitized by the National Diet Library. See the site for more detail.
- OpenMWE for Japanese: Corpus of Japanese idioms and usage examples.
- The SAT Daizokyo Text Database: Text database of the Taisho Shinshu Daizokyo 『大正新脩大藏經』, Vol. 1-Vol. 85.
- Tanaka Corpus: Corpus of matched sentence pairs in Japanese and English. These are the example sentences seen in the WWWJDIC online dictionary and are the basis for Tatoeba.org.
- 国文学研究資料館データセットの使い方 (How to use the National Institute of Japanese Literature datasets): Explanation of how to use the datasets of the National Institute of Japanese Literature, from a blog by Professor Kiyonori Nagasaki. (Open source)
- 国文研古典籍データセット(第0.1版) (NIJL classical texts dataset, version 0.1): Historical Japanese texts from the National Institute of Japanese Literature (NIJL).
- 日本語日本文学研究の未来のために (For the future of research on Japanese language and literature): Professors Miyuki Kondo and Yasuhiro Kondo share their resources for research on the Japanese language and classical literature using N-gram analysis.
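N-gram analysis, mentioned in the last entry above, means counting every overlapping sequence of n units in a text. For Japanese the units are often single characters, which avoids the word-segmentation step. The following is a minimal Python sketch of character n-gram counting; it is not the method used in any of the resources listed, and the function name `char_ngrams` and the sample phrase are made up for illustration.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Count overlapping character n-grams after removing all whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


# Illustrative sample phrase, not taken from any corpus listed above.
sample = "国語と国文学と国語学"
counts = char_ngrams(sample, n=2)

# Print each bigram with its frequency, most frequent first.
for gram, freq in counts.most_common():
    print(gram, freq)
```

Running the same function over two texts and comparing the most frequent n-grams is one simple way to look for shared phrasing.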
Introductions
If you are working with maps and data to be overlaid on maps, start at the Clark Library! They provide maps and data, as well as software and consultations and classes to help you learn to put it all together.
- GIS Lounge: Information portal site including news, introductory articles, resources and job postings.
- A Gentle Introduction to GIS: Step-by-step introduction meant to be used with QGIS software.
- Geographic Information Systems Basics: Creative Commons textbook by professors Jonathan E. Campbell and Michael Shin.
- Introduction to GIS: Slide deck from an MIT workshop.
- GIS-Based Studies in the Humanities and Social Sciences by Atsuyuki Okabe (Editor). ISBN: 9780849327131. Publication Date: 2005-10-31. Resulting from a six-year project entitled Spatial Information Science for the Humanities and Social Sciences (SIS for HSS), GIS-Based Studies in the Humanities and Social Sciences details the tools and processes for deploying GIS in economic and social analyses. Through the use of this book, readers can understand how GIS technology can be utilized in advancing studies.
Sample Data Sources
The Clark Map library has compiled many data sources on their LibGuide GIS, Map and Statistics: Japan.
Tools
- Google Maps: Create personal maps using Google Maps (log in with a Google / Gmail account to save maps).
- Google Fusion Tables: Experimental Google Drive app for information visualization, including mapping.
- Mapbox: Suite of tools and APIs built on OpenStreetMap.
- GeoCommons: Community building an open repository of maps and data. Includes tools for mapping and analysis as well as an API.
- GeoNLP Project: NII web service with dictionaries of geographic names, mapping tools, and place-name detection.
- Batch Geocoding: Quick online tools to convert addresses to latitude-longitude pairs or vice versa, along with some other functions.
- QGIS: Free, open-source GIS software. Also available in campus computer labs.
- ArcGIS: Professional-level proprietary GIS software. Available for students on some campus computers, especially at the Clark Library. (Also has a Japanese site: www.esrij.com.)
- International Institute for Digital Humanities (人文情報学研究所): Japan-based organization with a focus on Buddhist Studies.
- IPSJ SIG Computers and the Humanities (人文科学とコンピュータ研究会): Check the 発表一覧 (list of presentations) link to see a list of recent articles and projects.
- 翻デジ2014: Crowdsourcing project that aims to transcribe the contents of the 近代デジタルライブラリー (Kindai Digital Library; see "Other" tab).
- Digital Humanities Questions and Answers: Online forum for "crowd-sourced digital humanities expertise" supported by the Association for Computers and the Humanities (ACH).
- Digital Humanities in Japan Facebook Group: You must be logged in to Facebook to access it.
- Academic communities relating to digital humanities: Extensive list of organizations whose work relates to digital humanities, provided by the "Digital Humanities Notes in Japan" blog.
Data sets
Platforms
- Bodies and Structures: Digital humanities platform for the study of history.
The Basics
What is digital humanities?
Digital humanities is still a new field, and there are many different ways to define and approach the term. In general, it refers to using digital tools and data-analysis methods to do research on humanities topics.
Introductory reading:
- Digital Humanities in Japanese Studies Presentation by Hoyt Long, University of Chicago: an introduction and "progress report" on Digital Humanities in Japanese.
- A Short Guide to the Digital_Humanities (Anne Burdick, Johanna Drucker, Peter Lunenfeld, Todd Presner, and Jeffrey Schnapp, MIT Press, 2012)
- デジタル・ヒューマニティーズ入門 (Japanese translation of A Short Guide to the Digital_Humanities, from the University of Tokyo)
- Introduction to Digital Humanities (DH101) - Online coursebook from the University of California - Los Angeles
- Digital Orientalist -- online journal covering DH in Asia 東洋.
General Skills
Technical skills and concepts likely to come up in any digital humanities work.
- "GUI" vs. "Command Line": The two main ways of interacting with software, through graphics or by typing commands. GUIs are easier to learn; the command line is faster once one is used to it.
- OCR (Optical Character Recognition): Technique by which software "reads" text in an image and transcribes it into editable text.
- Text Encodings and Unicode: Text encoding affects how (and whether) software reads and displays individual characters.
- Regular Expressions: Language that allows one to search for patterns of characters.
- Python: Flexible and easy-to-learn programming language.
- R: Programming language for data manipulation and visualization, with built-in statistics functions.
- .CSV (Comma Separated Values): A basic file format for organizing data into tables; often used for moving data between other formats (e.g. spreadsheets).
- HTML: The basic language of webpages; necessary to understand if you will be scraping sites.
- API (Application Programming Interface): An application or web service's procedures for letting other applications access it and use its data.
- XML: Similar in structure to HTML. Most often used to structure texts or organize metadata.
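Several of the skills above (text encodings and Unicode, regular expressions, Python and CSV) often appear together in a single small task. The following Python sketch is one possible illustration, using only the standard library: it normalizes mixed-width Japanese text, shows what happens when bytes are decoded with the wrong encoding, pulls out runs of kanji with a regular expression, and writes the result as CSV. The sample string, variable names and column names are all invented for this example.

```python
import csv
import io
import re
import unicodedata

# Invented sample: half-width katakana, a full-width colon and full-width Latin letters.
raw = "ﾃｷｽﾄ分析入門：ＮＬＴＫと正規表現"

# Unicode: NFKC normalization folds width variants into their standard forms.
normalized = unicodedata.normalize("NFKC", raw)
print(normalized)

# Encodings: the same text is stored as different bytes in UTF-8 and Shift_JIS.
utf8_bytes = normalized.encode("utf-8")
sjis_bytes = normalized.encode("shift_jis")
print(len(utf8_bytes), len(sjis_bytes))

# Decoding with the wrong encoding produces garbled text (mojibake).
garbled = utf8_bytes.decode("shift_jis", errors="replace")
print(garbled)

# Regular expression: find every run of kanji (CJK Unified Ideographs).
kanji_runs = re.findall(r"[\u4e00-\u9fff]+", normalized)

# CSV: write one row per kanji run; StringIO stands in for a file on disk.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["kanji_run", "length"])
for run in kanji_runs:
    writer.writerow([run, len(run)])
print(buffer.getvalue())
```

To save the table to disk instead, the `StringIO` buffer could be replaced with `open("runs.csv", "w", encoding="utf-8", newline="")`, where the file name is again only an example.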
Absolute beginner?
If you are new to academic research and writing in general, here are some places to start:
- On campus: Sweetland Center for Writing
- Research Guide: Citation Help
- Purdue University's Online Writing Lab
Technical HELP at the U-M Library
ScholarSpace online - questions and technical help with basic tools, such as scanning, OCR, etc.
Media Production Room at Shapiro
Digital Media Commons on North Campus
Acknowledgments
This research guide owes a great deal to some other wonderful guides:
- Digital Humanities, especially the "Mapping and Spatial Methods" section -- Christine Murray, University of Pennsylvania
- Japanese Text Analysis -- Molly Des Jardin, University of Pennsylvania
- Introduction to Text Analysis -- Angela Zoss, Duke University
Digital audio
- AutoMemo (オートメモ): Digital recorder that also transcribes Japanese speech to text. Commercial product; listed for informational purposes only.
- Rev.ai: Some free hours, then fee-based. Supports multiple languages.
- Speech-to-Text APIs: Commercial products; listed for informational purposes only.
- Audio Transcription with YouTube (open access): YouTube/Google can be a good tool for producing an automatically generated rough draft, which will then require varying degrees of manual cleanup. Upload the audio file (in a video format) to Google, have it create a transcript, then download and proofread the transcript. (Explanation by J. Schell at the Design Lab, U-M Library.)
- Google Cloud Speech to Text: See "How to use Google's Cloud Speech API to transcribe a large audio file".
- TRANSCRIPTION SERVICES for U-M members: Includes transcription, translation and caption services. The following suppliers have master agreements with the University; therefore, the Pcard restriction has been waived for these suppliers. The use of any other service provider requires prior approval from Procurement.
- Trint: Commercial product; listed for informational purposes only. Transcribes and translates many languages, including Japanese.