Octanove Institute

NeuralMorse — reinventing Morse code with neural networks

I redesigned Morse code using modern statistical techniques, including neural networks. NeuralMorse dynamically tokenizes input text and encodes it as sequences of eight tonal alphabets optimized using word embeddings and the assignment problem.

Free Post

Natural Language Processing

NAACL 2021 Education×NLP Research Recap

In this post, I'm going to summarize the Education×NLP papers presented at NAACL 2021, which was held online last week. Topics range from grammatical error correction to prerequisite relation extraction.

Free Post

Natural Language Processing

GrammarTagger — A Neural Multilingual Grammar Profiler for Language Education

We are happy to announce GrammarTagger, an open-source toolkit for grammatical profiling for language learning. It can analyze text in English and Chinese and show you the grammatical items included in the input, along with their estimated difficulty.

Free Post

Natural Language Processing

Complete Guide to Subword Tokenization Methods in the Neural Era

In deep natural language processing (NLP), the input text is often broken into "subwords," units that are shorter than words. In this article, we'll review common subword tokenization techniques including WordPiece, byte-pair encoding (BPE), and SentencePiece,

Free Post

Natural Language Processing

Complete Guide to Japanese BERT: Choosing the Right Pretrained Language Model for You

Pretrained language models (PLMs) such as BERT are used in more and more NLP applications in many languages, including Japanese. In this post, I'm going to compare various Japanese pretrained BERT models and their task performance, and make a specific recommendation as of this writing.

Free Post

Natural Language Processing

State of Automated Essay Scoring with Pretrained Language Models

Automated essay scoring (AES), where a computer algorithm scores student essays automatically, is an important application of natural language processing for education and language learning. In this article, we'll review some of the recent advances in automated essay scoring.

Free Post

Machine Learning

Most Frequently Mentioned ML Topics in 2020

In this post, I'm going to use NLP techniques to analyze all the ML/NLP/CV papers published on arXiv this year and summarize the "most frequently mentioned ML topics in 2020."