March 2024 – Mindware Research Institute

March 27, 2024

Innovation Maps: Brain-Computer Interface

Today's topic is "Brain-Computer Interface". As of March 25th, scraping news articles from the past 30 days yielded only about 20…

March 26, 2024

In Innovation Maps

Innovation Maps: To Our Readers in Japan

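Innovation Maps is addressed to a worldwide audience, so we normally publish in English. Because we would also like our customers in Japan to know about it, today's post was written in Japanese as an exception. Innovation Maps takes the major technological innovation themes that are set to transform society and maps news articles and technical papers about them with a self-organizing map (Viscovery SOMine). We expect the method to keep improving, but for now the maps are produced as follows:

Data preparation
(1) Scrape news articles and technical papers from the Internet (collection and extraction of structured data).
(2) Convert each article's title and summary (for papers, the abstract) into Embedding Vectors.
(3) Reduce the dimensionality of the Embedding Vectors with PCA. --> Map for Visual Explorer
(4) From several dozen predefined features, extract those that appear in each title and summary (for papers, the abstract).
(5) Extract the terms that appear in each title and summary (for papers, the abstract) and create dummy variables (a table of 0 and 1 values). --> Map for Enterprise Data

About Embedding Vectors
Embedding Vectors have recently come into wide use as a way to make natural language processing more efficient. The basic idea appears to be an extension of a familiar machine learning practice: normalize the input data, convert it into vector values, and reduce its dimensionality with principal component analysis or a similar method. Large language models such as ChatGPT come with general-purpose APIs that can convert any text into vector values, so this project uses one of them. The Embedding Vectors provided by the OpenAI API have 1536 dimensions, and humans cannot interpret the values themselves.

Embedding Vectors are very convenient for classifying texts and documents, but the results are hard to interpret. Most of the explanations of Embedding Vectors in circulation map the vectors onto a two-dimensional graph, happily declare that "the classification worked," and stop there. To be practical, one more step is needed.

The other point, and the most important thing to understand about clustering and classification problems, is the "Ugly Duckling theorem." The real reason classification fails even with Embedding Vectors is the assumption that treating every dimension completely equally is "objective." Most people stumble here. Once you realize that "classification" is a "method" that arises from our needs as living organisms, the importance of weighting becomes clear.

We therefore use self-organizing maps to interpret high-dimensional data. A self-organizing map also projects the data onto two dimensions, but it retains the multidimensional information, which is available as component maps (called attribute pictures in Viscovery). A self-organizing map (Viscovery SOMine) can be trained well even on 1536-dimensional data and classifies the documents cleanly. A human looking at that result directly still cannot understand it, however, so we decided to compress the Embedding Vectors further and use 15-dimensional principal component scores. In our experiments this was enough to classify the documents, and it is also a manageable number of dimensions (for humans) when observing correlations with the features and terms extracted from the text.

We could build the maps directly from the features and terms extracted from the text, but depending on how the extraction is done the result can be biased, and complete coverage cannot be guaranteed. In that respect, Embedding Vectors are indispensable for making the project efficient. To achieve both interpretability and completeness, we use Embedding Vectors, PCA, and SOM in combination.

Creating the project file for Visual Explorer…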
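Steps (3) and (5) of the data preparation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for Innovation Maps: the random matrix stands in for real 1536-dimensional embeddings, the function names `pca_scores` and `term_dummies`, the sample texts, and the term list are all hypothetical, and simple substring matching replaces whatever term extraction the project actually uses.

```python
import numpy as np


def pca_scores(vectors: np.ndarray, n_components: int = 15) -> np.ndarray:
    """Return principal component scores (rows = documents)."""
    centered = vectors - vectors.mean(axis=0)
    # SVD of the centered data: the rows of vt are the principal axes,
    # ordered from largest to smallest explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def term_dummies(texts: list[str], terms: list[str]) -> np.ndarray:
    """Return a 0/1 table (rows = documents, columns = terms).

    Assumption: a term counts as present if it occurs as a
    case-insensitive substring of the text.
    """
    lowered = [text.lower() for text in texts]
    return np.array(
        [[int(term.lower() in text) for term in terms] for text in lowered]
    )


# Hypothetical stand-in for embeddings of 40 documents (1536 dimensions each).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(40, 1536))
scores = pca_scores(embeddings, n_components=15)  # shape (40, 15)

# Hypothetical titles and terms, for illustration only.
texts = [
    "Self-driving cars reach new cities",
    "Brain-computer interface trial begins",
    "A brain implant that steers a self-driving wheelchair",
]
terms = ["self-driving", "brain"]
dummies = term_dummies(texts, terms)  # rows: [1, 0], [0, 1], [1, 1]
```

The 15 score columns would feed the map as the classifying attributes, while the 0/1 columns can be laid over the same map to see which terms concentrate in which region.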

March 25, 2024

In Innovation Maps

Innovation Maps: News articles about Self Driving Part 2

To make the previous map easier to interpret, we extracted keywords from the article titles. The attribute weights are 1 for the PCs and 0 for all other attributes. Viewer: You can display…
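The effect of giving a weight of 1 to the PCs and 0 to the other attributes can be illustrated with a weighted Euclidean distance. This is a generic sketch, not Viscovery SOMine's internal computation, which the post does not describe; the function name `weighted_distance` and the sample vectors are hypothetical.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance; attributes with weight 0 are ignored."""
    return math.sqrt(
        sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights))
    )


# Hypothetical records: two PC scores followed by two keyword flags (0/1).
weights = [1.0, 1.0, 0.0, 0.0]  # 1 for the PCs, 0 for the keyword attributes
doc_a = [0.5, -1.0, 1, 0]
doc_b = [0.5, -1.0, 0, 1]  # same PCs as doc_a, different keywords
doc_c = [2.0, 1.0, 1, 0]   # same keywords as doc_a, different PCs

d_ab = weighted_distance(doc_a, doc_b, weights)  # 0.0
d_ac = weighted_distance(doc_a, doc_c, weights)  # 2.5
```

With these weights the keywords do not influence where a document lands on the map; they are carried along only so that they can be displayed over the ordering produced by the PCs.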

March 24, 2024

In Innovation Maps

Innovation Maps: News articles about Self Driving

Last time, we created maps from technical papers on Autonomous Vehicles; this time we deal with news articles. Procedure: The term "Autonomous Vehicles" has been paraphrased as "Automated Car,"…

March 19, 2024

In Innovation Maps

Innovation Maps on Autonomous Vehicles

Autonomous Vehicles are one of the most impactful innovations in terms of directly changing the way we work and live. The development of autonomous driving technology is commonly categorized into…

March 14, 2024

In Innovation Maps

Innovation Map on Agent System

Many say that the next boom after LLMs will be agent systems. Agent systems are collections of software and hardware that operate autonomously and seek to achieve goals within their…

March 12, 2024

In Innovation Maps

Clustering of Tech News

This is a prototype version of the Innovation Map based on Tech News. Tech News retrieved from ChatGPT is summarized using ChatGPT, Embedding Vectors are obtained from the text, and…

March 12, 2024

In Innovation Maps

Innovation Map on Neuromorphic Computing

Here is an Innovation Map on Neuromorphic Computing, which was created from 118 abstracts from IEEE Xplore (https://ieeexplore.ieee.org/Xplore/home.jsp). We obtained Embedding Vectors for each abstract using the OpenAI API, and then reduced…
