Artificial Intelligence/Neural Networks: revision history (No. 1)

- 1 (2026-02-20 (Fri) 14:20:38)

A neural network is, roughly speaking, a function (a computational mechanism) that maps **input → computation → output**, built by stacking many small computational components (layers). [\[developers...google.com\]](https://developers.google.com/machine-learning/crash-course/neural-networks), [\[developer.ibm.com\]](https://developer.ibm.com/articles/cc-cognitive-neural-networks-deep-dive/)

The sections below explain this step by step.

## 1) What is a neural network? (in one sentence)

A neural network is **a kind of model for learning nonlinear patterns from data**: a computational mechanism that turns input features into outputs (classification, regression, generation, and so on).

Training usually works by **adjusting the weights (parameters) so that the error becomes smaller**. [\[developers...google.com\]](https://developers.google.com/machine-learning/crash-course/neural-networks), [\[developer.ibm.com\]](https://developer.ibm.com/articles/cc-cognitive-neural-networks-deep-dive/), [\[geeksforgeeks.org\]](https://www.geeksforgeeks.org/machine-learning/backpropagation-in-neural-network/)

## 2) What is the structure? (basic components)

A neural network is typically built from the following components.

### 2.1 Nodes (neurons) and weights (parameters)

Each node computes a weighted sum of its inputs, adds a bias, and passes the result on to the next computation. [\[developer.ibm.com\]](https://developer.ibm.com/articles/cc-cognitive-neural-networks-deep-dive/), [\[developers...google.com\]](https://developers.google.com/machine-learning/crash-course/neural-networks)

These weights and biases are the parameters that training adjusts. [\[geeksforgeeks.org\]](https://www.geeksforgeeks.org/machine-learning/backpropagation-in-neural-network/), [\[developer.ibm.com\]](https://developer.ibm.com/articles/cc-cognitive-neural-networks-deep-dive/)
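
The weighted sum plus bias described above can be written in a few lines. This is only a minimal sketch: the function name `neuron` and the example numbers are made up for illustration and do not come from any library.

```python
def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias (no activation applied yet)."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(w * x for w, x in zip(weights, inputs)) + bias


# Hypothetical example values, chosen only so the arithmetic is easy to follow.
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 2.0]
b = 0.25

z = neuron(x, w, b)  # 0.5*1.0 + (-1.0)*2.0 + 2.0*3.0 + 0.25
print(z)
```

During training, `w` and `b` are the values that get adjusted; `x` comes from the data.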

### 2.2 Layers = computation blocks

The "linear layer / normalization layer / embedding layer" you quoted are all concrete examples of layers.

- **Linear layer (Linear / Fully Connected)**: applies a linear (strictly speaking, affine) transformation such as $$Wx + b$$ to a vector $$x$$. A Transformer's `q_proj` and similar modules are essentially linear transformations of this kind. [\[rectified-....github.io\]](https://rectified-scaling-law.github.io/), [\[arxiv.org\]](https://arxiv.org/abs/2308.08747)
- **Activation function**: linear layers alone have limited expressive power, so a **nonlinearity** such as ReLU or SiLU is inserted between them. [\[developers...google.com\]](https://developers.google.com/machine-learning/crash-course/neural-networks), [\[developer.ibm.com\]](https://developer.ibm.com/articles/cc-cognitive-neural-networks-deep-dive/)
- **Normalization layer (LayerNorm / RMSNorm)**: rescales values to stabilize training (LayerNorm and RMSNorm appear frequently in LLMs). [\[aclanthology.org\]](https://aclanthology.org/2024.findings-emnlp.249/), [\[arxiv.org\]](https://arxiv.org/abs/2308.08747)
- **Embedding layer**: converts word IDs (token IDs) into continuous vectors; it sits at the entrance of an LLM. [\[aclanthology.org\]](https://aclanthology.org/2024.findings-emnlp.249/), [\[developers...google.com\]](https://developers.google.com/machine-learning/crash-course/neural-networks)
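
To make the four layer types above concrete, here is a small sketch in plain Python. All function names, the `eps` default, and the toy parameter values are assumptions made for illustration; real frameworks implement these layers over tensors with many more options, and LayerNorm is left out to keep the example short.

```python
import math


def linear(x, W, b):
    """Affine map W x + b, where W is given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b_i for row, b_i in zip(W, b)]


def relu(x):
    """ReLU: negative values become 0, positive values pass through."""
    return [max(0.0, v) for v in x]


def silu(x):
    """SiLU: v * sigmoid(v)."""
    return [v / (1.0 + math.exp(-v)) for v in x]


def rms_norm(x, gain, eps=1e-6):
    """RMSNorm: divide by the root mean square, then scale by a learned gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(gain, x)]


def embedding(token_ids, table):
    """Look up one row of the embedding table per token ID."""
    return [table[t] for t in token_ids]


# Hypothetical toy parameters: a vocabulary of 3 tokens, vectors of dimension 2.
table = [[0.0, 0.0], [3.0, 4.0], [1.0, -1.0]]
W = [[1.0, 0.0], [0.0, -1.0]]
b = [0.5, 0.5]

vectors = embedding([1, 2], table)                        # token IDs -> vectors
normed = [rms_norm(v, gain=[1.0, 1.0]) for v in vectors]  # rescale each vector
hidden = [relu(linear(v, W, b)) for v in normed]          # linear + nonlinearity
print(hidden)
```

The order used here (embedding, then normalization, then linear plus activation) is just one plausible arrangement for a demonstration, not the layout of any particular model.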

## 3) "Deep" neural networks = stacking many layers

A network that stacks **dozens or hundreds of layers**, rather than just one or two, is called "deep". LLMs (large language models) belong to this family, and most of them use an architecture called the **Transformer**. [\[developers...google.com\]](https://developers.google.com/machine-learning/crash-course/neural-networks), [\[developer.ibm.com\]](https://developer.ibm.com/articles/cc-cognitive-neural-networks-deep-dive/), [\[arxiv.org\]](https://arxiv.org/abs/2308.08747), [\[openreview.net\]](https://openreview.net/forum?id=g7rMSiNtmA)

## 4) In a Transformer (LLM), what does a layer (block) look like?

When people talk about "layer n" of an LLM, they usually mean a **Transformer block (e.g., DecoderLayer)**. One such block typically contains: [\[arxiv.org\]](https://arxiv.org/abs/2308.08747), [\[openreview.net\]](https://openreview.net/forum?id=g7rMSiNtmA)

- **Self-Attention**: builds Q/K/V from the input, then uses them to weight and mix information across tokens [\[arxiv.org\]](https://arxiv.org/abs/2308.08747), [\[openreview.net\]](https://openreview.net/forum?id=g7rMSiNtmA), [\[aclanthology.org\]](https://aclanthology.org/2024.findings-emnlp.249/)
- **MLP (Feed Forward)**: applies a large nonlinear transformation to each token independently [\[arxiv.org\]](https://arxiv.org/abs/2308.08747), [\[aclanthology.org\]](https://aclanthology.org/2024.findings-emnlp.249/)
- **Residual connections + normalization**: mechanisms that stabilize training [\[arxiv.org\]](https://arxiv.org/abs/2308.08747), [\[openreview.net\]](https://openreview.net/forum?id=g7rMSiNtmA), [\[aclanthology.org\]](https://aclanthology.org/2024.findings-emnlp.249/)
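
The "build Q/K/V, weight, and mix" step can be sketched as single-head scaled dot-product attention. This is a simplified illustration under stated assumptions: one head only, no causal mask (decoder LLMs normally add one), identity projection matrices as stand-ins for learned weights, and hypothetical helper names throughout.

```python
import math


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention without masking."""
    Q = [matvec(Wq, t) for t in tokens]
    K = [matvec(Wk, t) for t in tokens]
    V = [matvec(Wv, t) for t in tokens]
    scale = math.sqrt(len(K[0]))
    outputs, all_weights = [], []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in K]
        weights = softmax(scores)
        # Each output is a weighted mix of the value vectors.
        mixed = [sum(w * v[d] for w, v in zip(weights, V)) for d in range(len(V[0]))]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical toy input: 3 tokens of dimension 2, identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

attended, weights = self_attention(tokens, identity, identity, identity)

# Residual connection: add the block's input back onto the attention output.
residual = [[t + a for t, a in zip(tok, att)] for tok, att in zip(tokens, attended)]
print(weights[0])
```

In a full block, the MLP and the normalization layers would follow (or precede) this step, each wrapped in its own residual connection.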

Applying LoRA to "all layers (Attention + MLP)" therefore means adding extra trainable parameters broadly across the **linear layers (q/k/v/o, up/down/gate, etc.)** inside these blocks. [\[rectified-....github.io\]](https://rectified-scaling-law.github.io/), [\[aclanthology.org\]](https://aclanthology.org/2024.findings-emnlp.249/)
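
As a rough picture of what "adding parameters to a linear layer" means, the sketch below follows the commonly used LoRA formulation: the frozen output `W x` plus a scaled low-rank update `(alpha / r) * B (A x)`, with `B` starting at zero. The formula is not spelled out in the text above, so treat it, the function name `lora_linear`, and the sizes as assumptions for illustration.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_linear(x, W, A, B, alpha):
    """Frozen base output W x plus the scaled low-rank update (alpha / r) * B (A x)."""
    r = len(A)  # rank = number of rows in A
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / r) * u for b, u in zip(base, update)]


random.seed(0)
d_in, d_out, r = 8, 8, 2  # hypothetical toy sizes

W = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen
A = [[random.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]    # trainable, r x d_in
B = [[0.0] * r for _ in range(d_out)]                                     # trainable, d_out x r, zero at start

x = [float(i) for i in range(d_in)]
y0 = lora_linear(x, W, A, B, alpha=4.0)

base_params = d_out * d_in           # parameters in the frozen layer
lora_params = r * d_in + d_out * r   # extra trainable parameters
print(base_params, lora_params)
```

Because `B` starts at zero, the adapted layer initially behaves exactly like the original one; only `A` and `B` would be updated during fine-tuning.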

## 5) How does it learn? (very roughly: backpropagation)

At the center of neural network training is **backpropagation**.

- **Forward pass**: compute input → layers → output, then measure the difference between the prediction and the correct answer (the loss) [\[geeksforgeeks.org\]](https://www.geeksforgeeks.org/machine-learning/backpropagation-in-neural-network/), [\[towardsdat...cience.com\]](https://towardsdatascience.com/backpropagation-step-by-step-derivation-99ac8fbdcc28/)
- **Backward pass**: compute how much each weight contributed to that loss (the gradient), working backward from the output with the chain rule [\[geeksforgeeks.org\]](https://www.geeksforgeeks.org/machine-learning/backpropagation-in-neural-network/), [\[towardsdat...cience.com\]](https://towardsdatascience.com/backpropagation-step-by-step-derivation-99ac8fbdcc28/)
- **Update (optimizer)**: use the gradients to adjust the weights a little at a time so that the loss goes down [\[geeksforgeeks.org\]](https://www.geeksforgeeks.org/machine-learning/backpropagation-in-neural-network/), [\[developers...google.com\]](https://developers.google.com/machine-learning/crash-course/neural-networks)
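
The three steps above can be traced by hand on the smallest possible model, a single weight and bias. This is a sketch under assumptions: the toy data (points on the line y = 2x + 1), the mean squared error loss, the learning rate, and the step count are all chosen for illustration, and plain gradient descent stands in for whatever optimizer a real project would use.

```python
# Toy data for a hypothetical target relationship y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

w, b = 0.0, 0.0  # parameters to learn
lr = 0.05        # learning rate (assumed value)


def mean_squared_error(w, b):
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


initial_loss = mean_squared_error(w, b)

for step in range(2000):
    grad_w = grad_b = 0.0
    for x, y in data:
        pred = w * x + b                            # forward pass
        dloss_dpred = 2.0 * (pred - y) / len(data)  # backward: d(loss)/d(pred)
        grad_w += dloss_dpred * x                   # chain rule: d(pred)/dw = x
        grad_b += dloss_dpred                       # chain rule: d(pred)/db = 1
    w -= lr * grad_w                                # update step
    b -= lr * grad_b

final_loss = mean_squared_error(w, b)
print(w, b, final_loss)
```

A deep network does the same thing, except that the chain rule is applied through many layers and the gradients are computed automatically by the framework.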

## 6) Which languages is it developed in? (in practice: Python-centered, with C++/CUDA for the fast parts)

This is where the practical, on-the-ground picture matters.

### 6.1 Research and development (the part where you write the model) is mostly **Python**

Major frameworks such as PyTorch, TensorFlow, and JAX are centered on **building models in Python**. [\[github.com\]](https://github.com/pytorch/pytorch), [\[tensorflow.org\]](https://www.tensorflow.org/api_docs), [\[docs.jax.dev\]](https://docs.jax.dev/en/latest/)

- **PyTorch**: clearly described as providing "tensor computation + automatic differentiation (autograd)" for Python. [\[github.com\]](https://github.com/pytorch/pytorch), [\[pypi.org\]](https://pypi.org/project/torch/)
- **TensorFlow**: APIs exist for several languages, but the official documentation states that **the Python API is the most complete and the easiest to use**. [\[tensorflow.org\]](https://www.tensorflow.org/api_docs), [\[tensorflow.org\]](https://www.tensorflow.org/guide)
- **JAX**: a high-performance numerical computing library for Python that provides compilation for GPU/TPU and automatic differentiation. [\[docs.jax.dev\]](https://docs.jax.dev/en/latest/), [\[pypi.org\]](https://pypi.org/project/jax/)
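
All three frameworks advertise automatic differentiation. To give a feel for what that automates, here is a toy reverse-mode autodiff for scalars in plain Python. The `Value` class is hypothetical and written only for this illustration; it is not how PyTorch, TensorFlow, or JAX are implemented, and it supports only addition and multiplication.

```python
class Value:
    """A scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def _backward():
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Order the graph so each node comes after its inputs,
        # then apply the chain rule from the output back to the leaves.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


w, x, b = Value(3.0), Value(2.0), Value(1.0)
y = w * x + b  # forward pass builds the graph
y.backward()   # backward pass fills in .grad on every node
print(y.data, w.grad, x.grad, b.grad)
```

The user writes only the forward computation; the gradients fall out of the recorded graph. That division of labor is what makes Python a convenient front end for these frameworks.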

## 7) A tiny example for intuition (concept only)

A neural network is a function built by stacking layers:

- Input: $$x$$
- Linear layer: $$h = W_1 x + b_1$$
- Activation: $$h' = \text{ReLU}(h)$$
- Linear layer: $$y = W_2 h' + b_2$$

Models such as LLMs are what you get when this kind of stacking is scaled up enormously. [\[developers...google.com\]](https://developers.google.com/machine-learning/crash-course/neural-networks), [\[geeksforgeeks.org\]](https://www.geeksforgeeks.org/machine-learning/backpropagation-in-neural-network/)
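
The four steps above map directly onto code. The sketch below uses plain Python lists; the weight values `W1`, `b1`, `W2`, `b2` and the input `x` are arbitrary numbers picked so that the arithmetic can be checked by hand.

```python
def linear(x, W, b):
    """Affine map W x + b, where W is given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b_i for row, b_i in zip(W, b)]


def relu(h):
    """ReLU: negative values become 0."""
    return [max(0.0, v) for v in h]


# Hypothetical parameters: 2 inputs -> 3 hidden units -> 1 output.
W1 = [[1.0, -1.0], [0.5, 0.5], [-1.0, 0.0]]
b1 = [0.0, -1.0, 0.5]
W2 = [[1.0, 2.0, -1.0]]
b2 = [0.25]

x = [2.0, 1.0]
h = linear(x, W1, b1)      # h  = W1 x + b1  -> [1.0, 0.5, -1.5]
h_act = relu(h)            # h' = ReLU(h)    -> [1.0, 0.5, 0.0]
y = linear(h_act, W2, b2)  # y  = W2 h' + b2 -> [2.25]
print(y)
```

Note how the third hidden unit is negative before the activation and contributes nothing afterward; that clipping is the nonlinearity at work.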

Broadly, your question starts with **(1) the development model (in-house development vs. collaborative OSS development)**.

- Most of the foundation (frameworks, runtimes, compilers, inference engines) is OSS, developed jointly by companies and research institutions. [\[unfoldai.com\]](https://unfoldai.com/catastrophic-forgetting-llms/), [\[blog.winerva.com\]](https://blog.winerva.com/llama-transformer-architecture/), [\[reddit.com\]](https://www.reddit.com/r/LocalLLaMA/comments/15sgg4m/what_modules_should_i_target_when_training_using/), [\[pytorch.org\]](https://pytorch.org/), [\[github.com\]](https://github.com/microsoft/onnxruntime)
- The final products (model weights, training data, operational know-how, proprietary optimizations), on the other hand, are often specific to each company, so in practice the picture is a hybrid of "OSS + in-house". [\[github.com\]](https://github.com/microsoft/onnxruntime), [\[github.com\]](https://github.com/vllm-project/vllm), [\[github.com\]](https://github.com/ggml-org/llama.cpp)

The sections below go through this in order.

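## 1) Is neural network development proprietary, or collaborative OSS?

### 1.1 In practice, it differs by layer

Neural network "development" is easier to understand if you split it roughly into the following layers.

1. **Research and implementation (model definition, training loop)**: Python-centered; OSS is mainstream
2. **Compute foundation (tensor operations, automatic differentiation, kernels)**: C++/CUDA and similar; OSS is mainstream (plus some vendor-specific optimizations)
3. **Inference optimization and deployment (compilation, serving, runtimes)**: mostly OSS, but commercial optimizations are also common
4. **Model weights, training data, operations**: often proprietary to each company (though open models are increasing)

PyTorch states explicitly that its Python API sits on top of a large C++ codebase, so Python mainly plays the role of the upper layer. The result tends to be **"a shared (OSS) foundation with proprietary structures built on top"**. [\[docs.pytorch.org\]](https://docs.pytorch.org/tutorials/advanced/cpp_frontend.html), [\[developers...redhat.com\]](https://developers.redhat.com/articles/2026/02/19/understanding-aten-pytorchs-tensor-library), [\[unfoldai.com\]](https://unfoldai.com/catastrophic-forgetting-llms/), [\[github.com\]](https://github.com/microsoft/onnxruntime), [\[github.com\]](https://github.com/vllm-project/vllm)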

## 2) Representative OSS projects (with URLs)

To show where the collaborative development happens, the main OSS projects are listed below by category.

### A) The most widely used frameworks for training and research

- **PyTorch (official site)**: <https://pytorch.org/>

   **GitHub**: <https://github.com/pytorch/pytorch> [\[unfoldai.com\]](https://unfoldai.com/catastrophic-forgetting-llms/) [\[arxiv.org\]](https://arxiv.org/abs/2106.09685)

- **TensorFlow (official site)**: <https://www.tensorflow.org/api_docs>

   **GitHub**: <https://github.com/tensorflow/tensorflow> [\[note.com\]](https://note.com/kan_hatakeyama/n/nfdc1c020a1e6) [\[blog.winerva.com\]](https://blog.winerva.com/llama-transformer-architecture/)

- **JAX (documentation)**: <https://docs.jax.dev/en/latest/>

   **GitHub**: <https://github.com/jax-ml/jax> [\[en.wikipedia.org\]](https://en.wikipedia.org/wiki/Attention_Is_All_You_Need) [\[reddit.com\]](https://www.reddit.com/r/LocalLLaMA/comments/15sgg4m/what_modules_should_i_target_when_training_using/)

### B) Interchange formats / inference runtimes (bridging training → inference)

- **ONNX (official site)**: <https://onnx.ai/> [\[pytorch.org\]](https://pytorch.org/)

- **ONNX Runtime (GitHub)**: <https://github.com/microsoft/onnxruntime>

   **C++ API (official documentation)**: <https://onnxruntime.ai/docs/get-started/with-cpp.html>

   Note: ONNX Runtime is positioned as a "cross-platform, high performance inference accelerator". [\[github.com\]](https://github.com/microsoft/onnxruntime) [\[onnxruntime.ai\]](https://onnxruntime.ai/docs/get-started/with-cpp.html), [\[learn.microsoft.com\]](https://learn.microsoft.com/en-us/azure/machine-learning/concept-onnx?view=azureml-api-2)

### C) LLM inference and serving (high-throughput operation)

- **vLLM (GitHub)**: <https://github.com/vllm-project/vllm>

   **Official site**: <https://vllm.ai/>

   Note: it is explicitly described as a high-throughput, memory-efficient inference and serving engine for LLMs. [\[github.com\]](https://github.com/vllm-project/vllm) [\[vllm.ai\]](https://vllm.ai/)

### D) Direct inference in C/C++ (strong for local inference and embedded use)

- **llama.cpp (GitHub)**: <https://github.com/ggml-org/llama.cpp>

   Note: it describes itself as "LLM inference in C/C++", which makes it unambiguously a C/C++ inference engine. [\[github.com\]](https://github.com/ggml-org/llama.cpp), [\[deepwiki.com\]](https://deepwiki.com/ggml-org/llama.cpp)

### E) Kernel development (tools for writing fast kernels without writing CUDA)

- **Triton (GitHub)**: <https://github.com/triton-lang/triton>

   Note: Triton is described as a language and compiler for writing highly efficient deep learning primitives. [\[github.com\]](https://github.com/triton-lang/triton), [\[openai.com\]](https://openai.com/index/triton/)

### F) Compilers (aiming for speedups while still writing Python)

- **OpenXLA / XLA (GitHub)**: <https://github.com/openxla/xla>

   **Official site**: <https://openxla.org/>

   Note: XLA is described as taking models from PyTorch, TensorFlow, JAX, and other frameworks and optimizing them. [\[github.com\]](https://github.com/openxla/xla) [\[openxla.org\]](https://openxla.org/)

### G) Model definitions and the surrounding ecosystem ("everyone shares the same models")

- **Hugging Face Transformers (GitHub)**: <https://github.com/huggingface/transformers>

   **Official documentation**: <https://huggingface.co/docs/transformers/index>

   Note: Transformers states that, by standardizing model definitions, it interoperates with a wide range of inference engines. [\[github.com\]](https://github.com/huggingface/transformers) [\[huggingface.co\]](https://huggingface.co/docs/transformers/index)
