A personal schizophrenic idea blog

Observations and Uncertainties of the Past

As living organisms, we humans store some information in memory. But that information cannot be complete; it is an accumulation of the repeated observations each of us makes as an individual.


For example, if a mosquito is buzzing around at night while you are trying to sleep, you can choose to squash it or ignore it. Until the decision is actually made, say, tomorrow, you carry uncertainty up to that point in the future. But once you have squashed the mosquito, the world will probably split and you will move into the branch where you squashed it. In other words, you have stored in memory the information that you squashed the mosquito, and the world where you did not squash it could exist as another branch.

The question is: what past paths are possible for the world where you squashed the mosquito? Because the information you hold is incomplete, multiple paths are possible. Looking at the diagram above, ordinary thinking calls to mind the left-hand picture, but as a schizophrenic I think of the right-hand one. That is, there may be points where the number of "possible past paths" is greater than one.

It may be too complex to know exactly what the effects are, since individuals (or everything) are influenced by their environment and observations. However, given the biological significance of human memory and of confirmation bias, it is conceivable that choosing information convenient to you amounts to choosing a world that works in your favor. If, at the level of the individual, different past paths are possible, what biological significance is there in synchronizing one's information with other individuals? Isn't the information humans hold in memory far more malleable than fixed physical constants like the speed of light?

These are just hypotheses, but my point is that uncertainty is a concept that applies to the past as well as to the future. To demonstrate it in a physical sense, you would of course need evidence.

観測と過去の不確実性 [日本語]







Is memorization just overfitting?

Memorization is considered one of the easiest routes to overfitting. If a school teacher tells students that the test will be taken verbatim from a given question set, a student can answer the questions by rote without "understanding" how to solve them. But what is "understanding"?

At first glance, memorization is quite simple: it can be built from two operations, storing a problem under an index, and retrieving that problem by sending a query. The following variations are possible:

  • Use the problem as a query and retrieve exactly matching problems.
  • Use the problem as a query and retrieve problems ranked by some score (e.g., similarity).
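The two retrieval variations above can be sketched as a toy lookup. The store, the sample problems, and the bag-of-words similarity are all my own illustrative assumptions, not anything from the original:

```python
# Toy "memorization" store: problems are keys, answers are values.
memory = {
    "1+1": "2",
    "solve x^2-1=0": "x=1 or x=-1",
}

def exact_match(query):
    # Variation 1: return the stored answer only on an exact hit.
    return memory.get(query)

def jaccard(a, b):
    # A trivial similarity score between two token sets.
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

def ranked_search(query):
    # Variation 2: rank all stored problems by similarity to the query.
    return sorted(memory, key=lambda p: jaccard(p, query), reverse=True)

print(exact_match("1+1"))              # exact hit
print(ranked_search("solve x^2-4=0"))  # most similar stored problems first
```

Note that the ranked variant returns *something* even for an unseen problem, which is exactly why it can look slightly smarter than pure exact-match memorization.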

On the other hand, a system based on some kind of "intelligence" might use methods other than search.

  • Given a problem, it deduces an answer using some rule.
  • Given a problem, it discovers something (e.g., a pattern) in the problem.
  • Given a problem, it simplifies, abstracts, or reasons about the problem.
  • Given a problem, it splits the problem into sub-problems.
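As a contrast to retrieval, here is a minimal sketch of the last two bullets, splitting a problem into sub-problems and applying a rule, with no stored answers at all. The string format and the associativity rule are purely my own toy assumptions:

```python
def solve(problem):
    # Rule: addition is associative, so "a+b+c" splits into the
    # sub-problem "a+b" plus the remaining term "c".
    terms = problem.split("+")
    if len(terms) == 1:
        return int(terms[0])          # base case: a bare number
    # Recursive decomposition into sub-problems.
    return solve("+".join(terms[:-1])) + int(terms[-1])

print(solve("1+2+3"))  # no entry for "1+2+3" was ever memorized
```

The point of the sketch: the system never saw this exact problem, yet it answers correctly, because it executes a rule rather than looking up a key.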


Whereas the memorization system was the kind of knowledge that retrieves the problem itself, the intelligence-based system appears at first glance to be executing some program (deduction, pattern discovery, and so on).

In addition, some behaviors appear to be executed by a combination of the two systems:

  • Given a problem, it finds similar problems from a domain seemingly unrelated to it.
  • Given a problem, it retrieves problems by exact match or by some score, based on the sub-problems the problem encompasses.
  • It deduces over the problems retrieved with the problem as a query.
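A rough sketch of the combined style: retrieve a memorized sub-problem, then deduce the final answer on top of what was retrieved. The fact table and the rewriting step are my own toy assumptions, meant only to illustrate the hybrid:

```python
# Memorized facts (the "retrieval" half of the system).
facts = {"1+1": 2, "2+2": 4}

def solve_with_memory(problem):
    # Exact retrieval first: pure memorization.
    if problem in facts:
        return facts[problem]
    # Otherwise deduce: rewrite the problem in terms of a memorized
    # sub-problem, e.g. 1+2 = (1+1) + 1.
    a, b = map(int, problem.split("+"))
    known = facts["1+1"]              # retrieved sub-problem
    return known + (a - 1) + (b - 1)  # deduction on top of retrieval

print(solve_with_memory("1+1"))  # answered by retrieval alone
print(solve_with_memory("1+2"))  # answered by retrieval plus deduction
```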

Indeed, even if you memorized every problem that might appear on an exam, memorizing 1+1=2 alone will not let you solve 1+2=3; if the rule behind 1+1=2 is not "understood", a mere change of number leaves you unable to cope.

How about "memorization of the rules"? For example, you can solve a quadratic equation by memorizing the rule of the formula for solving a quadratic equation, but can this be called "understanding"? The rules in this case are simple, but it is possible to memorize rules in combination with rules, and a meta-rule about how to "combine" them might also be memorable.

When students solve quadratic equations, they do not just apply the memorized formula; depending on the situation they also use the discriminant and factorization. This kind of knowledge seems to be acquired through habituation, by solving many problems, but does that habituation have anything to do with intelligence or understanding?

John von Neumann had this to say:

“Young man, in mathematics you don't understand things. You just get used to them.” ― John von Neumann

Personally, I think intelligence is the ability to execute a program. Memorized data by itself has no ability to execute anything. Abstraction from experience and knowledge might likewise mean having the ability to execute some program that uses that data.

Why can't a computer run a program in the way a human being runs one in his or her brain?

丸暗記は単なる過学習なのか [日本語]






A thought experiment about hypothesis-driven and data-driven approaches

In Japan, the dogma that "it is hard to do anything with data unless you set a hypothesis in advance, so you should start from a hypothesis" sometimes circulates on Twitter. At the same time, the opposite dogma, "the recent trend is data-driven, so we should generate hypotheses from data", is also rampant.

But if you google the terms hypothesis-driven or data-driven, you will find few formalizations. So let's do a little thought experiment with these terms.


Consider the claim "I found a pattern in the data". It seems to me that, given labels, a model fitted to them from features would embody a pattern. So is that "pattern" a hypothesis? Did the hypothesis exist a priori, or was it found afterwards?

This is a difficult question, because at the point where we have labels we are implicitly hypothesizing that "there must be a pattern that can explain the labels from the features at better than P%". However, "this model (pattern) will explain the data at better than P%" is a hypothesis that emerged later.

As in the image pasted above, the function f1 means that a hypothesis was "generated" from the data D1 that was there first. The function f2 "tests" the generated hypothesis on data different from the first. Think of the range [0, 1] as a normalized score. Ideally, D1 and D2 should be independent.
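The two functions can be made concrete: f1 fits a model (the generated hypothesis) to D1, and f2 scores that model on held-out D2, normalized to [0, 1]. A minimal sketch with a one-parameter model; the data points and the error-to-score mapping are entirely made up for illustration:

```python
# D1: data the hypothesis is generated from; D2: independent test data.
D1 = [(1, 2.1), (2, 3.9), (3, 6.2)]   # (x, y) pairs, roughly y = 2x
D2 = [(4, 8.1), (5, 9.8)]

def f1(data):
    # "Generate" a hypothesis from the data: fit y = w*x by least squares.
    w = sum(x * y for x, y in data) / sum(x * x for x, _ in data)
    return lambda x: w * x

def f2(hypothesis, data):
    # "Test" the hypothesis on different data; map the error into [0, 1].
    mse = sum((hypothesis(x) - y) ** 2 for x, y in data) / len(data)
    return 1 / (1 + mse)              # 1 = perfect fit, -> 0 as error grows

h = f1(D1)          # hypothesis generated from D1
score = f2(h, D2)   # tested on independent D2, score in [0, 1]
print(round(score, 3))
```

Nothing here says whether the *labels* in D1 came from a prior hypothesis; the code only captures the generate-then-test loop, which is the part the diagram describes.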

On a platform like Kaggle, you generate a model (a hypothesis, as a pattern) from the provided data D1 and show its generalization performance on the leaderboard test data D2. If you regard the model as a hypothesis, this is "data-driven", since you did not generate the data D1 yourself.

However, when the data D1 was generated, it was labeled on the basis of some prior hypothesis. In other words, the hypothesis is: "With these features, we can explain this label with a score of P% or better." In that sense, it could be called "hypothesis-driven".

Then we might say that the annotation work is hypothesis-driven, while the modeling work in a Kaggle competition is data-driven. But more fundamentally, the question is: "Why did you think the label would be predictable from those features?" The annotation work relies on the hypothesis that the label can be predicted. So where did that hypothesis come from?

If the hypothesis was "generated by someone looking at trends in the feature data", then it is data-driven. And the data used as features may or may not have been designed to be labeled that way.

When Alice posts a photo online titled "My Cute Cat", and the photo is later scraped and used for cat image classification, Alice did not think, "I'll share it to contribute to cat image classification (with a predefined hypothesis that this photo would enable cat image classification)."

On the other hand, you can also say, "The data collected here will be useful for this kind of hypothesis later, so let's design the system for it," or "This kind of data column will be needed for that hypothesis."

That is what I think. If someone says "this is the only way to do it", they may well be looking at only one side of things. Even this article is just a "hypothesis".

Do you think Twitter envisioned all the natural language processing tasks (i.e., hypotheses) before launching the service? If so, you are a complete believer in the hypothesis-driven approach.

Ultimately, everything is a physical phenomenon, and what it produces is data. Did God generate that data on the premise of some hypothesis? In the end, what we can observe is not hypotheses but data, I think. Or will the hypothesis itself affect the data being observed?


仮説駆動とデータ駆動に関する思考実験 [日本語]







Is it possible to build a model of the science fiction world?

As a person with a mental disorder, I have had certain unusual experiences.

For example, I might be walking down one street and, when I look back, find that it is a different street. The area behind the apartment I lived in became a shrine one day, a school another day, a forest another, a concrete tower another. I once wondered whether there was a yakiniku restaurant nearby, and suddenly a yakiniku restaurant appeared right in front of me.

After such experiences, it is not surprising to develop further delusions such as:

  • "Perhaps this world has multiple parallel worlds, and when something changes in your brain's perception, you are swapped with your counterpart in another world."
  • "When some kind of entropy in the brain is at its maximum, you have the possibility of moving to another world until the entropy goes down."

It sounds like a kind of sci-fi world. I think this delusion can be modeled to some extent as pure mathematics, so let me give it a try.


Time branches as  t_0 \rightarrow t_1 \rightarrow t_2 , and so on, following various possibilities from one point in time into the future. Normally time cannot be reversed, but given two paths  P_I, P_J , the question is: which tuples  (I, J)  are jumpable (or what is the jumpability condition)? Is it possible to jump in one direction but not in the opposite one? Unfortunately, this lies not only in the realm of mathematics; the physical laws of this science-fiction world must also be taken into account.

Also, if there are worlds  W_I = f(P_I)  that share the same path  P_I , with individual worlds  X_I(1), \dots, X_I(k) \in W_I , is  k  finite? And how many branches  n(I)  lead from one path to the next? If  W_I \in U , how large is  |U| ?
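Under the simplest extra assumption, mine, not part of the delusion itself, that every time step branches into a constant  n  worlds, the counting questions become concrete: after  d  steps there are  n^d  distinct paths, so the counts are finite at any finite depth but grow exponentially. A toy enumeration:

```python
from itertools import product

def paths(n, depth):
    # Each path is a sequence of branch choices, one choice per time step,
    # so the path set is {0..n-1}^depth.
    return list(product(range(n), repeat=depth))

# With n = 3 branches per step and depth d = 4:
P = paths(3, 4)
print(len(P))  # n**d = 3**4 = 81 distinct paths

# A "jumpable" relation is then just some subset of P x P; for example,
# allow jumps only between paths that share the same first branch choice:
def jumpable(I, J):
    return I[0] == J[0]
```

This relation happens to be symmetric; the one-directional jumps asked about above would correspond to a non-symmetric subset of P x P instead.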

In this way, it seems the science-fiction world can be mathematized. Of course, this would not be a mathematization that explains the real world; but if we can mathematize a totally fictitious world under fictitious physical laws, then there may be no guarantee that mathematics correctly explains our own physical laws either. I think it would be difficult to find a fitting model without repeated experiment and observation.

However, "algorithm" is a bit different stuff. We can prove something by creating it! (However, sometimes, we need to prove that we can't create it.)