Mathematical Representation of Knowledge

Author: Zhiming Ou

Publisher: 3265 Public Way

ISBN: 978-1-997036-03-6

Summary

Knowledge is usually understood as information that has been understood, organized into theory, and can be used for prediction. Information is organized or meaningful data. Knowledge often involves understanding relationships and the ability to apply the information; it is the result of mental activity. The most basic concept is data: when organized, data form useful information; when understood, information becomes a person’s knowledge. When a person knows how to predict something, he or she has wisdom.

A person receives many types of information through the sensory organs. Physically, this information can be described as the states of the observed object: size, weight, color, smell, density, temperature, phase-transition points, composition, etc. Physical signals are converted to numbers through measurement and then stored in binary digital form, for example as magnetic or electric states. We can use a feature vector to represent these macroscopic features, tensor notation to record the numbers, and operator notation to represent the actions taken on these data.
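
As a minimal sketch of this idea, the Python snippet below stores a few measured properties as a feature vector and applies a linear operator (a matrix) to it. The property names, the values, and the helper apply_operator are illustrative assumptions, not taken from the book.

```python
# Hypothetical measurements of one object; names and values are illustrative only.
features = {"mass_kg": 2.0, "volume_m3": 0.001, "temperature_K": 293.15}

# Feature vector: the measured numbers in a fixed order.
names = list(features)
x = [features[n] for n in names]

def apply_operator(matrix, vector):
    """Apply a linear operator, given as a matrix (list of rows), to a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

# A unit-conversion operator: kg -> g, m^3 -> litres, kelvin unchanged.
convert = [
    [1000.0, 0.0, 0.0],
    [0.0, 1000.0, 0.0],
    [0.0, 0.0, 1.0],
]
y = apply_operator(convert, x)
```

Here the operator is a rank-2 tensor acting on a rank-1 tensor; other actions on the data could be written as different matrices.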

At the social level, humans receive and process information in the form of images and sound. An image is usually stored as a grid of tiny picture elements called pixels: a photograph is divided into many small squares, and each square (pixel) stores color information. For color images, computers often use the RGB model, pixel = (r, g, b), where each color intensity ranges from 0 to 255. For example, (255, 0, 0) is pure red. Resolution matters: more pixels mean more detail and a larger file.
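
The pixel grid can be sketched in a few lines of Python. The tiny 2 x 3 image, the one-byte-per-channel size estimate, and the averaging grayscale helper to_gray are assumptions made for illustration only.

```python
# A 2 x 3 image as a grid (list of rows) of (r, g, b) pixels, each channel 0..255.
RED, GREEN, BLUE = (255, 0, 0), (0, 255, 0), (0, 0, 255)
image = [
    [RED, GREEN, BLUE],
    [BLUE, RED, GREEN],
]

height = len(image)
width = len(image[0])

# Uncompressed size, assuming one byte per channel and three channels per pixel.
raw_bytes = width * height * 3

def to_gray(pixel):
    """Average the three channels; a simple (not perceptual) grayscale value."""
    r, g, b = pixel
    return (r + g + b) // 3

gray = [[to_gray(p) for p in row] for row in image]
```

Doubling both width and height quadruples raw_bytes, which is the sense in which more pixels give a larger file.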

Sound is a continuous wave of air pressure. A vibration can be described by its frequency and amplitude, (f, A), since the wavelength is then determined by the wave speed in the medium. A continuous vibration can be converted into discrete data through sampling. The sampling rate is the number of measurements per unit time (usually per second). For a vibration whose frequencies f lie in the range (0, F), if the sampling rate is higher than 2F, then the continuous wave is uniquely determined by its samples and can be written as a Fourier series composed of sin(2πft) and cos(2πft), whose coefficients are the amplitudes. When the frequency is known, an electronic device can record the amplitudes as a sequence A1, A2, …, Am, where m is the number of measurements taken per second.
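
A small Python sketch of sampling follows. It assumes the standard sampling criterion (a rate above twice the highest frequency) and uses made-up values for f, A, and the rate; the helper fourier_coefficients is hypothetical and recovers the cosine and sine coefficients at one known frequency from one second of samples.

```python
import math

f = 5.0      # signal frequency in Hz (illustrative)
A = 2.0      # amplitude (illustrative)
rate = 50    # samples per second, chosen above 2 * f

# The recorded sequence A1, A2, ..., Am for one second (m = rate).
samples = [A * math.sin(2 * math.pi * f * k / rate) for k in range(rate)]

def fourier_coefficients(samples, freq, rate):
    """Return the cos and sin coefficients (a, b) at freq from the samples."""
    n = len(samples)
    a = 2 / n * sum(s * math.cos(2 * math.pi * freq * k / rate)
                    for k, s in enumerate(samples))
    b = 2 / n * sum(s * math.sin(2 * math.pi * freq * k / rate)
                    for k, s in enumerate(samples))
    return a, b

a, b = fourier_coefficients(samples, f, rate)
amplitude = math.hypot(a, b)   # close to A when the rate is high enough
```

The one-second window holds a whole number of cycles here, which is why the amplitude is recovered almost exactly; a general signal would need a full discrete Fourier transform.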

Biological information is encoded in genes, which are segments of a DNA molecule (like beads on a string). At the molecular level, we need to record a molecule’s shape (geometry and bonding) and structure (how its components are linked), and understand its chemical properties: why and how molecules change and interact with other molecules, and their nutritional and medical functions for humans. No person can keep track of the 290 million registered chemical compounds; perhaps only AI can analyze a protein in a short time.

For written text, modern language models split sentences into tokens, and each token is expressed numerically. These tokens can be combined into sets of meaningful sequences according to the relationships between tokens and the grammar. Using the rules of logical reasoning, which can be realized by set operations, AI can infer important conclusions.
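
The two steps described here, numeric tokens and reasoning by set operations, can be sketched as follows. The whitespace tokenizer is a deliberate simplification (real language models typically use subword tokenizers), and the example sets are invented for illustration.

```python
sentence = "all birds have wings"

# Whitespace tokenizer: a simplification of what real language models do.
tokens = sentence.lower().split()

# Give each distinct token an integer id in order of first appearance.
vocab = {}
for tok in tokens:
    vocab.setdefault(tok, len(vocab))
ids = [vocab[t] for t in tokens]

# Reasoning with set operations: "all birds have wings" as a subset relation.
birds = {"sparrow", "eagle"}
winged = {"sparrow", "eagle", "bat"}
all_birds_have_wings = birds <= winged   # subset test

# Syllogism: a sparrow is a bird, and birds are a subset of winged things,
# so a sparrow has wings.
sparrow_has_wings = "sparrow" in birds and all_birds_have_wings
```

The converse does not follow: "bat" is in winged but not in birds, so set membership keeps the direction of the inference explicit.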

Each type of information can be expressed as a set of codes: numbers and symbols. These codes follow a certain statistical distribution. The pattern of the distribution must be revealed through training, which can be done by machine learning. The statistical inferences can be carried out by artificial intelligence, that is, by executable algorithms. Once the algorithm is given, the codes can be generated again by AI. These are the two purposes of this project: to express information mathematically by sets of numbers and symbols, and then to design the algorithms that operate on these sets.
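
As a very small illustration of revealing a distribution from data, the sketch below estimates how often each code occurs in a sequence. The sequence is made up, and counting frequencies is only the simplest possible stand-in for training, not the method used in the book.

```python
from collections import Counter

codes = "abracadabra"   # illustrative sequence of codes

# Count each code, then normalize the counts into an empirical distribution.
counts = Counter(codes)
total = len(codes)
distribution = {c: n / total for c, n in counts.items()}

# The most probable code under the estimated distribution.
most_likely = max(distribution, key=distribution.get)
```

A learned model generalizes this idea by estimating probabilities that depend on context rather than on single codes alone.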

 

Contents

Part 1 Tensors and Operators

Part 2 Representation of Physical Information

Part 3 Biochemical Information

Part 4 Text Information

Part 5 Understanding through Training

Part 6 Collection of Algorithms