Author: eric2i

  • Notes of “The Age of AI has begun”

    On March 21, 2023, Bill Gates published “The Age of AI has begun,” a few months after the release of ChatGPT, arguably the most popular artificial-intelligence product to date.

    As the former CEO of Microsoft, Bill Gates still plays an important role in shaping some of this super-corporation's major businesses, and since Microsoft and OpenAI are at the center of the current AI boom, it is more than worthwhile to learn how these companies intend to use the technologies they possess and how our lives may be affected by them.

    According to Mr. Gates’s article, contemporary artificial intelligence (AI) is one of only two technologies he has seen so far that struck him as revolutionary; the other is the graphical user interface (GUI). He believes that AI can help “reduce some of the world’s worst inequities” and benefit human beings in multiple ways.

    Productivity Enhancement

    • AI models can change the way we interact with computing devices: instead of clicking and tapping, we simply ask
    • A personal agent: works across all your devices to help you schedule, communicate, and shop online
    • A company-wide agent: provides insights about the company and makes employees more productive

    Health

    • Help health-care workers make the most of their time by taking care of filing insurance claims, dealing with paperwork, …
    • AIs give patients the ability to do basic triage, get advice about how to deal with health problems, and decide whether they need to seek treatment.
    • Accelerate the rate of medical breakthroughs
    • Predict side effects and figure out dosing levels

    Education

    • Measure your understanding, notice when you’re losing interest, and understand what kind of motivation you respond to. It will give immediate feedback.
    • Assist teachers and administrators, including assessing a student’s understanding of a subject and giving advice on career planning.
    • Need to be trained on diverse data sets so they are unbiased and reflect the different cultures where they’ll be used.

    Risks and Problems with AI

    • Not good at understanding the context of a human’s request
    • Struggle with abstract reasoning
    • AI can be used for good purposes or malign ones, and it may run out of control

    Frontiers

    • Innovative chips will allow you to run an AI on your own device, rather than in the cloud
    • Three principles should guide the public conversation about AI technology:
      • Guard against the risks and spread the benefits to as many people as possible
      • Ensure that AIs are used to reduce inequity
      • Whatever limitations AI has today will be gone before we know it

    Now, new tools like Copilot are embedded in some of Microsoft's most successful software and are enabling programmers and Office users to be much more efficient. It seems that the age of AI has begun; yet whether this age will be benign for everyone still requires further experimentation and investigation.

  • Lexical Analysis

    The following contents are my reading notes for Compilers (the Dragon Book), Chapter 3. An implementation of the lexical analyzer is provided in the last section.

    Contents:

    • The Role of the Lexical Analyzer
    • Input Buffering
    • Specification of Tokens
    • Recognition of Tokens
    • LEX
    • Finite Automata
    • My Implementation

    The Role of the Lexical Analyzer

    The very first layer of the front end of a compiler’s architecture

    Lexical Analysis Versus Parsing

    The reasons why the analysis portion of a compiler is normally separated into two phases, lexical analysis and parsing:

    • (Most importantly) Simplify the design of the compiler
    • Improve compiler efficiency
    • Enhance compiler portability

    Tokens, Patterns, and Lexemes

    • TOKEN: a pair consisting of a token name and an optional attribute value
    • PATTERN: a description of the form that the lexemes of a token may take
    • LEXEME: a sequence of characters in the source program that matches the pattern for a token and is identified by the lexical analyzer as an instance of that token

    examples: see page 112.

    Attributes for Token

    • WHY WE NEED ATTRIBUTES: Since more than one lexeme can match a single pattern (for example, both 0 and 1 match number), an attribute is needed to distinguish the different concrete lexemes that yield the same token
    • IMPORTANT EXAMPLE: The most important example is the token id, which we associate with lots of information. Normally the information about an id, such as {lexemes, type, location, …}, is kept in the symbol table. Thus the proper attribute value for an id is a pointer to the symbol-table entry for that identifier

    Lexical Errors

    • Errors like the typo ”fi”:
      • This is not the kind of error we are discussing here!
      • Because the lexical analyzer cannot tell whether fi is a misspelled keyword if or an undeclared function identifier: fi is a valid lexeme for the token id. Thus, the lexical analyzer must return the token id to the parser and let some other phase of the compiler handle such errors.
    • Errors where the lexical analyzer is unable to proceed because none of the patterns for tokens matches any prefix of the remaining input
      • These are the errors we are discussing here, and there are several error-recovery actions:
      • delete successive characters
      • delete one character
      • insert a missing character
      • replace a character
      • transpose two adjacent characters

    Input Buffering

    • Buffer-pair technique: the input is read into two alternating buffers, |——BUFFER_I———|——BUFFER_II———|, and two pointers are maintained:
      • Pointer lexemeBegin, marks the beginning of the current lexeme
      • Pointer forward scans ahead until a pattern match is found

      With this technique, for each character read we have to make two tests: have we reached the end of a buffer, and what character was read?

    • Improvement, SENTINELS: combine the two tests by inserting an extra eof (sentinel) character at the end of each buffer:
      switch (*forward++) {
          case eof:
              if (forward is at end of first buffer) {
                  reload second buffer;
                  forward = beginning of second buffer;
              }
              else if (forward is at end of second buffer) {
                  reload first buffer;
                  forward = beginning of first buffer;
              }
              else /* eof within a buffer marks the end of the input source */
                  terminate lexical analysis;
              break;
          /* cases for the other characters */
      }
      

    Specification of Tokens

    A regular expression (regex) is the key tool for specifying the patterns we actually need for tokens.

    String and languages

    • STRING: a string over an alphabet $\Sigma$ is a finite sequence of symbols drawn from that alphabet.
    • LANGUAGE: a language is any countable set of strings over some fixed alphabet.

    Operations on Languages

    • Union $L\cup M = \{s \mid s \in L \vee s \in M\}$
    • Concatenation $LM = \{ st \mid s \in L \wedge t \in M\}$
    • Kleene closure $L^* = \bigcup_{i \ge 0} L^i$ (where $L^0 = \{\epsilon\}$)
    • Positive closure $L^+ = \bigcup_{i \ge 1} L^i$

    Notice that, in general, $LM \neq ML$.

    Regular Expression

    • BASIS: $\epsilon$ and each symbol $a \in \Sigma$ are regular expressions
    • INDUCTION: if $r$ and $s$ are regular expressions, then so are $(r)|(s)$, $(r)(s)$, and $(r)^*$

    Extension to Regex

    • one or more instances: +
    • zero or one instance: ?
    • character classes: [0123] or [0-3]

    For example, these extensions let us write the familiar definitions compactly: digit → [0-9], digits → digit+, and id → letter_ ( letter_ | digit )*.

    Recognition of Tokens

    Transition Diagram

    • the diagram has a collection of nodes or circles, called states
    • Important Conventions of diagram:
      • certain states are said to be accepting or final states
      • if it is necessary to retract the forward pointer one position, we additionally place a * near that accepting state
      • One state is designated the start state or initial state

    Reserved Words and Identifier

    Reserved words and identifiers look very similar, so there are two ways to distinguish them (a small sketch of the first approach follows the list):

    • place reserved words in the symbol table beforehand
    • prioritize diagrams for each keyword so that reserved-word tokens are recognized in preference to id
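
    Below is a minimal sketch (my own, not the book's code) of the first approach: preload the reserved words into a table and look every identifier-like lexeme up before assigning its token name. The Tag names here are hypothetical.

      #include <string>
      #include <unordered_map>

      enum class Tag { IF, ELSE, WHILE, ID };

      // Reserved words installed "in the symbol table" ahead of time.
      static const std::unordered_map<std::string, Tag> reserved = {
          {"if", Tag::IF}, {"else", Tag::ELSE}, {"while", Tag::WHILE}
      };

      // Decide the token name for a lexeme that matches the id pattern.
      Tag classify(const std::string& lexeme) {
          auto it = reserved.find(lexeme);
          return it != reserved.end() ? it->second : Tag::ID;   // not reserved: plain id
      }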

    Architecture of a Transition-Diagram-Based Lexical Analyzer

    Each such diagram can be implemented as a C++ function that returns a structure containing a token and an attribute value. The function is basically a switch statement or multiway branch that determines the next state.

    Examples: see page 135.
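
    As a concrete illustration, here is a hedged C++ sketch (mine, not the book's) of one transition diagram written as a function: the diagram for the relational operators <, <=, <>, =, > and >=. The forward index and the RelOp names are simplified stand-ins for the analyzer's real machinery.

      #include <optional>
      #include <string>

      enum class RelOp { LT, LE, NE, EQ, GT, GE };

      // forward plays the role of the forward pointer; on success it ends up
      // just past the recognized lexeme.
      std::optional<RelOp> relop(const std::string& input, size_t& forward) {
          int state = 0;                                       // start state
          while (true) {
              char c = forward < input.size() ? input[forward] : '\0';
              switch (state) {
                  case 0:
                      if (c == '<')      { ++forward; state = 1; }
                      else if (c == '=') { ++forward; return RelOp::EQ; }
                      else if (c == '>') { ++forward; state = 2; }
                      else               return std::nullopt;  // this diagram fails
                      break;
                  case 1:  // saw '<'
                      if (c == '=') { ++forward; return RelOp::LE; }
                      if (c == '>') { ++forward; return RelOp::NE; }
                      return RelOp::LT;   // *-state: the lookahead is not consumed
                  case 2:  // saw '>'
                      if (c == '=') { ++forward; return RelOp::GE; }
                      return RelOp::GT;   // *-state: the lookahead is not consumed
              }
          }
      }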

    How do these diagrams work in the big picture of the analyzer?

    • try them one by one, but prioritize the keyword diagrams
    • “in parallel”, take the longest prefix of the input that matches any pattern
    • combining diagrams altogether, take the longest prefix of the input that matches any pattern (this combining process can be hard)

    LEX

    Use of Lex (Pipeline)

    Lex source program (lex.l) → [Lex Compile] → lex.yy.c

    lex.yy.c → [C compiler] → a.out

    Input stream → [a.out] → Sequence of tokens

    Structure of Lex Programs

    There are basically three sections separated by %% in lex.l file:

    • declarations
    • translation rules
    • auxiliary functions

    Sections I and III (the declarations and the auxiliary functions) are copied directly into lex.yy.c.

    Conflict Resolution in Lex

    • always prefers a longer prefix to a shorter prefix
    • If the longest possible prefix matches two or more patterns, prefer the pattern listed first in the Lex Program

    The lookahead operator /

    What follows / is an additional pattern that must be matched before we can decide that the token in question has been seen; however, what matches this second pattern is not part of the lexeme. A classic example is Fortran's IF, where a pattern such as IF / \( .* \) {letter} treats IF as a keyword only when it is followed by a parenthesized expression and a letter.

    Finite Automata

    Automata are essentially graphs, like transition diagrams, with a few differences:

    • finite automata are recognizers, simply say “yes” or “no” about each possible input string
    • Finite automata come in two flavors
      • NFA
      • DFA

    NFA

    consists of:

    • A finite SET of states S
    • A set of input symbols $\Sigma$
    • A transition function that gives, for each state and each symbol in $\Sigma \cup \{\epsilon\}$, a set of next states
    • A state $s_0$ from S that is distinguished as the start state
    • A set of states $F$, a subset of S, that is distinguished as the accepting states.

    This graph is very much like a transition graph except:

    • one symbol can label edges from one state to several states
    • an edge may be labeled by $\epsilon$, the empty string, in addition to (or instead of) symbols from $\Sigma$

    Transition Tables

    Instead of representing finite automata as a graph, we can use a transition table.

    Example, see Page 149

    Acceptance of Input String by Automata

    • An NFA accepts a string if and only if there exists some path labeled by that string that leads from the start state to an accepting state.
    • $L(A)$ stands for the language accepted by automaton $A$

    DFA

    A special case of an NFA in which:

    • there are no moves on input $\epsilon$, and
    • for each state s and input symbol a, there is exactly one edge out of s labeled a.
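
    To make this concrete, here is a minimal C++ sketch (my own, not from the book) that simulates a DFA using a transition table; it accepts the language $(a|b)^*abb$, the chapter's recurring example.

      #include <string>

      // table[state][symbol]: symbol 0 is 'a', symbol 1 is 'b'.
      bool acceptsABB(const std::string& s) {
          static const int table[4][2] = {
              {1, 0},   // state 0: start
              {1, 2},   // state 1: input so far ends in a
              {1, 3},   // state 2: ... ends in ab
              {1, 0}    // state 3: ... ends in abb (accepting)
          };
          int state = 0;                                   // start state
          for (char c : s) {
              if (c != 'a' && c != 'b') return false;      // symbol not in the alphabet
              state = table[state][c == 'b' ? 1 : 0];
          }
          return state == 3;    // accept iff the path ends in the accepting state
      }
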
  • More about Matrix – Differentiation

    It was not until recently that I realized how important matrix differentiation is when using matrix representations for computation. After searching through some relevant materials and lecture notes, I found more useful formulas than I expected. Here I list some of them, which are pretty handy for me and may possibly be helpful for you one day.

    Definition

    The notation \(\frac{\partial y}{\partial x}\) =

    \[\begin{bmatrix}\frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\\vdots & \vdots & \vdots \\\frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n} \\\end{bmatrix}\]

    denotes the $m \times n$ matrix of first-order partial derivatives of the transformation from $x$ to $y$. Such a matrix is called the Jacobian matrix of the transformation $\varphi$, where $y=\varphi(x)$, $y$ is an $m\times1$ vector, and $x$ is an $n\times1$ vector.

    Proposition

    (1) Let $y=Ax$ and $A$ does not depend on $x$, then:

    \[\frac{\partial y}{\partial x}=A\]

    (2) Let  $y=Ax$ and $x$ be a function of $z$ and $A$ does not depend on $z$, then:

    \[\frac{\partial y}{\partial z}=A\frac{\partial x}{\partial z}\]

    (3) For the scalar $\alpha = y^TAx$, where $A$ is independent of $x$ and $y$:

    \[\frac{\partial \alpha}{\partial x}=y^TA\]

    \[\frac{\partial \alpha}{\partial y}=x^TA^T\]

    (4) For the scalar $\alpha = x^TAx$, where $A$ is independent of $x$:

    \[\frac{\partial \alpha}{\partial x}=x^T(A+A^T)\]

    (5) Based on (4), if $A$ is additionally a symmetric matrix, then:

    \[\frac{\partial \alpha}{\partial x}=2x^TA\]

    (6) For the scalar $\alpha = y^Tx$, where $x$ and $y$ are functions of $z$:

    \[\frac{\partial \alpha}{\partial z}= x^T\frac{\partial y}{\partial z} + y^T\frac{\partial x}{\partial z}\]

    (7) For the scalar $\alpha = x^Tx$, where $x$ is a function of $z$:

    \[\frac{\partial \alpha}{\partial z}=2x^T\frac{\partial x}{\partial z}\]

    (8) Let the scalar $\alpha = y^TAx$, where $x$ and $y$ are functions of $z$ and $A$ does not depend on $z$; then:

    \[\frac{\partial \alpha}{\partial z} = x^TA^T\frac{\partial y}{\partial z} + y^TA\frac{\partial x}{\partial z}\]

    (9) Let the scalar $\alpha = x^TAx$, where $x$ is a function of $z$ and $A$ does not depend on $z$; then:

    \[\frac{\partial \alpha}{\partial z} = x^T(A+A^T)\frac{\partial x}{\partial z}\]

    (10) Based on (9), if $A$ is additionally a symmetric matrix, then:

    \[\frac{\partial \alpha}{\partial z} = 2x^TA\frac{\partial x}{\partial z}\]
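
    As a quick check of how these rules compose (my own example, not part of the original notes), take the least-squares objective $\alpha = (Ax-b)^T(Ax-b)$ and let $r = Ax-b$, so $\alpha = r^Tr$ with $r$ a function of $x$. Applying (7) and then (1) (the constant shift $b$ does not change the derivative):

    \[\frac{\partial \alpha}{\partial x} = 2r^T\frac{\partial r}{\partial x} = 2(Ax-b)^TA\]

    Setting this derivative to zero gives the normal equations $A^TAx = A^Tb$.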

    Definition

    \[\frac{\partial A}{\partial \alpha} = \begin{bmatrix}\frac{\partial a_{11}}{\partial \alpha} & \frac{\partial a_{12}}{\partial \alpha} & \cdots & \frac{\partial a_{1n}}{\partial \alpha} \\\vdots & \vdots & \vdots \\\frac{\partial a_{m1}}{\partial \alpha} & \frac{\partial a_{m2}}{\partial \alpha} & \cdots & \frac{\partial a_{mn}}{\partial \alpha} \\\end{bmatrix}\]

    Proposition

    Let $A$ be a nonsingular $m\times m$ matrix whose elements are functions of the scalar parameter $\alpha$; then (by differentiating $AA^{-1}=I$ with respect to $\alpha$):

    \[\frac{\partial A^{-1}}{\partial \alpha} = -A^{-1}\frac{\partial A}{\partial \alpha}A^{-1}\]

     

    Summary

    This article lists some useful matrix differentiation formulas that helped me understand the least-squares approximation of linear systems. Everything here is still within the scope of linear algebra and calculus.

  • Understanding Classical Methods of Image Interpolation using Linear Algebra based on Numerical Analysis

    This semester I am taking two courses related to images and computer vision, and I have started to better appreciate that some amazing applications require only the basic mathematical tools we learned in our first two years of college. Although many open-source projects now let us do a lot of image analysis without a deep understanding of the mathematics underneath, it is still fruitful and inspiring to suddenly connect the dots we once collected.

    Here, I want to introduce one such example: image interpolation.


    What is image interpolation and why is it important?

    It is convenient to zoom an image (say, transform a \(500\times500\) image into a \(1000\times1000\) one) thanks to digital devices. But when an image is zoomed up, extra pixels are actually inserted into the original one, and deciding how to color those extra pixels is a critical task called image interpolation.


    A naive method

    A reasonable image is locally continuous; that is, we expect neighboring pixels to have similar colors. Thus, a naive way to fill an extra pixel is to copy the color from the nearest original pixel. Notice that one problem remains undefined: how do we find the nearest pixel?

    Fig1

    Let’s assume we have two pixel coordinate systems (see Fig1): one, \(s\), for the original image, and another, \(S\), for the zoomed image. The image is scaled by a factor of \(k\).

    \(\forall (u, v) \in S\), the corresponding position in \(s\) is \((\frac{u}{k}, \frac{v}{k})\), and the four candidate neighboring pixels in \(s\) are \((i, j), (i, j+1), (i+1, j), (i+1,j+1)\), where \(i = \lfloor\frac{u}{k}\rfloor\) and \(j = \lfloor\frac{v}{k}\rfloor\). Define \(\Delta = (\Delta_i, \Delta_j) = (\frac{u}{k} - i, \frac{v}{k} - j)\).

    Finally, the distance we are discussing here is the Euclidean distance. Now we can find the nearest pixel to the point of interest (POI).
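
    Here is a minimal C++ sketch (mine, with hypothetical names) of the naive method: for each target pixel \((u, v)\) of \(S\), round \((\frac{u}{k}, \frac{v}{k})\) to pick the nearest pixel of \(s\). A grayscale image stored row-major as src[i][j] is assumed, with i the row index and j the column index.

      #include <algorithm>
      #include <cmath>
      #include <vector>

      using Image = std::vector<std::vector<double>>;   // src[i][j], i = row, j = column

      double nearestAt(const Image& src, int u, int v, double k) {
          int i = static_cast<int>(std::lround(u / k));            // nearest row in s
          int j = static_cast<int>(std::lround(v / k));            // nearest column in s
          i = std::min(i, static_cast<int>(src.size()) - 1);       // stay inside the image
          j = std::min(j, static_cast<int>(src[0].size()) - 1);
          return src[i][j];
      }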

    An improvement

    Since we know how close each candidate pixel is to the POI, why not use a weighted average to get a better estimate?

    If so, how do we select the weights wisely? Intuitively, we can use the areas of the four sub-rectangles as weights (see Fig2): they require fewer computations than Euclidean distances, and the weights (the total area) conveniently sum up to 1.

    Fig2

    Bilinear Image Interpolation

    In other words, the technique we obtain from the improvement above is:

    $$f(i+\Delta_i, j+\Delta_j) = \sum_{i, j} f(i,j)S_{i,j}$$

    And it also has a fancy name – bilinear image interpolation.

    Mathematical Foundation Behind

    The idea of the weighted average is natural and intuitive, yet it also has a solid mathematical foundation in numerical analysis: interpolation.

    Considering one dimension first, we have multiple ways to find an interpolating function given a set of points. For instance, if we are given two points \((x_i, f_i)\) and \((x_{i+1}, f_{i+1})\), then a linear polynomial lets us estimate \(f(x)\) for all \(x \in [x_i, x_{i+1}]\). The Lagrange interpolation form can be written as:

    $$f(x) = f_{i}\frac{x-x_{i+1}}{(x_{i} - x_{i+1})} + f_{i+1}\frac{x-x_{i}}{(x_{i+1} - x_{i})}$$

    In two dimensions, we could use the same trick three times to estimate our POI (See Fig3).

    1. estimate \( f_{i, j+\Delta_{j}} \) based on \(f_{i, j} , f_{i,j+1}\)
    2. estimate \(f_{i+1, j+\Delta_j}\) based on \(f_{i+1, j} , f_{i+1,j+1}\)
    3. estimate \(f_{i+\Delta_i, j+\Delta_j}\) based on \(f_{i, j+\Delta_j} , f_{i+1,j+\Delta_j}\)
    Fig3

    Consider using linear algebra form to represent the Lagrange interpolation process steps above:

    $$f_{i, j+\Delta_j} =\begin{bmatrix}f_{i, j} & f_{i,j+1}\end{bmatrix}\begin{bmatrix}1-\Delta_j \\ \Delta_j\end{bmatrix}$$

    $$f_{i+1, j+\Delta_j} =\begin{bmatrix}f_{i+1, j} & f_{i+1,j+1}\end{bmatrix}\begin{bmatrix}1-\Delta_j \\ \Delta_j\end{bmatrix}$$

    $$f_{i+\Delta_i, j+\Delta_j} =\begin{bmatrix}1-\Delta_i & \Delta_i\end{bmatrix}\begin{bmatrix}f_{i, j+\Delta_j} \\ f_{i+1,j+\Delta_j}\end{bmatrix}$$

    To sum up, the pixel value \(f_{i+\Delta_i, j+\Delta_j}\) can be written as:

    \[f_{i+\Delta_i, j+\Delta_j} = \begin{bmatrix}1-\Delta_i & \Delta_i\end{bmatrix}\begin{bmatrix}f_{i,j} & f_{i,j+1} \\ f_{i+1, j} & f_{i+1, j+1} \end{bmatrix}\begin{bmatrix}1-\Delta_j \\ \Delta_j\end{bmatrix}\]

    Notice how neatly we can arrange those four pixels’ intensities in the middle matrix (call it \(F\)), exactly as they sit in the \(s\) coordinate system (see Fig4).

    Fig4

    Now it is relatively easy to see that:

    \[f_{i+\Delta_i, j+\Delta_j}=(1-\Delta_i)(1-\Delta_j)\cdot f_{i, j}+(1-\Delta_{i})\Delta_{j}\cdot f_{i, j+1}+\Delta_{i}(1-\Delta_{j})\cdot f_{i+1,j}+\Delta_{i}\Delta_{j}\cdot f_{i+1, j+1}\\=\sum_{i,j}S_{i,j}f(i,j)\]

    which proves our intuitive idea at the beginning.
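
    Below is a small C++ sketch (mine, with hypothetical names) of this bilinear rule, reusing the same row-major grayscale layout as the earlier sketch.

      #include <algorithm>
      #include <cmath>
      #include <vector>

      using Image = std::vector<std::vector<double>>;

      double bilinearAt(const Image& src, double u, double v, double k) {
          const double x = u / k, y = v / k;              // position in the s grid
          const int i = static_cast<int>(std::floor(x));
          const int j = static_cast<int>(std::floor(y));
          const double di = x - i, dj = y - j;            // fractional offsets Delta_i, Delta_j

          // Clamp so the 2x2 neighborhood stays inside the image.
          const int i1 = std::min(i + 1, static_cast<int>(src.size()) - 1);
          const int j1 = std::min(j + 1, static_cast<int>(src[0].size()) - 1);

          return (1 - di) * (1 - dj) * src[i][j]
               + (1 - di) * dj       * src[i][j1]
               + di       * (1 - dj) * src[i1][j]
               + di       * dj       * src[i1][j1];
      }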


    Bicubic Image Interpolation

    Fig5

    Bicubic image interpolation considers more neighboring pixels (a \(4\times 4\) neighborhood) when estimating the POI. Looking at the matrix form in the bilinear interpolation section above, without loss of generality, the general model we are adopting is actually:

    $$f_{i+\Delta_i, j+\Delta_j}=W(\Delta_i)^{T}FW(\Delta_j)$$

    In one dimension, consider the Lagrange interpolation function based on a cubic polynomial through 4 points:

    $$f(x)=\sum_{k=i-1}^{i+2}f_{k}\frac{\prod_{l=i-1,\, l\neq k}^{i+2}(x-x_{l})}{\prod_{l=i-1,\, l\neq k}^{i+2}(x_{k}-x_{l})}$$

    Taking \(x=i+\Delta_i,\; x_{k} = k\) and simplifying the right-hand side of the equation above, we can see that the weight vector \(W(\Delta_i)\) should be:

    \[\begin{bmatrix}-\frac{1}{6}\Delta_i(\Delta_i-1)(\Delta_i-2) \\ \frac{1}{2}(\Delta_i+1)(\Delta_i-1)(\Delta_i-2)\\-\frac{1}{2}(\Delta_i+1)\Delta_i(\Delta_i-2) \\ \frac{1}{6}(\Delta_i+1)\Delta_i(\Delta_i-1)\end{bmatrix}\]

    Now, we have a clear and neat equation for estimating POI using the bicubic method.
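
    And here is a hedged C++ sketch (mine) of the bicubic rule: compute the cubic Lagrange weight vector above for each axis and combine the \(4\times4\) neighborhood \(F\) as \(W(\Delta_i)^{T}FW(\Delta_j)\).

      #include <array>

      // Cubic Lagrange weights for the four samples at offsets -1, 0, 1, 2,
      // where d is the fractional offset of the POI inside [0, 1].
      std::array<double, 4> cubicWeights(double d) {
          std::array<double, 4> w = {
              -d * (d - 1.0) * (d - 2.0) / 6.0,           // sample at -1
               (d + 1.0) * (d - 1.0) * (d - 2.0) / 2.0,   // sample at  0
              -(d + 1.0) * d * (d - 2.0) / 2.0,           // sample at  1
               (d + 1.0) * d * (d - 1.0) / 6.0            // sample at  2
          };
          return w;
      }

      // f = W(di)^T F W(dj) for a 4x4 neighborhood F (row index first).
      double bicubicAt(const std::array<std::array<double, 4>, 4>& F,
                       double di, double dj) {
          const auto wi = cubicWeights(di);
          const auto wj = cubicWeights(dj);
          double value = 0.0;
          for (int r = 0; r < 4; ++r)
              for (int c = 0; c < 4; ++c)
                  value += wi[r] * F[r][c] * wj[c];
          return value;
      }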

    Summary

    This article shows a new way to understand image interpolation and the mathematics from numerical analysis behind the scenes. A unified matrix form of pixel interpolation is presented, along with a feasible algorithm that uses the Lagrange interpolation formula to obtain the weights for neighboring pixels.