June 22, 2020 7 min to read

當一回語言律師

目錄： C/C++ Macro 宏系列 | CWKSC’s blog | 博客

上一篇說到宏的 # 和 ## 運算子、一些瑣碎規則和預先定義的宏

有沒有想過，這些規則是從哪裏來

我說的？老師說的？書本或者網站說的？

他們就一定對嗎？像是我上一篇說的那堆規則，說不定是我在騙你（？

當然他們大概都是對的

但有時候我們要試試追根究底

來試試當一回語言律師吧！

▌C/C++ 標准文檔

標準文檔要收費，但草案免費

從 Stack Overflow 中，找到一些免費草案的來源鏈接

草稿版本與標準文檔非常接近，所以可放心食用

ISO/IEC 9899:2018 (C18)

ISO/IEC 14882:2017 (C++17)

雖然這些標準比較新，但是我會偏向看 doc.imzlp.me 的 C11 和 C++14

因爲有左側的導航欄，如果要拿來尋找或者學習也會方便一些

下面會用 ISO/IEC 14882:2017 (C++17) 這個作為基礎

▌預處理指令 Preprocessing directives

根據 § 19/1 ：

A preprocessing directive consists of a sequence of preprocessing tokens that satisfies the following constraints: The first token in the sequence is a # preprocessing token that (at the start of translation phase 4) is either the first character in the source file (optionally after white space containing no new-line characters) or that follows white space containing at least one new-line character. The last token in the sequence is the first new-line character that follows the first token in the sequence.¹⁴⁴ A new-line character ends the preprocessing directive even if it occurs within what would otherwise be an invocation of a function-like macro.

144) Thus, preprocessing directives are commonly called “lines”. These “lines” have no other syntactic significance, as all white space is equivalent except in certain situations during preprocessing (see the # character string literal creation operator in 19.3.2, for example).

翻譯：

預處理指令由滿足以下約束的一系列預處理 Token 組成：該序列中的第一個 Token 是 # 預處理 Token，（在翻譯階段4的開始）它是源文件中的第一個字符（可選在不包含換行符的空白之後）或緊隨包含至少一個換行符的空格的空格。序列中的最後一個 Token 是序列中第一個 Token 之後的第一個換行字符。¹⁴⁴ 換行符結束預處理指令，即使該指令出現在本應調用類似函數的宏中。

144) 因此，預處理指令通常稱為“行”。這些“行”沒有其他語法意義，因為在預處理期間的某些情況下，所有空格都是等效的（例如，請參見19.3.2中的＃字符串文字創建運算符）。

▌簡單來講：

預處理指令是以一序列的 Token 組成
序列的第一個 Token 需要是 #
# 可以是源文件的第一個字符，或者不包含換行的空白之後，這意味

#define foo 42
    #define foo2 43 //可以

第二行前面有空格，這是允許的

序列中的最後一個 Token ，是第一個 Token 之後的第一個換行字符
換行符結束預處理指令

▌另一項重要特性在 § 19/6 ：

The preprocessing tokens within a preprocessing directive are not subject to macro expansion unless otherwise stated.

[Example: In:
#define EMPTY
EMPTY # include <file.h>
the sequence of preprocessing tokens on the second line is not a preprocessing directive, because it does not begin with a # at the start of translation phase 4, even though it will do so after the macro EMPTY has been replaced. — end example ]

翻譯：

除非另有說明，否則預處理指令中的預處理 Token 不會進行宏擴展。

#define EMPTY
EMPTY # include <file.h>

第二行的預處理 Token 序列不是預處理指令，因為它在翻譯階段 4 開始時不是以 # 開頭，即使它在替換了 EMPTY 宏之後也是這樣。

它的例子和文字很清楚，不解釋了

這個特性讓我不能 #define 一個 #define ，不然圖靈完全早做出來了ヽ(#`Д´)ﾉ

▌`#` 字符串化運算子

§ 19.3.2 The # operator:

1 Each # preprocessing token in the replacement list for a function-like macro shall be followed by a parameter as the next preprocessing token in the replacement list.

2 A character string literal is a string-literal with no prefix. If, in the replacement list, a parameter is immediately preceded by a # preprocessing token, both are replaced by a single character string literal preprocessing token that contains the spelling of the preprocessing token sequence for the corresponding argument. Each occurrence of white space between the argument’s preprocessing tokens becomes a single space character in the character string literal. White space before the first preprocessing token and after the last preprocessing token comprising the argument is deleted. Otherwise, the original spelling of each preprocessing token in the argument is retained in the character string literal, except for special handling for producing the spelling of string literals and character literals: a \ character is inserted before each “ and \ character of a character literal or string literal (including the delimiting “ characters). If the replacement that results is not a valid character string literal, the behavior is undefined. The character string literal corresponding to an empty argument is “”. The order of evaluation of # and ## operators is unspecified.

翻譯：

1 對於類似函數的宏，替換列表中的每個 # 預處理 Token 應後面跟一個參數，作為替換列表中的下一個預處理 Token 。

2 字符串文字是沒有前綴的字符串文字。如果在替換列表中，參數緊跟在 # 預處理 Token 之前，則兩個參數都將被單個字符串文字預處理 Token 替換，該字符串包含相應參數的預處理 Token 序列的拼寫。參數的預處理 Token 之間每次出現空格都將成為字符串文字中的單個空格字符。刪除第一個預處理 Token 之前和最後一個包含參數的預處理 Token 之後的空白。否則，參數中每個預處理 Token 的原始拼寫將保留在字符串文字中，除非進行特殊處理以生成字符串文字和字符文字的拼寫：在字符文字的每個 " 和 \ 字符之前插入一個 \ 字符或字符串文字（包括 " 字符）。如果替換結果不是有效的字符串文字，則該行為未定義。對應於空參數的字符串文字是""。 # 和 ## 運算符的評估順序未指定。

粗體的部分就是上一篇提到的規則：

中間的空白會被壓縮為一個
前後的空白會被忽略
會自動添加逸出反斜線字元，對特殊字符進行轉義

另外提及到 ＃ 和 ## 運算符的評估順序未指定

▌`##` Token 粘貼運算子

§ 19.3.3 The ## operator:

1 A ## preprocessing token shall not occur at the beginning or at the end of a replacement list for either form of macro definition.

2 If, in the replacement list of a function-like macro, a parameter is immediately preceded or followed by a ## preprocessing token, the parameter is replaced by the corresponding argument’s preprocessing token sequence; however, if an argument consists of no preprocessing tokens, the parameter is replaced by a placemarker preprocessing token instead.¹⁵¹

151) Placemarker preprocessing tokens do not appear in the syntax because they are temporary entities that exist only within translation phase 4.

3 For both object-like and function-like macro invocations, before the replacement list is reexamined for more macro names to replace, each instance of a ## preprocessing token in the replacement list (not from an argument) is deleted and the preceding preprocessing token is concatenated with the following preprocessing token. Placemarker preprocessing tokens are handled specially: concatenation of two placemarkers results in a single placemarker preprocessing token, and concatenation of a placemarker with a non-placemarker preprocessing token results in the non-placemarker preprocessing token. If the result is not a valid preprocessing token, the behavior is undefined. The resulting token is available for further macro replacement. The order of evaluation of ## operators is unspecified.

[Example: In the following fragment:
#define hash_hash # ## #
#define mkstr(a) # a
#define in_between(a) mkstr(a)
#define join(c, d) in_between(c hash_hash d)
char p[] = join(x, y); // equivalent to char p[] = "x ## y";
The expansion produces, at various stages:
join(x, y)
in_between(x hash_hash y)
in_between(x ## y)
mkstr(x ## y)
"x ## y"
In other words, expanding hash_hash produces a new token, consisting of two adjacent sharp signs, but this new token is not the ## operator. — end example ]

翻譯：

1 對於任何形式的宏定義，## 預處理 Token 都不應出現在替換列表的開頭或結尾。

2 如果在類似函數的宏的替換列表中，參數緊跟在 ## 預處理 Token 之前或之後，則該參數將被相應參數的預處理 Token 序列替換；但是，如果參數不包含預處理 Token，則該參數將替換為 PlaceMarker 預處理 Token。¹⁵¹

151) PlaceMarker 預處理 Token 不會出現在語法中，因為它們是僅存在於翻譯階段4中的臨時實體。

3 對於類對象和類函數宏調用，在重新檢查替換列表以查找更多要替換的宏名稱之前，將刪除替換列表中的每個 ## 預處理 Token 實例（而不是從參數中），並刪除前面的預處理 Token 與以下預處理 Token 連接。PlaceMarker 預處理 Token 是專門處理的：兩個 PlaceMarker 的串聯會生成一個 PlaceMarker 預處理 Token，而 PlaceMarker 與一個非 PlaceMarker 預處理 Token 的串聯會導致非 PlaceMarker 預處理 Token。如果結果不是有效的預處理 Token，則行為未定義。生成的 Token 可用於進一步的宏替換。 ## 運算符的評估順序未指定。

[示例：在以下片段中：

#define hash_hash # ## #
#define mkstr(a) # a
#define in_between(a) mkstr(a)
#define join(c, d) in_between(c hash_hash d)
char p[] = join(x, y); // equivalent to char p[] = "x ## y";

擴展在各個階段產生：

join(x, y)
in_between(x hash_hash y)
in_between(x ## y)
mkstr(x ## y)
"x ## y"

換句話說，擴展 hash_hash 會產生一個新的 Token，該 Token 由兩個相鄰的 # 符號組成，但是此新 Token 不是 ## 運算符。 — 結束示例]

▌簡單來說：

這裏介紹了預處理器的一些處理過程

如果參數在 ## 的前或後，則相應參數將被替換成 Token 序列

# 和參數之間的空白和 ## 前後的空格可有可無也就是因爲這樣，替換成 Token 序列時空格會被忽略

在替換宏之前，會先把 ## 刪除，然後把前面和後面的 Token 連接

所以 # 和 ## 會阻止另一個宏的展開，因為它們處理的步驟先於宏替換

生成的 Token 可用於進一步的宏替換
即使 Token 的内容是 ##，但這只代表這個 Token 由兩個相鄰的 # 符號組成，不是 ## 運算符

▌結語

咬文嚼字其實很累，然而作為平日使用並不需要認識這麼深

我們總算初步認識了宏在 C++ 標準文檔的一些規則

很可惜的是標準文檔也不是全部

也還有一些地方並沒有完全涵蓋（也可能是故意的，提供優化空間

宏這裏剛好就有幾點是不同編譯器之間有不相同的實現和行為

就算熟讀文檔，該踩的坑還是會踩 … = _ =

可變參數宏、泛型宏、重掃描 … 很多東西還未說

前路漫漫哦