Difficulty level:

Refactoring

“After one has played a vast quantity of notes and more notes, it is simplicity that emerges as the crowning reward of art.”
— Frédéric Chopin

Diving In

Despite your best efforts to write comprehensive unit tests, bugs happen. What do I mean by “bug”? A bug is a test case you haven't written yet.

>>> import roman7
>>> roman7.from_roman('') ①
0

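This is a bug. An empty string should raise an InvalidRomanNumeralError exception, just like any other sequence of characters that doesn't represent a valid Roman numeral.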
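Why does the empty string get through at all? Assuming the validation pattern has the shape shown later in this chapter (here with the pre-expansion M{0,3} limit), every component of it can match zero characters, so '' satisfies it and the conversion loop never adds anything to the starting result of 0. The sketch below reproduces only that validation step; the name ROMAN_PATTERN_3999 is made up for the illustration.

```python
import re

# Validation pattern in the style of this chapter, with the 1..3999 limit
# (M{0,3}). ROMAN_PATTERN_3999 is a hypothetical name for this sketch.
ROMAN_PATTERN_3999 = re.compile(r'''
    ^                   # beginning of string
    M{0,3}              # thousands - 0 to 3 Ms
    (CM|CD|D?C{0,3})    # hundreds
    (XC|XL|L?X{0,3})    # tens
    (IX|IV|V?I{0,3})    # ones
    $                   # end of string
    ''', re.VERBOSE)

# Every group can match zero characters, so the empty string passes.
print(bool(ROMAN_PATTERN_3999.search('')))         # True: '' slips through
print(bool(ROMAN_PATTERN_3999.search('MCMXCIV')))  # True: a real numeral
print(bool(ROMAN_PATTERN_3999.search('IIII')))     # False: rejected as intended
```

That is why the fix below adds an explicit blank check instead of trying to make the pattern itself reject ''.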

After reproducing the bug, and before fixing it, you should write a test case that fails, thus illustrating the bug.

class FromRomanBadInput(unittest.TestCase):
    .
    .
    .
    def test_blank(self):
        '''from_roman should fail with blank string'''
        self.assertRaises(roman8.InvalidRomanNumeralError, roman8.from_roman, '') ①

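This is pretty simple stuff. Call from_roman() with an empty string and make sure it raises an InvalidRomanNumeralError exception. The hard part was finding the bug; once you know about it, testing for it is the easy part.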

Since your code has a bug, and you now have a test case that tests for this bug, the test case will fail:

you@localhost:~/diveintopython3/examples$ python3 romantest8.py -v
from_roman should fail with blank string ... FAIL
from_roman should fail with malformed antecedents ... ok
from_roman should fail with repeated pairs of numerals ... ok
from_roman should fail with too many repeated numerals ... ok
from_roman should give known result with known input ... ok
to_roman should give known result with known input ... ok
from_roman(to_roman(n))==n for all n ... ok
to_roman should fail with negative input ... ok
to_roman should fail with non-integer input ... ok
to_roman should fail with large input ... ok
to_roman should fail with 0 input ... ok

======================================================================
FAIL: from_roman should fail with blank string
----------------------------------------------------------------------
Traceback (most recent call last):
  File "romantest8.py", line 117, in test_blank
    self.assertRaises(roman8.InvalidRomanNumeralError, roman8.from_roman, '')
AssertionError: InvalidRomanNumeralError not raised by from_roman

----------------------------------------------------------------------
Ran 11 tests in 0.171s

FAILED (failures=1)

Now you can fix the bug.

def from_roman(s):
    '''convert Roman numeral to integer'''
    if not s:                                                                  ①
        raise InvalidRomanNumeralError('Input can not be blank')
    if not re.search(roman_numeral_pattern, s):
        raise InvalidRomanNumeralError('Invalid Roman numeral: {}'.format(s))  ②

    result = 0
    index = 0
    for numeral, integer in roman_numeral_map:
        while s[index:index+len(numeral)] == numeral:
            result += integer
            index += len(numeral)
    return result

Only two lines of code are required: an explicit check for an empty string, and a raise statement.
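I don't think I've mentioned this anywhere else in the book, so let this serve as your final lesson in string formatting. Starting in Python 3.1, you can skip the numbers when using positional indexes in a format specifier. That is, instead of using the format specifier {0} to refer to the first parameter of the format() method, you can simply use {} and Python will fill in the proper positional index for you. This works for any number of arguments; the first {} is {0}, the second {} is {1}, and so forth.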
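A quick check of the rule, using only built-in str.format() behavior. The last case is worth knowing: Python refuses a format string that mixes numbered fields and bare {} fields, raising ValueError.

```python
# Automatic numbering (Python 3.1+): each bare {} takes the next argument.
auto = '{} to {}'.format('1', '4999')
manual = '{0} to {1}'.format('1', '4999')
print(auto, auto == manual)          # 1 to 4999 True

# Reusing one argument still needs an explicit index.
print('{0}{0}{0}{0}'.format('M'))    # MMMM

# Mixing numbered and bare fields in one string raises ValueError.
try:
    '{0} and {}'.format('a', 'b')
    mixing_rejected = False
except ValueError:
    mixing_rejected = True
print(mixing_rejected)               # True
```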

you@localhost:~/diveintopython3/examples$ python3 romantest8.py -v
from_roman should fail with blank string ... ok  ①
from_roman should fail with malformed antecedents ... ok
from_roman should fail with repeated pairs of numerals ... ok
from_roman should fail with too many repeated numerals ... ok
from_roman should give known result with known input ... ok
to_roman should give known result with known input ... ok
from_roman(to_roman(n))==n for all n ... ok
to_roman should fail with negative input ... ok
to_roman should fail with non-integer input ... ok
to_roman should fail with large input ... ok
to_roman should fail with 0 input ... ok

----------------------------------------------------------------------
Ran 11 tests in 0.156s

OK  ②

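The blank string test case now passes, so the bug is fixed.
All the other test cases still pass, which means that this bug fix didn't break anything else. Stop coding.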
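The whole cycle of writing the failing test first and then fixing the code can be replayed in one self-contained file. This is a sketch under stated assumptions: from_roman_buggy(), from_roman_fixed() and run_blank_test() are hypothetical stand-ins that mirror the chapter's code for the 1..3999 range, not the roman8 module, and the test runs through unittest's programmatic API instead of the command line.

```python
import io
import re
import unittest

class InvalidRomanNumeralError(ValueError):
    pass

# Hypothetical stand-ins mirroring the chapter's 1..3999 code.
ROMAN_NUMERAL_MAP = (('M', 1000), ('CM', 900), ('D', 500), ('CD', 400),
                     ('C', 100), ('XC', 90), ('L', 50), ('XL', 40),
                     ('X', 10), ('IX', 9), ('V', 5), ('IV', 4), ('I', 1))
ROMAN_NUMERAL_PATTERN = re.compile(
    '^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$')

def from_roman_buggy(s):
    '''Version without the blank check: '' quietly becomes 0.'''
    if not ROMAN_NUMERAL_PATTERN.search(s):
        raise InvalidRomanNumeralError('Invalid Roman numeral: {}'.format(s))
    result = index = 0
    for numeral, integer in ROMAN_NUMERAL_MAP:
        while s[index:index + len(numeral)] == numeral:
            result += integer
            index += len(numeral)
    return result

def from_roman_fixed(s):
    '''Same conversion, with the explicit blank-string check added first.'''
    if not s:
        raise InvalidRomanNumeralError('Input can not be blank')
    return from_roman_buggy(s)

def run_blank_test(func):
    '''Run a test_blank-style regression test against func; return the result.'''
    class FromRomanBadInput(unittest.TestCase):
        def test_blank(self):
            '''from_roman should fail with blank string'''
            self.assertRaises(InvalidRomanNumeralError, func, '')
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(FromRomanBadInput)
    # Send the runner's report to a buffer so only our summary is printed.
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)

before = run_blank_test(from_roman_buggy)
after = run_blank_test(from_roman_fixed)
print('before fix: failures =', len(before.failures))   # 1
print('after fix: ok =', after.wasSuccessful())          # True
```

The same test object stays in the suite after the fix, which is what turns it into a regression test.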

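Coding this way does not make fixing bugs any easier. Simple bugs (like this one) require simple test cases; complex bugs will require complex test cases. In a test-centric environment, it may seem like it takes longer to fix a bug, since you need to articulate in code exactly what the bug is (to write the test case), then fix the bug itself. Then, if the test case doesn't pass right away, you need to figure out whether the fix was wrong, or whether the test case itself has a bug in it. However, in the long run, this back-and-forth between test code and tested code pays for itself, because it makes it more likely that bugs are fixed correctly the first time. Also, since you can easily re-run all the test cases along with your new one, you are much less likely to break old code when fixing new code. Today's unit test is tomorrow's regression test.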

Handling Changing Requirements

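Despite your best efforts to pin your customers to the ground and extract exact requirements from them, requirements will change. Most customers don't know what they want until they see it, and even if they do, they aren't that good at articulating it precisely enough to be useful. And even if they are, they'll want more in the next release anyway. So be prepared to update your test cases as requirements change.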

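Suppose, for instance, that you wanted to expand the range of the Roman numeral conversion functions. Normally, no character in a Roman numeral can be repeated more than three times in a row. But the Romans were willing to make an exception to that rule by having 4 M characters in a row to represent 4000. If you make this change, you'll be able to expand the range of convertible numbers from 1..3999 to 1..4999. But first, you need to make some changes to your test cases.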

[download romantest9.py]

class KnownValues(unittest.TestCase):
    known_values = ( (1, 'I'),
                      .
                      .
                      .
                     (3999, 'MMMCMXCIX'),
                     (4000, 'MMMM'),                                      ①
                     (4500, 'MMMMD'),
                     (4888, 'MMMMDCCCLXXXVIII'),
                     (4999, 'MMMMCMXCIX') )

class ToRomanBadInput(unittest.TestCase):
    def test_too_large(self):
        '''to_roman should fail with large input'''
        self.assertRaises(roman9.OutOfRangeError, roman9.to_roman, 5000)  ②

    .
    .
    .

class FromRomanBadInput(unittest.TestCase):
    def test_too_many_repeated_numerals(self):
        '''from_roman should fail with too many repeated numerals'''
        for s in ('MMMMM', 'DD', 'CCCC', 'LL', 'XXXX', 'VV', 'IIII'):     ③
            self.assertRaises(roman9.InvalidRomanNumeralError, roman9.from_roman, s)

    .
    .
    .

class RoundtripCheck(unittest.TestCase):
    def test_roundtrip(self):
        '''from_roman(to_roman(n))==n for all n'''
        for integer in range(1, 5000):                                    ④
            numeral = roman9.to_roman(integer)
            result = roman9.from_roman(numeral)
            self.assertEqual(integer, result)

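The existing known values don't change (they're all still reasonable values to test), but you need to add a few more in the 4000 range. Here I've included 4000 (the shortest), 4500 (the second shortest), 4888 (the longest), and 4999 (the largest).
The definition of “large input” has changed. This test used to call to_roman() with 4000 and expect an error; now that 4000-4999 are good values, you need to bump this up to 5000.
The definition of “too many repeated numerals” has also changed. This test used to call from_roman() with 'MMMM' and expect an error; now that MMMM is considered a valid Roman numeral, you need to bump this up to 'MMMMM'.
The roundtrip check loops through every number in the range, from 1 to 3999. Since the range has now expanded, this for loop needs to be updated as well to go up to 4999.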

Now your test cases are up to date with the new requirements, but your code is not, so you expect several of the test cases to fail.

you@localhost:~/diveintopython3/examples$ python3 romantest9.py -v
from_roman should fail with blank string ... ok
from_roman should fail with malformed antecedents ... ok
from_roman should fail with non-string input ... ok
from_roman should fail with repeated pairs of numerals ... ok
from_roman should fail with too many repeated numerals ... ok
from_roman should give known result with known input ... ERROR          ①
to_roman should give known result with known input ... ERROR            ②
from_roman(to_roman(n))==n for all n ... ERROR                          ③
to_roman should fail with negative input ... ok
to_roman should fail with non-integer input ... ok
to_roman should fail with large input ... ok
to_roman should fail with 0 input ... ok

======================================================================
ERROR: from_roman should give known result with known input
----------------------------------------------------------------------
Traceback (most recent call last):
  File "romantest9.py", line 82, in test_from_roman_known_values
    result = roman9.from_roman(numeral)
  File "C:\home\diveintopython3\examples\roman9.py", line 60, in from_roman
    raise InvalidRomanNumeralError('Invalid Roman numeral: {0}'.format(s))
roman9.InvalidRomanNumeralError: Invalid Roman numeral: MMMM

======================================================================
ERROR: to_roman should give known result with known input
----------------------------------------------------------------------
Traceback (most recent call last):
  File "romantest9.py", line 76, in test_to_roman_known_values
    result = roman9.to_roman(integer)
  File "C:\home\diveintopython3\examples\roman9.py", line 42, in to_roman
    raise OutOfRangeError('number out of range (must be 0..3999)')
roman9.OutOfRangeError: number out of range (must be 0..3999)

======================================================================
ERROR: from_roman(to_roman(n))==n for all n
----------------------------------------------------------------------
Traceback (most recent call last):
  File "romantest9.py", line 131, in testSanity
    numeral = roman9.to_roman(integer)
  File "C:\home\diveintopython3\examples\roman9.py", line 42, in to_roman
    raise OutOfRangeError('number out of range (must be 0..3999)')
roman9.OutOfRangeError: number out of range (must be 0..3999)

----------------------------------------------------------------------
Ran 12 tests in 0.171s

FAILED (errors=3)

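The from_roman() known values test will fail as soon as it hits 'MMMM', because from_roman() still thinks this is an invalid Roman numeral.
The to_roman() known values test will fail as soon as it hits 4000, because to_roman() still thinks this is out of range.
The roundtrip check (converting from integers to Roman numerals and back) will also fail as soon as it hits 4000, because to_roman() still thinks this is out of range.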

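Now that you have test cases that fail due to the new requirements, you can think about fixing the code to bring it in line with the test cases. (When you first start coding unit tests, it might feel strange that the code being tested is never “ahead” of the test cases. While it's behind, you still have some work to do, and as soon as it catches up to the test cases, you stop coding. Once you get used to it, you'll wonder how you ever programmed without tests.)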

[download roman9.py]

roman_numeral_pattern = re.compile('''
    ^                   # beginning of string
    M{0,4}              # thousands - 0 to 4 Ms  ①
    (CM|CD|D?C{0,3})    # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 Cs),
                        #            or 500-800 (D, followed by 0 to 3 Cs)
    (XC|XL|L?X{0,3})    # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 Xs),
                        #        or 50-80 (L, followed by 0 to 3 Xs)
    (IX|IV|V?I{0,3})    # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 Is),
                        #        or 5-8 (V, followed by 0 to 3 Is)
    $                   # end of string
    ''', re.VERBOSE)

def to_roman(n):
    '''convert integer to Roman numeral'''
    if not (0 < n < 5000):                        ②
        raise OutOfRangeError('number out of range (must be 1..4999)')
    if not isinstance(n, int):
        raise NotIntegerError('non-integers can not be converted')

    result = ''
    for numeral, integer in roman_numeral_map:
        while n >= integer:
            result += numeral
            n -= integer
    return result

def from_roman(s):
    .
    .
    .

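You don't need to make any changes to the from_roman() function at all. The only change is to roman_numeral_pattern. If you look closely, you'll notice that I changed the maximum number of optional M characters in the first section of the regular expression from 3 to 4. This will allow the Roman numeral equivalents of 4999 instead of 3999. The actual from_roman() function is completely generic; it just looks for repeated Roman numeral characters and adds them up, without caring how many times they repeat. The only reason it didn't handle 'MMMM' before is that you explicitly stopped it with the regular expression pattern matching.
The to_roman() function only needs one small change, in the range check. Where you used to check 0 < n < 4000, you now check 0 < n < 5000. And you change the error message that you raise to reflect the new acceptable range (1..4999 instead of 1..3999). You don't need to make any changes to the rest of the function; it already handles the new cases. (It merrily adds 'M' for each thousand that it finds; given 4000, it will spit out 'MMMM'. The only reason it didn't do this before is that you explicitly stopped it with the range check.)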
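To see that these two edits really are the whole change, here is a condensed, standalone sketch of the updated pieces. It is not roman9.py itself: the non-integer check is left out and the pattern is written on one line without re.VERBOSE, but the conversion loop and the M{0,4} limit follow the listing above.

```python
import re

class OutOfRangeError(ValueError):
    pass

roman_numeral_map = (('M', 1000), ('CM', 900), ('D', 500), ('CD', 400),
                     ('C', 100), ('XC', 90), ('L', 50), ('XL', 40),
                     ('X', 10), ('IX', 9), ('V', 5), ('IV', 4), ('I', 1))

# Only change to validation: M{0,3} became M{0,4}.
roman_numeral_pattern = re.compile(
    '^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$')

def to_roman(n):
    # Only change to conversion: the upper bound moved from 4000 to 5000.
    if not (0 < n < 5000):
        raise OutOfRangeError('number out of range (must be 1..4999)')
    result = ''
    for numeral, integer in roman_numeral_map:
        while n >= integer:      # the loop never cared how many Ms it emits
            result += numeral
            n -= integer
    return result

print(to_roman(4000), to_roman(4999))               # MMMM MMMMCMXCIX
print(bool(roman_numeral_pattern.search('MMMM')))   # True
print(bool(roman_numeral_pattern.search('MMMMM')))  # False
try:
    to_roman(5000)
except OutOfRangeError as exc:
    print(exc)                   # number out of range (must be 1..4999)
```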

Those two small changes are all you need, but you may be skeptical. Hey, don't take my word for it; see for yourself.

you@localhost:~/diveintopython3/examples$ python3 romantest9.py -v
from_roman should fail with blank string ... ok
from_roman should fail with malformed antecedents ... ok
from_roman should fail with non-string input ... ok
from_roman should fail with repeated pairs of numerals ... ok
from_roman should fail with too many repeated numerals ... ok
from_roman should give known result with known input ... ok
to_roman should give known result with known input ... ok
from_roman(to_roman(n))==n for all n ... ok
to_roman should fail with negative input ... ok
to_roman should fail with non-integer input ... ok
to_roman should fail with large input ... ok
to_roman should fail with 0 input ... ok

----------------------------------------------------------------------
Ran 12 tests in 0.203s

OK  ①

All the test cases pass. Stop coding.

Comprehensive unit testing means never having to rely on a programmer who says “Trust me.”

Refactoring

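The best thing about comprehensive unit testing is not the feeling you get when all your test cases finally pass, or even the feeling you get when someone else blames you for breaking their code and you can actually prove that you didn't. The best thing about unit testing is that it gives you the freedom to refactor mercilessly.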

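Refactoring is the process of taking working code and making it work better. Usually, “better” means “faster”, although it can also mean “using less memory”, “using less disk space”, or simply “more elegantly”. Whatever it means to you, to your project, in your environment, refactoring is important to the long-term health of any program.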

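Here, “better” means both “faster” and “easier to maintain”. Specifically, the from_roman() function is slower and more complex than I'd like, because of that big nasty regular expression that you use to validate Roman numerals. Now, you might think, “Sure, the regular expression is big and hairy, but how else am I supposed to validate that an arbitrary string is a valid Roman numeral?”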

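Answer: there are only 4999 of them (1 through 4999); why don't you just build a lookup table? This idea gets even better when you realize that you don't need to use regular expressions at all. As you build the lookup table for converting integers to Roman numerals, you can build the reverse lookup table to convert Roman numerals to integers. By the time you need to check whether an arbitrary string is a valid Roman numeral, you will have collected all the valid Roman numerals. “Validating” is reduced to a single dictionary lookup.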

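Best of all, you already have a complete set of unit tests. You can change over half the code in the module, but the unit tests will stay the same. That means you can prove, to yourself and to others, that the new code works just as well as the original.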

[download roman10.py]

class OutOfRangeError(ValueError): pass
class NotIntegerError(ValueError): pass
class InvalidRomanNumeralError(ValueError): pass

roman_numeral_map = (('M',  1000),
                     ('CM', 900),
                     ('D',  500),
                     ('CD', 400),
                     ('C',  100),
                     ('XC', 90),
                     ('L',  50),
                     ('XL', 40),
                     ('X',  10),
                     ('IX', 9),
                     ('V',  5),
                     ('IV', 4),
                     ('I',  1))

to_roman_table = [ None ]
from_roman_table = {}

def to_roman(n):
    '''convert integer to Roman numeral'''
    if not (0 < n < 5000):
        raise OutOfRangeError('number out of range (must be 1..4999)')
    if int(n) != n:
        raise NotIntegerError('non-integers can not be converted')
    return to_roman_table[n]

def from_roman(s):
    '''convert Roman numeral to integer'''
    if not isinstance(s, str):
        raise InvalidRomanNumeralError('Input must be a string')
    if not s:
        raise InvalidRomanNumeralError('Input can not be blank')
    if s not in from_roman_table:
        raise InvalidRomanNumeralError('Invalid Roman numeral: {0}'.format(s))
    return from_roman_table[s]

def build_lookup_tables():
    def to_roman(n):
        result = ''
        for numeral, integer in roman_numeral_map:
            if n >= integer:
                result = numeral
                n -= integer
                break
        if n > 0:
            result += to_roman_table[n]
        return result

    for integer in range(1, 5000):
        roman_numeral = to_roman(integer)
        to_roman_table.append(roman_numeral)
        from_roman_table[roman_numeral] = integer

build_lookup_tables()

Let's break that down into digestible pieces. Arguably, the most important line is the last one:

build_lookup_tables()

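You'll note that this is a function call, but there's no if statement around it. This is not an if __name__ == '__main__' block; it gets called when the module is imported. (It is important to understand that modules are only imported once, then cached. If you import an already-imported module, nothing happens. So this code will only run the first time you import this module.)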
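The import-once behavior is easy to confirm. The sketch below writes a throwaway module (the name lookup_demo is hypothetical) whose top-level code counts its own executions, imports it twice, then forces a re-run with importlib.reload(). The counter reads globals() only so that a reload can add to the previous count instead of starting over.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical throwaway module. Its top-level call stands in for
# build_lookup_tables(); BUILD_RUNS counts how often that code executes.
MODULE_SOURCE = '''
BUILD_RUNS = globals().get('BUILD_RUNS', 0)

def build_lookup_tables():
    global BUILD_RUNS
    BUILD_RUNS += 1

build_lookup_tables()   # no if __name__ == '__main__' guard: runs on import
'''

sys.modules.pop('lookup_demo', None)          # start from a clean slate
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, 'lookup_demo.py').write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()             # make the new file visible to import
    try:
        import lookup_demo as first
        import lookup_demo as second          # found in sys.modules: nothing re-runs
        same_object = first is second
        runs_after_two_imports = first.BUILD_RUNS
        importlib.reload(first)               # an explicit reload re-executes it
        runs_after_reload = first.BUILD_RUNS
    finally:
        sys.path.remove(tmp)

print(same_object, runs_after_two_imports, runs_after_reload)   # True 1 2
```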

So what does the build_lookup_tables() function do? I'm glad you asked.

to_roman_table = [ None ]
from_roman_table = {}
.
.
.
def build_lookup_tables():
    def to_roman(n):                                ①
        result = ''
        for numeral, integer in roman_numeral_map:
            if n >= integer:
                result = numeral
                n -= integer
                break
        if n > 0:
            result += to_roman_table[n]
        return result

    for integer in range(1, 5000):
        roman_numeral = to_roman(integer)          ②
        to_roman_table.append(roman_numeral)       ③
        from_roman_table[roman_numeral] = integer

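This is a clever bit of programming... perhaps too clever. The to_roman() function is defined above; it looks up values in the lookup table and returns them. But the build_lookup_tables() function redefines the to_roman() function to do the actual conversion work. Within the build_lookup_tables() function, calling to_roman() calls this redefined version. Once the build_lookup_tables() function exits, the redefined version disappears; it is only defined in the local scope of build_lookup_tables(). Note that, unlike the earlier versions, the inner function does not loop all the way down roman_numeral_map: it emits the single largest numeral that fits, then appends to_roman_table[n] for whatever remains. That works because the table is filled in ascending order, so the entry for the smaller remainder already exists (the None at index 0 keeps each numeral at the index equal to its value).
This line calls the redefined to_roman() function, which actually calculates the Roman numeral.
Once you have the result (from the redefined to_roman() function), you add the integer and its Roman numeral equivalent to both lookup tables.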
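Both details of build_lookup_tables() can be made visible in a trimmed-down sketch: the inner to_roman() shadows the module-level one only while build_lookup_tables() runs, and each new entry reuses a smaller entry already in the table. The reused dictionary is a hypothetical trace added for illustration and is not part of roman10.py; range checks are omitted.

```python
roman_numeral_map = (('M', 1000), ('CM', 900), ('D', 500), ('CD', 400),
                     ('C', 100), ('XC', 90), ('L', 50), ('XL', 40),
                     ('X', 10), ('IX', 9), ('V', 5), ('IV', 4), ('I', 1))

to_roman_table = [None]   # placeholder at index 0, so to_roman_table[n] is n's numeral
reused = {}               # hypothetical trace: n -> remainder fetched from the table

def to_roman(n):
    '''Module-level version: a plain table lookup (range checks omitted).'''
    return to_roman_table[n]

def build_lookup_tables():
    def to_roman(n):                  # local name; shadows the global only in here
        original = n
        result = ''
        for numeral, integer in roman_numeral_map:
            if n >= integer:          # emit only the single largest numeral that fits
                result = numeral
                n -= integer
                break
        if n > 0:                     # smaller remainder: already in the table
            reused[original] = n
            result += to_roman_table[n]
        return result

    for integer in range(1, 5000):
        to_roman_table.append(to_roman(integer))

build_lookup_tables()

print(to_roman.__qualname__)          # to_roman (the global lookup version)
print(reused[1994], to_roman(994))    # 994 CMXCIV
print(to_roman(1994))                 # MCMXCIV
```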

Once the lookup tables are built, the rest of the code is both easy and fast.

def to_roman(n):
    '''convert integer to Roman numeral'''
    if not (0 < n < 5000):
        raise OutOfRangeError('number out of range (must be 1..4999)')
    if int(n) != n:
        raise NotIntegerError('non-integers can not be converted')
    return to_roman_table[n]                                            ①

def from_roman(s):
    '''convert Roman numeral to integer'''
    if not isinstance(s, str):
        raise InvalidRomanNumeralError('Input must be a string')
    if not s:
        raise InvalidRomanNumeralError('Input can not be blank')
    if s not in from_roman_table:
        raise InvalidRomanNumeralError('Invalid Roman numeral: {0}'.format(s))
    return from_roman_table[s]                                          ②

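After doing the same bounds checking as before, the to_roman() function simply finds the appropriate value in the lookup table and returns it.
Similarly, the from_roman() function is reduced to some input checking and one line of code. No more regular expressions. No more looping. O(1) conversion to and from Roman numerals.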
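Because the refactoring swaps regular-expression validation for dictionary membership, one extra check beyond the unit tests is to compare the two validators exhaustively on short strings. This is a sketch, assuming the roman9-style pattern with M{0,4}; the table here is built with the plain greedy loop rather than build_lookup_tables(), which yields the same 4999 numerals.

```python
import itertools
import re

roman_numeral_map = (('M', 1000), ('CM', 900), ('D', 500), ('CD', 400),
                     ('C', 100), ('XC', 90), ('L', 50), ('XL', 40),
                     ('X', 10), ('IX', 9), ('V', 5), ('IV', 4), ('I', 1))

# roman9-style validator: the M{0,4} pattern plus an explicit blank check.
roman_numeral_pattern = re.compile(
    '^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$')

def greedy_to_roman(n):
    result = ''
    for numeral, integer in roman_numeral_map:
        while n >= integer:
            result += numeral
            n -= integer
    return result

# roman10-style validator: membership in the reverse lookup table.
from_roman_table = {greedy_to_roman(n): n for n in range(1, 5000)}

def valid_by_regex(s):
    return bool(s) and roman_numeral_pattern.search(s) is not None

def valid_by_table(s):
    return s in from_roman_table

# Every table entry must satisfy the regex...
keys_all_match = all(valid_by_regex(s) for s in from_roman_table)

# ...and on every string of 1 to 5 numeral characters the two must agree.
mismatches = [
    ''.join(chars)
    for length in range(1, 6)
    for chars in itertools.product('MDCLXVI', repeat=length)
    if valid_by_regex(''.join(chars)) != valid_by_table(''.join(chars))
]

print(len(from_roman_table), keys_all_match, mismatches)   # 4999 True []
```

An empty mismatches list means that, for every short string drawn from MDCLXVI, the old and new validation give the same answer.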

But does it work? Why yes, yes it does. And I can prove it.

you@localhost:~/diveintopython3/examples$ python3 romantest10.py -v
from_roman should fail with blank string ... ok
from_roman should fail with malformed antecedents ... ok
from_roman should fail with non-string input ... ok
from_roman should fail with repeated pairs of numerals ... ok
from_roman should fail with too many repeated numerals ... ok
from_roman should give known result with known input ... ok
to_roman should give known result with known input ... ok
from_roman(to_roman(n))==n for all n ... ok
to_roman should fail with negative input ... ok
to_roman should fail with non-integer input ... ok
to_roman should fail with large input ... ok
to_roman should fail with 0 input ... ok

----------------------------------------------------------------------
Ran 12 tests in 0.031s                                                  ①

OK

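Not that you asked, but it's fast, too: this run took 0.031 seconds, versus 0.203 seconds for the regular-expression version above, about 6.5 times faster. Of course, it's not entirely a fair comparison, because this version takes longer to import (when it builds the lookup tables). But since the import is only done once, the startup cost is amortized over all the calls to the to_roman() and from_roman() functions. Since the tests make several thousand function calls (the roundtrip test alone makes about 10,000), the savings add up in a hurry!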
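The amortization argument can be checked on your own machine with a rough timeit sketch like the one below. It is an assumption-laden stand-in, not the chapter's benchmark: it builds the table with the simple greedy loop rather than roman10's build_lookup_tables(), and it times bare conversions rather than a unittest run. The build cost is paid once; after that, each pass over 1..4999 costs only list indexing.

```python
import timeit

roman_numeral_map = (('M', 1000), ('CM', 900), ('D', 500), ('CD', 400),
                     ('C', 100), ('XC', 90), ('L', 50), ('XL', 40),
                     ('X', 10), ('IX', 9), ('V', 5), ('IV', 4), ('I', 1))

def computed_to_roman(n):
    '''Works out the numeral on every call (pre-refactoring style).'''
    result = ''
    for numeral, integer in roman_numeral_map:
        while n >= integer:
            result += numeral
            n -= integer
    return result

def build_table():
    '''One-time startup cost; index 0 is a placeholder.'''
    return [None] + [computed_to_roman(n) for n in range(1, 5000)]

table = build_table()

def table_to_roman(n):
    '''Per-call cost after startup: one list index.'''
    return table[n]

def convert_all(convert):
    '''One pass over the whole 1..4999 range, like the roundtrip test.'''
    return [convert(n) for n in range(1, 5000)]

# Timings vary by machine; no particular numbers are implied here.
startup = timeit.timeit(build_table, number=1)
computed = timeit.timeit(lambda: convert_all(computed_to_roman), number=10)
looked_up = timeit.timeit(lambda: convert_all(table_to_roman), number=10)
print(f'build table once:     {startup:.4f} s')
print(f'10 passes, computed:  {computed:.4f} s')
print(f'10 passes, looked up: {looked_up:.4f} s')
```

Comparing the first line with the other two shows how many conversions it takes before the one-time build has paid for itself.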

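What's the moral of the story?

Simplicity is a virtue.
Especially when regular expressions are involved.
Unit tests can give you the confidence to do large-scale refactoring.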

Summary

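Unit testing is a powerful concept which, if properly implemented, can both reduce maintenance costs and increase flexibility in any long-term project. It is also important to understand that unit testing is not a panacea, a magic problem solver, or a silver bullet. Writing good test cases is hard, and keeping them up to date takes discipline (especially when customers are demanding critical bug fixes). Unit testing is not a replacement for other forms of testing, including functional testing, integration testing, and user acceptance testing. But it is feasible, and it does work, and once you've seen it work, you'll wonder how you ever got along without it.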

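These few chapters have covered a lot of ground, and much of it wasn't even Python-specific. There are unit testing frameworks for many languages, all of which require you to understand the same basic concepts: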

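Designing test cases that are specific, automated, and independent.
Writing test cases before the code they are testing.
Writing tests that test good input and check for proper results.
Writing tests that test bad input and check for proper failure responses.
Writing and updating test cases to reflect new requirements.
Refactoring mercilessly to improve performance, scalability, readability, maintainability, or whatever other quality you're lacking.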


© 2001–9 Mark Pilgrim
