Python 深入解析 List 的Append

Eden Chen
8 min readMay 30, 2020

--

List可以說是Python最引人入勝的地方!他有所謂的一行操作(List Comprehension). List 是sequence type, 然後就如一般的array來說 … 他可以直接透過index來取用其位置上的值,看個🌰吧

```python3
a = [1,2,3,4,5]
a[0] # 1
# another way
a = list(range(1,6))
a = [elem for elem in range(1,6)]

要記得的是,range是一個 iterator, 因此, 要透過list函數來轉換或者直接用for來解 … 否則你直接拿到的就是一個iterator! 請看下面速解.

>>> a = range(1,6)
>>> a
range(1, 6)
>>> type(a)
<class 'range'>

list很多操作是可以被numpy.array來加速的!但是這裡,誠如標題我們想探討的只有append這個行為!

請至附錄[Code_Test_Append]來看本次驗證的程式碼以及結果. 結論: 不要用numpy.array做append! 雖然結果會依個人環境而跑出不同的時間 … 但基本上,用numpy.array做append這件事情,確定是不理想的行為.

至於deque跟list的差距, 我有試過在python2.7.16 deque是比list還要快上許多的!不過, 如果用了Python 3.8.3的話, 將不會有任何的差距!

另外補充如果在可以跳脫 append的前提下,請盡量的使用list comprehension. 速度上存在顯著的差異!當然,我知道當你如果是在收資料,只能等資料打過來的前提下,還是只能用append.

接著,說到收資料 … 我想大家會有個疑惑說 … 奇怪append 不是 O 1? 為什麼我會越收越慢? 這是來自於Python的Garbage Collector的問題. 我也做了相仿的測試請至Code_Append_Big_Size來檢閱測試用程式碼, 不過我把時間取了平均分別跑了 10,000/ 100,000/ 1,000,000 次. 雖然,結果並沒有作者做出來那麼可觀的差異(3rd, July, 2020) … 但還是可以清楚的看到是存在著差異的!因此,如果在要做類似的操作想要再加速的話,這是一個推薦的做法.

那篇Stackoverflow的文章作者提到, 當Python GC在做運行的時候, 會去掃過每一個list裡面的object來決定是否要free這個memory. 當然, 他也有提到這篇文章已久,這個bug可能已被修正. 這就是為什麼, 經過測試之後, 還是有差但結果已不顯著像是 899 -> 18.6這樣!另外,這邊補充可以看看append背後GC的模式.

附錄

Code_Test_Append

# ENV
# Python 3.8.3
#[Clang 11.0.3 (clang-1103.0.32.59)] on darwin
from collections import deque
import numpy as np
import time
def testNumpyAppend(num):
arr = np.array([])
for _ in range(num):
np.append(arr, "x0y0", axis=None)
def testListAppend(num):
arr = []
for _ in range(num):
arr += ["x0y0"]
def testListCompreAppend(num):
arr = ["x0y0" for _ in range(num)]
def testDequeAppend(num):
deq = deque()
for _ in range(num):
deq.append( "x0y0" )
TEST_NUMBER = 100000print(f"RUN {TEST_NUMBER} times")
startNumpyTime = time.time()
testNumpyAppend(TEST_NUMBER)
execution_time = time.time() - startNumpyTime
print (f'Numpy\t-> {(execution_time*1000):0.3f}ms')
startListTime = time.time()
testListAppend(TEST_NUMBER)
execution_time = time.time() - startListTime
print (f'List\t-> {(execution_time*1000):0.3f}ms')
startListTime = time.time()
testListCompreAppend(TEST_NUMBER)
execution_time = time.time() - startListTime
print (f'ListCmp\t-> {(execution_time*1000):0.3f}ms')
startDequeTime = time.time()
testDequeAppend(TEST_NUMBER)
execution_time = time.time() - startDequeTime
print (f'Deque\t-> {(execution_time*1000):0.3f}ms')
"""
@ 1,000 result
Numpy -> 19.252ms
List -> 0.431ms
ListCmp -> 0.138ms
Deque -> 0.243ms
@ 5,000 result
Numpy -> 64.615ms
List -> 0.753ms
ListCmp -> 0.454ms
Deque -> 0.725ms
@ 10,000 result
Numpy -> 118.736ms
List -> 1.534ms
ListCmp -> 0.841ms
Deque -> 1.366ms
@ 100,000 result
Numpy -> 1148.628ms
List -> 11.774ms
ListCmp -> 6.079ms
Deque -> 11.760ms
"""

Code_Append_Big_Size

import time
import gc
class Test:
def __init__(self):
self.number_one = 1
self.number_two = 2
self.explain = 'whatever string you like'
def time_to_append(sz, item_gen):
test_list = []
t0 = time.time()
for _ in range(sz):
test_list.append(item_gen())
return time.time() - t0
def test(append_size, times):
ct = 0
for _ in range(times):
ct += time_to_append(append_size, lambda: Test())
print(f"Average time cost with GC -> {round(ct/times,3)} ms")
def test_nogc(append_size, times):
ct = 0
for _ in range(times):
gc.disable()
ct += time_to_append(append_size, lambda: Test())
gc.enable()
print(f"Average time cost without GC -> {round(ct/times,3)} ms")
SIZE = 1000000
TIMES = 100
test(SIZE, TIMES)
test_nogc(SIZE, TIMES)
"""
@ 10,000 result
Average time cost with GC -> 0.011 ms
Average time cost without GC -> 0.009 ms
@ 100,000 result
Average time cost with GC -> 0.122 ms
Average time cost without GC -> 0.096 ms
@ 1,000,000 result
Average time cost with GC -> 1.668 ms
Average time cost without GC -> 1.013 ms
"""

--

--

Eden Chen

What I cannot create, I do not understand / Software Engineer @Hiveventures.io