List可以說是Python最引人入勝的地方!他有所謂的一行操作(List Comprehension). List 是sequence type, 然後就如一般的array來說 … 他可以直接透過index來取用其位置上的值,看個🌰吧
```python3
a = [1,2,3,4,5]
a[0] # 1# another way
a = list(range(1,6))
a = [elem for elem in range(1,6)]
要記得的是,range是一個 iterator, 因此, 要透過list函數來轉換或者直接用for來解 … 否則你直接拿到的就是一個iterator! 請看下面速解.
>>> a = range(1,6)
>>> a
range(1, 6)
>>> type(a)
<class 'range'>
list很多操作是可以被numpy.array來加速的!但是這裡,誠如標題我們想探討的只有append這個行為!
請至附錄[Code_Test_Append]來看本次驗證的程式碼以及結果. 結論: 不要用numpy.array做append! 雖然結果會依個人環境而跑出不同的時間 … 但基本上,用numpy.array做append這件事情,確定是不理想的行為.
至於deque跟list的差距, 我有試過在python2.7.16 deque是比list還要快上許多的!不過, 如果用了Python 3.8.3的話, 將不會有任何的差距!
另外補充如果在可以跳脫 append的前提下,請盡量的使用list comprehension. 速度上存在顯著的差異!當然,我知道當你如果是在收資料,只能等資料打過來的前提下,還是只能用append.
接著,說到收資料 … 我想大家會有個疑惑說 … 奇怪append 不是 O 1? 為什麼我會越收越慢? 這是來自於Python的Garbage Collector的問題. 我也做了相仿的測試請至Code_Append_Big_Size來檢閱測試用程式碼, 不過我把時間取了平均分別跑了 10,000/ 100,000/ 1,000,000 次. 雖然,結果並沒有作者做出來那麼可觀的差異(3rd, July, 2020) … 但還是可以清楚的看到是存在著差異的!因此,如果在要做類似的操作想要再加速的話,這是一個推薦的做法.
那篇Stackoverflow的文章作者提到, 當Python GC在做運行的時候, 會去掃過每一個list裡面的object來決定是否要free這個memory. 當然, 他也有提到這篇文章已久,這個bug可能已被修正. 這就是為什麼, 經過測試之後, 還是有差但結果已不顯著像是 899 -> 18.6這樣!另外,這邊補充可以看看append背後GC的模式.
附錄
Code_Test_Append
# ENV
# Python 3.8.3
#[Clang 11.0.3 (clang-1103.0.32.59)] on darwin
from collections import deque
import numpy as np
import timedef testNumpyAppend(num):
arr = np.array([])
for _ in range(num):
np.append(arr, "x0y0", axis=None)def testListAppend(num):
arr = []
for _ in range(num):
arr += ["x0y0"]def testListCompreAppend(num):
arr = ["x0y0" for _ in range(num)]def testDequeAppend(num):
deq = deque()
for _ in range(num):
deq.append( "x0y0" )TEST_NUMBER = 100000print(f"RUN {TEST_NUMBER} times")
startNumpyTime = time.time()
testNumpyAppend(TEST_NUMBER)
execution_time = time.time() - startNumpyTime
print (f'Numpy\t-> {(execution_time*1000):0.3f}ms')startListTime = time.time()
testListAppend(TEST_NUMBER)
execution_time = time.time() - startListTime
print (f'List\t-> {(execution_time*1000):0.3f}ms')startListTime = time.time()
testListCompreAppend(TEST_NUMBER)
execution_time = time.time() - startListTime
print (f'ListCmp\t-> {(execution_time*1000):0.3f}ms')startDequeTime = time.time()
testDequeAppend(TEST_NUMBER)
execution_time = time.time() - startDequeTime
print (f'Deque\t-> {(execution_time*1000):0.3f}ms')"""
@ 1,000 result
Numpy -> 19.252ms
List -> 0.431ms
ListCmp -> 0.138ms
Deque -> 0.243ms@ 5,000 result
Numpy -> 64.615ms
List -> 0.753ms
ListCmp -> 0.454ms
Deque -> 0.725ms@ 10,000 result
Numpy -> 118.736ms
List -> 1.534ms
ListCmp -> 0.841ms
Deque -> 1.366ms@ 100,000 result
Numpy -> 1148.628ms
List -> 11.774ms
ListCmp -> 6.079ms
Deque -> 11.760ms
"""
Code_Append_Big_Size
import time
import gcclass Test:
def __init__(self):
self.number_one = 1
self.number_two = 2
self.explain = 'whatever string you like'def time_to_append(sz, item_gen):
test_list = []
t0 = time.time()
for _ in range(sz):
test_list.append(item_gen())
return time.time() - t0def test(append_size, times):
ct = 0
for _ in range(times):
ct += time_to_append(append_size, lambda: Test())
print(f"Average time cost with GC -> {round(ct/times,3)} ms")def test_nogc(append_size, times):
ct = 0
for _ in range(times):
gc.disable()
ct += time_to_append(append_size, lambda: Test())
gc.enable()
print(f"Average time cost without GC -> {round(ct/times,3)} ms")SIZE = 1000000
TIMES = 100
test(SIZE, TIMES)
test_nogc(SIZE, TIMES)"""
@ 10,000 result
Average time cost with GC -> 0.011 ms
Average time cost without GC -> 0.009 ms@ 100,000 result
Average time cost with GC -> 0.122 ms
Average time cost without GC -> 0.096 ms@ 1,000,000 result
Average time cost with GC -> 1.668 ms
Average time cost without GC -> 1.013 ms"""