May 30, 2020

Python Miscellany

Last Update on July 17, 2020

Dictionaries and Lists

2020/7/17
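Dictionary: a hash table with open addressing. The implementation changed in CPython 3.6 to a more compact layout that also preserves insertion order (a language guarantee since 3.7).
List: a dynamic array.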
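A small sketch of both points, assuming CPython (sys.getsizeof and the exact growth pattern are implementation details, and the variable names are only illustrative):

```python
import sys

# Dicts keep insertion order (guaranteed by the language since Python 3.7).
d = {}
for key in ["b", "a", "c"]:
    d[key] = len(d)
print(list(d))  # ['b', 'a', 'c']

# Lists over-allocate: the reported size grows in occasional jumps,
# not on every append (the exact sizes are CPython implementation details).
items = []
sizes = set()
for n in range(64):
    items.append(n)
    sizes.add(sys.getsizeof(items))
print(len(sizes), "distinct sizes over", len(items), "appends")
```

Far fewer distinct sizes than appends is what makes append amortized constant time.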
Reference

Multithreading and Multiprocessing

2020/7/15
Use threading if your program is network- or I/O-bound, and multiprocessing if it is CPU-bound.
Without multiprocessing, Python programs have trouble using all of a machine's cores because of the GIL (Global Interpreter Lock). CPython was designed before multi-core personal computers were common, and its memory management (reference counting) is not thread-safe, so a single global lock ensures that only one thread executes Python bytecode at a time. Though not perfect, it is a simple and effective way to keep the interpreter's internal state consistent.
Multiprocessing lets you run code in parallel (bypassing the GIL) and use all of your CPU cores. The multiprocessing library gives each process its own Python interpreter, each with its own GIL. Since the processes don't share memory by default, they can't modify the same memory concurrently, so the data races typical of threaded code are largely avoided. The cost is that data must be passed between processes explicitly, and deadlocks are still possible once you introduce shared locks or queues.
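A minimal sketch of the I/O-bound case, assuming time.sleep is an acceptable stand-in for a network call (fake_io and its delay are made up for illustration):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(task_id, delay=0.2):
    """Stand-in for a network or disk call; sleeping releases the GIL."""
    time.sleep(delay)
    return task_id * 10

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    # The four waits overlap because a sleeping thread does not hold the GIL.
    results = list(pool.map(fake_io, range(4)))
elapsed = time.perf_counter() - start

print(results)  # [0, 10, 20, 30]
print(f"{elapsed:.2f}s")  # roughly one delay rather than four
```

For CPU-bound work, concurrent.futures.ProcessPoolExecutor offers the same interface, but the calling code usually has to sit under an `if __name__ == "__main__":` guard on platforms that spawn new interpreters.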
Reference

+=

2020/6/9
>>> x = y = [1, 2, 3, 4]
>>> x += [4]
>>> x
[1, 2, 3, 4, 4]
>>> y
[1, 2, 3, 4, 4]
>>> x = y = [1, 2, 3, 4]
>>> x = x + [4]
>>> x
[1, 2, 3, 4, 4]
>>> y
[1, 2, 3, 4]
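Here += calls the __iadd__ method of a mutable object and operates in place; for immutable objects, which do not define __iadd__, it falls back to __add__. The + operator always calls __add__ and creates a new object.
For a list, += is almost equivalent to extend, except that the latter is an explicit method call.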
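To see the immutable case next to the list case, here is a short sketch (the variable names are only illustrative):

```python
nums = alias = (1, 2, 3)
nums += (4,)          # tuple has no __iadd__, so this is nums = nums + (4,)
print(nums)           # (1, 2, 3, 4)
print(alias)          # (1, 2, 3)
print(nums is alias)  # False

items = other = [1, 2, 3]
items.extend([4])     # same in-place effect as items += [4]
print(other)          # [1, 2, 3, 4]

print(hasattr(list, "__iadd__"), hasattr(tuple, "__iadd__"))  # True False
```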
Reference

UnboundLocalError

a, b = 0, 1

def f(n):
    for _ in range(n):
        a, b = b, a + b
    return a

print(f(7))
# UnboundLocalError: local variable 'b' referenced before assignment
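When a function assigns to a name anywhere in its body, that name is treated as a local variable for the whole function, so reading b on the right-hand side fails. One fix is to declare the names with global.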
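A sketch of the global fix, followed by an alternative that avoids module-level state altogether (fib is a hypothetical name, not from the original):

```python
a, b = 0, 1

def f(n):
    global a, b  # the assignments below now rebind the module-level names
    for _ in range(n):
        a, b = b, a + b
    return a

first = f(7)
print(first)  # 13

def fib(n):
    """Alternative without global state: keep both values local."""
    x, y = 0, 1
    for _ in range(n):
        x, y = y, x + y
    return x

print(fib(7))  # 13
```

Note that the global version keeps mutating a and b, so a second call to f continues from where the first one stopped; the local version always starts from 0, 1.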

When a Default Argument Is an Empty List

def f(*args, a=[]):
    a += args
    return a

x = f(1)
y = f(2)
print(x, y)
# [1, 2] [1, 2]

def g(*args, a=None):
    if a is None:
        a = []
    a += args
    return a

x = g(1)
y = g(2)
print(x, y)
# [1] [2]
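The reason is that functions are first-class objects: a default value is evaluated once, when the def statement runs, and is stored on the function object much like member data, so changes made to it persist from one call to the next.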
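The stored defaults can be inspected directly. A small sketch (append_to is a hypothetical helper; f repeats the function from the example above):

```python
def append_to(item, bucket=[]):  # the default list is created once, at def time
    bucket.append(item)
    return bucket

print(append_to.__defaults__)  # ([],)
append_to(1)
append_to(2)
print(append_to.__defaults__)  # ([1, 2],)

def f(*args, a=[]):
    a += args
    return a

f(1)
# Keyword-only defaults live in __kwdefaults__ instead of __defaults__.
print(f.__kwdefaults__)  # {'a': [1]}
```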

Late Binding

a = []
for i in range(3):
    def func(x): return x * i
    a.append(func)
for f in a:
    print(f(2))
'''
4
4
4
'''

for f in [lambda x: x*i for i in range(3)]:
    print(f(2))
'''
4
4
4
'''
Python is actually behaving as defined. Three separate functions are created, but each closes over the environment it is defined in: the global environment for the loop version (or the enclosing function's environment if the loop is placed inside another function), and the comprehension's own scope for the list comprehension. This is exactly the problem: in that environment i is rebound on every iteration, the closures all refer to the same i, and i is only looked up when a function is called, by which time it is 2.
a = []
for i in range(3):
    def funcC(j):
        def func(x): return x * j
        return func
    a.append(funcC(i))
for f in a:
    print(f(2))

for f in [lambda x, i=i: x*i for i in range(3)]:
    print(f(2))

for f in [lambda x, j=i: x*j for i in range(3)]:
    print(f(2))

# lazy evaluation: each lambda is called before the generator advances i, so this prints 0, 2, 4
for f in (lambda x: x*i for i in range(3)):
    print(f(2))
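Another way to bind the current value early, assuming functools.partial is acceptable in place of a lambda (this variant is not from the original post):

```python
from functools import partial
from operator import mul

# partial stores the value of i at creation time, so nothing is looked up later.
funcs = [partial(mul, i) for i in range(3)]
print([f(2) for f in funcs])  # [0, 2, 4]

# For comparison, the late-binding version from above.
late = [lambda x: x * i for i in range(3)]
print([f(2) for f in late])   # [4, 4, 4]
```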
Reference

Small Integers

>>> a = 256
>>> b = 256
>>> a is b
True
>>> a = 257
>>> b = 257
>>> a is b
False
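CPython keeps a cache of the integers from -5 to 256; whenever an integer in this range is created, you get a reference to the already existing object. This is an implementation detail, so compare integers with ==, not is.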
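The same effect in a script, where building the integers with int() at run time keeps the compiler from sharing constants. This is a sketch that assumes CPython; the identity results are not guaranteed by the language:

```python
small_a = int("256")
small_b = int("256")
big_a = int("257")
big_b = int("257")

print(small_a is small_b)  # True on CPython: both names refer to the cached object
print(big_a is big_b)      # False on CPython: two separate objects
print(big_a == big_b)      # True everywhere: == compares values
```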

super

2020/5/30
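When a subclass method has the same name as a parent method, you can call the parent's method explicitly, but it is better to call it through super; the most common case is __init__.
Strictly speaking, the previous sentence is not accurate: super dispatches according to the MRO (Method Resolution Order) rather than simply calling the parent class, and the two differ under multiple inheritance.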
class First():
    def __init__(self):
        print("First(): entering")
        super().__init__()
        print("First(): exiting")

class Second():
    def __init__(self):
        print("Second(): entering")
        super().__init__()
        print("Second(): exiting")

class Third(First, Second):
    def __init__(self):
        print("Third(): entering")
        super().__init__()
        print("Third(): exiting")

Third()

'''
Third(): entering
First(): entering
Second(): entering
Second(): exiting
First(): exiting
Third(): exiting
'''
First and Second have no parent-child relationship, but for class Third(First, Second) the MRO is [Third, First, Second, object], so the super call in First invokes Second's method.
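The MRO can be read off the class itself. A minimal sketch with the method bodies left out:

```python
class First:
    pass

class Second:
    pass

class Third(First, Second):
    pass

mro_names = [cls.__name__ for cls in Third.__mro__]
print(mro_names)  # ['Third', 'First', 'Second', 'object']
```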
class First():
    def __init__(self):
        print("First(): entering")
        super().__init__()
        print("First(): exiting")

class Second(First):
    def __init__(self):
        print("Second(): entering")
        super().__init__()
        print("Second(): exiting")

class Third(First):
    def __init__(self):
        print("Third(): entering")
        super().__init__()
        print("Third(): exiting")

class Fourth(Second, Third):
    def __init__(self):
        print("Fourth(): entering")
        super().__init__()
        print("Fourth(): exiting")

Fourth()

'''
Fourth(): entering
Second(): entering
Third(): entering
First(): entering
First(): exiting
Third(): exiting
Second(): exiting
Fourth(): exiting
'''
The MRO is [Fourth, Second, Third, First, object]; the rule is that a subclass must always appear before its parent classes.
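Python computes this order with C3 linearization. The following is a simplified sketch of that algorithm for illustration only; c3_merge and linearize are hypothetical names, not the interpreter's actual code:

```python
def c3_merge(sequences):
    """Merge linearizations: repeatedly take a head that appears in no tail."""
    sequences = [list(seq) for seq in sequences if seq]
    result = []
    while sequences:
        for seq in sequences:
            head = seq[0]
            if not any(head in other[1:] for other in sequences):
                break
        else:
            raise TypeError("Cannot create a consistent MRO")
        result.append(head)
        # Drop the chosen class everywhere, then drop exhausted sequences.
        sequences = [[c for c in seq if c is not head] for seq in sequences]
        sequences = [seq for seq in sequences if seq]
    return result

def linearize(cls):
    """The class itself, then the merge of its bases' linearizations and the base list."""
    bases = list(cls.__bases__)
    return [cls] + c3_merge([linearize(base) for base in bases] + [bases])

class First: pass
class Second(First): pass
class Third(First): pass
class Fourth(Second, Third): pass

print([c.__name__ for c in linearize(Fourth)])
# ['Fourth', 'Second', 'Third', 'First', 'object']
```

First cannot be picked while it still sits in the tail of Third's linearization, which is how the rule "subclass before parent" falls out of the merge.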
class First():
    def __init__(self):
        print("First(): entering")

class Second(First):
    def __init__(self):
        print("Second(): entering")
        # difference
        First.__init__(self)

class Third(First):
    def __init__(self):
        print("Third(): entering")
        super().__init__()

class Fourth(First):
    def __init__(self):
        print("Fourth(): entering")
        super().__init__()

class A(Second, Fourth):
    def __init__(self):
        print("A(): entering")
        super().__init__()

class B(Third, Fourth):
    def __init__(self):
        print("B(): entering")
        super().__init__()

A()
B()

'''
A(): entering
Second(): entering
First(): entering
B(): entering
Third(): entering
Fourth(): entering
First(): entering
'''
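Second calls the parent method explicitly, so in A the chain jumps straight to First and Fourth.__init__ never runs, whereas Third uses super to call the method of the next class in the MRO.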
class First():
    def __init__(self):
        print("First(): entering")
        super().__init__()
        print("First(): exiting")

class Second(First):
    def __init__(self):
        print("Second(): entering")
        super().__init__()
        print("Second(): exiting")

class Third(First, Second):
    def __init__(self):
        print("Third(): entering")
        super().__init__()
        print("Third(): exiting")

Third()
'''
TypeError: Cannot create a consistent method resolution
order (MRO) for bases First, Second
'''
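Here Second is a subclass of First, yet Third(First, Second) asks for the MRO [Third, First, Second], which puts a parent before its subclass. The contradiction raises the error, and it is raised when the class statement runs, before Third() is ever reached.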
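A short sketch showing that the error comes from the class statement itself, and that listing the subclass first resolves it (Fixed is a hypothetical name):

```python
class First:
    pass

class Second(First):
    pass

error = None
try:
    # Raised while the class is being created; no instance is involved.
    class Third(First, Second):
        pass
except TypeError as exc:
    error = str(exc)
    print(error)

class Fixed(Second, First):  # subclass listed before its parent: consistent
    pass

fixed_names = [c.__name__ for c in Fixed.__mro__]
print(fixed_names)  # ['Fixed', 'Second', 'First', 'object']
```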
Reference

Decorators

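The key point is that functions are first-class objects in Python: a function can be passed as an argument, returned from another function, and assigned to a variable.
When functions are nested, the inner function can use the local variables of the outer function.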

Closures

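Closure: a nested function in which the inner function uses a variable of the outer function, and the outer function returns the inner function. See the example below.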
def print_msg(msg):
    '''outer enclosing function'''

    def printer():
        '''nested function'''
        print(msg)

    return printer

another = print_msg("Hello")
another()
# Hello

'''
This technique by which some data ("Hello") gets attached 
to the code is called closure in Python.

This value in the enclosing scope is remembered 
even when the variable goes out of scope 
or the function itself is removed from the current namespace.
'''
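By analogy with mathematics, write $F$ for print_msg, $x$ for the argument msg, and $f$ for the inner printer; then print_msg can be read as the map $F: x \mapsto f_x$, where $f_x$ is printer with msg fixed to $x$.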
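The captured value can be inspected on the returned function. In this sketch printer returns msg instead of printing it, purely so the result is easy to check:

```python
def print_msg(msg):
    def printer():
        return msg
    return printer

another = print_msg("Hello")
del print_msg  # the outer function is gone; the captured value is not

print(another())                             # Hello
print(another.__code__.co_freevars)          # ('msg',)
print(another.__closure__[0].cell_contents)  # Hello
```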

Decorators

# basic example
def uppercase_decorator(function):
    def wrapper():
        func = function()
        make_uppercase = func.upper()
        return make_uppercase
    return wrapper

def say_hi():
    return 'hello there'

decorate = uppercase_decorator(say_hi)
decorate()
# 'HELLO THERE'

# general example
def some_decorator(function):
    def wrapper(*args, **kwargs):
        print('The positional arguments are', args)
        print('The keyword arguments are', kwargs)
        return function(*args, **kwargs)
    return wrapper

@some_decorator
def printer(a, b, c):
    print(a, b, c)

printer(1, 2, c=3)

'''
The positional arguments are (1, 2)
The keyword arguments are {'c': 3}
1 2 3
'''
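Similarly, write $D$ for some_decorator, $g$ for function, and $w$ for wrapper; then $D: g \mapsto w_g$.
In the example, $g$ is printer and its arguments are $(a, b, c)$. The decorator $D$ turns the original $g(a, b, c)$ into $D(g)(a, b, c) = w_g(a, b, c)$.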
# further example
def n_times(n):
    def some_decorator(function):
        def wrapper(*args, **kwargs):
            for _ in range(n):
                function(*args, **kwargs)
        return wrapper
    return some_decorator

@n_times(2)
def printer(a, b, c):
    print(a, b, c)

printer(1, 2, c=3)

'''
1 2 3
1 2 3
'''

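'''
Without the decorator syntax this is equivalent to
n_times(2)(printer)(1, 2, c=3), where printer is the undecorated function.
Note that writing n_times(2)(printer(1, 2, c=3)) is wrong:
n_times(2) is some_decorator with n (= 2) remembered,
while printer(1, 2, c=3) returns None,
so passing that to some_decorator accomplishes nothing,
apart from 1 2 3 being printed once by the earlier call printer(1, 2, c=3).
'''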
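One practical refinement, assuming you want the decorated function to keep its own name and docstring and to pass a return value through: functools.wraps from the standard library. The record function and calls list are made up for this sketch:

```python
from functools import wraps

def n_times(n):
    def some_decorator(function):
        @wraps(function)  # copy __name__, __doc__, etc. onto wrapper
        def wrapper(*args, **kwargs):
            result = None
            for _ in range(n):
                result = function(*args, **kwargs)
            return result  # the result of the last call
        return wrapper
    return some_decorator

calls = []

@n_times(2)
def record(a, b, c):
    """Append the arguments to calls and return their sum."""
    calls.append((a, b, c))
    return a + b + c

total = record(1, 2, c=3)
print(total)            # 6
print(len(calls))       # 2
print(record.__name__)  # record
```

Without @wraps, record.__name__ would be 'wrapper', which makes tracebacks and documentation harder to read.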
Reference

Garbage Collection

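The main mechanism of garbage collection is reference counting: an object is reclaimed as soon as its reference count drops to zero. This mechanism cannot be turned off.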
import sys
a = 'my-string'
b = [a]
print(sys.getrefcount(a))
# 4
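The 4 here comes from:
  • the variable a
  • the list b
  • the temporary argument reference created by calling sys.getrefcount
  • the string constant kept by the compiled code (print only receives the returned integer, so it does not add a reference)
The exact number can differ between Python versions and between a script and the interactive prompt.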
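Because the absolute number is fragile, it can be clearer to look at differences. A sketch assuming standard CPython and a fresh object() so that no constant is involved:

```python
import sys

obj = object()                    # a fresh object, not a shared constant
base = sys.getrefcount(obj)       # includes the temporary argument reference
holder = [obj]                    # one more reference, held by the list
with_list = sys.getrefcount(obj)
del holder                        # the list is freed and drops its reference
after_del = sys.getrefcount(obj)

print(with_list - base)  # 1
print(after_del - base)  # 0
```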

Reference Cycles

class MyClass():
    pass
a = MyClass()
a.obj = a
del a
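After the name a is deleted, Python code can no longer reach the instance, but the instance is still in memory: it holds a reference to itself, so its reference count is not zero.
This kind of problem is called a reference cycle, and it takes the generational garbage collector to resolve it. The collector is exposed through the gc module in the standard library and can detect reference cycles.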
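A sketch that makes the cycle observable, using a weak reference as a probe. Automatic collection is disabled for the duration of the demo so the timing is deterministic; probe is a hypothetical name:

```python
import gc
import weakref

class MyClass:
    pass

gc.disable()              # keep the cyclic collector from running on its own
a = MyClass()
a.obj = a                 # the instance now references itself
probe = weakref.ref(a)    # a weak reference does not keep the instance alive
del a

alive_before = probe() is not None
gc.collect()              # the cyclic collector finds and frees the cycle
alive_after = probe() is not None
gc.enable()

print(alive_before)  # True
print(alive_after)   # False
```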

Generational Collection

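The garbage collector tracks container objects in memory and divides them into 3 generations; new objects start in the first generation (generation 0 in the gc module). If an object survives a collection (it was not reclaimed), it moves to the next generation. Three thresholds, one per generation, decide when a collection is triggered: it runs when the count for that generation exceeds the corresponding threshold.
Overall, though, you rarely need to worry about garbage collection in day-to-day code.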
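The thresholds and current counters can be read through the gc module. A short sketch; the values printed depend on the Python version and on what the program has allocated so far:

```python
import gc

thresholds = gc.get_threshold()  # one threshold per generation
counts = gc.get_count()          # current counters, one per generation
print(thresholds)
print(counts)

unreachable = gc.collect()       # force a full collection
print(unreachable >= 0)          # True: the call returns the number of unreachable objects found
```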
Reference

Docstring Formats
