Tag archive: design-patterns

Python design patterns

Question: Python design patterns

I am looking for any resources that give examples of Best Practices, Design Patterns and the SOLID principles using Python.


Answer 0

There is some overlap among these:

Intermediate and Advanced Software Carpentry in Python

Code Like a Pythonista: Idiomatic Python

Python Idioms and Efficiency

Google Developers Day US – Python Design Patterns

Another resource is to learn by example from the Python Recipes. A good number of them do not follow best practices, but you can find some useful patterns in there.


Answer 1

Type

>>> import this

in a Python console.

Although this is usually treated as a (fine!) joke, it contains a couple of valid Python-specific axioms.


Answer 2

Bruce Eckel’s “Thinking in Python” leans heavily on Design Patterns.


Answer 3

You can get started here and here.

For a more in-depth look at design patterns you should look at Design Patterns: Elements of Reusable Object-Oriented Software. The source code is not in Python, but it doesn’t need to be for you to understand the patterns.


Answer 4

Something you can use to simplify your code when calling attributes on objects that might or might not exist is the Null Object Design Pattern (which I was introduced to in the Python Cookbook).

Roughly, the goal with Null objects is to provide an ‘intelligent’ replacement for the often used primitive data type None in Python or Null (or Null pointers) in other languages. These are used for many purposes including the important case where one member of some group of otherwise similar elements is special for whatever reason. Most often this results in conditional statements to distinguish between ordinary elements and the primitive Null value.

This object simply swallows the missing-attribute error, so you can avoid checking whether attributes exist.

It’s nothing more than

class Null(object):

    def __init__(self, *args, **kwargs):
        "Ignore parameters."
        return None

    def __call__(self, *args, **kwargs):
        "Ignore method calls."
        return self

    def __getattr__(self, mname):
        "Ignore attribute requests."
        return self

    def __setattr__(self, name, value):
        "Ignore attribute setting."
        return self

    def __delattr__(self, name):
        "Ignore deleting attributes."
        return self

    def __repr__(self):
        "Return a string representation."
        return "<Null>"

    def __str__(self):
        "Convert to a string and return it."
        return "Null"

With this, if you do Null("any", "params", "you", "want").attribute_that_doesnt_exists() it won’t explode, but just silently become the equivalent of pass.

Normally you’d do something like

if obj.attr:
    obj.attr()

With this, you just do:

obj.attr()

and forget about it. Beware that extensive use of the Null object can potentially hide bugs in your code.
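Below is a minimal usage sketch. It uses a trimmed copy of the `Null` class above, repeating only the methods this example needs so the snippet runs on its own. The hypothetical `process` function takes an optional logger and falls back to `Null()`, so its loop never has to test whether a logger was supplied. `ListLogger` is likewise a made-up stand-in for a real logger.

```python
class Null(object):
    """Trimmed copy of the Null object above: swallows calls and attribute lookups."""

    def __init__(self, *args, **kwargs):
        pass  # ignore constructor arguments

    def __call__(self, *args, **kwargs):
        return self  # calling a Null returns the same Null

    def __getattr__(self, name):
        return self  # any missing attribute is the same Null

    def __repr__(self):
        return "<Null>"


def process(items, logger=None):
    """Hypothetical function with an optional collaborator."""
    logger = logger if logger is not None else Null()
    total = 0
    for item in items:
        logger.info("processing %r", item)  # no "if logger:" check needed
        total += item
    return total


class ListLogger:
    """Hypothetical logger that records messages, to show the non-Null path."""

    def __init__(self):
        self.messages = []

    def info(self, fmt, *args):
        self.messages.append(fmt % args)


print(process([1, 2, 3]))    # 6, the logging calls are silently swallowed
log = ListLogger()
print(process([4, 5], log))  # 9
print(log.messages)          # ['processing 4', 'processing 5']
```

The same caveat from the answer applies. A misspelled method name on a real logger still raises `AttributeError`, but on a `Null` it would be hidden.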


Answer 5

You may also wish to read this article (select the .pdf file), which discusses Design Patterns in dynamic object-oriented languages (e.g. Python). To quote the page:

This paper explores how the patterns from the “Gang of Four”, or “GOF” book, as it is often called, appear when similar problems are addressed using a dynamic, higher-order, object-oriented programming language. Some of the patterns disappear — that is, they are supported directly by language features, some patterns are simpler or have a different focus, and some are essentially unchanged.
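Here is an illustration of a pattern that "disappears". This is our own example, not one taken from the paper. With first-class functions, the GoF Strategy pattern usually reduces to passing a function as an argument. `checkout`, `full_price` and `half_price` are hypothetical names.

```python
from typing import Callable, Iterable


# Each "strategy" is just a function: no Strategy interface, no class hierarchy.
def full_price(amount: float) -> float:
    return amount


def half_price(amount: float) -> float:
    return amount / 2


def checkout(prices: Iterable[float],
             pricing: Callable[[float], float] = full_price) -> float:
    """Total the cart, delegating the pricing rule to the supplied function."""
    return sum(pricing(p) for p in prices)


print(checkout([10.0, 20.0]))              # 30.0
print(checkout([10.0, 20.0], half_price))  # 15.0
```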


Why is IoC / DI not common in Python?

Question: Why is IoC / DI not common in Python?

In Java, IoC / DI is a very common practice, used extensively in web applications, nearly all available frameworks and Java EE. On the other hand, there are also lots of big Python web applications, but besides Zope (which I’ve heard is really horrible to code in) IoC doesn’t seem to be very common in the Python world. (Please name some examples if you think that I’m wrong.)

There are of course several clones of popular Java IoC frameworks available for Python, springpython for example. But none of them seems to get used in practice. At least, I’ve never stumbled upon a Django or sqlalchemy+<insert your favorite wsgi toolkit here> based web application which uses something like that.

In my opinion IoC has reasonable advantages and would make it easy to replace the django-default-user-model, for example, but extensive usage of interface classes and IoC in Python looks a bit odd and not »pythonic«. But maybe someone has a better explanation of why IoC isn’t widely used in Python.


Answer 0

I don’t actually think that DI/IoC are that uncommon in Python. What is uncommon, however, are DI/IoC frameworks/containers.

Think about it: what does a DI container do? It allows you to

  1. wire together independent components into a complete application …
  2. … at runtime.

We have names for “wiring together” and “at runtime”:

  1. scripting
  2. dynamic

So, a DI container is nothing but an interpreter for a dynamic scripting language. Actually, let me rephrase that: a typical Java/.NET DI container is nothing but a crappy interpreter for a really bad dynamic scripting language with butt-ugly, sometimes XML-based, syntax.

When you program in Python, why would you want to use an ugly, bad scripting language when you have a beautiful, brilliant scripting language at your disposal? Actually, that’s a more general question: when you program in pretty much any language, why would you want to use an ugly, bad scripting language when you have Jython and IronPython at your disposal?

So, to recap: the practice of DI/IoC is just as important in Python as it is in Java, for exactly the same reasons. The implementation of DI/IoC however, is built into the language and often so lightweight that it completely vanishes.
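To make "vanishes" concrete, here is a minimal sketch of constructor injection in plain Python, with no container at all. `Greeter` and its `now` parameter are hypothetical. The wiring is ordinary code at the call site, and a test injects a fixed clock instead of patching anything.

```python
import datetime
from typing import Callable


class Greeter:
    """Depends on a clock, but receives it instead of creating it (constructor injection)."""

    def __init__(self, now: Callable[[], datetime.datetime] = datetime.datetime.now):
        self._now = now  # the injected dependency

    def greeting(self) -> str:
        return "Good morning" if self._now().hour < 12 else "Good afternoon"


# Production wiring: just use the default.
real = Greeter()

# Test wiring: pass a fixed clock; no framework or XML involved.
fixed = Greeter(now=lambda: datetime.datetime(2020, 1, 1, 9, 0))
print(fixed.greeting())  # Good morning
```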

(Here’s a brief aside for an analogy: in assembly, a subroutine call is a pretty major deal – you have to save your local variables and registers to memory, save your return address somewhere, change the instruction pointer to the subroutine you are calling, arrange for it to somehow jump back into your subroutine when it is finished, put the arguments somewhere where the callee can find them, and so on. IOW: in assembly, “subroutine call” is a Design Pattern, and before there were languages like Fortran which had subroutine calls built in, people were building their own “subroutine frameworks”. Would you say that subroutine calls are “uncommon” in Python, just because you don’t use subroutine frameworks?)

BTW: for an example of what it looks like to take DI to its logical conclusion, take a look at Gilad Bracha’s Newspeak Programming Language and his writings on the subject.


Answer 1

Part of it is the way the module system works in Python. You can get a sort of “singleton” for free, just by importing it from a module. Define an actual instance of an object in a module, and then any client code can import it and actually get a working, fully constructed / populated object.

This is in contrast to Java, where you don’t import actual instances of objects. This means you are always having to instantiate them yourself, (or use some sort of IoC/DI style approach). You can mitigate the hassle of having to instantiate everything yourself by having static factory methods (or actual factory classes), but then you still incur the resource overhead of actually creating new ones each time.
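The following self-contained sketch shows the module-as-singleton behaviour described above. It writes a hypothetical `app_settings.py` into a temporary directory only so the snippet runs on its own. In a real project that would simply be a file in your package. Python caches imported modules in `sys.modules`, so every importer gets the same, already-constructed `settings` object.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source of a hypothetical module that builds one shared instance at import time.
MODULE_SOURCE = '''
class Settings:
    def __init__(self):
        self.debug = False
        self.db_url = "sqlite://"

settings = Settings()  # created once, when the module is first imported
'''

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "app_settings.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()  # make sure the freshly written file is found
    try:
        first = importlib.import_module("app_settings").settings
        second = importlib.import_module("app_settings").settings  # served from sys.modules
    finally:
        sys.path.remove(tmp)

print(first is second)  # True: one fully built object shared by all importers
first.debug = True
print(second.debug)     # True
```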


Answer 2

IoC and DI are super common in mature Python code. You just don’t need a framework to implement DI thanks to duck typing.

The best example is how you set up a Django application using settings.py:

# settings.py
CACHES = {
    'default': {
        'BACKEND': 'django_redis.cache.RedisCache',
        'LOCATION': REDIS_URL + '/1',
    },
    'local': {
        'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
        'LOCATION': 'snowflake',
    }
}

Django Rest Framework utilizes DI heavily:

class FooView(APIView):
    # The "injected" dependencies:
    permission_classes = (IsAuthenticated, )
    throttle_classes = (ScopedRateThrottle, )
    parser_classes = (parsers.FormParser, parsers.JSONParser, parsers.MultiPartParser)
    renderer_classes = (renderers.JSONRenderer,)

    def get(self, request, *args, **kwargs):
        pass

    def post(self, request, *args, **kwargs):
        pass

Let me remind you (source):

“Dependency Injection” is a 25-dollar term for a 5-cent concept. […] Dependency injection means giving an object its instance variables. […].
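The `FooView` example works because DRF reads the view's collaborators from class attributes. The sketch below imitates that style in plain Python. `BaseView`, `AllowAll`, `DenyAll` and `PrivateView` are hypothetical and are not Django REST Framework's actual classes or dispatch logic.

```python
class AllowAll:
    def has_permission(self, request):
        return True


class DenyAll:
    def has_permission(self, request):
        return False


class BaseView:
    # Default collaborator; subclasses "inject" others by overriding the attribute.
    permission_classes = (AllowAll,)

    def dispatch(self, request):
        # The base class instantiates whatever classes the subclass configured.
        for permission_class in self.permission_classes:
            if not permission_class().has_permission(request):
                return "403 Forbidden"
        return self.get(request)

    def get(self, request):
        return "200 OK"


class PrivateView(BaseView):
    permission_classes = (DenyAll,)  # swapped in without touching dispatch()


print(BaseView().dispatch(request={}))     # 200 OK
print(PrivateView().dispatch(request={}))  # 403 Forbidden
```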


Answer 3

Django makes great use of inversion of control. For instance, the database server is selected by the configuration file, then the framework provides appropriate database wrapper instances to database clients.

The difference is that Python has first-class types. Data types, including classes, are themselves objects. If you want something to use a particular class, simply name the class. For example:

if config_dbms_name == 'postgresql':
    import psycopg
    self.database_interface = psycopg.connect  # store the callable, not the module
elif config_dbms_name == 'mysql':
    ...

Later code can then create a database interface by writing:

my_db_connection = self.database_interface()
# Do stuff with database.

Instead of the boilerplate factory functions that Java and C++ need, Python does it with one or two lines of ordinary code. This is the strength of functional versus imperative programming.
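Here is a runnable variant of the same idea. It uses the standard-library `sqlite3` module instead of psycopg, so it needs no database server. The `BACKENDS` mapping, `make_connection` and `InMemoryStore` are hypothetical. The point is that classes and functions are ordinary objects, so the "factory" is just a dictionary lookup.

```python
import sqlite3


class InMemoryStore:
    """Hypothetical stand-in backend, e.g. for tests."""

    def __init__(self):
        self.rows = []


# Configuration names mapped to callables that build a connection.
BACKENDS = {
    "sqlite": lambda: sqlite3.connect(":memory:"),
    "memory": InMemoryStore,  # a class is itself a callable factory
}


def make_connection(config_dbms_name):
    factory = BACKENDS[config_dbms_name]  # pick the implementation by name
    return factory()


conn = make_connection("sqlite")
print(conn.execute("SELECT 1 + 1").fetchone())  # (2,)
```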


Answer 4

It seems that people really don’t get what dependency injection and inversion of control mean anymore.

The practice of inversion of control is to have classes or functions that depend on other classes or functions but, instead of creating those instances within the class or function code, receive them as parameters, so that loose coupling can be achieved. That has many benefits, such as better testability and adherence to the Liskov substitution principle.

You see, by working with interfaces and injection your code becomes more maintainable, since you can change its behavior easily. You won’t have to rewrite a single line of your class (maybe a line or two of the DI configuration) to change its behavior, because the classes that implement the interface your class expects can vary independently as long as they follow that interface. One of the best strategies to keep code decoupled and easy to maintain is to follow at least the single responsibility, substitution and dependency inversion principles.
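This is a minimal sketch of "classes that implement the interface your class is waiting for can vary independently". It uses `typing.Protocol` (Python 3.8+) as the interface. `Notifier`, `EmailNotifier`, `FakeNotifier` and `SignupService` are hypothetical names, and `EmailNotifier` only prints instead of sending mail.

```python
from typing import List, Protocol, Tuple


class Notifier(Protocol):
    """The interface SignupService depends on; any object with send() fits."""

    def send(self, to: str, message: str) -> None: ...


class EmailNotifier:
    def send(self, to: str, message: str) -> None:
        print(f"emailing {to}: {message}")  # placeholder for real mail delivery


class FakeNotifier:
    """Test double: records messages instead of sending them."""

    def __init__(self) -> None:
        self.sent: List[Tuple[str, str]] = []

    def send(self, to: str, message: str) -> None:
        self.sent.append((to, message))


class SignupService:
    def __init__(self, notifier: Notifier) -> None:
        self._notifier = notifier  # injected; SignupService never constructs it

    def register(self, email: str) -> None:
        self._notifier.send(email, "Welcome!")


SignupService(EmailNotifier()).register("bob@example.com")  # production wiring
fake = FakeNotifier()
SignupService(fake).register("ann@example.com")             # test wiring
print(fake.sent)  # [('ann@example.com', 'Welcome!')]
```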

What’s a DI library good for if you can instantiate an object yourself inside a package and import it to inject it yourself? The chosen answer is right. Java has no procedural sections (code outside of classes), so all of that goes into boring XML configuration files, hence the need for a class to instantiate and inject dependencies in a lazy-loading fashion so you don’t blow away your performance. In Python, you just code the injections in the “procedural” (code outside classes) sections of your code.


Answer 5

Haven’t used Python in several years, but I would say that it has more to do with it being a dynamically typed language than anything else. For a simple example, in Java, if I wanted to test that something wrote to standard out appropriately, I could use DI and pass in any PrintStream to capture the text being written and verify it. When I’m working in Ruby, however, I can dynamically replace the ‘puts’ method on STDOUT to do the verification, leaving DI completely out of the picture. If the only reason I’m creating an abstraction is to test the class that’s using it (think file system operations or the clock in Java), then DI/IoC creates unnecessary complexity in the solution.
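The Python counterpart of replacing `puts` is to swap `sys.stdout` temporarily. One way is the standard library's `contextlib.redirect_stdout`, which means the code under test needs no injected stream. `report` is a hypothetical function.

```python
import contextlib
import io


def report(total):
    # Writes straight to standard output; there is no stream parameter to inject.
    print(f"Total: {total}")


buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):  # temporarily replaces sys.stdout
    report(42)

captured = buffer.getvalue()
print(repr(captured))  # 'Total: 42\n'
```

`unittest.mock.patch` is another common way to replace an attribute for the duration of a test.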


Answer 6

Actually, it is quite easy to write sufficiently clean and compact code with DI (I wonder whether it will be/stay pythonic then, but anyway :) ). For example, I actually prefer this way of coding:

def polite(name_str):
    return "dear " + name_str

def rude(name_str):
    return name_str + ", you, moron"

def greet(name_str, call=polite):
    print("Hello, " + call(name_str) + "!")

>>> greet("Peter")
Hello, dear Peter!
>>> greet("Jack", rude)
Hello, Jack, you, moron!

Yes, this can be viewed as just a simple form of parameterizing functions/classes, but it does its work. So, maybe Python’s default-included batteries are enough here too.

P.S. I have also posted a larger example of this naive approach at Dynamically evaluating simple boolean logic in Python.


Answer 7

IoC/DI is a design concept, but unfortunately it’s often taken as a concept that applies only to certain languages (or typing systems). I’d love to see dependency injection containers become far more popular in Python. There’s Spring Python, but that’s a super-framework and seems to be a direct port of the Java concepts without much consideration for “The Python Way.”

Given function annotations in Python 3, I decided to have a crack at a full-featured, but simple, dependency injection container: https://github.com/zsims/dic . It’s based on some concepts from a .NET dependency injection container (which IMO is fantastic if you’re ever playing in that space), but adapted to Python concepts.
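This toy autowiring sketch shows why annotations make such a container small. It reads constructor type hints with `typing.get_type_hints`. It is an illustration only, not the API or implementation of `dic`. `Container`, `register`, `resolve` and the example classes are hypothetical.

```python
import inspect
import typing


class Container:
    """Toy DI container: builds objects by resolving constructor type hints."""

    def __init__(self):
        self._providers = {}  # requested type -> class or callable that provides it

    def register(self, interface, provider=None):
        self._providers[interface] = provider or interface

    def resolve(self, interface):
        provider = self._providers.get(interface, interface)
        if not inspect.isclass(provider) or provider.__init__ is object.__init__:
            return provider()  # plain callables, or classes without constructor args
        hints = typing.get_type_hints(provider.__init__)
        hints.pop("return", None)
        # Recursively resolve every annotated constructor parameter by its type.
        kwargs = {name: self.resolve(tp) for name, tp in hints.items()}
        return provider(**kwargs)


class Database:
    def query(self):
        return ["row"]


class FakeDatabase(Database):
    def query(self):
        return []


class Repository:
    def __init__(self, db: Database):
        self.db = db


container = Container()
print(container.resolve(Repository).db.query())  # ['row']

container.register(Database, FakeDatabase)        # swap the implementation in one place
print(container.resolve(Repository).db.query())  # []
```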


Answer 8

I think that, due to the dynamic nature of Python, people don’t often see the need for another dynamic framework. When a class inherits from the new-style ‘object’, you can attach new attributes to it dynamically (https://wiki.python.org/moin/NewClassVsClassicClass).

For example, in plain Python:

#application.py
class Application(object):
    def __init__(self):
        pass

#main.py
Application.postgres_connection = PostgresConnection()

#other.py
postgres_connection = Application.postgres_connection
db_data = postgres_connection.fetchone()

However, have a look at https://github.com/noodleflake/pyioc; this might be what you are looking for.

For example, with pyioc:

from libs.service_locator import ServiceLocator

#main.py
ServiceLocator.register(PostgresConnection)

#other.py
postgres_connection = ServiceLocator.resolve(PostgresConnection)
db_data = postgres_connection.fetchone()
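For comparison, the service-locator idea can be sketched in a few lines of plain Python. This `ServiceLocator` is hypothetical and is not pyioc's implementation. `FakeConnection` stands in for `PostgresConnection` so the snippet runs.

```python
class ServiceLocator:
    """Toy registry mapping a key (usually a class) to a lazily created shared instance."""

    _factories = {}
    _instances = {}

    @classmethod
    def register(cls, key, factory=None):
        cls._factories[key] = factory or key

    @classmethod
    def resolve(cls, key):
        if key not in cls._instances:
            cls._instances[key] = cls._factories[key]()  # build on first use
        return cls._instances[key]


class FakeConnection:
    """Stand-in for PostgresConnection."""

    def fetchone(self):
        return (1, "alice")


# main.py
ServiceLocator.register(FakeConnection)

# other.py
conn = ServiceLocator.resolve(FakeConnection)
print(conn.fetchone())                                 # (1, 'alice')
print(conn is ServiceLocator.resolve(FakeConnection))  # True
```

A service locator is usually distinguished from dependency injection proper: `other.py` still asks for its dependency instead of receiving it.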

Answer 9

I back Jörg W Mittag’s answer: “The Python implementation of DI/IoC is so lightweight that it completely vanishes”.

To back up this statement, take a look at Martin Fowler’s famous example ported from Java to Python: Python:Design_Patterns:Inversion_of_Control

As you can see from the above link, a “Container” in Python can be written in 8 lines of code:

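import inspect

class Container:
    def __init__(self, system_data):
        # system_data: iterable of (name, class_or_value, names_of_dependencies)
        for component_name, component_class, component_args in system_data:
            if inspect.isclass(component_class):  # types.ClassType was Python 2 only
                args = [self.__dict__[arg] for arg in component_args]
                self.__dict__[component_name] = component_class(*args)
            else:
                # Plain values (strings, numbers, ...) are stored as-is.
                self.__dict__[component_name] = component_class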
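Here is a usage sketch in the spirit of Fowler's movie-lister example. `MovieFinder`, `MovieLister` and the component names are simplified stand-ins rather than Fowler's code. The `Container` is repeated so the snippet runs on its own, using `inspect.isclass` instead of the Python 2-only `types.ClassType`.

```python
import inspect


class Container:
    def __init__(self, system_data):
        for component_name, component_class, component_args in system_data:
            if inspect.isclass(component_class):
                args = [self.__dict__[arg] for arg in component_args]
                self.__dict__[component_name] = component_class(*args)
            else:
                self.__dict__[component_name] = component_class


class MovieFinder:
    def __init__(self, movies):
        self.movies = movies

    def find_all(self):
        return list(self.movies)


class MovieLister:
    def __init__(self, finder):
        self.finder = finder  # injected by the container

    def movies_directed_by(self, director):
        return [title for title, who in self.finder.find_all() if who == director]


system_data = [
    ("movies", [("Alien", "Ridley Scott"), ("Heat", "Michael Mann")], None),  # plain value
    ("finder", MovieFinder, ["movies"]),
    ("lister", MovieLister, ["finder"]),
]

c = Container(system_data)
print(c.lister.movies_directed_by("Michael Mann"))  # ['Heat']
```

The container builds components in list order, so each component must appear after the components it depends on.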

Answer 10

My 2 cents is that in most Python applications you don’t need it and, even if you needed it, chances are that many Java haters (and incompetent fiddlers who believe themselves to be developers) consider it something bad, just because it’s popular in Java.

An IoC system is actually useful when you have complex networks of objects, where each object may be a dependency for several others and, in turn, itself depend on other objects. In such a case you’ll want to define all these objects once and have a mechanism to put them together automatically, based on as many implicit rules as possible. If you also have configuration to be defined in a simple way by the application user/administrator, that’s an additional reason to desire an IoC system that can read its components from something like a simple XML file (which would be the configuration).

The typical Python application is much simpler, just a bunch of scripts, without such a complex architecture. Personally I’m aware of what IoC actually is (contrary to those who wrote certain answers here) and I’ve never felt the need for it in my limited Python experience (also, I don’t use Spring everywhere, only where the advantages it gives justify its development overhead).

That said, there are Python situations where the IoC approach is actually useful and, in fact, I read here that Django uses it.

The same reasoning above could be applied to Aspect Oriented Programming in the Java world, with the difference that the number of cases where AOP is really worthwhile is even more limited.


Answer 11

pytest fixtures are all based on DI (source).
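pytest matches a test function's parameter names against fixture names and passes in the fixture values. The toy below imitates only that lookup-by-name idea, using `inspect.signature`. `fixture`, `run_test` and `sample_check` are hypothetical. Real pytest also handles scopes, teardown, parametrization and much more.

```python
import inspect

_FIXTURES = {}


def fixture(func):
    """Register a fixture under its function name (toy version of @pytest.fixture)."""
    _FIXTURES[func.__name__] = func
    return func


def run_test(test_func):
    # Build each argument by calling the fixture of the same name,
    # resolving the fixture's own parameters recursively.
    def build(name):
        fix = _FIXTURES[name]
        deps = {p: build(p) for p in inspect.signature(fix).parameters}
        return fix(**deps)

    kwargs = {name: build(name) for name in inspect.signature(test_func).parameters}
    return test_func(**kwargs)


@fixture
def config():
    return {"db_url": "sqlite://"}


@fixture
def db(config):  # fixtures can depend on other fixtures
    return {"url": config["db_url"], "rows": []}


def sample_check(db):
    db["rows"].append("x")
    return db["rows"]


print(run_test(sample_check))  # ['x']
```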


Answer 12

I agree with @Jorg on the point that DI/IoC is possible, easier and even more beautiful in Python. What’s missing are frameworks supporting it, but there are a few exceptions. To point out a couple of examples that come to mind:

  • Django comments let you wire your own Comment class with your custom logic and forms. [More Info]

  • Django lets you use a custom Profile object attached to your User model. This is not completely IoC but is a good approach. Personally I’d like to replace the whole User model, as the comments framework does. [More Info]


Answer 13

In my opinion, things like dependency injection are symptoms of a rigid and over-complex framework. When the main body of code becomes much too weighty to change easily, you find yourself having to pick small parts of it, define interfaces for them, and then allow people to change behaviour via the objects that plug into those interfaces. That’s all well and good, but it’s better to avoid that sort of complexity in the first place.

It’s also the symptom of a statically-typed language. When the only tool you have to express abstraction is inheritance, then that’s pretty much what you use everywhere. Having said that, C++ is pretty similar but never picked up the fascination with Builders and Interfaces everywhere that Java developers did. It is easy to get over-exuberant with the dream of being flexible and extensible at the cost of writing far too much generic code with little real benefit. I think it’s a cultural thing.

Typically I think Python people are used to picking the right tool for the job, which is a coherent and simple whole, rather than the One True Tool (With A Thousand Possible Plugins) that can do anything but offers a bewildering array of possible configuration permutations. There are still interchangeable parts where necessary, but with no need for the big formalism of defining fixed interfaces, due to the flexibility of duck-typing and the relative simplicity of the language.


Answer 14

Unlike Java, with its strongly typed nature, Python’s duck typing makes it very easy to pass objects around.

Java developers focus on constructing the class structure and the relations between objects, while keeping things flexible. IoC is extremely important for achieving this.

Python developers focus on getting the work done. They just wire up classes when they need to. They don’t even have to worry about the type of the class. As long as it can quack, it’s a duck! This nature leaves no room for IoC.


Why is __init__() always called after __new__()?

Question: Why is __init__() always called after __new__()?

I’m just trying to streamline one of my classes and have introduced some functionality in the same style as the flyweight design pattern.

However, I’m a bit confused as to why __init__ is always called after __new__. I wasn’t expecting this. Can anyone tell me why this is happening and how I can implement this functionality otherwise? (Apart from putting the implementation into __new__, which feels quite hacky.)

Here’s an example:

class A(object):
    _dict = dict()

    def __new__(cls):
        if 'key' in A._dict:
            print("EXISTS")
            return A._dict['key']
        else:
            print("NEW")
            return super(A, cls).__new__(cls)

    def __init__(self):
        print("INIT")
        A._dict['key'] = self
        print("")

a1 = A()
a2 = A()
a3 = A()

Outputs:

NEW
INIT

EXISTS
INIT

EXISTS
INIT

Why?


Answer 0

Use __new__ when you need to control the creation of a new instance.

Use __init__ when you need to control initialization of a new instance.

__new__ is the first step of instance creation. It’s called first, and is responsible for returning a new instance of your class.

In contrast, __init__ doesn’t return anything; it’s only responsible for initializing the instance after it’s been created.
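This small trace shows the two steps and the rule behind the question. Calling a class runs `__new__`, then runs `__init__` on whatever `__new__` returned, provided the result is an instance of that class. Returning a cached instance therefore re-runs `__init__`, while returning something that is not an instance skips it. `Traced` and `NotAnInstance` are hypothetical classes.

```python
calls = []


class Traced:
    def __new__(cls, *args):
        calls.append("__new__")
        return super().__new__(cls)  # object.__new__ only needs the class

    def __init__(self, value):
        calls.append("__init__")
        self.value = value


class NotAnInstance:
    def __new__(cls):
        calls.append("__new__")
        return 42  # not a NotAnInstance, so __init__ is skipped

    def __init__(self):
        calls.append("__init__")  # never reached


t = Traced(7)
print(calls)     # ['__new__', '__init__']

calls.clear()
n = NotAnInstance()
print(n, calls)  # 42 ['__new__']
```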

In general, you shouldn’t need to override __new__ unless you’re subclassing an immutable type like str, int, unicode or tuple.

From an April 2008 post, “When to use __new__ vs. __init__?”, on mail.python.org.

You should consider that what you are trying to do is usually done with a Factory and that’s the best way to do it. Using __new__ is not a good clean solution so please consider the usage of a factory. Here you have a good factory example.


Answer 1

__new__ is a static method (it receives the class as its first argument), while __init__ is an instance method. __new__ has to create the instance first, so that __init__ can initialize it. Note that __init__ takes self as a parameter. Until you create an instance, there is no self.

Now, I gather that you’re trying to implement the singleton pattern in Python. There are a few ways to do that.

Also, as of Python 2.6, you can use class decorators.

def singleton(cls):
    instances = {}
    def getinstance():
        if cls not in instances:
            instances[cls] = cls()
        return instances[cls]
    return getinstance

@singleton
class MyClass:
  ...
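Here is a quick check of what the decorator does, with an illustrative class body filled in (the `init_calls` counter is hypothetical). After decoration, the name `MyClass` refers to the inner `getinstance` function, not to the class. So, for example, `isinstance(obj, MyClass)` raises `TypeError`.

```python
def singleton(cls):
    instances = {}

    def getinstance():
        if cls not in instances:
            instances[cls] = cls()  # __new__ and __init__ run only on the first call
        return instances[cls]

    return getinstance


@singleton
class MyClass:
    init_calls = 0

    def __init__(self):
        type(self).init_calls += 1  # "MyClass" now names the function, so use type(self)


a = MyClass()
b = MyClass()
print(a is b)                  # True
print(type(a).init_calls)      # 1
print(type(MyClass).__name__)  # function
```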

Answer 2

In most well-known OO languages, an expression like SomeClass(arg1, arg2) will allocate a new instance, initialise the instance’s attributes, and then return it.

In most well-known OO languages, the “initialise the instance’s attributes” part can be customised for each class by defining a constructor, which is basically just a block of code that operates on the new instance (using the arguments provided to the constructor expression) to set up whatever initial conditions are desired. In Python, this corresponds to the class’ __init__ method.

Python’s __new__ is nothing more and nothing less than similar per-class customisation of the “allocate a new instance” part. This of course allows you to do unusual things such as returning an existing instance rather than allocating a new one. So in Python, we shouldn’t really think of this part as necessarily involving allocation; all that we require is that __new__ comes up with a suitable instance from somewhere.

But it’s still only half of the job, and there’s no way for the Python system to know that sometimes you want to run the other half of the job (__init__) afterwards and sometimes you don’t. If you want that behavior, you have to say so explicitly.

Often, you can refactor so you only need __new__, or so you don’t need __new__, or so that __init__ behaves differently on an already-initialised object. But if you really want to, Python does actually allow you to redefine “the job”, so that SomeClass(arg1, arg2) doesn’t necessarily call __new__ followed by __init__. To do this, you need to create a metaclass, and define its __call__ method.

A metaclass is just the class of a class. And a class's __call__ method controls what happens when you call instances of the class. So a metaclass's __call__ method controls what happens when you call a class; i.e. it allows you to redefine the instance-creation mechanism from start to finish. This is the level at which you can most elegantly implement a completely non-standard instance creation process such as the singleton pattern. In fact, with fewer than 10 lines of code you can implement a Singleton metaclass that doesn't even require you to futz with __new__ at all, and that can turn any otherwise-normal class into a singleton by simply adding __metaclass__ = Singleton (in Python 3, write class MyClass(metaclass=Singleton) instead)!

class Singleton(type):
    def __init__(self, *args, **kwargs):
        super(Singleton, self).__init__(*args, **kwargs)
        self.__instance = None
    def __call__(self, *args, **kwargs):
        if self.__instance is None:
            self.__instance = super(Singleton, self).__call__(*args, **kwargs)
        return self.__instance

However this is probably deeper magic than is really warranted for this situation!
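Python 3 ignores a __metaclass__ class attribute, so there the same idea is written with the metaclass= keyword. The sketch below is not a drop-in copy of the code above. The cache attribute is renamed to _instance for readability. The Config class and its init_count counter are hypothetical and only show that __init__ runs once.

```python
class Singleton(type):
    """Metaclass: the first call creates the instance, later calls reuse it."""

    def __init__(cls, *args, **kwargs):
        super().__init__(*args, **kwargs)
        cls._instance = None  # one slot per class created with this metaclass

    def __call__(cls, *args, **kwargs):
        # Overriding type.__call__ means neither __new__ nor __init__
        # runs on later calls: the cached object is returned directly.
        if cls._instance is None:
            cls._instance = super().__call__(*args, **kwargs)
        return cls._instance


class Config(metaclass=Singleton):
    init_count = 0  # hypothetical counter, only here to show __init__ runs once

    def __init__(self, name):
        Config.init_count += 1
        self.name = name


a = Config("first")
b = Config("second")  # arguments ignored: the existing instance is returned
print(a is b, a.name, Config.init_count)  # True first 1
```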


Answer 3

To quote the documentation:

Typical implementations create a new instance of the class by invoking the superclass's __new__() method using super(currentclass, cls).__new__(cls[, ...]) with appropriate arguments and then modifying the newly-created instance as necessary before returning it.

If __new__() does not return an instance of cls, then the new instance’s __init__() method will not be invoked.

__new__() is intended mainly to allow subclasses of immutable types (like int, str, or tuple) to customize instance creation.
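A short Python 3 sketch shows both documented points. UpperStr and NotMe are hypothetical names used only for illustration. An immutable str subclass has to fix its value in __new__. If __new__ returns something that is not an instance of cls, __init__ is skipped.

```python
class UpperStr(str):
    """A str subclass that normalises its value at creation time."""

    def __new__(cls, value):
        # str is immutable, so the value must be fixed here; by the time
        # __init__ would run, the string contents can no longer change.
        return super().__new__(cls, value.upper())


class NotMe:
    def __new__(cls):
        # Returning something that is not an instance of cls...
        return 123

    def __init__(self):
        # ...means Python skips this method entirely.
        raise RuntimeError("never reached")


print(UpperStr("hello"))  # HELLO
print(NotMe())            # 123
```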


Answer 4

I realize that this question is quite old but I had a similar issue. The following did what I wanted:

class Agent(object):
    _agents = dict()  # cache: number -> Agent instance

    def __new__(cls, *p):
        number = p[0]
        if number not in cls._agents:
            cls._agents[number] = object.__new__(cls)
        return cls._agents[number]

    def __init__(self, number):
        # Note: this still runs on every Agent(...) call, including cache hits.
        self.number = number

    def __eq__(self, rhs):
        return self.number == rhs.number

assert Agent("a") is Agent("a")

I used this page as a resource http://infohost.nmt.edu/tcc/help/pubs/python/web/new-new-method.html


Answer 5

I think the simple answer to this question is that if __new__ returns an instance of the class (cls or a subclass of it), the __init__ method executes; otherwise it won't. In this case your code returns A._dict('key'), which is an instance of cls, so __init__ will be executed.
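Here is a small sketch of that rule. The Cached class and its counter are hypothetical. Returning a cached object from __new__ does not stop Python from calling __init__ on it again, because the object is still an instance of cls.

```python
class Cached:
    _cache = {}
    init_calls = 0  # counter used only to make the re-initialisation visible

    def __new__(cls, key):
        # Reuse an existing object for the same key (cache hit)...
        if key not in cls._cache:
            cls._cache[key] = super().__new__(cls)
        return cls._cache[key]

    def __init__(self, key):
        # ...but because the returned object is an instance of cls,
        # Python calls __init__ on it again every time.
        Cached.init_calls += 1
        self.key = key


x = Cached("k")
y = Cached("k")
print(x is y, Cached.init_calls)  # True 2
```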


Answer 6

__new__返回相同类的实例时,__init__随后在返回的对象上运行。即您不能使用它__new__来阻止__init__运行。即使您从中返回先前创建的对象__new__,也将__init__一次又一次地将其初始化为double(三重,等等)。

这是Singleton模式的通用方法,它在上面扩展了vartec答案并对其进行了修复:

def SingletonClass(cls):
    class Single(cls):
        __doc__ = cls.__doc__
        _initialized = False
        _instance = None

        def __new__(cls, *args, **kwargs):
            if not cls._instance:
                cls._instance = super(Single, cls).__new__(cls, *args, **kwargs)
            return cls._instance

        def __init__(self, *args, **kwargs):
            if self._initialized:
                return
            super(Single, self).__init__(*args, **kwargs)
            self.__class__._initialized = True  # Its crucial to set this variable on the class!
    return Single

全文在这里

实际上涉及的另一种方法__new__是使用类方法:

class Singleton(object):
    __initialized = False

    def __new__(cls, *args, **kwargs):
        if not cls.__initialized:
            cls.__init__(*args, **kwargs)
            cls.__initialized = True
        return cls


class MyClass(Singleton):
    @classmethod
    def __init__(cls, x, y):
        print "init is here"

    @classmethod
    def do(cls):
        print "doing stuff"

请注意,通过这种方法,您需要用修饰所有方法@classmethod,因为您将永远不会使用的任何实际实例MyClass

When __new__ returns an instance of the same class, __init__ is run afterwards on the returned object. I.e. you can NOT use __new__ to prevent __init__ from being run. Even if you return a previously created object from __new__, __init__ will initialize it again and again (twice, three times, etc.).

Here is a generic approach to the Singleton pattern, which extends vartec's answer above and fixes it:

def SingletonClass(cls):
    class Single(cls):
        __doc__ = cls.__doc__
        _initialized = False
        _instance = None

        def __new__(cls, *args, **kwargs):
            if cls._instance is None:
                # Don't forward args: object.__new__ rejects extra arguments
                # once __new__ is overridden (forward them only if the wrapped
                # class defines its own __new__ that needs them).
                cls._instance = super(Single, cls).__new__(cls)
            return cls._instance

        def __init__(self, *args, **kwargs):
            if self._initialized:
                return
            super(Single, self).__init__(*args, **kwargs)
            self.__class__._initialized = True  # It's crucial to set this variable on the class!
    return Single

Full story is here.

Another approach, which does in fact involve __new__, is to use classmethods:

class Singleton(object):
    __initialized = False

    def __new__(cls, *args, **kwargs):
        if not cls.__initialized:
            cls.__init__(*args, **kwargs)
            cls.__initialized = True
        return cls


class MyClass(Singleton):
    @classmethod
    def __init__(cls, x, y):
        print("init is here")

    @classmethod
    def do(cls):
        print("doing stuff")

Please note that with this approach you need to decorate ALL of your methods with @classmethod, because you'll never use an actual instance of MyClass.


Answer 7

Referring to this doc:

When subclassing immutable built-in types like numbers and strings, and occasionally in other situations, the static method __new__ comes in handy. __new__ is the first step in instance construction, invoked before __init__.

The __new__ method is called with the class as its first argument; its responsibility is to return a new instance of that class.

Compare this to __init__: __init__ is called with an instance as its first argument, and it doesn't return anything; its responsibility is to initialize the instance.

There are situations where a new instance is created without calling __init__ (for example when the instance is loaded from a pickle). There is no way to create a new instance without calling __new__ (although in some cases you can get away with calling a base class's __new__).

Regarding what you want to achieve, the same doc also has information about the Singleton pattern:

class Singleton(object):
    def __new__(cls, *args, **kwds):
        it = cls.__dict__.get("__it__")
        if it is not None:
            return it
        cls.__it__ = it = object.__new__(cls)
        it.init(*args, **kwds)
        return it
    def init(self, *args, **kwds):
        pass

You may also use this implementation from PEP 318, which uses a decorator:

def singleton(cls):
    instances = {}
    def getinstance():
        if cls not in instances:
            instances[cls] = cls()
        return instances[cls]
    return getinstance

@singleton
class MyClass:
    ...

Answer 8

class M(type):
    _dict = {}

    def __call__(cls, key):
        if key in cls._dict:
            print 'EXISTS'
            return cls._dict[key]
        else:
            print 'NEW'
            instance = super(M, cls).__call__(key)
            cls._dict[key] = instance
            return instance

class A(object):
    __metaclass__ = M

    def __init__(self, key):
        print 'INIT'
        self.key = key
        print

a1 = A('aaa')
a2 = A('bbb')
a3 = A('aaa')

outputs:

NEW
INIT

NEW
INIT

EXISTS

NB: As a side effect, the M._dict attribute automatically becomes accessible from A as A._dict, so take care not to overwrite it accidentally.
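The same side effect also means that every class using M shares one cache. The sketch below is a Python 3 translation that uses the metaclass= keyword and drops the print calls; classes A and B are illustrative. It shows a key created through one class being returned for another.

```python
class M(type):
    _dict = {}  # defined once on the metaclass, so every class using M shares it

    def __call__(cls, key):
        if key in cls._dict:          # cls._dict resolves to M._dict
            return cls._dict[key]
        instance = super().__call__(key)
        cls._dict[key] = instance
        return instance


class A(metaclass=M):
    def __init__(self, key):
        self.key = key


class B(metaclass=M):
    def __init__(self, key):
        self.key = key


a = A("aaa")
b = B("aaa")  # same key, different class
print(type(b).__name__, a is b, A._dict is M._dict)  # A True True
```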


Answer 9

__new__ should return a new, blank instance of a class. __init__ is then called to initialise that instance. You're not calling __init__ in the "NEW" case of __new__, so it's being called for you. The code that is calling __new__ doesn't keep track of whether __init__ has been called on a particular instance or not, nor should it, because you're doing something very unusual here.

You could add an attribute to the object in the __init__ function to indicate that it’s been initialised. Check for the existence of that attribute as the first thing in __init__ and don’t proceed any further if it has been.
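Here is a minimal sketch of that flag approach. It assumes a name-keyed cache in __new__. The Registry class and the _initialised attribute are hypothetical names.

```python
class Registry:
    _instances = {}

    def __new__(cls, name):
        # Return the cached object for this name, creating it on first use.
        if name not in cls._instances:
            cls._instances[name] = super().__new__(cls)
        return cls._instances[name]

    def __init__(self, name):
        # Python calls __init__ after every __new__ that returns an instance,
        # so bail out early if this object was already set up.
        if getattr(self, "_initialised", False):
            return
        self._initialised = True
        self.name = name
        self.items = []


r1 = Registry("x")
r1.items.append(1)
r2 = Registry("x")          # same object; __init__ returns early
print(r1 is r2, r2.items)   # True [1]
```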


Answer 10

An update to @AntonyHatchkins's answer: you probably want a separate dictionary of instances for each class of the metatype. This means you should have an __init__ method in the metaclass that initializes each class object with its own dictionary, instead of making the dictionary global across all the classes.

class MetaQuasiSingleton(type):
    def __init__(cls, name, bases, attributes):
        super().__init__(name, bases, attributes)
        cls._dict = {}  # a fresh cache for each class created with this metaclass

    def __call__(cls, key):
        if key in cls._dict:
            print('EXISTS')
            instance = cls._dict[key]
        else:
            print('NEW')
            instance = super().__call__(key)
            cls._dict[key] = instance
        return instance

class A(metaclass=MetaQuasiSingleton):
    def __init__(self, key):
        print('INIT')
        self.key = key
        print()

I have gone ahead and updated the original code with an __init__ method and changed the syntax to Python 3 notation (no-arg call to super and metaclass in the class arguments instead of as an attribute).

Either way, the important point here is that your class initializer (__call__ method) will not execute either __new__ or __init__ if the key is found. This is much cleaner than using __new__, which requires you to mark the object if you want to skip the default __init__ step.


Answer 11

Digging a little deeper into that!

The type of a generic class in CPython is type, and its base class is object (unless you explicitly define another base class or use a different metaclass). The sequence of low-level calls can be found here. The first function called is type_call, which then calls tp_new and then tp_init.

The interesting part here is that tp_new will call object's (the base class's) new method, object_new, which does a tp_alloc (PyType_GenericAlloc) to allocate the memory for the object :)

At that point the object is created in memory and then the __init__ method gets called. If __init__ is not implemented in your class then the object_init gets called and it does nothing :)

Then type_call just returns the object which binds to your variable.
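The same sequence can be modelled in pure Python. This is a simplified sketch, not CPython's actual code. emulate_type_call and Point are hypothetical, and details such as the one-argument type(x) special case are left out.

```python
def emulate_type_call(cls, *args, **kwargs):
    """Rough pure-Python model of what calling a class does (CPython's type_call)."""
    obj = cls.__new__(cls, *args, **kwargs)        # tp_new step
    if isinstance(obj, cls):                        # __init__ only for instances of cls
        type(obj).__init__(obj, *args, **kwargs)    # tp_init step
    return obj


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y


p = emulate_type_call(Point, 1, 2)
print(type(p).__name__, p.x, p.y)  # Point 1 2
```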


Answer 12

One should look at __init__ as a simple constructor in traditional OO languages. For example, if you are familiar with Java or C++, the constructor is passed a pointer to its own instance implicitly. In the case of Java, it is the this variable. If one were to inspect the bytecode generated for Java, one would notice two steps: first a new instruction, which allocates the object, and then a call to the <init> method (the compiled form of the user-defined constructor). This two-step process enables creation of the actual instance before calling the constructor method of the class, which is just another method of that instance.

Now, in the case of Python, __new__ is an added facility that is accessible to the user. Java does not provide that flexibility, due to its typed nature. If a language provides that facility, then the implementor of __new__ can do many things in that method before returning the instance, including creating a totally new instance of an unrelated object in some cases. This approach also works out especially well for immutable types in Python.


Answer 13

However, I’m a bit confused as to why __init__ is always called after __new__.

I think the C++ analogy would be useful here:

  1. __new__ simply allocates memory for the object. The instance variables of an object need memory to hold them, and this is what the __new__ step does.

  2. __init__ initializes the internal variables of the object to specific values (which could be defaults).


Answer 14

The __init__ is called after __new__ so that when you override it in a subclass, your added code will still get called.

If you are trying to subclass a class that already has a __new__, someone unaware of this might start by adapting the __init__ and forwarding the call down to the subclass __init__. This convention of calling __init__ after __new__ helps that work as expected.

The __init__ still needs to allow for any parameters the superclass __new__ needed, but failing to do so will usually create a clear runtime error. And the __new__ should probably explicitly allow for *args and **kw, to make it clear that extension is OK.

It is generally bad form to have both __new__ and __init__ in the same class at the same level of inheritance, because of the behavior the original poster described.
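Here is a sketch of a __new__ that explicitly accepts *args and **kwargs, so that a subclass can add its own __init__ parameters. Base, Child and the created_by attribute are hypothetical.

```python
class Base:
    def __new__(cls, *args, **kwargs):
        # Accept (and ignore) extra arguments so subclasses can add their own
        # __init__ parameters without having to override __new__ as well.
        instance = super().__new__(cls)
        instance.created_by = "Base.__new__"
        return instance


class Child(Base):
    def __init__(self, label, *, verbose=False):
        # Runs after Base.__new__, with the same arguments the caller passed.
        self.label = label
        self.verbose = verbose


c = Child("demo", verbose=True)
print(c.created_by, c.label, c.verbose)  # Base.__new__ demo True
```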


Answer 15

However, I’m a bit confused as to why __init__ is always called after __new__.

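Not much of a reason other than that it just is done that way. __new__ doesn't have the responsibility of initializing the instance; some other method does (the metaclass's __call__, which for ordinary classes is type.__call__: it calls __new__ and then, if the result is an instance of the class, __init__).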

I wasn’t expecting this. Can anyone tell me why this is happening and how I implement this functionality otherwise? (apart from putting the implementation into the __new__ which feels quite hacky).

You could have __init__ do nothing if it’s already been initialized, or you could write a new metaclass with a new __call__ that only calls __init__ on new instances, and otherwise just returns __new__(...).
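Here is a minimal sketch of the metaclass option. It assumes that a marker attribute (_ready, hypothetical) is an acceptable way to tell new instances from reused ones. InitOnce and Shared are illustrative names.

```python
class InitOnce(type):
    """Metaclass whose __call__ runs __init__ only on freshly created objects."""

    def __call__(cls, *args, **kwargs):
        obj = cls.__new__(cls, *args, **kwargs)
        # Assumption for this sketch: an object that already carries the
        # _ready marker was reused by __new__ and must not be re-initialised.
        if isinstance(obj, cls) and not getattr(obj, "_ready", False):
            obj.__init__(*args, **kwargs)
            obj._ready = True
        return obj


class Shared(metaclass=InitOnce):
    _cache = {}

    def __new__(cls, key):
        if key not in cls._cache:
            cls._cache[key] = super().__new__(cls)
        return cls._cache[key]

    def __init__(self, key):
        self.key = key
        self.calls = getattr(self, "calls", 0) + 1


s1 = Shared("a")
s2 = Shared("a")
print(s1 is s2, s1.calls)  # True 1
```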


Answer 16

The simple reason is that __new__ is used for creating an instance, while __init__ is used for initializing the instance. Before it can be initialized, the instance has to be created. That's why __new__ is called before __init__.


Answer 17

Now I've run into the same problem, and for various reasons I decided to avoid decorators, factories and metaclasses. I did it like this:

Main file

def _alt(func):
    import functools
    @functools.wraps(func)
    def init(self, *p, **k):
        if hasattr(self, "parent_initialized"):
            return
        else:
            self.parent_initialized = True
            func(self, *p, **k)

    return init


class Parent:
    # Empty dictionary, shouldn't ever be filled with anything else
    parent_cache = {}

    def __new__(cls, n, *args, **kwargs):

        # Checks if object with this ID (n) has been created
        if n in cls.parent_cache:

            # It was, return it
            return cls.parent_cache[n]

        else:

            # Check if it was modified by this function
            if not hasattr(cls, "parent_modified"):
                # Add the attribute
                cls.parent_modified = True
                cls.parent_cache = {}

                # Apply it
                cls.__init__ = _alt(cls.__init__)

            # Get the instance
            obj = super().__new__(cls)

            # Push it to cache
            cls.parent_cache[n] = obj

            # Return it
            return obj

Example classes

class A(Parent):

    def __init__(self, n):
        print("A.__init__", n)


class B(Parent):

    def __init__(self, n):
        print("B.__init__", n)

In use

>>> A(1)
A.__init__ 1  # First A(1) initialized 
<__main__.A object at 0x000001A73A4A2E48>
>>> A(1)      # Returned previous A(1)
<__main__.A object at 0x000001A73A4A2E48>
>>> A(2)
A.__init__ 2  # First A(2) initialized
<__main__.A object at 0x000001A7395D9C88>
>>> B(2)
B.__init__ 2  # B class doesn't collide with A, thanks to separate cache
<__main__.B object at 0x000001A73951B080>
  • Warning: You shouldn't instantiate Parent directly, because it would collide with the other classes (unless you defined a separate cache in each of the children, which is not what we want).
  • Warning: It seems a class with Parent as grandparent behaves weird. [Unverified]



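Python-patterns: a collection of design patterns/idioms in Python

python-patterns

A collection of design patterns and idioms in Python.

Current Patterns

Creational Patterns

Pattern Description
abstract_factory use a generic function with specific factories
borg a singleton with shared-state among instances
builder instead of using multiple constructors, builder object receives parameters and returns constructed objects
factory delegate a specialized function/method to create instances
lazy_evaluation lazily-evaluated property pattern in Python
pool preinstantiate and maintain a group of instances of the same type
prototype use a factory and clones of a prototype for new instances (if instantiation is expensive)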
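To make the borg entry concrete, here is a minimal sketch of the shared-state idea. It is not the repository's borg.py, and the Settings subclass is hypothetical. Each instance is a distinct object, but all of them share one attribute dictionary.

```python
class Borg:
    """All instances share one attribute dictionary (the 'borg' idea)."""
    _shared_state = {}

    def __init__(self):
        # Point every new instance's __dict__ at the same shared dict.
        self.__dict__ = self._shared_state


class Settings(Borg):
    def __init__(self, theme=None):
        super().__init__()
        if theme is not None:
            self.theme = theme


s1 = Settings("dark")
s2 = Settings()
print(s1 is s2, s2.theme)  # False dark
```

Unlike a singleton, identity checks fail here (s1 is s2 is False), but state set through any instance is visible through all of them.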

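Structural Patterns

Pattern Description
3-tier data<->business logic<->presentation separation (strict relationships)
adapter adapt one interface to another using a white-list
bridge a client-provider middleman to soften interface changes
composite lets clients treat individual objects and compositions uniformly
decorator wrap functionality with other functionality in order to affect outputs
facade use one class as an API to a number of others
flyweight transparently reuse existing instances of objects with similar/identical state
front_controller single handler requests coming to the application
mvc model<->view<->controller (non-strict relationships)
proxy an object funnels operations to something else

Behavioral Patterns

Pattern Description
chain_of_responsibility apply a chain of successive handlers to try and process the data
catalog general methods will call different specialized methods based on construction parameter
chaining_method continue callback next object method
command bundle a command and arguments to call later
iterator traverse a container and access the container's elements
iterator (alt. impl.) traverse a container and access the container's elements
mediator an object that knows how to connect other objects and act as a proxy
memento generate an opaque token that can be used to go back to a previous state
observer provide a callback for notification of events/changes to data
publish_subscribe a source syndicates events/data to 0+ registered listeners
registry keep track of all subclasses of a given class
specification business rules can be recombined by chaining the business rules together using boolean logic
state logic is organized into a discrete number of potential states and the next state that can be transitioned to
strategy selectable operations over the same data
template an object imposes a structure but takes pluggable components
visitor invoke a callback for all items of a collection

Design for Testability Patterns

Pattern Description
dependency_injection 3 variants of dependency injection

Fundamental Patterns

Pattern Description
delegation_pattern an object handles a request by delegating to a second object (the delegate)

Others

Pattern Description
blackboard architectural model, assemble different sub-system knowledge to build a solution, AI approach - non gang of four pattern
graph_search graphing algorithms - non gang of four pattern
hsm hierarchical state machine - non gang of four pattern

Videos

Design Patterns in Python by Peter Ullrich

Sebastian Buczyński – Why you don’t need design patterns in Python?

You Don’t Need That!

Pluggable Libs Through Design Patterns

Contributing

When an implementation is added or modified, please review the following guidelines:

Output

All files with example patterns have a ### OUTPUT ### section (migration to OUTPUT = """...""" is in progress).

append_output.sh (e.g. ./append_output.sh borg.py) generates/updates it.

Docstrings

Add a module-level description in the form of a docstring, with links to corresponding references or other useful information.

Add an "Examples in Python ecosystem" section if you know some. It shows how patterns could be applied to real-world problems.

facade.py has a good example of a detailed description, but sometimes a shorter one, as in template.py, would suffice.

In some cases a class-level docstring with a doctest would also help (see adapter.py), but a readable OUTPUT section is much better.

Python 2 compatibility

To see Python 2 compatible versions of some patterns, please check out the legacy tag.

Update README

When everything else is done, update the corresponding part of the README.

Travis CI

Please run tox or tox -e ci37 before submitting a patch to be sure your changes will pass CI.

You can also run the flake8 or pytest commands manually. Examples can be found in tox.ini.

Contributing via issue triage

You can triage issues and pull requests, which may include reproducing bug reports or asking for vital information, such as version numbers or reproduction instructions. If you would like to start triaging issues, one easy way to get started is to subscribe to python-patterns on CodeTriage.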

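System-design-primer: Learn how to design large-scale systems. Prep for the system design interview.

Learn how to design large-scale systems.

Prep for the system design interview.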


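Learning how to design scalable systems will help you become a better engineer.

System design is a broad topic. There is a vast amount of resources scattered throughout the web on system design principles.

This repo is an organized collection of resources to help you learn how to build systems at scale.

Coding Resource: Interactive Coding Challenges

This is a continually updated, open source project.

Contributions are welcome!

Step 1: Outline use cases, constraints, and assumptions

In addition to coding interviews, system design is a required component of the technical interview process at many tech companies.

Practice common system design interview questions and compare your results with sample solutions: discussions, code, and diagrams.

Additional topics for interview prep:

Index of system design topics

The provided Anki flashcard decks use spaced repetition to help you retain key system design concepts.

Great for use while on-the-go.

Step 2: Create a high level design

Looking for resources to help you prep for the coding interview?

Check out the sister repo Interactive Coding Challenges, which contains an additional Anki deck:

Study guide

Learn from the community.

Feel free to submit pull requests to help:

  • Fix errors
  • Improve sections
  • Add new sections
  • Translate

Content that needs some polishing is placed under development.

Review the Contributing Guidelines.

How to approach a system design interview question

Summaries of various system design topics, including pros and cons. Everything is a trade-off.

Each section contains links to more in-depth resources.

System design interview questions with solutions

Suggested topics to review based on your interview timeline (short, medium, long).

Q: For interviews, do I need to know everything here?

A: No, you don't need to know everything here to prepare for the interview.

What you are asked in an interview depends on variables such as:

  • How much experience you have
  • What your technical background is
  • What positions you are interviewing for
  • Which companies you are interviewing with
  • Luck

More experienced candidates are generally expected to know more about system design. Architects or team leads might be expected to know more than individual contributors. Top tech companies are likely to have one or more design interview rounds.

Start broad and go deeper in a few areas. It helps to know a little about various key system design topics. Adjust the following guide based on your timeline, experience, what positions you are interviewing for, and which companies you are interviewing with.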

  • Short timeline – Aim for breadth with system design topics. Practice by solving some interview questions.
  • Medium timeline – Aim for breadth and some depth with system design topics. Practice by solving many interview questions.
  • Long timeline – Aim for breadth and more depth with system design topics. Practice by solving most interview questions.
Short Medium Long
Read through the System design topics to get a broad understanding of how systems work :+1: :+1: :+1:
Read through a few articles in the Company engineering blogs for the companies you are interviewing with :+1: :+1: :+1:
Read through a few Real world architectures :+1: :+1: :+1:
Review How to approach a system design interview question :+1: :+1: :+1:
Work through System design interview questions with solutions Some Many Most
Work through Object-oriented design interview questions with solutions Some Many Most
Review Additional system design interview questions Some Many Most

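Object-oriented design interview questions with solutions

How to approach a system design interview question

The system design interview is an open-ended conversation. You are expected to lead it.

You can use the following steps to guide the discussion. To help solidify this process, work through the System design interview questions with solutions section using the following steps.

Step 3: Design core components

Gather requirements and scope the problem. Ask questions to clarify use cases and constraints. Discuss assumptions.

  • Who is going to use it?
  • How are they going to use it?
  • How many users are there?
  • What does the system do?
  • What are the inputs and outputs of the system?
  • How much data do we expect to handle?
  • How many requests per second do we expect?
  • What is the expected read to write ratio?

Step 4: Scale the design

Outline a high level design with all important components.

  • Sketch the main components and connections
  • Justify your ideas

Back-of-the-envelope calculations

Dive into details for each core component. For example, if you were asked to design a url shortening service, discuss:

  • Generating and storing a hash of the full url
    • MD5 and Base62
    • Hash collisions
    • SQL or NoSQL
    • Database schema
  • Translating a hashed url to the full url
    • Database lookup
  • API and object-oriented design

Source(s) and further reading

Identify and address bottlenecks, given the constraints. For example, do you need the following to address scalability issues?

  • Load balancer
  • Horizontal scaling
  • Caching
  • Database sharding

Discuss potential solutions and trade-offs. Everything is a trade-off. Address bottlenecks using principles of scalable system design.

Design Pastebin.com (or Bit.ly)

You might be asked to do some estimates by hand. Refer to the Appendix for the following resources:

Design the Twitter timeline and search (or Facebook feed and search)

Check out the following links to get a better idea of what to expect:

System design topics: start here

Common system design interview questions with sample discussions, code, and diagrams.

Solutions linked to content in the solutions/ folder.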

Question
Design Pastebin.com (or Bit.ly) Solution
Design the Twitter timeline and search (or Facebook feed and search) Solution
Design a web crawler Solution
Design Mint.com Solution
Design the data structures for a social network Solution
Design a key-value store for a search engine Solution
Design Amazon’s sales ranking by category feature Solution
Design a system that scales to millions of users on AWS Solution
Add a system design question Contribute

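Design a web crawler

View exercise and solution

Design Mint.com

View exercise and solution

Design the data structures for a social network

View exercise and solution

Design a key-value store for a search engine

View exercise and solution

Design Amazon's sales ranking by category feature

View exercise and solution

Design a system that scales to millions of users on AWS

View exercise and solution

Step 1: Review the scalability video lecture

View exercise and solution

Step 2: Review the scalability article

View exercise and solution

Performance vs scalability

Common object-oriented design interview questions with sample discussions, code, and diagrams.

Note: This section is under development.

New to system design?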

Question
Design a hash map Solution
Design a least recently used cache Solution
Design a call center Solution
Design a deck of cards Solution
Design a parking lot Solution
Design a chat server Solution
Design a circular array Contribute
Add an object-oriented design question Contribute

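Latency vs throughput

First, you'll need a basic understanding of common principles, learning about what they are, how they are used, and their pros and cons.

Scalability Lecture at Harvard

Next steps

Scalability

  • Topics covered:
    • Vertical scaling
    • Horizontal scaling
    • Caching
    • Load balancing
    • Database replication
    • Database partitioning

CAP theorem

Next, we'll look at high-level trade-offs:

Weak consistency

Keep in mind that everything is a trade-off.

  • Performance vs scalability
  • Latency vs throughput
  • Availability vs consistency

Then we'll dive into more specific topics such as DNS, CDNs, and load balancers.

A service is scalable if it results in increased performance in a manner proportional to resources added. Generally, increasing performance means serving more units of work, but it can also mean handling larger units of work, such as when datasets grow. [1]

Availability vs consistency

Another way to look at performance vs scalability:

Latency is the time to perform some action or to produce some result.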

  • If you have a performance problem, your system is slow for a single user.
  • If you have a scalability problem, your system is fast for a single user but slow under heavy load.

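Eventual consistency

Consistency patterns

Throughput is the number of such actions or results per unit of time.

Generally, you should aim for maximal throughput with acceptable latency.

Source: CAP theorem revisited

Strong consistency

Availability patterns

Fail-over

In a distributed computer system, you can only support two of the following guarantees:

Networks aren't reliable, so you'll need to support partition tolerance. You'll need to make a software trade-off between consistency and availability.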

  • Consistency – Every read receives the most recent write or an error
  • Availability – Every request receives a response, without guarantee that it contains the most recent version of the information
  • Partition Tolerance – The system continues to operate despite arbitrary partitioning due to network failures

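Waiting for a response from the partitioned node might result in a timeout error. CP is a good choice if your business needs require atomic reads and writes.

SQL tuning

Responses return the most readily available version of the data available on any node, which might not be the latest. Writes might take some time to propagate when the partition is resolved.

Key-value store

AP is a good choice if the business needs allow for eventual consistency, or when the system needs to continue working despite external errors.

With multiple copies of the same data, we are faced with options on how to synchronize them so clients have a consistent view of the data. Recall the definition of consistency from the CAP theorem: every read receives the most recent write or an error.

Disadvantage(s): failover

Domain name system

After a write, reads may or may not see it. A best-effort approach is taken.

Replication

This approach is seen in systems such as memcached. Weak consistency works well in real-time use cases such as VoIP, video chat, and real-time multiplayer games. For example, if you are on a phone call and lose reception for a few seconds, when you regain connection you do not hear what was spoken during the connection loss.

After a write, reads will eventually see it (typically within milliseconds). Data is replicated asynchronously.

Availability in numbers

This approach is seen in systems such as DNS and email. Eventual consistency works well in highly available systems.

After a write, reads will see it. Data is replicated synchronously.

Disadvantage(s): DNS

This approach is seen in file systems and RDBMSes. Strong consistency works well in systems that need transactions.

There are two complementary patterns to support high availability: fail-over and replication.

Push CDNs

Content delivery network

With active-passive fail-over, heartbeats are sent between the active server and the passive server on standby. If the heartbeat is interrupted, the passive server takes over the active server's IP address and resumes service.

Pull CDNs

Document store

The length of downtime is determined by whether the passive server is already running in "hot" standby or whether it needs to start up from "cold" standby. Only the active server handles traffic.

Active-passive failover can also be referred to as master-slave failover.

In active-active, both servers are managing traffic, spreading the load between them.

Wide column store

If the servers are public-facing, the DNS would need to know about the public IPs of both servers. If the servers are internal-facing, application logic would need to know about both servers.

Active-active failover can also be referred to as master-master failover.

This topic is further discussed in the Database section:

Disadvantage(s): CDN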

  • Fail-over adds more hardware and additional complexity.
  • There is a potential for loss of data if the active system fails before any newly written data can be replicated to the passive.

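Layer 4 load balancing

Graph database

Availability is often quantified by uptime (or downtime) as a percentage of time the service is available. Availability is generally measured in number of 9s: a service with 99.99% availability is described as having four 9s.

Layer 7 load balancing

If a service consists of multiple components prone to failure, the service's overall availability depends on whether the components are in sequence or in parallel.

Source(s) and further reading: NoSQL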

Duration             Acceptable downtime
Downtime per year    8h 45min 57s
Downtime per month   43m 49.7s
Downtime per week    10m 4.8s
Downtime per day     1m 26.4s

Cache-aside

Duration             Acceptable downtime
Downtime per year    52min 35.7s
Downtime per month   4m 23s
Downtime per week    1m 0.5s
Downtime per day     8.6s

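The downtime figures in these tables follow directly from the availability percentage. A small sketch, assuming a 365.2425-day year and a month equal to one twelfth of that year (assumptions chosen because they reproduce the year, month, and day figures above):

```python
SECONDS_PER_DAY = 24 * 3600
# Assumption: Gregorian year of 365.2425 days; a month is 1/12 of that year.
PERIODS = {
    "year": 365.2425 * SECONDS_PER_DAY,
    "month": 365.2425 * SECONDS_PER_DAY / 12,
    "week": 7 * SECONDS_PER_DAY,
    "day": SECONDS_PER_DAY,
}


def allowed_downtime(availability_pct):
    """Seconds of downtime per period allowed by an availability percentage."""
    unavailable = 1 - availability_pct / 100
    return {name: seconds * unavailable for name, seconds in PERIODS.items()}


def fmt(seconds):
    """Render seconds as 'Hh Mm S.Ss'."""
    minutes, secs = divmod(seconds, 60)
    hours, minutes = divmod(int(minutes), 60)
    return f"{hours}h {minutes}m {secs:.1f}s"


for pct in (99.9, 99.99):
    print(pct, {name: fmt(s) for name, s in allowed_downtime(pct).items()})
```
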
Write-through

Overall availability decreases when two components with availability < 100% are in sequence:

In sequence

If both Foo and Bar each had 99.9% availability, their total availability in sequence would be 99.8%.

Availability (Total) = Availability (Foo) * Availability (Bar)

Overall availability increases when two components with availability < 100% are in parallel:

In parallel

If both Foo and Bar each had 99.9% availability, their total availability in parallel would be 99.9999%.

Availability (Total) = 1 - (1 - Availability (Foo)) * (1 - Availability (Bar))
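
The two formulas translate directly into code. This sketch assumes component failures are independent, which the formulas above also implicitly assume; `math.prod` requires Python 3.8 or later.

```python
import math


def in_sequence(*availabilities):
    """Every component must be up, so multiply their availabilities."""
    return math.prod(availabilities)


def in_parallel(*availabilities):
    """The service is down only if every component is down at once."""
    return 1 - math.prod(1 - a for a in availabilities)


foo = bar = 0.999
print(f"{in_sequence(foo, bar):.4%}")  # 99.8001%
print(f"{in_parallel(foo, bar):.4%}")  # 99.9999%
```

Both functions accept any number of components, so longer chains or wider redundancy can be estimated the same way.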

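Source: DNS security presentation

Load balancer

A Domain Name System (DNS) translates a domain name such as www.example.com to an IP address.

DNS is hierarchical, with a few authoritative servers at the top level. Your router or ISP provides information about which DNS server(s) to contact when doing a lookup. Lower level DNS servers cache mappings, which could become stale due to DNS propagation delays. DNS results can also be cached by your browser or OS for a certain period of time, determined by the time to live (TTL).

Services such as CloudFlare and Route 53 provide managed DNS services. Some DNS services can route traffic through various methods: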

  • NS record (name server) – Specifies the DNS servers for your domain/subdomain.
  • MX record (mail exchange) – Specifies the mail servers for accepting messages.
  • A record (address) – Points a name to an IP address.
  • CNAME (canonical) – Points a name to another name or CNAME (example.com to www.example.com) or to an A record.

Source: Why use a CDN

Horizontal scaling

  • Accessing a DNS server introduces a slight delay, although mitigated by caching described above.
  • DNS server management could be complex and is generally managed by governments, ISPs, and large companies.
  • DNS services have recently come under DDoS attack, preventing users from accessing websites such as Twitter without knowing Twitter’s IP address(es).

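Disadvantage(s): load balancer

Reverse proxy (web server)

A content delivery network (CDN) is a globally distributed network of proxy servers, serving content from locations closer to the user. Generally, static files such as HTML/CSS/JS, photos, and videos are served from the CDN, although some CDNs such as Amazon's CloudFront support dynamic content. The site's DNS resolution will tell clients which server to contact.

Serving content from CDNs can significantly improve performance in two ways:

Push CDNs receive new content whenever changes occur on your server. You take full responsibility for providing content, uploading directly to the CDN and rewriting URLs to point to the CDN. You can configure when content expires and when it is updated. Content is uploaded only when it is new or changed, minimizing traffic but maximizing storage.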

  • Users receive content from data centers close to them
  • Your servers do not have to serve requests that the CDN fulfills

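Load balancer vs reverse proxy

Sites with a small amount of traffic or sites with content that isn't often updated work well with push CDNs. Content is placed on the CDNs once, instead of being re-pulled at regular intervals.

Pull CDNs grab new content from your server when the first user requests the content. You leave the content on your server and rewrite URLs to point to the CDN. This results in a slower request until the content is cached on the CDN.

Disadvantage(s): reverse proxy

A time-to-live (TTL) determines how long content is cached. Pull CDNs minimize storage space on the CDN, but can create redundant traffic if files expire and are pulled before they have actually changed.

Sites with heavy traffic work well with pull CDNs, as traffic is spread out more evenly with only recently-requested content remaining on the CDN.

Source: Scalable system design patterns

Microservices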

  • CDN costs could be significant depending on traffic, although this should be weighed with additional costs you would incur not using a CDN.
  • Content might be stale if it is updated before the TTL expires it.
  • CDNs require changing URLs for static content to point to the CDN.

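Service discovery

Application layer

Load balancers distribute incoming client requests to computing resources such as application servers and databases. In each case, the load balancer returns the response from the computing resource to the appropriate client. Load balancers are effective at:

Load balancers can be implemented with hardware (expensive) or with software such as HAProxy.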

  • Preventing requests from going to unhealthy servers
  • Preventing overloading resources
  • Helping to eliminate a single point of failure

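How a load balancer picks a backend is a policy choice. As an illustration only, here are two common policies: round robin (skipping backends that failed a health check) and least connections. The class and method names are hypothetical and not taken from HAProxy or any other product.

```python
import itertools


class RoundRobinBalancer:
    """Cycle through backends in order, skipping unhealthy ones."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, backend):
        # e.g. after a failed health check: stop sending requests to it
        self.healthy.discard(backend)

    def pick(self):
        for _ in range(len(self.backends)):
            backend = next(self._cycle)
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backends")


class LeastConnectionsBalancer:
    """Send each request to the backend with the fewest open connections."""

    def __init__(self, backends):
        self.open_connections = {b: 0 for b in backends}

    def acquire(self):
        backend = min(self.open_connections, key=self.open_connections.get)
        self.open_connections[backend] += 1
        return backend

    def release(self, backend):
        self.open_connections[backend] -= 1
```

Round robin is cheap and works when requests cost roughly the same; least connections adapts better when some requests hold a connection much longer than others.
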
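Additional benefits include:

To protect against failures, it's common to set up multiple load balancers, either in active-passive or active-active mode.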

  • SSL termination – Decrypt incoming requests and encrypt server responses so backend servers do not have to perform these potentially expensive operations
  • Session persistence – Issue cookies and route a specific client's requests to the same instance if the web apps do not keep track of sessions

Load balancers can route traffic based on various metrics, including:

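Layer 4 load balancers look at info at the transport layer to decide how to distribute requests. Generally, this involves the source IP address, destination IP address, and ports in the header, but not the contents of the packet. Layer 4 load balancers forward network packets to and from the upstream server, performing Network Address Translation (NAT).

Disadvantage(s): application layer

Layer 7 load balancers look at the application layer to decide how to distribute requests. This can involve the contents of the header, message, and cookies. Layer 7 load balancers terminate network traffic, read the message, make a load-balancing decision, then open a connection to the selected server. For example, a layer 7 load balancer can direct video traffic to servers that host videos while directing more sensitive user billing traffic to security-hardened servers.

Relational database management system (RDBMS)

At the cost of flexibility, layer 4 load balancing requires less time and computing resources than layer 7, although the performance impact can be minimal on modern commodity hardware.

Load balancers can also help with horizontal scaling, improving performance and availability. Scaling out using commodity machines is more cost efficient and results in higher availability than scaling up a single server on more expensive hardware, called vertical scaling. It is also easier to hire for talent working on commodity hardware than for specialized enterprise systems.

NoSQL

Source: Wikipedia

Write-behind (write-back)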

  • Scaling horizontally introduces complexity and involves cloning servers
    • Servers should be stateless: they should not contain any user-related data like sessions or profile pictures
    • Sessions can be stored in a centralized data store such as a database (SQL, NoSQL) or a persistent cache (Redis, Memcached)
  • Downstream servers such as caches and databases need to handle more simultaneous connections as upstream servers scale out

SQL or NoSQL

  • The load balancer can become a performance bottleneck if it does not have enough resources or if it is not configured properly.
  • Introducing a load balancer to help eliminate a single point of failure results in increased complexity.
  • A single load balancer is a single point of failure; configuring multiple load balancers further increases complexity.

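Client caching

Database

A reverse proxy is a web server that centralizes internal services and provides unified interfaces to the public. Requests from clients are forwarded to a server that can fulfill them before the reverse proxy returns the server's response to the client.

Source: Intro to architecting systems for scale

Separating out the web layer from the application layer (also known as the platform layer) allows you to scale and configure both layers independently. Adding a new API results in adding application servers without necessarily adding additional web servers. The single responsibility principle advocates for small and autonomous services that work together. Small teams with small services can plan more aggressively for rapid growth.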

  • Increased security – Hide information about backend servers, blacklist IPs, limit number of connections per client
  • Increased scalability and flexibility – Clients only see the reverse proxy’s IP, allowing you to scale servers or change their configuration
  • SSL termination – Decrypt incoming requests and encrypt server responses so backend servers do not have to perform these potentially expensive operations
  • Compression – Compress server responses
  • Caching – Return the response for cached requests
  • Static content – Serve static content directly
    • HTML/CSS/JS
    • Photos
    • Videos
    • Etc

CDN caching

  • Deploying a load balancer is useful when you have multiple servers. Often, load balancers route traffic to a set of servers serving the same function.
  • Reverse proxies can be useful even with just one web server or application server, opening up the benefits described in the previous section.
  • Solutions such as NGINX and HAProxy can support both layer 7 reverse proxying and load balancing.

Web server caching

  • Introducing a reverse proxy results in increased complexity.
  • A single reverse proxy is a single point of failure; configuring multiple reverse proxies (i.e., a failover) further increases complexity.

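Database caching

Cache

Workers in the application layer also help enable asynchronism.

Related to this discussion are microservices, which can be described as a suite of independently deployable, small, modular services. Each service runs a unique process and communicates through a well-defined, lightweight mechanism to serve a business goal.[1]

Pinterest, for example, could have the following microservices: user profile, follower, feed, search, photo upload, etc.

Application caching

Systems such as Consul, etcd, and ZooKeeper can help services find each other by keeping track of registered names, addresses, and ports. Health checks help verify service integrity and are often done using an HTTP endpoint. Both Consul and etcd have a built-in key-value store that can be useful for storing config values and other shared data.

Source: Scaling up to your first 10 million users

Caching at the database query level

A relational database like SQL is a collection of data items organized in tables.

Caching at the object level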

  • Adding an application layer with loosely coupled services requires a different approach from an architectural, operations, and process viewpoint (vs a monolithic system).
  • Microservices can add complexity in terms of deployments and operations.

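When to update the cache

Asynchronism

ACID is a set of properties of relational database transactions.

Disadvantage(s): cache

There are many techniques to scale a relational database: master-slave replication, master-master replication, federation, sharding, denormalization, and SQL tuning.

The master serves reads and writes, replicating writes to one or more slaves, which serve only reads. Slaves can also replicate to additional slaves in a tree-like fashion. If the master goes offline, the system can continue to operate in read-only mode until a slave is promoted to a master or a new master is provisioned.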

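On the application side, master-slave replication usually means sending writes to the master and reads to a slave. A minimal routing sketch, assuming SQL statements can be classified by their leading keyword (a simplification) and using placeholder connection names such as `db-master`:

```python
import random


class ReplicatedDatabaseRouter:
    """Route writes to the master and reads to read replicas (slaves)."""

    WRITE_PREFIXES = ("INSERT", "UPDATE", "DELETE", "CREATE", "ALTER", "DROP")

    def __init__(self, master, replicas):
        self.master = master
        self.replicas = list(replicas)
        self.read_only = False          # set when the master goes offline

    def route(self, sql):
        if sql.lstrip().upper().startswith(self.WRITE_PREFIXES):
            if self.read_only:
                raise RuntimeError("master offline: system is read-only")
            return self.master
        # Replicas can lag behind the master, so a read issued right after a
        # write might not see it yet (replication lag).
        return random.choice(self.replicas) if self.replicas else self.master

    def master_failed(self):
        self.read_only = True

    def promote(self, replica):
        """Promote a slave to master and leave read-only mode."""
        self.replicas.remove(replica)
        self.master = replica
        self.read_only = False
```

Promotion is deliberately manual here; the extra logic needed to promote a slave safely is one of the disadvantages listed for this pattern.
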
  • Atomicity – Each transaction is all or nothing
  • Consistency – Any transaction will bring the database from one valid state to another
  • Isolation – Executing transactions concurrently has the same results as if the transactions were executed serially
  • Durability – Once a transaction has been committed, it will remain so

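Source: Scalability, availability, stability, patterns

Refresh-ahead

Both masters serve reads and writes and coordinate with each other on writes. If either master goes down, the system can continue to operate with both reads and writes.

Source: Scalability, availability, stability, patterns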

Disadvantage(s): master-slave replication
  • Additional logic is needed to promote a slave to a master.
  • See Disadvantage(s): replication for points related to both master-slave and master-master.

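Source(s) and further reading: HTTP

Source: Scaling up to your first 10 million users

Federation (or functional partitioning) splits up databases by function. For example, instead of a single, monolithic database, you could have three databases: forums, users, and products, resulting in less read and write traffic to each database and therefore less replication lag. Smaller databases result in more data that can fit in memory, which in turn results in more cache hits due to improved cache locality. With no single central master serializing writes, you can write in parallel, increasing throughput.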

Disadvantage(s): master-master replication
  • You’ll need a load balancer or you’ll need to make changes to your application logic to determine where to write.
  • Most master-master systems are either loosely consistent (violating ACID) or have increased write latency due to synchronization.
  • Conflict resolution comes more into play as more write nodes are added and as latency increases.
  • See Disadvantage(s): replication for points related to both master-slave and master-master.
Disadvantage(s): replication
  • There is a potential for loss of data if the master fails before any newly written data can be replicated to other nodes.
  • Writes are replayed to the read replicas. If there are a lot of writes, the read replicas can get bogged down with replaying writes and can’t do as many reads.
  • The more read slaves, the more you have to replicate, which leads to greater replication lag.
  • On some systems, writing to the master can spawn multiple threads to write in parallel, whereas read replicas only support writing sequentially with a single thread.
  • Replication adds more hardware and additional complexity.
Source(s) and further reading: replication

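Source(s) and further reading: TCP and UDP

Source: Scalability, availability, stability, patterns

Sharding distributes data across different databases such that each database can only manage a subset of the data. Taking a users database as an example, as the number of users increases, more shards are added to the cluster.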

Disadvantage(s): federation
  • Federation is not effective if your schema requires huge functions or tables.
  • You’ll need to update your application logic to determine which database to read and write.
  • Joining data from two databases is more complex with a server link.
  • Federation adds more hardware and additional complexity.
Source(s) and further reading: federation

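Disadvantage(s): RPC

Similar to the advantages of federation, sharding results in less read and write traffic, less replication, and more cache hits. Index size is also reduced, which generally improves performance with faster queries. If one shard goes down, the other shards are still operational, although you'll want to add some form of replication to avoid data loss. Like federation, there is no single central master serializing writes, allowing you to write in parallel with increased throughput.

Common ways to shard a table of users are either through the user's last name initial or the user's geographic location.

Denormalization attempts to improve read performance at the expense of some write performance. Redundant copies of the data are written in multiple tables to avoid expensive joins. Some RDBMSes such as PostgreSQL and Oracle support materialized views, which handle the work of storing redundant information and keeping redundant copies consistent.

Once data becomes distributed with techniques such as federation and sharding, managing joins across data centers further increases complexity. Denormalization might circumvent the need for such complex joins.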

Disadvantage(s): sharding
  • You’ll need to update your application logic to work with shards, which could result in complex SQL queries.
  • Data distribution can become lopsided in a shard. For example, a set of power users on a shard could result in increased load to that shard compared to others.
    • Rebalancing adds additional complexity. A sharding function based on consistent hashing can reduce the amount of transferred data.
  • Joining data from multiple shards is more complex.
  • Sharding adds more hardware and additional complexity.
Source(s) and further reading: sharding
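
The consistent-hashing idea mentioned above can be sketched in a few lines. This is an illustrative implementation, not the one used by any particular database: the `vnodes` count and the use of SHA-256 are assumptions. Its key property is that adding a shard only moves keys onto the new shard, instead of reshuffling most keys as `hash(key) % num_shards` would.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Map keys to shards; adding a shard moves only the keys it takes over."""

    def __init__(self, shards=(), vnodes=100):
        self.vnodes = vnodes    # virtual nodes per shard even out the distribution
        self._ring = []         # sorted list of (hash, shard) points on the ring
        for shard in shards:
            self.add_shard(shard)

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode()).hexdigest(), 16)

    def add_shard(self, shard):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{shard}#{i}"), shard))

    def shard_for(self, key):
        """Walk clockwise from the key's hash to the next shard point."""
        idx = bisect.bisect(self._ring, (self._hash(key),))
        return self._ring[idx % len(self._ring)][1]   # wrap around the ring


ring = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])
print(ring.shard_for("user:42"))
```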

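Disadvantage(s): REST

In most systems, reads can heavily outnumber writes, 100:1 or even 1000:1. A read resulting in a complex database join can be very expensive, spending a significant amount of time on disk operations.

SQL tuning is a broad topic and many books have been written as reference.

It's important to benchmark and profile to simulate and uncover bottlenecks.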

Disadvantage(s): denormalization
  • Data is duplicated.
  • Constraints can help redundant copies of information stay in sync, which increases complexity of the database design.
  • A denormalized database under heavy write load might perform worse than its normalized counterpart.
Source(s) and further reading: denormalization

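Source(s) and further reading: REST and RPC

Benchmarking and profiling might point you to the following optimizations:

NoSQL is a collection of data items represented in a key-value store, document store, wide column store, or a graph database. Data is denormalized, and joins are generally done in the application code. Most NoSQL stores lack true ACID transactions and favor eventual consistency.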

  • Benchmark – Simulate high-load situations with tools such as ab.
  • Profile – Enable tools such as the slow query log to help track performance issues.

BASE is often used to describe the properties of NoSQL databases. In comparison with the CAP theorem, BASE chooses availability over consistency.

Tighten up the schema
  • MySQL dumps to disk in contiguous blocks for fast access.
  • Use CHAR instead of VARCHAR for fixed-length fields.
    • CHAR effectively allows for fast, random access, whereas with VARCHAR, you must find the end of a string before moving onto the next one.
  • Use TEXT for large blocks of text such as blog posts. TEXT also allows for boolean searches. Using a TEXT field results in storing a pointer on disk that is used to locate the text block.
  • Use INT for larger numbers up to 2^32 or 4 billion.
  • Use DECIMAL for currency to avoid floating point representation errors.
  • Avoid storing large BLOBS, store the location of where to get the object instead.
  • VARCHAR(255) is the largest number of characters that can be counted in an 8 bit number, often maximizing the use of a byte in some RDBMS.
  • Set the NOT NULL constraint where applicable to improve search performance.
Use good indices
  • Columns that you are querying (SELECT, GROUP BY, ORDER BY, JOIN) could be faster with indices.
  • Indices are usually represented as self-balancing B-tree that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time.
  • Placing an index can keep the data in memory, requiring more space.
  • Writes could also be slower since the index also needs to be updated.
  • When loading large amounts of data, it might be faster to disable indices, load the data, then rebuild the indices.
Avoid expensive joins
Partition tables
  • Break up a table by putting hot spots in a separate table to help keep it in memory.
Tune the query cache
Source(s) and further reading: SQL tuning

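Message queues

In addition to choosing between SQL or NoSQL, it is helpful to understand which type of NoSQL database best fits your use case(s). We'll review key-value stores, document stores, wide column stores, and graph databases in the next section.

Abstraction: hash table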

  • Basically available – the system guarantees availability.
  • Soft state – the state of the system may change over time, even without input.
  • Eventual consistency – the system will become consistent over a period of time, given that the system doesn’t receive input during that period.

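A key-value store generally allows for O(1) reads and writes and is often backed by memory or SSD. Data stores can maintain keys in lexicographic order, allowing efficient retrieval of key ranges. Key-value stores can allow for storing of metadata with a value.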

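A toy in-memory version shows how a key-value store can offer constant-time lookups while also keeping keys in lexicographic order for range scans. This is a sketch under simplifying assumptions: the sorted index is a plain Python list, so inserting a new key costs O(n), whereas real stores typically use tree-based or log-structured indexes. `OrderedKeyValueStore` is a hypothetical name.

```python
import bisect


class OrderedKeyValueStore:
    """Toy key-value store: dict for lookups plus a sorted key index for ranges."""

    def __init__(self):
        self._data = {}     # key -> (value, metadata)
        self._keys = []     # keys in lexicographic order

    def put(self, key, value, **metadata):
        if key not in self._data:
            bisect.insort(self._keys, key)   # O(n) list insert in this sketch
        self._data[key] = (value, metadata)  # O(1) average

    def get(self, key, default=None):
        entry = self._data.get(key)          # O(1) average
        return default if entry is None else entry[0]

    def metadata(self, key):
        return self._data[key][1]

    def delete(self, key):
        if self._data.pop(key, None) is not None:
            self._keys.pop(bisect.bisect_left(self._keys, key))

    def range(self, start, end):
        """Yield (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, self._data[key][0]
```

Prefixing keys (for example `user:` and `order:`) turns a prefix query into a range scan over the sorted keys.
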
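Source(s) and further reading

Key-value stores provide high performance and are often used for simple data models or for rapidly-changing data, such as an in-memory cache layer. Since they offer only a limited set of operations, complexity is shifted to the application layer if additional operations are needed.

A key-value store is the basis for more complex systems such as a document store, and in some cases, a graph database.

Abstraction: key-value store with documents stored as values

A document store is centered around documents (XML, JSON, binary, etc.), where a document stores all information for a given object. Document stores provide APIs or a query language to query based on the internal structure of the document itself. Note that many key-value stores include features for working with a value's metadata, blurring the lines between these two storage types.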

Source(s) and further reading: key-value store

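Latency numbers visualized

Based on the underlying implementation, documents are organized by collections, tags, metadata, or directories. Although documents can be organized or grouped together, documents may have fields that are completely different from each other.

Some document stores like MongoDB and CouchDB also provide a SQL-like language to perform complex queries. DynamoDB supports both key-values and documents.

Document stores provide high flexibility and are often used for working with occasionally changing data.

Source: SQL & NoSQL, a brief history

Abstraction: nested map ColumnFamily<RowKey, Columns<ColKey, Value, Timestamp>>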

Source(s) and further reading: document store


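A wide column store's basic unit of data is a column (name/value pair). A column can be grouped in column families (analogous to a SQL table). Super column families further group column families. You can access each column independently with a row key, and columns with the same row key form a row. Each value contains a timestamp for versioning and for conflict resolution.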

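The nested-map abstraction `ColumnFamily<RowKey, Columns<ColKey, Value, Timestamp>>` can be modeled with nested dictionaries. This toy `WideColumnStore` is a hypothetical illustration, assuming last-write-wins conflict resolution by timestamp; it omits persistence, super column families, and distribution.

```python
import time
from collections import defaultdict


class WideColumnStore:
    """Toy model of ColumnFamily<RowKey, Columns<ColKey, Value, Timestamp>>."""

    def __init__(self):
        # column family -> row key -> column key -> (value, timestamp)
        self._families = defaultdict(lambda: defaultdict(dict))

    def put(self, family, row_key, col_key, value, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        columns = self._families[family][row_key]
        current = columns.get(col_key)
        # Conflict resolution: the write with the newest timestamp wins
        if current is None or ts >= current[1]:
            columns[col_key] = (value, ts)

    def _columns(self, family, row_key):
        # .get() on a defaultdict does not create missing entries
        return self._families.get(family, {}).get(row_key, {})

    def get(self, family, row_key, col_key):
        cell = self._columns(family, row_key).get(col_key)
        return None if cell is None else cell[0]

    def row(self, family, row_key):
        """Columns that share a row key form a row."""
        return {col: value for col, (value, _ts) in self._columns(family, row_key).items()}
```
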
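Google introduced Bigtable as the first wide column store, which influenced the open-source HBase, often used in the Hadoop ecosystem, and Cassandra from Facebook. Stores such as Bigtable, HBase, and Cassandra maintain keys in lexicographic order, allowing efficient retrieval of selective key ranges.

Wide column stores offer high availability and high scalability. They are often used for very large data sets.

Source: Graph database

Abstraction: graph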

Source(s) and further reading: wide column store


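In a graph database, each node is a record and each arc is a relationship between two nodes. Graph databases are optimized to represent complex relationships with many foreign keys or many-to-many relationships.

Graph databases offer high performance for data models with complex relationships, such as a social network. They are relatively new and are not yet widely used; it might be more difficult to find development tools and resources. Many graphs can only be accessed with REST APIs.

Source: Transitioning from RDBMS to NoSQL

Reasons for SQL: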

Source(s) and further reading: graph


Task queues

Reasons for NoSQL:

Sample data well-suited for NoSQL:

  • Structured data
  • Strict schema
  • Relational data
  • Need for complex joins
  • Transactions
  • Clear patterns for scaling
  • More established: developers, community, code, tools, etc
  • Lookups by index are very fast

Source: Scalable system design patterns

  • Semi-structured data
  • Dynamic or flexible schema
  • Non-relational data
  • No need for complex joins
  • Store many TB (or PB) of data
  • Very data intensive workload
  • Very high throughput for IOPS

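Caching improves page load times and can reduce the load on your servers and databases. In this model, the dispatcher will first look up if the request has been made before and try to find the previous result to return, in order to save the actual execution.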

  • Rapid ingest of clickstream and log data
  • Leaderboard or scoring data
  • Temporary data, such as a shopping cart
  • Frequently accessed (‘hot’) tables
  • Metadata/lookup tables
Source(s) and further reading: SQL or NoSQL

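Communication

Databases often benefit from a uniform distribution of reads and writes across their partitions. Popular items can skew the distribution, causing bottlenecks. Putting a cache in front of a database can help absorb uneven loads and spikes in traffic.

Caches can be located on the client side (OS or browser), on the server side, or in a distinct cache layer.

CDNs are considered a type of cache.

Back pressure

Reverse proxies and caches such as Varnish can serve static and dynamic content directly. Web servers can also cache requests, returning responses without having to contact application servers.

Disadvantage(s): asynchronism

Your database usually includes some level of caching in a default configuration, optimized for a generic use case. Tweaking these settings for specific usage patterns can further boost performance.

Hypertext transfer protocol (HTTP)

In-memory caches such as Memcached and Redis are key-value stores between your application and your data storage. Since the data is held in RAM, it is much faster than typical databases where data is stored on disk. RAM is more limited than disk, so cache invalidation algorithms such as least recently used (LRU) can help invalidate "cold" entries and keep "hot" data in RAM.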

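Least recently used (LRU) eviction can be sketched with an ordered dictionary: every access moves a key to the "hot" end, and when the cache is over capacity the entry at the "cold" end is evicted. This illustrates the policy only; it is not how Memcached or Redis implement eviction internally. (For memoizing Python functions, the standard library already provides `functools.lru_cache`.)

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()   # coldest first, hottest last

    def get(self, key, default=None):
        if key not in self._entries:
            return default
        self._entries.move_to_end(key)  # mark as recently used
        return self._entries[key]

    def set(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the coldest entry
```
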
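Transmission control protocol (TCP)

Redis has the following additional features:

User datagram protocol (UDP)

There are multiple levels you can cache that fall into two general categories: database queries and objects:

Generally, you should try to avoid file-based caching, as it makes cloning and auto-scaling more difficult.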

  • Persistence option
  • Built-in data structures such as sorted sets and lists

Whenever you query the database, hash the query as a key and store the result to the cache. This approach suffers from expiration issues:

  • Row level
  • Query-level
  • Fully-formed serializable objects
  • Fully-rendered HTML

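See your data as an object, similar to what you do with your application code. Have your application assemble the dataset from the database into a class instance or one or more data structures:

Remote procedure call (RPC)

Suggestions of what to cache: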

  • Hard to delete a cached result with complex queries
  • If one piece of data changes such as a table cell, you need to delete all cached queries that might include the changed cell

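Representational state transfer (REST)

Since you can only store a limited amount of data in cache, you'll need to determine which cache update strategy works best for your use case.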

  • Remove the object from cache if its underlying data has changed
  • Allows for asynchronous processing: workers assemble objects by consuming the latest cached object

Source: From cache to in-memory data grid

  • User sessions
  • Fully rendered web pages
  • Activity streams
  • User graph data

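RPC and REST calls comparison

The application is responsible for reading and writing from storage. The cache does not interact with storage directly. The application does the following:

I am providing code and resources in this repository to you under an open source license. Because this is my personal repository, the license you receive to my code and resources is from me and not my employer (Facebook).

Memcached is generally used in this manner.

Subsequent reads of data added to cache are fast. Cache-aside is also referred to as lazy loading. Only requested data is cached, which avoids filling up the cache with data that isn't requested.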

  • Look for entry in cache, resulting in a cache miss
  • Load entry from the database
  • Add entry to cache
  • Return entry
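import json

def get_user(self, user_id):
    # `cache` and `db` are the application's cache client and database handle
    key = "user.{0}".format(user_id)
    cached = cache.get(key)                  # look for the entry in the cache
    if cached is not None:
        return json.loads(cached)            # cache hit: deserialize and return
    # Cache miss: load from the database (placeholder style depends on the driver)
    user = db.query("SELECT * FROM users WHERE user_id = %s", (user_id,))
    if user is not None:
        cache.set(key, json.dumps(user))     # add the entry to the cache
    return user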

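Source: Scalability, availability, stability, patterns

The application uses the cache as the main data store, reading and writing data to it, while the cache is responsible for reading and writing to the database: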

Disadvantage(s): cache-aside
  • Each cache miss results in three trips, which can cause a noticeable delay.
  • Data can become stale if it is updated in the database. This issue is mitigated by setting a time-to-live (TTL) which forces an update of the cache entry, or by using write-through.
  • When a node fails, it is replaced by a new, empty node, increasing latency.
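
Below is a self-contained, runnable variant of the cache-aside flow with the TTL mitigation mentioned above. `TTLCache` (a dict-backed stand-in for memcached), `UserService`, and using a plain dict as the database are hypothetical simplifications; the injectable clock only exists so expiry can be demonstrated without sleeping.

```python
import time


class TTLCache:
    """Dict-backed stand-in for memcached/Redis with per-entry expiry."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self.clock = clock

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:      # expired: treat as a miss
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl):
        self._store[key] = (value, self.clock() + ttl)


class UserService:
    def __init__(self, cache, db, ttl=60):
        self.cache, self.db, self.ttl = cache, db, ttl
        self.db_reads = 0                   # counts trips to the database

    def get_user(self, user_id):
        key = f"user.{user_id}"
        user = self.cache.get(key)          # 1. look for the entry in the cache
        if user is None:                    # cache miss
            self.db_reads += 1
            user = self.db.get(user_id)     # 2. load the entry from the database
            if user is not None:
                self.cache.set(key, user, self.ttl)  # 3. add it to the cache
        return user                         # 4. return the entry
```

The TTL bounds how long a stale entry can be served after the database changes, at the cost of an extra database read whenever an entry expires.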

Write-through

Application code:

Cache code:

  • Application adds/updates entry in cache
  • Cache synchronously writes entry to data store
  • Return

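Write-through is a slow overall operation due to the write operation, but subsequent reads of just-written data are fast. Users are generally more tolerant of latency when updating data than reading data. Data in the cache is not stale.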

set_user(12345, {"foo":"bar"})

Source: Scalability, availability, stability, patterns

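def set_user(user_id, values):
    # `db` and `cache` are assumed clients; `values` maps trusted column names to new values
    assignments = ", ".join("{0} = %s".format(column) for column in values)
    db.query("UPDATE users SET " + assignments + " WHERE id = %s",
             list(values.values()) + [user_id])   # synchronous write to the data store
    cache.set(user_id, db.query("SELECT * FROM users WHERE id = %s", [user_id]))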

In write-behind, the application does the following:

Disadvantage(s): write through
  • When a new node is created due to failure or scaling, the new node will not cache entries until the entry is updated in the database. Cache-aside in conjunction with write through can mitigate this issue.
  • Most data written might never be read, which can be minimized with a TTL.
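
A runnable sketch of write-through, with a dict standing in for the data store. Writing to the store before updating the in-memory copy is a design choice made here so that a failed store write never leaves a cached value the database lacks; the class name is hypothetical.

```python
class WriteThroughCache:
    """The application reads and writes through the cache; the cache writes
    every update to the data store synchronously."""

    def __init__(self, store):
        self.store = store      # stand-in for the database (any dict-like object)
        self._cache = {}

    def set(self, key, value):
        # Persist first, then cache; the caller waits for both (synchronous).
        self.store[key] = value
        self._cache[key] = value

    def get(self, key):
        if key in self._cache:
            return self._cache[key]        # just-written data is served from memory
        value = self.store.get(key)        # e.g. a new, empty cache node
        if value is not None:
            self._cache[key] = value
        return value
```

The fallback in `get` is one way to soften the "new node starts empty" disadvantage above: entries are loaded on first read instead of only on the next update.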

Write-behind (write-back)

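Source: From cache to in-memory data grid

You can configure the cache to automatically refresh any recently accessed cache entry prior to its expiration.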

  • Add/update entry in cache
  • Asynchronously write entry to the data store, improving write performance
Disadvantage(s): write-behind
  • There could be data loss if the cache goes down prior to its contents hitting the data store.
  • It is more complex to implement write-behind than it is to implement cache-aside or write-through.
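
Write-behind can be sketched with an in-process queue drained by a background thread. This is a simplification under stated assumptions (a dict as the data store, a single daemon worker, no batching or retries); it also makes the data-loss risk above concrete, since anything still in the queue is lost if the process dies.

```python
import queue
import threading


class WriteBehindCache:
    """Writes hit the cache immediately and reach the store asynchronously."""

    def __init__(self, store):
        self.store = store
        self._cache = {}
        self._pending = queue.Queue()
        self._worker = threading.Thread(target=self._flush_forever, daemon=True)
        self._worker.start()

    def set(self, key, value):
        self._cache[key] = value          # 1. add/update the entry in the cache
        self._pending.put((key, value))   # 2. queue the store write; return at once

    def get(self, key):
        return self._cache.get(key)

    def _flush_forever(self):
        while True:
            key, value = self._pending.get()
            self.store[key] = value       # lost if the process dies before this runs
            self._pending.task_done()

    def flush(self):
        """Block until every queued write has reached the store."""
        self._pending.join()
```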

Refresh-ahead

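Refresh-ahead can result in reduced latency vs read-through if the cache can accurately predict which items are likely to be needed in the future.

Source: Intro to architecting systems for scale

Asynchronous workflows help reduce request times for expensive operations that would otherwise be performed inline. They can also help by doing time-consuming work in advance, such as periodic aggregation of data.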

Disadvantage(s): refresh-ahead
  • Not accurately predicting which items are likely to be needed in the future can result in worse performance than without refresh-ahead.

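Refresh-ahead can be sketched as a cache that, when an entry is read after a configurable fraction of its TTL has elapsed, keeps serving the cached value while reloading it in the background. `RefreshAheadCache`, `loader`, and `refresh_factor` are hypothetical names chosen for illustration.

```python
import threading
import time


class RefreshAheadCache:
    """Serve cached values, reloading recently read entries before they expire."""

    def __init__(self, loader, ttl, refresh_factor=0.8, clock=time.monotonic):
        self.loader = loader                  # fetches fresh data from the source of truth
        self.ttl = ttl
        self.refresh_factor = refresh_factor  # fraction of TTL after which reads trigger a refresh
        self.clock = clock
        self._entries = {}                    # key -> (value, loaded_at)
        self._refreshing = {}                 # key -> background refresh thread
        self._lock = threading.Lock()

    def _load(self, key):
        value = self.loader(key)
        with self._lock:
            self._entries[key] = (value, self.clock())
            self._refreshing.pop(key, None)
        return value

    def get(self, key):
        with self._lock:
            entry = self._entries.get(key)
        if entry is None or self.clock() - entry[1] >= self.ttl:
            return self._load(key)            # miss or expired: the caller waits
        value, loaded_at = entry
        if self.clock() - loaded_at >= self.ttl * self.refresh_factor:
            with self._lock:
                if key not in self._refreshing:   # at most one refresh per key
                    worker = threading.Thread(target=self._load, args=(key,), daemon=True)
                    self._refreshing[key] = worker
                    worker.start()
        return value                          # served immediately from the cache

    def wait_for_refreshes(self):
        with self._lock:
            workers = list(self._refreshing.values())
        for worker in workers:
            worker.join()
```

Only entries that are actually read near expiry get refreshed, which is a rough proxy for "likely to be needed"; a poor proxy wastes loads, as the disadvantage above notes.
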
Powers of two table

  • Need to maintain consistency between caches and the source of truth such as the database through cache invalidation.
  • Cache invalidation is a difficult problem; there is additional complexity associated with when to update the cache.
  • Need to make application changes such as adding Redis or memcached.

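Latency numbers every programmer should know

Security

Message queues receive, hold, and deliver messages. If an operation is too slow to perform inline, you can use a message queue with the following workflow:

The user is not blocked and the job is processed in the background. During this time, the client might optionally do a small amount of processing to make it seem like the task has completed. For example, if posting a tweet, the tweet could be instantly posted to your timeline, but it could take some time before your tweet is actually delivered to all of your followers.

Additional system design interview questions

Redis is useful as a simple message broker, but messages can be lost.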

  • An application publishes a job to the queue, then notifies the user of job status
  • A worker picks up the job from the queue, processes it, then signals the job is complete
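
The two-step workflow above can be sketched with Python's standard `queue.Queue` and a worker thread. This is an in-process illustration only: a real deployment would use a broker such as those discussed in this section, and the `status` dict is a stand-in for wherever job status is stored.

```python
import queue
import threading

jobs = queue.Queue()
status = {}      # job_id -> "queued" | "done" (stand-in for a status store)
results = {}


def publish(job_id, payload):
    """Application: enqueue the job, then tell the user it is queued."""
    status[job_id] = "queued"
    jobs.put((job_id, payload))
    return "queued"


def worker():
    """Worker: pick up jobs, process them, then signal completion."""
    while True:
        job_id, payload = jobs.get()
        if job_id is None:                  # sentinel: shut down
            jobs.task_done()
            break
        results[job_id] = payload.upper()   # placeholder for the slow work
        status[job_id] = "done"
        jobs.task_done()


t = threading.Thread(target=worker)
t.start()
print(publish(1, "fan out tweet"))   # queued (the user is not blocked)
jobs.join()                          # wait here only to show the final state
print(status[1], results[1])         # done FAN OUT TWEET
jobs.put((None, None))
t.join()
```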

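RabbitMQ is popular but requires you to adapt to the "AMQP" protocol and manage your own nodes.

Amazon SQS is hosted but can have high latency and has the possibility of messages being delivered twice.

Task queues receive tasks and their related data, run them, then deliver their results. They can support scheduling and can be used to run computationally intensive jobs in the background.

Celery has support for scheduling and primarily has Python support.

Real world architectures

If queues start to grow significantly, the queue size can become larger than memory, resulting in cache misses, disk reads, and even slower performance. Back pressure can help by limiting the queue size, thereby maintaining a high throughput rate and good response times for jobs already in the queue. Once the queue fills up, clients get a server busy or HTTP 503 status code to try again later. Clients can retry the request at a later time, perhaps with exponential backoff.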

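Back pressure and client-side retries can be sketched together. The server side rejects work with a 503 once a bounded queue is full; the client side retries with exponential backoff plus random jitter (the "full jitter" variant, one of several common choices). All names and parameters here are illustrative assumptions.

```python
import queue
import random


class BoundedJobQueue:
    """Server side: accept jobs only while the queue has room (back pressure)."""

    def __init__(self, max_size):
        self._jobs = queue.Queue(maxsize=max_size)

    def submit(self, job):
        try:
            self._jobs.put_nowait(job)
            return 202          # accepted for background processing
        except queue.Full:
            return 503          # server busy: the client should try again later


def backoff_delays(base=0.1, cap=10.0, attempts=5, rng=random.random):
    """Full jitter: a random delay between 0 and min(cap, base * 2**n)."""
    return [rng() * min(cap, base * 2 ** n) for n in range(attempts)]


def submit_with_retry(job_queue, job, sleep, attempts=5):
    """Client side: retry on 503, waiting longer after each rejection."""
    for attempt, delay in enumerate(backoff_delays(attempts=attempts)):
        if job_queue.submit(job) != 503:
            return True
        if attempt < attempts - 1:
            sleep(delay)        # no point sleeping after the final attempt
    return False
```

In real code `sleep` would be `time.sleep`; passing it in keeps the retry loop testable. The jitter spreads retries out so that many rejected clients do not all come back at the same instant.
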
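Source: OSI 7 layer model

Company architectures

HTTP is a method for encoding and transporting data between a client and a server. It is a request/response protocol: clients issue requests and servers issue responses with relevant content and completion status info about the request. HTTP is self-contained, allowing requests and responses to flow through many intermediate routers and servers that perform load balancing, caching, encryption, and compression.

Company engineering blogs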

  • Use cases such as inexpensive calculations and realtime workflows might be better suited for synchronous operations, as introducing queues can add delays and complexity.

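CP - consistency and partition tolerance

Appendix

A basic HTTP request consists of a verb (method) and a resource (endpoint). Below are common HTTP verbs:

AP - availability and partition tolerance

*Can be called many times without different outcomes.

HTTP is an application layer protocol relying on lower-level protocols such as TCP and UDP.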

Verb    Description                                                  Idempotent*  Safe  Cacheable
GET     Reads a resource                                             Yes          Yes   Yes
POST    Creates a resource or triggers a process that handles data   No           No    Yes if response contains freshness info
PUT     Creates or replaces a resource                               Yes          No    No
PATCH   Partially updates a resource                                 No           No    Yes if response contains freshness info
DELETE  Deletes a resource                                           Yes          No    No

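Source: How to make a multiplayer game

TCP is a connection-oriented protocol over an IP network. Connection is established and terminated using a handshake. All packets sent are guaranteed to reach the destination in the original order and without corruption through: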

Source(s) and further reading: HTTP

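Active-passive

If the sender does not receive a correct response, it will resend the packets. If there are multiple timeouts, the connection is dropped. TCP also implements flow control and congestion control. These guarantees cause delays and generally result in less efficient transmission than UDP.

To ensure high throughput, web servers can keep a large number of TCP connections open, resulting in high memory usage. It can be expensive to have a large number of open connections between web server threads and, say, a memcached server. Connection pooling can help, in addition to switching to UDP where applicable.

TCP is useful for applications that require high reliability but are less time critical. Some examples include web servers, database info, SMTP, FTP, and SSH.

Use TCP over UDP when:

Source: How to make a multiplayer game

UDP is connectionless. Datagrams (analogous to packets) are guaranteed only at the datagram level. Datagrams might reach their destination out of order or not at all. UDP does not support congestion control. Without the guarantees that TCP supports, UDP is generally more efficient.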

  • You need all of the data to arrive intact
  • You want to automatically make a best estimate use of the network throughput

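Active-active

UDP can broadcast, sending datagrams to all devices on the subnet. This is useful with DHCP because the client has not yet received an IP address, which prevents TCP from streaming without one.

UDP is less reliable but works well in real time use cases such as VoIP, video chat, streaming, and realtime multiplayer games.

Use UDP over TCP when:

Source: Crack the system design interview

In an RPC, a client causes a procedure to execute on a different address space, usually a remote server. The procedure is coded as if it were a local procedure call, abstracting away the details of how to communicate with the server from the client program. Remote calls are usually slower and less reliable than local calls, so it is helpful to distinguish RPC calls from local calls. Popular RPC frameworks include Protobuf, Thrift, and Avro.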

  • You need the lowest latency
  • Late data is worse than loss of data
  • You want to implement your own error correction

Source(s) and further reading: TCP and UDP

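Master-slave and master-master

RPC is a request-response protocol:

Sample RPC calls:

RPC is focused on exposing behaviors. RPCs are often used for performance reasons with internal communications, as you can hand-craft native calls to better fit your use cases.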

  • Client program – Calls the client stub procedure. The parameters are pushed onto the stack like a local procedure call.
  • Client stub procedure – Marshals (packs) procedure id and arguments into a request message.
  • Client communication module – OS sends the message from the client to the server.
  • Server communication module – OS passes the incoming packets to the server stub procedure.
  • Server stub procedure – Unmarshals the request (procedure id and arguments), calls the server procedure matching the procedure id and passes the given arguments.
  • The server response repeats the steps above in reverse order.
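
The stub steps above can be sketched without a network by marshaling calls to JSON and passing them to the server stub in-process. JSON, the `PROCEDURES` registry, and the `transport` parameter are assumptions for illustration; frameworks such as those named earlier use their own encodings and generated stubs.

```python
import json

# Server side: procedures that can be called remotely, keyed by procedure id.
PROCEDURES = {}


def remote(fn):
    PROCEDURES[fn.__name__] = fn
    return fn


@remote
def add(a, b):
    return a + b


def server_stub(request_bytes):
    """Unmarshal the request, call the matching procedure, marshal the reply."""
    request = json.loads(request_bytes)
    try:
        result = PROCEDURES[request["procedure"]](*request["args"])
        response = {"result": result}
    except Exception as exc:          # errors travel back to the client too
        response = {"error": repr(exc)}
    return json.dumps(response).encode()


def client_stub(procedure, *args, transport=server_stub):
    """Marshal the procedure id and arguments, send them, unmarshal the reply.

    `transport` stands in for the client and server communication modules;
    here it calls the server stub directly instead of using the network.
    """
    request = json.dumps({"procedure": procedure, "args": list(args)}).encode()
    response = json.loads(transport(request))
    if "error" in response:
        raise RuntimeError(response["error"])
    return response["result"]


print(client_stub("add", 2, 3))  # 5
```

Because the caller writes `client_stub("add", 2, 3)` much like a local call, it is easy to forget the network in between, which is why distinguishing remote calls from local ones helps.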

Choose a native library (aka SDK) when:

GET /someoperation?data=anId

POST /anotheroperation
{
  "data": "anId",
  "anotherdata": "another value"
}

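HTTP APIs following REST tend to be used more often for public APIs.

REST is an architectural style enforcing a client/server model where the client acts on a set of resources managed by the server. The server provides a representation of resources and actions that can either manipulate or get a new representation of resources. All communication must be stateless and cacheable.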

  • You know your target platform.
  • You want to control how your “logic” is accessed.
  • You want to control how error control happens off your library.
  • Performance and end user experience is your primary concern.

There are four qualities of a RESTful interface:

Disadvantage(s): RPC

  • RPC clients become tightly coupled to the service implementation.
  • A new API must be defined for every new operation or use case.
  • It can be difficult to debug RPC.
  • You might not be able to leverage existing technologies out of the box. For example, it might require additional effort to ensure RPC calls are properly cached on caching servers such as Squid.

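99.9% availability - three 9s

Sample REST calls:

REST is focused on exposing data. It minimizes the coupling between client and server and is often used for public HTTP APIs. REST uses a more generic and uniform method of exposing resources through URIs, representation through headers, and actions through verbs such as GET, POST, PUT, DELETE, and PATCH. Being stateless, REST is great for horizontal scaling and partitioning.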

  • Identify resources (URI in HTTP) – use the same URI regardless of any operation.
  • Change with representations (Verbs in HTTP) – use verbs, headers, and body.
  • Self-descriptive error message (status response in HTTP) – Use status codes, don’t reinvent the wheel.
  • HATEOAS (HTML interface for HTTP) – your web service should be fully accessible in a browser.

Source: Do you really know why you prefer REST over RPC

GET /someresources/anId

PUT /someresources/anId
{"anotherdata": "another value"}

This section could use some updates. Consider contributing!

Disadvantage(s): REST

  • With REST being focused on exposing data, it might not be a good fit if resources are not naturally organized or accessed in a simple hierarchy. For example, returning all updated records from the past hour matching a particular set of events is not easily expressed as a path. With REST, it is likely to be implemented with a combination of URI path, query parameters, and possibly the request body.
  • REST typically relies on a few verbs (GET, POST, PUT, DELETE, and PATCH) which sometimes doesn’t fit your use case. For example, moving expired documents to the archive folder might not cleanly fit within these verbs.
  • Fetching complicated resources with nested hierarchies requires multiple round trips between the client and server to render single views, e.g. fetching content of a blog entry and the comments on that entry. For mobile applications operating in variable network conditions, these multiple roundtrips are highly undesirable.
  • Over time, more fields might be added to an API response and older clients will receive all new data fields, even those that they do not need; as a result, the payload size bloats and latencies grow.

99.99% availability - four 9s

Operation                        RPC                                      REST
Signup                           POST /signup                             POST /persons
Resign                           POST /resign                             DELETE /persons/1234
                                 {"personid": "1234"}
Read a person                    GET /readPerson?personid=1234            GET /persons/1234
Read a person's items list       GET /readUsersItemsList?personid=1234    GET /persons/1234/items
Add an item to a person's items  POST /addItemToUsersItemsList            POST /persons/1234/items
                                 {"personid": "1234", "itemid": "456"}   {"itemid": "456"}
Update an item                   POST /modifyItem                         PUT /items/456
                                 {"itemid": "456", "key": "value"}        {"key": "value"}
Delete an item                   POST /removeItem                         DELETE /items/456
                                 {"itemid": "456"}

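Security is a broad topic. Unless you have considerable experience, a security background, or are applying for a position that requires knowledge of security, you probably won't need to know more than the basics: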

Source(s) and further reading: REST and RPC

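Under development

Sometimes you'll be asked to do "back-of-the-envelope" estimates. For example, you might need to determine how long it will take to generate 100 image thumbnails from disk or how much memory a data structure will take. The powers of two table and the latency numbers every programmer should know are handy references.

Handy metrics based on the numbers above: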

  • Encrypt in transit and at rest.
  • Sanitize all user inputs or any input parameters exposed to user to prevent XSS and SQL injection.
  • Use parameterized queries to prevent SQL injection.
  • Use the principle of least privilege.
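
Parameterized queries are easy to demonstrate with Python's built-in `sqlite3` module, whose `?` placeholders pass values separately from the SQL text. The table and the malicious input are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])


def find_user_unsafe(name):
    # DON'T: string formatting lets the input rewrite the query
    return conn.execute(f"SELECT id, name FROM users WHERE name = '{name}'").fetchall()


def find_user(name):
    # DO: the driver sends `name` as data, never as SQL
    return conn.execute("SELECT id, name FROM users WHERE name = ?", (name,)).fetchall()


malicious = "x' OR '1'='1"
print(len(find_user_unsafe(malicious)))  # 2: every row leaks
print(len(find_user(malicious)))         # 0: treated as a literal name
```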

Availability in parallel vs in sequence

Credits

Disadvantage(s): horizontal scaling

Power           Exact Value         Approx Value        Bytes
---------------------------------------------------------------
7                             128
8                             256
10                           1024   1 thousand           1 KB
16                         65,536                       64 KB
20                      1,048,576   1 million            1 MB
30                  1,073,741,824   1 billion            1 GB
32                  4,294,967,296                        4 GB
40              1,099,511,627,776   1 trillion           1 TB

Source(s) and further reading

Master-slave replication

Latency Comparison Numbers
--------------------------
L1 cache reference                           0.5 ns
Branch mispredict                            5   ns
L2 cache reference                           7   ns                      14x L1 cache
Mutex lock/unlock                           25   ns
Main memory reference                      100   ns                      20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy            10,000   ns       10 us
Send 1 KB bytes over 1 Gbps network     10,000   ns       10 us
Read 4 KB randomly from SSD*           150,000   ns      150 us          ~1GB/sec SSD
Read 1 MB sequentially from memory     250,000   ns      250 us
Round trip within same datacenter      500,000   ns      500 us
Read 1 MB sequentially from SSD*     1,000,000   ns    1,000 us    1 ms  ~1GB/sec SSD, 4X memory
HDD seek                            10,000,000   ns   10,000 us   10 ms  20x datacenter roundtrip
Read 1 MB sequentially from 1 Gbps  10,000,000   ns   10,000 us   10 ms  40x memory, 10X SSD
Read 1 MB sequentially from HDD     30,000,000   ns   30,000 us   30 ms 120x memory, 30X SSD
Send packet CA->Netherlands->CA    150,000,000   ns  150,000 us  150 ms

Notes
-----
1 ns = 10^-9 seconds
1 us = 10^-6 seconds = 1,000 ns
1 ms = 10^-3 seconds = 1,000 us = 1,000,000 ns

Common system design interview questions, with links to resources on how to solve each.

  • Read sequentially from HDD at 30 MB/s
  • Read sequentially from 1 Gbps Ethernet at 100 MB/s
  • Read sequentially from SSD at 1 GB/s
  • Read sequentially from main memory at 4 GB/s
  • 6-7 world-wide round trips per second
  • 2,000 round trips per second within a data center
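
These metrics make back-of-the-envelope estimates a matter of simple division. For example, assuming 100 images of 2 MB each (a hypothetical size), the time just to read them sequentially from each medium is:

```python
MB = 1_000_000          # decimal units keep the arithmetic simple
GB = 1_000 * MB

# Sequential read throughput, in bytes per second, from the metrics above
THROUGHPUT = {
    "HDD": 30 * MB,
    "1 Gbps Ethernet": 100 * MB,
    "SSD": 1 * GB,
    "main memory": 4 * GB,
}


def seconds_to_read(num_bytes, medium):
    return num_bytes / THROUGHPUT[medium]


total_bytes = 100 * 2 * MB   # hypothetical: 100 images of 2 MB each
for medium in THROUGHPUT:
    print(f"{medium:>15}: {seconds_to_read(total_bytes, medium):.2f} s")

# 2,000 round trips per second within a data center is about 0.5 ms each
print(f"datacenter round trip: {1000 / 2000} ms")
```

The estimate ignores the thumbnailing work itself, but it shows quickly that reading from a spinning disk (several seconds) dominates the same read from memory (tens of milliseconds).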

Latency numbers visualized

Articles on how real world systems are designed.

Source(s) and further reading

Master-master replication

Source: Twitter timelines at scale

Question Reference(s)
Design a file sync service like Dropbox youtube.com
Design a search engine like Google queue.acm.org
stackexchange.com
ardendertat.com
stanford.edu
Design a scalable web crawler like Google quora.com
Design Google docs code.google.com
neil.fraser.name
Design a key-value store like Redis slideshare.net
Design a cache system like Memcached slideshare.net
Design a recommendation system like Amazon’s hulu.com
ijcai13.org
Design a tinyurl system like Bitly n00tc0d3r.blogspot.com
Design a chat app like WhatsApp highscalability.com
Design a picture sharing system like Instagram highscalability.com
highscalability.com
Design the Facebook news feed function quora.com
quora.com
slideshare.net
Design the Facebook timeline function facebook.com
highscalability.com
Design the Facebook chat function erlang-factory.com
facebook.com
Design a graph search function like Facebook’s facebook.com
facebook.com
facebook.com
Design a content delivery network like CloudFlare figshare.com
Design a trending topic system like Twitter’s michael-noll.com
snikolov.wordpress.com
Design a random ID generation system blog.twitter.com
github.com
Return the top k requests during a time interval cs.ucsb.edu
wpi.edu
Design a system that serves data from multiple data centers highscalability.com
Design an online multiplayer card game indieflashblog.com
buildnewgames.com
Design a garbage collection system stuffwithstuff.com
washington.edu
Design an API rate limiter https://stripe.com/blog/
Design a Stock Exchange (like NASDAQ or Binance) Jane Street
Golang Implementation
Go Implementation
Add a system design question Contribute

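Federation

Rather than focusing on nitty-gritty details for the following articles, instead:

Architectures for companies you are interviewing with

Questions you encounter might be from the same domain.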

  • Identify shared principles, common technologies, and patterns within these articles
  • Study what problems are solved by each component, where it works, where it doesn’t
  • Review the lessons learned
Type | System | Reference(s)
Data processing | MapReduce – Distributed data processing from Google | research.google.com
Data processing | Spark – Distributed data processing from Databricks | slideshare.net
Data processing | Storm – Distributed data processing from Twitter | slideshare.net
Data store | Bigtable – Distributed column-oriented database from Google | harvard.edu
Data store | HBase – Open source implementation of Bigtable | slideshare.net
Data store | Cassandra – Distributed column-oriented database from Facebook | slideshare.net
Data store | DynamoDB – Document-oriented database from Amazon | harvard.edu
Data store | MongoDB – Document-oriented database | slideshare.net
Data store | Spanner – Globally-distributed database from Google | research.google.com
Data store | Memcached – Distributed memory caching system | slideshare.net
Data store | Redis – Distributed memory caching system with persistence and value types | slideshare.net
File system | Google File System (GFS) – Distributed file system | research.google.com
File system | Hadoop File System (HDFS) – Open source implementation of GFS | apache.org
Misc | Chubby – Lock service for loosely-coupled distributed systems from Google | research.google.com
Misc | Dapper – Distributed systems tracing infrastructure | research.google.com
Misc | Kafka – Pub/sub message queue from LinkedIn | slideshare.net
Misc | Zookeeper – Centralized infrastructure and services enabling synchronization | slideshare.net
| Add an architecture | Contribute
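MapReduce, the first entry in the table, splits a job into a map phase that emits key/value pairs, a shuffle that groups the values by key, and a reduce phase that combines each group. The sketch below runs those three phases for a word count inside a single Python process. It illustrates the programming model only: the function names are hypothetical, and it leaves out the distribution across workers, partitioning, and fault tolerance that the Google system provides.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all intermediate values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine every value collected for one key."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    # On a cluster, each split would be mapped by a different worker and each
    # key group reduced by a worker chosen by partitioning the key. Here all
    # three phases run sequentially in one process.
    intermediate = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    splits = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(splits))
    # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

Because the map and reduce functions only see one split or one key group at a time, they can run in parallel without coordination; the shuffle is the one step that moves data between them.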

Company architectures

Company | Reference(s)
Amazon | Amazon architecture
Cinchcast | Producing 1,500 hours of audio every day
DataSift | Realtime datamining At 120,000 tweets per second
Dropbox | How we’ve scaled Dropbox
ESPN | Operating At 100,000 duh nuh nuhs per second
Google | Google architecture
Instagram | 14 million users, terabytes of photos
| What powers Instagram
Justin.tv | Justin.Tv’s live video broadcasting architecture
Facebook | Scaling memcached at Facebook
| TAO: Facebook’s distributed data store for the social graph
| Facebook’s photo storage
| How Facebook Live Streams To 800,000 Simultaneous Viewers
Flickr | Flickr architecture
Mailbox | From 0 to one million users in 6 weeks
Netflix | A 360 Degree View Of The Entire Netflix Stack
| Netflix: What Happens When You Press Play?
Pinterest | From 0 To 10s of billions of page views a month
| 18 million visitors, 10x growth, 12 employees
Playfish | 50 million monthly users and growing
PlentyOfFish | PlentyOfFish architecture
Salesforce | How they handle 1.3 billion transactions a day
Stack Overflow | Stack Overflow architecture
TripAdvisor | 40M visitors, 200M dynamic page views, 30TB data
Tumblr | 15 billion page views a month
Twitter | Making Twitter 10000 percent faster
| Storing 250 million tweets a day using MySQL
| 150M active users, 300K QPS, a 22 MB/S firehose
| Timelines at scale
| Big and small data at Twitter
| Operations at Twitter: scaling beyond 100 million users
| How Twitter Handles 3,000 Images Per Second
Uber | How Uber scales their real-time market platform
| Lessons Learned From Scaling Uber To 2000 Engineers, 1000 Services, And 8000 Git Repositories
WhatsApp | The WhatsApp architecture Facebook bought for $19 billion
YouTube | YouTube scalability
| YouTube architecture

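Company engineering blogs

Source(s) and further reading

Looking to add a blog? To avoid duplicating work, consider adding your company blog to the following repo:

Under development

Interested in adding a section or helping complete one in progress? Contribute!

  • Distributed computing with MapReduce
  • Consistent hashing
  • Scatter gather
  • Contribute

Credits

Credits and sources are provided throughout this repo.

Special thanks to:

Contact info

Feel free to contact me to discuss any issues, questions, or comments.

My contact info can be found on my GitHub page.

License

I am providing code and resources in this repository to you under an open source license. Because this is my personal repository, the license you receive to my code and resources is from me and not my employer (Facebook).

Copyright 2017 Donne Martin

Creative Commons Attribution 4.0 International License (CC BY 4.0)

http://creativecommons.org/licenses/by/4.0/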