An Introduction to Asynchronous Programming and Twisted (1)

Overview:

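When I first read this series, the ideas never felt quite clear to me. In fact, Dave does not explain the models very precisely. For background, see my notes on synchronous vs. asynchronous, blocking vs. non-blocking, and Reactor vs. Proactor.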

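Blocking is always synchronous, but synchronous is not necessarily blocking. Multithreading is essentially still blocking; it is just several threads blocking at once. It suits CPU-bound tasks, because the work still has to be done by someone: no model can reduce the actual time spent doing it.
Non-blocking I/O saves waiting time, so it suits I/O-bound tasks, since I/O usually involves a lot of waiting.
What Dave calls asynchronous is asynchrony in the broad sense; strictly speaking, it is non-blocking synchronous I/O.
The problem Twisted solves is this non-blocking synchronous case, i.e. the Reactor model: system calls such as select, poll, and epoll monitor a set of I/O handles and notify a demultiplexor on the application side (which plays the Reactor role).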

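True non-blocking asynchronous I/O needs special support from the operating system and is rarely used in practice, because Unix offers little native support for it...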

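What problem does Twisted solve?
It provides a programming framework based on the Reactor model that wraps sockets and select, so that users only need to define callbacks and Deferreds.
But keep in mind that it can do nothing that a version written directly with low-level sockets and select could not; there is no magic...
Twisted's layering of abstractions is genuinely well done, and I think the biggest benefit of studying it is seeing how the abstractions are built up: low-level socket and select -> Reactor -> Protocol, Transport -> Callback -> Deferred.
But that is also the problem: there are so many layers that if you use it directly, you may not know what you are actually doing. That is confusing and makes the learning curve steep.
If learning sockets and select is easier than learning Twisted, why should I learn Twisted? After all, its users are programmers, not end users who can't read code.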

http://krondo.com/?page_id=1327  Twisted Introduction (Dave Peticolas) 
http://blog.sina.com.cn/s/blog_704b6af70100py9n.html  (Chinese translation)

 

Part 1: In Which We Begin at the Beginning

Preface

Twisted is a highly abstracted system and this gives you tremendous leverage when you use it to solve problems. But when you are learning Twisted, and particularly when you are trying to understand how Twisted actually works, the many levels of abstraction can cause troubles. 
Much of the challenge does not stem from Twisted per se, but rather in the acquisition of the “mental model” required to write and understand asynchronous code.

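Twisted is fairly hard to learn and understand, but the difficulty does not come from Twisted itself: its code and documentation are well written. The main reason is that most people lack this mental model, so the next section covers the models.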

The Models

The first model we will look at is the single-threaded synchronous model, in Figure 1 below:

[Figure 1: the single-threaded synchronous model]

We can contrast the synchronous model with another one, the threaded model illustrated in Figure 2:

[Figure 2: the threaded model]

The threads are managed by the operating system and may, on a system with multiple processors or multiple cores, run truly concurrently, or may be interleaved together on a single processor. The point is, in the threaded model the details of execution are handled by the OS and the programmer simply thinks in terms of independent instruction streams which may run simultaneously. Thread communication and coordination is an advanced programming topic and can be difficult to get right.

Now we can introduce the asynchronous model in Figure 3:

[Figure 3: the asynchronous model]

Differences between the asynchronous and threaded models: 
In the asynchronous model, the tasks are interleaved with one another, but in a single thread of control. This is simpler than the threaded case because the programmer always knows that when one task is executing, another task is not.
In the asynchronous model, tasks are always interleaved into slices, but all the slices run in a single thread, so two slices never run at the same time. That means we do not have to deal with thread synchronization and communication (which is usually complicated). Moreover, the switching between slices is controlled by the programmer.

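In the threaded model, tasks may also be interleaved (on a single-core CPU), but on multiple CPUs they need not be; that is the threaded model in its true form. Tasks run at the same time, so synchronization and communication must be handled. Finally, task switching and scheduling are decided by the OS; the programmer has no control over them.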
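To make the contrast concrete, here is a minimal sketch of the asynchronous model using plain Python generators rather than Twisted: three tasks share one thread, and each `yield` marks the point where a task voluntarily hands control back. The task names, step counts, and the `run_interleaved` scheduler are invented for illustration.

```python
from collections import deque


def task(name, steps, log):
    """A task that does `steps` units of work, yielding control after each one."""
    for i in range(steps):
        log.append(f"{name}{i}")  # one slice of work
        yield                     # explicitly hand control back to the scheduler


def run_interleaved(tasks):
    """Round-robin scheduler: runs every task in a single thread until all finish."""
    ready = deque(tasks)
    while ready:
        t = ready.popleft()
        try:
            next(t)        # run the task until its next yield
        except StopIteration:
            continue       # the task finished; drop it
        ready.append(t)    # otherwise put it back at the end of the queue


log = []
run_interleaved([task("A", 2, log), task("B", 3, log), task("C", 1, log)])
print(log)  # ['A0', 'B0', 'C0', 'A1', 'B1', 'B2']
```

Because switches happen only at `yield`, no task can be interrupted halfway through a slice, so no locks are needed; in the threaded model the OS may preempt a thread at any instruction.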

The Motivation

We’ve seen that the asynchronous model is simpler than the threaded one because there is a single instruction stream and tasks explicitly relinquish control instead of being suspended arbitrarily. But the asynchronous model is clearly more complex than the synchronous case. Since there is no actual parallelism, it appears from our diagrams that an asynchronous program will take just as long to execute as a synchronous one, perhaps longer as the asynchronous program might exhibit poorer locality of reference.

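Because the asynchronous model has a single instruction stream and task switching is under the programmer's control, it is simpler than the threaded model. But it is clearly more complex than the synchronous (blocking) model, and it may take as long or longer to run because of poorer locality of reference.
So why would we use the asynchronous model?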

Compared to the synchronous model, the asynchronous model performs best when:

  1. There are a large number of tasks so there is likely always at least one task that can make progress.
  2. The tasks perform lots of I/O, causing a synchronous program to waste lots of time blocking when other tasks could be running.
  3. The tasks are largely independent from one another so there is little need for inter-task communication (and thus for one task to wait upon another).

These conditions almost perfectly characterize a typical busy network server (like a web server) in a client-server environment. Each task represents one client request with I/O in the form of receiving the request and sending the reply. And client requests (being mostly reads) are largely independent. So a network server implementation is a prime candidate for the asynchronous model and this is why Twisted is first and foremost a networking library.

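As I wrote in an earlier post, I/O-bound work suits the asynchronous model and CPU-bound work suits threads. So the ideal use case for the asynchronous model is a web server: it has to handle many client requests at once, each request is fairly independent, and there is a lot of I/O.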

Part 2: Slow Poetry and the Apocalypse

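This part uses Python's basic socket module to write synchronous and asynchronous versions of a program, which makes the reactor design pattern easier to understand.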

Slow Poetry

We implement a server that serves poetry: when you connect, it sends you a poem. To make the process visible, it sleeps between sends, hence "slow poetry".

First, the blocking server, blocking-server/slowpoetry.py:

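import socket

# options, poetry_file, num_bytes, delay and send_poetry come from the rest of slowpoetry.py
listen_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listen_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listen_socket.bind((options.iface, options.port or 0))
listen_socket.listen(5)  # backlog: up to 5 pending connections are queued
while True:
    # accept() blocks until a client connects
    sock, addr = listen_socket.accept()
    # serve this client completely before accepting the next one
    send_poetry(sock, poetry_file, num_bytes, delay)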

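A typical blocking server: it binds to a port and listens on it (the kernel queues incoming connections; here the backlog of pending connections is set to 5).
In a loop, it keeps calling accept to get the next connection and handles it, which here means sending the poem.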

The Blocking Client

Now we can use the blocking client in blocking-client/get-poetry.py

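import socket

# addresses is a list of (host, port) pairs parsed from the command line
poems = []
for i, address in enumerate(addresses):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(address)
    poem = b''
    while True:
        data = sock.recv(1024)  # blocks until data arrives; b'' means the server closed
        if not data:
            sock.close()
            break
        poem += data
    poems.append(poem)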

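A typical blocking client: given several servers, it connects to them one by one, reads from each until no more data arrives, and only then moves on to the next server. This is the synchronous model from Part 1.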

The Asynchronous Client

Now let’s take a look at a simple asynchronous client written without Twisted, in async-client/get-poetry.py

import select
import socket

# connect() (from the rest of get-poetry.py) opens a socket and calls sock.setblocking(0)
sockets = [connect(address) for address in addresses]
poems = {sock: b'' for sock in sockets}
while sockets:
    # this select call blocks until one or more of the
    # sockets is ready for read I/O
    rlist, _, _ = select.select(sockets, [], [])  # rlist: sockets with data ready to read
    for sock in rlist:
        data = b''
        while True:
            try:
                new_data = sock.recv(1024)
            except BlockingIOError:
                # EWOULDBLOCK: we would have blocked if the socket were
                # blocking, so instead we skip to the next socket
                break
            if not new_data:
                break
            data += new_data
        if not data:
            # readable but nothing to read: the server closed the connection
            sockets.remove(sock)
            sock.close()
        poems[sock] += data

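First it connects to all the servers at once. This works the same way as in the blocking client, except that sock.setblocking(0) is called to put each socket in non-blocking mode.
The key part is the call to select.select (see my earlier post for details on this function). This is also what Twisted does: it uses system calls such as select and poll to watch many sockets at once without blocking on any single one.
The rest of the logic reads whatever data is available at the moment, then goes back to waiting, until everything has been read...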

The core of the asynchronous client is the top-level loop. This loop can be broken down into steps:

  1. Wait (block) on all open sockets using select until one (or more) sockets has data to be read.
  2. For each socket with data to be read, read it, but only as much as is available now. Don’t block.
  3. Repeat, until all sockets have been closed.

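We watch many sockets at once, handle whichever one has data first, and process only as much data as is available: that is the "don't block" rule. This is the asynchronous model from Part 1.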
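The three steps above can be tried without a poetry server. This is a stand-in for async-client/get-poetry.py under stated assumptions: `socket.socketpair()` replaces the real server connections, the "server" ends send their whole poem up front, and the helper name `read_all` is made up here.

```python
import select
import socket


def read_all(socks):
    """Read every socket to EOF with one select() loop, never blocking on a single socket."""
    for s in socks:
        s.setblocking(False)
    data = {s: b"" for s in socks}
    still_open = list(socks)
    while still_open:
        # step 1: wait until at least one socket has data (or EOF) to read
        readable, _, _ = select.select(still_open, [], [])
        for s in readable:
            # step 2: read only what is available right now
            while True:
                try:
                    chunk = s.recv(1024)
                except BlockingIOError:
                    break              # would block: go back to select()
                if not chunk:          # b"" means the peer closed the connection
                    still_open.remove(s)
                    s.close()
                    break
                data[s] += chunk
    # step 3: the loop ends once every socket has been closed
    return [data[s] for s in socks]


# socketpair() stands in for a connection to a poetry server
pairs = [socket.socketpair() for _ in range(3)]
for i, (server_end, _) in enumerate(pairs):
    server_end.sendall(f"poem {i}".encode())
    server_end.close()                 # closing sends EOF to the client end
poems = read_all([client_end for _, client_end in pairs])
print(poems)  # [b'poem 0', b'poem 1', b'poem 2']
```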

Reactor Pattern

This use of a loop which waits for events to happen, and then handles them, is so common that it has achieved the status of a design pattern: the reactor pattern.

[Figure 5: the reactor loop]

The loop is a “reactor” because it waits for and then reacts to events. For that reason it is also known as an event loop. And since reactive systems are often waiting on I/O, these loops are also sometimes called select loops, since the select call is used to wait for I/O.

A real implementation of the reactor pattern would implement the loop as a separate abstraction with the ability to:

  1. Accept a set of file descriptors you are interested in performing I/O with.
  2. Tell you, repeatedly, when any file descriptors are ready for I/O.

And a really good implementation of the reactor pattern would also:

  1. Handle all the weird corner cases that crop up on different systems.
  2. Provide lots of nice abstractions to help you use the reactor with the least amount of effort.
  3. Provide implementations of public protocols that you can use out of the box.

Well that's just what Twisted is — a robust, cross-platform implementation of the Reactor Pattern with lots of extras.
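The two basic abilities listed above fit in a few lines on top of Python's standard `selectors` module, whose `DefaultSelector` picks the most efficient mechanism available on the platform (epoll, kqueue, or plain select). This is only a toy to show the shape of the pattern; `MiniReactor`, `add_reader`, and `remove_reader` are invented names, not Twisted's API.

```python
import selectors
import socket


class MiniReactor:
    """A toy reactor: register sockets, then get called back when they are readable."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()

    def add_reader(self, sock, callback):
        # ability 1: accept a set of file descriptors we care about
        self._selector.register(sock, selectors.EVENT_READ, callback)

    def remove_reader(self, sock):
        self._selector.unregister(sock)

    def run(self):
        # ability 2: repeatedly report which descriptors are ready
        while self._selector.get_map():          # stop when nothing is registered
            for key, _ in self._selector.select():
                key.data(key.fileobj)            # dispatch to the registered callback


reactor = MiniReactor()
received = []


def on_readable(sock):
    chunk = sock.recv(1024)   # safe: select() said the socket is readable
    if chunk:
        received.append(chunk)
    else:                     # EOF: stop watching this socket
        reactor.remove_reader(sock)
        sock.close()


a, b = socket.socketpair()
reactor.add_reader(b, on_readable)
a.sendall(b"hello")
a.close()
reactor.run()                 # returns once no sockets are registered
print(b"".join(received))     # b'hello'
```

A real reactor adds what the second list asks for: timers, writers, platform corner cases, and higher-level abstractions on top of this loop.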

 

Part 3: Our Eye-beams Begin to Twist

Doing Nothing, the Twisted Way

The absolute simplest Twisted program is listed below, and is also available in basic-twisted/simple.py

from twisted.internet import reactor 
reactor.run()

Twisted is an implementation of the Reactor Pattern and thus contains an object that represents the reactor, or event loop, that is the heart of any Twisted program. And run() is the key that starts this heart.
This program just sits there doing nothing. Note that this is not a busy loop which keeps cycling over and over. If you happen to have a CPU meter on your screen, you won’t see any spikes caused by this technically infinite loop. In fact, our program isn't using any CPU at all. Instead, the reactor is stuck at the top cycle of Figure 5, waiting for an event that will never come (to be specific, waiting on a select call with no file descriptors). This is the biggest advantage of system calls like select; see my earlier post on select for why.

We're about to make it more interesting, but we can already draw a few conclusions :

  1. Twisted’s reactor loop doesn’t start until told to. You start it by calling reactor.run().
  2. The reactor loop runs in the same thread it was started in. In this case, it runs in the main (and only) thread.
  3. Once the loop starts up, it just keeps going. The reactor is now “in control” of the program (or the specific thread it was started in).
  4. If it doesn’t have anything to do, the reactor loop does not consume CPU.
  5. The reactor isn’t created explicitly, just imported.

In Twisted, the reactor is basically a Singleton. There is only one reactor object and it is created implicitly when you import it. If you open the reactor module in the twisted.internet package you will find very little code. The actual implementation resides in other files (starting with twisted.internet.selectreactor).
Besides the select reactor there are other reactors; for how to choose one and how to switch, see http://twistedmatrix.com/documents/current/core/howto/choosing-reactor.html

Hello, Twisted

Let's make a Twisted program that at least does something.

from twisted.internet import reactor


def hello():
    print('Hello from the reactor loop!')


reactor.callWhenRunning(hello)
reactor.run()

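Compared with the previous example, this adds one call, which registers hello as a callback for the "when running" event. This only registers the callback; it does not run hello yet.
Once run() is called, the "when running" event fires, and the reactor then calls the registered callback hello.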
We use the term callback to describe the reference to the hello function. A callback is a function reference that we give to Twisted (or any other framework) that Twisted will use to “call us back” at the appropriate time, in this case right after the reactor loop starts up.
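The difference between registering a callback and running it can be shown with a toy loop. This is a plain-Python sketch of the idea only; `ToyLoop` and `call_when_running` are hypothetical names, not Twisted's implementation of callWhenRunning.

```python
class ToyLoop:
    """Mimics the idea behind callWhenRunning: register now, get called later."""

    def __init__(self):
        self._startup_callbacks = []

    def call_when_running(self, func, *args):
        # only stores a reference to the function; nothing is executed yet
        self._startup_callbacks.append((func, args))

    def run(self):
        # the "startup event": now the loop calls us back, in its own thread
        for func, args in self._startup_callbacks:
            func(*args)
        # a real reactor would now enter its select loop; the toy just returns


calls = []


def hello():
    calls.append("Hello from the loop!")


loop = ToyLoop()
loop.call_when_running(hello)
assert calls == []            # registering did not call hello
loop.run()
print(calls)  # ['Hello from the loop!']
```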

Twisted is not the only reactor framework that uses callbacks. The older asynchronous Python frameworks Medusa and asyncore also use them. As do the GUI toolkits GTK and QT, both based, like many GUI frameworks, on a reactor loop.
A reactor with callbacks is a standard asynchronous design used by many asynchronous frameworks; Twisted is just one of them.

Figure 6 shows what happens during a callback:

[Figure 6: what happens during a callback]

Figure 6 illustrates some important properties of callbacks:

  1. Our callback code runs in the same thread as the Twisted loop.
  2. When our callbacks are running, the Twisted loop is not running.
  3. And vice versa.
  4. The reactor loop resumes when our callback returns.

During a callback, the Twisted loop is effectively “blocked” on our code. So we should make sure our callback code doesn’t waste any time. 
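The figure emphasizes that there is only one thread: while a callback is running, the loop cannot watch for events.
The reactor loop and the callbacks (the business logic) are also separate: Twisted handles only the reactor part, and users must implement their business logic as callbacks.
Because the Twisted loop is "blocked" while our callback runs, the callback must not waste any time, and it is up to the user (whoever writes the callback) to guarantee this.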

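In practice, not every callback can return immediately: I/O, file reads and writes, database access and so on all take time. What then?
Twisted provides many asynchronous APIs that wrap these slow operations; calling them instead of blocking calls largely solves the problem. More on this later.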

Goodbye, Twisted

It turns out you can tell the Twisted reactor to stop running by using the reactor's stop method. But once stopped the reactor cannot be restarted, so it’s generally something you do only when your program needs to exit.

More importantly, this section shows how to use callLater, e.g. reactor.callLater(1, self.count)
This program uses the callLater API to register a callback with Twisted. With callLater the callback is the second argument and the first argument is the number of seconds in the future you would like your callback to run.

So how does Twisted arrange to execute the callback at the right time? 
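This is a natural question: the reactor loop monitors a set of file descriptors and produces events from them, so how does it fire an event like "n seconds from now"?
The answer: The select call, and the others like it, also accepts an optional timeout value. If a timeout value is supplied and no file descriptors have become ready for I/O within the specified time then the select call will return anyway. You can think of a timeout as another kind of event the event loop of Figure 5 is waiting for. And Twisted uses timeouts to make sure any “timed callbacks” registered with callLater get called at the right time.
In other words, the select timeout is itself an event: when the reactor sees it, it checks whether any timed callbacks are due. My question was what the timeout should be set to. It cannot be a fixed value; is it the difference between the current time and the time at which the earliest callLater is due? (It is: on each pass of the loop, Twisted's reactor computes the timeout as the time remaining until its earliest pending delayed call, or no timeout at all if none are pending.)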

Twisted’s callLater mechanism cannot provide the sort of guarantees required in a hard real-time system. 
callLater is also only approximate: it cannot guarantee that the function runs exactly n seconds later. If another callback takes a really long time to execute, a timed callback may be delayed past its schedule.
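A toy version of this mechanism needs only the standard library: `heapq` keeps pending calls ordered by deadline, and the timeout passed to `select()` is the time left until the earliest one. The names `TimerLoop` and `call_later` are invented for the sketch, which also shows why callLater is only approximate: a callback that sleeps delays every timer behind it. (POSIX only: on Windows, `select()` rejects three empty lists.)

```python
import heapq
import itertools
import select
import time


class TimerLoop:
    """Toy loop whose timed callbacks are driven by the select() timeout."""

    def __init__(self):
        self._timers = []               # heap of (deadline, seq, func)
        self._seq = itertools.count()   # tie-breaker so functions are never compared
        self._stopped = False

    def call_later(self, delay, func):
        heapq.heappush(self._timers, (time.monotonic() + delay, next(self._seq), func))

    def stop(self):
        self._stopped = True

    def run(self):
        while not self._stopped and self._timers:
            # timeout = time left until the earliest deadline (never negative)
            timeout = max(0.0, self._timers[0][0] - time.monotonic())
            # no file descriptors here, so select() just waits for the timeout
            select.select([], [], [], timeout)
            now = time.monotonic()
            # run every timer whose deadline had passed when select() returned
            while self._timers and self._timers[0][0] <= now:
                _, _, func = heapq.heappop(self._timers)
                func()


loop = TimerLoop()
start = time.monotonic()
fired = {}


def slow_callback():
    time.sleep(0.3)                     # a badly behaved callback that blocks the loop


def timed_callback():
    fired["at"] = time.monotonic() - start
    loop.stop()


loop.call_later(0.0, slow_callback)
loop.call_later(0.1, timed_callback)    # scheduled for 0.1 s ...
loop.run()
print(f"timed_callback ran after {fired['at']:.2f}s")  # ... but runs only after roughly 0.3 s
```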

Take That, Twisted

Since Twisted often ends up calling our code in the form of callbacks, you might wonder what happens when a callback raises an exception. 
The answer: don't worry. An exception in a callback does not crash the Twisted loop; Twisted reports the error and keeps going, as you would expect.
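The idea can be sketched in plain Python: the loop wraps every callback invocation in `try`/`except`, logs the traceback, and moves on to the next one. This is a simplification with an invented name (`run_callbacks`), not Twisted's code; Twisted reports such errors through its own logging system.

```python
import logging


def run_callbacks(callbacks):
    """Call each callback in turn; an exception is logged instead of stopping the loop."""
    results = []
    for cb in callbacks:
        try:
            results.append(cb())
        except Exception:
            # roughly what a reactor does: report the traceback and keep going
            logging.exception("Unhandled error in callback %r", cb)
            results.append(None)
    return results


def good():
    return "ok"


def bad():
    raise ValueError("boom")


print(run_callbacks([good, bad, good]))  # ['ok', None, 'ok']
```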

 

Part 4: Twisted Poetry

Our First Twisted Client

Although Twisted is probably more often used to write servers, clients are simpler than servers and we’re starting out as simply as possible. Let’s try out our first poetry client written with Twisted. The source code is in twisted-client-1/get-poetry.py.
This example does exactly the same thing as the asynchronous client above, but wraps it with Twisted's low-level APIs. To understand Twisted step by step, we start with the low-level layer and move on to the high-level APIs later.

import socket

from twisted.internet import main, reactor


class PoetrySocket(object):

    poem = b''

    def __init__(self, task_num, address):
        self.task_num = task_num
        self.address = address
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.sock.connect(address)
        # non-blocking, so doRead can never stall the reactor
        self.sock.setblocking(False)
        # tell the Twisted reactor to monitor this socket for reading
        reactor.addReader(self)

    def fileno(self):
        # the reactor hands this descriptor to select/poll
        return self.sock.fileno()

    def logPrefix(self):
        return 'poetry'

    def connectionLost(self, reason):
        self.sock.close()
        # stop monitoring this socket
        reactor.removeReader(self)
        # see if there are any poetry sockets left
        for reader in reactor.getReaders():
            if isinstance(reader, PoetrySocket):
                return
        reactor.stop()  # no more poetry

    def doRead(self):
        # called by the reactor when the socket is readable:
        # read only what is available right now, then return
        data = b''
        while True:
            try:
                chunk = self.sock.recv(1024)
            except BlockingIOError:
                break
            if not chunk:
                break
            data += chunk
        if not data:
            # the server closed the connection; the reactor will call connectionLost
            return main.CONNECTION_DONE
        msg = 'Task %d: got %d bytes of poetry from %s'
        print(msg % (self.task_num, len(data), self.format_addr()))
        self.poem += data

    def format_addr(self):
        host, port = self.address
        return '%s:%s' % (host or '127.0.0.1', port)


# addresses: list of (host, port) pairs parsed from the command line
sockets = [PoetrySocket(i + 1, addr) for i, addr in enumerate(addresses)]
reactor.run()

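Look back at the asynchronous client code above. It has three core steps:
1. Open a socket connection to each server: done in PoetrySocket.__init__().
2. Run a select loop that watches all the sockets at once and returns those with data: not written here at all, because the reactor encapsulates it.
3. Process the data and close the socket once everything has been received: done in PoetrySocket.doRead() and connectionLost().
So Twisted's reactor encapsulates both the select loop and the step of dispatching events to callbacks. The programmer only has to call reactor.addReader(self) to register an object that provides callbacks such as doRead, and the asynchronous client is done.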
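The reactor's half of that division of labor can be sketched with the standard library alone. Everything below is a toy with invented names (`ToyReactor`, `PoemReader`), and `CONNECTION_DONE` is a local stand-in for Twisted's `main.CONNECTION_DONE`. It roughly mirrors the contract described here: the loop selects on objects that provide `fileno()`, calls their `doRead()`, and treats a returned value as "this connection is finished", after which it calls `connectionLost()`. `socket.socketpair()` replaces the real poetry servers.

```python
import select
import socket

CONNECTION_DONE = object()  # stand-in for twisted.internet.main.CONNECTION_DONE


class ToyReactor:
    """select()-based loop driving objects that provide fileno(), doRead(), connectionLost()."""

    def __init__(self):
        self._readers = set()

    def addReader(self, reader):
        self._readers.add(reader)

    def removeReader(self, reader):
        self._readers.discard(reader)

    def run(self):
        while self._readers:
            # select() accepts any object with a fileno() method
            readable, _, _ = select.select(list(self._readers), [], [])
            for reader in readable:
                why = reader.doRead()      # the callback
                if why is not None:        # a returned reason means "this connection is over"
                    self.removeReader(reader)
                    reader.connectionLost(why)


class PoemReader:
    """Plays the role of PoetrySocket: related callbacks plus the state they share."""

    def __init__(self, sock, reactor):
        self.sock, self.poem, self.done = sock, b"", False
        sock.setblocking(False)
        reactor.addReader(self)

    def fileno(self):
        return self.sock.fileno()

    def doRead(self):
        try:
            chunk = self.sock.recv(1024)
        except BlockingIOError:
            return None
        if not chunk:
            return CONNECTION_DONE
        self.poem += chunk          # state later seen by connectionLost's caller
        return None

    def connectionLost(self, reason):
        self.sock.close()
        self.done = True


reactor = ToyReactor()
readers = []
for text in (b"first poem", b"second poem"):
    server_end, client_end = socket.socketpair()
    readers.append(PoemReader(client_end, reactor))
    server_end.sendall(text)
    server_end.close()
reactor.run()
print([r.poem for r in readers])  # [b'first poem', b'second poem']
```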

Twisted Interfaces

There are a number of sub-modules in Twisted called interfaces . Each one defines a set of Interface classes. As of version 8.0, Twisted uses zope.interface as the basis for those classes, but the details of that package aren't so important for us. We're just concerned with the Interface sub-classes in Twisted itself, like the ones you are looking at now.

One of the principal purposes of Interfaces is documentation. As a Python programmer you are doubtless familiar with Duck Typing, the notion that the type of an object is principally defined not by its position in a class hierarchy but by the public interface it presents to the world. Thus two objects which present the same public interface (i.e., walk like a duck, quack like a …) are, as far as duck typing is concerned, the same sort of thing (a duck!). Well an Interface is a somewhat formalized way of specifying just what it means to walk like a duck.

To understand how Twisted builds its abstractions, you first need to know zope.interface (http://apidoc.zope.org/++apidoc++/Book/ifaceschema/interface/show.html). It isn't covered in detail here; dig into it if you're interested.
In statically typed languages such as Java and C++, an object's type is determined entirely by its position in the class hierarchy and cannot be changed at will. In a language like Python, a class's interface can change at run time, so an object's type cannot be pinned down that way. Hence the idea of duck typing: if your interface currently looks like a duck's, we treat you as a duck.
Zope's interfaces are also quite similar to Python's official ABCs (abstract base classes). One comparison of the two, "Explaining Why Interfaces Are Great", argues that interfaces are the better choice:
http://glyph.twistedmatrix.com/2009/02/explaining-why-interfaces-are-great.html (read it if you're interested). Here is its summary:
In a super-dynamic language like Python, you don't need a system for explicit abstract interfaces. No compiler is going to shoot you for calling a 'foo' method. But, formal interface definitions serve many purposes. They can function as documentation, as a touchstone for code which wants to clearly report programming errors ("warning: MyWidget claims to implement IWidget, but doesn't implement a 'doWidgetStuff' method"), and a mechanism for indirection when you know what contract your code wants but you don't know what implementation will necessarily satisfy it (adaptation).
In other words, a dynamically typed language like Python doesn't seem to need explicit abstract interfaces, because a class's interface can change at any time and you can add whatever you like whenever you like.
But formal interface definitions are still useful in Python. First, as documentation, which helps engineers learn and maintain the code. Second, as a touchstone: the zope.interface module allows you to explicitly declare that a class implements one or more interfaces, and provides mechanisms to examine these declarations at run-time. Third, as a mechanism for indirection, i.e. adaptation: the ability to dynamically provide a given interface for an object that might not support that interface directly.
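zope.interface is not in the standard library, so this sketch swaps in a different but related tool, `typing.Protocol` with `@runtime_checkable` (Python 3.8+), to illustrate the first two uses: the class body documents a contract, and `isinstance` acts as a shallow touchstone. Unlike zope.interface, it only checks that the methods exist (structurally) rather than relying on explicit declarations, and it offers no adaptation. `IReadDescriptorLike`, `GoodReader`, and `Forgetful` are made-up names.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class IReadDescriptorLike(Protocol):
    """Documents the contract: anything with fileno() and doRead() qualifies."""

    def fileno(self) -> int: ...

    def doRead(self) -> object: ...


class GoodReader:                 # note: does not inherit from the Protocol
    def fileno(self) -> int:
        return 3

    def doRead(self) -> object:
        return None


class Forgetful:
    def fileno(self) -> int:
        return 4
    # no doRead(), so it does not satisfy the contract


print(isinstance(GoodReader(), IReadDescriptorLike))  # True
print(isinstance(Forgetful(), IReadDescriptorLike))   # False
```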

Now let's see how the reactor encapsulates things, starting from addReader(self). It is declared in the IReactorFDSet interface, which is one of the interfaces that Twisted reactors implement. Thus, any Twisted reactor has a method called addReader that works as described by the docstring below. This interface manages the set of file descriptors to watch (FDSet: file descriptor set); addReader adds an object that carries both the descriptor and its callbacks.

def addReader(reader):
    """
    I add reader to the set of file descriptors to get read events for.

    @param reader: An L{IReadDescriptor} provider that will be checked for
        read events until it is removed from the reactor with
        L{removeReader}.

    @return: C{None}.
    """

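Note that methods in an interface differ from ordinary methods: they contain only a docstring, and they have no self parameter. It isn't needed, because nobody ever instantiates an interface directly.
From the docstring, reader must provide IReadDescriptor; in the code above, that means self (a PoetrySocket) must provide it: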

class IReadDescriptor(IFileDescriptor):

    def doRead():
        """
        Some data is available for reading on your descriptor.
        """


And you will find an implementation of doRead on our PoetrySocket objects. It reads data from the socket asynchronously, whenever it is called by the Twisted reactor. So doRead is really a callback, but instead of passing it directly to Twisted, we pass in an object with a doRead method. This is a common idiom in the Twisted framework: instead of passing a function, you pass an object that must implement a given Interface. This allows us to pass a set of related callbacks (the methods defined by the Interface) with a single argument. It also lets the callbacks communicate with each other through shared state stored on the object.
This explains Twisted's callback mechanism quite clearly: you pass the reactor an object that implements an interface, rather than a bare function.
Also note that IReadDescriptor extends IFileDescriptor, so PoetrySocket must also implement methods such as connectionLost.

Part 5: Twistier Poetry

Abstract Expressionism

The Twisted framework is loosely composed of layers of abstractions and learning Twisted means learning what those layers provide, i.e, what APIs, Interfaces, and implementations are available for use in each one. 
The most important thing to keep in mind when learning a new Twisted abstraction is this:

Most higher-level abstractions in Twisted are built by using lower-level ones, not by replacing them.

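Twisted is loosely composed of abstraction layers. The lower a layer, the easier it is to understand but the less convenient it is to use; the higher a layer, the more convenient but the harder to understand. So it is best to learn Twisted from the bottom up.
The previous part used the lowest-level abstraction (client 1.0), which has plenty of problems; this part introduces a slightly higher level (client 2.0).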
Moving to higher-level abstractions generally means writing less code (and letting Twisted handle the platform-dependent corner cases).

Loopiness in the Brain

The most important abstraction we have learned so far, indeed the most important abstraction in Twisted, is the reactor. Much of the rest of Twisted, in fact, can be thought of as “stuff that makes it easier to do X using the reactor” where X might be “serve a web page” or “make a database query” or some other specific feature.
The most important abstraction in Twisted is the reactor: understand the reactor and you understand the essence of Twisted. The remaining modules exist to make various tasks easier to do with the reactor pattern. When actually using Twisted, the abstractions are so high-level that programmers tend to think only about their callback logic and forget the reactor, and that makes it impossible to write good Twisted code.
When you choose to use Twisted you are also choosing to use the Reactor Pattern, and that means programming in the “reactive style” using callbacks and cooperative multi-tasking. If you want to use Twisted correctly, you have to keep the reactor’s existence (and the way it works) in mind.

Before we dive into the code, there are three new abstractions to introduce: Transports, Protocols, and Protocol Factories.

Transports

The Transport abstraction is defined by ITransport in the main Twisted interfaces module. A Transport represents a connection (such as a TCP or UDP socket or a UNIX pipe) and handles the details of asynchronous I/O for whatever sort of connection it represents.

Protocols

Twisted Protocols are defined by IProtocol in the same interfaces module. As you might expect, Protocol objects implement protocols. That is to say, a particular implementation of a Twisted Protocol should implement one specific networking protocol, like FTP or IMAP or some nameless protocol we invent for our own purposes.
Each instance of a Twisted Protocol object implements a protocol for one specific connection. Protocols and connections correspond one to one, which makes sense: once a connection is established, you need to say which protocol governs the data sent and received on it. Twisted implements many protocols (for example in twisted.protocols.basic), so before writing your own, check whether one already exists.

Protocol Factories

So each connection needs its own Protocol and that Protocol might be an instance of a class we implement ourselves. Since we will let Twisted handle creating the connections, Twisted needs a way to make the appropriate Protocol “on demand” whenever a new connection is made. Making Protocol instances is the job of Protocol Factories.

The Protocol Factory API is defined by IProtocolFactory, also in the interfaces module. Protocol Factories are an example of the Factory design pattern and they work in a straightforward way. The buildProtocol method is supposed to return a new Protocol instance each time it is called. This is the method that Twisted uses to make a new Protocol for each new connection.
This is the Factory design pattern: the factory creates the right Protocol instance for each connection.
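The pattern itself is small. This sketch uses made-up toy classes (`ToyProtocol`, `ToyFactory`), loosely modeled on the way Twisted's `Factory.buildProtocol` instantiates `self.protocol` and sets the new instance's `factory` attribute; it is not Twisted's code.

```python
class ToyProtocol:
    """Minimal base: the factory attaches itself after building the instance."""
    factory = None


class ToyFactory:
    protocol = ToyProtocol          # subclasses say which Protocol class to build

    def buildProtocol(self, addr):
        p = self.protocol()         # a brand-new instance for every connection
        p.factory = self            # lets the protocol report back to shared state
        return p


class PoetryToyProtocol(ToyProtocol):
    pass


class PoetryToyFactory(ToyFactory):
    protocol = PoetryToyProtocol    # the only line the application has to write


factory = PoetryToyFactory()
p1 = factory.buildProtocol(("127.0.0.1", 10001))
p2 = factory.buildProtocol(("127.0.0.1", 10002))
print(p1 is p2, p1.factory is factory)  # False True
```

The code that accepts or opens connections only ever calls `buildProtocol`, so it never needs to know which Protocol class the application chose.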

Get Poetry 2.0: First Blood.0

Alright, let's take a look at version 2.0 of the Twisted poetry client.

from twisted.internet.protocol import ClientFactory, Protocol


class PoetryProtocol(Protocol):

    poem = b''

    def dataReceived(self, data):
        # called by the transport each time a chunk of the poem arrives
        self.poem += data

    def connectionLost(self, reason):
        self.poemReceived(self.poem)

    def poemReceived(self, poem):
        self.factory.poem_finished(poem)


class PoetryClientFactory(ClientFactory):

    protocol = PoetryProtocol  # tell base class what proto to build

    def __init__(self, poetry_count):
        self.poetry_count = poetry_count
        self.poems = []

    def poem_finished(self, poem=None):
        if poem is not None:
            self.poems.append(poem)
        self.poetry_count -= 1
        if self.poetry_count == 0:
            self.report()
            from twisted.internet import reactor
            reactor.stop()

    def report(self):
        for poem in self.poems:
            print(poem.decode('utf-8', errors='replace'))

    def clientConnectionFailed(self, connector, reason):
        print('Failed to connect to:', connector.getDestination())
        self.poem_finished()


def poetry_main():
    addresses = parse_args()  # list of (host, port) pairs, defined elsewhere in the script
    factory = PoetryClientFactory(len(addresses))
    from twisted.internet import reactor
    for host, port in addresses:
        reactor.connectTCP(host, port, factory)
    reactor.run()

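This is the higher-level client built from the newly introduced classes. Since Twisted builds high-level objects on top of low-level ones, let's look at, and guess, how version 2.0 is implemented. The original author doesn't go into this, so what follows is my personal view and may not be accurate.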

[Figure]

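First, version 2.0 contains no direct socket operations at all, and no addReader or removeReader calls either. If you think about it, that code is the same for every client and can be reused; there is no need to rewrite it every time. What the developer really needs to write are the few callbacks, such as doRead and connectionLost.
So what should a framework designer do? Encapsulate everything that is common, so users only deal with the code they should care about and write as little of it as possible. That is what abstraction means here: raising the level of abstraction moves more code into the framework and reduces the developer's workload.
So the framework developer writes the PoetrySocket class, call it a connection class, which contains the common code for connecting the socket, calling addReader, and so on. But it must also implement the IReadDescriptor interface, e.g. doRead, and the framework developer cannot hard-code that. What to do?
Hence the Protocol object: all callbacks related to the specific business logic are encapsulated in it, and the connection class only calls the Protocol's fixed interface.
As in the figure above: socket ready -> reader.doRead() -> protocol.dataReceived(); the real processing is done by calling protocol.dataReceived().
The connection class thus becomes fully generic: it can live in the framework and is completely transparent to application developers.
Actually, one step is still missing: the connection class is not yet fully generic. Why?
The problem is that a Protocol object has to be created before it can be used, and different situations obviously need different Protocol classes.
If the connection class directly calls PoetryProtocol(), that code has to be edited every time you want a different protocol.
The typical design pattern for this is the factory method: wrap the actual object creation in a factory class, and wherever an object is needed, call the factory's uniform interface, e.g. Factory.buildProtocol().
The application developer then supplies a factory class for their protocol to handle object creation.
With that, the connection class is fully encapsulated.
Now the code above is clear. First, the developer defines a protocol class and a factory class.
After that, the only other line the developer needs is reactor.connectTCP(host, port, factory).
It looks simple but does much more than its name suggests. It should involve the following activities (my guess):
1. Create a protocol object through the factory.
2. Create a connection object that provides IReadDescriptor and implements callbacks such as doRead by calling the protocol.
3. Establish the socket connection, through an abstract transport object rather than by using the socket directly.
4. Add the connection object to the set of watched descriptors with addReader.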
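The guessed steps can be tried out in a self-contained toy. Everything here is invented for illustration: `Connection`, the toy `Protocol` and `Factory` base classes, a bare select loop standing in for the reactor, and `socket.socketpair()` standing in for the real connect that connectTCP performs. It models the division of labor described above, not Twisted's actual classes.

```python
import select
import socket


# --- "framework" side: written once, reused by every application ---

class Protocol:
    """Base class for application logic; the framework fills in factory and transport."""
    factory = None
    transport = None

    def dataReceived(self, data):
        pass

    def connectionLost(self, reason):
        pass


class Factory:
    protocol = Protocol  # subclasses name the Protocol class to build

    def buildProtocol(self, addr):
        p = self.protocol()
        p.factory = self
        return p


class Connection:
    """Generic glue between one socket and one Protocol; contains no business logic."""

    def __init__(self, sock, factory, readers):
        self.sock = sock
        self.sock.setblocking(False)
        self.readers = readers
        self.protocol = factory.buildProtocol(None)  # step 1 (a real framework passes the peer address)
        self.protocol.transport = self               # the Connection plays the transport role here
        readers.add(self)                            # step 4, like reactor.addReader(self)

    def fileno(self):
        return self.sock.fileno()

    def doRead(self):  # step 2: the loop calls this, and it forwards to the Protocol
        try:
            data = self.sock.recv(1024)
        except BlockingIOError:
            return
        if data:
            self.protocol.dataReceived(data)
        else:
            self.readers.discard(self)
            self.sock.close()
            self.protocol.connectionLost("connection done")


def run(readers):
    """Bare select loop standing in for the reactor."""
    while readers:
        readable, _, _ = select.select(list(readers), [], [])
        for conn in readable:
            conn.doRead()


# --- "application" side: only callbacks and a factory ---

class PoetryProtocol(Protocol):
    def __init__(self):
        self.poem = b""

    def dataReceived(self, data):
        self.poem += data

    def connectionLost(self, reason):
        self.factory.poems.append(self.poem)


class PoetryClientFactory(Factory):
    protocol = PoetryProtocol

    def __init__(self):
        self.poems = []


factory = PoetryClientFactory()
readers = set()
server_end, client_end = socket.socketpair()  # step 3 stand-in: an already-connected socket
Connection(client_end, factory, readers)
server_end.sendall(b"a slow poem")
server_end.close()
run(readers)
print(factory.poems)  # [b'a slow poem']
```

The application side never touches a socket or the loop; swapping in a different protocol only means writing a different Protocol subclass and naming it in the factory.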


This article was reposted from cnblogs (博客园); originally published 2011-07-05.
