Python Basics: Common Modules (to be continued)

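The re module (regular expressions)

A regular expression is a pattern built from characters with special meanings that describes a set of strings; put another way, it is a rule for describing a class of things. In Python, regular expressions are built into the language and exposed through the re module. A pattern is compiled into a series of bytecodes, which are then executed by a matching engine written in C.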

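\w matches a letter, a digit, or an underscore

\W matches any character that is not a letter, a digit, or an underscore

\s matches any whitespace character, equivalent to [ \t\n\r\f\v]

\S matches any non-whitespace character

\d matches any digit; for ASCII text this is equivalent to [0-9]

\D matches any non-digit

\A matches only at the start of the string

\Z matches only at the end of the string (in Perl-style engines \Z also matches just before a trailing newline, but Python's re does not)

\z matches the end of the string in Perl-style engines; older Python versions reject it, so use \Z in Python

\G matches where the previous match ended in Perl-style engines; Python's re module does not support it

\n matches a newline

\t matches a tab

^ matches the start of the string (and the start of each line under re.MULTILINE)

$ matches the end of the string, or just before a newline at the very end of the string (and the end of each line under re.MULTILINE)

. matches any character except a newline; with the re.DOTALL flag it matches any character, newline included

[...] denotes a set of characters: [amk] matches 'a', 'm', or 'k'

[^...] matches any character not listed inside the brackets

* matches zero or more repetitions of the preceding expression (greedy)

+ matches one or more repetitions of the preceding expression (greedy)

? matches zero or one occurrence of the preceding expression (greedy); placed after another quantifier, as in *?, +?, or {n,m}?, it makes that quantifier non-greedy

{n} matches exactly n repetitions of the preceding expression

{n,m} matches from n to m repetitions of the preceding expression (greedy)

a|b matches a or b

() matches the expression inside the parentheses and also captures it as a group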

 
  
    
      
      
import re

# Raw strings (r'...') keep the backslashes intact for the regex engine.
print(re.findall(r'\w', 'hello_ | egon 123'))                     # word characters
print(re.findall(r'\W', 'hello_ | egon 123'))                     # non-word characters
print(re.findall(r'\s', 'hello_ | egon 123 \n \t'))               # whitespace
print(re.findall(r'\S', 'hello_ | egon 123 \n \t'))               # non-whitespace
print(re.findall(r'\d', 'hello_ | egon 123 \n \t'))               # digits
print(re.findall(r'\D', 'hello_ | egon 123 \n \t'))               # non-digits
print(re.findall('h', 'hello_ | hello h egon 123 \n \t'))         # a literal character
print(re.findall(r'\Ahe', 'hello_ | hello h egon 123 \n \t'))     # start of string
print(re.findall('^he', 'hello_ | hello h egon 123 \n \t'))       # start of string
print(re.findall(r'123\Z', 'hello_ | hello h egon 123 \n \t123')) # end of string
print(re.findall('123$', 'hello_ | hello h egon 123 \n \t123'))   # end of string
print(re.findall('\n', 'hello_ | hello h egon 123 \n \t123'))     # newline
print(re.findall('\t', 'hello_ | hello h egon 123 \n \t123'))     # tab

         
       

Output:

['h', 'e', 'l', 'l', 'o', '_', 'e', 'g', 'o', 'n', '1', '2', '3']
[' ', '|', ' ', ' ']
[' ', ' ', ' ', ' ', '\n', ' ', '\t']
['h', 'e', 'l', 'l', 'o', '_', '|', 'e', 'g', 'o', 'n', '1', '2', '3']
['1', '2', '3']
['h', 'e', 'l', 'l', 'o', '_', ' ', '|', ' ', 'e', 'g', 'o', 'n', ' ', ' ', '\n', ' ', '\t']
['h', 'h', 'h']
['he']
['he']
['123']
['123']
['\n']
['\t']


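Functions provided by the re module:

re.findall()  finds every match and returns them in a list

re.search()  finds only the first match and returns a match object describing it; call group() on that object to get the matched text. Returns None if nothing matches

re.match()  like search, but matches only at the beginning of the string; re.search() with a pattern that starts with ^ can be used instead

re.split()  splits the string wherever the pattern matches

re.sub()  replacement; the arguments are (pattern, replacement, string, count). Without a count, every match is replaced

re.subn()  like sub, but also returns the number of replacements made

re.compile()  compiles a pattern so that it can be reused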
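The functions listed above can be tried side by side in a short sketch. The sample string and the variable names are illustrative assumptions, not taken from the original post.

```python
import re

text = "alex 18, egon 20, lyndon 30"  # made-up sample data

# findall returns every match as a list of strings.
ages = re.findall(r"\d+", text)

# search returns a match object for the first match, or None.
first = re.search(r"\d+", text)
first_age = first.group() if first else None

# match only succeeds at the very start of the string.
at_start = re.match(r"alex", text)
not_at_start = re.match(r"egon", text)

# split cuts the string wherever the pattern matches.
parts = re.split(r",\s*", text)

# sub replaces matches; count limits how many are replaced.
masked_once = re.sub(r"\d+", "??", text, count=1)

# subn also reports how many replacements were made.
masked_all, n = re.subn(r"\d+", "??", text)

# compile builds a reusable pattern object.
name_age = re.compile(r"(\w+) (\d+)")
pairs = name_age.findall(text)

print(ages, first_age, parts, masked_once, masked_all, n, pairs)
```

When a pattern contains groups, findall returns tuples of the captured groups rather than the whole match, which is why pairs holds (name, age) tuples.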

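3. The time module

Python commonly represents time in three ways:

a. Timestamp:

A timestamp is the offset in seconds from 00:00:00 on January 1, 1970 (UTC). Running type(time.time()) shows that the value is a float.

b. Formatted time string

c. Structured time

A struct_time tuple has 9 elements: (year, month, day, hour, minute, second, day of the week, day of the year, DST flag)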
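A minimal sketch of moving between the three representations. It uses UTC throughout so the results do not depend on the local time zone; the date string and the use of calendar.timegm are choices made for this example only.

```python
import calendar
import time

# Timestamp: seconds since the epoch, as a float.
now = time.time()

# Timestamp -> struct_time (UTC). Timestamp 0 is the epoch itself.
epoch = time.gmtime(0)

# struct_time -> formatted string.
text = time.strftime("%Y-%m-%d %H:%M:%S", epoch)

# Formatted string -> struct_time.
parsed = time.strptime("2017-08-11 09:30:00", "%Y-%m-%d %H:%M:%S")

# UTC struct_time -> timestamp. calendar.timegm is the inverse of time.gmtime.
back = calendar.timegm(epoch)

print(type(now), text, epoch.tm_wday, epoch.tm_yday, parsed.tm_year, back)
```

For local time, time.localtime and time.mktime play the roles that time.gmtime and calendar.timegm play here.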

4. The random module

5. The os module

6. The sys module

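7. The json and pickle modules (serialization)

Serialization is the process of turning an object (a variable) in memory into a form that can be stored or transmitted.

In Python this is called pickling; other languages call it serialization, marshalling, flattening, and so on.

What serialization is for:

a. Persisting state

Before a power-off or a restart, a program can save the data it holds in memory (to a file) so that the next run can load that data from the file and carry on. That is serialization.

b. Cross-platform data exchange

Serialized content can be written to disk, and it can also be sent over the network to another machine. If sender and receiver agree on a serialization format, differences in platform and language stop being a barrier, and cross-platform data exchange becomes possible. Conversely, reading serialized content back into in-memory variables is called deserialization, or unpickling.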

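The json module

To pass objects between programming languages, they must be serialized into a standard format such as XML. JSON is often the better choice: the serialized result is just a string, which every language can read and which is easy to store on disk or send over the network. JSON is a standard format, is generally lighter to produce and parse than XML, and can be read directly in a web page, so it suits cross-platform data exchange. (Cross-platform also means it cannot support every data type of any one language; for example, Python functions cannot be serialized.)

In-memory structured data <---> JSON format <---> string <---> saved to a file or sent over the network

Usage:

dumps serializes to a string

loads deserializes from a string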

 
import json

dic = {'name': 'egon', 'age': 18}

with open('a.json', 'w') as f:  # serialize the dict and write it to the file
    f.write(json.dumps(dic))

with open('a.json', 'r') as f:  # read the file and deserialize its contents
    data = f.read()
    dic = json.loads(data)

dump serializes directly to a file object

load deserializes directly from a file object

 
  
    
      
      
import json

dic = {'name': 'egon', 'age': 18}

with open('b.json', 'w') as f:
    json.dump(dic, f)  # serialize the dict into the file

with open('b.json', 'r') as f:
    print(json.load(f)['name'])  # deserialize and print
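The remark that JSON cannot carry every Python type can be seen in a small round trip. The record below is invented for the example; it is a sketch of the behavior, not part of the original post.

```python
import json

# Illustrative data: a tuple value and an integer key.
record = {"name": "egon", "scores": (90, 85), 1: "int key"}

# Round-trip through a JSON string.
restored = json.loads(json.dumps(record))

# Tuples come back as lists, and non-string keys come back as strings.
print(restored)

# Objects with no JSON equivalent, such as functions, are rejected.
try:
    json.dumps({"callback": print})
    error = None
except TypeError as exc:
    error = exc
print(type(error).__name__)
```

Because of these conversions, a value that survives json.loads(json.dumps(x)) is not always equal to the original x.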
       
 
    

   
 

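The pickle module

pickle works only with Python, but it handles nearly all Python data types. Different Python versions may be incompatible with each other, so use pickle only for unimportant data, where a failed deserialization does not matter.

In-memory structured data <---> pickle format <---> bytes <---> saved to a file or sent over the network

dumps serializes to bytes

loads deserializes from bytes

dump serializes directly to a file object

load deserializes directly from a file object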

 
import pickle

dic = {'name': 'egon', 'age': 18}

with open('d.pkl', 'wb') as f:  # serialize the dict and write the bytes to the file
    f.write(pickle.dumps(dic))

with open('d.pkl', 'rb') as f:  # read the bytes back and deserialize them
    dic = pickle.loads(f.read())
    print(dic['name'])

 
  
    
      
      
import pickle

dic = {'name': 'egon', 'age': 18}

with open('e.pkl', 'wb') as f:
    pickle.dump(dic, f)  # serialize the dict into the file

with open('e.pkl', 'rb') as f:
    print(pickle.load(f)['name'])  # deserialize and print


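pickle stores functions and classes by name (the module plus the qualified name) rather than by value, so the same definition must be importable in the namespace where the data is unpickled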
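A short sketch of what a pickle holds for a function. It pickles json.dumps purely as a convenient module-level function from the standard library; any importable function would behave the same way.

```python
import json
import pickle

# Pickle a module-level function from the standard library.
payload = pickle.dumps(json.dumps)

# The payload holds the module and function names, not the function body.
print(b"json" in payload, b"dumps" in payload)

# Unpickling imports the module and looks the name up again,
# so it returns the very same function object.
restored = pickle.loads(payload)
print(restored is json.dumps)
```

This is why unpickling fails when the class or function is not defined or importable in the program that loads the data.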

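8. The shelve module

9. The shutil module

High-level operations on files, directories, and archives

Common functions:

Copy the contents of one file object into another:

shutil.copyfileobj(fsrc, fdst[, length])

Copy a file's contents:

shutil.copyfile(src, dst) # dst does not need to exist

Copy only the permission bits; contents, group, and owner are left unchanged:

shutil.copymode(src, dst) # dst must already exist

Copy only the status information: mode bits, atime, mtime, flags

shutil.copystat(src, dst) # dst must already exist

Copy a file and its permission bits:

shutil.copy(src, dst)

Copy a file and its status information:

shutil.copy2(src, dst)

Recursively copy a directory:

shutil.ignore_patterns(*patterns)
shutil.copytree(src, dst, symlinks=False, ignore=None) # by default dst must not already exist, and its parent directory must be writable; ignore names what to exclude

Copying symbolic links:

import shutil

shutil.copytree('f1', 'f2', symlinks=True, ignore=shutil.ignore_patterns('*.pyc', 'tmp*'))

With symlinks=True, symbolic links are copied as links. By default (symlinks=False) the files they point to are copied instead, so each link becomes a new regular file in the destination.

Recursively delete a directory tree:

shutil.rmtree(path[, ignore_errors[, onerror]])

Recursively move a file or directory, like the mv command; on the same filesystem this is just a rename:

shutil.move(src, dst)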
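The copy, move, and delete functions above can be exercised safely inside a temporary directory. The file and directory names here (f1, f2, f3, a.txt, cache.pyc) are placeholders for the sketch.

```python
import os
import shutil
import tempfile

# Work inside a throwaway directory so nothing real is touched.
base = tempfile.mkdtemp()
src_dir = os.path.join(base, "f1")
os.makedirs(os.path.join(src_dir, "sub"))

with open(os.path.join(src_dir, "a.txt"), "w") as f:
    f.write("hello")
with open(os.path.join(src_dir, "sub", "cache.pyc"), "w") as f:
    f.write("compiled")

# copy2 copies the contents plus status information such as mtime.
shutil.copy2(os.path.join(src_dir, "a.txt"), os.path.join(base, "a_copy.txt"))

# copytree copies the whole directory; ignore_patterns skips *.pyc files.
dst_dir = os.path.join(base, "f2")
shutil.copytree(src_dir, dst_dir, ignore=shutil.ignore_patterns("*.pyc"))
copied = sorted(os.listdir(dst_dir))
copied_sub = os.listdir(os.path.join(dst_dir, "sub"))

# move renames the copied directory.
moved_dir = shutil.move(dst_dir, os.path.join(base, "f3"))

# rmtree removes the whole tree, including the temporary base directory.
shutil.rmtree(base)
print(copied, copied_sub, os.path.exists(base))
```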

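Create an archive (zip, tar, and so on) and return its path:

shutil.make_archive(base_name, format, ...)

base_name: the name of the archive to create, without the format extension; it may include a path. A bare name saves the archive in the current directory; otherwise it is saved at the given path
e.g. data_bak => saved in the current directory
e.g. /tmp/data_bak => saved in /tmp/

format: the archive type: "zip", "tar", "bztar", "gztar"

root_dir: the path of the directory to archive (defaults to the current directory)

owner: the owner, defaults to the current user

group: the group, defaults to the current group

logger: used for logging, usually a logging.Logger object

Exercise: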

 
# Archive the files under /data and place the archive in the program's current directory
import shutil

ret = shutil.make_archive("data_bak", 'gztar', root_dir='/data')

# Archive the files under /data and place the archive in /tmp/
import shutil

ret = shutil.make_archive("/tmp/data_bak", 'gztar', root_dir='/data')
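The exercise depends on a /data directory existing. The sketch below is a self-contained variant under stated assumptions: it builds a small stand-in directory in a temporary location and uses the zip format instead of gztar, then restores the archive with shutil.unpack_archive.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as base:
    # A small stand-in for the /data directory used in the exercise.
    data_dir = os.path.join(base, "data")
    os.makedirs(data_dir)
    with open(os.path.join(data_dir, "a.log"), "w") as f:
        f.write("log line\n")

    # base_name has no extension; make_archive appends ".zip" for the zip format.
    archive = shutil.make_archive(os.path.join(base, "data_bak"), "zip", root_dir=data_dir)
    archive_name = os.path.basename(archive)

    # unpack_archive picks the format from the file extension.
    out_dir = os.path.join(base, "restored")
    shutil.unpack_archive(archive, out_dir)
    restored = os.listdir(out_dir)

print(archive_name, restored)
```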

shutil handles archives by calling the zipfile and tarfile modules. In detail:

 
import zipfile

# Compress
z = zipfile.ZipFile('laxi.zip', 'w')
z.write('a.log')
z.write('data.data')
z.close()

# Extract
z = zipfile.ZipFile('laxi.zip', 'r')
z.extractall(path='.')
z.close()

import tarfile

# Compress
t = tarfile.open('/tmp/egon.tar', 'w')
t.add('/test1/a.py', arcname='a.bak')
t.add('/test1/b.py', arcname='b.bak')
t.close()

# Extract
t = tarfile.open('/tmp/egon.tar', 'r')
t.extractall('/egon')
t.close()

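10. The xml module

XML is a format for exchanging data between different languages and programs. It serves much the same purpose as JSON, though JSON is simpler to use. Because XML appeared earlier than JSON, the interfaces of many systems at traditional companies, in the finance industry for example, still rely mainly on XML.

XML expresses data structure through nodes (tags) written with <>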

 
  
    
      
      
<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>


Working with the XML:

 
import xml.etree.ElementTree as ET  # import the module

tree = ET.parse("xmltest.xml")
root = tree.getroot()
print(root.tag)

# Walk the whole document
for child in root:
    print('========>', child.tag, child.attrib, child.attrib['name'])
    for i in child:
        print(i.tag, i.attrib, i.text)

# Visit only the year nodes
for node in root.iter('year'):
    print(node.tag, node.text)

#---------------------------------------
import xml.etree.ElementTree as ET

tree = ET.parse("xmltest.xml")
root = tree.getroot()

# Modify
for node in root.iter('year'):
    new_year = int(node.text) + 1
    node.text = str(new_year)
    node.set('updated', 'yes')
    node.set('version', '1.0')
tree.write('test.xml')

# Delete nodes
for country in root.findall('country'):
    rank = int(country.find('rank').text)
    if rank > 50:
        root.remove(country)

tree.write('output.xml')
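The code above needs an xmltest.xml file on disk. As a sketch that runs on its own, the same read, modify, and delete steps can be applied to a document parsed from a string with ET.fromstring; the shortened XML text below is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

# A shortened, inline stand-in for xmltest.xml.
XML_TEXT = """<data>
    <country name="Singapore"><rank>5</rank><year>2011</year></country>
    <country name="Panama"><rank>69</rank><year>2011</year></country>
</data>"""

root = ET.fromstring(XML_TEXT)

# Read: collect each country's name attribute.
names = [country.attrib["name"] for country in root]

# Modify: bump every year by one and mark the node as updated.
for node in root.iter("year"):
    node.text = str(int(node.text) + 1)
    node.set("updated", "yes")

# Delete: findall returns a list, so removing while looping is safe.
for country in root.findall("country"):
    if int(country.find("rank").text) > 50:
        root.remove(country)

remaining = [country.attrib["name"] for country in root]
years = [node.text for node in root.iter("year")]
print(names, remaining, years)
print(ET.tostring(root, encoding="unicode"))
```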

11. The configparser module

Used mainly to parse configuration files

A configuration file has the following format:

[section1]

k1 = v1

k2:v2

user=egon

age=18

is_admin=true

salary=31

[section2]

k1 = v1

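Usage:

import configparser # import the module

config = configparser.ConfigParser() # create a ConfigParser object and bind it to config

config.read('a.cfg') # load the file before querying it (assumed here to be a.cfg, the name used by config.write at the end)

List the section titles:

config.sections()

List the keys of every key=value pair under section1:

config.options('section1')

List every key=value pair under section1 as (key, value) tuples:

config.items('section1')

Get the value of user under section1, as a string:

config.get('section1', 'user')

Get the value of age under section1, as an integer:

val1 = config.getint('section1', 'age')

Get the value of is_admin under section1, as a boolean:

config.getboolean('section1', 'is_admin')

Get the value of salary under section1, as a float:

config.getfloat('section1', 'salary')

Remove the whole of section2:

config.remove_section('section2')

Remove k1 and k2 from section1:

config.remove_option('section1', 'k1')

config.remove_option('section1', 'k2')

Check whether a section exists:

config.has_section('section1')

Check whether section1 has a user option:

config.has_option('section1', 'user')

Add a section:

config.add_section('egon')

Add name=egon and age=18 under the egon section:

config.set('egon', 'name', 'egon')

config.set('egon', 'age', '18') # the value must be a string; passing the integer 18 raises TypeError

Finally, write the changes to a file to make them permanent:

config.write(open('a.cfg', 'w'))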
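The calls above can be run end to end without touching the filesystem. This sketch inlines the sample configuration as a string and loads it with read_string; writing to an io.StringIO buffer instead of a.cfg is a choice made for the example.

```python
import configparser
import io

# The sample configuration from above, inlined as a string.
CONFIG_TEXT = """
[section1]
k1 = v1
k2:v2
user=egon
age=18
is_admin=true
salary=31

[section2]
k1 = v1
"""

config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)

sections = config.sections()
user = config.get("section1", "user")
age = config.getint("section1", "age")
is_admin = config.getboolean("section1", "is_admin")
salary = config.getfloat("section1", "salary")

# Remove a section and an option, then add a new section.
config.remove_section("section2")
config.remove_option("section1", "k1")
config.add_section("egon")
config.set("egon", "age", "18")  # values must be strings

# Write to an in-memory buffer instead of a real file.
buffer = io.StringIO()
config.write(buffer)
print(sections, user, age, is_admin, salary)
print(buffer.getvalue())
```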

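12. The hashlib module

A hash is an algorithm. In Python 3, hashlib replaces the md5 and sha modules and mainly provides the SHA1, SHA224, SHA256, SHA384, SHA512, and MD5 algorithms.
Three properties:
1. Identical content gives an identical hash; even a slight change in content changes the hash
2. The hash cannot be reversed to recover the content
3. For a given algorithm, the hash has a fixed length however long the input is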

 
import hashlib

m = hashlib.md5()  # m = hashlib.sha256()
m.update('hello'.encode('utf8'))
print(m.hexdigest())  # 5d41402abc4b2a76b9719d911017c592

m.update('alvin'.encode('utf8'))
print(m.hexdigest())  # 92a7e713c30abbb0319fa07da2a5c4af

m2 = hashlib.md5()
m2.update('helloalvin'.encode('utf8'))
print(m2.hexdigest())  # 92a7e713c30abbb0319fa07da2a5c4af

'''
Note: calling update several times on pieces of a long input gives the same result as one update call with the whole input.
Repeated update calls are what make it practical to checksum large files.
'''
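The note about large files can be sketched as a helper that feeds a file to update() one chunk at a time. The function name file_md5 and the chunk size are hypothetical choices for this example.

```python
import hashlib
import os
import tempfile

def file_md5(path, chunk_size=8192):
    """Hash a file piece by piece so it never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Demo on a temporary file holding 100,000 copies of b"hello".
data = b"hello" * 100000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)

chunked = file_md5(tmp.name)
whole = hashlib.md5(data).hexdigest()
os.remove(tmp.name)
print(chunked == whole)
```

The chunked digest equals the digest of the whole input, which is the property the note describes.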

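These hash algorithms are strong, but they have a weakness: hashes of common inputs can be reversed by looking them up in a precomputed table (a dictionary attack). It is therefore worth mixing a custom key into the input before hashing.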

 
import hashlib

# ######## 256 ########
hash = hashlib.sha256('898oaFs09f'.encode('utf8'))  # the custom key goes in first
hash.update('alvin'.encode('utf8'))
print(hash.hexdigest())  # e79e68f070cdedcfe63eaf1a2e92c83b4cfb1b5c6bc452d214c1b7e77cdfd1c7

import hashlib

passwds = [
    'alex3714',
    'alex1313',
    'alex94139413',
    'alex123456',
    '123456alex',
    'a123lex',
]

def make_passwd_dic(passwds):
    # Precompute the MD5 hash of every candidate password.
    dic = {}
    for passwd in passwds:
        m = hashlib.md5()
        m.update(passwd.encode('utf-8'))
        dic[passwd] = m.hexdigest()
    return dic

def break_code(cryptograph, passwd_dic):
    # Look the target hash up among the precomputed hashes.
    for k, v in passwd_dic.items():
        if v == cryptograph:
            print('The password is ===>\033[46m%s\033[0m' % k)

cryptograph = 'aee949757a2e698417463d47acac93df'
break_code(cryptograph, make_passwd_dic(passwds))
       
Python also has an hmac module, which internally processes the key and the content further before hashing:

import hmac

# digestmod is required since Python 3.8; md5 was the old default
h = hmac.new('alvin'.encode('utf8'), digestmod='md5')
h.update('hello'.encode('utf8'))
print(h.hexdigest())  # 320df9832eab4c038b6c1d7ed73a5940

# For hmac to produce the same final result, two things must hold:
# 1: the initial key passed to hmac.new is the same
# 2: however many times update is called, the pieces add up to the same content

import hmac

h1 = hmac.new(b'egon', digestmod='md5')
h1.update(b'hello')
h1.update(b'world')
print(h1.hexdigest())

h2 = hmac.new(b'egon', digestmod='md5')
h2.update(b'helloworld')
print(h2.hexdigest())

h3 = hmac.new(b'egonhelloworld', digestmod='md5')
print(h3.hexdigest())

'''
f1bf38d054691688f89dcd34ac3c27f2
f1bf38d054691688f89dcd34ac3c27f2
bcca84edd9eeb86f30539922b28f3981
'''

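13. The subprocess module

Starts a child process from the Python interpreter to run a shell command

stdout: standard output # the output is bytes; decode it with decode('gbk') on a Chinese-language Windows system, or with decode('utf-8') on Linux

stderr: standard error

stdin: standard input

shell=True: run the command through the shell

subprocess.PIPE: send the output into a pipe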

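import subprocess

# First list the files on the desktop
res1 = subprocess.Popen('ls /Users/jieli/Desktop', shell=True, stdout=subprocess.PIPE)

# Feed that output to this command as its input, keeping only the names that end in txt
res2 = subprocess.Popen('grep txt$', shell=True, stdin=res1.stdout, stdout=subprocess.PIPE)

print(res2.stdout.read().decode('utf-8'))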
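The same pipe-chaining idea can be sketched without relying on ls, grep, or a particular desktop path. In the example below the two commands are small Python one-liners run through sys.executable; they are stand-ins invented for the sketch, not the commands from the original post.

```python
import subprocess
import sys

# Stand-ins for `ls` and `grep txt$`, written as tiny Python programs
# so the sketch does not depend on a particular shell.
LIST_CMD = [sys.executable, "-c", "print('a.txt'); print('b.log'); print('c.txt')"]
FILTER_CMD = [
    sys.executable,
    "-c",
    "import sys\nfor line in sys.stdin:\n    if line.rstrip().endswith('txt'):\n        print(line.rstrip())",
]

# The first process writes into a pipe ...
producer = subprocess.Popen(LIST_CMD, stdout=subprocess.PIPE)

# ... and the second process reads that pipe as its stdin.
consumer = subprocess.Popen(FILTER_CMD, stdin=producer.stdout, stdout=subprocess.PIPE)
producer.stdout.close()  # let the producer see a closed pipe if the consumer exits early

out, _ = consumer.communicate()
producer.wait()

# The pipe delivers bytes, so decode before using the text.
names = out.decode("utf-8").split()
print(names)
```

Passing the commands as argument lists avoids shell=True, so no shell quoting is involved.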

This article is reposted from lyndon's blog on 51CTO. Original link: http://blog.51cto.com/lyndon/1955312. To repost it, please contact the original author.

迟到的栋子
