标签归档:python-2.7

在iPython Notebook中进行调试的正确方法是什么?

问题:在iPython Notebook中进行调试的正确方法是什么?

我所知, %debug magic可以在一个单元内进行调试。

但是,我有跨多个单元格的函数调用。

例如,

In[1]: def fun1(a)
           def fun2(b)
               # I want to set a breakpoint for the following line #
               return do_some_thing_about(b)

       return fun2(a)

In[2]: import multiprocessing as mp
       pool=mp.Pool(processes=2)
       results=pool.map(fun1, 1.0)
       pool.close()
       pool.join

我试过的

  1. 我试图%debug在cell-1的第一行中设置。但是它甚至在执行单元2之前就立即进入调试模式。

  2. 我试图%debug在代码之前添加该行return do_some_thing_about(b)。但是,代码将永远运行,永远不会停止。

在ipython笔记本中设置断点的正确方法是什么?

As I know, %debug magic can do debug within one cell.

However, I have function calls across multiple cells.

For example,

In[1]: def fun1(a)
           def fun2(b)
               # I want to set a breakpoint for the following line #
               return do_some_thing_about(b)

       return fun2(a)

In[2]: import multiprocessing as mp
       pool=mp.Pool(processes=2)
       results=pool.map(fun1, 1.0)
       pool.close()
       pool.join

What I tried:

  1. I tried to set %debug in the first line of cell-1. But it enter into debug mode immediately, even before executing cell-2.

  2. I tried to add %debug in the line right before the code return do_some_thing_about(b). But then the code runs forever, never stops.

What is the right way to set a break point within the ipython notebook?


回答 0

使用ipdb

通过安装

pip install ipdb

用法:

In[1]: def fun1(a):
   def fun2(a):
       import ipdb; ipdb.set_trace() # debugging starts here
       return do_some_thing_about(b)
   return fun2(a)
In[2]: fun1(1)

用于逐行执行n和进入功能使用,s并退出调试提示使用c

有关可用命令的完整列表:https : //appletree.or.kr/quick_reference_cards/Python/Python%20Debugger%20Cheatsheet.pdf

Use ipdb

Install it via

pip install ipdb

Usage:

In[1]: def fun1(a):
   def fun2(a):
       import ipdb; ipdb.set_trace() # debugging starts here
       return do_some_thing_about(b)
   return fun2(a)
In[2]: fun1(1)

For executing line by line use n and for step into a function use s and to exit from debugging prompt use c.

For complete list of available commands: https://appletree.or.kr/quick_reference_cards/Python/Python%20Debugger%20Cheatsheet.pdf


回答 1

您可以ipdb在jupyter内部使用以下命令:

from IPython.core.debugger import Tracer; Tracer()()

编辑:自IPython 5.1起,不推荐使用上述功能。这是新方法:

from IPython.core.debugger import set_trace

set_trace()在需要断点的地方添加。键入help用于ipdb命令输入字段出现时。

You can use ipdb inside jupyter with:

from IPython.core.debugger import Tracer; Tracer()()

Edit: the functions above are deprecated since IPython 5.1. This is the new approach:

from IPython.core.debugger import set_trace

Add set_trace() where you need a breakpoint. Type help for ipdb commands when the input field appears.


回答 2

您的返回函数位于def函数(主函数)的行中,您必须给它一个制表符。和使用

%%debug 

代替

%debug 

调试整个单元而不仅仅是行。希望这可能对您有帮助。

Your return function is in line of def function(main function), you must give one tab to it. And Use

%%debug 

instead of

%debug 

to debug the whole cell not only line. Hope, maybe this will help you.


回答 3

您始终可以在任何单元格中添加它:

import pdb; pdb.set_trace()

调试器将在该行停止。例如:

In[1]: def fun1(a):
           def fun2(a):
               import pdb; pdb.set_trace() # debugging starts here
           return fun2(a)

In[2]: fun1(1)

You can always add this in any cell:

import pdb; pdb.set_trace()

and the debugger will stop on that line. For example:

In[1]: def fun1(a):
           def fun2(a):
               import pdb; pdb.set_trace() # debugging starts here
           return fun2(a)

In[2]: fun1(1)

回答 4

在Python 3.7中,您可以使用breakpoint()函数。只需输入

breakpoint()

无论您想在哪里停止运行时,都可以使用相同的pdb命令(r,c,n,…)或评估变量。

In Python 3.7 you can use breakpoint() function. Just enter

breakpoint()

wherever you would like runtime to stop and from there you can use the same pdb commands (r, c, n, …) or evaluate your variables.


回答 5

只需键入import pdb在jupyter笔记本,然后用这个的cheatsheet调试。非常方便

c->继续,s->步进,b 12->在第12行设置断点,依此类推。

一些有用的链接: pdb上的Python官方文档Python pdb调试器示例,以更好地了解如何使用调试器命令

一些有用的屏幕截图:

Just type import pdb in jupyter notebook, and then use this cheatsheet to debug. It’s very convenient.

c –> continue, s –> step, b 12 –> set break point at line 12 and so on.

Some useful links: Python Official Document on pdb, Python pdb debugger examples for better understanding how to use the debugger commands.

Some useful screenshots:


回答 6

得到错误后,在下一个单元格中运行%debug,仅此而已。

After you get an error, in the next cell just run %debug and that’s it.


回答 7

%pdb魔术的命令是很好用为好。只需说一遍%pdb on,随后pdb调试器将在所有异常上运行,无论调用堆栈中有多深。非常便利。

如果您有要调试的特定行,只需在此处引发一个异常(通常您已经!),或使用%debug其他人一直在建议的magic命令。

The %pdb magic command is good to use as well. Just say %pdb on and subsequently the pdb debugger will run on all exceptions, no matter how deep in the call stack. Very handy.

If you have a particular line that you want to debug, just raise an exception there (often you already are!) or use the %debug magic command that other folks have been suggesting.


回答 8

我刚刚发现了PixieDebugger。甚至以为我还没有时间进行测试,这似乎确实是调试我们在ipdb中使用ipython的方式的最相似方法

它还有一个“评估”标签

I just discovered PixieDebugger. Even thought I have not yet had the time to test it, it really seems the most similar way to debug the way we’re used in ipython with ipdb

It also has an “evaluate” tab


回答 9

提供了本机调试器作为JupyterLab的扩展。可以在几周前发布,可以通过获取相关扩展以及xeus-python内核(尤其是没有ipykernel用户众所周知的魔术)来安装它:

jupyter labextension install @jupyterlab/debugger
conda install xeus-python -c conda-forge

这样可以实现其他IDE众所周知的可视化调试体验。

来源:Jupyter的可视调试器

A native debugger is being made available as an extension to JupyterLab. Released a few weeks ago, this can be installed by getting the relevant extension, as well as xeus-python kernel (which notably comes without the magics well-known to ipykernel users):

jupyter labextension install @jupyterlab/debugger
conda install xeus-python -c conda-forge

This enables a visual debugging experience well-known from other IDEs.

Source: A visual debugger for Jupyter


如何在不覆盖数据的情况下(使用熊猫)写入现有的excel文件?

问题:如何在不覆盖数据的情况下(使用熊猫)写入现有的excel文件?

我使用熊猫以以下方式写入excel文件:

import pandas

writer = pandas.ExcelWriter('Masterfile.xlsx') 

data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])

writer.save()

Masterfile.xlsx已经包含许多不同的选项卡。但是,它尚未包含“ Main”。

熊猫正确地写入了“主要”表,不幸的是,它也删除了所有其他标签。

I use pandas to write to excel file in the following fashion:

import pandas

writer = pandas.ExcelWriter('Masterfile.xlsx') 

data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])

writer.save()

Masterfile.xlsx already consists of number of different tabs. However, it does not yet contain “Main”.

Pandas correctly writes to “Main” sheet, unfortunately it also deletes all other tabs.


回答 0

熊猫文档说,它对xlsx文件使用openpyxl。快速浏览一下其中的代码ExcelWriter可以提示可能会发生以下情况:

import pandas
from openpyxl import load_workbook

book = load_workbook('Masterfile.xlsx')
writer = pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl') 
writer.book = book

## ExcelWriter for some reason uses writer.sheets to access the sheet.
## If you leave it empty it will not know that sheet Main is already there
## and will create a new sheet.

writer.sheets = dict((ws.title, ws) for ws in book.worksheets)

data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])

writer.save()

Pandas docs says it uses openpyxl for xlsx files. Quick look through the code in ExcelWriter gives a clue that something like this might work out:

import pandas
from openpyxl import load_workbook

book = load_workbook('Masterfile.xlsx')
writer = pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl') 
writer.book = book

## ExcelWriter for some reason uses writer.sheets to access the sheet.
## If you leave it empty it will not know that sheet Main is already there
## and will create a new sheet.

writer.sheets = dict((ws.title, ws) for ws in book.worksheets)

data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])

writer.save()

回答 1

这是一个辅助函数:

def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                       truncate_sheet=False, 
                       **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.

    Parameters:
      filename : File path or existing ExcelWriter
                 (Example: '/path/to/file.xlsx')
      df : dataframe to save to workbook
      sheet_name : Name of sheet which will contain DataFrame.
                   (default: 'Sheet1')
      startrow : upper left cell row to dump data frame.
                 Per default (startrow=None) calculate the last row
                 in the existing DF and write to the next row...
      truncate_sheet : truncate (remove and recreate) [sheet_name]
                       before writing DataFrame to Excel file
      to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
                        [can be dictionary]

    Returns: None
    """
    from openpyxl import load_workbook

    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')

    writer = pd.ExcelWriter(filename, engine='openpyxl')

    # Python 2.x: define [FileNotFoundError] exception if it doesn't exist 
    try:
        FileNotFoundError
    except NameError:
        FileNotFoundError = IOError


    try:
        # try to open an existing workbook
        writer.book = load_workbook(filename)

        # get the last row in the existing Excel sheet
        # if it was not specified explicitly
        if startrow is None and sheet_name in writer.book.sheetnames:
            startrow = writer.book[sheet_name].max_row

        # truncate sheet
        if truncate_sheet and sheet_name in writer.book.sheetnames:
            # index of [sheet_name] sheet
            idx = writer.book.sheetnames.index(sheet_name)
            # remove [sheet_name]
            writer.book.remove(writer.book.worksheets[idx])
            # create an empty sheet [sheet_name] using old index
            writer.book.create_sheet(sheet_name, idx)

        # copy existing sheets
        writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
    except FileNotFoundError:
        # file does not exist yet, we will create it
        pass

    if startrow is None:
        startrow = 0

    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)

    # save the workbook
    writer.save()

注意:对于<0.21.0的熊猫,请替换sheet_namesheetname

用法示例:

append_df_to_excel('d:/temp/test.xlsx', df)

append_df_to_excel('d:/temp/test.xlsx', df, header=None, index=False)

append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2', index=False)

append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2', index=False, startrow=25)

Here is a helper function:

def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                       truncate_sheet=False, 
                       **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.

    Parameters:
      filename : File path or existing ExcelWriter
                 (Example: '/path/to/file.xlsx')
      df : dataframe to save to workbook
      sheet_name : Name of sheet which will contain DataFrame.
                   (default: 'Sheet1')
      startrow : upper left cell row to dump data frame.
                 Per default (startrow=None) calculate the last row
                 in the existing DF and write to the next row...
      truncate_sheet : truncate (remove and recreate) [sheet_name]
                       before writing DataFrame to Excel file
      to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
                        [can be dictionary]

    Returns: None

    (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)
    """
    from openpyxl import load_workbook

    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')

    writer = pd.ExcelWriter(filename, engine='openpyxl')

    # Python 2.x: define [FileNotFoundError] exception if it doesn't exist 
    try:
        FileNotFoundError
    except NameError:
        FileNotFoundError = IOError


    try:
        # try to open an existing workbook
        writer.book = load_workbook(filename)
        
        # get the last row in the existing Excel sheet
        # if it was not specified explicitly
        if startrow is None and sheet_name in writer.book.sheetnames:
            startrow = writer.book[sheet_name].max_row

        # truncate sheet
        if truncate_sheet and sheet_name in writer.book.sheetnames:
            # index of [sheet_name] sheet
            idx = writer.book.sheetnames.index(sheet_name)
            # remove [sheet_name]
            writer.book.remove(writer.book.worksheets[idx])
            # create an empty sheet [sheet_name] using old index
            writer.book.create_sheet(sheet_name, idx)
        
        # copy existing sheets
        writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
    except FileNotFoundError:
        # file does not exist yet, we will create it
        pass

    if startrow is None:
        startrow = 0

    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)

    # save the workbook
    writer.save()
            

NOTE: for Pandas < 0.21.0, replace sheet_name with sheetname!

Usage examples:

append_df_to_excel('d:/temp/test.xlsx', df)

append_df_to_excel('d:/temp/test.xlsx', df, header=None, index=False)

append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2', index=False)

append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2', index=False, startrow=25)

回答 2

使用openpyxlversion 2.4.0pandasversion 0.19.2,@ ski提出的过程变得更加简单:

import pandas
from openpyxl import load_workbook

with pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl') as writer:
    writer.book = load_workbook('Masterfile.xlsx')
    data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])
#That's it!

With openpyxlversion 2.4.0 and pandasversion 0.19.2, the process @ski came up with gets a bit simpler:

import pandas
from openpyxl import load_workbook

with pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl') as writer:
    writer.book = load_workbook('Masterfile.xlsx')
    data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])
#That's it!

回答 3

从pandas 0.24开始,您可以使用mode关键字参数简化此操作ExcelWriter

import pandas as pd

with pd.ExcelWriter('the_file.xlsx', engine='openpyxl', mode='a') as writer: 
     data_filtered.to_excel(writer) 

Starting in pandas 0.24 you can simplify this with the mode keyword argument of ExcelWriter:

import pandas as pd

with pd.ExcelWriter('the_file.xlsx', engine='openpyxl', mode='a') as writer: 
     data_filtered.to_excel(writer) 

回答 4

老问题了,但我猜有些人还在搜索这个-所以…

我发现此方法不错,因为所有工作表都加载到工作表名称和数据框对的字典中,该字典由熊猫使用sheetname = None选项创建。在将电子表格读取为dict格式并将其从dict写回之前,添加,删除或修改工作表很简单。对于我来说,就速度和格式而言,xlsxwriter在执行此特定任务方面比openpyxl更好。

注意:未来版本的熊猫(0.21.0+)将把“ sheetname”参数更改为“ sheet_name”。

# read a single or multi-sheet excel file
# (returns dict of sheetname(s), dataframe(s))
ws_dict = pd.read_excel(excel_file_path,
                        sheetname=None)

# all worksheets are accessible as dataframes.

# easy to change a worksheet as a dataframe:
mod_df = ws_dict['existing_worksheet']

# do work on mod_df...then reassign
ws_dict['existing_worksheet'] = mod_df

# add a dataframe to the workbook as a new worksheet with
# ws name, df as dict key, value:
ws_dict['new_worksheet'] = some_other_dataframe

# when done, write dictionary back to excel...
# xlsxwriter honors datetime and date formats
# (only included as example)...
with pd.ExcelWriter(excel_file_path,
                    engine='xlsxwriter',
                    datetime_format='yyyy-mm-dd',
                    date_format='yyyy-mm-dd') as writer:

    for ws_name, df_sheet in ws_dict.items():
        df_sheet.to_excel(writer, sheet_name=ws_name)

对于2013年问题中的示例:

ws_dict = pd.read_excel('Masterfile.xlsx',
                        sheetname=None)

ws_dict['Main'] = data_filtered[['Diff1', 'Diff2']]

with pd.ExcelWriter('Masterfile.xlsx',
                    engine='xlsxwriter') as writer:

    for ws_name, df_sheet in ws_dict.items():
        df_sheet.to_excel(writer, sheet_name=ws_name)

Old question, but I am guessing some people still search for this – so…

I find this method nice because all worksheets are loaded into a dictionary of sheet name and dataframe pairs, created by pandas with the sheetname=None option. It is simple to add, delete or modify worksheets between reading the spreadsheet into the dict format and writing it back from the dict. For me the xlsxwriter works better than openpyxl for this particular task in terms of speed and format.

Note: future versions of pandas (0.21.0+) will change the “sheetname” parameter to “sheet_name”.

# read a single or multi-sheet excel file
# (returns dict of sheetname(s), dataframe(s))
ws_dict = pd.read_excel(excel_file_path,
                        sheetname=None)

# all worksheets are accessible as dataframes.

# easy to change a worksheet as a dataframe:
mod_df = ws_dict['existing_worksheet']

# do work on mod_df...then reassign
ws_dict['existing_worksheet'] = mod_df

# add a dataframe to the workbook as a new worksheet with
# ws name, df as dict key, value:
ws_dict['new_worksheet'] = some_other_dataframe

# when done, write dictionary back to excel...
# xlsxwriter honors datetime and date formats
# (only included as example)...
with pd.ExcelWriter(excel_file_path,
                    engine='xlsxwriter',
                    datetime_format='yyyy-mm-dd',
                    date_format='yyyy-mm-dd') as writer:

    for ws_name, df_sheet in ws_dict.items():
        df_sheet.to_excel(writer, sheet_name=ws_name)

For the example in the 2013 question:

ws_dict = pd.read_excel('Masterfile.xlsx',
                        sheetname=None)

ws_dict['Main'] = data_filtered[['Diff1', 'Diff2']]

with pd.ExcelWriter('Masterfile.xlsx',
                    engine='xlsxwriter') as writer:

    for ws_name, df_sheet in ws_dict.items():
        df_sheet.to_excel(writer, sheet_name=ws_name)

回答 5

我知道这是一个较旧的线程,但这是您在搜索时发现的第一项,并且如果需要将图表保留在已创建的工作簿中,则上述解决方案将不起作用。在这种情况下,xlwings是一个更好的选择-它允许您写入Excel书并保留图表/图表数据。

简单的例子:

import xlwings as xw
import pandas as pd

#create DF
months = ['2017-01','2017-02','2017-03','2017-04','2017-05','2017-06','2017-07','2017-08','2017-09','2017-10','2017-11','2017-12']
value1 = [x * 5+5 for x in range(len(months))]
df = pd.DataFrame(value1, index = months, columns = ['value1'])
df['value2'] = df['value1']+5
df['value3'] = df['value2']+5

#load workbook that has a chart in it
wb = xw.Book('C:\\data\\bookwithChart.xlsx')

ws = wb.sheets['chartData']

ws.range('A1').options(index=False).value = df

wb = xw.Book('C:\\data\\bookwithChart_updated.xlsx')

xw.apps[0].quit()

I know this is an older thread, but this is the first item you find when searching, and the above solutions don’t work if you need to retain charts in a workbook that you already have created. In that case, xlwings is a better option – it allows you to write to the excel book and keeps the charts/chart data.

simple example:

import xlwings as xw
import pandas as pd

#create DF
months = ['2017-01','2017-02','2017-03','2017-04','2017-05','2017-06','2017-07','2017-08','2017-09','2017-10','2017-11','2017-12']
value1 = [x * 5+5 for x in range(len(months))]
df = pd.DataFrame(value1, index = months, columns = ['value1'])
df['value2'] = df['value1']+5
df['value3'] = df['value2']+5

#load workbook that has a chart in it
wb = xw.Book('C:\\data\\bookwithChart.xlsx')

ws = wb.sheets['chartData']

ws.range('A1').options(index=False).value = df

wb = xw.Book('C:\\data\\bookwithChart_updated.xlsx')

xw.apps[0].quit()

回答 6

在pandas 0.24中有一个更好的解决方案:

with pd.ExcelWriter(path, mode='a') as writer:
    s.to_excel(writer, sheet_name='another sheet', index=False)

之前:

后:

因此,立即升级您的熊猫:

pip install --upgrade pandas

There is a better solution in pandas 0.24:

with pd.ExcelWriter(path, mode='a') as writer:
    s.to_excel(writer, sheet_name='another sheet', index=False)

before:

after:

so upgrade your pandas now:

pip install --upgrade pandas

回答 7

def append_sheet_to_master(self, master_file_path, current_file_path, sheet_name):
    try:
        master_book = load_workbook(master_file_path)
        master_writer = pandas.ExcelWriter(master_file_path, engine='openpyxl')
        master_writer.book = master_book
        master_writer.sheets = dict((ws.title, ws) for ws in master_book.worksheets)
        current_frames = pandas.ExcelFile(current_file_path).parse(pandas.ExcelFile(current_file_path).sheet_names[0],
                                                               header=None,
                                                               index_col=None)
        current_frames.to_excel(master_writer, sheet_name, index=None, header=False)

        master_writer.save()
    except Exception as e:
        raise e

这非常完美,只有主文件(添加新工作表的文件)的格式丢失了。

def append_sheet_to_master(self, master_file_path, current_file_path, sheet_name):
    try:
        master_book = load_workbook(master_file_path)
        master_writer = pandas.ExcelWriter(master_file_path, engine='openpyxl')
        master_writer.book = master_book
        master_writer.sheets = dict((ws.title, ws) for ws in master_book.worksheets)
        current_frames = pandas.ExcelFile(current_file_path).parse(pandas.ExcelFile(current_file_path).sheet_names[0],
                                                               header=None,
                                                               index_col=None)
        current_frames.to_excel(master_writer, sheet_name, index=None, header=False)

        master_writer.save()
    except Exception as e:
        raise e

This works perfectly fine only thing is that formatting of the master file(file to which we add new sheet) is lost.


回答 8

writer = pd.ExcelWriter('prueba1.xlsx'engine='openpyxl',keep_date_col=True)

“ keep_date_col”希望对您有所帮助

writer = pd.ExcelWriter('prueba1.xlsx'engine='openpyxl',keep_date_col=True)

The “keep_date_col” hope help you


回答 9

book = load_workbook(xlsFilename)
writer = pd.ExcelWriter(self.xlsFilename)
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df.to_excel(writer, sheet_name=sheetName, index=False)
writer.save()
book = load_workbook(xlsFilename)
writer = pd.ExcelWriter(self.xlsFilename)
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df.to_excel(writer, sheet_name=sheetName, index=False)
writer.save()

Python UTC日期时间对象的ISO格式不包含Z(Zulu或零偏移)

问题:Python UTC日期时间对象的ISO格式不包含Z(Zulu或零偏移)

为什么python 2.7不像JavaScript那样在UTC日期时间对象的isoformat字符串的末尾不包含Z字符(Zulu或零偏移)?

>>> datetime.datetime.utcnow().isoformat()
'2013-10-29T09:14:03.895210'

而在javascript中

>>>  console.log(new Date().toISOString()); 
2013-10-29T09:38:41.341Z

Why python 2.7 doesn’t include Z character (Zulu or zero offset) at the end of UTC datetime object’s isoformat string unlike JavaScript?

>>> datetime.datetime.utcnow().isoformat()
'2013-10-29T09:14:03.895210'

Whereas in javascript

>>>  console.log(new Date().toISOString()); 
2013-10-29T09:38:41.341Z

回答 0

Python datetime对象默认没有时区信息,没有它,Python实际上违反了ISO 8601规范(如果未提供时区信息,则假定为本地时间)。您可以使用pytz包获取一些默认时区,或者直接tzinfo自己子类化:

from datetime import datetime, tzinfo, timedelta
class simple_utc(tzinfo):
    def tzname(self,**kwargs):
        return "UTC"
    def utcoffset(self, dt):
        return timedelta(0)

然后,您可以将时区信息手动添加到utcnow()

>>> datetime.utcnow().replace(tzinfo=simple_utc()).isoformat()
'2014-05-16T22:51:53.015001+00:00'

请注意,此DOES符合ISO 8601格式,该格式允许Z+00:00作为UTC的后缀。请注意,后者实际上更好地符合了标准,并以一般方式表示时区(UTC是一种特例)。

Python datetime objects don’t have time zone info by default, and without it, Python actually violates the ISO 8601 specification (if no time zone info is given, assumed to be local time). You can use the pytz package to get some default time zones, or directly subclass tzinfo yourself:

from datetime import datetime, tzinfo, timedelta
class simple_utc(tzinfo):
    def tzname(self,**kwargs):
        return "UTC"
    def utcoffset(self, dt):
        return timedelta(0)

Then you can manually add the time zone info to utcnow():

>>> datetime.utcnow().replace(tzinfo=simple_utc()).isoformat()
'2014-05-16T22:51:53.015001+00:00'

Note that this DOES conform to the ISO 8601 format, which allows for either Z or +00:00 as the suffix for UTC. Note that the latter actually conforms to the standard better, with how time zones are represented in general (UTC is a special case.)


回答 1

选项: isoformat()

Python datetime不支持军事时区后缀,例如UTC的’Z’后缀。以下简单的字符串替换可以解决问题:

In [1]: import datetime

In [2]: d = datetime.datetime(2014, 12, 10, 12, 0, 0)

In [3]: str(d).replace('+00:00', 'Z')
Out[3]: '2014-12-10 12:00:00Z'

str(d) 基本上与 d.isoformat(sep=' ')

请参阅:日期时间,Python标准库

选项: strftime()

或者您可以使用strftime以达到相同的效果:

In [4]: d.strftime('%Y-%m-%d %H:%M:%SZ')
Out[4]: '2014-12-10 12:00:00Z'

注意:仅当您知道指定的日期为UTC时,此选项才有效。

请参阅:datetime.strftime()


附加:人类可读的时区

更进一步,您可能对显示人类可读的时区信息感兴趣,pytz带有strftime %Z时区标志:

In [5]: import pytz

In [6]: d = datetime.datetime(2014, 12, 10, 12, 0, 0, tzinfo=pytz.utc)

In [7]: d
Out[7]: datetime.datetime(2014, 12, 10, 12, 0, tzinfo=<UTC>)

In [8]: d.strftime('%Y-%m-%d %H:%M:%S %Z')
Out[8]: '2014-12-10 12:00:00 UTC'

Option: isoformat()

Python’s datetime does not support the military timezone suffixes like ‘Z’ suffix for UTC. The following simple string replacement does the trick:

In [1]: import datetime

In [2]: d = datetime.datetime(2014, 12, 10, 12, 0, 0)

In [3]: str(d).replace('+00:00', 'Z')
Out[3]: '2014-12-10 12:00:00Z'

str(d) is essentially the same as d.isoformat(sep=' ')

See: Datetime, Python Standard Library

Option: strftime()

Or you could use strftime to achieve the same effect:

In [4]: d.strftime('%Y-%m-%d %H:%M:%SZ')
Out[4]: '2014-12-10 12:00:00Z'

Note: This option works only when you know the date specified is in UTC.

See: datetime.strftime()


Additional: Human Readable Timezone

Going further, you may be interested in displaying human readable timezone information, pytz with strftime %Z timezone flag:

In [5]: import pytz

In [6]: d = datetime.datetime(2014, 12, 10, 12, 0, 0, tzinfo=pytz.utc)

In [7]: d
Out[7]: datetime.datetime(2014, 12, 10, 12, 0, tzinfo=<UTC>)

In [8]: d.strftime('%Y-%m-%d %H:%M:%S %Z')
Out[8]: '2014-12-10 12:00:00 UTC'

回答 2

以下javascript和python脚本提供相同的输出。我认为这就是您要寻找的。

的JavaScript

new Date().toISOString()

Python

import datetime

datetime.datetime.utcnow().strftime('%Y-%m-%dT%H:%M:%S.%f')[:-3] + 'Z'

他们给出的输出是utc(zelda)时间,格式为ISO字符串,有效数字为3毫秒,并附加Z。

2019-01-19T23:20:25.459Z

The following javascript and python scripts give identical outputs. I think it’s what you are looking for.

JavaScript

new Date().toISOString()

Python

from datetime import datetime

datetime.utcnow().isoformat()[:-3]+'Z'

The output they give is the utc (zelda) time formatted as an ISO string with a 3 millisecond significant digit and appended with a Z.

2019-01-19T23:20:25.459Z

回答 3

在Python> = 3.2中,您可以简单地使用以下代码:

>>> from datetime import datetime, timezone
>>> datetime.now(timezone.utc).isoformat()
'2019-03-14T07:55:36.979511+00:00'

In Python >= 3.2 you can simply use this:

>>> from datetime import datetime, timezone
>>> datetime.now(timezone.utc).isoformat()
'2019-03-14T07:55:36.979511+00:00'

回答 4

Python日期时间有点笨拙。使用arrow

> str(arrow.utcnow())
'2014-05-17T01:18:47.944126+00:00'

Arrow具有与datetime基本上相同的api,但是具有时区和一些其他优点,应该在主库中提供。

与Javascript兼容的格式可以通过以下方式实现:

arrow.utcnow().isoformat().replace("+00:00", "Z")
'2018-11-30T02:46:40.714281Z'

Javascript Date.parse将从时间戳悄然下降微秒。

Python datetimes are a little clunky. Use arrow.

> str(arrow.utcnow())
'2014-05-17T01:18:47.944126+00:00'

Arrow has essentially the same api as datetime, but with timezones and some extra niceties that should be in the main library.

A format compatible with Javascript can be achieved by:

arrow.utcnow().isoformat().replace("+00:00", "Z")
'2018-11-30T02:46:40.714281Z'

Javascript Date.parse will quietly drop microseconds from the timestamp.


回答 5

帖子上有很多不错的答案,但我希望格式能像JavaScript一样准确。这就是我正在使用的并且效果很好。

In [1]: import datetime

In [1]: now = datetime.datetime.utcnow()

In [1]: now.strftime('%Y-%m-%dT%H:%M:%S') + now.strftime('.%f')[:4] + 'Z'
Out[3]: '2018-10-16T13:18:34.856Z'

There are a lot of good answers on the post, but I wanted the format to come out exactly as it does with JavaScript. This is what I’m using and it works well.

In [1]: import datetime

In [1]: now = datetime.datetime.utcnow()

In [1]: now.strftime('%Y-%m-%dT%H:%M:%S') + now.strftime('.%f')[:4] + 'Z'
Out[3]: '2018-10-16T13:18:34.856Z'

回答 6

pip install python-dateutil
>>> a = "2019-06-27T02:14:49.443814497Z"
>>> dateutil.parser.parse(a)
datetime.datetime(2019, 6, 27, 2, 14, 49, 443814, tzinfo=tzutc())
pip install python-dateutil
>>> a = "2019-06-27T02:14:49.443814497Z"
>>> dateutil.parser.parse(a)
datetime.datetime(2019, 6, 27, 2, 14, 49, 443814, tzinfo=tzutc())

回答 7

我通过一些目标解决了这个问题:

  • 生成ISO 8601格式的UTC“感知”日期时间字符串
  • 使用只有 Python标准库函数DateTime对象和字符串创建
  • 使用Django timezone实用程序功能和dateutil解析器验证datetime对象和字符串
  • 使用JavaScript函数来验证ISO 8601日期时间字符串是否支持UTC

该方法包含Z后缀,并且不使用utcnow(),但是它基于Python文档中建议,并且通过Django和JavaScript传递了标记。

让我们从传递UTC时区对象开始,datetime.now()而不是使用datetime.utcnow()

from datetime import datetime, timezone

datetime.now(timezone.utc)
>>> datetime.datetime(2020, 1, 8, 6, 6, 24, 260810, tzinfo=datetime.timezone.utc)

datetime.now(timezone.utc).isoformat()
>>> '2020-01-08T06:07:04.492045+00:00'

看起来不错,所以让我们看看Django和dateutil思考:

from django.utils.timezone import is_aware

is_aware(datetime.now(timezone.utc))
>>> True

import dateutil.parser

is_aware(dateutil.parser.parse(datetime.now(timezone.utc).isoformat()))
>>> True

好的,Python日期时间对象和ISO 8601字符串都是UTC“可识别的”。现在,让我们看一下JavaScript对datetime字符串的看法。从这个答案中我们得到:

let date= '2020-01-08T06:07:04.492045+00:00';
const dateParsed = new Date(Date.parse(date))

document.write(dateParsed);
document.write("\n");
// Tue Jan 07 2020 22:07:04 GMT-0800 (Pacific Standard Time)

document.write(dateParsed.toISOString());
document.write("\n");
// 2020-01-08T06:07:04.492Z

document.write(dateParsed.toUTCString());
document.write("\n");
// Wed, 08 Jan 2020 06:07:04 GMT

您可能还想阅读此博客文章

Pass a UTC timezone object to datetime.now() instead of using datetime.utcnow():

from datetime import datetime, timezone

datetime.now(timezone.utc)
>>> datetime.datetime(2020, 1, 8, 6, 6, 24, 260810, tzinfo=datetime.timezone.utc)

datetime.now(timezone.utc).isoformat()
>>> '2020-01-08T06:07:04.492045+00:00'

That looks good, so let’s see what Django and dateutil think:

from django.utils.timezone import is_aware

is_aware(datetime.now(timezone.utc))
>>> True

import dateutil.parser

is_aware(dateutil.parser.parse(datetime.now(timezone.utc).isoformat()))
>>> True

Okay, the Python datetime object and the ISO 8601 string are UTC “aware”. Now let’s look at what JavaScript thinks of the datetime string. Borrowing from this answer we get:

let date= '2020-01-08T06:07:04.492045+00:00';
const dateParsed = new Date(Date.parse(date))

document.write(dateParsed);
document.write("\n");
// Tue Jan 07 2020 22:07:04 GMT-0800 (Pacific Standard Time)

document.write(dateParsed.toISOString());
document.write("\n");
// 2020-01-08T06:07:04.492Z

document.write(dateParsed.toUTCString());
document.write("\n");
// Wed, 08 Jan 2020 06:07:04 GMT

Notes:

I approached this problem with a few goals:

  • generate a UTC “aware” datetime string in ISO 8601 format
  • use only Python Standard Library functions for datetime object and string creation
  • validate the datetime object and string with the Django timezone utility function and the dateutil parser
  • use JavaScript functions to validate that the ISO 8601 datetime string is UTC aware

Note that this approach does not include a Z suffix and does not use utcnow(). But it’s based on the recommendation in the Python documentation and it passes muster with both Django and JavaScript.

You may also want to read this blog post.


回答 8

通过结合以上所有答案,我得到了以下功能:

from datetime import datetime, tzinfo, timedelta
class simple_utc(tzinfo):
    def tzname(self,**kwargs):
        return "UTC"
    def utcoffset(self, dt):
        return timedelta(0)


def getdata(yy, mm, dd, h, m, s) :
    d = datetime(yy, mm, dd, h, m, s)
    d = d.replace(tzinfo=simple_utc()).isoformat()
    d = str(d).replace('+00:00', 'Z')
    return d


print getdata(2018, 02, 03, 15, 0, 14)

By combining all answers above I came with following function :

from datetime import datetime, tzinfo, timedelta
class simple_utc(tzinfo):
    def tzname(self,**kwargs):
        return "UTC"
    def utcoffset(self, dt):
        return timedelta(0)


def getdata(yy, mm, dd, h, m, s) :
    d = datetime(yy, mm, dd, h, m, s)
    d = d.replace(tzinfo=simple_utc()).isoformat()
    d = str(d).replace('+00:00', 'Z')
    return d


print getdata(2018, 02, 03, 15, 0, 14)

回答 9

我使用摆锤:

import pendulum


d = pendulum.now("UTC").to_iso8601_string()
print(d)

>>> 2019-10-30T00:11:21.818265Z

I use pendulum:

import pendulum


d = pendulum.now("UTC").to_iso8601_string()
print(d)

>>> 2019-10-30T00:11:21.818265Z

回答 10

>>> import arrow

>>> now = arrow.utcnow().format('YYYY-MM-DDTHH:mm:ss.SSS')
>>> now
'2018-11-28T21:34:59.235'
>>> zulu = "{}Z".format(now)
>>> zulu
'2018-11-28T21:34:59.235Z'

或者,一口气获得它:

>>> zulu = "{}Z".format(arrow.utcnow().format('YYYY-MM-DDTHH:mm:ss.SSS'))
>>> zulu
'2018-11-28T21:54:49.639Z'
>>> import arrow

>>> now = arrow.utcnow().format('YYYY-MM-DDTHH:mm:ss.SSS')
>>> now
'2018-11-28T21:34:59.235'
>>> zulu = "{}Z".format(now)
>>> zulu
'2018-11-28T21:34:59.235Z'

Or, to get it in one fell swoop:

>>> zulu = "{}Z".format(arrow.utcnow().format('YYYY-MM-DDTHH:mm:ss.SSS'))
>>> zulu
'2018-11-28T21:54:49.639Z'

Python-‘ASCII’编解码器无法解码字节

问题:Python-‘ASCII’编解码器无法解码字节

我真的很困惑 我尝试编码,但错误提示can't decode...

>>> "你好".encode("utf8")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)

我知道如何避免在字符串上加上“ u”前缀的错误。我只是想知道为什么在调用编码时错误是“无法解码”的。Python到底是做什么的?

I’m really confused. I tried to encode but the error said can't decode....

>>> "你好".encode("utf8")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)

I know how to avoid the error with “u” prefix on the string. I’m just wondering why the error is “can’t decode” when encode was called. What is Python doing under the hood?


回答 0

"你好".encode('utf-8')

encode将unicode对象转换为string对象。但是这里您已经在string对象上调用了它(因为您没有u)。因此python必须先将转换stringunicode对象。所以它相当于

"你好".decode().encode('utf-8')

但是解码失败,因为字符串无效的ascii。这就是为什么您会抱怨无法解码的原因。

"你好".encode('utf-8')

encode converts a unicode object to a string object. But here you have invoked it on a string object (because you don’t have the u). So python has to convert the string to a unicode object first. So it does the equivalent of

"你好".decode().encode('utf-8')

But the decode fails because the string isn’t valid ascii. That’s why you get a complaint about not being able to decode.


回答 1

始终从unicode 编码为字节。
在这个方向上,您可以选择encoding

>>> u"你好".encode("utf8")
'\xe4\xbd\xa0\xe5\xa5\xbd'
>>> print _
你好

另一种方法是从字节解码为unicode。
在这个方向上,您必须知道什么是编码

>>> bytes = '\xe4\xbd\xa0\xe5\xa5\xbd'
>>> print bytes
你好
>>> bytes.decode('utf-8')
u'\u4f60\u597d'
>>> print _
你好

这一点压力不够大。如果您想避免播放unicode的“打hack鼠”游戏,那么了解数据级别的情况很重要。这里说明了另一种方式:

  • 一个unicode对象已经被解码了,您再也不想调用decode它了。
  • 一个字节串对象已经被编码,您永远不想调用encode它。

现在,在看到.encode字节字符串时,Python 2首先尝试将其隐式转换为文本(unicode对象)。同样,在看到.decodeunicode字符串时,Python 2隐式尝试将其转换为字节(str对象)。

这些隐式转换是您调用时可以得到的原因。这是因为编码通常接受type的参数;接收参数时,在使用另一种编码对它进行重新编码之前,会对类型进行隐式解码。此转换选择默认的“ ascii”解码器,给您编码器内部的解码错误。UnicodeDecodeErrorencodeunicodestrunicode

事实上,在Python 3的方法str.decodebytes.encode甚至不存在。为了避免这种常见的混淆,将它们移除是一个有争议的尝试。

…或任何编码sys.getdefaultencoding()提及的内容;通常这是“ ascii”

Always encode from unicode to bytes.
In this direction, you get to choose the encoding.

>>> u"你好".encode("utf8")
'\xe4\xbd\xa0\xe5\xa5\xbd'
>>> print _
你好

The other way is to decode from bytes to unicode.
In this direction, you have to know what the encoding is.

>>> bytes = '\xe4\xbd\xa0\xe5\xa5\xbd'
>>> print bytes
你好
>>> bytes.decode('utf-8')
u'\u4f60\u597d'
>>> print _
你好

This point can’t be stressed enough. If you want to avoid playing unicode “whack-a-mole”, it’s important to understand what’s happening at the data level. Here it is explained another way:

  • A unicode object is decoded already, you never want to call decode on it.
  • A bytestring object is encoded already, you never want to call encode on it.

Now, on seeing .encode on a byte string, Python 2 first tries to implicitly convert it to text (a unicode object). Similarly, on seeing .decode on a unicode string, Python 2 implicitly tries to convert it to bytes (a str object).

These implicit conversions are why you can get UnicodeDecodeError when you’ve called encode. It’s because encoding usually accepts a parameter of type unicode; when receiving a str parameter, there’s an implicit decoding into an object of type unicode before re-encoding it with another encoding. This conversion chooses a default ‘ascii’ decoder, giving you the decoding error inside an encoder.

In fact, in Python 3 the methods str.decode and bytes.encode don’t even exist. Their removal was a [controversial] attempt to avoid this common confusion.

…or whatever coding sys.getdefaultencoding() mentions; usually this is ‘ascii’


回答 2

你可以试试这个

import sys
reload(sys)
sys.setdefaultencoding("utf-8")

要么

您也可以尝试关注

在.py文件顶部添加以下行。

# -*- coding: utf-8 -*- 

You can try this

import sys
reload(sys)
sys.setdefaultencoding("utf-8")

Or

You can also try following

Add following line at top of your .py file.

# -*- coding: utf-8 -*- 

回答 3

如果您使用的是Python <3,则需要在解释器前面加上一个前缀,u以告知您的字符串文字是Unicode

Python 2.7.2 (default, Jan 14 2012, 23:14:09) 
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> "你好".encode("utf8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)
>>> u"你好".encode("utf8")
'\xe4\xbd\xa0\xe5\xa5\xbd'

进一步阅读Unicode HOWTO

If you’re using Python < 3, you’ll need to tell the interpreter that your string literal is Unicode by prefixing it with a u:

Python 2.7.2 (default, Jan 14 2012, 23:14:09) 
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> "你好".encode("utf8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)
>>> u"你好".encode("utf8")
'\xe4\xbd\xa0\xe5\xa5\xbd'

Further reading: Unicode HOWTO.


回答 4

您用于u"你好".encode('utf8')编码unicode字符串。但是,如果要表示"你好",则应该对其进行解码。就像:

"你好".decode("utf8")

您将得到想要的东西。也许您应该了解有关编码和解码的更多信息。

You use u"你好".encode('utf8') to encode an unicode string. But if you want to represent "你好", you should decode it. Just like:

"你好".decode("utf8")

You will get what you want. Maybe you should learn more about encode & decode.


回答 5

如果您要处理Unicode,有时可以用代替encode('utf-8'),也可以忽略特殊字符,例如

"你好".encode('ascii','ignore')

或如something.decode('unicode_escape').encode('ascii','ignore')这里所建议

在此示例中不是特别有用,但是在无法转换某些特殊字符的其他情况下可以更好地工作。

或者,您可以考虑使用替换特定字符replace()

In case you’re dealing with Unicode, sometimes instead of encode('utf-8'), you can also try to ignore the special characters, e.g.

"你好".encode('ascii','ignore')

or as something.decode('unicode_escape').encode('ascii','ignore') as suggested here.

Not particularly useful in this example, but can work better in other scenarios when it’s not possible to convert some special characters.

Alternatively you can consider replacing particular character using replace().


回答 6

如果要从Linux或类似系统(BSD,不确定Mac)上的外壳启动python解释器,则还应检查外壳的默认编码。

locale charmap从shell 调用(不是python解释器),您应该看到

[user@host dir] $ locale charmap
UTF-8
[user@host dir] $ 

如果不是这种情况,您会看到其他情况,例如

[user@host dir] $ locale charmap
ANSI_X3.4-1968
[user@host dir] $ 

Python将(至少在某些情况下,例如在我的情况下)继承外壳程序的编码,并且将无法打印(某些?全部?)unicode字符。Python的默认编码,你看,并通过控制sys.getdefaultencoding()sys.setdefaultencoding()在这种情况下被忽略。

如果发现此问题,可以通过以下方法解决

[user@host dir] $ export LC_CTYPE="en_EN.UTF-8"
[user@host dir] $ locale charmap
UTF-8
[user@host dir] $ 

(或选择要使用的键盘映射而不是en_EN。)您也可以编辑/etc/locale.conf(或控制系统中区域设置的文件)。

If you are starting the python interpreter from a shell on Linux or similar systems (BSD, not sure about Mac), you should also check the default encoding for the shell.

Call locale charmap from the shell (not the python interpreter) and you should see

[user@host dir] $ locale charmap
UTF-8
[user@host dir] $ 

If this is not the case, and you see something else, e.g.

[user@host dir] $ locale charmap
ANSI_X3.4-1968
[user@host dir] $ 

Python will (at least in some cases such as in mine) inherit the shell’s encoding and will not be able to print (some? all?) unicode characters. Python’s own default encoding that you see and control via sys.getdefaultencoding() and sys.setdefaultencoding() is in this case ignored.

If you find that you have this problem, you can fix that by

[user@host dir] $ export LC_CTYPE="en_EN.UTF-8"
[user@host dir] $ locale charmap
UTF-8
[user@host dir] $ 

(Or alternatively choose whichever keymap you want instead of en_EN.) You can also edit /etc/locale.conf (or whichever file governs the locale definition in your system) to correct this.


pip或pip3为Python 3安装软件包?

问题:pip或pip3为Python 3安装软件包?

我有一台配备OS X El Captain的Macbook。我认为那Python 2.7是预装的。但是,我也安装Python 3.5了。开始使用时Python 3,我读到要安装软件包,应输入:

pip3 install some_package

无论如何,现在当我使用

pip install some_package

some_package安装了Python 3。我的意思是我可以导入它,并且可以毫无问题地使用它。而且,当我只pip3输入时Terminal,我得到以下关于用法的消息:

Usage:   
  pip <command> [options]

这是我输入just时得到的相同信息pip

这是否意味着在previos版本,事情是不同的,现在pippip3可以互换使用?如果是这样,并且为了参数起见,我该如何安装软件包Python 2而不是Python 3

I have a Macbook with OS X El Captain. I think that Python 2.7 comes preinstalled on it. However, I installed Python 3.5 too. When I started using Python 3, I read that if I want to install a package, I should type:

pip3 install some_package

Anyway, now when I use

pip install some_package

I get some_package installed for Python 3. I mean I can import it and use it without problems. Moreover, when I type just pip3 in Terminal, I got this message about the usage:

Usage:   
  pip <command> [options]

which is the same message I get when I type just pip.

Does it mean that in previos versions, things were different, and now pip and pip3 can be used interchangeably? If so, and for the sake of argument, how can I install packages for Python 2 instead of Python 3?


回答 0

pip是使用链接到同一可执行文件路径的软链接pip3。您可以使用以下命令来检查您的真实路径pippip3实际路径:

$ ls -l `which pip`
$ ls -l `which pip3`

您也可以使用以下命令了解更多详细信息:

$ pip show pip
$ pip3 show pip

当我们安装不同版本的python时,我们可能会创建以下软链接到

  • 将默认点设置为某些版本。
  • 为不同的版本创建不同的链接。

这是同样的情况pythonpython2python3

如果您对在不同情况下如何发生感兴趣,可以在下面获取更多信息:

Your pip is a soft link to the same executable file path with pip3. you can use the commands below to check where your pip and pip3 real paths are:

$ ls -l `which pip`
$ ls -l `which pip3`

You may also use the commands below to know more details:

$ pip show pip
$ pip3 show pip

When we install different versions of python, we may create such soft links to

  • set default pip to some version.
  • make different links for different versions.

It is the same situation with python, python2, python3

More information below if you’re interested in how it happens in different cases:


回答 1

如果您安装了python 2.x,然后安装了python3,则您的pip将指向pip3。您可以通过键入来验证pip --version与相同pip3 --version

在您的系统上,现在有了pip,pip2和pip3

如果需要,可以将pip更改为指向pip2而不是pip3。

If you had python 2.x and then installed python3, your pip will be pointing to pip3. you can verify that by typing pip --version which would be the same as pip3 --version.

On your system you have now pip, pip2 and pip3.

If you want you can change pip to point to pip2 instead of pip3.


回答 2

安装后python3,即pip3会安装。如果你没有其他的Python安装(如python2.7),然后链接创建这点pippip3

所以pip是一个链接pip3,如果有Python安装的任何其他版本(比python3等)。 pip通常指向首次安装。

When you install python3, pip3 gets installed. And if you don’t have another python installation(like python2.7) then a link is created which points pip to pip3.

So pip is a link to to pip3 if there is no other version of python installed(other than python3). pip generally points to the first installation.


回答 3

这是一个棘手的问题。最后,如果您调用pip它,则将调用pip2pip3,具体取决于您如何设置系统。

This is a tricky subject. In the end, if you invoke pip it will invoke either pip2 or pip3, depending on how you set your system up.


回答 4

我认为pippip2并且pip3不是指向同一可执行文件路径的软链接。请注意以下命令并在我的linux终端中显示结果:

mrz@mrz-pc ~ $ ls -l `which pip`
-rwxr-xr-x 1 root root 292 Nov 10  2016 /usr/bin/pip
mrz@mrz-pc ~ $ ls -l `which pip2`
-rwxr-xr-x 1 root root 283 Nov 10  2016 /usr/bin/pip2
mrz@mrz-pc ~ $ ls -l `which pip3`
-rwxr-xr-x 1 root root 293 Nov 10  2016 /usr/bin/pip3
mrz@mrz-pc ~ $ pip -V
pip 9.0.1 from /home/mrz/.local/lib/python2.7/site-packages (python 2.7)
mrz@mrz-pc ~ $ pip2 -V
pip 8.1.1 from /usr/lib/python2.7/dist-packages (python 2.7)
mrz@mrz-pc ~ $ pip3 -V
pip 9.0.1 from /home/mrz/.local/lib/python3.5/site-packages (python 3.5)

如您所见,它们存在于不同的路径中。

pip3始终仅在Python3环境上运行,就像pip2在Python2上一样。pip在适合上下文的任何环境下运行。例如,如果您在Python3平台上,pip将在Python3环境下运行。

I think pip, pip2 and pip3 are not soft links to the same executable file path. Note these commands and results in my Linux terminal:

mrz@mrz-pc ~ $ ls -l `which pip`
-rwxr-xr-x 1 root root 292 Nov 10  2016 /usr/bin/pip
mrz@mrz-pc ~ $ ls -l `which pip2`
-rwxr-xr-x 1 root root 283 Nov 10  2016 /usr/bin/pip2
mrz@mrz-pc ~ $ ls -l `which pip3`
-rwxr-xr-x 1 root root 293 Nov 10  2016 /usr/bin/pip3
mrz@mrz-pc ~ $ pip -V
pip 9.0.1 from /home/mrz/.local/lib/python2.7/site-packages (python 2.7)
mrz@mrz-pc ~ $ pip2 -V
pip 8.1.1 from /usr/lib/python2.7/dist-packages (python 2.7)
mrz@mrz-pc ~ $ pip3 -V
pip 9.0.1 from /home/mrz/.local/lib/python3.5/site-packages (python 3.5)

As you see they exist in different paths.

pip3 always operates on the Python3 environment only, as pip2 does with Python2. pip operates in whichever environment is appropriate to the context. For example, if you are in a Python3 venv, pip will operate on the Python3 environment.


回答 5

通过插图:

pip --version
  pip 19.0.3 from /usr/lib/python3.7/site-packages/pip (python 3.7)

pip3 --version
  pip 19.0.3 from /usr/lib/python3.7/site-packages/pip (python 3.7)

python --version
  Python 3.7.3

which python
  /usr/bin/python

ls -l '/usr/bin/python'
  lrwxrwxrwx 1 root root 7 Mar 26 14:43 /usr/bin/python -> python3

which python3
  /usr/bin/python3

ls -l /usr/bin/python3
  lrwxrwxrwx 1 root root 9 Mar 26 14:43 /usr/bin/python3 -> python3.7

ls -l /usr/bin/python3.7
  -rwxr-xr-x 2 root root 14120 Mar 26 14:43 /usr/bin/python3.7

因此,在我的默认系统python(Python 3.7.3)中,my pip pip3

By illustration:

pip --version
  pip 19.0.3 from /usr/lib/python3.7/site-packages/pip (python 3.7)

pip3 --version
  pip 19.0.3 from /usr/lib/python3.7/site-packages/pip (python 3.7)

python --version
  Python 3.7.3

which python
  /usr/bin/python

ls -l '/usr/bin/python'
  lrwxrwxrwx 1 root root 7 Mar 26 14:43 /usr/bin/python -> python3

which python3
  /usr/bin/python3

ls -l /usr/bin/python3
  lrwxrwxrwx 1 root root 9 Mar 26 14:43 /usr/bin/python3 -> python3.7

ls -l /usr/bin/python3.7
  -rwxr-xr-x 2 root root 14120 Mar 26 14:43 /usr/bin/python3.7

Thus, my in my default system python (Python 3.7.3), pip is pip3.


回答 6

如果您安装了Python 2.7,我想您可以使用pip2pip2.7安装专门针对Python 2的软件包,例如

pip2 install some_pacakge

要么

pip2.7 install some_package

您可以使用pip3pip3.5安装专门用于Python 3的pacakges。

If you installed Python 2.7, I think you could use pip2 and pip2.7 to install packages specifically for Python 2, like

pip2 install some_pacakge

or

pip2.7 install some_package

And you may use pip3 or pip3.5 to install pacakges specifically for Python 3.


回答 7

在我的Windows实例上-并且我不完全了解我的环境-使用pip3安装kaggle-cli程序包有效-而pip无效。我在conda环境中工作,环境似乎有所不同。

(fastai)C:\ Users \ redact \ Downloads \ fast.ai \ deeplearning1 \ nbs> pip –version

来自C:\ ProgramData \ Anaconda3 \ envs \ fastai \ lib \ site-packages的pip 9.0.1(python 3.6)

(fastai)C:\ Users \ redact \ Downloads \ fast.ai \ deeplearning1 \ nbs> pip3 –version

来自c:\ users \ redact \ appdata \ local \ programs \ python \ python36 \ lib \ site-packages(python 3.6)的pip 9.0.1

On my Windows instance – and I do not fully understand my environment – using pip3 to install the kaggle-cli package worked – whereas pip did not. I was working in a conda environment and the environments appear to be different.

(fastai) C:\Users\redact\Downloads\fast.ai\deeplearning1\nbs>pip –version

pip 9.0.1 from C:\ProgramData\Anaconda3\envs\fastai\lib\site-packages (python 3.6)

(fastai) C:\Users\redact\Downloads\fast.ai\deeplearning1\nbs>pip3 –version

pip 9.0.1 from c:\users\redact\appdata\local\programs\python\python36\lib\site-packages (python 3.6)


回答 8

somepath / venv中激活了Python 3.6 virtualenv后,以下别名解决了macOS Sierra上的各种问题,其中pip坚持指向Apple的2.7 Python。

alias pip='python somepath/venv/lib/python3.6/site-packages/pip/__main__.py'

当我不得不做的时候,这不是很好,sudo pip因为root用户对我的别名或virtualenv一无所知,所以我也不得不添加一个额外的别名来处理这个问题。这是一个hack,但是可以用,我知道它的作用:

alias sudopip='sudo somepath/venv/bin/python somepath/venv/lib/python3.6/site-packages/pip/__main__.py'

背景:

pip3不存在,无法启动(找不到命令),which pip它将返回Apple Python的/opt/local/Library/Frameworks/Python.framework/Versions/2.7/bin/pip

Python 3.6是通过macports安装的。

激活我想使用的3.6 virtualenv后,which python将返回somepath / venv / bin / python

以某种方式pip install可以做正确的事情并击中我的virtualenv,但pip list会使Python 2.7包感到不安。

对于Python而言,这对初学者友好程度比我期望的要低。

Given an activated Python 3.6 virtualenv in somepath/venv, the following aliases resolved the various issues on a macOS Sierra where pip insisted on pointing to Apple’s 2.7 Python.

alias pip='python somepath/venv/lib/python3.6/site-packages/pip/__main__.py'

This didn’t work so well when I had to do sudo pip as the root user doesn’t know anything about my alias or the virtualenv, so I had to add an extra alias to handle this as well. It’s a hack, but it works, and I know what it does:

alias sudopip='sudo somepath/venv/bin/python somepath/venv/lib/python3.6/site-packages/pip/__main__.py'

background:

pip3 did not exist to start (command not found) with and which pip would return /opt/local/Library/Frameworks/Python.framework/Versions/2.7/bin/pip, the Apple Python.

Python 3.6 was installed via macports.

After activation of the 3.6 virtualenv I wanted to work with, which python would return somepath/venv/bin/python

Somehow pip install would do the right thing and hit my virtualenv, but pip list would rattle off Python 2.7 packages.

For Python, this is batting way beneath my expectations in terms of beginner-friendliness.


模拟函数引发异常以测试except块

问题:模拟函数引发异常以测试except块

我有一个foo调用另一个函数(bar)的函数()。如果调用bar()引发一个HttpError,如果状态代码为404,我想特别处理它,否则重新引发。

我正在尝试围绕此foo函数编写一些单元测试,以模拟对的调用bar()。不幸的是,我无法得到模拟调用bar()以引发被我的代码except块捕获的异常。

这是说明我问题的代码:

import unittest
import mock
from apiclient.errors import HttpError


class FooTests(unittest.TestCase):
    @mock.patch('my_tests.bar')
    def test_foo_shouldReturnResultOfBar_whenBarSucceeds(self, barMock):
        barMock.return_value = True
        result = foo()
        self.assertTrue(result)  # passes

    @mock.patch('my_tests.bar')
    def test_foo_shouldReturnNone_whenBarRaiseHttpError404(self, barMock):
        barMock.side_effect = HttpError(mock.Mock(return_value={'status': 404}), 'not found')
        result = foo()
        self.assertIsNone(result)  # fails, test raises HttpError

    @mock.patch('my_tests.bar')
    def test_foo_shouldRaiseHttpError_whenBarRaiseHttpErrorNot404(self, barMock):
        barMock.side_effect = HttpError(mock.Mock(return_value={'status': 500}), 'error')
        with self.assertRaises(HttpError):  # passes
            foo()

def foo():
    try:
        result = bar()
        return result
    except HttpError as error:
        if error.resp.status == 404:
            print '404 - %s' % error.message
            return None
        raise

def bar():
    raise NotImplementedError()

我跟着模拟文档这不能不说您应该设置side_effect一个的Mock情况下,以一个Exception班有嘲笑功能引发错误。

我还查看了其他一些与StackOverflow相关的问与答,看起来我在做他们正在做的相同事情,以使他们的模拟引发Exception。

为什么设置side_effectbarMock不引起预期Exception得到提升?如果我做的事情很奇怪,我应该如何在except块中测试逻辑?

I have a function (foo) which calls another function (bar). If invoking bar() raises an HttpError, I want to handle it specially if the status code is 404, otherwise re-raise.

I am trying to write some unit tests around this foo function, mocking out the call to bar(). Unfortunately, I am unable to get the mocked call to bar() to raise an Exception which is caught by my except block.

Here is my code which illustrates my problem:

import unittest
import mock
from apiclient.errors import HttpError


class FooTests(unittest.TestCase):
    @mock.patch('my_tests.bar')
    def test_foo_shouldReturnResultOfBar_whenBarSucceeds(self, barMock):
        barMock.return_value = True
        result = foo()
        self.assertTrue(result)  # passes

    @mock.patch('my_tests.bar')
    def test_foo_shouldReturnNone_whenBarRaiseHttpError404(self, barMock):
        barMock.side_effect = HttpError(mock.Mock(return_value={'status': 404}), 'not found')
        result = foo()
        self.assertIsNone(result)  # fails, test raises HttpError

    @mock.patch('my_tests.bar')
    def test_foo_shouldRaiseHttpError_whenBarRaiseHttpErrorNot404(self, barMock):
        barMock.side_effect = HttpError(mock.Mock(return_value={'status': 500}), 'error')
        with self.assertRaises(HttpError):  # passes
            foo()

def foo():
    try:
        result = bar()
        return result
    except HttpError as error:
        if error.resp.status == 404:
            print '404 - %s' % error.message
            return None
        raise

def bar():
    raise NotImplementedError()

I followed the Mock docs which say that you should set the side_effect of a Mock instance to an Exception class to have the mocked function raise the error.

I also looked at some other related StackOverflow Q&As, and it looks like I am doing the same thing they are doing to cause and Exception to be raised by their mock.

Why is setting the side_effect of barMock not causing the expected Exception to be raised? If I am doing something weird, how should I go about testing logic in my except block?


回答 0

您的模拟正在引发异常,但是该error.resp.status值丢失了。而不是使用return_value,只是告诉Mockstatus是一个属性:

barMock.side_effect = HttpError(mock.Mock(status=404), 'not found')

将其他关键字参数Mock()设置为结果对象的属性。

我将您的foobar定义放在my_tests模块中,并添加到HttpError类中,这样我也可以使用它,然后您的测试可以成功进行:

>>> from my_tests import foo, HttpError
>>> import mock
>>> with mock.patch('my_tests.bar') as barMock:
...     barMock.side_effect = HttpError(mock.Mock(status=404), 'not found')
...     result = my_test.foo()
... 
404 - 
>>> result is None
True

您甚至可以看到print '404 - %s' % error.message生产线运行,但是我想您想在error.content那里使用它。HttpError()无论如何,这是第二个参数设置的属性。

Your mock is raising the exception just fine, but the error.resp.status value is missing. Rather than use return_value, just tell Mock that status is an attribute:

barMock.side_effect = HttpError(mock.Mock(status=404), 'not found')

Additional keyword arguments to Mock() are set as attributes on the resulting object.

I put your foo and bar definitions in a my_tests module, added in the HttpError class so I could use it too, and your test then can be ran to success:

>>> from my_tests import foo, HttpError
>>> import mock
>>> with mock.patch('my_tests.bar') as barMock:
...     barMock.side_effect = HttpError(mock.Mock(status=404), 'not found')
...     result = my_test.foo()
... 
404 - 
>>> result is None
True

You can even see the print '404 - %s' % error.message line run, but I think you wanted to use error.content there instead; that’s the attribute HttpError() sets from the second argument, at any rate.


插入两个字符串的最pythonic方式

问题:插入两个字符串的最pythonic方式

将两个字符串网格化的最Python方式是什么?

例如:

输入:

u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'

输出:

'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

What’s the most pythonic way to mesh two strings together?

For example:

Input:

u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'

Output:

'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

回答 0

对我来说,最pythonic *的方式是以下代码,它几乎做同样的事情,但是使用+运算符来连接每个字符串中的各个字符:

res = "".join(i + j for i, j in zip(u, l))
print(res)
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

它也比使用两个join()调用更快:

In [5]: l1 = 'A' * 1000000; l2 = 'a' * 1000000

In [6]: %timeit "".join("".join(item) for item in zip(l1, l2))
1 loops, best of 3: 442 ms per loop

In [7]: %timeit "".join(i + j for i, j in zip(l1, l2))
1 loops, best of 3: 360 ms per loop

存在更快的方法,但是它们常常使代码模糊。

注:如果两个输入字符串是相同的长度,则较长的一个将被截断,zip停在较短字符串的结尾迭代。在这种情况下,zip应该使用模块中的zip_longestizip_longest在Python 2中)而不是一个itertools来确保两个字符串都已用尽。


*引用Python之禅可读性很重要
Pythonic = 对我而言可读性i + j至少对于我的眼睛来说,更容易从视觉上进行解析。

For me, the most pythonic* way is the following which pretty much does the same thing but uses the + operator for concatenating the individual characters in each string:

res = "".join(i + j for i, j in zip(u, l))
print(res)
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

It is also faster than using two join() calls:

In [5]: l1 = 'A' * 1000000; l2 = 'a' * 1000000

In [6]: %timeit "".join("".join(item) for item in zip(l1, l2))
1 loops, best of 3: 442 ms per loop

In [7]: %timeit "".join(i + j for i, j in zip(l1, l2))
1 loops, best of 3: 360 ms per loop

Faster approaches exist, but they often obfuscate the code.

Note: If the two input strings are not the same length then the longer one will be truncated as zip stops iterating at the end of the shorter string. In this case instead of zip one should use zip_longest (izip_longest in Python 2) from the itertools module to ensure that both strings are fully exhausted.


*To take a quote from the Zen of Python: Readability counts.
Pythonic = readability for me; i + j is just visually parsed more easily, at least for my eyes.


回答 1

更快的选择

其他方式:

res = [''] * len(u) * 2
res[::2] = u
res[1::2] = l
print(''.join(res))

输出:

'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

速度

看起来更快:

%%timeit
res = [''] * len(u) * 2
res[::2] = u
res[1::2] = l
''.join(res)

100000 loops, best of 3: 4.75 µs per loop

比迄今为止最快的解决方案:

%timeit "".join(list(chain.from_iterable(zip(u, l))))

100000 loops, best of 3: 6.52 µs per loop

同样对于较大的字符串:

l1 = 'A' * 1000000; l2 = 'a' * 1000000

%timeit "".join(list(chain.from_iterable(zip(l1, l2))))
1 loops, best of 3: 151 ms per loop


%%timeit
res = [''] * len(l1) * 2
res[::2] = l1
res[1::2] = l2
''.join(res)

10 loops, best of 3: 92 ms per loop

Python 3.5.1。

不同长度字符串的变化

u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijkl'

较短的一个确定长度(zip()等效)

min_len = min(len(u), len(l))
res = [''] * min_len * 2 
res[::2] = u[:min_len]
res[1::2] = l[:min_len]
print(''.join(res))

输出:

AaBbCcDdEeFfGgHhIiJjKkLl

更长的长度决定长度(itertools.zip_longest(fillvalue='')等效)

min_len = min(len(u), len(l))
res = [''] * min_len * 2 
res[::2] = u[:min_len]
res[1::2] = l[:min_len]
res += u[min_len:] + l[min_len:]
print(''.join(res))

输出:

AaBbCcDdEeFfGgHhIiJjKkLlMNOPQRSTUVWXYZ

Faster Alternative

Another way:

res = [''] * len(u) * 2
res[::2] = u
res[1::2] = l
print(''.join(res))

Output:

'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

Speed

Looks like it is faster:

%%timeit
res = [''] * len(u) * 2
res[::2] = u
res[1::2] = l
''.join(res)

100000 loops, best of 3: 4.75 µs per loop

than the fastest solution so far:

%timeit "".join(list(chain.from_iterable(zip(u, l))))

100000 loops, best of 3: 6.52 µs per loop

Also for the larger strings:

l1 = 'A' * 1000000; l2 = 'a' * 1000000

%timeit "".join(list(chain.from_iterable(zip(l1, l2))))
1 loops, best of 3: 151 ms per loop


%%timeit
res = [''] * len(l1) * 2
res[::2] = l1
res[1::2] = l2
''.join(res)

10 loops, best of 3: 92 ms per loop

Python 3.5.1.

Variation for strings with different lengths

u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijkl'

Shorter one determines length (zip() equivalent)

min_len = min(len(u), len(l))
res = [''] * min_len * 2 
res[::2] = u[:min_len]
res[1::2] = l[:min_len]
print(''.join(res))

Output:

AaBbCcDdEeFfGgHhIiJjKkLl

Longer one determines length (itertools.zip_longest(fillvalue='') equivalent)

min_len = min(len(u), len(l))
res = [''] * min_len * 2 
res[::2] = u[:min_len]
res[1::2] = l[:min_len]
res += u[min_len:] + l[min_len:]
print(''.join(res))

Output:

AaBbCcDdEeFfGgHhIiJjKkLlMNOPQRSTUVWXYZ

回答 2

join()zip()

>>> ''.join(''.join(item) for item in zip(u,l))
'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

With join() and zip().

>>> ''.join(''.join(item) for item in zip(u,l))
'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

回答 3

在Python 2上,到目前为止,做事的最快方法是小字符串列表切片的速度大约是3倍,长字符串列表切片的速度大约是30倍。

res = bytearray(len(u) * 2)
res[::2] = u
res[1::2] = l
str(res)

但是,这在Python 3上不起作用。您可以实现类似

res = bytearray(len(u) * 2)
res[::2] = u.encode("ascii")
res[1::2] = l.encode("ascii")
res.decode("ascii")

但是到那时,您已经失去了对小型字符串进行列表切片所获得的收益(对于长字符串而言,它的速度仍然是20倍),并且这甚至还不适用于非ASCII字符。

FWIW,如果您在大量字符串上执行此操作并且需要每个周期,并且由于某种原因必须使用Python字符串…以下是操作方法:

res = bytearray(len(u) * 4 * 2)

u_utf32 = u.encode("utf_32_be")
res[0::8] = u_utf32[0::4]
res[1::8] = u_utf32[1::4]
res[2::8] = u_utf32[2::4]
res[3::8] = u_utf32[3::4]

l_utf32 = l.encode("utf_32_be")
res[4::8] = l_utf32[0::4]
res[5::8] = l_utf32[1::4]
res[6::8] = l_utf32[2::4]
res[7::8] = l_utf32[3::4]

res.decode("utf_32_be")

特殊情况下,较小类型的外壳也将有所帮助。FWIW,这只是长字符串列表切片速度的3倍,而小字符串则 4到5倍。

无论哪种方式,我都喜欢join解决方案,但是由于在其他地方提到了时间安排,我认为我也应该加入。

On Python 2, by far the faster way to do things, at ~3x the speed of list slicing for small strings and ~30x for long ones, is

res = bytearray(len(u) * 2)
res[::2] = u
res[1::2] = l
str(res)

This wouldn’t work on Python 3, though. You could implement something like

res = bytearray(len(u) * 2)
res[::2] = u.encode("ascii")
res[1::2] = l.encode("ascii")
res.decode("ascii")

but by then you’ve already lost the gains over list slicing for small strings (it’s still 20x the speed for long strings) and this doesn’t even work for non-ASCII characters yet.

FWIW, if you are doing this on massive strings and need every cycle, and for some reason have to use Python strings… here’s how to do it:

res = bytearray(len(u) * 4 * 2)

u_utf32 = u.encode("utf_32_be")
res[0::8] = u_utf32[0::4]
res[1::8] = u_utf32[1::4]
res[2::8] = u_utf32[2::4]
res[3::8] = u_utf32[3::4]

l_utf32 = l.encode("utf_32_be")
res[4::8] = l_utf32[0::4]
res[5::8] = l_utf32[1::4]
res[6::8] = l_utf32[2::4]
res[7::8] = l_utf32[3::4]

res.decode("utf_32_be")

Special-casing the common case of smaller types will help too. FWIW, this is only 3x the speed of list slicing for long strings and a factor of 4 to 5 slower for small strings.

Either way I prefer the join solutions, but since timings were mentioned elsewhere I thought I might as well join in.


回答 4

如果您想要最快的方法,可以将itertools与结合使用operator.add

In [36]: from operator import add

In [37]: from itertools import  starmap, izip

In [38]: timeit "".join([i + j for i, j in uzip(l1, l2)])
1 loops, best of 3: 142 ms per loop

In [39]: timeit "".join(starmap(add, izip(l1,l2)))
1 loops, best of 3: 117 ms per loop

In [40]: timeit "".join(["".join(item) for item in zip(l1, l2)])
1 loops, best of 3: 196 ms per loop

In [41]:  "".join(starmap(add, izip(l1,l2))) ==  "".join([i + j   for i, j in izip(l1, l2)]) ==  "".join(["".join(item) for item in izip(l1, l2)])
Out[42]: True

但是合并起来izipchain.from_iterable更快了

In [2]: from itertools import  chain, izip

In [3]: timeit "".join(chain.from_iterable(izip(l1, l2)))
10 loops, best of 3: 98.7 ms per loop

chain(*和之间也存在实质性差异 chain.from_iterable(...

In [5]: timeit "".join(chain(*izip(l1, l2)))
1 loops, best of 3: 212 ms per loop

没有像join那样的生成器,传递一个总是慢一些,因为python首先会使用内容来建立一个列表,因为它会对数据进行两次传递,一次传递所需的大小,一次传递实际的大小使用生成器无法实现的联接:

join.h

 /* Here is the general case.  Do a pre-pass to figure out the total
  * amount of space we'll need (sz), and see whether all arguments are
  * bytes-like.
   */

另外,如果您使用不同长度的字符串,并且不想丢失数据,则可以使用izip_longest

In [22]: from itertools import izip_longest    
In [23]: a,b = "hlo","elworld"

In [24]:  "".join(chain.from_iterable(izip_longest(a, b,fillvalue="")))
Out[24]: 'helloworld'

对于python 3,它称为 zip_longest

但是对于python2来说,veedrac的建议是迄今为止最快的:

In [18]: %%timeit
res = bytearray(len(u) * 2)
res[::2] = u
res[1::2] = l
str(res)
   ....: 
100 loops, best of 3: 2.68 ms per loop

If you want the fastest way, you can combine itertools with operator.add:

In [36]: from operator import add

In [37]: from itertools import  starmap, izip

In [38]: timeit "".join([i + j for i, j in uzip(l1, l2)])
1 loops, best of 3: 142 ms per loop

In [39]: timeit "".join(starmap(add, izip(l1,l2)))
1 loops, best of 3: 117 ms per loop

In [40]: timeit "".join(["".join(item) for item in zip(l1, l2)])
1 loops, best of 3: 196 ms per loop

In [41]:  "".join(starmap(add, izip(l1,l2))) ==  "".join([i + j   for i, j in izip(l1, l2)]) ==  "".join(["".join(item) for item in izip(l1, l2)])
Out[42]: True

But combining izip and chain.from_iterable is faster again

In [2]: from itertools import  chain, izip

In [3]: timeit "".join(chain.from_iterable(izip(l1, l2)))
10 loops, best of 3: 98.7 ms per loop

There is also a substantial difference between chain(* and chain.from_iterable(....

In [5]: timeit "".join(chain(*izip(l1, l2)))
1 loops, best of 3: 212 ms per loop

There is no such thing as a generator with join, passing one is always going to be slower as python will first build a list using the content because it does two passes over the data, one to figure out the size needed and one to actually do the join which would not be possible using a generator:

join.h:

 /* Here is the general case.  Do a pre-pass to figure out the total
  * amount of space we'll need (sz), and see whether all arguments are
  * bytes-like.
   */

Also if you have different length strings and you don’t want to lose data you can use izip_longest :

In [22]: from itertools import izip_longest    
In [23]: a,b = "hlo","elworld"

In [24]:  "".join(chain.from_iterable(izip_longest(a, b,fillvalue="")))
Out[24]: 'helloworld'

For python 3 it is called zip_longest

But for python2, veedrac’s suggestion is by far the fastest:

In [18]: %%timeit
res = bytearray(len(u) * 2)
res[::2] = u
res[1::2] = l
str(res)
   ....: 
100 loops, best of 3: 2.68 ms per loop

回答 5

您也可以使用map和执行此操作operator.add

from operator import add

u = 'AAAAA'
l = 'aaaaa'

s = "".join(map(add, u, l))

输出

'AaAaAaAaAa'

map的作用是,它从第一个可迭代对象获取每个元素,u并从第二个可迭代对象获取第一个元素,l并应用作为第一个参数提供的函数add。然后加入只是加入他们。

You could also do this using map and operator.add:

from operator import add

u = 'AAAAA'
l = 'aaaaa'

s = "".join(map(add, u, l))

Output:

'AaAaAaAaAa'

What map does is it takes every element from the first iterable u and the first elements from the second iterable l and applies the function supplied as the first argument add. Then join just joins them.


回答 6

吉姆的答案很好,但是,如果您不介意几次导入,这是我最喜欢的选择:

from functools import reduce
from operator import add

reduce(add, map(add, u, l))

Jim’s answer is great, but here’s my favorite option, if you don’t mind a couple of imports:

from functools import reduce
from operator import add

reduce(add, map(add, u, l))

回答 7

这些建议很多都假设字符串长度相等。也许涵盖了所有合理的用例,但至少对我来说,您似乎也想适应长度不同的字符串。还是我是唯一认为网格应该像这样工作的人:

u = "foobar"
l = "baz"
mesh(u,l) = "fboaozbar"

一种方法是:

def mesh(a,b):
    minlen = min(len(a),len(b))
    return "".join(["".join(x+y for x,y in zip(a,b)),a[minlen:],b[minlen:]])

A lot of these suggestions assume the strings are of equal length. Maybe that covers all reasonable use cases, but at least to me it seems that you might want to accomodate strings of differing lengths too. Or am I the only one thinking the mesh should work a bit like this:

u = "foobar"
l = "baz"
mesh(u,l) = "fboaozbar"

One way to do this would be the following:

def mesh(a,b):
    minlen = min(len(a),len(b))
    return "".join(["".join(x+y for x,y in zip(a,b)),a[minlen:],b[minlen:]])

回答 8

我喜欢使用两个fors,变量名可以提示/提醒正在发生的事情:

"".join(char for pair in zip(u,l) for char in pair)

I like using two fors, the variable names can give a hint/reminder to what is going on:

"".join(char for pair in zip(u,l) for char in pair)

回答 9

只是添加另一种更基本的方法:

st = ""
for char in u:
    st = "{0}{1}{2}".format( st, char, l[ u.index( char ) ] )

Just to add another, more basic approach:

st = ""
for char in u:
    st = "{0}{1}{2}".format( st, char, l[ u.index( char ) ] )

回答 10

有点不讲究Python而不考虑这里的double-list-comprehension答案,用O(1)来处理n个字符串:

"".join(c for cs in itertools.zip_longest(*all_strings) for c in cs)

all_strings您要插入的字符串的列表在哪里。就您而言,all_strings = [u, l]。完整的使用示例如下所示:

import itertools
a = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
b = 'abcdefghijklmnopqrstuvwxyz'
all_strings = [a,b]
interleaved = "".join(c for cs in itertools.zip_longest(*all_strings) for c in cs)
print(interleaved)
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

喜欢许多答案,最快吗?可能不是,但是简单而灵活。另外,在没有增加太多复杂性的情况下,这比公认的答案要快一些(通常,在python中字符串添加有点慢):

In [7]: l1 = 'A' * 1000000; l2 = 'a' * 1000000;

In [8]: %timeit "".join(a + b for i, j in zip(l1, l2))
1 loops, best of 3: 227 ms per loop

In [9]: %timeit "".join(c for cs in zip(*(l1, l2)) for c in cs)
1 loops, best of 3: 198 ms per loop

Feels a bit un-pythonic not to consider the double-list-comprehension answer here, to handle n string with O(1) effort:

"".join(c for cs in itertools.zip_longest(*all_strings) for c in cs)

where all_strings is a list of the strings you want to interleave. In your case, all_strings = [u, l]. A full use example would look like this:

import itertools
a = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
b = 'abcdefghijklmnopqrstuvwxyz'
all_strings = [a,b]
interleaved = "".join(c for cs in itertools.zip_longest(*all_strings) for c in cs)
print(interleaved)
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

Like many answers, fastest? Probably not, but simple and flexible. Also, without too much added complexity, this is slightly faster than the accepted answer (in general, string addition is a bit slow in python):

In [7]: l1 = 'A' * 1000000; l2 = 'a' * 1000000;

In [8]: %timeit "".join(a + b for i, j in zip(l1, l2))
1 loops, best of 3: 227 ms per loop

In [9]: %timeit "".join(c for cs in zip(*(l1, l2)) for c in cs)
1 loops, best of 3: 198 ms per loop

回答 11

可能比当前领先的解决方案更快,更短:

from itertools import chain

u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'

res = "".join(chain(*zip(u, l)))

快速策略是在C级别上尽可能多地做。相同的zip_longest()修复了不均匀的字符串,它会与chain()来自同一个模块,所以在这里不能给我太多点!

我提出的其他解决方案:

res = "".join(u[x] + l[x] for x in range(len(u)))

res = "".join(k + l[i] for i, k in enumerate(u))

Potentially faster and shorter than the current leading solution:

from itertools import chain

u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'

res = "".join(chain(*zip(u, l)))

Strategy speed-wise is to do as much at the C-level as possible. Same zip_longest() fix for uneven strings and it would be coming out of the same module as chain() so can’t ding me too many points there!

Other solutions I came up with along the way:

res = "".join(u[x] + l[x] for x in range(len(u)))

res = "".join(k + l[i] for i, k in enumerate(u))

回答 12

你可以用1iteration_utilities.roundrobin

u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'

from iteration_utilities import roundrobin
''.join(roundrobin(u, l))
# returns 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

ManyIterables同一包中的类:

from iteration_utilities import ManyIterables
ManyIterables(u, l).roundrobin().as_string()
# returns 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

1这来自我编写的第三方库iteration_utilities

You could use iteration_utilities.roundrobin1

u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'

from iteration_utilities import roundrobin
''.join(roundrobin(u, l))
# returns 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

or the ManyIterables class from the same package:

from iteration_utilities import ManyIterables
ManyIterables(u, l).roundrobin().as_string()
# returns 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

1 This is from a third-party library I have written: iteration_utilities.


回答 13

我将使用zip()来获得一种可读且简单的方法:

result = ''
for cha, chb in zip(u, l):
    result += '%s%s' % (cha, chb)

print result
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

I would use zip() to get a readable and easy way:

result = ''
for cha, chb in zip(u, l):
    result += '%s%s' % (cha, chb)

print result
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

sklearn中的“ transform”和“ fit_transform”有什么区别

问题:sklearn中的“ transform”和“ fit_transform”有什么区别

在sklearn-python工具箱中,有两个函数transformfit_transformabout sklearn.decomposition.RandomizedPCA。两种功能的说明如下

但是它们之间有什么区别?

In the sklearn-python toolbox, there are two functions transform and fit_transform about sklearn.decomposition.RandomizedPCA. The description of two functions are as follows

But what is the difference between them ?


回答 0

在这里,仅当您已经在矩阵上计算了PCA时,才可以使用pca.transform的区别

   In [12]: pc2 = RandomizedPCA(n_components=3)

    In [13]: pc2.transform(X) # can't transform because it does not know how to do it.
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-13-e3b6b8ea2aff> in <module>()
    ----> 1 pc2.transform(X)

    /usr/local/lib/python3.4/dist-packages/sklearn/decomposition/pca.py in transform(self, X, y)
        714         # XXX remove scipy.sparse support here in 0.16
        715         X = atleast2d_or_csr(X)
    --> 716         if self.mean_ is not None:
        717             X = X - self.mean_
        718 

    AttributeError: 'RandomizedPCA' object has no attribute 'mean_'

    In [14]: pc2.ftransform(X) 
    pc2.fit            pc2.fit_transform  

    In [14]: pc2.fit_transform(X)
    Out[14]: 
    array([[-1.38340578, -0.2935787 ],
           [-2.22189802,  0.25133484],
           [-3.6053038 , -0.04224385],
           [ 1.38340578,  0.2935787 ],
           [ 2.22189802, -0.25133484],
           [ 3.6053038 ,  0.04224385]])

如果要使用.transform,则需要将转换规则教给您的pca

In [20]: pca = RandomizedPCA(n_components=3)

In [21]: pca.fit(X)
Out[21]: 
RandomizedPCA(copy=True, iterated_power=3, n_components=3, random_state=None,
       whiten=False)

In [22]: pca.transform(z)
Out[22]: 
array([[ 2.76681156,  0.58715739],
       [ 1.92831932,  1.13207093],
       [ 0.54491354,  0.83849224],
       [ 5.53362311,  1.17431479],
       [ 6.37211535,  0.62940125],
       [ 7.75552113,  0.92297994]])

In [23]: 

特别地,PCA变换将通过矩阵X的PCA分解获得的基数更改应用于矩阵Z。

The .transform method is meant for when you have already computed PCA, i.e. if you have already called its .fit method.

In [12]: pc2 = RandomizedPCA(n_components=3)

In [13]: pc2.transform(X) # can't transform because it does not know how to do it.
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-13-e3b6b8ea2aff> in <module>()
----> 1 pc2.transform(X)

/usr/local/lib/python3.4/dist-packages/sklearn/decomposition/pca.py in transform(self, X, y)
    714         # XXX remove scipy.sparse support here in 0.16
    715         X = atleast2d_or_csr(X)
--> 716         if self.mean_ is not None:
    717             X = X - self.mean_
    718 

AttributeError: 'RandomizedPCA' object has no attribute 'mean_'

In [14]: pc2.ftransform(X) 
pc2.fit            pc2.fit_transform  

In [14]: pc2.fit_transform(X)
Out[14]: 
array([[-1.38340578, -0.2935787 ],
       [-2.22189802,  0.25133484],
       [-3.6053038 , -0.04224385],
       [ 1.38340578,  0.2935787 ],
       [ 2.22189802, -0.25133484],
       [ 3.6053038 ,  0.04224385]])
    
  

So you want to fit RandomizedPCA and then transform as:

In [20]: pca = RandomizedPCA(n_components=3)

In [21]: pca.fit(X)
Out[21]: 
RandomizedPCA(copy=True, iterated_power=3, n_components=3, random_state=None,
       whiten=False)

In [22]: pca.transform(z)
Out[22]: 
array([[ 2.76681156,  0.58715739],
       [ 1.92831932,  1.13207093],
       [ 0.54491354,  0.83849224],
       [ 5.53362311,  1.17431479],
       [ 6.37211535,  0.62940125],
       [ 7.75552113,  0.92297994]])

In [23]: 

In particular PCA .transform applies the change of basis obtained through the PCA decomposition of the matrix X to the matrix Z.


回答 1

scikit-learn estimator api中

fit() :用于从训练数据生成学习模型参数

transform():从fit()方法生成的参数,应用于模型以生成转换后的数据集。

fit_transform()fit()transform()api在同一数据集上的组合

结帐章-4从这本书及答案从stackexchange为了更清楚

In scikit-learn estimator api,

fit() : used for generating learning model parameters from training data

transform() : parameters generated from fit() method,applied upon model to generate transformed data set.

fit_transform() : combination of fit() and transform() api on same data set

Checkout Chapter-4 from this book & answer from stackexchange for more clarity


回答 2

这些方法用于确定给定数据的中心/特征标度。它基本上有助于规范特定范围内的数据

为此,我们使用Z评分方法。

我们在训练数据集上执行此操作。

1. Fit():方法计算参数μ和σ并将其保存为内部对象。

2. Transform():使用这些计算出的参数的方法将转换应用于特定的数据集。

3. Fit_transform():将fit()和transform()方法结合在一起以进行数据集转换。

特征缩放/标准化的代码片段(在train_test_split之后)。

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
sc.fit_transform(X_train)
sc.transform(X_test)

我们在测试集上应用相同的(训练集相同的两个参数μ和σ(值))参数转换。

These methods are used to center/feature scale of a given data. It basically helps to normalize the data within a particular range

For this, we use Z-score method.

We do this on the training set of data.

1.Fit(): Method calculates the parameters μ and σ and saves them as internal objects.

2.Transform(): Method using these calculated parameters apply the transformation to a particular dataset.

3.Fit_transform(): joins the fit() and transform() method for transformation of dataset.

Code snippet for Feature Scaling/Standardisation(after train_test_split).

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
sc.fit_transform(X_train)
sc.transform(X_test)

We apply the same(training set same two parameters μ and σ (values)) parameter transformation on our testing set.


回答 3

两种方法之间的通用差异:

  • 适合(raw_documents [,y]):学习原始文档中所有标记的词汇词典。
  • fit_transform(raw_documents [,y]):学习词汇词典并返回术语文档矩阵。这等效于紧随其后的变换,但实现效率更高。
  • transform(raw_documents):将文档转换为文档术语矩阵。使用适合的词汇表或提供给构造函数的词汇表从原始文本文档中提取令牌计数。

fit_transform和transform都返回相同的Document-term矩阵。

资源

Generic difference between the methods:

  • fit(raw_documents[, y]): Learn a vocabulary dictionary of all tokens in the raw documents.
  • fit_transform(raw_documents[, y]): Learn the vocabulary dictionary and return term-document matrix. This is equivalent to fit followed by the transform, but more efficiently implemented.
  • transform(raw_documents): Transform documents to document-term matrix. Extract token counts out of raw text documents using the vocabulary fitted with fit or the one provided to the constructor.

Both fit_transform and transform returns the same, Document-term matrix.

Source


回答 4

这里.fit()&的基本区别.fit_transform()

。适合():

在有两个对象/参数(x,y)的监督学习中用于拟合模型并使模型运行,我们知道我们要预测的内容

.fit_transform():

在无监督学习中使用一个对象/参数(x),而我们不知道该预测什么。

Here the basic difference between .fit() & .fit_transform():

.fit():

is use in the Supervised learning having two object/parameter(x,y) to fit model and make model to run, where we know that what we are going to predict

.fit_transform():

is use in Unsupervised Learning having one object/parameter(x), where we don’t know, what we are going to predict.


回答 5

用外行的术语来说,fit_transform意味着先进行一些计算,然后再进行转换(例如,根据一些数据计算列的均值,然后替换缺少的值)。因此,对于训练集,您既需要计算又要进行转换。

但是对于测试集,机器学习根据训练集中学习到的内容应用预测,因此不需要计算,只需执行转换即可。

In layman’s terms, fit_transform means to do some calculation and then do transformation (say calculating the means of columns from some data and then replacing the missing values). So for training set, you need to both calculate and do transformation.

But for testing set, Machine learning applies prediction based on what was learned during the training set and so it doesn’t need to calculate, it just performs the transformation.


回答 6

为什么和何时使用每一个:

所有答复都很好,但是我将重点介绍为什么和何时使用每种方法。

fit(),transform(),fit_transform()

通常,我们有一个监督学习问题,其中(X,y)作为数据集,我们将其分为训练数据和测试数据:

import numpy as np
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y)

X_train_vectorized = model.fit_transform(X_train)
X_test_vectorized = model.transform(X_test)

想象一下,我们正在安装一个令牌生成器,如果我们安装了X,我们会将测试数据包含到令牌生成器中,但是我已经多次看到这个错误!

正确的方法是只适合X_train,因为您不知道“您的未来数据”,因此您不能使用X_test数据来拟合任何东西!

然后,您可以转换测试数据,但是要分别进行转换,这就是为什么存在不同方法的原因。

最后的提示:X_train_transformed = model.fit_transform(X_train)等同于: X_train_transformed = model.fit(X_train).transform(X_train),但是第一个提示更快。

请注意,我所谓的“模型”通常是缩放器,tfidf转换器,其他类型的矢量化器,令牌化器…

Why and When use each one:

All the responses are quite good, but I would make emphasis in WHY and WHEN use each method.

fit(), transform(), fit_transform()

Usually we have a supervised learning problem with (X, y) as out dataset, and we split it into training data and test data:

import numpy as np
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y)

X_train_vectorized = model.fit_transform(X_train)
X_test_vectorized = model.transform(X_test)

Imagine we are fitting a tokenizer, if we fit X we are including testing data into the tokenizer, but I have seen this error many times!

The correct is to fit ONLY with X_train, because you don’t know “your future data” so you cannot use X_test data for fitting anything!

Then you can transform your test data, but separately, that’s why there are different methods.

Final tip: X_train_transformed = model.fit_transform(X_train) is equivalent to: X_train_transformed = model.fit(X_train).transform(X_train), but the first one is faster.

Note that what I call “model” usually will be a scaler, a tfidf transformer, other kind of vectorizer, a tokenizer…


如何在Python 2中发送HEAD HTTP请求?

问题:如何在Python 2中发送HEAD HTTP请求?

我在这里尝试做的是获取给定URL的标头,以便确定MIME类型。我希望能够查看是否http://somedomain/foo/将返回例如HTML文档或JPEG图像。因此,我需要弄清楚如何发送HEAD请求,以便无需下载内容就可以读取MIME类型。有人知道这样做的简单方法吗?

What I’m trying to do here is get the headers of a given URL so I can determine the MIME type. I want to be able to see if http://somedomain/foo/ will return an HTML document or a JPEG image for example. Thus, I need to figure out how to send a HEAD request so that I can read the MIME type without having to download the content. Does anyone know of an easy way of doing this?


回答 0

编辑:此答案有效,但是现在您应该使用下面其他答案中提到的请求库。


使用httplib

>>> import httplib
>>> conn = httplib.HTTPConnection("www.google.com")
>>> conn.request("HEAD", "/index.html")
>>> res = conn.getresponse()
>>> print res.status, res.reason
200 OK
>>> print res.getheaders()
[('content-length', '0'), ('expires', '-1'), ('server', 'gws'), ('cache-control', 'private, max-age=0'), ('date', 'Sat, 20 Sep 2008 06:43:36 GMT'), ('content-type', 'text/html; charset=ISO-8859-1')]

还有一个getheader(name)获取特定的标头。

edit: This answer works, but nowadays you should just use the requests library as mentioned by other answers below.


Use httplib.

>>> import httplib
>>> conn = httplib.HTTPConnection("www.google.com")
>>> conn.request("HEAD", "/index.html")
>>> res = conn.getresponse()
>>> print res.status, res.reason
200 OK
>>> print res.getheaders()
[('content-length', '0'), ('expires', '-1'), ('server', 'gws'), ('cache-control', 'private, max-age=0'), ('date', 'Sat, 20 Sep 2008 06:43:36 GMT'), ('content-type', 'text/html; charset=ISO-8859-1')]

There’s also a getheader(name) to get a specific header.


回答 1

urllib2可用于执行HEAD请求。这比使用httplib更好,因为urllib2为您解析URL,而不是要求您将URL分为主机名和路径。

>>> import urllib2
>>> class HeadRequest(urllib2.Request):
...     def get_method(self):
...         return "HEAD"
... 
>>> response = urllib2.urlopen(HeadRequest("http://google.com/index.html"))

头可以通过response.info()像以前一样使用。有趣的是,您可以找到重定向到的URL:

>>> print response.geturl()
http://www.google.com.au/index.html

urllib2 can be used to perform a HEAD request. This is a little nicer than using httplib since urllib2 parses the URL for you instead of requiring you to split the URL into host name and path.

>>> import urllib2
>>> class HeadRequest(urllib2.Request):
...     def get_method(self):
...         return "HEAD"
... 
>>> response = urllib2.urlopen(HeadRequest("http://google.com/index.html"))

Headers are available via response.info() as before. Interestingly, you can find the URL that you were redirected to:

>>> print response.geturl()
http://www.google.com.au/index.html

回答 2

强制Requests方式:

import requests

resp = requests.head("http://www.google.com")
print resp.status_code, resp.text, resp.headers

Obligatory Requests way:

import requests

resp = requests.head("http://www.google.com")
print resp.status_code, resp.text, resp.headers

回答 3

我认为也应该提及Requests库。

I believe the Requests library should be mentioned as well.


回答 4

只是:

import urllib2
request = urllib2.Request('http://localhost:8080')
request.get_method = lambda : 'HEAD'

response = urllib2.urlopen(request)
response.info().gettype()

编辑:我刚开始意识到有httplib2:D

import httplib2
h = httplib2.Http()
resp = h.request("http://www.google.com", 'HEAD')
assert resp[0]['status'] == 200
assert resp[0]['content-type'] == 'text/html'
...

连结文字

Just:

import urllib2
request = urllib2.Request('http://localhost:8080')
request.get_method = lambda : 'HEAD'

response = urllib2.urlopen(request)
response.info().gettype()

Edit: I’ve just came to realize there is httplib2 :D

import httplib2
h = httplib2.Http()
resp = h.request("http://www.google.com", 'HEAD')
assert resp[0]['status'] == 200
assert resp[0]['content-type'] == 'text/html'
...

link text


回答 5

为了完整起见,有一个与使用httplib接受的答案等效的Python3答案。

基本相同的代码只是该库不再被称为httplib而是http.client

from http.client import HTTPConnection

conn = HTTPConnection('www.google.com')
conn.request('HEAD', '/index.html')
res = conn.getresponse()

print(res.status, res.reason)

For completeness to have a Python3 answer equivalent to the accepted answer using httplib.

It is basically the same code just that the library isn’t called httplib anymore but http.client

from http.client import HTTPConnection

conn = HTTPConnection('www.google.com')
conn.request('HEAD', '/index.html')
res = conn.getresponse()

print(res.status, res.reason)

回答 6

import httplib
import urlparse

def unshorten_url(url):
    parsed = urlparse.urlparse(url)
    h = httplib.HTTPConnection(parsed.netloc)
    h.request('HEAD', parsed.path)
    response = h.getresponse()
    if response.status/100 == 3 and response.getheader('Location'):
        return response.getheader('Location')
    else:
        return url
import httplib
import urlparse

def unshorten_url(url):
    parsed = urlparse.urlparse(url)
    h = httplib.HTTPConnection(parsed.netloc)
    h.request('HEAD', parsed.path)
    response = h.getresponse()
    if response.status/100 == 3 and response.getheader('Location'):
        return response.getheader('Location')
    else:
        return url

回答 7

顺便说一句,当使用httplib时(至少在2.5.2上),尝试读取HEAD请求的响应将阻塞(在读取行中),随后失败。如果您未在响应中发出已读消息,则无法在连接上发送另一个请求,则需要打开一个新请求。或者接受两次请求之间的长时间延迟。

As an aside, when using the httplib (at least on 2.5.2), trying to read the response of a HEAD request will block (on readline) and subsequently fail. If you do not issue read on the response, you are unable to send another request on the connection, you will need to open a new one. Or accept a long delay between requests.


回答 8

我发现httplib比urllib2快一点。我为两个程序计时-一个使用httplib,另一个使用urllib2-将HEAD请求发送到10,000个URL。httplib的速度快了几分钟。 httplib的总统计为:实际6m21.334s用户0m2.124s sys 0m16.372s

的urllib2的总的统计是:实9m1.380s用户0m16.666s SYS 0m28.565s

有人对此有意见吗?

I have found that httplib is slightly faster than urllib2. I timed two programs – one using httplib and the other using urllib2 – sending HEAD requests to 10,000 URL’s. The httplib one was faster by several minutes. httplib‘s total stats were: real 6m21.334s user 0m2.124s sys 0m16.372s

And urllib2‘s total stats were: real 9m1.380s user 0m16.666s sys 0m28.565s

Does anybody else have input on this?


回答 9

还有另一种方法(类似于Pawel的答案):

import urllib2
import types

request = urllib2.Request('http://localhost:8080')
request.get_method = types.MethodType(lambda self: 'HEAD', request, request.__class__)

只是为了避免在实例级别使用无限制的方法。

And yet another approach (similar to Pawel answer):

import urllib2
import types

request = urllib2.Request('http://localhost:8080')
request.get_method = types.MethodType(lambda self: 'HEAD', request, request.__class__)

Just to avoid having unbounded methods at instance level.


回答 10

可能更容易:使用urllib或urllib2。

>>> import urllib
>>> f = urllib.urlopen('http://google.com')
>>> f.info().gettype()
'text/html'

f.info()是类似字典的对象,因此您可以执行f.info()[‘content-type’]等。

http://docs.python.org/library/urllib.html
http://docs.python.org/library/urllib2.html
http://docs.python.org/library/httplib.html

文档指出,通常不直接使用httplib。

Probably easier: use urllib or urllib2.

>>> import urllib
>>> f = urllib.urlopen('http://google.com')
>>> f.info().gettype()
'text/html'

f.info() is a dictionary-like object, so you can do f.info()[‘content-type’], etc.

http://docs.python.org/library/urllib.html
http://docs.python.org/library/urllib2.html
http://docs.python.org/library/httplib.html

The docs note that httplib is not normally used directly.


有多行导入的推荐格式吗?

问题:有多行导入的推荐格式吗?

我已经阅读了在python中编码多行导入的三种方法

带斜杠:

from Tkinter import Tk, Frame, Button, Entry, Canvas, Text, \
    LEFT, DISABLED, NORMAL, RIDGE, END

复制语句:

from Tkinter import Tk, Frame, Button, Entry, Canvas, Text
from Tkinter import LEFT, DISABLED, NORMAL, RIDGE, END

带括号:

from Tkinter import (Tk, Frame, Button, Entry, Canvas, Text,
    LEFT, DISABLED, NORMAL, RIDGE, END)

此语句是否有推荐的格式或更简洁的方法?

I have read there are three ways for coding multi-line imports in python

With slashes:

from Tkinter import Tk, Frame, Button, Entry, Canvas, Text, \
    LEFT, DISABLED, NORMAL, RIDGE, END

Duplicating senteces:

from Tkinter import Tk, Frame, Button, Entry, Canvas, Text
from Tkinter import LEFT, DISABLED, NORMAL, RIDGE, END

With parenthesis:

from Tkinter import (Tk, Frame, Button, Entry, Canvas, Text,
    LEFT, DISABLED, NORMAL, RIDGE, END)

Is there a recomended format or a more elegant way for this statements?


回答 0

就个人而言,导入多个组件并按字母顺序对它们进行括号处理。像这样:

from Tkinter import (
    Button,
    Canvas,
    DISABLED,
    END,
    Entry,
    Frame,
    LEFT,
    NORMAL,
    RIDGE,
    Text,
    Tk,
)

这具有额外的优势,即可以轻松查看每个提交或PR中已添加/删除了哪些组件。

总的来说,尽管这是个人喜好,但我建议您选择最适合自己的方式。

Personally I go with parentheses when importing more than one component and sort them alphabetically. Like so:

from Tkinter import (
    Button,
    Canvas,
    DISABLED,
    END,
    Entry,
    Frame,
    LEFT,
    NORMAL,
    RIDGE,
    Text,
    Tk,
)

This has the added advantage of easily seeing what components have been added / removed in each commit or PR.

Overall though it’s a personal preference and I would advise you to go with whatever looks best to you.


回答 1

您的示例似乎源于PEP 328。在那里,正是针对这个问题提出了括号符号,所以我可能会选择这个。

Your examples seem to stem from PEP 328. There, the parenthesis-notation is proposed for exactly this problem, so probably I’d choose this one.


回答 2

我将使用PEP328的括号符号,并在括号之前和之后添加换行符:

from Tkinter import (
    Tk, Frame, Button, Entry, Canvas, Text, 
    LEFT, DISABLED, NORMAL, RIDGE, END
)

这是Django使用的格式:

from django.test.client import Client, RequestFactory
from django.test.testcases import (
    LiveServerTestCase, SimpleTestCase, TestCase, TransactionTestCase,
    skipIfDBFeature, skipUnlessAnyDBFeature, skipUnlessDBFeature,
)
from django.test.utils import (
    ignore_warnings, modify_settings, override_settings,
    override_system_checks, tag,
)

I would go with the parenthesis notation from the PEP328 with newlines added before and after parentheses:

from Tkinter import (
    Tk, Frame, Button, Entry, Canvas, Text, 
    LEFT, DISABLED, NORMAL, RIDGE, END
)

This is the format which Django uses:

from django.test.client import Client, RequestFactory
from django.test.testcases import (
    LiveServerTestCase, SimpleTestCase, TestCase, TransactionTestCase,
    skipIfDBFeature, skipUnlessAnyDBFeature, skipUnlessDBFeature,
)
from django.test.utils import (
    ignore_warnings, modify_settings, override_settings,
    override_system_checks, tag,
)

回答 3

通常对于Tkinter,只使用from Tkinter import *模块就可以了,因为该模块只会导出明显是小部件的名称。

PEP 8没有列出这种情况下的任何约定,因此我想您应该决定什么是最佳选择。一切都与可读性有关,因此请选择任何可以使您清楚地表明要从单个模块导入内容的东西。

由于在您的范围内提供了所有这些名称,因此我个人认为选项2最清晰,因为您可以看到最佳的导入名称。然后,您甚至可以将其拆分成更多的部分,以将彼此属于的那些名称组合在一起。在您的示例中,我可能将和分开放置Tk,因为它们将小部件分组在一起,而将和分开放置则是它们在视图中较小的组件。FrameCanvasButtonText

Usually with Tkinter, it is okay to just use from Tkinter import * as the module will only export names that are clearly widgets.

PEP 8 does not list any conventions for such a case, so I guess it is up to you to decide what is the best option. It is all about readability, so choose whatever makes it clear that you are importing stuff from a single module.

As all those names are made available in your scope, I personally think that options 2 is the most clearest as you can see the imported names the best. You then could even split it up more to maybe group those names together that belong with each other. In your example I might put Tk, Frame and Canvas separately as they group widgets together, while having Button and Text separately as they are smaller components in a view.