Printing lists as tabular data

Question: Printing lists as tabular data

I am quite new to Python and I am now struggling with formatting my data nicely for printed output.

I have one list that is used for two headings, and a matrix that should be the contents of the table. Like so:

import numpy as np

teams_list = ["Man Utd", "Man City", "T Hotspur"]
data = np.array([[1, 2, 1],
                 [0, 1, 0],
                 [2, 4, 2]])

Note that the heading names are not necessarily the same lengths. The data entries are all integers, though.

Now, I want to represent this in a table format, something like this:

            Man Utd   Man City   T Hotspur
  Man Utd         1          0           0
 Man City         1          1           0
T Hotspur         0          1           2

I have a hunch that there must be a data structure for this, but I cannot find it. I have tried using a dictionary and formatting the printing, I have tried for-loops with indentation and I have tried printing as strings.

I am sure there must be a very simple way to do this, but I am probably missing it due to lack of experience.


Answer 0

Some ad-hoc code for Python 2.7:

row_format ="{:>15}" * (len(teams_list) + 1)
print(row_format.format("", *teams_list))
for team, row in zip(teams_list, data):
    print(row_format.format(team, *row))

This relies on str.format() and the Format Specification Mini-Language.
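The 15-character width above is fixed. The sketch below assumes you would rather size every column to the longest entry. It builds the same right-aligned layout, but computes the width and passes it through a nested `{:>{w}}` field, which is standard format-spec syntax. `build_table` is a hypothetical helper name. `list(row)` lets it accept NumPy rows as well as plain lists.

```python
def build_table(teams_list, data):
    """Return the matrix as right-aligned text, one shared width for every column."""
    # Widest header or cell, plus 2 spaces of padding between columns.
    cells = [str(x) for x in teams_list] + [str(v) for row in data for v in row]
    width = max(len(c) for c in cells) + 2
    # Header row: blank corner cell, then the team names.
    lines = ["".join("{:>{w}}".format(cell, w=width) for cell in [""] + teams_list)]
    for team, row in zip(teams_list, data):
        lines.append("".join("{:>{w}}".format(cell, w=width)
                             for cell in [team] + list(row)))
    return "\n".join(lines)

teams_list = ["Man Utd", "Man City", "T Hotspur"]
data = [[1, 2, 1], [0, 1, 0], [2, 4, 2]]
print(build_table(teams_list, data))
```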


Answer 1

There are some lightweight, useful Python packages for this purpose:

1. tabulate: https://pypi.python.org/pypi/tabulate

from tabulate import tabulate
print(tabulate([['Alice', 24], ['Bob', 19]], headers=['Name', 'Age']))
Name      Age
------  -----
Alice      24
Bob        19

tabulate has many options to specify headers and table format.

print(tabulate([['Alice', 24], ['Bob', 19]], headers=['Name', 'Age'], tablefmt='orgtbl'))
| Name   |   Age |
|--------+-------|
| Alice  |    24 |
| Bob    |    19 |

2. PrettyTable: https://pypi.python.org/pypi/PrettyTable

from prettytable import PrettyTable
t = PrettyTable(['Name', 'Age'])
t.add_row(['Alice', 24])
t.add_row(['Bob', 19])
print(t)
+-------+-----+
|  Name | Age |
+-------+-----+
| Alice |  24 |
|  Bob  |  19 |
+-------+-----+

PrettyTable has options to read data from CSV, HTML, and SQL databases. You can also select a subset of the data, sort the table, and change table styles.

3. texttable: https://pypi.python.org/pypi/texttable

from texttable import Texttable
t = Texttable()
t.add_rows([['Name', 'Age'], ['Alice', 24], ['Bob', 19]])
print(t.draw())
+-------+-----+
| Name  | Age |
+=======+=====+
| Alice | 24  |
+-------+-----+
| Bob   | 19  |
+-------+-----+

With texttable you can control horizontal/vertical alignment, border style, and data types.

4. termtables: https://github.com/nschloe/termtables

import termtables as tt

string = tt.to_string(
    [["Alice", 24], ["Bob", 19]],
    header=["Name", "Age"],
    style=tt.styles.ascii_thin_double,
    # alignment="ll",
    # padding=(0, 1),
)
print(string)
+-------+-----+
| Name  | Age |
+=======+=====+
| Alice | 24  |
+-------+-----+
| Bob   | 19  |
+-------+-----+

With termtables you can control the border style, and the commented-out arguments above show where alignment and padding are set.

Other options:

  • terminaltables: Easily draw tables in terminal/console applications from a list of lists of strings. Supports multi-line rows.
  • asciitable: Asciitable can read and write a wide range of ASCII table formats via built-in Extension Reader Classes.

Answer 2

>>> import pandas
>>> pandas.DataFrame(data, teams_list, teams_list)
           Man Utd  Man City  T Hotspur
Man Utd    1        2         1        
Man City   0        1         0        
T Hotspur  2        4         2        

Answer 3

Python actually makes this quite easy.

Something like

for i in range(10):
    print('%-12i%-12i' % (10 ** i, 20 ** i))

will have the output

1           1           
10          20          
100         400         
1000        8000        
10000       160000      
100000      3200000     
1000000     64000000    
10000000    1280000000  
100000000   25600000000
1000000000  512000000000

The % inside the string works like an escape character: the characters following it tell Python what format the data should have. The % operator after the string tells Python to use that string as a format string and to substitute the following tuple of values into it.

In this case I used “%-12i” twice. To break down each part:

'-' (left align)
'12' (how much space to be given to this part of the output)
'i' (we are printing an integer)

From the docs: https://docs.python.org/2/library/stdtypes.html#string-formatting
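The same printf-style operator can also lay out the matrix from the question. The sketch below assumes one shared width computed from the longest team name. A `*` in a conversion such as `%*s` or `%*d` takes the width from the argument tuple, so the width does not have to be hard-coded as 12.

```python
teams_list = ["Man Utd", "Man City", "T Hotspur"]
data = [[1, 2, 1], [0, 1, 0], [2, 4, 2]]

width = max(len(t) for t in teams_list) + 2  # room for the longest name plus padding

# Header row: a blank corner cell followed by the team names, right-aligned.
header = "%*s" % (width, "") + "".join("%*s" % (width, t) for t in teams_list)
# Body rows: the team label, then each integer right-aligned with %*d.
rows = ["%*s" % (width, team) + "".join("%*d" % (width, n) for n in row)
        for team, row in zip(teams_list, data)]

print("\n".join([header] + rows))
```

Adding `-` (as in `%-*s`) would left-align instead, as in the `%-12i` example above.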


Answer 4

Updating Sven Marnach’s answer to work in Python 3.4:

row_format ="{:>15}" * (len(teams_list) + 1)
print(row_format.format("", *teams_list))
for team, row in zip(teams_list, data):
    print(row_format.format(team, *row))

Answer 5

When I do this, I like to have some control over the details of how the table is formatted. In particular, I want header cells to have a different format than body cells, and the table column widths to only be as wide as each one needs to be. Here’s my solution:

def format_matrix(header, matrix,
                  top_format, left_format, cell_format, row_delim, col_delim):
    # Prepend the header row and prefix each body row with its row label.
    table = [[''] + header] + [[name] + list(row) for name, row in zip(header, matrix)]
    # One format string per cell: corner, top headers, left labels, body cells.
    table_format = [['{:^{}}'] + len(header) * [top_format]] \
                 + len(matrix) * [[left_format] + len(header) * [cell_format]]
    # Each column is as wide as its widest cell (measured with no padding).
    col_widths = [max(
                      len(fmt.format(cell, 0))
                      for fmt, cell in zip(col_format, col))
                  for col_format, col in zip(zip(*table_format), zip(*table))]
    return row_delim.join(
               col_delim.join(
                   fmt.format(cell, width)
                   for fmt, cell, width in zip(row_format, row, col_widths))
               for row_format, row in zip(table_format, table))

print(format_matrix(['Man Utd', 'Man City', 'T Hotspur', 'Really Long Column'],
                    [[1, 2, 1, -1], [0, 1, 0, 5], [2, 4, 2, 2], [0, 1, 0, 6]],
                    '{:^{}}', '{:<{}}', '{:>{}.3f}', '\n', ' | '))

Here’s the output:

                   | Man Utd | Man City | T Hotspur | Really Long Column
Man Utd            |   1.000 |    2.000 |     1.000 |             -1.000
Man City           |   0.000 |    1.000 |     0.000 |              5.000
T Hotspur          |   2.000 |    4.000 |     2.000 |              2.000
Really Long Column |   0.000 |    1.000 |     0.000 |              6.000

Answer 6

I think this is what you are looking for.

It’s a simple module that just computes the maximum required width for the table entries and then just uses rjust and ljust to do a pretty print of the data.

If you want your left heading right-aligned, change this call on line 53 of the module:

 print >> out, row[0].ljust(col_paddings[0] + 1),

to:

 print >> out, row[0].rjust(col_paddings[0] + 1),
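The module linked in this answer is not reproduced here. The following is a rough, hypothetical sketch of the approach it describes: compute each column's maximum width, then pad with `str.ljust` and `str.rjust`. It assumes the row labels should be left-justified and the numbers right-justified. `pretty_rows` is an illustrative name, not part of that module.

```python
def pretty_rows(teams_list, data):
    """Hypothetical helper: pad each column to its own widest entry."""
    table = [[""] + teams_list] + [[t] + [str(v) for v in row]
                                   for t, row in zip(teams_list, data)]
    # Widest cell in each column (header and body together).
    widths = [max(len(r[i]) for r in table) for i in range(len(table[0]))]
    lines = []
    for r in table:
        label = r[0].ljust(widths[0] + 1)  # left heading, left-justified
        cells = [c.rjust(w + 1) for c, w in zip(r[1:], widths[1:])]
        lines.append(label + "".join(cells))
    return lines

for line in pretty_rows(["Man Utd", "Man City", "T Hotspur"],
                        [[1, 2, 1], [0, 1, 0], [2, 4, 2]]):
    print(line)
```

Swapping `ljust` for `rjust` on the `label` line right-aligns the left heading. That mirrors the one-line change described above.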

Answer 7

I know that I am late to the party, but I just made a library for this that I think could really help. It is extremely simple, which is why I think you should use it. It is called TableIt.

Basic Use

To use it, first follow the download instructions on the GitHub Page.

Then import it:

import TableIt

Then make a list of lists where each inner list is a row:

table = [
    [4, 3, "Hi"],
    [2, 1, 808890312093],
    [5, "Hi", "Bye"]
]

Then all you have to do is print it:

TableIt.printTable(table)

This is the output you get:

+--------------------------------------------+
| 4            | 3            | Hi           |
| 2            | 1            | 808890312093 |
| 5            | Hi           | Bye          |
+--------------------------------------------+

Field Names

You can use field names if you want to (if you aren’t using field names you don’t have to say useFieldNames=False because it is set to that by default):


TableIt.printTable(table, useFieldNames=True)

From that you will get:

+--------------------------------------------+
| 4            | 3            | Hi           |
+--------------+--------------+--------------+
| 2            | 1            | 808890312093 |
| 5            | Hi           | Bye          |
+--------------------------------------------+

There are other uses too; for example, you could do this:

import TableIt

myList = [
    ["Name", "Email"],
    ["Richard", "richard@fakeemail.com"],
    ["Tasha", "tash@fakeemail.com"]
]

TableIt.printTable(myList, useFieldNames=True)

From that you get:

+-----------------------------------------------+
| Name                  | Email                 |
+-----------------------+-----------------------+
| Richard               | richard@fakeemail.com |
| Tasha                 | tash@fakeemail.com    |
+-----------------------------------------------+

Or you could do:

import TableIt

myList = [
    ["", "a", "b"],
    ["x", "a + x", "a + b"],
    ["z", "a + z", "z + b"]
]

TableIt.printTable(myList, useFieldNames=True)

And from that you get:

+-----------------------+
|       | a     | b     |
+-------+-------+-------+
| x     | a + x | a + b |
| z     | a + z | z + b |
+-----------------------+

Colors

You can also use colors.

You use colors by using the color option (by default it is set to None) and specifying RGB values.

Using the example from above:

import TableIt

myList = [
    ["", "a", "b"],
    ["x", "a + x", "a + b"],
    ["z", "a + z", "z + b"]
]

TableIt.printTable(myList, useFieldNames=True, color=(26, 156, 171))

Then the table is printed in the color given by that RGB value.

Please note that printing colors might not work for you, but it works exactly the same way as other libraries that print colored text. I have tested it, and every single color works. The blue is not messed up either, as it would be when using the default 34m ANSI escape sequence (if you don’t know what that is, it doesn’t matter). This all comes from the fact that every color is an RGB value rather than a system default.
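TableIt's internals are not shown in this answer, so the following only assumes how RGB coloring of this kind typically works. Terminals with 24-bit color support accept the SGR sequence `ESC[38;2;R;G;Bm` for an RGB foreground color and `ESC[0m` to reset it. The classic `ESC[34m` instead asks for whatever "blue" the terminal's own palette defines. Here is a minimal sketch with a hypothetical `rgb` helper:

```python
def rgb(text, color):
    """Wrap text in a 24-bit ANSI foreground-color sequence (hypothetical helper)."""
    r, g, b = color
    # ESC[38;2;R;G;Bm sets an RGB foreground; ESC[0m resets all attributes.
    return "\x1b[38;2;%d;%d;%dm%s\x1b[0m" % (r, g, b, text)

# Same RGB triple as the TableIt example above.
print(rgb("| x     | a + x | a + b |", (26, 156, 171)))
```

Whether the color actually appears depends on the terminal's support for 24-bit color.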

More Info

For more info check the GitHub Page


Answer 8

Pure Python 3

def print_table(data, cols, wide):
    '''Prints formatted data on columns of given width.'''
    n, r = divmod(len(data), cols)   # n full rows, r items left over
    pat = '{{:{}}}'.format(wide)     # e.g. '{:12}' when wide=12
    line = '\n'.join(pat * cols for _ in range(n))
    last_line = pat * r
    print(line.format(*data))
    if r:                            # skip the empty line when the data fills every row
        print(last_line.format(*data[n*cols:]))

data = [str(i) for i in range(27)]
print_table(data, 6, 12)

This will print:

0           1           2           3           4           5           
6           7           8           9           10          11          
12          13          14          15          16          17          
18          19          20          21          22          23          
24          25          26

Answer 9

A simple way to do this is to loop over all columns, measure their width, create a row_format for that maximum width, and then print the rows. It’s not exactly what you are looking for, because in this case you first have to put your headings inside the table, but I think it might be useful to someone else.

table = [
    ["", "Man Utd", "Man City", "T Hotspur"],
    ["Man Utd", 1, 0, 0],
    ["Man City", 1, 1, 0],
    ["T Hotspur", 0, 1, 2],
]
def print_table(table):
    longest_cols = [
        (max([len(str(row[i])) for row in table]) + 3)
        for i in range(len(table[0]))
    ]
    row_format = "".join(["{:>" + str(longest_col) + "}" for longest_col in longest_cols])
    for row in table:
        print(row_format.format(*row))

You use it like this:

>>> print_table(table)

               Man Utd   Man City   T Hotspur
     Man Utd         1          0           0
    Man City         1          1           0
   T Hotspur         0          1           2

Answer 10

The following function will create the requested table (with or without numpy) with Python 3 (and probably Python 2). I have chosen to set the width of each column to match that of the longest team name. You could modify it to use the length of each team name for its own column, but it will be more complicated.

Note: For a direct equivalent in Python 2 you could replace the zip with izip from itertools.

def print_results_table(data, teams_list):
    str_l = max(len(t) for t in teams_list)  # every column is as wide as the longest name
    print(" ".join(['{:>{length}s}'.format(t, length = str_l) for t in [" "] + teams_list]))
    for t, row in zip(teams_list, data):
        # list(row) lets this accept NumPy rows as well as plain lists
        print(" ".join(['{:>{length}s}'.format(str(x), length = str_l) for x in [t] + list(row)]))

teams_list = ["Man Utd", "Man City", "T Hotspur"]
data = [[1, 2, 1],
        [0, 1, 0],
        [2, 4, 2]]

print_results_table(data, teams_list)

This will produce the following table:

            Man Utd  Man City T Hotspur
  Man Utd         1         2         1
 Man City         0         1         0
T Hotspur         2         4         2

If you want to have vertical line separators, you can replace " ".join with " | ".join.
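For the per-column variant mentioned above, here is a minimal sketch. The name `print_results_table_fitted` is hypothetical. It sizes each column to its own team name, or to its widest value if that is longer. It also returns the lines so that they can be reused, and it still accepts NumPy rows because each row is converted with `list()`.

```python
def print_results_table_fitted(data, teams_list, sep=" "):
    """Print the table with each column only as wide as it needs to be; return the lines."""
    label_w = max(len(t) for t in teams_list)  # width of the left-hand labels
    # Column j is as wide as its header or its widest value, whichever is longer.
    col_w = [max([len(team)] + [len(str(row[j])) for row in data])
             for j, team in enumerate(teams_list)]
    lines = [sep.join([" " * label_w] +
                      [t.rjust(w) for t, w in zip(teams_list, col_w)])]
    for team, row in zip(teams_list, data):
        lines.append(sep.join([team.rjust(label_w)] +
                              [str(x).rjust(w) for x, w in zip(list(row), col_w)]))
    print("\n".join(lines))
    return lines

teams_list = ["Man Utd", "Man City", "T Hotspur"]
data = [[1, 2, 1], [0, 1, 0], [2, 4, 2]]
print_results_table_fitted(data, teams_list)
```

Passing `sep=" | "` gives the vertical separators in the same way.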



Answer 11

I would try to loop through the list and use a CSV formatter to represent the data you want.

You can specify tabs, commas, or any other char as the delimiter.

Otherwise, just loop through the list and print "\t" after each element.

http://docs.python.org/library/csv.html
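Here is a minimal sketch of the CSV route, assuming tab-separated output is acceptable. `csv.writer` accepts any file-like object, so you could pass `sys.stdout` directly. Here the output goes to an `io.StringIO` so the text can also be reused. The `delimiter` argument sets the separator. `table_as_tsv` is a hypothetical name. Note that tabs only line up as far as the terminal's tab stops allow, so very long labels can still push columns out of alignment.

```python
import csv
import io

def table_as_tsv(teams_list, data, delimiter="\t"):
    """Return a header row plus one row per team, fields joined by `delimiter`."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerow([""] + list(teams_list))      # blank corner cell, then headers
    for team, row in zip(teams_list, data):
        writer.writerow([team] + list(row))       # integers are written via str()
    return buf.getvalue()

teams_list = ["Man Utd", "Man City", "T Hotspur"]
data = [[1, 2, 1], [0, 1, 0], [2, 4, 2]]
print(table_as_tsv(teams_list, data), end="")
```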


Answer 12

I found this just looking for a way to output simple columns. If you just need no-fuss columns, then you can use this:

print("Titlex\tTitley\tTitlez")
for x, y, z in data:
    print(x, "\t", y, "\t", z)

EDIT: I was trying to be as simple as possible, and thereby did some things manually instead of using the teams list. To generalize to the OP’s actual question:

#Column headers
print("", end="\t")
for team in teams_list:
    print(" ", team, end="")
print()
# rows
for team, row in enumerate(data):
    teamlabel = teams_list[team]
    while len(teamlabel) < 9:
        teamlabel = " " + teamlabel
    print(teamlabel, end="\t")
    for entry in row:
        print(entry, end="\t")
    print()

Outputs:

          Man Utd  Man City  T Hotspur
  Man Utd       1       2       1   
 Man City       0       1       0   
T Hotspur       2       4       2   

But this no longer seems any more simple than the other answers, with perhaps the benefit that it doesn’t require any more imports. But @campkeith’s answer already met that and is more robust as it can handle a wider variety of label lengths.


Remove all the elements that occur in one list from another

Question: Remove all the elements that occur in one list from another

Let’s say I have two lists, l1 and l2. I want to perform l1 - l2, which returns all elements of l1 not in l2.

I can think of a naive loop approach to doing this, but that is going to be really inefficient. What is a pythonic and efficient way of doing this?

As an example, if I have l1 = [1,2,6,8] and l2 = [2,3,5,8], l1 - l2 should return [1,6]


Answer 0

Python has a language feature called List Comprehensions that is perfectly suited to making this sort of thing extremely easy. The following statement does exactly what you want and stores the result in l3:

l3 = [x for x in l1 if x not in l2]

l3 will contain [1, 6].


Answer 1

One way is to use sets:

>>> set([1,2,6,8]) - set([2,3,5,8])
set([1, 6])

Answer 2

As an alternative, you may also use filter with the lambda expression to get the desired result. For example:

>>> l1 = [1,2,6,8]
>>> l2 = set([2,3,5,8])

#     v  `filter` returns an iterator object. Here I'm converting
#     v  it to a `list` in order to display the resulting value
>>> list(filter(lambda x: x not in l2, l1))
[1, 6]

Performance Comparison

Here I am comparing the performance of all the answers mentioned here. As expected, Arkku’s set-based operation is fastest.

  • Arkku’s Set Difference – First (0.124 usec per loop)

    mquadri$ python -m timeit -s "l1 = set([1,2,6,8]); l2 = set([2,3,5,8]);" "l1 - l2"
    10000000 loops, best of 3: 0.124 usec per loop
    
  • Daniel Pryden’s List Comprehension with set lookup – Second (0.302 usec per loop)

    mquadri$ python -m timeit -s "l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "[x for x in l1 if x not in l2]"
    1000000 loops, best of 3: 0.302 usec per loop
    
  • Donut’s List Comprehension on plain list – Third (0.552 usec per loop)

    mquadri$ python -m timeit -s "l1 = [1,2,6,8]; l2 = [2,3,5,8];" "[x for x in l1 if x not in l2]"
    1000000 loops, best of 3: 0.552 usec per loop
    
  • Moinuddin Quadri’s using filter – Fourth (0.972 usec per loop)

    mquadri$ python -m timeit -s "l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "filter(lambda x: x not in l2, l1)"
    1000000 loops, best of 3: 0.972 usec per loop
    
  • Akshay Hazari’s using combination of reduce + filter – Fifth (3.97 usec per loop)

    mquadri$ python -m timeit "l1 = [1,2,6,8]; l2 = [2,3,5,8];" "reduce(lambda x,y : filter(lambda z: z!=y,x) ,l1,l2)"
    100000 loops, best of 3: 3.97 usec per loop
    

PS: set does not maintain order, and it removes duplicate elements from the list. Hence, do not use set difference if you need to preserve either.
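Here is a small sketch that makes the caveat concrete, using input with a duplicate and out-of-order values. Set difference collapses the duplicate, and its result has no guaranteed order. The list comprehension with a set lookup keeps both the order and the duplicate.

```python
l1 = [8, 1, 6, 1, 2]
l2 = [2, 3, 5, 8]

excluded = set(l2)
kept = [x for x in l1 if x not in excluded]  # keeps l1's order and duplicates
as_set = set(l1) - set(l2)                   # unordered, duplicates collapsed

print(kept)            # [1, 6, 1]
print(sorted(as_set))  # [1, 6]
```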


Answer 3

Expanding on Donut’s answer and the other answers here, you can get even better results by using a generator comprehension instead of a list comprehension, and by using a set data structure (since the in operator is O(n) on a list but O(1) on average on a set).

So here’s a function that would work for you:

def filter_list(full_list, excludes):
    s = set(excludes)
    return (x for x in full_list if x not in s)

The result will be an iterable that will lazily fetch the filtered list. If you need a real list object (e.g. if you need to do a len() on the result), then you can easily build a list like so:

filtered_list = list(filter_list(full_list, excludes))

Answer 4

Use the Python set type. That would be the most Pythonic. :)

Also, since it’s native, it should be the most optimized method too.

See:

http://docs.python.org/library/stdtypes.html#set

http://docs.python.org/library/sets.htm (for older python)

# Using Python 2.7 set literal format.
# Otherwise, use: l1 = set([1,2,6,8])
#
l1 = {1,2,6,8}
l2 = {2,3,5,8}
l3 = l1 - l2

Answer 5

Use a set comprehension ({x for x in l2}) or set(l2) to get a set, then use a list comprehension to get the list:

l2set = set(l2)
l3 = [x for x in l1 if x not in l2set]

Benchmark code:

import time

l1 = list(range(1000*10 * 3))
l2 = list(range(1000*10 * 2))

l2set = {x for x in l2}

tic = time.time()
l3 = [x for x in l1 if x not in l2set]
toc = time.time()
diffset = toc-tic
print(diffset)

tic = time.time()
l3 = [x for x in l1 if x not in l2]
toc = time.time()
difflist = toc-tic
print(difflist)

print("speedup %fx"%(difflist/diffset))

Benchmark results:

0.0015058517456054688
3.968189239501953
speedup 2635.179227x    
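A single `time.time()` measurement can be noisy. As a hedged alternative, the standard `timeit` module can repeat each statement, and the minimum of the repeats is usually the most stable figure. The sizes below are deliberately smaller than in the benchmark above so that the slow list lookup finishes quickly. As a result, the printed speedup will differ from the number reported there.

```python
import timeit

l1 = list(range(1000 * 3))
l2 = list(range(1000 * 2))
l2set = set(l2)

def with_set():
    return [x for x in l1 if x not in l2set]

def with_list():
    return [x for x in l1 if x not in l2]

# Sanity check: both approaches must produce the same list before timing them.
assert with_set() == with_list()

# timeit.repeat accepts a callable; each of the 3 repeats runs it `number` times.
t_set = min(timeit.repeat(with_set, number=1, repeat=3))
t_list = min(timeit.repeat(with_list, number=1, repeat=3))
print("set: %.6fs  list: %.6fs  speedup ~%.0fx" % (t_set, t_list, t_list / t_set))
```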

Answer 6

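Alternate solution (on Python 3, reduce must be imported from functools, and filter returns a lazy iterator, so wrap the result in list()):

from functools import reduce
list(reduce(lambda x,y : filter(lambda z: z!=y,x) ,[2,3,5,8],[1,2,6,8]))
# [1, 6]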

How to find all occurrences of a substring?

Question: How to find all occurrences of a substring?

Python has string.find() and string.rfind() to get the index of a substring in a string.

I’m wondering whether there is something like string.find_all() which can return all found indexes (not only the first from the beginning or the first from the end).

For example:

string = "test test test test"

print string.find('test') # 0
print string.rfind('test') # 15

#this is the goal
print string.find_all('test') # [0,5,10,15]

Answer 0

There is no simple built-in string function that does what you’re looking for, but you could use the more powerful regular expressions:

import re
[m.start() for m in re.finditer('test', 'test test test test')]
#[0, 5, 10, 15]

If you want to find overlapping matches, lookahead will do that:

[m.start() for m in re.finditer('(?=tt)', 'ttt')]
#[0, 1]

If you want a reverse find-all without overlaps, you can combine positive and negative lookahead into an expression like this:

search = 'tt'
[m.start() for m in re.finditer('(?=%s)(?!.{1,%d}%s)' % (search, len(search)-1, search), 'ttt')]
#[1]

re.finditer returns an iterator, so you could change the [] in the above to () to get a generator expression instead of a list, which will be more efficient if you’re only iterating through the results once.
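The patterns above insert the search string into the regex unescaped, so a substring containing characters such as `.` or `+` would be interpreted as regex syntax. A minimal sketch of a helper that escapes it first with `re.escape`; the name `find_all_regex` and its `overlapping` flag are hypothetical, not part of the `re` module.

```python
import re


def find_all_regex(text, sub, overlapping=False):
    """Return the start index of every match of the literal string sub in text."""
    pattern = re.escape(sub)  # treat characters like '.' or '+' literally
    if overlapping:
        # A zero-width lookahead consumes nothing, so matches may overlap.
        pattern = "(?=" + pattern + ")"
    return [m.start() for m in re.finditer(pattern, text)]


print(find_all_regex("test test test test", "test"))  # [0, 5, 10, 15]
print(find_all_regex("ttt", "tt"))                    # [0]
print(find_all_regex("ttt", "tt", overlapping=True))  # [0, 1]
print(find_all_regex("a.b.c", "."))                   # [1, 3]
```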


Answer 1

>>> help(str.find)
Help on method_descriptor:

find(...)
    S.find(sub [,start [,end]]) -> int

Thus, we can build it ourselves:

def find_all(a_str, sub):
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1: return
        yield start
        start += len(sub) # use start += 1 to find overlapping matches

list(find_all('spam spam spam spam', 'spam')) # [0, 5, 10, 15]

No temporary strings or regexes required.
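One edge case worth guarding: with an empty `sub`, `a_str.find('', start)` keeps returning `start`, and `start += len(sub)` adds 0, so the generator above never ends. A sketch of a variant that rejects empty input and exposes the overlapping behaviour as a parameter (the `overlapping` argument is my addition). Because it is a generator, the `ValueError` is raised when iteration starts, not when the function is called.

```python
def find_all(a_str, sub, overlapping=False):
    """Yield every start index of sub in a_str (variant of the answer above)."""
    if not sub:
        # An empty sub would make the original loop run forever.
        raise ValueError("sub must be a non-empty string")
    step = 1 if overlapping else len(sub)
    start = a_str.find(sub)
    while start != -1:
        yield start
        start = a_str.find(sub, start + step)


print(list(find_all('spam spam spam spam', 'spam')))  # [0, 5, 10, 15]
print(list(find_all('aaaa', 'aa')))                    # [0, 2]
print(list(find_all('aaaa', 'aa', overlapping=True)))  # [0, 1, 2]
```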


Answer 2

Here’s a (very inefficient) way to get all (i.e. even overlapping) matches:

>>> string = "test test test test"
>>> [i for i in range(len(string)) if string.startswith('test', i)]
[0, 5, 10, 15]

Answer 3

Again, old thread, but here’s my solution using a generator and plain str.find.

def findall(p, s):
    '''Yields all the positions of
    the pattern p in the string s.'''
    i = s.find(p)
    while i != -1:
        yield i
        i = s.find(p, i+1)

Example

x = 'banananassantana'
[(i, x[i:i+2]) for i in findall('na', x)]

returns

[(2, 'na'), (4, 'na'), (6, 'na'), (14, 'na')]

Answer 4

You can use re.finditer() for non-overlapping matches.

>>> import re
>>> aString = 'this is a string where the substring "is" is repeated several times'
>>> [(a.start(), a.end()) for a in re.finditer('is', aString)]
[(2, 4), (5, 7), (38, 40), (42, 44)]

but it won’t find overlapping matches, for example:

>>> aString = "ababa"
>>> [(a.start(), a.end()) for a in re.finditer('aba', aString)]
[(0, 3)]

Answer 5

Come, let us recurse together.

def locations_of_substring(string, substring):
    """Return a list of locations of a substring."""

    substring_length = len(substring)    
    def recurse(locations_found, start):
        location = string.find(substring, start)
        if location != -1:
            return recurse(locations_found + [location], location+substring_length)
        else:
            return locations_found

    return recurse([], 0)

print(locations_of_substring('this is a test for finding this and this', 'this'))
# prints [0, 27, 36]

No need for regular expressions this way.


Answer 6

If you’re just looking for a single character, this would work:

from functools import reduce  # reduce is no longer a builtin in Python 3
string = "dooobiedoobiedoobie"
match = 'o'
reduce(lambda count, char: count + 1 if char == match else count, string, 0)
# produces 7

Also,

string = "test test test test"
match = "test"
len(string.split(match)) - 1
# produces 4

My hunch is that neither of these (especially #2) is terribly performant.
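If only the number of occurrences is needed (not their positions), the built-in str.count already does this, so neither reduce nor split is required. A short sketch using only built-in str methods; note that both str.count and the split trick count non-overlapping matches.

```python
# str.count returns the number of non-overlapping occurrences directly.
print("dooobiedoobiedoobie".count('o'))     # 7
print("test test test test".count("test"))  # 4

# The split trick agrees, because split also consumes non-overlapping matches.
print(len("test test test test".split("test")) - 1)  # 4

# Neither counts overlaps: "aaaa" contains "aa" starting at 0, 1 and 2.
print("aaaa".count("aa"))           # 2
print(len("aaaa".split("aa")) - 1)  # 2
```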


Answer 7

This is an old thread, but I got interested and wanted to share my solution.

def find_all(a_string, sub):
    result = []
    k = 0
    while k < len(a_string):
        k = a_string.find(sub, k)
        if k == -1:
            return result
        else:
            result.append(k)
            k += 1 #change to k += len(sub) to not search overlapping results
    return result

It should return a list of positions where the substring was found. Please comment if you see an error or room for improvement.


Answer 8

This does the trick for me using re.finditer:

import re

text = 'This is sample text to test if this pythonic '\
       'program can serve as an indexing platform for '\
       'finding words in a paragraph. It can give '\
       'values as to where the word is located with the '\
       'different examples as stated'

#  find all occurrences of the string 'as' in the above text

find_the_word = re.finditer('as', text)

for match in find_the_word:
    print('start {}, end {}, search string \'{}\''.
          format(match.start(), match.end(), match.group()))

Answer 9

This thread is a little old, but this worked for me:

numberString = "onetwothreefourfivesixseveneightninefiveten"
testString = "five"

marker = 0
while marker < len(numberString):
    try:
        # index() raises ValueError once no further occurrence exists
        found = numberString.index(testString, marker)
        print(found)
        marker = found + 1
    except ValueError:
        print("No more occurrences found")
        marker = len(numberString)

Answer 10

You can try:

>>> string = "test test test test"
>>> for index, value in enumerate(string):
...     if string[index:index+len("test")] == "test":
...         print(index)
...
0
5
10
15

Answer 11

The solutions provided by others all rely on the built-in find() method or other existing methods.

What is the core basic algorithm to find all the occurrences of a substring in a string?

def find_all(string,substring):
    """
    Function: Returning all the index of substring in a string
    Arguments: String and the search string
    Return:Returning a list
    """
    length = len(substring)
    c=0
    indexes = []
    while c < len(string):
        if string[c:c+length] == substring:
            indexes.append(c)
        c=c+1
    return indexes

You can also subclass str and use this function as a method:

class newstr(str):
    def find_all(self, substring):
        """
        Function: Returning all the indexes of substring in this string
        Arguments: The search string
        Return: Returning a list
        """
        length = len(substring)
        c = 0
        indexes = []
        while c < len(self):
            if self[c:c+length] == substring:
                indexes.append(c)
            c = c + 1
        return indexes

Calling the method:

newstr('Do you find this answer helpful? then upvote this!').find_all('this')
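The loop above compares a slice at every position, which costs O(len(string) × len(substring)) in the worst case. One classic algorithm that avoids re-scanning characters is Knuth-Morris-Pratt (KMP). The following is a minimal sketch of it, not part of the original answer; the function name `kmp_find_all` is hypothetical. Like the loop above, it reports overlapping matches and treats an empty pattern as matching at every position.

```python
def kmp_find_all(text, pattern):
    """Return all (possibly overlapping) start indices of pattern in text."""
    if not pattern:
        return list(range(len(text) + 1))  # empty pattern matches everywhere

    # failure[i]: length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it.
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k

    matches = []
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]  # fall back instead of restarting
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = failure[k - 1]  # keep going to find overlapping matches
    return matches


print(kmp_find_all('test test test test', 'test'))  # [0, 5, 10, 15]
print(kmp_find_all('ababa', 'aba'))                 # [0, 2]
```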


Answer 12

This function does not look at every position inside the string, so it does not waste compute resources. My try:

def findAll(string,word):
    all_positions=[]
    next_pos=-1
    while True:
        next_pos=string.find(word,next_pos+1)
        if(next_pos<0):
            break
        all_positions.append(next_pos)
    return all_positions

To use it, call it like this:

result=findAll('this word is a big word man how many words are there?','word')

Answer 13

When looking for a large number of keywords in a document, use flashtext:

from flashtext import KeywordProcessor
words = ['test', 'exam', 'quiz']
txt = 'this is a test'
kwp = KeywordProcessor()
kwp.add_keywords_from_list(words)
result = kwp.extract_keywords(txt, span_info=True)

Flashtext runs faster than regex on large lists of search words.


Answer 14

src = input() # we will find substring in this string
sub = input() # substring

res = []
pos = src.find(sub)
while pos != -1:
    res.append(pos)
    pos = src.find(sub, pos + 1)

Answer 15

This is a solution to a similar question from HackerRank. I hope it helps.

import re
a = input()
b = input()
if b not in a:
    print((-1,-1))
else:
    # start indices of all (possibly overlapping) matches; re.escape treats b literally
    start_indc = [m.start() for m in re.finditer('(?=' + re.escape(b) + ')', a)]
    for i in range(len(start_indc)):
        print((start_indc[i], start_indc[i]+len(b)-1))

Sample input and output:

aaadaa
aa
(0, 1)
(1, 2)
(4, 5)

Answer 16

By slicing, we generate all possible substrings, append them to a list, and count how many times the target occurs using the list’s count method:

s=input()
n=len(s)
l=[]
f=input()
for i in range(0,n):
    for j in range(i+1,n+1):
        l.append(s[i:j])
if f in l:
    print(l.count(f))

Answer 17

Please look at the code below:

#!/usr/bin/env python
# coding:utf-8
'''黄哥Python'''


def get_substring_indices(text, s):
    result = [i for i in range(len(text)) if text.startswith(s, i)]
    return result


if __name__ == '__main__':
    text = "How much wood would a wood chuck chuck if a wood chuck could chuck wood?"
    s = 'wood'
    print(get_substring_indices(text, s))

Answer 18

The pythonic way would be:

mystring = 'Hello World, this should work!'
find_all = lambda c,s: [x for x in range(c.find(s), len(c)) if c[x] == s]

# s represents the search character (this only works for single characters)
# c represents the string being searched

find_all(mystring,'o')    # will return all positions of 'o'
# [4, 7, 20, 26]

Answer 19

You can simply use:

string.count('test')

Note that this returns the number of non-overlapping occurrences, not their indexes.

https://www.programiz.com/python-programming/methods/string/count

Cheers!


Print a list in reverse order with range()?

Question: Print a list in reverse order with range()?

How can you produce the following list with range() in Python?

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

Answer 0

Use the reversed() function:

reversed(range(10))

It’s much more meaningful.

Update:

If you want it to be a list (as btk pointed out):

list(reversed(range(10)))

Update:

If you want to use only range to achieve the same result, you can use all its parameters. range(start, stop, step)

For example, to generate the list [5, 4, 3, 2, 1, 0], you can use the following (on Python 3, wrap it in list() to get an actual list):

range(5, -1, -1)

It may be less intuitive, but as the comments mention, this is more efficient and the right way to use range for a reversed list.
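On Python 3, range() returns a lazy range object rather than a list, so several answers here show Python 2 output. A small sketch comparing the equivalent spellings on Python 3 (the variable names are only illustrative):

```python
# Three ways to get 9, 8, ..., 0; list() is what materializes an actual list.
a = list(reversed(range(10)))
b = list(range(9, -1, -1))
c = list(range(10)[::-1])  # slicing a range returns another range object

print(a)            # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
print(a == b == c)  # True

# Without list(), the slice stays a lazy range:
print(range(10)[::-1])  # range(9, -1, -1)
```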


Answer 1

Use the ‘range’ built-in function. The signature is range(start, stop, step). This produces a sequence that yields numbers, starting with start, and ending if stop has been reached, excluding stop.

>>> range(9,-1,-1)   
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
>>> range(-2, 6, 2)
    [-2, 0, 2, 4]

In Python 3, this produces a non-list range object, which functions effectively like a read-only list (but uses way less memory, particularly for large ranges).


Answer 2

You could use range(10)[::-1], which is the same thing as range(9, -1, -1) and arguably more readable (if you’re familiar with the common sequence[::-1] Python idiom).


Answer 3

For those who are interested in the “efficiency” of the options collected so far…

Jaime RGP’s answer led me to restart my computer after timing Jason’s somewhat “challenging” solution, literally following my own suggestion (via a comment). To spare the curious among you the downtime, I present my results here (worst first):

Jason’s answer (maybe just an excursion into the power of list comprehension):

$ python -m timeit "[9-i for i in range(10)]"
1000000 loops, best of 3: 1.54 usec per loop

martineau’s answer (readable if you are familiar with the extended slices syntax):

$ python -m timeit "range(10)[::-1]"
1000000 loops, best of 3: 0.743 usec per loop

Michał Šrajer’s answer (the accepted one, very readable):

$ python -m timeit "reversed(range(10))"
1000000 loops, best of 3: 0.538 usec per loop

bene’s answer (the very first, but very sketchy at that time):

$ python -m timeit "range(9,-1,-1)"
1000000 loops, best of 3: 0.401 usec per loop

The last option is easy to remember using the range(n-1,-1,-1) notation by Val Neekman.


Answer 4

for i in range(8, 0, -1):
    print(i)

will solve this problem. It will output 8 down to 1; the third argument, -1, is the step, which makes the range count down.


Answer 5

There is no need to use reversed(), because range can produce a reversed sequence directly.

When you iterate over n items and want to reverse the order of the sequence returned by range(start, stop, step), use the third parameter, step, and set it to -1; the other parameters must be adjusted accordingly:

  1. Pass -1 as the stop parameter (one less than the original start, 0, because stop is excluded).
  2. Pass n-1 as the start parameter.

So the equivalent of range(n) in reverse order would be:

n = 10
print(list(range(n-1, -1, -1)))
#[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

Answer 6

Readability aside, reversed(range(n)) seems to be faster than range(n)[::-1].

$ python -m timeit "reversed(range(1000000000))"
1000000 loops, best of 3: 0.598 usec per loop
$ python -m timeit "range(1000000000)[::-1]"
1000000 loops, best of 3: 0.945 usec per loop

Just if anyone was wondering :)


Answer 7

The requirement in this question calls for a list of integers of size 10 in descending order. So, let’s produce a list in python.

# This meets the requirement.
# But it is a bit harder to wrap one's head around this. right?
>>> range(10-1, -1, -1)
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

# let's find something that is a bit more self-explanatory. Sounds good?
# ----------------------------------------------------

# This returns a list in ascending order.
# Opposite of what the requirement called for.
>>> range(10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# This returns an iterator in descending order.
# Doesn't meet the requirement as it is not a list.
>>> reversed(range(10))
<listreverseiterator object at 0x10e14e090>

# This returns a list in descending order and meets the requirement
>>> list(reversed(range(10)))
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

Answer 8

You can print numbers in reverse with the range() built-in function (BIF), like this:

for number in range(10, 0, -1):
    print(number)

The output will be the numbers 10 down to 1, one per line.

range(start, end, step): start is inclusive, end is exclusive, and step can be any non-zero integer; a negative step counts down.


Answer 9

A frequently asked question is whether range(9, -1, -1) is better than reversed(range(10)) in Python 3. People who have worked with iterators in other languages tend to assume that reversed() must cache all the values and then return them in reverse order. The thing is, Python’s reversed() doesn’t work if the object is just an iterator. The object must have one of the following two properties for reversed() to work:

  1. Either support len() and integer indexes via []
  2. Or have a __reversed__() method implemented.

If you try to use reversed() on an object that has neither, you will get:

>>> [reversed((x for x in range(10)))]
TypeError: 'generator' object is not reversible

In short, Python’s reversed() only works on sequence-like objects, so it should have the same performance as forward iteration.

But what about range()? Isn’t that a generator? In Python 3 it is not a generator but a lazy sequence object that implements both of the above. So range(100000) doesn’t take up a lot of memory, but it still supports efficient indexing and reversing.

So in summary, you can use reversed(range(10)) without any hit on performance.
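A small sketch of the two protocols described above. The class `Indexable` is a hypothetical example that only implements `__len__` and `__getitem__`; `reversed()` falls back to indexing from `len - 1` down to `0` for it, while `range` provides `__reversed__` itself and a generator provides neither.

```python
class Indexable:
    """Hypothetical minimal sequence: only __len__ and __getitem__."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        if not 0 <= i < self.n:
            raise IndexError(i)
        return i * 10  # item stored at index i


# reversed() uses len() plus indexing from len - 1 down to 0.
print(list(reversed(Indexable(4))))  # [30, 20, 10, 0]

# range implements __reversed__ directly; no list is built by reversed().
print(hasattr(range(10), "__reversed__"))  # True
print(list(reversed(range(10)))[:3])       # [9, 8, 7]

# A generator supports neither protocol, so reversed() raises TypeError.
try:
    reversed(x for x in range(10))
except TypeError as exc:
    print(exc)
```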


Answer 10

I believe this can help:

range(5)[::-1]

Usage:

for i in range(5)[::-1]:
    print(i)

Answer 11

range(9,-1,-1)
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

Answer 12

Without using [::-1] or reversed():

def reverse(text):
    result = []
    for index in range(len(text)-1,-1,-1):
        c = text[index]
        result.append(c)
    return ''.join(result)

print(reverse("python!"))

Answer 13

[9-i for i in range(10)]
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

Answer 14

You don’t necessarily need the range function; you can simply use list[::-1], which quickly returns a reversed copy of the list without anything else.


Answer 15

Suppose you have a list a = [1, 2, 3, 4, 5]. If you want to print the list in reverse, simply use the following code:

a.reverse()  # reverses the list in place
for i in a:
    print(i)

I know you asked about using range, but that has already been answered.


Answer 16

range(9,-1,-1)
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

Is the correct form. If you use

reversed(range(10))

you won’t get the 0 case. For instance, say 10 isn’t a magic number but a variable holding the last index you want to look up, starting from the end. If that variable is 0, reversed(range(0)) will not execute at all, which is wrong if you happen to have a single object at index 0.


Answer 17

I think that many people (like myself) may be more interested in the common case of traversing an existing list in reverse order, as stated in the title, rather than just generating indices for such a traversal.

Even though all the right answers are still perfectly fine for this case, I want to point out that the performance comparison in Wolf’s answer covers generating indices only. So I’ve made a similar benchmark for traversing an existing list in reverse order.

TL;DR a[::-1] is the fastest.

Prerequisites:

a = list(range(10))

Jason’s answer:

%timeit [a[9-i] for i in range(10)]
1.27 µs ± 61.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

martineau’s answer:

%timeit a[::-1]
135 ns ± 4.07 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

Michał Šrajer’s answer:

%timeit list(reversed(a))
374 ns ± 9.87 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

bene’s answer:

%timeit [a[i] for i in range(9, -1, -1)]
1.09 µs ± 11.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

As you can see, in this case there’s no need to explicitly generate indices, so the fastest method is the one that performs the fewest extra operations.

NB: I tested in JupyterLab, which has the handy “magic command” %timeit. It uses the standard timeit.timeit under the hood. Tested with Python 3.7.3.
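Outside Jupyter, a comparable measurement can be sketched with the standard timeit module. This is only an approximation of the %timeit runs above: it takes a single timing per statement with a fixed `number` of loops (both my assumptions), so the absolute figures will differ by machine and Python version. Each statement is first checked to produce the same reversed list.

```python
import timeit

a = list(range(10))
statements = [
    "[a[9-i] for i in range(10)]",
    "a[::-1]",
    "list(reversed(a))",
    "[a[i] for i in range(9, -1, -1)]",
]
expected = list(range(9, -1, -1))
number = 100_000  # loops per statement

for stmt in statements:
    assert eval(stmt) == expected, stmt  # sanity check: same result
    total = timeit.timeit(stmt, globals={"a": a}, number=number)
    print(f"{stmt:35} {total / number * 1e9:8.1f} ns per loop")
```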


Can a website detect when you are using Selenium with chromedriver?

Question: Can a website detect when you are using Selenium with chromedriver?

I’ve been testing out Selenium with Chromedriver and I noticed that some pages can detect that you’re using Selenium even though there’s no automation at all. Even when I’m just browsing manually just using chrome through Selenium and Xephyr I often get a page saying that suspicious activity was detected. I’ve checked my user agent, and my browser fingerprint, and they are all exactly identical to the normal chrome browser.

When I browse to these sites in normal chrome everything works fine, but the moment I use Selenium I’m detected.

In theory chromedriver and chrome should look literally exactly the same to any webserver, but somehow they can detect it.

If you want some test code, try this:

from pyvirtualdisplay import Display
from selenium import webdriver

display = Display(visible=1, size=(1600, 902))
display.start()
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--disable-extensions')
chrome_options.add_argument('--profile-directory=Default')
chrome_options.add_argument("--incognito")
chrome_options.add_argument("--disable-plugins-discovery")
chrome_options.add_argument("--start-maximized")
driver = webdriver.Chrome(options=chrome_options)
driver.delete_all_cookies()
driver.set_window_size(800, 800)
driver.set_window_position(0, 0)
print('arguments done')
driver.get('http://stubhub.com')

If you browse around stubhub you’ll get redirected and ‘blocked’ within one or two requests. I’ve been investigating this and I can’t figure out how they can tell that a user is using Selenium.

How do they do it?

EDIT UPDATE:

I installed the Selenium IDE plugin in Firefox and I got banned when I went to stubhub.com in the normal firefox browser with only the additional plugin.

EDIT:

When I use Fiddler to view the HTTP requests being sent back and forth, I’ve noticed that the ‘fake browser’s’ requests often have ‘no-cache’ in the response header.

EDIT:

Results like this one (“Is there a way to detect that I’m in a Selenium Webdriver page from JavaScript?”) suggest that there should be no way to detect when you are using a webdriver. But this evidence suggests otherwise.

EDIT:

The site uploads a fingerprint to their servers, but I checked and the fingerprint of selenium is identical to the fingerprint when using chrome.

EDIT:

This is one of the fingerprint payloads that they send to their servers

{"appName":"Netscape","platform":"Linuxx86_64","cookies":1,"syslang":"en-US","userlang":"en-US","cpu":"","productSub":"20030107","setTimeout":1,"setInterval":1,"plugins":{"0":"ChromePDFViewer","1":"ShockwaveFlash","2":"WidevineContentDecryptionModule","3":"NativeClient","4":"ChromePDFViewer"},"mimeTypes":{"0":"application/pdf","1":"ShockwaveFlashapplication/x-shockwave-flash","2":"FutureSplashPlayerapplication/futuresplash","3":"WidevineContentDecryptionModuleapplication/x-ppapi-widevine-cdm","4":"NativeClientExecutableapplication/x-nacl","5":"PortableNativeClientExecutableapplication/x-pnacl","6":"PortableDocumentFormatapplication/x-google-chrome-pdf"},"screen":{"width":1600,"height":900,"colorDepth":24},"fonts":{"0":"monospace","1":"DejaVuSerif","2":"Georgia","3":"DejaVuSans","4":"TrebuchetMS","5":"Verdana","6":"AndaleMono","7":"DejaVuSansMono","8":"LiberationMono","9":"NimbusMonoL","10":"CourierNew","11":"Courier"}}

It’s identical in Selenium and in Chrome.
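If you want to check a claim like this mechanically rather than by eye, you can diff the two payloads key by key. A minimal Python sketch, assuming each payload has been saved as a JSON string; diff_fingerprints is a hypothetical helper, and the two payloads below are shortened, made-up examples rather than captured data:

```python
import json

MISSING = "<missing>"  # placeholder for a key present in only one payload

def diff_fingerprints(a_json: str, b_json: str) -> dict:
    """Return {key: (value_in_a, value_in_b)} for every top-level key that differs."""
    a = json.loads(a_json)
    b = json.loads(b_json)
    diffs = {}
    for key in sorted(set(a) | set(b)):
        va, vb = a.get(key, MISSING), b.get(key, MISSING)
        if va != vb:
            diffs[key] = (va, vb)
    return diffs

# Hypothetical payloads from a plain Chrome session and a Selenium-driven session
plain = '{"appName": "Netscape", "cookies": 1, "screen": {"width": 1600, "height": 900}}'
driven = '{"appName": "Netscape", "cookies": 1, "screen": {"width": 1600, "height": 900}}'
print(diff_fingerprints(plain, driven))  # {} means the payloads match
```

An empty result only says the uploaded payload matches; as the rest of the thread shows, detection can rely on signals that never appear in this payload.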

EDIT:

VPNs work for a single use but get detected after I load the first page. Clearly some javascript is being run to detect Selenium.


Answer 0

For Mac Users

Replacing cdc_ variable using Vim or Perl

You can use vim, or, as @Vic Seedoubleyew pointed out in the answer by @Erti-Chris Eelmaa, perl, to replace the cdc_ variable in chromedriver (see the post by @Erti-Chris Eelmaa to learn more about that variable). Using vim or perl saves you from having to recompile the source code or use a hex editor. Make sure to make a copy of the original chromedriver before attempting to edit it. Also, the methods below were tested on chromedriver version 2.41.578706.


Using Vim

vim /path/to/chromedriver

After running the line above, you’ll probably see a bunch of gibberish. Do the following:

  1. Search for cdc_ by typing /cdc_ and pressing return.
  2. Enable editing by pressing a.
  3. Delete any amount of $cdc_lasutopfhvcZLmcfl and replace what was deleted with an equal number of characters. If you don’t, chromedriver will fail.
  4. After you’re done editing, press esc.
  5. To save the changes and quit, type :wq! and press return.
  6. If you don’t want to save the changes, but you want to quit, type :q! and press return.
  7. You’re done.

Go to the altered chromedriver and double click on it. A terminal window should open up. If you don’t see killed in the output, you successfully altered the driver.


Using Perl

The line below replaces cdc_ with dog_:

perl -pi -e 's/cdc_/dog_/g' /path/to/chromedriver

Make sure that the replacement string has the same number of characters as the search string, otherwise the chromedriver will fail.
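For anyone without Perl at hand, here is a rough Python equivalent of the one-liner. It is a sketch, not part of the original answer: patch_binary is a hypothetical helper that refuses length-changing replacements and writes a .orig backup before touching the file:

```python
import shutil
from pathlib import Path

def patch_binary(path: str, old: bytes, new: bytes, backup: bool = True) -> int:
    """Replace every occurrence of `old` with `new` in the file at `path`, in place.

    Returns the number of replacements made.
    """
    # Same-length rule from the answer: a different length would shift every
    # byte that follows the match inside the executable.
    if len(old) != len(new):
        raise ValueError(f"replacement must be {len(old)} bytes, got {len(new)}")
    p = Path(path)
    if backup:
        # Keep an untouched copy next to the original, e.g. chromedriver.orig
        shutil.copy2(p, p.with_name(p.name + ".orig"))
    data = p.read_bytes()
    count = data.count(old)
    p.write_bytes(data.replace(old, new))
    return count

# Hypothetical usage, equivalent to: perl -pi -e 's/cdc_/dog_/g' /path/to/chromedriver
# patch_binary("/path/to/chromedriver", b"cdc_", b"dog_")
```

Writing back into the existing file keeps its permission bits, so the patched driver stays executable.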

Perl Explanation

s///g denotes that you want to search for a string and replace it globally with another string (replaces all occurrences).

e.g., s/string/replacement/g

So,

s/// denotes searching for and replacing a string.

cdc_ is the search string.

dog_ is the replacement string.

g is the global flag, which replaces every occurrence of the string.

How to check if the Perl replacement worked

The following line will print every occurrence of the search string cdc_:

perl -ne 'while(/cdc_/g){print "$&\n";}' /path/to/chromedriver

If this returns nothing, then cdc_ has been replaced.

Conversely, you can use this:

perl -ne 'while(/dog_/g){print "$&\n";}' /path/to/chromedriver

to see if your replacement string, dog_, is now in the chromedriver binary. If it is, the replacement string will be printed to the console.
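Both checks can also be done in one pass from Python. A minimal sketch; verify_patch is a hypothetical helper that counts occurrences instead of printing each match, which carries the same information as the two Perl commands:

```python
from pathlib import Path

def verify_patch(path: str, old: bytes, new: bytes) -> tuple:
    """Return (occurrences of old, occurrences of new) in the binary at `path`."""
    data = Path(path).read_bytes()
    return data.count(old), data.count(new)

# Hypothetical usage against the patched driver:
# remaining, inserted = verify_patch("/path/to/chromedriver", b"cdc_", b"dog_")
# The patch looks complete when remaining == 0 and inserted > 0.
```

Keep in mind that the replacement string could already occur elsewhere in the binary, so a non-zero second count is a hint, not proof, that your edit landed.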

Go to the altered chromedriver and double click on it. A terminal window should open up. If you don’t see killed in the output, you successfully altered the driver.


Wrapping Up

After altering the chromedriver binary, make sure that the name of the altered chromedriver binary is chromedriver, and that the original binary is either moved from its original location or renamed.


My Experience With This Method

I was previously being detected on a website while trying to log in, but after replacing cdc_ with an equal sized string, I was able to log in. Like others have said though, if you’ve already been detected, you might get blocked for a plethora of other reasons even after using this method. So you may have to try accessing the site that was detecting you using a VPN, different network, or what have you.


Answer 1

Basically, Selenium detection works by testing for predefined JavaScript variables that appear when running with Selenium. Bot-detection scripts usually look for anything containing the word “selenium” / “webdriver” in any of the variables (on the window object), and also for document variables called $cdc_ and $wdc_. Of course, all of this depends on which browser you are using; different browsers expose different things.
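To make the name-based heuristics concrete, here is a Python sketch that applies them to plain lists of property names. It is only an illustration under assumptions: the lists stand in for what a detection script would get by enumerating window and document in the browser, and looks_automated and mentions_driver are hypothetical helpers, not any vendor's code:

```python
import re

# Same pattern as the pseudo-code in this answer: "$", one lowercase letter, then "dc_"
DRIVER_CACHE_KEY = re.compile(r"\$[a-z]dc_")
SUSPICIOUS_WORDS = ("selenium", "webdriver")

def mentions_driver(name: str) -> bool:
    """True if a property name contains one of the tell-tale words."""
    lowered = name.lower()
    return any(word in lowered for word in SUSPICIOUS_WORDS)

def looks_automated(window_keys, document_keys) -> bool:
    """Apply the heuristics to window and document property names."""
    if any(mentions_driver(k) for k in window_keys):
        return True
    return any(DRIVER_CACHE_KEY.search(k) or mentions_driver(k) for k in document_keys)
```

With the stock ChromeDriver key, looks_automated(["location"], ["$cdc_asdjflasutopfhvcZLmcfl_"]) is True; after renaming the key to randomblabla_, as described below, the same check returns False.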

In my case I used Chrome, so all I had to do was make sure that $cdc_ no longer existed as a document variable, and voilà (download the chromedriver source code, modify chromedriver, and recompile it with $cdc_ under a different name).

This is the function I modified in chromedriver:

call_function.js:

function getPageCache(opt_doc) {
  var doc = opt_doc || document;
  //var key = '$cdc_asdjflasutopfhvcZLmcfl_';
  var key = 'randomblabla_';
  if (!(key in doc))
    doc[key] = new Cache();
  return doc[key];
}

(Note the comment: all I did was rename $cdc_ to randomblabla_.)

Here is some pseudo-code that demonstrates some of the techniques that bot-detection scripts might use:

function runBotDetection() {
    // Properties that some drivers attach to the document object
    const documentDetectionKeys = [
        "__webdriver_evaluate",
        "__selenium_evaluate",
        "__webdriver_script_function",
        "__webdriver_script_func",
        "__webdriver_script_fn",
        "__fxdriver_evaluate",
        "__driver_unwrapped",
        "__webdriver_unwrapped",
        "__driver_evaluate",
        "__selenium_unwrapped",
        "__fxdriver_unwrapped",
    ];

    // Globals exposed by PhantomJS, Nightmare and Selenium tooling
    const windowDetectionKeys = [
        "_phantom",
        "__nightmare",
        "_selenium",
        "callPhantom",
        "callSelenium",
        "_Selenium_IDE_Recorder",
    ];

    for (const key of windowDetectionKeys) {
        if (window[key]) {
            return true;
        }
    }
    for (const key of documentDetectionKeys) {
        if (window.document[key]) {
            return true;
        }
    }

    // ChromeDriver's page cache: a document property such as $cdc_..._ holding a cache_ object
    for (const documentKey in window.document) {
        if (/\$[a-z]dc_/.test(documentKey) && window.document[documentKey].cache_) {
            return true;
        }
    }

    // Does window.external stringify to something mentioning "Sequentum"?
    if (window.external && String(window.external).indexOf("Sequentum") !== -1) return true;

    // Attributes set on the root <html> element
    const root = window.document.documentElement;
    if (root.getAttribute("selenium")) return true;
    if (root.getAttribute("webdriver")) return true;
    if (root.getAttribute("driver")) return true;

    return false;
}

According to user @szx, it is also possible to simply open chromedriver.exe in a hex editor and do the replacement manually, without any compiling.


Answer 2

As we’ve already figured out in the question and the posted answers, there is an anti-web-scraping and bot-detection service called “Distil Networks” in play here. And, according to the company CEO’s interview:

Even though they can create new bots, we figured out a way to identify Selenium the a tool they’re using, so we’re blocking Selenium no matter how many times they iterate on that bot. We’re doing that now with Python and a lot of different technologies. Once we see a pattern emerge from one type of bot, then we work to reverse engineer the technology they use and identify it as malicious.

It’ll take time and additional effort to understand exactly how they are detecting Selenium, but here is what we can say for sure at the moment:

  • it’s not related to the actions you take with Selenium: once you navigate to the site, you are immediately detected and banned. I’ve tried adding artificial random delays between actions and pausing after the page loads; nothing helped
  • it’s not about the browser fingerprint either: I tried multiple browsers, with clean and non-clean profiles, and in incognito mode; nothing helped
  • since, according to the hint in the interview, this was “reverse engineering”, I suspect it is done by JS code executed in the browser that reveals the browser is being automated via Selenium WebDriver

Decided to post it as an answer, since clearly:

Can a website detect when you are using selenium with chromedriver?

Yes.


Also, something I haven’t experimented with is older Selenium and older browser versions. In theory, something could have been implemented or added to Selenium at a certain point that the Distil Networks bot detector currently relies on. If that is the case, we might detect (yes, let’s detect the detector) at which point/version the relevant change was made, look into the changelog and changesets, and maybe that would give us more information on where to look and what it is they use to detect a WebDriver-powered browser. It’s just a theory that needs to be tested.


Answer 3

Example of how it’s implemented on wellsfargo.com (!+[] and !+"" are obfuscated ways of writing true):

try {
 if (window.document.documentElement.getAttribute("webdriver")) return !+[]
} catch (IDLMrxxel) {}
try {
 if ("_Selenium_IDE_Recorder" in window) return !+""
} catch (KknKsUayS) {}
try {
 if ("__webdriver_script_fn" in document) return !+""

Answer 4

Result of obfuscating the JavaScript files

I have checked the chromedriver source code. It injects some JavaScript files into the browser.
Every JavaScript file at this link is injected into the web pages: https://chromium.googlesource.com/chromium/src/+/master/chrome/test/chromedriver/js/

So I used reverse engineering and obfuscated the JS files by hex editing. Now I was sure that no JavaScript variables, function names or fixed strings could be used to uncover Selenium activity. But some sites and reCaptcha still detect Selenium!
Maybe they check for the modifications caused by chromedriver’s JS execution :)


Edit 1:

Chrome ‘navigator’ parameters modification

I discovered there are some parameters in ‘navigator’ that reveal the use of chromedriver. These are the parameters:

  • “navigator.webdriver”: in non-automated mode it is ‘undefined’; in automated mode it is ‘true’.
  • “navigator.plugins”: on headless Chrome it has length 0, so I added some fake elements to fool the plugin-length check.
  • “navigator.languages”: was set to the default Chrome value ‘[“en-US”, “en”, “es”]’.
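The first two tells in the list can be checked against a snapshot of navigator values. A minimal Python sketch, assuming the snapshot is a plain dict; navigator_red_flags is a hypothetical helper, and the languages bullet is left out because the answer does not say which value counts as suspicious:

```python
def navigator_red_flags(nav: dict) -> list:
    """Return which of the tells listed above appear in a navigator snapshot.

    `nav` is a plain dict such as {"webdriver": True, "plugins": 0},
    where "plugins" holds the value of navigator.plugins.length.
    """
    flags = []
    # Automated Chrome reports true; a normal session reports undefined (None in Python).
    if nav.get("webdriver") is True:
        flags.append("webdriver")
    # Headless Chrome reports an empty plugin list.
    if nav.get("plugins", 0) == 0:
        flags.append("plugins")
    return flags
```

With the Python bindings, such a snapshot could be collected with driver.execute_script("return {webdriver: navigator.webdriver, plugins: navigator.plugins.length};"), which returns a dict. Running it before and after loading the extension shows whether the overrides took effect.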

So what I needed was a Chrome extension to run JavaScript on the web pages. I made an extension with the JS code provided in the article and used another article to add the zipped extension to my project. I successfully changed the values, but still nothing changed!

I didn’t find other variables like these, but that doesn’t mean they don’t exist. reCaptcha still detects chromedriver, so there must be more variables to change. The next step would be reverse engineering the detection services, which I don’t want to do.

Now I’m not sure whether it is worth spending more time on this automation process or whether I should look for alternative methods!


Answer 5

Try using Selenium with a specific Chrome user profile. That way you can use it as a specific user and define anything you want. When doing so, it will run as a ‘real’ user; look at the Chrome process with a process explorer and you’ll see the difference in the tags.

For example:

import os
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

username = os.getenv("USERNAME")
# Chrome's user data directory; the profile inside it is selected separately.
user_data_dir = os.path.join("C:\\Users", username, "AppData", "Local", "Google", "Chrome", "User Data")
options = webdriver.ChromeOptions()
options.add_argument("--user-data-dir={}".format(user_data_dir))
options.add_argument("--profile-directory=Default")
# add here any tag you want.
options.add_experimental_option("excludeSwitches", ["ignore-certificate-errors", "safebrowsing-disable-download-protection", "safebrowsing-disable-auto-update", "disable-client-side-phishing-detection"])
chromedriver = r"C:\Python27\chromedriver\chromedriver.exe"
browser = webdriver.Chrome(service=Service(chromedriver), options=options)

The list of Chrome tags (command-line switches) is here.


Answer 6

partial interface Navigator { readonly attribute boolean webdriver; };

The webdriver IDL attribute of the Navigator interface must return the value of the webdriver-active flag, which is initially false.

This property allows websites to determine that the user agent is under control by WebDriver, and can be used to help mitigate denial-of-service attacks.

Taken directly from the 2017 W3C Editor’s Draft of WebDriver. This strongly implies that, at the very least, future iterations of Selenium’s drivers will be identifiable, to prevent misuse. Ultimately, without the source code it’s hard to tell what exactly makes ChromeDriver in particular detectable.


Answer 7

Firefox is said to set window.navigator.webdriver === true if working with a webdriver. That was according to one of the older specs (e.g.: archive.org) but I couldn’t find it in the new one except for some very vague wording in the appendices.

A test for it is in the Selenium code in the file fingerprint_test.js, where the comment at the end says “Currently only implemented in firefox”, but I wasn’t able to identify any code in that direction with some simple grepping, neither in the current (41.0.2) Firefox release tree nor in the Chromium tree.

I also found a comment for an older commit regarding fingerprinting in the firefox driver b82512999938 from January 2015. That code is still in the Selenium GIT-master downloaded yesterday at javascript/firefox-driver/extension/content/server.js with a comment linking to the slightly differently worded appendix in the current w3c webdriver spec.


Answer 8

In addition to the great answer by @Erti-Chris Eelmaa, there’s the annoying window.navigator.webdriver, and it is read-only. Even if you change its value to false, it will still be true. That’s why a browser driven by automation software can still be detected. MDN

The variable is managed by the --enable-automation flag in Chrome. ChromeDriver launches Chrome with that flag, and Chrome then sets window.navigator.webdriver to true. You can find it here. You need to add the flag to the “excludeSwitches” list. For instance (Go):

package main

import (
    "fmt"
    "log"

    "github.com/tebeka/selenium"
    "github.com/tebeka/selenium/chrome"
)

func main() {
    caps := selenium.Capabilities{
        "browserName": "chrome",
    }

    // Ask ChromeDriver not to pass --enable-automation when it launches Chrome.
    chromeCaps := chrome.Capabilities{
        Path:            "/path/to/chrome-binary",
        ExcludeSwitches: []string{"enable-automation"},
    }
    caps.AddChrome(chromeCaps)

    wd, err := selenium.NewRemote(caps, fmt.Sprintf("http://localhost:%d/wd/hub", 4444))
    if err != nil {
        log.Fatal(err)
    }
    defer wd.Quit()
}
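For comparison, the capabilities in the Go example should end up as JSON roughly like the dictionary below. This is a sketch assuming tebeka/selenium serialises chrome.Capabilities under the standard goog:chromeOptions key with Path mapped to binary; chrome_capabilities is a hypothetical helper. In the Python bindings the same switch is usually excluded with options.add_experimental_option("excludeSwitches", ["enable-automation"]).

```python
import json

def chrome_capabilities(binary_path: str, exclude_switches: list) -> dict:
    """Build W3C capabilities asking ChromeDriver not to pass the given switches to Chrome."""
    return {
        "browserName": "chrome",
        "goog:chromeOptions": {
            "binary": binary_path,
            # Switch names without the leading "--", as in the Go snippet
            "excludeSwitches": exclude_switches,
        },
    }

caps = chrome_capabilities("/path/to/chrome-binary", ["enable-automation"])
print(json.dumps(caps, indent=2))
```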

Answer 9

It sounds like they are behind a web application firewall. Take a look at modsecurity and OWASP to see how those work. In reality, what you are asking is how to do bot-detection evasion. That is not what Selenium WebDriver is for: it is for testing your own web application, not for hitting other web applications. It is possible, but basically you’d have to look at what a WAF looks for in its rule set and specifically avoid it with Selenium if you can. Even then, it might still not work, because you don’t know which WAF they are using. You took the right first step, which is faking the user agent. If that didn’t work, though, then a WAF is in place and you probably need to get trickier.

Edit: Point taken from other answer. Make sure your user agent is actually being set correctly first. Maybe have it hit a local web server or sniff the traffic going out.


Answer 10

Even if you are sending all the right data (e.g. Selenium doesn’t show up as an extension, you have a reasonable resolution/bit-depth, &c), there are a number of services and tools which profile visitor behaviour to determine whether the actor is a user or an automated system.

For example, visiting a site then immediately going to perform some action by moving the mouse directly to the relevant button, in less than a second, is something no user would actually do.
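The timing example can be turned into a toy rule. A sketch only: Event and too_fast are hypothetical names, the one-second threshold comes from the example above, and real behavioural-profiling services combine many more signals than this:

```python
from dataclasses import dataclass

@dataclass
class Event:
    kind: str   # e.g. "load", "mousemove", "click"
    t: float    # seconds since navigation started

def too_fast(events, min_seconds: float = 1.0) -> bool:
    """Flag a session whose first click follows the page load by less than min_seconds."""
    load = next((e.t for e in events if e.kind == "load"), None)
    click = next((e.t for e in events if e.kind == "click"), None)
    if load is None or click is None:
        return False
    return click - load < min_seconds
```

A script that loads the page at 0.5 s and clicks at 0.9 s is flagged, while a session that clicks at 3.0 s after moving the mouse is not.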

It might also be useful as a debugging tool to use a site such as https://panopticlick.eff.org/ to check how unique your browser is; it’ll also help you verify whether there are any specific parameters that indicate you’re running in Selenium.


Answer 11

The bot detection I’ve seen seems more sophisticated or at least different than what I’ve read through in the answers below.

EXPERIMENT 1:

  1. I open a browser and web page with Selenium from a Python console.
  2. The mouse is already at a specific location where I know a link will appear once the page loads. I never move the mouse.
  3. I press the left mouse button once (this is necessary to take focus from the console where Python is running to the browser).
  4. I press the left mouse button again (remember, cursor is above a given link).
  5. The link opens normally, as it should.

EXPERIMENT 2:

  1. As before, I open a browser and the web page with Selenium from a Python console.

  2. This time around, instead of clicking with the mouse, I use Selenium (in the Python console) to click the same element with a random offset.

  3. The link doesn’t open, but I am taken to a sign up page.

IMPLICATIONS:

  • opening a web browser via Selenium doesn’t preclude me from appearing human
  • moving the mouse like a human is not necessary to be classified as human
  • clicking something via Selenium with an offset still raises the alarm

Seems mysterious, but I guess they can just determine whether an action originates from Selenium or not, while they don’t care whether the browser itself was opened via Selenium or not. Or can they determine if the window has focus? Would be interesting to hear if anyone has any insights.


Answer 12

One more thing I found is that some websites use a platform that checks the User-Agent. If the value contains “HeadlessChrome”, the behavior can be weird when using headless mode.

The workaround is to override the user agent value, for example in Java:

chromeOptions.addArguments("--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36");
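Hardcoding a full user agent pins an old Chrome version (73 above). An alternative sketch, not from the original answer, is to keep the real string and only swap the headless token; strip_headless is a hypothetical helper and the UA below is an illustrative example:

```python
def strip_headless(user_agent: str) -> str:
    """Replace the 'HeadlessChrome/' product token with 'Chrome/', leaving the version intact."""
    return user_agent.replace("HeadlessChrome/", "Chrome/")

# Illustrative headless user agent
ua = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) "
      "HeadlessChrome/73.0.3683.86 Safari/537.36")
print(strip_headless(ua))
```

In the Python bindings the cleaned string could then be passed as options.add_argument("--user-agent=" + cleaned_ua) when starting a new session, so the advertised version always matches the installed browser.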

Answer 13

Some sites are detecting this:

function d() {
try {
    if (window.document.$cdc_asdjflasutopfhvcZLmcfl_.cache_)
        return !0
} catch (e) {}

try {
    //if (window.document.documentElement.getAttribute(decodeURIComponent("%77%65%62%64%72%69%76%65%72")))
    if (window.document.documentElement.getAttribute("webdriver"))
        return !0
} catch (e) {}

try {
    //if (decodeURIComponent("%5F%53%65%6C%65%6E%69%75%6D%5F%49%44%45%5F%52%65%63%6F%72%64%65%72") in window)
    if ("_Selenium_IDE_Recorder" in window)
        return !0
} catch (e) {}

try {
    //if (decodeURIComponent("%5F%5F%77%65%62%64%72%69%76%65%72%5F%73%63%72%69%70%74%5F%66%6E") in document)
    if ("__webdriver_script_fn" in document)
        return !0
} catch (e) {}
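The commented-out lines are percent-encoded versions of the literal names the live checks use, presumably to keep the strings out of plain-text searches. Decoding them with Python's standard library confirms they match:

```python
from urllib.parse import unquote

encoded = [
    "%77%65%62%64%72%69%76%65%72",
    "%5F%53%65%6C%65%6E%69%75%6D%5F%49%44%45%5F%52%65%63%6F%72%64%65%72",
    "%5F%5F%77%65%62%64%72%69%76%65%72%5F%73%63%72%69%70%74%5F%66%6E",
]
for s in encoded:
    print(unquote(s))
# webdriver
# _Selenium_IDE_Recorder
# __webdriver_script_fn
```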

Answer 14

Write an HTML page with the following code. You will see that Selenium adds a webdriver attribute to the DOM, visible in the outerHTML:

<html>
<head>
  <script type="text/javascript">
    // Show the root element's markup, including any attributes added to <html>
    function showWindow() {
      alert(document.documentElement.outerHTML);
    }
  </script>
</head>
<body>
  <form>
    <input type="button" value="Show outerHTML" onclick="showWindow()">
  </form>
</body>
</html>
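If you would rather check this without clicking a button, you can parse the markup instead. A sketch using Python's standard html.parser, assuming you already have the outerHTML as a string (for example from driver.execute_script("return document.documentElement.outerHTML")); RootAttrs and root_has_attribute are hypothetical names, and whether current driver versions still add the attribute is not confirmed here:

```python
from html.parser import HTMLParser

class RootAttrs(HTMLParser):
    """Collect the attributes of the first <html> start tag."""
    def __init__(self):
        super().__init__()
        self.root_attrs = None

    def handle_starttag(self, tag, attrs):
        if tag == "html" and self.root_attrs is None:
            self.root_attrs = dict(attrs)  # valueless attributes map to None

def root_has_attribute(outer_html: str, name: str = "webdriver") -> bool:
    parser = RootAttrs()
    parser.feed(outer_html)
    return parser.root_attrs is not None and name in parser.root_attrs
```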

Answer 15

I’ve found that changing the JavaScript “key” variable like this:

// Fools the website into believing a human is navigating it
((JavascriptExecutor) driver).executeScript("window.key = \"blahblah\";");

works for some websites when using Selenium WebDriver with Google Chrome, since many sites check this variable to avoid being scraped by Selenium.


Answer 16

It seems to me the simplest way to do it with Selenium is to intercept the XHR that sends back the browser fingerprint.

But since this is a Selenium-only problem, it’s better just to use something else. Selenium is supposed to make things like this easier, not way harder.


Answer 17

You can try excluding the “enable-automation” switch:

var options = new ChromeOptions();

// hide selenium
options.AddExcludedArguments(new List<string>() { "enable-automation" });

var driver = new ChromeDriver(ChromeDriverService.CreateDefaultService(), options);

Be aware, however, that this trick stopped working as of ChromeDriver 79.0.3945.16, so you will probably need an older version of Chrome.

Also, as another option, you can try using InternetExplorerDriver instead of Chrome. In my case, IE was not blocked at all, even without any hacks.

And for more info try to take a look here:

Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection

Unable to hide “Chrome is being controlled by automated software” infobar within Chrome v76


Reloading submodules in IPython

Question: Reloading submodules in IPython

Currently I am working on a Python project that contains submodules and uses numpy/scipy. IPython is used as the interactive console. Unfortunately I am not very happy with the workflow I am using right now, and I would appreciate some advice.

In IPython, the framework is loaded by a simple import command. However, it is often necessary to change code in one of the submodules of the framework. At this point a model is already loaded and I use IPython to interact with it.

Now, the framework contains many modules that depend on each other, i.e. when the framework is initially loaded, the main module imports and configures the submodules. Changes to the code only take effect if the module is reloaded using reload(main_mod.sub_mod). This is cumbersome, as I need to reload every changed module individually using its full path. It would be very convenient if reload(main_module) also reloaded all submodules, but without reloading numpy/scipy.


Answer 0

IPython comes with some automatic reloading magic:

%load_ext autoreload
%autoreload 2

It will reload all changed modules every time before executing a new line. This works slightly differently from dreload. Some caveats apply; type %autoreload? to see what can go wrong.

If you want to always enable this setting, modify your IPython configuration file ~/.ipython/profile_default/ipython_config.py[1] and append:

c.InteractiveShellApp.extensions = ['autoreload']     
c.InteractiveShellApp.exec_lines = ['%autoreload 2']

Credit to @Kos via a comment below.

[1] If you don’t have the file ~/.ipython/profile_default/ipython_config.py, you need to call ipython profile create first. Or the file may be located at $IPYTHONDIR.


Answer 1

In IPython 0.12 (and possibly earlier), you can use this:

%load_ext autoreload
%autoreload 2

This is essentially the same as the answer by pv., except that the extension has been renamed and is now loaded using %load_ext.


Answer 2

For some reason, neither %autoreload nor dreload seems to work when you import code from one notebook into another. Only plain Python reload works:

from importlib import reload  # Python 3; in Python 2, reload is a built-in
reload(module)

Based on [1].


Answer 3

IPython offers dreload() to recursively reload all submodules. Personally, I prefer to use the %run magic command (though it does not perform a deep reload, as John Salvatier pointed out in the comments).


Answer 4

The importlib module gives access to the internals of the import system. In particular, it provides the function importlib.reload():

import importlib
importlib.reload(my_module)

Unlike %autoreload, importlib.reload() also resets global variables set in the module. In most cases, this is what you want.

importlib.reload() is only available since Python 3.4. For older versions, use imp.reload() (Python 3.0 to 3.3) or the built-in reload() (Python 2).
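A small self-contained sketch of importlib.reload() in action. It writes a throwaway module (the name my_module and its contents are hypothetical) to a temporary directory, imports it, rewrites the file as if you had edited it, and reloads it. It also shows a documented caveat: a name that the new source no longer defines keeps its old value after the reload.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_reload():
    """Import a throwaway module, edit its source on disk, then reload it."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp, "my_module.py")
        source.write_text("VALUE = 'old'\nEXTRA = 1\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()  # make the new directory's contents visible
        try:
            my_module = importlib.import_module("my_module")
            before = my_module.VALUE

            # Simulate editing the file. The new source has a different size,
            # so a cached .pyc written in the same second is not reused.
            source.write_text("VALUE = 'new value'\n")
            importlib.reload(my_module)

            after = my_module.VALUE
            # EXTRA was removed from the source, but reload() does not delete it
            extra_survives = hasattr(my_module, "EXTRA")
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("my_module", None)
    return before, after, extra_survives


print(demo_reload())  # ('old', 'new value', True)
```

So reload() re-executes the module's code in the existing module object: names that are assigned again get fresh values, while names the new code no longer defines stay behind.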


Answer 5

http://shawnleezx.github.io/blog/2015/08/03/some-notes-on-ipython-startup-script/

To avoid typing those magic functions again and again, you can put them in an IPython startup script: give it a .py suffix and place it under ~/.ipython/profile_default/startup (all Python scripts in that folder are loaded in lexical order). It looks like this:

from IPython import get_ipython
ipython = get_ipython()

# run_line_magic(name, args) is the non-deprecated form of ipython.magic("name args")
ipython.run_line_magic("pylab", "")  # optional: only if you want the pylab environment
ipython.run_line_magic("load_ext", "autoreload")
ipython.run_line_magic("autoreload", "2")

Answer 6

How about this:

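import inspect
from importlib import reload  # Python 3; in Python 2, reload is a built-in

# loaded keeps track of the modules that have already been visited
def recursively_reload_all_submodules(module, loaded=None):
    if loaded is None:
        loaded = set()
    # Mark the module first so that circular imports cannot recurse forever
    loaded.add(module)
    package = module.__name__.split(".")[0]
    for name in dir(module):
        member = getattr(module, name)
        # Only follow modules of the same top-level package (skips numpy, os, ...)
        if (inspect.ismodule(member) and member not in loaded
                and member.__name__.split(".")[0] == package):
            recursively_reload_all_submodules(member, loaded)
    # Submodules have been reloaded at this point; now reload this module
    reload(module)

import mymodule
recursively_reload_all_submodules(mymodule, set())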

This should effectively reload the entire tree of modules and submodules you give it. You can also put this function in your .ipythonrc (I think) so it is loaded every time you start the interpreter.
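If the modules you care about are already imported, another way to get what the question asks for (reload the project's submodules, but not numpy/scipy) is to walk sys.modules and reload only the entries under your package name. This is a sketch, not an IPython feature: reload_package is a hypothetical helper, and the deepest-first order is a heuristic that works when submodules do not depend on their parent's attributes at import time.

```python
import importlib
import sys


def reload_package(package_name):
    """Reload every already-imported module that belongs to package_name.

    Modules outside the package (numpy, scipy, the standard library, ...)
    are not touched.
    """
    prefix = package_name + "."
    names = [name for name in list(sys.modules)
             if name == package_name or name.startswith(prefix)]
    # Deepest modules first ("pkg.a.b", then "pkg.a", then "pkg"), so that a
    # parent's re-executed "from .sub import x" picks up the fresh submodule
    names.sort(key=lambda name: name.count("."), reverse=True)
    reloaded = []
    for name in names:
        module = sys.modules.get(name)
        if module is not None:
            importlib.reload(module)
            reloaded.append(name)
    return reloaded
```

After editing files you would call, for example, reload_package("main_mod"), where main_mod stands for the question's top-level package.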


Answer 7

Another option:

$ cat << EOF > ~/.ipython/profile_default/startup/50-autoreload.ipy
%load_ext autoreload
%autoreload 2
EOF

Verified on ipython and ipython3 v5.1.0 on Ubuntu 14.04.


Answer 8

My standard practice for reloading is to combine both methods right after first opening IPython:

from IPython.lib.deepreload import reload
%load_ext autoreload
%autoreload 2

Loading modules before doing this will cause them not to be reloaded, even with a manual reload(module_name). Very occasionally, I still get inexplicable problems with class methods not reloading, which I have not yet looked into.


Answer 9

Note that the above mentioned autoreload only works in IntelliJ if you manually save the changed file (e.g. using ctrl+s or cmd+s). It doesn’t seem to work with auto-saving.


Answer 10

On Jupyter Notebooks on Anaconda, doing this:

%load_ext autoreload
%autoreload 2

produced the message:

The autoreload extension is already loaded. To reload it, use: %reload_ext autoreload

It looks like it’s preferable to do:

%reload_ext autoreload
%autoreload 2

Version information:

The version of the notebook server is 5.0.0 and is running on: Python 3.6.2 |Anaconda, Inc.| (default, Sep 20 2017, 13:35:58) [MSC v.1900 32 bit (Intel)]


Answer 11

Subobjects will not be reloaded by this; I believe you have to use IPython’s deepreload for that.


How can I selectively escape percent (%) in Python strings?

Question: How can I selectively escape percent (%) in Python strings?

I have the following code

test = "have it break."
selectiveEscape = "Print percent % in sentence and not %s" % test

print(selectiveEscape)

I would like to get the output:

Print percent % in sentence and not have it break.

What actually happens:

    selectiveEscape = "Use percent % in sentence and not %s" % test
TypeError: %d format: a number is required, not str

Answer 0

>>> test = "have it break."
>>> selectiveEscape = "Print percent %% in sentence and not %s" % test
>>> print(selectiveEscape)
Print percent % in sentence and not have it break.

Answer 1

Alternatively, as of Python 2.6, you can use new string formatting (described in PEP 3101):

'Print percent % in sentence and not {0}'.format(test)

which is especially handy as your strings get more complicated.
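A short sketch of the point above: with str.format a bare % is ordinary text and needs no escaping; instead, literal curly braces are the special characters and are escaped by doubling them.

```python
test = "have it break."

# A bare % needs no escaping with str.format
s1 = 'Print percent % in sentence and not {0}'.format(test)

# Literal braces must be doubled: {{ becomes {, }} becomes }
s2 = 'Literal {{braces}} and a % sign: {0}'.format(test)

print(s1)  # Print percent % in sentence and not have it break.
print(s2)  # Literal {braces} and a % sign: have it break.
```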


Answer 2

Try using %% to print a % sign.


Answer 3

You can’t selectively escape %, as % always has a special meaning depending on the following character.

In the Python documentation on printf-style string formatting, at the bottom of the second table in that section, it states:

'%'        No argument is converted, results in a '%' character in the result.

Therefore you should use:

selectiveEscape = "Print percent %% in sentence and not %s" % (test, )

(Please note the explicit change to a tuple as the argument to %.)

Without knowing about the above, I would have done:

selectiveEscape = "Print percent %s in sentence and not %s" % ('%', test)

with the knowledge you obviously already had.


Answer 4

If the formatting template was read from a file, and you cannot ensure the content doubles the percent sign, then you probably have to detect the percent character and decide programmatically whether it is the start of a placeholder or not. Then the parser should also recognize sequences like %d (and other letters that can be used), but also %(xxx)s etc.

A similar problem occurs with the new-style format strings: the text can contain curly braces.
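The detection described above can be sketched with a regular expression that recognises printf-style placeholders and doubles every other %. This is a heuristic under stated assumptions, not a standard-library feature: escape_stray_percents is a hypothetical helper, and it deliberately ignores the space flag so that "% in" counts as literal text rather than as the placeholder "% i".

```python
import re

# One printf-style placeholder: optional "(key)", flags, width, precision,
# length modifier and conversion character. The space flag is deliberately
# NOT accepted, so "% in" counts as literal text rather than as "% i".
_TOKEN = re.compile(
    r"(%(?:\([^)]*\))?[#0+\-]*(?:\*|\d+)?(?:\.(?:\*|\d+))?[hlL]?[diouxXeEfFgGcrsa%])"
    r"|%"
)


def escape_stray_percents(template):
    """Double every % that does not start a recognised placeholder."""
    # group(1) is None when only the bare "%" alternative matched
    return _TOKEN.sub(lambda m: m.group(1) or "%%", template)


template = "Print percent % in sentence and not %s"   # e.g. read from a file
safe = escape_stray_percents(template)
print(safe)                      # Print percent %% in sentence and not %s
print(safe % "have it break.")   # Print percent % in sentence and not have it break.
```

It remains a guess: text such as "5%done" still looks like a %d placeholder, so templates you control are better fixed at the source by writing %%.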


Answer 5

If you are using Python 3.6 or newer, you can use an f-string:

>>> test = "have it break."
>>> selectiveEscape = f"Print percent % in sentence and not {test}"
>>> print(selectiveEscape)
Print percent % in sentence and not have it break.

Answer 6

I have tried different methods of printing a subplot title to see how they work. The result is different when I use LaTeX.

In a typical case, it works with ‘%%’ and with ‘string’ + ‘%’.

If you use LaTeX, it worked using ‘string’ + ‘\%’.

So in a typical case:

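import matplotlib.pyplot as plt
fig, ax = plt.subplots(4, 1)
float_number = 4.17
ax[0].set_title('Total: (%1.2f' % float_number + r'\%)')  # the backslash is shown literally
ax[1].set_title('Total: (%1.2f%%)' % float_number)        # %% is formatted into %
ax[2].set_title('Total: (%1.2f' % float_number + '%%)')   # %% is appended after formatting, so it stays %%
ax[3].set_title('Total: (%1.2f' % float_number + '%)')    # a plain % appended after formatting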

[Figure: title examples with %]

If we use latex:

import matplotlib.pyplot as plt
import matplotlib
font = {'family' : 'normal',
        'weight' : 'bold',
        'size'   : 12}
matplotlib.rc('font', **font)
matplotlib.rcParams['text.usetex'] = True
# matplotlib.rcParams['text.latex.unicode'] = True  # only accepted by old Matplotlib versions
fig, ax = plt.subplots(4, 1)
float_number = 4.17
# ax[0].set_title('Total: (%1.2f\%)' % float_number)  # raises ValueError: unsupported format character ')'
ax[1].set_title('Total: (%1.2f%%)' % float_number)
ax[2].set_title('Total: (%1.2f' % float_number + '%%)')
ax[3].set_title('Total: (%1.2f' % float_number + r'\%)')

We get this: [Figure: title examples with % and LaTeX]


How do I create a variable number of variables?

Question: How do I create a variable number of variables?

How do I accomplish variable variables in Python?

Here is an elaborative manual entry, for instance: Variable variables

I have heard this is a bad idea in general though, and it is a security hole in Python. Is that true?


Answer 0

You can use dictionaries to accomplish this. Dictionaries are stores of keys and values.

>>> dct = {'x': 1, 'y': 2, 'z': 3}
>>> dct
{'x': 1, 'y': 2, 'z': 3}
>>> dct["y"]
2

You can use variable key names to achieve the effect of variable variables without the security risk.

>>> x = "spam"
>>> z = {x: "eggs"}
>>> z["spam"]
'eggs'

For cases where you’re thinking of doing something like

var1 = 'foo'
var2 = 'bar'
var3 = 'baz'
...

a list may be more appropriate than a dict. A list represents an ordered sequence of objects, with integer indices:

lst = ['foo', 'bar', 'baz']
print(lst[1])           # prints bar, because indices start at 0
lst.append('potatoes')  # lst is now ['foo', 'bar', 'baz', 'potatoes']

For ordered sequences, lists are more convenient than dicts with integer keys, because lists support iteration in index order, slicing, append, and other operations that would require awkward key management with a dict.


Answer 1

Use the built-in getattr function to get an attribute on an object by name. Modify the name as needed.

from types import SimpleNamespace

obj = SimpleNamespace()  # any object that accepts new attributes
obj.spam = 'eggs'
name = 'spam'
getattr(obj, name)  # returns 'eggs'

Answer 2

It’s not a good idea. If you are accessing a global variable you can use globals().

>>> a = 10
>>> globals()['a']
10

If you want to access a variable in the local scope you can use locals(), but inside a function, assigning to the returned dict does not change the actual local variables.

A better solution is to use getattr or store your variables in a dictionary and then access them by name.


Answer 3

Whenever you want to use variable variables, it’s probably better to use a dictionary. So instead of writing (in PHP)

$foo = "bar"
$$foo = "baz"

you write

mydict = {}
foo = "bar"
mydict[foo] = "baz"

This way you won’t accidentally overwrite previously existing variables (which is the security aspect) and you can have different “namespaces”.
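A minimal sketch of the “security aspect” mentioned above: writing into globals() with a name that comes from outside can silently replace an existing function, whereas a dedicated dict keeps such names contained. The names is_admin, set_global and set_user_var are hypothetical.

```python
def is_admin():
    return False


def set_global(name, value):
    # Writes straight into this module's namespace
    globals()[name] = value


user_vars = {}


def set_user_var(name, value):
    # Writes into a separate dict, so module-level names are untouched
    user_vars[name] = value


set_user_var("is_admin", "whatever the user typed")
print(is_admin())   # False: the function is still the original one

set_global("is_admin", lambda: True)
print(is_admin())   # True: the function has been replaced
```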


Answer 4

New coders sometimes write code like this:

my_calculator.button_0 = tkinter.Button(root, text=0)
my_calculator.button_1 = tkinter.Button(root, text=1)
my_calculator.button_2 = tkinter.Button(root, text=2)
...

The coder is then left with a pile of named variables, with a coding effort of O(m * n), where m is the number of named variables and n is the number of times that group of variables needs to be accessed (including creation). The more astute beginner observes that the only difference in each of those lines is a number that changes based on a rule, and decides to use a loop. However, they get stuck on how to dynamically create those variable names, and may try something like this:

for i in range(10):
    my_calculator.('button_%d' % i) = tkinter.Button(root, text=i)

They soon find that this does not work.

If the program requires arbitrary variable “names,” a dictionary is the best choice, as explained in other answers. However, if you’re simply trying to create many variables and you don’t mind referring to them with a sequence of integers, you’re probably looking for a list. This is particularly true if your data are homogeneous, such as daily temperature readings, weekly quiz scores, or a grid of graphical widgets.

This can be assembled as follows:

my_calculator.buttons = []
for i in range(10):
    my_calculator.buttons.append(tkinter.Button(root, text=i))

This list can also be created in one line with a comprehension:

my_calculator.buttons = [tkinter.Button(root, text=i) for i in range(10)]

The result in either case is a populated list, with the first element accessed with my_calculator.buttons[0], the next with my_calculator.buttons[1], and so on. The “base” variable name becomes the name of the list and the varying identifier is used to access it.

Finally, don’t forget other data structures, such as the set – this is similar to a dictionary, except that each “name” doesn’t have a value attached to it. If you simply need a “bag” of objects, this can be a great choice. Instead of something like this:

keyword_1 = 'apple'
keyword_2 = 'banana'

if query == keyword_1 or query == keyword_2:
    print('Match.')

You will have this:

keywords = {'apple', 'banana'}
if query in keywords:
    print('Match.')

Use a list for a sequence of similar objects, a set for an arbitrarily-ordered bag of objects, or a dict for a bag of names with associated values.


Answer 5

Instead of a dictionary you can also use namedtuple from the collections module, which makes access easier.

For example:

from collections import namedtuple

# using dictionary
variables = {}
variables["first"] = 34
variables["second"] = 45
print(variables["first"], variables["second"])

# using namedtuple (named v to avoid shadowing the built-in vars())
Variables = namedtuple('Variables', ['first', 'second'])
v = Variables(34, 45)
print(v.first, v.second)

Answer 6

If you don’t want to use any object, you can still use setattr() inside your current module:

import sys
current_module = sys.modules[__name__]  # i.e. the "file" where your code is written
setattr(current_module, 'variable_name', 15)  # 15 is the value you assign to the var
print(variable_name)  # >>> 15, created from a string

Answer 7

The SimpleNamespace class could be used to create new attributes with setattr, or subclass SimpleNamespace and create your own function to add new attribute names (variables).

from types import SimpleNamespace

variables = {"b":"B","c":"C"}
a = SimpleNamespace(**variables)
setattr(a,"g","G")
a.g = "G+"
something = a.b  # "B"; a.a would raise AttributeError, since no "a" attribute was set

Answer 8

I am answering the question “How to get the value of a variable given its name in a string?”, which was closed as a duplicate with a link to this question.

If the variables in question are part of an object (part of a class for example) then some useful functions to achieve exactly that are hasattr, getattr, and setattr.

So for example you can have:

class Variables(object):
    def __init__(self):
        self.foo = "initial_variable"
    def create_new_var(self,name,value):
        setattr(self,name,value)
    def get_var(self,name):
        if hasattr(self,name):
            return getattr(self,name)
        else:
            raise AttributeError("Class does not have a variable named: " + name)

Then you can do:

v = Variables()
v.get_var("foo")

'initial_variable'

v.create_new_var(v.foo,"is actually not initial")
v.initial_variable

'is actually not initial'


Answer 9

Use globals()

You can actually assign variables to the global scope dynamically. For instance, if you want 10 variables i_0, i_1, …, i_9 that can be accessed at global scope:

for i in range(10):
    globals()['i_{}'.format(i)] = 'a'

This assigns 'a' to all 10 of these variables; of course, you can also vary the value dynamically. All of these variables can now be accessed like any other globally declared variable:

>>> i_5
'a'

Answer 10

You can use the built-in globals() function to achieve that behaviour:

def var_of_var(k, v):
    globals()[k] = v

# print(variable_name)  # NameError: name 'variable_name' is not defined
some_name = 'variable_name'
globals()[some_name] = 123
print(variable_name)  # 123

some_name = 'variable_name2'
var_of_var(some_name, 456)
print(variable_name2)  # 456

Answer 11

The consensus is to use a dictionary for this – see the other answers. This is a good idea for most cases, however, there are many aspects arising from this:

  • you are responsible for this dictionary yourself, including garbage collection (of in-dict variables) etc.
  • variable variables have no locality or globality of their own; that depends on the scope of the dictionary
  • if you want to rename a variable, you have to do it manually
  • however, you are much more flexible, e.g.
    • you can decide to overwrite existing variables or …
    • … choose to implement const variables
    • raise an exception when a variable is overwritten with a value of a different type
    • etc.

That said, I’ve implemented a variable variables manager-class which provides some of the above ideas. It works for python 2 and 3.

You’d use the class like this:

from variableVariablesManager import VariableVariablesManager

myVars = VariableVariablesManager()
myVars['test'] = 25
print(myVars['test'])

# define a const variable
myVars.defineConstVariable('myconst', 13)
try:
    myVars['myconst'] = 14 # <- this raises an error, since 'myconst' must not be changed
    print("not allowed")
except AttributeError as e:
    pass

# rename a variable
myVars.renameVariable('myconst', 'myconstOther')

# preserve locality
def testLocalVar():
    myVars = VariableVariablesManager()
    myVars['test'] = 13
    print("inside function myVars['test']:", myVars['test'])
testLocalVar()
print("outside function myVars['test']:", myVars['test'])

# define a global variable
myVars.defineGlobalVariable('globalVar', 12)
def testGlobalVar():
    myVars = VariableVariablesManager()
    print("inside function myVars['globalVar']:", myVars['globalVar'])
    myVars['globalVar'] = 13
    print("inside function myVars['globalVar'] (having been changed):", myVars['globalVar'])
testGlobalVar()
print("outside function myVars['globalVar']:", myVars['globalVar'])

If you wish to allow overwriting of variables with the same type only:

myVars = VariableVariablesManager(enforceSameTypeOnOverride = True)
myVars['test'] = 25
myVars['test'] = "Cat" # <- raises Exception (different type on overwriting)

Answer 12

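I have tried both in Python 3.7.3: you can use either globals() or vars(). This works at module level (for example, at the interactive prompt), where vars() without arguments returns the same dictionary as globals().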

>>> food #Error
>>> milkshake #Error
>>> food="bread"
>>> drink="milkshake"
>>> globals()[food] = "strawberry flavor"
>>> vars()[drink] = "chocolate flavor"
>>> bread
'strawberry flavor'
>>> milkshake
'chocolate flavor'
>>> globals()[drink]
'chocolate flavor'
>>> vars()[food]
'strawberry flavor'


Reference:
https://www.daniweb.com/programming/software-development/threads/111526/setting-a-string-as-a-variable-name#post548936
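Why both work in the session above: at module level (which includes the interactive interpreter), vars() with no argument returns the same dictionary as globals(). Inside a function, vars() returns the local namespace instead, and writing to it does not create local variables. The following is a minimal sketch of the module-level case. Next to it is a plain dictionary holding the same data, which is usually the safer choice. The names food, drink and flavors are just illustrations.

```python
food = "bread"
drink = "milkshake"

# At module level this creates a global variable literally named "bread".
globals()[food] = "strawberry flavor"
print(globals()["bread"])    # strawberry flavor

# Keeping the dynamic names in an ordinary dict gives the same lookup
# without touching the module namespace.
flavors = {}
flavors[drink] = "chocolate flavor"
print(flavors["milkshake"])  # chocolate flavor
```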


Answer 13

Any set of variables can also be wrapped up in a class. “Variable” variables may be added to the class instance at runtime by directly accessing the built-in dictionary through the __dict__ attribute.

The following code defines a Variables class, which adds variables (in this case attributes) to its instance during construction. The variable names are taken from a specified list (which could, for example, have been generated by program code):

# some list of variable names
L = ['a', 'b', 'c']

class Variables:
    def __init__(self, L):
        for item in L:
            self.__dict__[item] = 100

v = Variables(L)
print(v.a, v.b, v.c)
#will produce 100 100 100
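The same class can be written with the built-in setattr() instead of writing to __dict__ directly. setattr() is the public way to set an attribute whose name is only known at runtime, and getattr() reads it back. This is a sketch: the extra value parameter is an illustrative addition. For a plain bag of attributes, types.SimpleNamespace achieves the same result in one line.

```python
from types import SimpleNamespace


class Variables:
    def __init__(self, names, value=100):
        for name in names:
            # setattr() adds an attribute whose name is only known at runtime.
            setattr(self, name, value)


L = ['a', 'b', 'c']
v = Variables(L)
print(v.a, v.b, v.c)     # 100 100 100
print(getattr(v, 'b'))   # 100

# SimpleNamespace builds the same kind of attribute bag from keyword arguments.
ns = SimpleNamespace(**dict.fromkeys(L, 100))
print(ns.a, ns.b, ns.c)  # 100 100 100
```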

Run a function from the command line

Question: Run a function from the command line

I have this code:

def hello():
    return 'Hi :)'

How would I run this directly from the command line?


Answer 0

With the -c (command) argument (assuming your file is named foo.py):

$ python -c 'import foo; print(foo.hello())'

Alternatively, if you don’t care about namespace pollution:

$ python -c 'from foo import *; print(hello())'

And the middle ground:

$ python -c 'from foo import hello; print(hello())'
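The following sketch runs the first command from Python itself, so you can check the behavior without a shell. It assumes Python 3.7+ (for capture_output). It writes a throwaway foo.py containing the question's hello() into a temporary directory, then runs python -c there through subprocess, using sys.executable as the interpreter. With -c, the current directory is placed on sys.path, which is why import foo succeeds.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # A throwaway foo.py containing the function from the question.
    Path(tmp, "foo.py").write_text("def hello():\n    return 'Hi :)'\n")
    # Equivalent to running: python -c 'import foo; print(foo.hello())' inside tmp.
    completed = subprocess.run(
        [sys.executable, "-c", "import foo; print(foo.hello())"],
        cwd=tmp, capture_output=True, text=True, check=True,
    )

output = completed.stdout.strip()
print(output)  # Hi :)
```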

Answer 1

Just put a call such as print(hello()) somewhere below the function and it will execute when you run python your_file.py. (hello() returns its string rather than printing it, so a bare hello() call shows nothing.)

For a neater solution you can use this:

if __name__ == '__main__':
    print(hello())

That way the function will only be executed if you run the file, not when you import the file.
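To see the guard in action without a terminal, here is a minimal sketch. It writes a small your_file.py into a temporary directory and loads it in two ways. First, runpy.run_path() executes it with run_name="__main__", as python your_file.py would. Second, importlib loads it as an ordinary module named your_file. The sketch then compares what each one prints. The capture() and do_import() helpers are hypothetical names used only for this demonstration.

```python
import contextlib
import importlib.util
import io
import runpy
import tempfile
from pathlib import Path

SOURCE = (
    "def hello():\n"
    "    return 'Hi :)'\n"
    "\n"
    "if __name__ == '__main__':\n"
    "    print(hello())\n"
)


def capture(func):
    """Run func() and return everything it printed to stdout."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        func()
    return buf.getvalue()


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, "your_file.py")
    path.write_text(SOURCE)

    # Like "python your_file.py": __name__ is "__main__", so the guarded call runs.
    as_script = capture(lambda: runpy.run_path(str(path), run_name="__main__"))

    # Like "import your_file": __name__ is "your_file", so the guarded call is skipped.
    def do_import():
        spec = importlib.util.spec_from_file_location("your_file", str(path))
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)

    as_import = capture(do_import)

print(repr(as_script))  # 'Hi :)\n'
print(repr(as_import))  # ''
```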


Answer 2

python -c 'from myfile import hello; hello()' where myfile must be replaced with the basename of your Python script. (E.g., myfile.py becomes myfile).

However, if hello() is your “permanent” main entry point in your Python script, then the usual way to do this is as follows:

def hello():
    print("Hi :)")

if __name__ == "__main__":
    hello()

This allows you to execute the script simply by running python myfile.py or python -m myfile.

Some explanation here: __name__ is a special Python variable that holds the name of the module currently being executed, except when the module is started from the command line, in which case it becomes "__main__".
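Both invocations mentioned above can be checked with a short sketch. It assumes Python 3.7+ and uses a temporary directory with a myfile.py written on the fly. It runs the interpreter once with the file path and once with -m myfile; with -m, the current directory is added to sys.path, so the module is found. In both cases __name__ is "__main__", so the guarded hello() call runs.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

SOURCE = (
    "def hello():\n"
    "    print('Hi :)')\n"
    "\n"
    "if __name__ == '__main__':\n"
    "    hello()\n"
)

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "myfile.py").write_text(SOURCE)
    outputs = []
    # "python myfile.py" and "python -m myfile", both run from inside tmp.
    for args in (["myfile.py"], ["-m", "myfile"]):
        completed = subprocess.run(
            [sys.executable, *args],
            cwd=tmp, capture_output=True, text=True, check=True,
        )
        outputs.append(completed.stdout.strip())

print(outputs)  # ['Hi :)', 'Hi :)']
```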


Answer 3

I wrote a quick little Python script that is callable from a bash command line. It takes the name of the module, class and method you want to call and the parameters you want to pass. I call it PyRun, left off the .py extension, and made it executable with chmod +x PyRun so that I can call it quickly as follows:

./PyRun PyTest.ClassName.Method1 Param1

Save this in a file called PyRun

#!/usr/bin/env python
# make executable in bash: chmod +x PyRun

import sys
import inspect
import importlib
import os

if __name__ == "__main__":
    # make modules that sit next to this script importable
    cmd_folder = os.path.realpath(os.path.abspath(os.path.split(inspect.getfile(inspect.currentframe()))[0]))
    if cmd_folder not in sys.path:
        sys.path.insert(0, cmd_folder)

    # get the first command-line argument, e.g. "PyTest.SomeClass.First"
    methodname = sys.argv[1]

    # split this into module, class and function name
    modulename, classname, funcname = methodname.split(".")

    # get pointers to the objects based on the string names
    themodule = importlib.import_module(modulename)
    theclass = getattr(themodule, classname)
    thefunc = getattr(theclass, funcname)

    # pass the parameters from sys.argv[2] onward, as many as the function
    # accepts, and ignore the rest
    # (inspect.getargspec was removed in Python 3.11; getfullargspec replaces it)
    args = inspect.getfullargspec(thefunc)
    z = len(args.args) + 2
    params = sys.argv[2:z]
    thefunc(*params)

Here is a sample module to show how it works. This is saved in a file called PyTest.py:

class SomeClass:
    @staticmethod
    def First():
        print("First")

    @staticmethod
    def Second(x):
        print(x)
        # for x1 in x:
        #     print(x1)

    @staticmethod
    def Third(x, y):
        print(x)
        print(y)

class OtherClass:
    @staticmethod
    def Uno():
        print("Uno")

Try running these examples:

./PyRun PyTest.SomeClass.First
./PyRun PyTest.SomeClass.Second Hello
./PyRun PyTest.SomeClass.Third Hello World
./PyRun PyTest.OtherClass.Uno
./PyRun PyTest.SomeClass.Second "Hello"
./PyRun PyTest.SomeClass.Second \(Hello, World\)

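Note the last example: the escaped parentheses are passed through literally, but the shell still splits the argument at the space, so Second receives the string "(Hello," and the extra "World)" is ignored. PyRun passes every parameter as a string; it does not build tuples.

If you pass too few parameters for what the method needs, you get an error. If you pass too many, it ignores the extras. The module must be in the current working folder, but PyRun can be anywhere in your path.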
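The core of PyRun works in three steps. It imports a module by name, walks to the class and method with getattr(), and passes only as many arguments as the method accepts. The following is a minimal in-process sketch of that idea, not the original script. call_dotted and the in-memory PyTestDemo module are hypothetical names, and the sketch counts positional parameters with inspect.signature() instead of getfullargspec(). The stand-in module is registered in sys.modules so that importlib.import_module() can find it without a file on disk.

```python
import importlib
import inspect
import sys
import types


def call_dotted(dotted_name, argv):
    """Resolve "module.Class.method" and call it with as many argv items as it accepts."""
    modulename, classname, funcname = dotted_name.split(".")
    module = importlib.import_module(modulename)
    func = getattr(getattr(module, classname), funcname)
    # Count the positional parameters, as PyRun does, and drop any extra arguments.
    positional = [
        p for p in inspect.signature(func).parameters.values()
        if p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD)
    ]
    return func(*argv[:len(positional)])


# Hypothetical stand-in for PyTest.py, kept in memory so the sketch runs anywhere.
class SomeClass:
    @staticmethod
    def Third(x, y):
        return x + " " + y


demo = types.ModuleType("PyTestDemo")
demo.SomeClass = SomeClass
sys.modules["PyTestDemo"] = demo

result = call_dotted("PyTestDemo.SomeClass.Third", ["Hello", "World", "ignored"])
print(result)  # Hello World
```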


Answer 4

Add this snippet to the bottom of your script:

import sys


def myfunction():
    ...


if __name__ == '__main__':
    globals()[sys.argv[1]]()

You can now call your function by running

python myscript.py myfunction

This works because you are passing the command line argument (a string containing the function’s name) into globals(), a dictionary holding the module’s current global symbol table. The parentheses at the end call the function that was looked up.

Update: if you would like the function to accept a parameter from the command line, you can pass in sys.argv[2] like this:

import sys


def myfunction(mystring):
    print(mystring)


if __name__ == '__main__':
    globals()[sys.argv[1]](sys.argv[2])

This way, running python myscript.py myfunction "hello" will output hello.
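The globals() lookup can be made a little more forgiving. The sketch below checks that the name refers to a callable, and it forwards every remaining word as a string argument instead of exactly one. dispatch is a hypothetical helper, not part of the answer above. The sketch passes a hand-written argv list so that it can run anywhere; in a real script you would call dispatch(sys.argv) under the if __name__ == '__main__': guard.

```python
def myfunction(mystring):
    # Stand-in for the function you want to expose on the command line.
    return "myfunction got " + mystring


def dispatch(argv):
    """Look up argv[1] in this module's globals and call it with the remaining arguments."""
    if len(argv) < 2:
        raise SystemExit("usage: python myscript.py FUNCTION [ARGS...]")
    func = globals().get(argv[1])
    if not callable(func):
        raise SystemExit("unknown function: " + argv[1])
    # Every extra command-line word is passed through as a string argument.
    return func(*argv[2:])


# Simulates: python myscript.py myfunction hello
result = dispatch(["myscript.py", "myfunction", "hello"])
print(result)  # myfunction got hello
```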


Answer 5

Let’s make this a little easier on ourselves and just use a module…

Try: pip install compago

Then write:

import compago
app = compago.Application()

@app.command
def hello():
    print("hi there!")

@app.command
def goodbye():
    print("see ya later.")

if __name__ == "__main__":
    app.run()

Then use like so:

$ python test.py hello
hi there!

$ python test.py goodbye
see ya later.

Note: at the time of writing there is a bug with Python 3, but it works great with Python 2.

Edit: An even better option, in my opinion, is Google's fire module, which also makes it easy to pass function arguments. It is installed with pip install fire. From their GitHub:

Here’s a simple example.

import fire

class Calculator(object):
  """A simple calculator class."""

  def double(self, number):
    return 2 * number

if __name__ == '__main__':
  fire.Fire(Calculator)

Then, from the command line, you can run:

python calculator.py double 10  # 20
python calculator.py double --number=15  # 30
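If you would rather avoid a third-party package, the standard library's argparse can dispatch subcommands to functions in a similar way. The sketch below is a stdlib alternative, not part of compago or fire. It assumes Python 3.7+ (for required=True on add_subparsers), and build_parser and main are hypothetical names. The functions return their text so the result can be checked. main() accepts an explicit argument list here; in a real test.py you would call main() with no arguments inside an if __name__ == "__main__": guard, so that argparse reads sys.argv.

```python
import argparse


def hello():
    return "hi there!"


def goodbye():
    return "see ya later."


def build_parser():
    parser = argparse.ArgumentParser(description="Run a function by name.")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for func in (hello, goodbye):
        # One subcommand per function; set_defaults remembers which one to call.
        sub = subparsers.add_parser(func.__name__, help="run " + func.__name__ + "()")
        sub.set_defaults(func=func)
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    result = args.func()
    print(result)
    return result


# Simulates: python test.py hello
output = main(["hello"])  # prints: hi there!
```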

Answer 6

Interestingly enough, if the goal was to print to the command line console or perform some other minute python operation, you can pipe input into the python interpreter like so:

echo print("hi:)") | python

as well as pipe in files:

python < foo.py

Note that the file does not need a .py extension for the second form to work. Also note that in bash you may need to escape the parentheses and quotes:

echo print\(\"hi:\)\"\) | python
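Piping works because, when its standard input is not a terminal, the interpreter reads a whole program from stdin and runs it. The following is a minimal sketch of the same pipe driven from Python with subprocess, which sidesteps shell escaping entirely. It assumes Python 3.7+ (for capture_output).

```python
import subprocess
import sys

# Equivalent to: echo 'print("hi:)")' | python
completed = subprocess.run(
    [sys.executable],
    input='print("hi:)")\n',
    capture_output=True,
    text=True,
    check=True,
)
output = completed.stdout.strip()
print(output)  # hi:)
```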

Answer 7

If you install the runp package with pip install runp, it’s a matter of running:

runp myfile.py hello

You can find the repository at: https://github.com/vascop/runp


Answer 8

I needed to use various Python utilities (range, string, etc.) on the command line and wrote the tool pyfunc specifically for that. You can use it to enrich your command-line experience:

 $ pyfunc -m range -a 1 7 2
 1
 3
 5

 $ pyfunc -m string.upper -a test
 TEST

 $ pyfunc -m string.replace -a 'analyze what' 'what' 'this'
 analyze this

Answer 9

You can always start the interactive interpreter by typing python on the command line,

then import your file, e.g. import example_file,

then run the function with example_file.hello().

This avoids the weird .pyc copy that crops up every time you run python -c etc.

Maybe not as convenient as a single command, but it is a good quick fix for testing a file from the command line, and it lets you use Python itself to call and execute your file.


Answer 10

Something like this: call_from_terminal.py

# call_from_terminal.py
# Ex to run from terminal
# ip='"hi"'
# python -c "import call_from_terminal as cft; cft.test_term_fun(${ip})"
# or
# fun_name='call_from_terminal'
# python -c "import ${fun_name} as cft; cft.test_term_fun(${ip})"
def test_term_fun(ip):
    print(ip)

This works in bash.

$ ip='"hi"' ; fun_name='call_from_terminal' 
$ python -c "import ${fun_name} as cft; cft.test_term_fun(${ip})"
hi

Answer 11

Below is the Odd_Even_function.py file that has the definition of the function.

def OE(n):
    for a in range(n):
        if a % 2 == 0:
            print(a)
        else:
            print(a, "ODD")

To call it from the Command Prompt, the following options worked for me:

Option 1: <full path>\python.exe -c "import Odd_Even_function; Odd_Even_function.OE(100)"

Option 2: <full path>\python.exe -c "from Odd_Even_function import OE; OE(100)"

Thanks.


Answer 12

Run from the command line, this function appears to do nothing: it returns a value, and nothing ever prints that value. You can replace the return with print, or print the returned value instead.


Answer 13

Use the python-c tool (pip install python-c) and then simply write:

$ python-c foo 'hello()'

or in case you have no function name clashes in your python files:

$ python-c 'hello()'

Answer 14

First, call the function in the file as the other answers describe; otherwise nothing will appear in the output. After that, save the file and copy the path of its folder (for example by right-clicking the folder and choosing the copy-path option), then go to a terminal and type:

cd "the path of the folder"
python main.py

using the name of your own file instead of main.py. The terminal will then display the output of your code.


Answer 15

Make your life easier, install Spyder. Open your file then run it (click the green arrow). Afterwards your hello() method is defined and known to the IPython Console, so you can call it from the console.


What’s the difference between raw_input() and input() in Python 3?

Question: What’s the difference between raw_input() and input() in Python 3?

What is the difference between raw_input() and input() in Python 3?


Answer 0

The difference is that raw_input() does not exist in Python 3.x, while input() does. Actually, the old raw_input() has been renamed to input(), and the old input() is gone, but can easily be simulated by using eval(input()). (Remember that eval() is evil. Try to use safer ways of parsing your input if possible.)
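One of the safer ways to parse input is ast.literal_eval() from the standard library. It accepts only Python literals, such as numbers, strings, tuples, lists, dicts, sets, booleans and None, and raises ValueError for anything else, including function calls. The following is a minimal sketch. parse_literal is a hypothetical helper name, and fixed strings stand in for what input() would return.

```python
import ast


def parse_literal(text):
    """Turn typed text into a Python value, accepting only literals."""
    return ast.literal_eval(text)


# In Python 3, input() always returns a str; these strings stand in for typed input.
number = parse_literal("42")
items = parse_literal("[1, 2, 3]")
print(number + 1)  # 43
print(items)       # [1, 2, 3]

# Unlike eval(), literal_eval refuses expressions that would run code.
rejected = False
try:
    parse_literal("__import__('os').getcwd()")
except ValueError:
    rejected = True
print("rejected:", rejected)  # rejected: True
```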


Answer 1

In Python 2, raw_input() returns a string, and input() tries to run the input as a Python expression.

Since getting a string was almost always what you wanted, Python 3 does that with input(). As Sven says, if you ever want the old behaviour, eval(input()) works.
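The following is a minimal sketch contrasting the two behaviors on Python 3. It feeds the same line through a fake stdin: once to the built-in input(), which returns the text, and once to a hypothetical old_style_input() helper that wraps it in eval() the way Python 2's input() behaved. Replacing sys.stdin with io.StringIO is only a testing device; it lets input() read without anyone typing.

```python
import io
import sys


def old_style_input(prompt=""):
    """Mimic Python 2's input(): read a line, then evaluate it as an expression.
    Only use this with trusted input, because eval() runs arbitrary code."""
    return eval(input(prompt))


# Feed two identical lines through a fake stdin instead of typing them.
original_stdin = sys.stdin
sys.stdin = io.StringIO("3 + 4\n3 + 4\n")
try:
    as_text = input()             # Python 3 input(): always a str
    as_value = old_style_input()  # Python 2 style: the expression is evaluated
finally:
    sys.stdin = original_stdin

print(repr(as_text), repr(as_value))  # '3 + 4' 7
```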


Answer 2

Python 2:

  • raw_input() takes exactly what the user typed and passes it back as a string.

  • input() first takes the raw_input() and then performs an eval() on it as well.

The main difference is that input() expects a syntactically correct Python expression, whereas raw_input() does not.

Python 3:

  • raw_input() was renamed to input() so now input() returns the exact string.
  • Old input() was removed.

If you want the old input() behavior, meaning you need to evaluate user input as a Python expression, you have to do it manually with eval(input()).


Answer 3

In Python 3, raw_input() doesn’t exist, as Sven already mentioned.

In Python 2, the input() function evaluates your input.

Example:

>>> name = input("what is your name ?")
what is your name ?harsha

Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    name = input("what is your name ?")
  File "<string>", line 1, in <module>
NameError: name 'harsha' is not defined

In the example above, Python 2.x tries to evaluate harsha as a variable name rather than a string. To avoid that, the user can type the input in double quotes, like "harsha":

>>> name = input("what is your name?")
what is your name?"harsha"
>>> print(name)
harsha

raw_input()

The raw_input() function doesn’t evaluate anything; it just reads whatever you enter.

Example:

>>> name = raw_input("what is your name ?")
what is your name ?harsha
>>> name
'harsha'

Example:

>>> name = eval(raw_input("what is your name?"))
what is your name?harsha

Traceback (most recent call last):
  File "<pyshell#11>", line 1, in <module>
    name = eval(raw_input("what is your name?"))
  File "<string>", line 1, in <module>
NameError: name 'harsha' is not defined

In example above, I was just trying to evaluate the user input with the eval function.


Answer 4

I’d like to add a little more detail to the explanation everyone has given for Python 2 users. raw_input(), as you know by now, treats whatever data the user enters as a string. Python doesn’t even try to interpret the entered data: it is returned as a string whether it looks like a string, an int, or anything else.

input(), on the other hand, tries to interpret the data entered by the user as a Python expression. So an input like helloworld even produces an error along the lines of "name 'helloworld' is not defined".

In conclusion, in Python 2, to enter a string with input() you need to type it with quotes, like 'helloworld', which is the usual syntax for string literals in Python.


Answer 5

If you want to ensure that your code runs with both Python 2 and Python 3, use the function input() in your script and add this to the beginning of your script:

from sys import version_info
if version_info.major == 3:
    pass
elif version_info.major == 2:
    try:
        input = raw_input
    except NameError:
        pass
else:
    print ("Unknown python version - input function not safe")
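A shorter variant uses feature detection instead of checking version_info. It tries to look up raw_input, which exists only on Python 2, and falls back to input() when that lookup raises NameError. This is a sketch under the assumption that binding a new name is acceptable. read_line and ask are hypothetical names, chosen so the built-in input is not shadowed.

```python
try:
    read_line = raw_input  # noqa: F821  Python 2: returns the typed text unevaluated
except NameError:
    read_line = input      # Python 3: input() already returns a str


def ask(prompt):
    """Return the user's reply as a plain string on Python 2 and 3 alike."""
    return read_line(prompt)
```

Usage is then simply name = ask("what is your name? ").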