The formatting characters used here are the same as those used by strftime. Don’t miss the leading : in the format specifier.
Using format() instead of strftime() in most cases can make the code more readable, easier to write and consistent with the way formatted output is generated…
>>> from datetime import datetime
>>> "{} today's date is: {:%B %d, %Y}".format("Andre", datetime.now())
Compare the above with the following strftime() alternative…
>>> "{} today's date is {}".format("Andre", datetime.now().strftime("%B %d, %Y"))
Moreover, the following is not going to work…
>>> datetime.now().strftime("%s %B %d, %Y" % "Andre")
Traceback (most recent call last):
  File "<pyshell#11>", line 1, in <module>
    datetime.now().strftime("%s %B %d, %Y" % "Andre")
TypeError: not enough arguments for format string
And so on…
Answer 2
Use f-strings in Python 3.6 and later.
from datetime import datetime
date_string = f'{datetime.now():%Y-%m-%d %H:%M:%S%z}'
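A minimal sketch of the equivalence, assuming a fixed, hypothetical timestamp named moment so the result is reproducible. The text after the colon in a format spec is interpreted with the same directives as strftime, so all four spellings below produce the same string:

```python
from datetime import datetime

# Hypothetical fixed timestamp so the result does not depend on the clock.
moment = datetime(2020, 3, 9, 14, 5, 7)

spec = "%Y-%m-%d %H:%M:%S"

via_strftime = moment.strftime(spec)
via_str_format = "{:%Y-%m-%d %H:%M:%S}".format(moment)
via_fstring = f"{moment:%Y-%m-%d %H:%M:%S}"
via_builtin = format(moment, spec)

print(via_fstring)
```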
Assume that S and T are assigned sets. Without using the join operator |, how can I find the union of the two sets? This, for example, finds the intersection:
S = {1, 2, 3, 4}
T = {3, 4, 5, 6}
S_intersect_T = { i for i in S if i in T }
So how can I find the union of two sets in one line without using |?
The * unpacks the set. Unpacking is where an iterable (e.g. a set or list) is represented as every item it yields. This means the above example simplifies to {1, 2, 3, 4, 3, 4, 5, 6} which then simplifies to {1, 2, 3, 4, 5, 6} because the set can only contain unique items.
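The expression this explanation refers to is not shown in the text, so the first line below is an assumption about what it looked like. The other two lines are alternative one-liners that also avoid |:

```python
S = {1, 2, 3, 4}
T = {3, 4, 5, 6}

# Unpack both sets into a new set literal; duplicates collapse automatically.
union_by_unpacking = {*S, *T}

# The method form of the same operation.
union_by_method = S.union(T)

# A comprehension in the spirit of the intersection example in the question.
union_by_comprehension = {item for group in (S, T) for item in group}
```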
/home/udi/foo has some necessary subdirectories, like /home/udi/foo/log and /home/udi/foo/config, which /home/udi/foo/bar.py refers to.
The problem is that crontab runs the script from a different working directory, so trying to open ./log/bar.log fails.
Is there a nice way to tell the script to change the working directory to the script’s own directory? I would fancy a solution that would work for any script location, rather than explicitly telling the script where it is.
EDIT:
os.chdir(os.path.dirname(sys.argv[0]))
Was the most compact elegant solution. Thanks for your answers and explanations!
This will change your current working directory so that opening relative paths will work:
import os
os.chdir("/home/udi/foo")
However, you asked how to change into whatever directory your Python script is located, even if you don’t know what directory that will be when you’re writing your script. To do this, you can use the os.path functions:
import os
abspath = os.path.abspath(__file__)
dname = os.path.dirname(abspath)
os.chdir(dname)
This takes the filename of your script, converts it to an absolute path, then extracts the directory of that path, then changes into that directory.
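The same three steps can be wrapped in a small helper. This is only a sketch; the function name chdir_to_containing_dir is made up for illustration, and it takes the path as a parameter so it can be tried outside a real script:

```python
import os

def chdir_to_containing_dir(script_path):
    """Change the working directory to the directory that holds script_path."""
    directory = os.path.dirname(os.path.abspath(script_path))
    os.chdir(directory)
    return directory

# Inside a script you would typically call:
#     chdir_to_containing_dir(__file__)
```

Note that os.path.abspath does not resolve symbolic links; if the script is reached through a symlink, os.path.realpath may be the better choice.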
As initialized upon program startup, the first item of this list,
path[0], is the directory containing the script that was used to
invoke the Python interpreter
Your scripts and your data should not be mashed into one big directory. Put your code in some known location (site-packages or /var/opt/udi or something) separate from your data. Use good version control on your code to be sure that you have current and previous versions separated from each other so you can fall back to previous versions and test future versions.
Bottom line: Do not mingle code and data.
Data is precious. Code comes and goes.
Provide the working directory as a command-line argument value. You can provide a default as an environment variable. Don’t deduce it (or guess at it)
Make it a required argument value and do this.
import sys
import os
working= os.environ.get("WORKING_DIRECTORY","/some/default")
if len(sys.argv) > 1: working = sys.argv[1]
os.chdir( working )
Do not “assume” a directory based on the location of your software. It will not work out well in the long run.
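The lookup order used in the snippet above (command-line argument first, then environment variable, then a default) can be isolated in a function so it is easy to test. This is a sketch; the name resolve_working_dir is hypothetical, and WORKING_DIRECTORY and /some/default are taken from the snippet:

```python
def resolve_working_dir(argv, environ, default="/some/default"):
    """Pick the working directory: CLI argument first, then env var, then default."""
    if len(argv) > 1:
        return argv[1]
    return environ.get("WORKING_DIRECTORY", default)

# Typical use in a script (not executed here):
#     import os, sys
#     os.chdir(resolve_working_dir(sys.argv, os.environ))
```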
The (...) starts a sub-shell that your crond executes as a single command. The || exit 1 causes your cronjob to fail in case that the directory is unavailable.
Though the other solutions may be more elegant in the long run for your specific scripts, my example could still be useful in cases where you can’t modify the program or command that you want to execute.
You may need the following two functions, pad (used when encrypting) and unpad (used when decrypting), because the length of the input is not always a multiple of the block size (BS below).
BS = 16
pad = lambda s: s + (BS - len(s) % BS) * chr(BS - len(s) % BS)
unpad = lambda s : s[:-ord(s[len(s)-1:])]
So you’re asking about the length of the key? You can use the MD5 digest of the key rather than the key itself.
Also, in my limited experience with PyCrypto, the IV is used to vary the output of the encryption when the input is the same. The IV is therefore chosen as a random string, included as part of the encryption output, and then used again to decrypt the message.
Here’s my implementation; I hope it is useful to you:
import base64
from Crypto.Cipher import AES
from Crypto import Random

class AESCipher:
    def __init__( self, key ):
        self.key = key

    def encrypt( self, raw ):
        raw = pad(raw)
        iv = Random.new().read( AES.block_size )
        cipher = AES.new( self.key, AES.MODE_CBC, iv )
        return base64.b64encode( iv + cipher.encrypt( raw ) )

    def decrypt( self, enc ):
        enc = base64.b64decode(enc)
        iv = enc[:16]
        cipher = AES.new(self.key, AES.MODE_CBC, iv )
        return unpad(cipher.decrypt( enc[16:] ))
Let me address your question about “modes.” AES256 is a kind of block cipher. It takes as input a 32-byte key and a 16-byte string, called the block and outputs a block. We use AES in a mode of operation in order to encrypt. The solutions above suggest using CBC, which is one example. Another is called CTR, and it’s somewhat easier to use:
import binascii
from Crypto.Cipher import AES
from Crypto.Util import Counter
from Crypto import Random

# AES supports multiple key sizes: 16 (AES128), 24 (AES192), or 32 (AES256).
key_bytes = 32

# Takes as input a 32-byte key and an arbitrary-length plaintext and returns a
# pair (iv, ciphertext). "iv" stands for initialization vector.
def encrypt(key, plaintext):
    assert len(key) == key_bytes
    # Choose a random, 16-byte IV.
    iv = Random.new().read(AES.block_size)
    # Convert the IV to a Python integer.
    iv_int = int(binascii.hexlify(iv), 16)
    # Create a new Counter object with IV = iv_int.
    ctr = Counter.new(AES.block_size * 8, initial_value=iv_int)
    # Create AES-CTR cipher.
    aes = AES.new(key, AES.MODE_CTR, counter=ctr)
    # Encrypt and return IV and ciphertext.
    ciphertext = aes.encrypt(plaintext)
    return (iv, ciphertext)

# Takes as input a 32-byte key, a 16-byte IV, and a ciphertext, and outputs the
# corresponding plaintext.
def decrypt(key, iv, ciphertext):
    assert len(key) == key_bytes
    # Initialize counter for decryption. iv should be the same as the output of
    # encrypt(). The conversion to an integer matches the one in encrypt().
    iv_int = int(binascii.hexlify(iv), 16)
    ctr = Counter.new(AES.block_size * 8, initial_value=iv_int)
    # Create AES-CTR cipher.
    aes = AES.new(key, AES.MODE_CTR, counter=ctr)
    # Decrypt and return the plaintext.
    plaintext = aes.decrypt(ciphertext)
    return plaintext

# A fresh random key; key generation is discussed below.
key = Random.new().read(key_bytes)
(iv, ciphertext) = encrypt(key, b'hella')
print(decrypt(key, iv, ciphertext))
This is often referred to as AES-CTR. I would advise caution in using AES-CBC with PyCrypto. The reason is that it requires you to specify the padding scheme, as exemplified by the other solutions given. In general, if you’re not very careful about the padding, there are attacks that completely break encryption!
Now, it’s important to note that the key must be a random, 32-byte string; a password does not suffice. Normally, the key is generated like so:
# Nominal way to generate a fresh key. This calls the system's random number
# generator (RNG).
key1 = Random.new().read(key_bytes)
A key may be derived from a password, too:
from Crypto.Protocol.KDF import PBKDF2

# It's also possible to derive a key from a password, but it's important that
# the password have high entropy, meaning difficult to predict.
password = "This is a rather weak password."
# For added security, we add a "salt", which increases the entropy.
#
# In this example, we use the same RNG to produce the salt that we used to
# produce key1.
salt_bytes = 8
salt = Random.new().read(salt_bytes)
# Stands for "Password-based key derivation function 2"
key2 = PBKDF2(password, salt, key_bytes)
Some solutions above suggest using SHA256 for deriving the key, but this is generally considered bad cryptographic practice.
Check out wikipedia for more on modes of operation.
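If PyCrypto's PBKDF2 helper is not available, the standard library offers the same function as hashlib.pbkdf2_hmac. The following is a sketch rather than a recommendation: the function name derive_key, the 16-byte salt, and the iteration count are assumptions chosen for illustration.

```python
import hashlib
import os

def derive_key(password, salt, key_bytes=32, iterations=200_000):
    """Derive key_bytes of key material from a text password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=key_bytes
    )

# The salt is not secret; it has to be stored so the key can be re-derived.
salt = os.urandom(16)
key = derive_key("This is a rather weak password.", salt)
```

Deriving the key again with the same password and salt gives the same bytes; a different salt gives an unrelated key.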
For anyone who would like to use urlsafe_b64encode and urlsafe_b64decode, here is the version that works for me (after spending some time on the unicode issue).
You can get a passphrase out of an arbitrary password by using a cryptographic hash function (NOT Python’s builtin hash) like SHA-1 or SHA-256. Python includes support for both in its standard library:
import hashlib
hashlib.sha1("this is my awesome password").digest() # => a 20 byte string
hashlib.sha256("another awesome password").digest() # => a 32 byte string
You can truncate a cryptographic hash value just by using [:16] or [:24] and it will retain its security up to the length you specify.
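On Python 3, hashlib accepts only bytes, so the password has to be encoded first. A minimal sketch of the hashing and truncation described above, with a hypothetical password and UTF-8 assumed as the encoding:

```python
import hashlib

password = "this is my awesome password"   # hypothetical password

# hashlib works on bytes, so text must be encoded first in Python 3.
digest_sha1 = hashlib.sha1(password.encode("utf-8")).digest()      # 20 bytes
digest_sha256 = hashlib.sha256(password.encode("utf-8")).digest()  # 32 bytes

# Truncate the SHA-256 digest to the key size the cipher expects.
aes128_key = digest_sha256[:16]
aes192_key = digest_sha256[:24]
aes256_key = digest_sha256
```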
Grateful for the other answers which inspired but didn’t work for me.
After spending hours trying to figure out how it works, I came up with the implementation below with the newest PyCryptodomex library (it is another story how I managed to set it up behind proxy, on Windows, in a virtualenv.. phew)
Working on your implementation, remember to write down padding, encoding, encrypting steps (and vice versa). You have to pack and unpack keeping in mind the order.
For the benefit of others, here is my decryption implementation, which I arrived at by combining the answers of @Cyril and @Marcus. It assumes the input arrives via an HTTP request with the encrypted text URL-quoted and base64-encoded.
import base64
import urllib2
from Crypto.Cipher import AES

def decrypt(quotedEncodedEncrypted):
    key = 'SecretKey'
    encodedEncrypted = urllib2.unquote(quotedEncodedEncrypted)
    cipher = AES.new(key)
    decrypted = cipher.decrypt(base64.b64decode(encodedEncrypted))[:16]
    for i in range(1, len(base64.b64decode(encodedEncrypted))/16):
        cipher = AES.new(key, AES.MODE_CBC, base64.b64decode(encodedEncrypted)[(i-1)*16:i*16])
        decrypted += cipher.decrypt(base64.b64decode(encodedEncrypted)[i*16:])[:16]
    return decrypted.strip()
Another take on this (heavily derived from solutions above) but
uses null for padding
does not use lambda (never been a fan)
tested with python 2.7 and 3.6.5
#!/usr/bin/python2.7
# you'll have to adjust for your setup, e.g., #!/usr/bin/python3
import base64, re
from Crypto.Cipher import AES
from Crypto import Random
from django.conf import settings

class AESCipher:
    """
    Usage:
    aes = AESCipher( settings.SECRET_KEY[:16], 32)
    encryp_msg = aes.encrypt( 'ppppppppppppppppppppppppppppppppppppppppppppppppppppppp' )
    msg = aes.decrypt( encryp_msg )
    print("'{}'".format(msg))
    """
    def __init__(self, key, blk_sz):
        self.key = key
        self.blk_sz = blk_sz

    def encrypt( self, raw ):
        if raw is None or len(raw) == 0:
            raise NameError("No value given to encrypt")
        raw = raw + '\0' * (self.blk_sz - len(raw) % self.blk_sz)
        raw = raw.encode('utf-8')
        iv = Random.new().read( AES.block_size )
        cipher = AES.new( self.key.encode('utf-8'), AES.MODE_CBC, iv )
        return base64.b64encode( iv + cipher.encrypt( raw ) ).decode('utf-8')

    def decrypt( self, enc ):
        if enc is None or len(enc) == 0:
            raise NameError("No value given to decrypt")
        enc = base64.b64decode(enc)
        iv = enc[:16]
        cipher = AES.new(self.key.encode('utf-8'), AES.MODE_CBC, iv )
        return re.sub(b'\x00*$', b'', cipher.decrypt( enc[16:])).decode('utf-8')
Answer 8
I used both the Crypto and PyCryptodomex libraries, and it is blazing fast…
import base64
import hashlib
from Cryptodome.Cipher import AES as domeAES
from Cryptodome.Random import get_random_bytes
from Crypto import Random
from Crypto.Cipher import AES as cryptoAES

# AES is imported only under aliases, so take the block size from one of them.
BLOCK_SIZE = domeAES.block_size

key = "my_secret_key".encode()
__key__ = hashlib.sha256(key).digest()
print(__key__)

def encrypt(raw):
    BS = cryptoAES.block_size
    pad = lambda s: s + (BS - len(s) % BS) * chr(BS - len(s) % BS)
    raw = base64.b64encode(pad(raw).encode('utf8'))
    iv = get_random_bytes(cryptoAES.block_size)
    cipher = cryptoAES.new(key=__key__, mode=cryptoAES.MODE_CFB, iv=iv)
    a = base64.b64encode(iv + cipher.encrypt(raw))
    IV = Random.new().read(BLOCK_SIZE)
    aes = domeAES.new(__key__, domeAES.MODE_CFB, IV)
    b = base64.b64encode(IV + aes.encrypt(a))
    return b

def decrypt(enc):
    passphrase = __key__
    encrypted = base64.b64decode(enc)
    IV = encrypted[:BLOCK_SIZE]
    aes = domeAES.new(passphrase, domeAES.MODE_CFB, IV)
    enc = aes.decrypt(encrypted[BLOCK_SIZE:])
    unpad = lambda s: s[:-ord(s[-1:])]
    enc = base64.b64decode(enc)
    iv = enc[:cryptoAES.block_size]
    cipher = cryptoAES.new(__key__, cryptoAES.MODE_CFB, iv)
    b = unpad(base64.b64decode(cipher.decrypt(enc[cryptoAES.block_size:])).decode('utf8'))
    return b

encrypted_data = encrypt("Hi Steven!!!!!")
print(encrypted_data)
print("=======")
decrypted_data = decrypt(encrypted_data)
print(decrypted_data)
It’s a little late, but I think this will be very helpful. No one has mentioned using a scheme like PKCS#7 padding. You can use it instead of the previous functions to pad (when encrypting) and unpad (when decrypting). The full source code is below.
import StringIO
import binascii

def decode(text, k=16):
    nl = len(text)
    val = int(binascii.hexlify(text[-1]), 16)
    if val > k:
        raise ValueError('Input is not padded or padding is corrupt')
    l = nl - val
    return text[:l]

def encode(text, k=16):
    l = len(text)
    output = StringIO.StringIO()
    val = k - (l % k)
    for _ in xrange(val):
        output.write('%02x' % val)
    return text + binascii.unhexlify(output.getvalue())
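The code above targets Python 2 (StringIO, xrange). A sketch of the same PKCS#7 scheme for Python 3 bytes follows; the function names pkcs7_pad and pkcs7_unpad are made up here, and the unpad step checks every padding byte rather than only the last one:

```python
def pkcs7_pad(data, block_size=16):
    """Append N bytes of value N so len(result) is a multiple of block_size."""
    pad_len = block_size - (len(data) % block_size)
    return data + bytes([pad_len]) * pad_len

def pkcs7_unpad(padded, block_size=16):
    """Strip PKCS#7 padding, raising ValueError if it is malformed."""
    if not padded or len(padded) % block_size != 0:
        raise ValueError("Input length is not a positive multiple of the block size")
    pad_len = padded[-1]
    if pad_len < 1 or pad_len > block_size:
        raise ValueError("Input is not padded or padding is corrupt")
    if padded[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("Input is not padded or padding is corrupt")
    return padded[:-pad_len]
```

Input whose length is already a multiple of the block size gains a full extra block of padding, which is what makes the unpad step unambiguous.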
I recently began branching out from my safe place (R) into Python and am a bit confused by cell localization/selection in Pandas. I’ve read the documentation, but I’m struggling to understand the practical implications of the various localization/selection options.
Is there a reason why I should ever use .loc or .iloc over the most general option .ix?
I understand that .loc, iloc, at, and iat may provide some guaranteed correctness that .ix can’t offer, but I’ve also read where .ix tends to be the fastest solution across the board.
Please explain the real-world, best-practices reasoning behind utilizing anything other than .ix?
loc: works only on index labels
iloc: works on position
ix: can get data from the DataFrame without the key being in the index
at: gets scalar values; a very fast loc
iat: gets scalar values; a very fast iloc
Updated for pandas 0.20, given that ix is deprecated. This demonstrates not only how to use loc, iloc, at, iat, and set_value, but also how to accomplish mixed positional/label-based indexing.
loc – label based
Allows you to pass 1-D arrays as indexers. Arrays can be either slices (subsets) of the index or column, or they can be boolean arrays which are equal in length to the index or columns.
Special Note: when a scalar indexer is passed, loc can assign a new index or column value that didn’t exist before.
# label based, but we can use position values
# to get the labels from the index object
df.loc[df.index[2], 'ColName'] = 3
df.loc[df.index[1:3], 'ColName'] = 3
iloc – position based
Similar to loc except with positions rather than index values. However, you cannot assign new columns or indices.
# position based, but we can get the position
# from the columns object via the `get_loc` method
df.iloc[2, df.columns.get_loc('ColName')] = 3
df.iloc[2, 4] = 3
df.iloc[:3, 2:4] = 3
at – label based
Works very similar to loc for scalar indexers. Cannot operate on array indexers. Can! assign new indices and columns.
Advantage over loc is that this is faster. Disadvantage is that you can’t use arrays for indexers.
# label based, but we can use position values
# to get the labels from the index object
df.at[df.index[2], 'ColName'] = 3
df.at['C', 'ColName'] = 3
iat – position based
Works similarly to iloc. Cannot work in array indexers. Cannot! assign new indices and columns.
Advantage over iloc is that this is faster. Disadvantage is that you can’t use arrays for indexers.
# position based, but we can get the position
# from the columns object via the `get_loc` method
IBM.iat[2, IBM.columns.get_loc('PNL')] = 3
set_value – label based
Works very similar to loc for scalar indexers. Cannot operate on array indexers. Can! assign new indices and columns
Advantage Super fast, because there is very little overhead! Disadvantage There is very little overhead because pandas is not doing a bunch of safety checks. Use at your own risk. Also, this is not intended for public use.
# label based, but we can use position values
# to get the labels from the index object
df.set_value(df.index[2], 'ColName', 3)
set_value with takable=True – position based
Works similarly to iloc. Cannot work in array indexers. Cannot! assign new indices and columns.
Advantage Super fast, because there is very little overhead! Disadvantage There is very little overhead because pandas is not doing a bunch of safety checks. Use at your own risk. Also, this is not intended for public use.
# position based, but we can get the position
# from the columns object via the `get_loc` method
df.set_value(2, df.columns.get_loc('ColName'), 3, takable=True)
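The four indexers can be tried side by side on a small frame. This is a sketch with a hypothetical DataFrame (labels and values are made up); it leaves out set_value, which was later deprecated and is absent from pandas 1.0 and newer:

```python
import pandas as pd

# Hypothetical frame; the labels and values are made up for illustration.
df = pd.DataFrame(
    {"ColName": [10, 20, 30, 40], "Other": [1.0, 2.0, 3.0, 4.0]},
    index=["A", "B", "C", "D"],
)

col_pos = df.columns.get_loc("ColName")   # integer location of the column

df.loc["A", "ColName"] = 3      # label based; also accepts lists and slices
df.iloc[1, col_pos] = 3         # integer-location based
df.at["C", "ColName"] = 3       # label based, scalar only
df.iat[3, col_pos] = 3          # integer-location based, scalar only

# Mixing the two styles: translate one side, then use a single indexer.
df.loc[df.index[2], "Other"] = 0.0
df.iloc[df.index.get_loc("D"), df.columns.get_loc("Other")] = 0.0
```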
There are two primary ways that pandas makes selections from a DataFrame.
By Label
By Integer Location
The documentation uses the term position for referring to integer location. I do not like this terminology as I feel it is confusing. Integer location is more descriptive and is exactly what .iloc stands for. The key word here is INTEGER – you must use integers when selecting by integer location.
Before showing the summary let’s all make sure that …
.ix is deprecated and ambiguous and should never be used
There are three primary indexers for pandas. We have the indexing operator itself (the brackets []), .loc, and .iloc. Let’s summarize them:
[] – Primarily selects subsets of columns, but can select rows as well. Cannot simultaneously select rows and columns.
.loc – selects subsets of rows and columns by label only
.iloc – selects subsets of rows and columns by integer location only
I almost never use .at or .iat, as they add no additional functionality and give only a small performance increase. I would discourage their use unless you have a very time-sensitive application. Regardless, here is their summary:
.at selects a single scalar value in the DataFrame by label only
.iat selects a single scalar value in the DataFrame by integer location only
In addition to selection by label and integer location, boolean selection also known as boolean indexing exists.
Examples explaining .loc, .iloc, boolean selection and .at and .iat are shown below
We will first focus on the differences between .loc and .iloc. Before we talk about the differences, it is important to understand that DataFrames have labels that help identify each column and each row. Let’s take a look at a sample DataFrame:
All the words in bold are the labels. The labels, age, color, food, height, score and state are used for the columns. The other labels, Jane, Nick, Aaron, Penelope, Dean, Christina, Cornelia are used as labels for the rows. Collectively, these row labels are known as the index.
The primary ways to select particular rows in a DataFrame are with the .loc and .iloc indexers. Each of these indexers can also be used to simultaneously select columns but it is easier to just focus on rows for now. Also, each of the indexers use a set of brackets that immediately follow their name to make their selections.
.loc selects data only by labels
We will first talk about the .loc indexer which only selects data by the index or column labels. In our sample DataFrame, we have provided meaningful names as values for the index. Many DataFrames will not have any meaningful names and will instead, default to just the integers from 0 to n-1, where n is the length(number of rows) of the DataFrame.
Slice notation using strings as the start and stop values
Selecting a single row with .loc with a string
To select a single row of data, place the index label inside of the brackets following .loc.
df.loc['Penelope']
This returns the row of data as a Series
age 4
color white
food Apple
height 80
score 3.3
state AL
Name: Penelope, dtype: object
Selecting multiple rows with .loc with a list of strings
df.loc[['Cornelia', 'Jane', 'Dean']]
This returns a DataFrame with the rows in the order specified in the list:
Selecting multiple rows with .loc with slice notation
Slice notation is defined by a start, stop and step values. When slicing by label, pandas includes the stop value in the return. The following slices from Aaron to Dean, inclusive. Its step size is not explicitly defined but defaulted to 1.
df.loc['Aaron':'Dean']
Complex slices can be taken in the same manner as Python lists.
.iloc selects data only by integer location
Let’s now turn to .iloc. Every row and column of data in a DataFrame has an integer location that defines it. This is in addition to the label that is visually displayed in the output. The integer location is simply the number of rows/columns from the top/left beginning at 0.
Slice notation using integers as the start and stop values
Selecting a single row with .iloc with an integer
df.iloc[4]
This returns the 5th row (integer location 4) as a Series
age 32
color gray
food Cheese
height 180
score 1.8
state AK
Name: Dean, dtype: object
Selecting multiple rows with .iloc with a list of integers
df.iloc[[2, -2]]
This returns a DataFrame of the third and second to last rows:
Selecting multiple rows with .iloc with slice notation
df.iloc[:5:3]
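The one practical difference between the two kinds of slices is the stop value: label slices include it, integer-location slices do not. A minimal sketch on a hypothetical frame (the sample DataFrame from this answer is not reproduced in the text):

```python
import pandas as pd

# Hypothetical frame with string row labels.
df = pd.DataFrame({"value": [0, 1, 2, 3, 4]}, index=list("abcde"))

by_label = df.loc["b":"d"]      # stop label 'd' IS included
by_position = df.iloc[1:3]      # stop position 3 is NOT included
stepped = df.iloc[:5:3]         # positions 0 and 3
```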
Simultaneous selection of rows and columns with .loc and .iloc
One excellent ability of both .loc/.iloc is their ability to select both rows and columns simultaneously. In the examples above, all the columns were returned from each selection. We can choose columns with the same types of inputs as we do for rows. We simply need to separate the row and column selection with a comma.
For example, we can select rows Jane, and Dean with just the columns height, score and state like this:
df.loc[['Jane', 'Dean'], 'height':]
This uses a list of labels for the rows and slice notation for the columns
We can naturally do similar operations with .iloc using only integers.
df.iloc[[1,4], 2]
Nick Lamb
Dean Cheese
Name: food, dtype: object
Simultaneous selection with labels and integer location
.ix was used to make selections simultaneously with labels and integer location which was useful but confusing and ambiguous at times and thankfully it has been deprecated. In the event that you need to make a selection with a mix of labels and integer locations, you will have to make both your selections labels or integer locations.
For instance, if we want to select rows Nick and Cornelia along with columns 2 and 4, we could use .loc by converting the integers to labels with the following:
Or alternatively, convert the index labels to integers with the get_loc index method.
labels = ['Nick', 'Cornelia']
index_ints = [df.index.get_loc(label) for label in labels]
df.iloc[index_ints, [2, 4]]
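Both conversions can be checked against each other. The sketch below uses a hypothetical three-row stand-in for the sample DataFrame (same column names, made-up values) and shows that the label route and the integer route select the same cells:

```python
import pandas as pd

# Hypothetical stand-in for the sample DataFrame; every value is made up.
df = pd.DataFrame(
    {
        "age": [30, 2, 12],
        "color": ["blue", "green", "red"],
        "food": ["Steak", "Lamb", "Beans"],
        "height": [165, 70, 150],
        "score": [4.6, 8.3, 2.2],
        "state": ["NY", "TX", "TX"],
    },
    index=["Jane", "Nick", "Cornelia"],
)

rows = ["Nick", "Cornelia"]

# Option 1: turn the integer column locations into labels, then use .loc.
col_labels = df.columns[[2, 4]]
by_label = df.loc[rows, col_labels]

# Option 2: turn the row labels into integer locations, then use .iloc.
row_ints = [df.index.get_loc(label) for label in rows]
by_position = df.iloc[row_ints, [2, 4]]
```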
Boolean Selection
The .loc indexer can also do boolean selection. For instance, if we are interested in finding all the rows where age is above 30 and return just the food and score columns we can do the following:
df.loc[df['age'] > 30, ['food', 'score']]
You can replicate this with .iloc but you cannot pass it a boolean series. You must convert the boolean Series into a numpy array like this:
df.iloc[(df['age'] > 30).values, [2, 4]]
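A sketch of the two boolean selections on a hypothetical frame (made-up values). It uses Series.to_numpy(), available since pandas 0.24, in place of .values to hand .iloc a plain boolean array:

```python
import pandas as pd

# Hypothetical frame; the values are made up for illustration.
df = pd.DataFrame(
    {
        "age": [35, 2, 32],
        "food": ["Steak", "Lamb", "Cheese"],
        "score": [4.6, 8.3, 1.8],
    },
    index=["Jane", "Nick", "Dean"],
)

mask = df["age"] > 30                       # boolean Series aligned on the index

by_label = df.loc[mask, ["food", "score"]]
by_position = df.iloc[mask.to_numpy(), [1, 2]]   # .iloc needs a plain array
```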
Selecting all rows
It is possible to use .loc/.iloc for just column selection. You can select all the rows by using a colon like this:
df.loc[:, 'color':'score':2]
The indexing operator, [], can select rows and columns too, but not simultaneously.
Most people are familiar with the primary purpose of the DataFrame indexing operator, which is to select columns. A string selects a single column as a Series and a list of strings selects multiple columns as a DataFrame.
df['food']
Jane Steak
Nick Lamb
Aaron Mango
Penelope Apple
Dean Cheese
Christina Melon
Cornelia Beans
Name: food, dtype: object
Using a list selects multiple columns
df[['food', 'score']]
What people are less familiar with, is that, when slice notation is used, then selection happens by row labels or by integer location. This is very confusing and something that I almost never use but it does work.
df['Penelope':'Christina'] # slice rows by label
df[2:6:2] # slice rows by integer location
The explicitness of .loc/.iloc for selecting rows is highly preferred. The indexing operator alone is unable to select rows and columns simultaneously.
Selection with .at is nearly identical to .loc but it only selects a single ‘cell’ in your DataFrame. We usually refer to this cell as a scalar value. To use .at, pass it both a row and column label separated by a comma.
df.at['Christina', 'color']
'black'
Selection with .iat is nearly identical to .iloc but it only selects a single scalar value. You must pass it an integer for both the row and column locations
df.iat[2, 5]
'FL'
Answer 3
df = pd.DataFrame({'A':['a', 'b', 'c'], 'B':[54, 67, 89]}, index=[100, 200, 300])
df
A B
100 a 54
200 b 67
300 c 89
In [19]:
df.loc[100]
Out[19]:
A a
B 54
Name: 100, dtype: object
In [20]:
df.iloc[0]
Out[20]:
A a
B 54
Name: 100, dtype: object
In [24]:
df2 = df.set_index([df.index,'A'])
df2
Out[24]:
B
A
100 a 54
200 b 67
300 c 89
In [25]:
df2.ix[100, 'a']
Out[25]:
B 54
Name: (100, a), dtype: int64
Answer 4
Let’s start with this small df:
import pandas as pd
import time as tm
import numpy as np
n=10
a=np.arange(0,n**2)
df=pd.DataFrame(a.reshape(n,n))
df.iloc[3,3]
Out[33]: 33
df.iat[3,3]
Out[34]: 33
df.iloc[:3,:3]
Out[35]:
    0   1   2
0   0   1   2
1  10  11  12
2  20  21  22
df.iat[:3,:3]
Traceback (most recent call last):
... omissis ...
ValueError: At based indexing on an integer index can only have integer indexers
Thus we cannot use .iat for subsets; for those we must use .iloc.
But let’s try both to select from a larger df and check the speed …
# -*- coding: utf-8 -*-
"""
Created on Wed Feb 7 09:58:39 2018
@author: Fabio Pomi
"""
import pandas as pd
import time as tm
import numpy as np
n=1000
a=np.arange(0,n**2)
df=pd.DataFrame(a.reshape(n,n))
t1=tm.time()
for j in df.index:
    for i in df.columns:
        a=df.iloc[j,i]
t2=tm.time()
for j in df.index:
    for i in df.columns:
        a=df.iat[j,i]
t3=tm.time()
loc=t2-t1
at=t3-t2
prc = loc/at *100
print('\nloc:%f at:%f prc:%f' %(loc,at,prc))
loc:10.485600 at:7.395423 prc:141.784987
So with .iloc we can manage subsets and with .iat only a single scalar, but .iat is faster than .iloc (the variables named loc and at in the script hold the .iloc and .iat timings).
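A variant of the same comparison using the standard timeit module, as a sketch. The frame size n is a hypothetical, deliberately small value so that it runs quickly; the absolute timings depend on the machine and the pandas version, so none are quoted here:

```python
import timeit

import numpy as np
import pandas as pd

n = 50   # small hypothetical size so the sketch runs quickly
df = pd.DataFrame(np.arange(n * n).reshape(n, n))

def read_all_iloc():
    """Read every cell one at a time with .iloc."""
    return [df.iloc[j, i] for j in range(n) for i in range(n)]

def read_all_iat():
    """Read every cell one at a time with .iat."""
    return [df.iat[j, i] for j in range(n) for i in range(n)]

iloc_seconds = timeit.timeit(read_all_iloc, number=3)
iat_seconds = timeit.timeit(read_all_iat, number=3)
print("iloc: %.4f s  iat: %.4f s" % (iloc_seconds, iat_seconds))
```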
import requests
headers = {
    'Content-Type': 'application/json',
}
params = (
    ('key', 'mykeyhere'),
)
data = open('request.json')
response = requests.post('https://www.googleapis.com/qpxExpress/v1/trips/search', headers=headers, params=params, data=data)
#NB. Original query string below. It seems impossible to parse and
#reproduce query strings 100% accurately so the one below is given
#in case the reproduced version is not "correct".
# response = requests.post('https://www.googleapis.com/qpxExpress/v1/trips/search?key=mykeyhere', headers=headers, data=data)
Check this link; it will help convert a cURL command to Python, PHP and Node.js.
Answer 4
My answer is WRT Python 2.6.2.
import commands
status, output = commands.getstatusoutput("curl -H \"Content-Type:application/json\" -k -u (few other parameters required) -X GET https://example.org -s")
print output
Some background: I went looking for exactly this question because I had to do something to retrieve content, but all I had available was an old version of python with inadequate SSL support. If you’re on an older MacBook, you know what I’m talking about. In any case, curl runs fine from a shell (I suspect it has modern SSL support linked in) so sometimes you want to do this without using requests or urllib2.
You can use the subprocess module to execute curl and get at the retrieved content:
import subprocess
# 'response' contains a byte string with the retrieved content.
# use '-s' to keep curl quiet while it does its job, but
# it's useful to omit that while you're still writing code
# so you know if curl is working
# (baseURL and page_num are defined elsewhere in your script)
response = subprocess.check_output(['curl', '-s', baseURL % page_num])
Python 3’s subprocess module also contains .run() with a number of useful options. I’ll leave it to someone who is actually running python 3 to provide that answer.
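For what it's worth, a Python 3 version might look roughly like the sketch below. The helper name run_and_capture is made up for this example, and the demo runs the Python interpreter itself as a stand-in command so it works without curl or network access.

```python
import subprocess
import sys


def run_and_capture(command, timeout=30):
    """Run a command and return its standard output as bytes.

    check=True raises CalledProcessError on a non-zero exit status,
    which is the same behaviour check_output gives you.
    """
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,
        timeout=timeout,
    )
    return completed.stdout


if __name__ == "__main__":
    # Stand-in command; for curl you would pass something like
    # ['curl', '-s', url] instead.
    output = run_and_capture([sys.executable, "-c", "print('hello')"])
    print(output.decode().strip())
```

The returned value is bytes, just like with check_output, so decode it if you need text.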
Traceback (most recent call last):
  File "test_searborn.py", line 11, in <module>
    fig = sns_plot.get_figure()
AttributeError: 'PairGrid' object has no attribute 'get_figure'
The suggested solutions are incompatible with Seaborn 0.8.1. Because the Seaborn interface has changed, they give the following errors:
AttributeError: 'AxesSubplot' object has no attribute 'fig'
when trying to access the figure, and
AttributeError: 'AxesSubplot' object has no attribute 'savefig'
when trying to use savefig directly as a function.
The following calls allow you to access the figure (Seaborn 0.8.1 compatible):
UPDATE:
I have recently used PairGrid object from seaborn to generate a plot similar to the one in this example.
In this case, since GridPlot is not a plot object like, for example, sns.swarmplot, it has no get_figure() function.
It is possible to directly access the matplotlib figure by
fig = myGridPlotObject.fig
Like previously suggested in other posts in this thread.
Some of the above solutions did not work for me. The .fig attribute was not found when I tried that and I was unable to use .savefig() directly. However, what did work was:
sns_plot.figure.savefig("output.png")
I am a newer Python user, so I do not know if this is due to an update. I wanted to mention it in case anybody else runs into the same issues as I did.
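Since the right attribute seems to depend on both the seaborn version and the kind of object returned, one way around it is a small helper that tries each option in turn. This is only a sketch: resolve_figure is a hypothetical name, and it assumes (as far as I know) that Axes objects expose .figure and get_figure(), while grid objects such as PairGrid expose .fig in older releases and .figure in newer ones. The demo uses plain stand-in objects so it runs without seaborn installed.

```python
from types import SimpleNamespace


def resolve_figure(plot_obj):
    """Return the figure behind whatever a seaborn plotting call returned."""
    # Try the attribute names first: .figure, then the older .fig.
    for attr in ("figure", "fig"):
        fig = getattr(plot_obj, attr, None)
        if fig is not None:
            return fig
    # Fall back to the get_figure() method that Axes objects provide.
    getter = getattr(plot_obj, "get_figure", None)
    if callable(getter):
        return getter()
    raise AttributeError("could not find a figure on %r" % (plot_obj,))


if __name__ == "__main__":
    # Stand-ins for an Axes-like object and an old-style grid object.
    axes_like = SimpleNamespace(figure="figure-from-axes")
    grid_like = SimpleNamespace(fig="figure-from-grid")
    print(resolve_figure(axes_like))
    print(resolve_figure(grid_like))
```

With a real plot you would then call resolve_figure(sns_plot).savefig("output.png").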
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
sns.factorplot(x='holiday',data=data,kind='count',size=5,aspect=1)
plt.savefig('holiday-vs-count.png')
It's also possible to just create a matplotlib figure object and then use plt.savefig(...):
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
df = sns.load_dataset('iris')
plt.figure() # Push new figure on stack
sns_plot = sns.pairplot(df, hue='species', size=2.5)
plt.savefig('output.png') # Save that figure
$ conda create --name 3point6 python=3.6
Fetching package metadata .......
Solving package specifications: ..........
Package plan for installation in environment /Users/dstansby/miniconda3/envs/3point6:
The following NEW packages will be INSTALLED:
    openssl: 1.0.2j-0
    pip: 9.0.1-py36_1
    python: 3.6.0-0
    readline: 6.2-2
    setuptools: 27.2.0-py36_0
    sqlite: 3.13.0-0
    tk: 8.5.18-0
    wheel: 0.29.0-py36_0
    xz: 5.2.2-1
    zlib: 1.2.8-3
I also had to conda remove some packages not on the official list:
backports_abc
beautiful-soup
blaze-core
Depending on packages installed on your system, you may get additional UnsatisfiableError errors – simply add those packages to the remove list. Next, install the version of Python,
conda install python==3.6
which takes a while, after which a message indicated to conda install anaconda-client, so I did
conda install anaconda-client
which said it’s already there. Finally, following the directions,
conda update anaconda
I did this in the Windows 10 command prompt, but things should be similar in Mac OS X.
In the past, I have found it quite difficult to try to upgrade in-place.
Note: my use-case for Anaconda is as an all-in-one Python environment. I don’t bother with separate virtual environments. If you’re using conda to create environments, this may be destructive because conda creates environments with hard-links inside your Anaconda/envs directory.
So if you use environments, you may first want to export your environments. After activating your environment, do something like:
conda env export > environment.yml
After backing up your environments (if necessary), you may remove your old Anaconda (it’s very simple to uninstall Anaconda):
$ rm -rf ~/anaconda3/
and replace it by downloading the new Anaconda, e.g. Linux, 64 bit:
$ cd ~/Downloads
$ wget https://repo.continuum.io/archive/Anaconda3-4.3.0-Linux-x86_64.sh
with open('old_env.yml', 'r') as fin, open('new_env.yml', 'w') as fout:
    for line in fin:
        if 'py35' in line:  # replace by the version you want to supersede
            line = line[:line.rfind('=')] + '\n'
        fout.write(line)
Then manually edit the first line (name: ...) and the last line (prefix: ...) to reflect your new environment name and run:
conda env create -f new_env.yml
You might need to manually remove or change the version pin of a few packages for which the pinned version from old_env is incompatible with, or missing for, the new Python version.
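To see what the rewriting loop does to individual lines, here is the same logic as a small function applied to a few sample lines. The function name drop_build_pin and the sample package lines are made up for illustration; a real conda env export will contain different entries.

```python
def drop_build_pin(line, build_tag="py35"):
    """Strip the trailing '=<build>' part from a line that mentions build_tag."""
    if build_tag in line:
        # Cut at the last '=' so only 'name=version' remains.
        return line[:line.rfind("=")] + "\n"
    return line


# Hypothetical excerpt of an exported environment file.
sample = [
    "name: old_env\n",
    "dependencies:\n",
    "  - numpy=1.11.3=py35_0\n",
    "  - openssl=1.0.2j=0\n",
]

for line in sample:
    print(drop_build_pin(line), end="")
```

Only the line whose build string contains py35 loses its build pin; the others pass through unchanged, which is why some version pins may still need manual attention.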
After downloading the zip file, I unpacked the zip file to my downloads folder. Then I put the path to the executable binary (C:\Users\michael\Downloads\chromedriver_win32) into the Environment Variable “Path”.
However, when I run the following code:
from selenium import webdriver
driver = webdriver.Chrome()
… I keep getting the following error message:
WebDriverException: Message: 'chromedriver' executable needs to be available in the path. Please look at http://docs.seleniumhq.org/download/#thirdPartyDrivers and read up at http://code.google.com/p/selenium/wiki/ChromeDriver
But – as explained above – the executable is(!) in the path … what is going on here?
You can test whether it actually is in the PATH by opening a cmd window, typing chromedriver (assuming your chromedriver executable still has that name) and hitting Enter. If a message like Starting ChromeDriver 2.15.322448 appears, the PATH is set correctly and something else is going wrong.
Alternatively you can use a direct path to the chromedriver like this:
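A rough sketch of the direct-path approach, under a few assumptions: the path shown is a placeholder, locate_driver and make_chrome are names invented for this example, and the Service-based call is the Selenium 4 style (Selenium 3 accepted webdriver.Chrome(executable_path=...) instead). The lookup part uses only the standard library, so it also doubles as a PATH check from inside Python.

```python
import os
import shutil


def locate_driver(name="chromedriver", explicit_path=None):
    """Return a usable driver path, or None if nothing is found."""
    # Prefer the explicit location when it points at an existing file.
    if explicit_path and os.path.isfile(explicit_path):
        return explicit_path
    # Otherwise search the directories listed in PATH.
    return shutil.which(name)


def make_chrome(driver_path):
    """Start Chrome with an explicit driver path (Selenium 4 style)."""
    # Imported here so locate_driver works even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    return webdriver.Chrome(service=Service(executable_path=driver_path))


if __name__ == "__main__":
    # Placeholder path; replace it with wherever you unpacked the driver.
    path = locate_driver(explicit_path=r"C:\path\to\chromedriver.exe")
    print(path or "chromedriver not found")
```

If locate_driver returns a path, pass it to make_chrome; if it returns None, the driver is neither at the explicit location nor on the PATH that this Python process sees.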
The same applies to PyCharm Community Edition: as with cmd, you must restart your IDE in order to reload the PATH variables. Restart your IDE and it should be fine.
Some additional input/clarification for future readers of this thread,
to avoid tinkering with the PATH env. variable at the Windows level and restart of the Windows system:
(copy of my answer from https://stackoverflow.com/a/49851498/9083077 as applicable to Chrome):
(1) Download chromedriver (as described in this thread earlier) and place the (unzipped) chromedriver.exe at X:\Folder\of\your\choice
(2) Python code sample:
import os
os.environ["PATH"] += os.pathsep + r'X:\Folder\of\your\choice'
from selenium import webdriver
browser = webdriver.Chrome()
browser.get('http://localhost:8000')
assert 'Django' in browser.title
Notes:
(1) It may take about 5 seconds for the sample code (in the referenced answer) to open up the Firefox browser for the specified url.
(2) The python console would show the following error if there’s no server already running at the specified url or serving a page with the title containing the string ‘Django’:
assert 'Django' in browser.title
AssertionError
Answer 6
For Linux and OSX
Step 1: Download chromedriver
# You can find more recent/older versions at http://chromedriver.storage.googleapis.com/
# Also make sure to pick the right driver, based on your Operating System
wget http://chromedriver.storage.googleapis.com/81.0.4044.69/chromedriver_mac64.zip
For debian: wget https://chromedriver.storage.googleapis.com/2.41/chromedriver_linux64.zip
When you unzip chromedriver, please do specify an exact location so that you can trace it later. Below, you are getting the right chromedriver for your OS, and then unzipping it to an exact location, which could be provided as argument later on in your code.
If you are working with Robot Framework RIDE, you can download chromedriver.exe from its official website and keep the .exe file in the C:\Python27\Scripts directory. Now add this path to your environment variable, e.g. C:\Python27\Scripts\chromedriver.exe.
Restart your computer and run the same test case again. You will not get this problem again.
You could try restarting the computer if it doesn't work even though you are quite sure that PATH is set correctly.
In my case, on Windows 7, I always got the WebDriverException: Message: error for chromedriver, geckodriver and IEDriverServer, although I was pretty sure that I had the correct path. After restarting the computer, everything worked.
In my case, this error disappeared when I copied the chromedriver file to the c:\Windows folder. It's because the Windows directory is in the path that the Python script checks for chromedriver availability.
If you are using a remote interpreter, you also have to check that the executable PATH is defined for it. In my case, switching from a remote Docker interpreter to a local interpreter solved the problem.
I encountered the same problem as yours.
I’m using PyCharm to write programs, and I think the problem lies in environment setup in PyCharm rather than the OS.
I solved the problem by going to script configuration and then editing the PATH in environment variables manually.
Hope you find this helpful!
The best way is maybe to get the current directory and append the rest of the path to it.
Like this code (works on Windows; on Linux you can use something like pwd):
import os
webdriveraddress = str(os.popen("cd").read().replace("\n", '')) + r'\path\to\webdriver'
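A more portable variant of the same idea, sketched with pathlib so that no shell command is needed and the path separators are handled for you. The helper name driver_path_in and the path/to/webdriver parts are placeholders, not real locations.

```python
from pathlib import Path


def driver_path_in(base_dir, *parts):
    """Join a base directory and the remaining path components."""
    return Path(base_dir).joinpath(*parts)


# Build the driver location relative to the current working directory.
# Inside a script you could use Path(__file__).resolve().parent as the
# base instead, so the result does not depend on where it is launched.
cwd_based = driver_path_in(Path.cwd(), "path", "to", "webdriver")
print(cwd_based)
```

Converting the result with str(cwd_based) gives a plain string if the code that needs the path does not accept Path objects.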
When I downloaded chromedriver.exe, I just moved it into a folder on the PATH, C:\Windows\System32\chromedriver.exe, and had the exact same problem.
For me the solution was to change the folder, so I moved it to the PyCharm Community bin folder, which was also in PATH.
For example:
C:\Windows\System32\chromedriver.exe –> gave me the exception
C:\Program Files\JetBrains\PyCharm Community Edition 2019.1.3\bin\chromedriver.exe –> worked fine
Had this issue with Mac Mojave running Robot test framework and Chrome 77. This solved the problem. Kudos @Navarasu for pointing me to the right track.
$ pip install webdriver-manager --user # install webdriver-manager lib for python
$ python # open python prompt
Next, in python prompt:
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(ChromeDriverManager().install())
# ctrl+d to exit
This leads to the following error:
Checking for mac64 chromedriver:xx.x.xxxx.xx in cache
There is no cached driver. Downloading new one...
Trying to download new driver from http://chromedriver.storage.googleapis.com/xx.x.xxxx.xx/chromedriver_mac64.zip
...
TypeError: makedirs() got an unexpected keyword argument 'exist_ok'
(for Mac users)
I had the same problem, but I solved it in this simple way:
You have to put your chromedriver.exe in the same folder as the script you are running, and then write this instruction in Python:
I’m relatively new in Mac OS. I’ve just installed XCode (for c++ compiler) and Anaconda with the latest Python 3 (for myself). Now I’m wondering how to install properly second Anaconda (for work) with Python 2?
I need both versions to work with iPython and Spyder IDE. Ideal way is to have totally separate Python environments. For example, I wish I could write like conda install scikit-learn for Python 3 environment and something like conda2 install scikit-learn for Python 2.
There is no need to install Anaconda again. Conda, the package manager for Anaconda, fully supports separated environments. The easiest way to create an environment for Python 2.7 is to do
conda create -n python2 python=2.7 anaconda
This will create an environment named python2 that contains the Python 2.7 version of Anaconda. You can activate this environment with
source activate python2
This will put that environment (typically ~/anaconda/envs/python2) in front in your PATH, so that when you type python at the terminal it will load the Python from that environment.
If you don’t want all of Anaconda, you can replace anaconda in the command above with whatever packages you want. You can use conda to install packages in that environment later, either by using the -n python2 flag to conda, or by activating the environment.
Then, before using Spyder, you can choose the Python environment as shown below.
Sometimes you will only see root and your new Python environment; root is your first Anaconda environment.
The same applies to Jupyter: you can choose the Python version in the same way.