Python 实用宝典

Question 1

numpy.distutils.system_info.BlasNotFoundError: 
    Blas (http://www.netlib.org/blas/) libraries not found.
    Directories to search for the libraries can be specified in the
    numpy/distutils/site.cfg file (section [blas]) or by setting
    the BLAS environment variable.

Which tar do I need to download off this site?

I’ve tried the fortrans, but I keep getting this error (after setting the environment variable obviously).

Question 2

The SciPy webpage used to provide build and installation instructions, but the instructions there now rely on OS binary distributions. To build SciPy (and NumPy) on operating systems without precompiled packages of the required libraries, you must build and then statically link to the Fortran libraries BLAS and LAPACK:

mkdir -p ~/src/
cd ~/src/
wget http://www.netlib.org/blas/blas.tgz
tar xzf blas.tgz
cd BLAS-*

## NOTE: The selected Fortran compiler must be consistent for BLAS, LAPACK, NumPy, and SciPy.
## For GNU compiler on 32-bit systems:
#g77 -O2 -fno-second-underscore -c *.f                     # with g77
#gfortran -O2 -std=legacy -fno-second-underscore -c *.f    # with gfortran
## OR for GNU compiler on 64-bit systems:
#g77 -O3 -m64 -fno-second-underscore -fPIC -c *.f                     # with g77
gfortran -O3 -std=legacy -m64 -fno-second-underscore -fPIC -c *.f    # with gfortran
## OR for Intel compiler:
#ifort -FI -w90 -w95 -cm -O3 -unroll -c *.f

# Continue below irrespective of compiler:
ar r libfblas.a *.o
ranlib libfblas.a
rm -rf *.o
export BLAS=~/src/BLAS-*/libfblas.a

Execute only one of the five g77/gfortran/ifort commands. I have commented out all, but the gfortran which I use. The subsequent LAPACK installation requires a Fortran 90 compiler, and since both installs should use the same Fortran compiler, g77 should not be used for BLAS.

Next, you’ll need to install the LAPACK stuff. The SciPy webpage’s instructions helped me here as well, but I had to modify them to suit my environment:

mkdir -p ~/src
cd ~/src/
wget http://www.netlib.org/lapack/lapack.tgz
tar xzf lapack.tgz
cd lapack-*/
cp INSTALL/make.inc.gfortran make.inc          # On Linux with lapack-3.2.1 or newer
make lapacklib
make clean
export LAPACK=~/src/lapack-*/liblapack.a

Update on 3-Sep-2015: Verified some comments today (thanks to all): Before running make lapacklib edit the make.inc file and add -fPIC option to OPTS and NOOPT settings. If you are on a 64bit architecture or want to compile for one, also add -m64. It is important that BLAS and LAPACK are compiled with these options set to the same values. If you forget the -fPIC SciPy will actually give you an error about missing symbols and will recommend this switch. The specific section of make.inc looks like this in my setup:

FORTRAN  = gfortran 
OPTS     = -O2 -frecursive -fPIC -m64
DRVOPTS  = $(OPTS)
NOOPT    = -O0 -frecursive -fPIC -m64
LOADER   = gfortran

On old machines (e.g. RedHat 5), gfortran might be installed in an older version (e.g. 4.1.2) and does not understand option -frecursive. Simply remove it from the make.inc file in such cases.

The lapack test target of the Makefile fails in my setup because it cannot find the blas libraries. If you are thorough you can temporarily move the blas library to the specified location to test the lapack. I’m a lazy person, so I trust the devs to have it working and verify only in SciPy.

Question 3

If you need to use the latest versions of SciPy rather than the packaged version, without going through the hassle of building BLAS and LAPACK, you can follow the below procedure.

Install linear algebra libraries from repository (for Ubuntu),

sudo apt-get install gfortran libopenblas-dev liblapack-dev

Then install SciPy, (after downloading the SciPy source): python setup.py install or

pip install scipy

As the case may be.

Question 4

On Fedora, this works:

 yum install lapack lapack-devel blas blas-devel
 pip install numpy
 pip install scipy

Remember to install ‘lapack-devel‘ and ‘blas-devel‘ in addition to ‘blas’ and ‘lapack’ otherwise you’ll get the error you mentioned or the “numpy.distutils.system_info.LapackNotFoundError” error.

Question 5

I guess you are talking about installation in Ubuntu. Just use:

apt-get install python-numpy python-scipy

That should take care of the BLAS libraries compiling as well. Else, compiling the BLAS libraries is very difficult.

Question 6

For Windows users there is a nice binary package by Chris (warning: it’s a pretty large download, 191 MB):

http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy-stack

Question 7

Following the instructions given by ‘cfi’ works for me, although there are a few pieces they left out that you might need:

1) Your lapack directory, after unzipping, may be called lapack-X-Y (some version number), so you can just rename that to LAPACK.

cd ~/src
mv lapack-[tab] LAPACK

2) In that directory, you may need to do:

cd ~/src/LAPACK 
cp lapack_LINUX.a libflapack.a

Question 8

Try using

sudo apt-get install python3-scipy

Question 9

I have a multi-threading Python program, and a utility function, writeLog(message), that writes out a timestamp followed by the message. Unfortunately, the resultant log file gives no indication of which thread is generating which message.

I would like writeLog() to be able to add something to the message to identify which thread is calling it. Obviously I could just make the threads pass this information in, but that would be a lot more work. Is there some thread equivalent of os.getpid() that I could use?

Question 10

threading.get_ident() works, or threading.current_thread().ident (or threading.currentThread().ident for Python < 2.6).

Question 11

Using the logging module you can automatically add the current thread identifier in each log entry. Just use one of these LogRecord mapping keys in your logger format string:

%(thread)d : Thread ID (if available).

%(threadName)s : Thread name (if available).

and set up your default handler with it:

logging.basicConfig(format="%(threadName)s:%(message)s")

Question 12

The thread.get_ident() function returns a long integer on Linux. It’s not really a thread id.

I use this method to really get the thread id on Linux:

import ctypes
libc = ctypes.cdll.LoadLibrary('libc.so.6')

# System dependent, see e.g. /usr/include/x86_64-linux-gnu/asm/unistd_64.h
SYS_gettid = 186

def getThreadId():
   """Returns OS thread id - Specific to Linux"""
   return libc.syscall(SYS_gettid)

Question 13

This functionality is now supported by Python 3.8+ :)

https://github.com/python/cpython/commit/4959c33d2555b89b494c678d99be81a65ee864b0

https://github.com/python/cpython/pull/11993

Question 14

I saw examples of thread IDs like this:

class myThread(threading.Thread):
    def __init__(self, threadID, name, counter):
        self.threadID = threadID
        ...

The threading module docs lists name attribute as well:

...

A thread has a name. 
The name can be passed to the constructor, 
and read or changed through the name attribute.

...

Thread.name

A string used for identification purposes only. 
It has no semantics. Multiple threads may
be given the same name. The initial name is set by the constructor.

Question 15

You can get the ident of the current running thread. The ident could be reused for other threads, if the current thread ends.

When you crate an instance of Thread, a name is given implicit to the thread, which is the pattern: Thread-number

The name has no meaning and the name don’t have to be unique. The ident of all running threads is unique.

import threading


def worker():
    print(threading.current_thread().name)
    print(threading.get_ident())


threading.Thread(target=worker).start()
threading.Thread(target=worker, name='foo').start()

The function threading.current_thread() returns the current running thread. This object holds the whole information of the thread.

Question 16

I created multiple threads in Python, I printed the thread objects, and I printed the id using the ident variable. I see all the ids are same:

<Thread(Thread-1, stopped 140500807628544)>
<Thread(Thread-2, started 140500807628544)>
<Thread(Thread-3, started 140500807628544)>

Question 17

Similarly to @brucexin I needed to get OS-level thread identifier (which != thread.get_ident()) and use something like below not to depend on particular numbers and being amd64-only:

---- 8< ---- (xos.pyx)
"""module xos complements standard module os""" 

cdef extern from "<sys/syscall.h>":                                                             
    long syscall(long number, ...)                                                              
    const int SYS_gettid                                                                        

# gettid returns current OS thread identifier.                                                  
def gettid():                                                                                   
    return syscall(SYS_gettid)

and

---- 8< ---- (test.py)
import pyximport; pyximport.install()
import xos

...

print 'my tid: %d' % xos.gettid()

this depends on Cython though.

Question 18

I’m trying to parse through a csv file and extract the data from only specific columns.

Example csv:

ID | Name | Address | City | State | Zip | Phone | OPEID | IPEDS |
10 | C... | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |

I’m trying to capture only specific columns, say ID, Name, Zip and Phone.

Code I’ve looked at has led me to believe I can call the specific column by its corresponding number, so ie: Name would correspond to 2 and iterating through each row using row[2] would produce all the items in column 2. Only it doesn’t.

Here’s what I’ve done so far:

import sys, argparse, csv
from settings import *

# command arguments
parser = argparse.ArgumentParser(description='csv to postgres',\
 fromfile_prefix_chars="@" )
parser.add_argument('file', help='csv file to import', action='store')
args = parser.parse_args()
csv_file = args.file

# open csv file
with open(csv_file, 'rb') as csvfile:

    # get number of columns
    for line in csvfile.readlines():
        array = line.split(',')
        first_item = array[0]

    num_columns = len(array)
    csvfile.seek(0)

    reader = csv.reader(csvfile, delimiter=' ')
        included_cols = [1, 2, 6, 7]

    for row in reader:
            content = list(row[i] for i in included_cols)
            print content

and I’m expecting that this will print out only the specific columns I want for each row except it doesn’t, I get the last column only.

Question 19

The only way you would be getting the last column from this code is if you don’t include your print statement in your for loop.

This is most likely the end of your code:

for row in reader:
    content = list(row[i] for i in included_cols)
print content

You want it to be this:

for row in reader:
        content = list(row[i] for i in included_cols)
        print content

Now that we have covered your mistake, I would like to take this time to introduce you to the pandas module.

Pandas is spectacular for dealing with csv files, and the following code would be all you need to read a csv and save an entire column into a variable:

import pandas as pd
df = pd.read_csv(csv_file)
saved_column = df.column_name #you can also use df['column_name']

so if you wanted to save all of the info in your column Names into a variable, this is all you need to do:

names = df.Names

It’s a great module and I suggest you look into it. If for some reason your print statement was in for loop and it was still only printing out the last column, which shouldn’t happen, but let me know if my assumption was wrong. Your posted code has a lot of indentation errors so it was hard to know what was supposed to be where. Hope this was helpful!

Question 20

import csv
from collections import defaultdict

columns = defaultdict(list) # each value in each column is appended to a list

with open('file.txt') as f:
    reader = csv.DictReader(f) # read rows into a dictionary format
    for row in reader: # read a row as {column1: value1, column2: value2,...}
        for (k,v) in row.items(): # go over each column name and value 
            columns[k].append(v) # append the value into the appropriate list
                                 # based on column name k

print(columns['name'])
print(columns['phone'])
print(columns['street'])

With a file like

name,phone,street
Bob,0893,32 Silly
James,000,400 McHilly
Smithers,4442,23 Looped St.

Will output

>>> 
['Bob', 'James', 'Smithers']
['0893', '000', '4442']
['32 Silly', '400 McHilly', '23 Looped St.']

Or alternatively if you want numerical indexing for the columns:

with open('file.txt') as f:
    reader = csv.reader(f)
    reader.next()
    for row in reader:
        for (i,v) in enumerate(row):
            columns[i].append(v)
print(columns[0])

>>> 
['Bob', 'James', 'Smithers']

To change the deliminator add delimiter=" " to the appropriate instantiation, i.e reader = csv.reader(f,delimiter=" ")

Question 21

Use pandas:

import pandas as pd
my_csv = pd.read_csv(filename)
column = my_csv.column_name
# you can also use my_csv['column_name']

Discard unneeded columns at parse time:

my_filtered_csv = pd.read_csv(filename, usecols=['col1', 'col3', 'col7'])

P.S. I’m just aggregating what other’s have said in a simple manner. Actual answers are taken from here and here.

Question 22

With pandas you can use read_csv with usecols parameter:

df = pd.read_csv(filename, usecols=['col1', 'col3', 'col7'])

Example:

import pandas as pd
import io

s = '''
total_bill,tip,sex,smoker,day,time,size
16.99,1.01,Female,No,Sun,Dinner,2
10.34,1.66,Male,No,Sun,Dinner,3
21.01,3.5,Male,No,Sun,Dinner,3
'''

df = pd.read_csv(io.StringIO(s), usecols=['total_bill', 'day', 'size'])
print(df)

   total_bill  day  size
0       16.99  Sun     2
1       10.34  Sun     3
2       21.01  Sun     3

Question 23

You can use numpy.loadtext(filename). For example if this is your database .csv:

ID | Name | Address | City | State | Zip | Phone | OPEID | IPEDS |
10 | Adam | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
10 | Carl | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
10 | Adolf | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
10 | Den | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |

And you want the Name column:

import numpy as np 
b=np.loadtxt(r'filepath\name.csv',dtype=str,delimiter='|',skiprows=1,usecols=(1,))

>>> b
array([' Adam ', ' Carl ', ' Adolf ', ' Den '], 
      dtype='|S7')

More easily you can use genfromtext:

b = np.genfromtxt(r'filepath\name.csv', delimiter='|', names=True,dtype=None)
>>> b['Name']
array([' Adam ', ' Carl ', ' Adolf ', ' Den '], 
      dtype='|S7')

Question 24

Context: For this type of work you should use the amazing python petl library. That will save you a lot of work and potential frustration from doing things ‘manually’ with the standard csv module. AFAIK, the only people who still use the csv module are those who have not yet discovered better tools for working with tabular data (pandas, petl, etc.), which is fine, but if you plan to work with a lot of data in your career from various strange sources, learning something like petl is one of the best investments you can make. To get started should only take 30 minutes after you’ve done pip install petl. The documentation is excellent.

Answer: Let’s say you have the first table in a csv file (you can also load directly from the database using petl). Then you would simply load it and do the following.

from petl import fromcsv, look, cut, tocsv 

#Load the table
table1 = fromcsv('table1.csv')
# Alter the colums
table2 = cut(table1, 'Song_Name','Artist_ID')
#have a quick look to make sure things are ok. Prints a nicely formatted table to your console
print look(table2)
# Save to new file
tocsv(table2, 'new.csv')

Question 25

I think there is an easier way

import pandas as pd

dataset = pd.read_csv('table1.csv')
ftCol = dataset.iloc[:, 0].values

So in here iloc[:, 0], : means all values, 0 means the position of the column. in the example below ID will be selected

ID | Name | Address | City | State | Zip | Phone | OPEID | IPEDS |
10 | C... | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |

Question 26

import pandas as pd 
csv_file = pd.read_csv("file.csv") 
column_val_list = csv_file.column_name._ndarray_values

Question 27

Thanks to the way you can index and subset a pandas dataframe, a very easy way to extract a single column from a csv file into a variable is:

myVar = pd.read_csv('YourPath', sep = ",")['ColumnName']

A few things to consider:

The snippet above will produce a pandas Series and not dataframe. The suggestion from ayhan with usecols will also be faster if speed is an issue. Testing the two different approaches using %timeit on a 2122 KB sized csv file yields 22.8 ms for the usecols approach and 53 ms for my suggested approach.

And don’t forget import pandas as pd

Question 28

If you need to process the columns separately, I like to destructure the columns with the zip(*iterable) pattern (effectively “unzip”). So for your example:

ids, names, zips, phones = zip(*(
  (row[1], row[2], row[6], row[7])
  for row in reader
))

Question 29

To fetch column name, instead of using readlines() better use readline() to avoid loop & reading the complete file & storing it in the array.

with open(csv_file, 'rb') as csvfile:

    # get number of columns

    line = csvfile.readline()

    first_item = line.split(',')

1.准备

2.编写代码

1.安装 Python 和 pip

2.安装 FastAPI 和 uvicorn

3.创建应用程序

4.运行应用程序

5.设置防火墙规则

问题：Python SciPy是否需要BLAS？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

问题：如何在Python中找到线程ID

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

问题：使用csv模块从csv文件中读取特定列？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

有趣好用的Python教程